IRF, posted October 26, 2017

This type of modification has no effect if no value is placed in (ix+4). So as long as the sprite data has (ix+4)=0 on all horizontal sprites, the game will run as normal.

In a project I'm working on, some of the guardians briefly wrap around the vertical screen-edge, so setting byte 4 to 00 would accidentally cause a match. However, I think that setting Bit 5 or 6 of Byte 4 of such a guardian's definition should prevent a match from occurring unintentionally.
Norman Sword (Author), posted October 26, 2017

Part 3

> The 'extending/retracting' platform at the top-left of The Bathroom reminds me of the moving platform in one of the Geoff Mode patches (in 'Willy Takes a Trip's room 10: 'A Quiet Corner to Rest in'), except that in The Bathroom it has 'asymmetrical logic':
> - After the Geoff Mode moving platform has been moved along by a cell-column in either direction, the room cell which it has just vacated is restored with an Air block;
> - In Norman's The Bathroom, the cells to the right of the platform are reverted to Air as the platform retracts leftwards, but when the platform is extending rightwards, the cells to the left of the rightmost end of the platform remain as floor/Water cells.

A quick explanation of terms:

Master screen. Where the basic screen, minus sprites etc., is drawn.
Working screen. The master screen is copied here and the sprites added.
Real screen. Where the working screen is copied and is visible to the player.

Moving a one-cell platform left or right takes very little effort or code. Because the rooms are redrawn from the master attribute copy and the master screen copy on each game loop, any additional graphics written just to the working copy screens will be deleted on each game loop by the copying from the master to the working screens. Thus to move a single square requires just writing a single graphic to the working screen on each loop.

Rather than work on the working copy screens, the platform data was written to the master screens. As above, this writes just one graphic, either a background (erase) or a platform (write), after any necessary delay. The code was written to have expanding/contracting floors and not a single moving block.

NOTE the difference in action: it looks the same, but the first method has to write the graphic on each and every game loop. The second method only writes when there is a change.
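The difference between the two methods can be put in rough numbers. The sketch below is a toy model in Python, not game code: the number of game loops observed (`LOOPS`) and the delay between platform steps (`STEP_DELAY`) are made-up values chosen only for illustration.

```python
# Toy model (not game code): count graphic writes for the two approaches.
LOOPS = 256        # hypothetical number of game loops observed
STEP_DELAY = 8     # hypothetical number of loops between platform changes


def writes_to_working_screen(loops: int) -> int:
    """Method 1: the working screen is rebuilt from the master on every loop,
    so the platform graphic has to be rewritten on every loop."""
    return loops


def writes_to_master_screen(loops: int, step_delay: int) -> int:
    """Method 2: the master screen keeps the graphic, so a write is only
    needed on the loops where the platform extends or retracts."""
    return sum(1 for loop in range(loops) if loop % step_delay == 0)


per_loop = writes_to_working_screen(LOOPS)
on_change = writes_to_master_screen(LOOPS, STEP_DELAY)
print(per_loop, on_change)
```

With these assumed values the master-screen method writes once per platform step rather than once per loop; the real ratio depends on the delay the game actually uses.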
The accumulative addition of unneeded code is a major factor in the speed of game play. For example, unneeded game loop code:

Why is the object count displayed on each game loop? It only changes on object collection.
Why is the time printed on each game loop? It only changes every 256 game loops.
Why are the dancing Willies drawn when they do not dance?

And the biggest problem the game has: why move the data using LDIR? There are numerous methods that are quicker.

---------------------------------------------------------

Side track on LDIR.

Each byte moved using LDIR takes 22 T-states. To remove all instances of this slow block movement is very easy. First, ignore the stack copy method: too many blocks of data, and not flexible enough to slot into existing code. Use the simpler LDI method, which only uses around 68 bytes and is very easy to insert into the code.

Start with the typical layout for a block move using LDI, e.g. set aside 68 bytes like so:

    BLOCK32 LDI
    BLOCK31 LDI
    BLOCK30 LDI
    BLOCK29 LDI
            ; ETC TILL
    BLOCK1  LDI
            DEC A
            JR NZ,BLOCK32
            RET

Next, go through the original code and remove/change the LDIR code in this manner.

A typical block move:

    LD HL,COPY
    LD DE,SCREEN
    LD BC,1024
    LDIR

Change to:

    LD HL,COPY
    LD DE,SCREEN
    LD A,1024/32
    CALL BLOCK32

A block fill gets changed from:

    LD HL,SCREEN
    LD DE,SCREEN+1
    LD BC,4095
    LD (HL),0
    LDIR

To:

    LD HL,SCREEN
    LD DE,SCREEN+1
    LD A,4096/32    ; NOTE THE VALUE IS 4096/32
    LD (HL),0
    CALL BLOCK31    ; NOTE THIS CALLS BLOCK31

It is possible to change all the current LDIRs in the game with the above style of code. This will increase speed by over 20%. The smaller block moves of 6 bytes etc. are left (not worth the effort, no speed improvement).
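Why the block fill loads 4096/32 but calls BLOCK31 is easiest to see by counting bytes. The following is a minimal Python model of the unrolled chain, written for illustration only: the function name `ldi_chain`, its parameters and the buffer sizes are assumptions, and it models only the byte movement and the DEC A / JR NZ loop, not flags, BC or timing.

```python
# Sketch: model the unrolled LDI chain and its BLOCK32/BLOCK31 entry points.

def ldi_chain(mem: bytearray, hl: int, de: int, a: int, entry: int = 32) -> int:
    """Copy bytes the way the BLOCKnn routine would and return the byte count.

    entry=32 models CALL BLOCK32, entry=31 models CALL BLOCK31.
    a is the pass count loaded with LD A,n (an 8-bit value).
    """
    moved = 0
    ldis_this_pass = entry          # the first pass may start part-way down the chain
    while True:
        for _ in range(ldis_this_pass):
            mem[de] = mem[hl]       # LDI: (DE) <- (HL), then HL and DE are incremented
            hl += 1
            de += 1
            moved += 1
        a = (a - 1) & 0xFF          # DEC A
        if a == 0:                  # JR NZ,BLOCK32 falls through to RET
            return moved
        ldis_this_pass = 32         # every later pass runs the full 32 LDIs


# Block move: LD A,1024/32 then CALL BLOCK32 moves exactly 1024 bytes.
mem = bytearray(8192)
mem[0:1024] = bytes(i & 0xFF for i in range(1024))
moved_copy = ldi_chain(mem, hl=0, de=4096, a=1024 // 32, entry=32)

# Block fill: the first byte is set by hand, then the overlapping copy
# (DE = HL + 1) spreads it. Entering at BLOCK31 makes the first pass
# 31 bytes, so 128 passes move 31 + 127*32 = 4095 bytes.
fill = bytearray([0xAA]) * 4097
fill[0] = 0
moved_fill = ldi_chain(fill, hl=0, de=1, a=4096 // 32, entry=31)

print(moved_copy, moved_fill)
```

In this model the fill touches bytes 0 to 4095 and leaves the byte after the block alone, which matches the LD BC,4095 of the LDIR version it replaces.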
IRF, posted October 26, 2017

In order to conserve bytes, could those 32 consecutive LDI commands be placed within a sub-loop? (With the shadow register A' used for the count.) Or would doing that effectively undo the speed increase that you achieved when you wrote out the LDIR loops?
Norman Sword (Author), posted October 26, 2017 (edited)

> In order to conserve bytes, could those 32 consecutive LDI commands be placed within a sub-loop? (With the shadow register A' used for the count.) Or would doing that effectively undo the speed increase that you achieved when you wrote out the LDIR loops?

The LDIR block move of data takes 22 T-states to move a single byte of data. When LDIR is used to move, say, 4096 bytes of data, it takes 4096*22 T-states (90112 T-states) in total.

Using a long line of 32 LDIs changes the timing to around 16 T-states per byte. Note this is per byte. There is an overhead of the DEC A every 32 bytes and the JR every 32 bytes; this extra overhead is around 16 T-states. But for every 32 bytes you have saved the difference between LDIR (22) and LDI (16), which is 6 T-states times 32 = 192 T-states.

The saving per 32 bytes is 192 T-states minus the loop overhead of 16 T-states, so a saving of 176 T-states for every move of 32 bytes.

Going back to the initial 4096 bytes moved with LDIR, taking 90112 T-states: this is replaced by a repeating loop over the block LDI code. In this case it will loop 128 times, giving an overall saving of 128*176 T-states = 22528 T-states. The CALL and the RET to the routine are insignificant compared to these figures.

Block move of 32 bytes using LDIR = 32*22 T-states = 704 T-states
Block move using a long line of LDI = 32*16+16 = 528 T-states
LDIR of 4096 bytes = 90112 T-states
LDI of 4096 bytes = 128*528 T-states = 67584 T-states

A saving of 22528 T-states: enough time saved in the one loop to execute around another two thousand opcodes.

Every game loop:
- it copies the master Att screen to the working Att screen
- it copies the working Att screen to the real Att screen
- it copies the master screen to the working screen
- it copies the working screen to the real screen

The game moves an enormous amount of data every loop.
The jagged finger code for screen copy that I use incorporates the block move into its code.

Getting back to the original statement: can I use some sort of sub-loop? Short answer is no. We are dealing with tiny timing differences that accumulate into big differences due to the number of times they are executed.

32 LDIs in line was a compromise between speed and size. It also happens to be the amount of data in one raster line, so I settled on that figure just for that reason. It is not unknown for games to use a vastly larger piece of code to try and improve the speed even more. But then we start to move into the realms of using stack copy and the associated amount of memory that uses.

LDI is simple, easy to slot in, and makes a major change in speed.

(Since the figures listed above do not tally, there are mistakes in the arithmetic. This should not detract from the overall message conveyed.)

(Edited yet again to get the figures on the arithmetic to match.)

Edited October 27, 2017 by Norman Sword
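The arithmetic in this post can be re-run with a few lines of Python. This is only a sketch of the sums, using the per-byte figures quoted in the post (22 T-states for LDIR, 16 for LDI, 16 for the DEC A plus taken JR NZ). The Zilog documentation lists 21 T-states for a repeating LDIR iteration, so that value is shown as an alternative. The function names and the `chain` parameter are illustrative assumptions, and the model ignores CALL/RET, the cheaper untaken final JR, and any memory contention.

```python
# Sketch of the T-state sums from the post; all names here are illustrative.
LDIR_T = 22            # per-byte LDIR cost as quoted in the post
LDIR_T_ZILOG = 21      # documented cost of a repeating LDIR iteration
LDI_T = 16             # cost of one LDI
LOOP_OVERHEAD_T = 16   # DEC A (4) + JR NZ taken (12), paid once per pass


def ldir_cost(n_bytes: int, per_byte: int = LDIR_T) -> int:
    """Approximate T-states for an LDIR block move of n_bytes."""
    return n_bytes * per_byte


def ldi_chain_cost(n_bytes: int, chain: int = 32) -> int:
    """Approximate T-states when `chain` LDIs are unrolled per loop pass."""
    passes = n_bytes // chain
    return passes * (chain * LDI_T + LOOP_OVERHEAD_T)


ldir_total = ldir_cost(4096)                    # 90112
ldi_total = ldi_chain_cost(4096)                # 67584
saving = ldir_total - ldi_total                 # 22528
saving_zilog = ldir_cost(4096, LDIR_T_ZILOG) - ldi_total

# The loop overhead is paid once per pass, so a short chain loses the gain.
# In the extreme case of one LDI per pass the loop is slower than LDIR.
one_ldi_per_pass = ldi_chain_cost(4096, chain=1)

print(ldir_total, ldi_total, saving, saving_zilog, one_ldi_per_pass)
```

The last figure illustrates the answer to the sub-loop question: any extra loop wrapped around the LDIs adds its overhead to every pass, and the shorter the unrolled run, the more of the saving it eats.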
IRF, posted October 26, 2017

> Getting back to the original statement can I use some sort of sub loop? Short answer is No.

I thought that would be the case. Thanks for the considered response though.
Norman Sword (Author), posted October 26, 2017 (edited)

Text has been edited to correct lots of errors; see the post following, which highlighted a problem, which has been edited twice. Also edited to include a missing opcode.

------------------------------------------------------------------------------------

Space to spare.

I read recently in a post that you were using the rope table space for data/code. It is a very easy matter to delete most of the data and use the space saved to implement the LDI table as mentioned above. Replace the data table with this data:

    ROPE_TABLE
    x8300 DEFB $60,$60,$60,$60,$60,$60,$60,$60
    x8308 DEFB $60,$60,$60,$60,$60,$60,$60,$60
    x8310 DEFB $60,$60,$60,$60,$60,$60,$60,$60
    x8318 DEFB $60,$60,$60,$60,$60,$60,$60,$60
    x8320 DEFB $61,$61,$61,$61,$61,$61,$61,$61
    x8328 DEFB $61,$61,$61,$61,$62,$62,$62,$62
    x8330 DEFB $42,$62,$62,$42,$62,$42,$62,$42
    x8338 DEFB $62,$42,$42,$42,$62,$42,$42,$42
    x8340 DEFB $42,$42,$41,$42,$42,$41,$41,$42
    x8348 DEFB $41,$41,$42,$42,$43,$42,$43,$42
    x8350 DEFB $43,$43,$43,$43,$43,$43
    x8356 DEFB $21,$21

The continuous data space from $8358 up to $8400 becomes free to use.
Change the code as listed below to use this changed data table (assuming it will fit between $9316 and $9327). If it won't fit, there is plenty of space created by deleting the second half of the rope table.

    X9316   ADD A,(IX+rope1)        ; $01
    ; From here
            ; remove the rope swing direction bit (bit 7)
            res 7,a                 ; it helps to include the full modifications
            ld l,a
            ld H,High ROPE_TABLE    ; $83
            ; extract the data, high nibble is Y-shift
            ld a,(hl)               ; grab data, hl = pointer to rope data
            rrca
            rrca
            rrca
            rrca
            and $0e                 ; $0e = 14 = 00001110B
            ; instant crash if this value is odd (easier to just remove the possibility)
            ; add Y-shift onto the Y-table offset
            add a,iyl
            ld iyl,a
            ; extract the data, low nibble is X-shift
            ld a,(hl)
            and $0f                 ; $0f = 15 = 00001111B
    ; To here
    X9327   JR Z,L9350              ; Jump if so
    X9329   LD B,A                  ; B is the count for rotations of the drawing byte (the rope drawing data bit)

;--------------------------------

From a casual look it would appear that the extra code (res 7,a) makes this mod bigger than the available space. If it MUST fit and you can be bothered altering the data, then the rope data can be changed to use 2 bits for the x offset and 2 bits for the y offset, e.g. 00000011b for the x offset and 00001100b for the y offset. This mod changes the code to remove probably enough opcodes to make it fit.

The y-offset is always even and so has only values of 6 and 4. Two bits allow for values 0, 2, 4, 6.

x is extracted with and 3 (not and $0f).
The y is extracted with:

            and 1100b
            rrca                    ; note only one rrca needed, so 3 bytes shorter

For me the extra effort was not worthwhile. I had no intention of only modifying the original code and restricting myself to the available space.

Edited October 27, 2017 by Norman Sword
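Both encodings described in this post can be checked against the table data with a short Python sketch. It covers only the 86 entries listed from $8300 to $8355 (the two trailing $21 bytes are left out), and the function names are illustrative, not part of the game code. The repacked byte layout, with x in bits 0-1 and y in bits 2-3, follows the 00000011b / 00001100b masks given above.

```python
# Sketch: decode the merged rope table entries and try the 2-bit repacking.
ROPE_TABLE = bytes(
    [0x60] * 32 + [0x61] * 12 + [0x62] * 4 +
    [0x42, 0x62, 0x62, 0x42, 0x62, 0x42, 0x62, 0x42,
     0x62, 0x42, 0x42, 0x42, 0x62, 0x42, 0x42, 0x42,
     0x42, 0x42, 0x41, 0x42, 0x42, 0x41, 0x41, 0x42,
     0x41, 0x41, 0x42, 0x42, 0x43, 0x42, 0x43, 0x42,
     0x43, 0x43, 0x43, 0x43, 0x43, 0x43]
)


def decode_nibbles(entry: int) -> tuple[int, int]:
    """Return (x_shift, y_shift) as the listed code extracts them."""
    swapped = ((entry >> 4) | (entry << 4)) & 0xFF   # four RRCAs swap the nibbles
    y_shift = swapped & 0x0E                         # AND $0E keeps the value even
    x_shift = entry & 0x0F                           # AND $0F
    return x_shift, y_shift


def pack_2bit(entry: int) -> int:
    """Repack an entry as x in bits 0-1 and y in bits 2-3."""
    x_shift, y_shift = decode_nibbles(entry)
    assert x_shift <= 3 and y_shift in (0, 2, 4, 6)
    return (y_shift << 1) | x_shift


def decode_2bit(packed: int) -> tuple[int, int]:
    """Return (x_shift, y_shift) from the repacked form."""
    x_shift = packed & 0b0011                        # AND 3
    y_shift = (packed & 0b1100) >> 1                 # AND 1100b then one RRCA
    return x_shift, y_shift


round_trip_ok = all(
    decode_2bit(pack_2bit(e)) == decode_nibbles(e) for e in ROPE_TABLE
)
print(len(ROPE_TABLE), decode_nibbles(0x60), decode_nibbles(0x43), round_trip_ok)
```

Because bits 0 and 1 are cleared by the AND 1100b, the single RRCA behaves as a plain shift right, which is why the repacked form needs three fewer rotate instructions.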
IRF, posted October 27, 2017 (edited)

Ah, I see what you've done there - very cunning! Although I think a few of the lines of data were misaligned when you merged them - should it be this?

    X8328 DEFB $61,$61,$61,$61,$62,$62,$62,$62
    X8330 DEFB $42,$62,$62,$42,$62,$42,$62,$42
    X8338 DEFB $62,$42,$42,$42,$62,$42,$42,$42

****

(And just to clarify - the AND commands in your post above have operands expressed in that antiquated numbering system that I believe some luddites still stick to, known as 'decimal'? ;) In hexadecimal, we're talking about: AND #0E / AND #0F, i.e. pick out the lower nybble in one instance, and Bits 1-3 in the other.)

Edited October 27, 2017 by IRF
IRF, posted October 27, 2017

I think the compressed Rope Animation Table, in full, should be as follows:

    x8300 DEFB $60,$60,$60,$60,$60,$60,$60,$60
    x8308 DEFB $60,$60,$60,$60,$60,$60,$60,$60
    x8310 DEFB $60,$60,$60,$60,$60,$60,$60,$60
    x8318 DEFB $60,$60,$60,$60,$60,$60,$60,$60
    x8320 DEFB $61,$61,$61,$61,$61,$61,$61,$61
    x8328 DEFB $61,$61,$61,$61,$62,$62,$62,$62
    x8330 DEFB $42,$62,$62,$42,$62,$42,$62,$42
    x8338 DEFB $62,$42,$42,$42,$62,$42,$42,$42
    x8340 DEFB $42,$42,$41,$42,$42,$41,$41,$42
    x8348 DEFB $41,$41,$42,$42,$43,$42,$43,$42
    x8350 DEFB $43,$43,$43,$43,$43,$43

Note that the first 32 (#20) values should all be '$60'. This corresponds to the situation where the rope is hanging straight down (the Animation Frame Index = 00), so all 32 (#20) segments of the rope have zero horizontal displacement (i.e. the lower nybbles of entries x8300 to x831F are all zero).
IRF, posted October 27, 2017 (edited)

Actually, prior to this command:

    X9319 LD L,A

wouldn't you now need to have an AND #7F operation?

Otherwise, when the rope is left-of-centre, the Animation Frame Index (which is added to the Segment Counter to point at the appropriate entries in the rope table) would have values of #80 or greater. The original code accounts for this by using SET 7,L and RES 7,L commands to ensure that it is always accessing the correct half of the table, but with the two halves merged, there is a need to force the routine to always look up the lower half of the table.

Edited October 27, 2017 by IRF
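The effect of the mask can be illustrated with a couple of lines of Python. This is a sketch only: the function name and the sample values are assumptions, and it models nothing beyond the 8-bit addition followed by AND #7F (or the equivalent RES 7).

```python
# Sketch: clearing bit 7 forces every lookup into the merged (lower) half of the table.

def table_index(frame_index: int, segment: int) -> int:
    """8-bit add of the Animation Frame Index and Segment Counter, then AND #7F."""
    return ((frame_index + segment) & 0xFF) & 0x7F


right_of_centre = table_index(0x0C, 5)   # bit 7 clear in the frame index
left_of_centre = table_index(0x8C, 5)    # same frame with the direction bit set
print(hex(right_of_centre), hex(left_of_centre))
```

Both calls land on the same table entry, so the swing direction no longer selects a different half of the table.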