IRF, posted April 4, 2017

> I can't download the files right now, I'll have a look at them later :).

Danny, I've removed the files for now, so I can carry out further tweaking, and then re-upload a (hopefully) new and improved version. That'll save you from looking at various iterations, with slight improvements each time - it would be better for you to see the final article, and observe the greatest 'Before vs After' contrast in one hit!
IRF, posted April 4, 2017 (edited April 4, 2017)

This should do it - with this 'regime' in place, the maximum delay between a graphic byte being drawn, and its associated attribute byte being copied to the screen, should be the length of time it takes to draw 128 graphic bytes (equivalent to four whole pixel-rows). So it should minimise the 'Delayed [or Premature] Attribute Effect' (as well as the 'Jagged Finger Effect').

It's a bit more efficient than my previous attempt, so it also restores the three bytes spare for a CALL to the Screen Flash routine at the start. I'll try it out later: (Norman
Norman Sword (author), posted April 4, 2017

When the subject of updating the attributes at the same time was mentioned, I jotted down the code below. The lines marked '+' are the code added to do the attributes at the same time.

        org 35317

35317   LD HL,05C00H    ; +3
        LD DE,05800H    ; +6
        EXX             ; +7
        XOR A           ; +8
        ld hl,08200H    ;3
loop    ld e,(hl)       ;4
        inc l           ;5
        push hl         ;6
        ld d,(hl)       ;7
        ld l,e          ;8
        ld h,d          ;9
        res 5,d         ;11
        ld bc,32        ;14
        ldir            ;16
        pop hl          ;17
        inc l           ;18
        inc a           ; +9  ;** extra overhead 1
        and 7           ; +11 ;** extra overhead 2
        jr nz,loop      ; +13
        exx             ; +14
        ld bc,32        ; +17
        ldir            ; +19
        exx             ; +20
        dec l           ; +21
        inc l           ; +22
        jr nz,loop      ;20
;----------------------------
        ld a,(34271)    ;23 ; this code is moved down in memory
        and 2           ;25
        rrca            ;26
        ld hl,34258     ;29
        or (hl)         ;30
        ld (hl),a       ;31
        jr 35377        ;33

My main concern was that this loop is executed 128 times per game loop, so any checking that is added must not introduce too much overhead. On the (128-16) passes through the loop where the extra code is executed but nothing extra is done, the overhead is a mere two instructions (indicated by **). The other 16 passes through the loop handle copying the attributes.
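The pass counts quoted above can be checked with a small Python model of the listing's control flow. This is only a sketch: it tracks the A and L registers and nothing else (no memory is copied), and the function name `count_passes` is made up for illustration.

```python
def count_passes(start_a=0):
    """Model the A and L registers of the interleaved copy loop.

    Returns (pixel_rows_drawn, attribute_rows_copied).
    """
    a, l = start_a, 0
    pixel_rows = 0
    attr_rows = 0
    while True:
        l = (l + 2) & 0xFF      # the two 'inc l' steps: one 2-byte table entry per pass
        pixel_rows += 1         # 'ldir' copies one 32-byte pixel-row
        a = (a + 1) & 7         # 'inc a' followed by 'and 7'
        if a != 0:
            continue            # 'jr nz,loop': no attribute copy on this pass
        attr_rows += 1          # 'exx / ld bc,32 / ldir / exx'
        if l == 0:              # 'dec l / inc l / jr nz,loop' falls through when L is zero
            return pixel_rows, attr_rows


rows, attrs = count_passes()
passes_with_overhead_only = rows - attrs
```

With A starting at zero, the model makes 128 passes, 16 of which copy an attribute row, leaving 112 passes that only pay for the `inc a` / `and 7` pair.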
IRF, posted April 4, 2017 (edited May 10, 2017)

This should do it - with this 'regime' in place, the maximum delay between a graphic byte being drawn, and its associated attribute byte being copied to the screen, should be the length of time it takes to draw 128 graphic bytes (equivalent to four whole pixel-rows). So it should minimise the 'Delayed [or Premature] Attribute Effect' (as well as the 'Jagged Finger Effect').

In contrast, in Matthew's original regime, there were considerable delays involved, which varied according to the position on the screen. Considering two extreme examples:

- after the final graphic byte was drawn at the bottom-right corner of the playing area, there was a relatively short delay whilst the 512 attribute bytes were copied (as well as the time taken to enact the Toilet Dash double-speed effect and, if applicable, the 'Screen Flash' routine);
- after the first graphic byte was drawn at the top-left corner of the screen, its associated attribute wouldn't get copied until every other graphic byte had been drawn - that's 32*16*8=4096!
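The two extremes of the original regime can be put into numbers with a short Python sketch. It assumes, for illustration only, that delay is measured in bytes copied (not T-states), that all 4096 graphic bytes are copied before any attribute byte, and that the Toilet Dash and Screen Flash code in between is ignored. The name `original_delay` is hypothetical.

```python
COLUMNS = 32
CELL_ROWS = 16
PIXEL_ROWS_PER_CELL = 8

GRAPHIC_BYTES = COLUMNS * CELL_ROWS * PIXEL_ROWS_PER_CELL   # 32*16*8
ATTRIBUTE_BYTES = COLUMNS * CELL_ROWS                       # 32*16


def original_delay(graphic_index, attribute_index):
    """Bytes copied between drawing a graphic byte and copying its attribute,
    when every graphic byte is copied first and every attribute byte afterwards.
    Both indices count from zero in copy order."""
    remaining_graphics = GRAPHIC_BYTES - 1 - graphic_index
    return remaining_graphics + attribute_index


top_left = original_delay(0, 0)
bottom_right = original_delay(GRAPHIC_BYTES - 1, ATTRIBUTE_BYTES - 1)
```

The results come out one lower than the round figures in the post (4095 and 511), because the byte being tracked is not counted against itself; the contrast between the two corners is the point.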
IRF, posted April 4, 2017 (edited April 4, 2017)

Norman, thanks for your latest post (which crossed over with one of mine).

Looking at my latest code, I believe there are four instructions that you would term 'extra overhead' (#89FB-#89FF in my latest disassembly, in post no. 32). Would that slow down the game considerably? (I haven't tried it out yet.)

Balanced against that, my understanding of your latest code, which copies the attributes at the same time, is that all eight rows of graphic bytes would be copied to a particular cell-row before its attributes are distributed. That means that there would be a delay caused by up to 256 graphic bytes being copied, before a bitmap in the top pixel-row of its cell-row is united with its associated attribute. Of course, the situation would be much better for the bitmaps in the bottom pixel-row of a cell-row, because only 32 bytes would have to be copied in the intervening period.

But my latest effort is an attempt to 'evenly spread' the delays, because last night I came up with a similar solution to yours [EDIT: similar in terms of the sequencing, not the execution], but I noticed that there was still a residual (albeit improved) 'Delayed Attribute Effect' - particularly for (fast-moving) Arrows located in the top pixel-rows of their host cell-rows.

EDIT: Would replacing your XOR A (at 35317+7) with a LD A, #04 achieve what I'm after? i.e. copying the attributes after the top four pixel-rows but before the bottom four pixel-rows, for each cell-row in turn?

******

Looking at the length of your code, I believe it pips mine to the post by one byte in terms of efficiency. :)

EDIT: Or it would be exactly the same length if we were to set the initial value of A to #04.
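The difference between the two starting values of A can be tabulated per pixel-row with a few lines of Python. This is a sketch under stated assumptions: gaps are measured in graphic bytes copied, from the first byte of a row to the attribute copy (delayed case) or from the attribute copy to the last byte of a row (premature case), and the helper names are invented for the example.

```python
ROW_BYTES = 32          # graphic bytes per pixel-row
ROWS_PER_CELL = 8       # pixel-rows per cell-row


def rows_before_attributes(start_a):
    """Pixel-rows drawn in each cell-row before its attributes are copied,
    given the initial value of A and the 'inc a / and 7' test."""
    return ROWS_PER_CELL - (start_a % ROWS_PER_CELL)


def worst_gap(pixel_row, start_a):
    """Worst-case number of graphic bytes separating a pixel-row from the
    copying of its cell-row's attributes."""
    k = rows_before_attributes(start_a)
    if pixel_row < k:
        return (k - pixel_row) * ROW_BYTES        # attribute arrives late
    return (pixel_row - k + 1) * ROW_BYTES        # attribute arrives early


gaps_xor_a = [worst_gap(r, 0) for r in range(ROWS_PER_CELL)]
gaps_ld_a_4 = [worst_gap(r, 4) for r in range(ROWS_PER_CELL)]
```

Starting A at zero gives gaps running from 256 bytes (top pixel-row) down to 32 (bottom pixel-row); starting at 4 caps the gap at 128 bytes on either side of the attribute copy, which is the 'evenly spread' behaviour described in the post.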
Norman Sword (author), posted April 4, 2017 (edited April 4, 2017)

The delay before the attribute update is a swings-and-roundabouts affair. Update before the graphics are updated and the arrows will be in their old position with their colours shifted; update after and the reverse happens. Update in the middle and you might introduce another effect.

When I wrote the code I was aware that the original "xor a" could be changed to permit the attribute copying to be set to anywhere needed.

You have included the JR at the bottom in the size of my code, whereas it is missing from yours.

The speed of execution will be faster overall for mine, because the attribute calculations are not needed in each attribute copy. The overhead then is:

        exx             4 T-states       4
        (copy attribute: ld bc,32 / ldir)
        exx             4 T-states       8
        dec l           4 T-states      12
        inc l           4 T-states      16

which is 4 sets of 4 T-state operations ... 16 in total.

Yours:

        PUSH HL         10 T-states     10
        LD D, #58       7 T-states      17
        SLA E           8               25
        LD L, E         4               29
        JR NC, #01      7/12            36  41
        INC D           4               40  45
        LD H, D         4               44  49
        SET 2, H        7               51  55
        (copy attribute: ld bc,32 / ldir)
        POP HL          10              61  66

The extra overheads are 61+ T-states.

So mine is faster and smaller. (But this is not a contest.)
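The tally can be re-run in Python using the instruction timings given in standard Z80 references. Two of those differ slightly from the figures in the post (PUSH HL is normally listed as 11 T-states and SET 2,H as 8), so the totals below are not identical to the posted ones, although the conclusion is the same. The sketch assumes uncontended memory, leaves out the `ld bc,32` / `ldir` pair common to both versions, and takes the `JR NC, #01` to skip the `INC D` when the jump is taken.

```python
# T-states per instruction, from standard Z80 timing tables.
T_STATES = {
    "EXX": 4, "DEC L": 4, "INC L": 4, "INC D": 4,
    "LD L,E": 4, "LD H,D": 4, "LD D,n": 7,
    "SLA E": 8, "SET 2,H": 8, "PUSH HL": 11, "POP HL": 10,
}
JR_TAKEN = 12
JR_NOT_TAKEN = 7

norman_overhead = ["EXX", "EXX", "DEC L", "INC L"]
irf_unconditional = ["PUSH HL", "LD D,n", "SLA E", "LD L,E",
                     "LD H,D", "SET 2,H", "POP HL"]

norman_total = sum(T_STATES[op] for op in norman_overhead)

irf_base = sum(T_STATES[op] for op in irf_unconditional)
irf_jump_not_taken = irf_base + JR_NOT_TAKEN + T_STATES["INC D"]  # INC D executed
irf_jump_taken = irf_base + JR_TAKEN                              # INC D skipped
```

On these timings the per-attribute-row overhead is 16 T-states for the first version against 63 or 64 for the second, depending on whether the jump is taken.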
IRF, posted April 4, 2017 (edited April 4, 2017)

My replies (in red in the original post) follow each quoted passage:

> The delay before the attribute update is a swings-and-roundabouts affair. Update before the graphics are updated and the arrows will be in their old position with their colours shifted; update after and the reverse happens. Update in the middle and you might introduce another effect.

By "another effect", are you referring to something specific, or do you mean in an 'unintended consequences' way?

> When I wrote the code I was aware that the original "xor a" could be changed to permit the attribute copying to be set to anywhere needed.

I'll try a starting value of #04 because, going by my recent experience, it should almost entirely eliminate the arrows flickering in a colour other than the one intended. Barring any unforeseen problems, I think that would be the optimum solution.

> You have included the JR at the bottom in the size of my code, whereas it is missing from yours.

I omitted the JR because it would no longer be needed if the unused 'Screen Flash' routine and its preceding check of the Screen Flash variable are removed from the Main Loop (with scope for it to be CALLed from elsewhere, with the CALL command being inserted just after the CALL to the item-drawing routine). However, you're right that in my quick comparison, I omitted to deduct two bytes from your total as well (for the same reason).

> The speed of execution will be faster overall for mine, because the attribute calculations are not needed in each attribute copy. The overhead then is:
>
>         exx             4 T-states       4
>         (copy attribute: ld bc,32 / ldir)
>         exx             4 T-states       8
>         dec l           4 T-states      12
>         inc l           4 T-states      16
>
> which is 4 sets of 4 T-state operations ... 16 in total.
>
> Yours:
>
>         PUSH HL         10 T-states     10
>         LD D, #58       7 T-states      17
>         SLA E           8               25
>         LD L, E         4               29
>         JR NC, #01      7/12            36  41
>         INC D           4               40  45
>         LD H, D         4               44  49
>         SET 2, H        7               51  55
>         (copy attribute: ld bc,32 / ldir)
>         POP HL          10              61  66
>
> The extra overheads are 61+ T-states.

I haven't got my head fully around the T-States stuff, but I can see in general terms why yours would be faster.

> So mine is faster and smaller. (But this is not a contest.)

Indeed! But any spare bytes in this tight spot in the Main Loop could come in handy for other purposes, so I would probably opt for the most byte-efficient solution (i.e. yours).
IRF, posted April 4, 2017 (edited April 5, 2017)

Okay, please see the three attached test files. (Ignore the fact that Willy jumps upon start-up, and walks backwards; those are relics from some earlier testing.)

The 'Ian' fixed file has my latest iteration of a patch for the 'Delayed Attribute Effect', woven into Norman's original fix for the 'Jagged Finger Effect' as per my disassembly in post no. 32.

I've also attached an unfixed file to allow for a Before vs. After contrast. The difference is striking, and the 'Delayed Attribute' is almost entirely eliminated.

N.B. I tried to create an additional file, based on Norman's latest code (post no. 33), but that is still work in progress. Following Norman's code exactly (with the XOR A at the start) it worked okay, but the suppression of the 'Delayed Attribute' wasn't as effective as it is in my version (see previous discussion). So I tried to swap the XOR A for a LD A, #04. Alas, that caused the file to crash soon after the game started. So I need to look into what's going wrong (I know that it arose out of my minor departure from Norman's code, but I haven't yet figured out exactly why it doesn't work).

EDIT: The screen is actually drawn prior to the crash, and I think that the problem is arising because the drawing loop doesn't know when to come to an end. That in turn is because the copying of attributes to the bottom cell-row doesn't follow on immediately after the final pixel-row has been drawn (i.e. the point at which L, the index to the table at #8200, has wrapped back round to zero). So the end of the outer loop might need to be tweaked a bit...

UPDATE: I've also attached a 'Norman' file in which I've implemented Norman's fix from post no. 33, modified slightly so that, as with my fix, the attributes are copied for each cell-row mid-way through the process of copying the graphic bytes for that cell-row (i.e. after the first four pixel-rows have been drawn).

Norman's fix required three (UPDATE: five) fewer bytes than mine, but I get the impression that the improvement in terms of the Delayed Attributes is slightly greater with my fix in place (EDIT: or maybe not?). However, there's not much in it, and both fixes offer a vast improvement in comparison with the 'unfixed' file. :)

Attached files:

- Jagged Finger & Delayed Attributes Ian Fix.z80
- Jagged Finger & Delayed Attributes Bugs.z80
- Jagged Finger & Delayed Attributes Norman Fix.z80
IRF, posted April 4, 2017 (edited April 5, 2017)

For the record, this is how I tweaked Norman's latest code (one additional byte compared with post no. 33; the changes, shown in bold in the original post, are the commented lines in the top part of the listing).

EDIT: Note that this method leaves six spare bytes very usefully located in the Main Loop, just prior to the screen-drawing code (at #89F5-#89FA). That's enough to insert a CALL to the Screen Flash Routine (which would have to be reinstated elsewhere) AND a CALL to the Main Loop Patch Vector. :)

        org #89FB

        LD HL, 05C00H
        LD DE, 05800H
        EXX
        LD A, #04       ; was XOR A
        ld HL, 08200H
loop    ld e,(hl)
        inc l
        push hl
        ld d,(hl)
        ld l,e
        ld h,d
        res 5,d
        ld bc,32
        ldir
        pop hl
        inc l
        JR Z, #0E       ; If L has reached zero, then jump forward to the code which doubles Willy's speed during the Toilet Dash
        inc a
        and 7
        jr nz,loop
        exx
        ld bc,32
        ldir
        exx
        dec L
        inc L
        JR loop         ; Jump back to draw the next pixel-row (always necessary at this point, as the attributes are copied midway through the drawing of each cell-row)
;----------------------------
        ld a,(34271)    ;23 ; this code is moved down in memory
        and 2           ;25
        rrca            ;26
        ld hl,34258     ;29
        or (hl)         ;30
        ld (hl),a       ;31
        jr 35377        ;33
Norman Sword (author), posted April 5, 2017

Spectacular crash when I tried modifying the XOR A. This code is shorter and fixes the problem.

I have to admit I do not like the additional JR inserted into the main loop to overcome this small change (XOR A changed to LD A,4). In loops of this nature, each change in flow is repeated and repeated. So just this one small change is the equivalent of inserting 255 one-byte opcodes. In itself that is not a visible factor in JSW's speed, but slowdown is cumulative.

        org #89FB

        LD HL, 05C00H
        LD DE, 05800H
        EXX
        LD A, #04
        ld HL, 08200H
loop    ld e,(hl)
        inc l
        push hl
        ld d,(hl)
        ld l,e
        ld h,d
        res 5,d
        ld bc,32
        ldir
        pop hl
        inc a
        and 7
        jr nz,not_attrib
        exx
        ld bc,32
        ldir
        exx
not_attrib
        inc L
        JR nz,loop
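Why the straight swap of XOR A for LD A,4 misbehaves, and why the two reworked endings do not, can be seen in a small Python model of the three loop shapes discussed in this thread. It is a sketch only: it follows the A and L registers and counts copies, without touching memory, and the variant names (`post33`, `irf_tweak`, `final`) and the `max_passes` cut-off are assumptions introduced for the example.

```python
def run(variant, start_a, max_passes=1000):
    """Return (terminated, pixel_rows_drawn, attribute_rows_copied)."""
    a, l = start_a, 0
    pixel_rows = attr_rows = 0
    for _ in range(max_passes):
        pixel_rows += 1                       # one 32-byte 'ldir' per pass
        if variant == "post33":
            l = (l + 2) & 0xFF                # both 'inc l' steps
            a = (a + 1) & 7
            if a != 0:
                continue                      # 'jr nz,loop' never looks at L
            attr_rows += 1
            if l == 0:                        # L only tested after an attribute copy
                return True, pixel_rows, attr_rows
        elif variant == "irf_tweak":
            l = (l + 2) & 0xFF
            if l == 0:                        # added 'JR Z' exit
                return True, pixel_rows, attr_rows
            a = (a + 1) & 7
            if a == 0:
                attr_rows += 1                # then unconditional 'JR loop'
        elif variant == "final":
            l = (l + 1) & 0xFF                # first 'inc l'
            a = (a + 1) & 7
            if a == 0:
                attr_rows += 1
            l = (l + 1) & 0xFF                # 'not_attrib: inc L'
            if l == 0:                        # L tested on every pass
                return True, pixel_rows, attr_rows
        else:
            raise ValueError(variant)
    return False, pixel_rows, attr_rows


results = {
    ("post33", 0): run("post33", 0),
    ("post33", 4): run("post33", 4),
    ("irf_tweak", 4): run("irf_tweak", 4),
    ("final", 4): run("final", 4),
}
```

In this model the post no. 33 loop with A starting at 4 never finds L at zero at the one place it is tested, so it keeps copying attribute rows well beyond the sixteen the screen holds; both reworked versions stop after 128 pixel-rows and 16 attribute rows.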