[File] JSW jagged finger effect (demo)


49 replies to this topic

#31 IRF

IRF

    Advanced Member

  • Contributor
  • 4,276 posts

Posted 04 April 2017 - 09:43 AM

I can't download the files right now, I'll have a look at them later :).

 

Danny, I've removed the files for now, so I can carry out further tweaking, and then re-upload a (hopefully) new and improved version.  That'll save you from looking at various iterations, with slight improvements each time - it would be better for you to see the final article, and observe the greatest 'Before vs After' contrast in one hit!



#32 IRF

IRF

    Advanced Member

  • Contributor
  • 4,276 posts

Posted 04 April 2017 - 10:49 AM

This should do it - with this 'regime' in place, the maximum delay between a graphic byte being drawn, and its associated attribute byte being copied to the screen, should be the length of time it takes to draw 128 graphic bytes (equivalent to four whole pixel-rows).  So it should minimise the 'Delayed [or Premature] Attribute Effect' (as well as the 'Jagged Finger Effect').

 

It's a bit more efficient than my previous attempt so it also restores the three bytes spare for a CALL to the Screen Flash routine at the start.

 

I'll try it out later:

 

(Norman’s code in bold)

 

89F5-7 NOP                 Three spare bytes

 

89F8   LD HL, #8200    Point HL at the Screen Buffer Address Lookup Table

 

89FB   LD A, L             Check if we have drawn the first four pixel-rows in the current cell-row

89FC   SUB A, #08

89FE   LD E, A

89FF   AND #0F

8A01   JR NZ, #8A15  If not, jump to consider the next pixel-row; otherwise, proceed to copy the attributes for this cell-row

8A03   PUSH HL

8A04   LD D, #58

8A06   SLA E               This sets the Carry Flag if (and only if) we are now considering a character-row in the bottom half of the playing area

8A08   LD L, E             E and L both start off pointing at the left-hand cell in the current row

8A09   JR NC, #01      This jump only occurs if we are in the top half of the playing area (Carry clear); it skips the INC D

8A0B   INC D               Point D at the bottom half of the playing area

8A0C   LD H, D            DE now points at the left-hand end of the appropriate character-row in the screen's attribute file

8A0D   SET 2, H          HL now points at the left-hand end of the appropriate character-row in the attribute buffer

8A0F   LD BC, #0020  There are 32 bytes to copy

8A12   LDIR                 Copy the attributes along the current character-row

8A14   POP HL

 

8A15   LD E, (HL)        Start of LOOP, each iteration of which copies 32 graphic bytes across a single pixel-row (raster line) of the screen
8A16   INC L
8A17   PUSH HL
8A18   LD D, (HL)
8A19   LD L, E
8A1A   LD H, D
8A1B   RES 5, D
8A1D   LD BC, #0020
8A20   LDIR
8A22   POP HL
8A23   INC L                Have we finished copying the whole of the playing area (i.e. the graphic bytes and attributes for all sixteen character-rows)?
8A24   JR NZ, #89FB  If not, then jump back to consider the next pixel-row
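
As a cross-check, the test and address arithmetic at #89FB-#8A0D can be modelled in Python. This is only a sketch, not part of the patch: the function name `attr_copy_addresses` is made up for illustration, and it assumes that L holds the offset into the table at #8200 (two bytes per pixel-row) on entry to #89FB.

```python
def attr_copy_addresses(l):
    """Model of #89FB-#8A0D for a table offset L (0, 2, ..., 254).

    Returns None when the routine would skip the attribute copy, otherwise
    (source, destination) for the 32-byte LDIR.
    """
    e = (l - 0x08) & 0xFF          # LD A,L / SUB #08 / LD E,A
    if e & 0x0F:                   # AND #0F / JR NZ: not four rows into a cell-row
        return None
    d = 0x58                       # LD D,#58: top half of the attribute file
    carry = e >> 7                 # SLA E shifts bit 7 out into Carry
    e = (e << 1) & 0xFF
    if carry:                      # JR NC skips INC D when Carry is clear
        d += 1                     # INC D: bottom half of the playing area
    dest = (d << 8) | e            # DE: attribute file at #5800
    src = ((d | 0x04) << 8) | e    # SET 2,H: attribute buffer at #5C00
    return src, dest
```

Under that assumption the copy fires for exactly sixteen values of L (#08, #18, ... #F8), one per cell-row.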

 

That would then be followed by the original code which makes Willy run at double-speed during the Toilet Dash (which fortuitously fits exactly into the space where the original attribute-copying code was located):

 

8A26   LD A, (#85DF)
8A29   AND #02
8A2B   RRCA
8A2C   LD HL, #85D2
8A2F   OR (HL)
8A30   LD (HL), A
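
For anyone following the bit-twiddling, here is a small Python model of what those six instructions do to the byte at #85D2. It is a sketch only: the names `rrca` and `apply_double_speed` are invented for illustration, and it assumes the final instruction stores A back to (HL), as in the later listings in this thread.

```python
def rrca(a):
    """Z80 RRCA: rotate the 8-bit accumulator right by one; bit 0 wraps to bit 7."""
    return ((a >> 1) | (a << 7)) & 0xFF


def apply_double_speed(mode_byte, target_byte):
    """Return the new value of the byte at #85D2, given the byte at #85DF."""
    a = mode_byte & 0x02       # AND #02: isolate bit 1
    a = rrca(a)                # RRCA: move it down to bit 0
    return a | target_byte     # OR (HL), then stored back to (HL)
```

In other words, bit 0 of the byte at #85D2 is forced on whenever bit 1 of the byte at #85DF is set, and left alone otherwise.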


Edited by IRF, 04 April 2017 - 11:17 AM.


#33 Norman Sword

Norman Sword

    Advanced Member

  • Member
  • 237 posts

Posted 04 April 2017 - 11:23 AM

When the subject of updating the attributes at the same time was mentioned, I jotted down the code below.
 
The +code is the code added to do the attributes at the same time.
 
org 35317
 
35317              LD HL,05C00H ;  +3
                        LD DE,05800H ;  +6
                        EXX                  ;  +7
                        XOR A              ;  +8
        ld hl,08200H                    ;3
loop
        ld      e,(hl)                       ;4
        inc    l                               ;5
        push  hl                            ;6
        ld      d,(hl)                       ;7
        ld      l,e                           ;8
        ld      h,d                          ;9
        res    5,d                         ;11
        ld      bc,32                     ;14
        ldir                                  ;16
        pop     hl                         ;17
        inc     l                            ;18
                        inc     a           ;  +9               ;** extra overhead 1
                        and     7         ;  +11              ;** extra overhead 2
                        jr       nz,loop  ;  +13
                        exx                 ;  +14
                        ld       bc,32   ;  +17
                        ldir                 ;  +19
                        exx                 ;  +20
                        dec     l           ;  +21
                        inc     l            ;  +22
        jr      nz,loop                   ;20
;----------------------------
        ld      a,(34271)             ;23 ; this code is moved down in memory
        and    2                         ;25
        rrca                               ;26
        ld      hl,34258               ;29
        or      (hl)                       ;30
        ld      (hl),a                     ;31
        jr      35377                    ;33
 
 
My main concern was that this loop is executed 128 times per game loop, so any checking must not introduce too much overhead.

The extra overhead on the (128-16) passes through the loop that execute the extra code without copying any attributes is a mere two instructions (indicated by **).

The other 16 passes through the loop handle copying the attributes.
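
To put a rough number on that overhead, here is a Python tally. It is a sketch under assumptions: the timings are the standard published Z80 figures (INC A = 4 T-states, AND n = 7), and the name `check_overhead` is made up for illustration.

```python
INC_A = 4   # T-states, standard Z80 timing (assumed)
AND_N = 7   # T-states, standard Z80 timing (assumed)


def check_overhead(rows=128, rows_per_cell=8):
    """T-states spent on the inc a / and 7 test in one pass over the playing area.

    Returns (attribute copies, passes where the test finds nothing to do,
    T-states spent on those idle tests).
    """
    attr_copies = rows // rows_per_cell
    idle_checks = rows - attr_copies
    per_check = INC_A + AND_N
    return attr_copies, idle_checks, idle_checks * per_check
```

With the defaults this gives 16 attribute copies and 112 idle checks costing 1,232 T-states per pass. The `jr nz,loop` that follows the test is not counted, because on those passes it simply takes the place of the loop's original closing jump.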


#34 IRF

IRF

    Advanced Member

  • Contributor
  • 4,276 posts

Posted 04 April 2017 - 11:25 AM

This should do it - with this 'regime' in place, the maximum delay between a graphic byte being drawn, and its associated attribute byte being copied to the screen, should be the length of time it takes to draw 128 graphic bytes (equivalent to four whole pixel-rows).  So it should minimise the 'Delayed [or Premature] Attribute Effect' (as well as the 'Jagged Finger Effect').

 

In contrast, in Matthew's original regime, there were considerable delays involved, which varied according to the position on the screen.  Considering two extreme examples:

 

- after the final graphic byte was drawn at the bottom-right corner of the playing area, there was a relatively short delay whilst the 512 attribute bytes were copied (as well as the time taken to enact the Toilet Dash double-speed effect and, if applicable, the 'Screen Flash' routine);

 

- after the first graphic byte was drawn at the top-left corner of the screen, its associated attribute wouldn't get copied until every other graphic byte had been drawn - that's 32*16*8=4096!


Edited by IRF, 10 May 2017 - 11:48 AM.


#35 IRF

IRF

    Advanced Member

  • Contributor
  • 4,276 posts

Posted 04 April 2017 - 11:44 AM

Norman, thanks for your latest post (which crossed over with one of mine).

 

Looking at my latest code, I believe there are four instructions that you would term 'extra overhead' (#89FB-#89FF in my latest disassembly, in post no 32).  Would that slow down the game considerably?  (I haven't tried it out yet.)

 

Balanced against that, my understanding of your latest code which copies the attributes at the same time, is that all eight rows of graphic-bytes would be copied to a particular cell-row before its attributes are distributed.  That means that there would be a delay caused by up to 256 graphic bytes being copied, before a bitmap in the top pixel-row of its cell-row is united with its associated attribute.

 

Of course, the situation would be much better for the bitmaps in the bottom pixel-row of a cell-row, because only 32 bytes would have to be copied in the intervening period.  But my latest effort is an attempt to 'evenly spread' the delays, because last night I came up with a similar solution to yours [EDIT: similar in terms of the sequencing, not the execution], but I noticed that there was still a residual (albeit improved) 'Delayed Attribute Effect' - particularly for (fast-moving) Arrows located in the top pixel-rows of their host cell-rows.
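
The byte counts being compared here can be checked with a short Python model. It is only a sketch: it measures the gap in graphic bytes copied, ignores the time taken by the attribute copy itself, and the name `worst_case_gap` is invented for illustration.

```python
BYTES_PER_ROW = 32


def worst_case_gap(rows_in_block, rows_before_attr):
    """Largest number of graphic bytes copied between a graphic byte and its attribute.

    rows_in_block: pixel-rows sharing one attribute copy (8 for a cell-row,
    128 for the whole playing area).
    rows_before_attr: pixel-rows of the block drawn before the attributes go out.
    """
    attr_time = rows_before_attr * BYTES_PER_ROW
    gaps = []
    for row in range(rows_in_block):
        for col in range(BYTES_PER_ROW):
            drawn_at = row * BYTES_PER_ROW + col   # bytes copied before this one
            gaps.append(abs(attr_time - drawn_at))
    return max(gaps)
```

On this measure the original regime comes out at `worst_case_gap(128, 128)` = 4096, attributes after all eight pixel-rows at `worst_case_gap(8, 8)` = 256, and attributes after four pixel-rows at `worst_case_gap(8, 4)` = 128.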

 

EDIT: Would replacing your XOR A (at 35317+7) with a LD A, #04 achieve what I'm after?  i.e. Copying the attributes after the top four pixel-rows but before the bottom four pixel-rows, for each cell-row in turn?

 

******

 

Looking at the length of your code, I believe it pips mine to the post by one byte in terms of efficiency. :)  EDIT: Or it would be exactly the same length if we were to set the initial value of A to #04.


Edited by IRF, 04 April 2017 - 12:38 PM.


#36 Norman Sword

Norman Sword

    Advanced Member

  • Member
  • PipPipPip
  • 237 posts

Posted 04 April 2017 - 12:59 PM

The delay before the attribute update is a swings-and-roundabouts affair.  Update before the graphics are updated and the arrows will be in their old position with their colours shifted; update after and the reverse happens.  Update in the middle and you might introduce another effect.

 

When I wrote the code I was aware that the original "xor a" could be changed to permit the attribute copying to be set to anywhere needed. 

 

You have included the JR at the bottom in the size of the code, which is missing from yours.

 

The speed of execution will be faster overall for mine, because the attribute calculations are not needed in each attribute copy. The overhead then is

 

exx                   4 t-states

- copy attribute                        ld bc,32-ldir

exx                   4 t-states   8

dec l                 4 t states  12

inc l                  4 t states  16

 

which is 4 sets of 4 Tstate operations ... 16 in total

 

yours  

 

 PUSH HL         11 T-states       11

 LD D, #58        7 T-states       18
 SLA E            8                26

 LD L, E          4                30
 JR NC, #01       7/12             37   42
 INC D            4 (skipped when the jump is taken)   41   42
 LD H, D          4                45   46
 SET 2, H         8                53   54

copy attribute                              ld bc,32-ldir

 POP HL          10                63   64

the extra overheads are 63+ T-states.

 

So mine is faster and smaller. (but this is not a contest)
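
The two tallies can be reproduced with a few lines of Python. This is a sketch under assumptions: the figures are the standard published Z80 timings (notably PUSH at 11 and SET b,r at 8 T-states, worth checking against the manual), and the names `exx_setup` and `computed_setup` are made up for illustration.

```python
# Standard Z80 timings in T-states (assumed from the published instruction tables).
T = {
    "PUSH HL": 11, "POP HL": 10, "LD D,n": 7, "SLA E": 8, "LD L,E": 4,
    "JR cc taken": 12, "JR cc not taken": 7, "INC D": 4, "LD H,D": 4,
    "SET 2,H": 8, "EXX": 4, "DEC L": 4, "INC L": 4,
}


def exx_setup():
    """Overhead around LD BC,32 / LDIR in the EXX version."""
    return T["EXX"] * 2 + T["DEC L"] + T["INC L"]


def computed_setup(bottom_half):
    """Overhead around LD BC,32 / LDIR when the addresses are recomputed."""
    total = T["PUSH HL"] + T["LD D,n"] + T["SLA E"] + T["LD L,E"]
    if bottom_half:
        total += T["JR cc not taken"] + T["INC D"]   # Carry set: fall through
    else:
        total += T["JR cc taken"]                    # Carry clear: INC D skipped
    return total + T["LD H,D"] + T["SET 2,H"] + T["POP HL"]
```

Under these assumptions the EXX version costs 16 T-states per attribute copy and the recomputed version 63 or 64, so across the 16 copies in one pass that is 256 T-states against 1,016.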


Edited by Norman Sword, 04 April 2017 - 01:02 PM.


#37 IRF

IRF

    Advanced Member

  • Contributor
  • 4,276 posts

Posted 04 April 2017 - 01:19 PM

My replies in red:

 

The delay before the attribute update is a swings-and-roundabouts affair.  Update before the graphics are updated and the arrows will be in their old position with their colours shifted; update after and the reverse happens.  Update in the middle and you might introduce another effect.

 

By "another effect", are you referring to something specific, or do you mean in an 'unintended consequences' way?

 

When I wrote the code I was aware that the original "xor a" could be changed to permit the attribute copying to be set to anywhere needed. 

 

I'll try a starting value of #04 because, going by my recent experience, it should almost entirely eliminate the arrows flickering in a colour other than the one intended.  Barring any unforeseen problems, I think that would be the optimum solution.

 

You have included the JR at the bottom in the size of the code, which is missing from yours.

 

I omitted the JR because it would no longer be needed if the unused 'Screen Flash' routine and its preceding check of the Screen Flash variable are removed from the Main Loop (with scope for it to be CALLed from elsewhere, with the CALL command being inserted just after the CALL to the item-drawing routine).

 

However, you're right that in my quick comparison, I omitted to deduct two bytes from your total as well (for the same reason).

 

The speed of execution will be faster overall for mine, because the attribute calculations are not needed in each attribute copy. The overhead then is

 

exx                   4 t-states

- copy attribute                        ld bc,32-ldir

exx                   4 t-states   8

dec l                 4 t states  12

inc l                  4 t states  16

 

which is 4 sets of 4 Tstate operations ... 16 in total

 

yours  

 

 PUSH HL         11 T-states       11

 LD D, #58        7 T-states       18
 SLA E            8                26

 LD L, E          4                30
 JR NC, #01       7/12             37   42
 INC D            4 (skipped when the jump is taken)   41   42
 LD H, D          4                45   46
 SET 2, H         8                53   54

copy attribute                              ld bc,32-ldir

 POP HL          10                63   64

the extra overheads are 63+ T-states.

 

I haven't got my head fully around the T-States stuff, but I can see in general terms why yours would be faster.

 

So mine is faster and smaller. (but this is not a contest)

 

Indeed!  But any spare bytes in this tight spot in the Main Loop could come in handy for other purposes, so I would probably opt for the most byte-efficient solution (i.e. yours).


Edited by IRF, 04 April 2017 - 01:57 PM.


#38 IRF

IRF

    Advanced Member

  • Contributor
  • 4,276 posts

Posted 04 April 2017 - 10:43 PM

Okay, please see the three attached test files.  (Ignore the fact that Willy jumps upon start-up, and walks backwards; those are relics from some earlier testing.)

 

The 'Ian' fixed file has my latest iteration of a patch for the 'Delayed Attribute Effect', woven into Norman's original fix for the 'Jagged Finger Effect' as per my disassembly in post no. 32.  I've also attached an unfixed file to allow for a Before vs. After contrast.  The difference is striking, and the 'Delayed Attribute' is almost entirely eliminated.

 

N.B. I tried to create an additional file, based on Norman's latest code (post no. 33), but that is still work in progress.  Following Norman's code exactly (with the XOR A at the start) it worked okay, but the suppression of the 'Delayed Attribute' wasn't as effective as it is in my version (see previous discussion).  So I tried to swap the XOR A for a LD A, #04.  Alas, that caused the file to crash soon after the game started.  So I need to look into what's going wrong (I know that it arose out of my minor departure from Norman's code, but I haven't yet figured out exactly why it doesn't work).

 

EDIT: The screen is actually drawn prior to the crash, and I think that the problem is arising because the drawing loop doesn't know when to come to an end.  That in turn is because the copying of attributes to the bottom cell-row doesn't follow on immediately after the final pixel-row has been drawn (i.e. the point at which L, the index to the table at #8200, has wrapped back round to zero).  So the end of the outer loop might need to be tweaked a bit...
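
That diagnosis can be illustrated with a small Python model of the loop from post no. 33. It is a sketch only: it tracks just the counter in A and the table offset in L, not the data being copied, and the name `run_post33_loop` is made up for illustration.

```python
def run_post33_loop(start_a, max_rows=1000):
    """Count pixel-rows drawn before the post-33 loop exits.

    Returns None if the loop is still running after max_rows iterations.
    """
    a, l = start_a, 0
    for drawn in range(1, max_rows + 1):
        l = (l + 2) & 0xFF      # the two INC L instructions per pixel-row
        a = (a + 1) & 7         # inc a / and 7
        if a != 0:
            continue            # jr nz,loop: L is not tested on this path
        # attributes are copied here; dec l / inc l then tests L for zero
        if l == 0:
            return drawn
    return None
```

With a starting value of 0 the model exits after 128 pixel-rows. With a starting value of 4 the test on L only ever runs four rows into a cell-row, so it never sees L at zero and the model never exits.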

 

UPDATE: I've also attached a 'Norman' file in which I've implemented Norman's fix from post no. 40.  As with my fix, the attributes are copied for each cell-row mid-way through the process of copying the graphic bytes for that cell-row (i.e. after the first four pixel-rows have been drawn).

 

Norman's fix required three (UPDATE: five) fewer bytes than mine, but I get the impression that the improvement in terms of the Delayed Attributes is slightly greater with my fix in place (EDIT: or maybe not?)  However, there's not much in it, and both fixes offer a vast improvement in comparison with the 'unfixed' file.  :)

Attached Files


Edited by IRF, 05 April 2017 - 04:02 PM.


#39 IRF

IRF

    Advanced Member

  • Contributor
  • 4,276 posts

Posted 04 April 2017 - 11:19 PM

For the record, this is how I tweaked Norman's latest code (one additional byte compared with post no. 33, changes in bold):

 

EDIT: Note that this method leaves six spare bytes very usefully located in the Main Loop, just prior to the screen-drawing code (at #89F5-#89FA).  That's enough to insert a CALL to the Screen Flash Routine (which would have to be reinstated elsewhere) AND a CALL to the Main Loop Patch Vector.  :)

 

org #89FB
 
              LD HL, 05C00H 
              LD DE, 05800H 
              EXX                  
              LD A, #04         
       
        ld HL, 08200H                    
 
loop
        ld      e,(hl)                   
        inc    l                           
        push  hl                        
        ld      d,(hl)                    
        ld      l,e                         
        ld      h,d                        
        res    5,d                        
        ld      bc,32                    
        ldir                                 
        pop     hl                        
        inc     l                            
        JR Z, #0E           ; If L has reached zero, then jump forward to the code which doubles Willy's speed during the Toilet Dash

                        inc     a
                        and     7
                        jr       nz,loop
                        exx
                        ld       bc,32
                        ldir
                        exx
        JR      loop        ; Jump back to draw the next pixel-row (always necessary at this point, as the attributes are copied midway through the drawing of each cell-row)
 
;----------------------------
        ld      a,(34271)             ;23 ; this code is moved down in memory
        and    2                         ;25
        rrca                               ;26
        ld      hl,34258               ;29
        or      (hl)                       ;30
        ld      (hl),a                     ;31
        jr      35377                    ;33

Edited by IRF, 05 April 2017 - 08:49 AM.


#40 Norman Sword

Norman Sword

    Advanced Member

  • Member
  • 237 posts

Posted 05 April 2017 - 02:16 PM

Spectacular crash when I tried modifying the XOR A. This code is shorter and fixes the problem. I have to admit I do not like the additional JR inserted into the main loop to overcome this small change (XOR A changed to LD A,4).  In loops of this nature, each change in flow is repeated and repeated. So just this one small change is the equivalent of inserting 255 one-byte opcodes. In itself that is not a visible factor in JSW's speed, but slowdown is cumulative.

 

 

org #89FB

              LD HL, 05C00H
              LD DE, 05800H
              EXX                 
              LD A, #04        
      
        ld HL, 08200H                   

loop
        ld      e,(hl)                  
        inc    l                          
        push  hl                       
        ld      d,(hl)                   
        ld      l,e                        
        ld      h,d                       
        res    5,d                       
        ld      bc,32                   
        ldir                                
        pop     hl                       
                        inc     a         
                        and     7        
                        jr       nz,not_attrib
                        exx               
                        ld       bc,32  
                        ldir                
                        exx               
      not_attrib          
                        inc      L         
        JR     nz,loop    
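
As a sanity check on the flow of this version, here is a Python model of the loop. It is a sketch only: it tracks the counter in A and the table offset in L but none of the data movement, and the name `run_post40_loop` is invented for illustration.

```python
def run_post40_loop(start_a=4):
    """Return (pixel-rows drawn, rows-drawn counts at which attributes were copied)."""
    a, l = start_a, 0
    drawn, copies = 0, []
    while True:
        l = (l + 1) & 0xFF          # first inc l, between the two table reads
        drawn += 1                  # ldir copies one pixel-row
        a = (a + 1) & 7             # inc a / and 7
        if a == 0:
            copies.append(drawn)    # exx / ld bc,32 / ldir / exx
        l = (l + 1) & 0xFF          # inc L at not_attrib
        if l == 0:                  # JR nz,loop falls through once L wraps
            return drawn, copies
```

Because L is tested on every pass, the model exits after 128 pixel-rows whatever the starting value of A. With A starting at 4 the attributes go out after 4, 12, ... 124 pixel-rows, which is four rows into each of the sixteen cell-rows.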





