Norman Sword, posted July 17, 2019 (edited)

Yes, BLOCK_MOVE32 will clear the P/V (parity/overflow) flag, which is the flag the PE condition tests, if and only if the last LDI counts the register pair BC down to zero. That is why the comment on the line that loads BC with 128 says specifically that BC must be set to a multiple of 32. If it were changed to a value not divisible by 32, the attributes would never be drawn: BC would never be zero at the end of a block of 32 LDIs, so JP PE,n_raster would always branch to n_raster.

Edited July 17, 2019 by Norman Sword
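To see why BC must be a multiple of 32, here is a small Python model of the P/V flag after each block of 32 LDIs. It is an illustration, not the game code: it assumes only BC matters (LDI leaves P/V set exactly when BC is non-zero after its decrement), and the function names are made up for this sketch.

```python
# Minimal model of the P/V flag after blocks of 32 LDIs.
# Assumption: only BC is modelled; memory, HL, DE and other flags are ignored.

def ldi_block(bc, count=32):
    """Run `count` LDIs; return (new_bc, pv) after the last one."""
    pv = False
    for _ in range(count):
        bc = (bc - 1) & 0xFFFF   # LDI decrements BC, wrapping at 16 bits
        pv = bc != 0             # P/V stays set unless BC has just reached zero
    return bc, pv


def attribute_lines(bc_start, lines=16):
    """Return the raster lines after which JP PE falls through (P/V reset)."""
    bc = bc_start
    hits = []
    for line in range(lines):
        bc, pv = ldi_block(bc)
        if not pv:               # JP PE,n_raster not taken: attributes drawn
            hits.append(line)
    return hits


print(attribute_lines(128))      # [3]  (128 = 4*32)
print(attribute_lines(130))      # []   (BC passes zero mid-block instead)
```

With 130, BC goes through zero in the middle of a block and wraps to $FFxx, so it is never zero when the JP PE is reached.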
IRF, posted July 17, 2019 (edited)

N.B. My method may complicate things where a chunk of code is being overwritten with a single value: the first byte is written directly, and the number of bytes to which that value is then copied in a loop is one less than the total. For example, for an attribute update with a single value (such as a screen flash effect), you use #01FF instead of #0200 to define the size of the loop.

Norman's code deals with such cases by CALLing a late entry point into his subroutine, coinciding with the second LDI command in the subroutine. (But the JR NZ at the end of the subroutine jumps back to the first LDI in the subroutine.)

In such cases, I think my method would unavoidably end up 'overshooting' and overwriting one more byte than it should. (But in the aforementioned project, I didn't actually use an LDI method for 'block fill' purposes, only for 'block move'.)

On reflection, if I did want to use the same subroutine to overwrite a block of code with a single value (e.g. to implement the Screen Flash effect), it could be done like this, at the cost of five more bytes:

    ; 16 character rows to colour, so use A=#0F (15 in decimal) for the later loop.
    LD HL, xxxx
    LD DE, xxxx+1            ; No need to define BC; it's not used now
    LD (HL), colour_value    ; Either copied from A, or a fixed value specified here
    LD A, #01
    CALL subroutine_late_entry
    LD A, #0F
loop:
    CALL subroutine
    DEC A
    JR NZ, loop

subroutine:
    LDI
subroutine_late_entry:
    rept 31                  ; I presume "rept y" means 'repeat the following code (up to the end marker) y times'?
    LDI
    endm                     ; I presume this means 'end marker'?
    RET

Edited July 17, 2019 by IRF
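The fill trick relies on copying forward from HL to DE = HL+1, so each LDI copies the byte the previous one has just written. A small Python model (a sketch assuming memory is a plain bytearray; the helper names are made up) confirms that the one hand-written byte, the 31-LDI late entry and 15 full passes of 32 add up to exactly 16 rows of 32 bytes, with no overshoot:

```python
# Model of the block-fill trick with a late entry point.
# Assumption: memory is a bytearray and addresses are plain indices.

def ldi_run(mem, hl, de, count):
    """Perform `count` LDIs: copy mem[hl] -> mem[de], incrementing both."""
    for _ in range(count):
        mem[de] = mem[hl]
        hl += 1
        de += 1
    return hl, de


def fill_rows(mem, start, colour, rows):
    """Colour `rows` rows of 32 bytes, using the late entry for row one."""
    hl, de = start, start + 1
    mem[hl] = colour                      # LD (HL),colour_value
    hl, de = ldi_run(mem, hl, de, 31)     # CALL subroutine_late_entry
    for _ in range(rows - 1):             # LD A,#0F / loop: CALL subroutine
        hl, de = ldi_run(mem, hl, de, 32)
    return de - start                     # number of bytes now filled


mem = bytearray(600)
written = fill_rows(mem, 0, 0x47, 16)
print(written)                            # 512
```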
Norman Sword, posted July 17, 2019 (edited)

Standard assembler directives: one of the biggest helps is the macro language, which lets you build frequently used pieces of standard code into macros. A standard macro looks like this:

silly:  macro   param1,param2,param3
        ld hl,param1
        ld de,param2
        ld bc,param3
        ldir
        endm

The macro here is defined by the label silly. Every time the word silly is seen in the source code, the assembler replaces it with all the code between the MACRO and the ENDM. The word endm is shorthand for "end macro".

So how do I use silly in a piece of code? (Don't forget this example is called silly because it is a silly example.) I simply write into the assembly code:

        silly   ATT0,ATT1,32

When the code is assembled, the above is substituted with:

        ld hl,ATT0      ; here param1 is replaced with ATT0
        ld de,ATT1      ; here param2 is replaced with ATT1
        ld bc,32        ; here param3 is replaced with 32
        ldir

which is the macro expanded, with each parameter name replaced by the argument passed to the macro.

Macros are very helpful for expanding out code that is repetitive. Another form of macro is the rept directive (rept = repeat).
So, getting back to the original query:

        rept 32         ; repeat 32 times
        ldi
        endm            ; end macro

means repeat the line of code between the rept directive and the endm directive 32 times.

We can create big blocks of code by repeating and nesting. (These are just examples, not taken from any code.) So let's look at:

move    macro   count
        rept    count
        ld a,(hl)
        ld (de),a
        inc hl
        inc d
        endm
        endm

In the assembler we can now write

        move 8

and this will generate the inline code:

        ld a,(hl)
        ld (de),a
        inc hl
        inc d
        ld a,(hl)
        ld (de),a
        inc hl
        inc d
        ld a,(hl)
        ld (de),a
        inc hl
        inc d
        ld a,(hl)
        ld (de),a
        inc hl
        inc d
        ld a,(hl)
        ld (de),a
        inc hl
        inc d
        ld a,(hl)
        ld (de),a
        inc hl
        inc d
        ld a,(hl)
        ld (de),a
        inc hl
        inc d
        ld a,(hl)
        ld (de),a
        inc hl
        inc d

This is the quickest way of doing this operation possible: there is no loop counter.

The above probably seems pointless, but consider a piece of code I have mentioned several times and written out an example of once: the stack copy. To speed up a stack copy we set up a nested macro similar to the example above. When the macro is expanded we end up with a big block of inline code.
(The expansion can end up with 500 or more lines of code.)

We can also pass counters that can be used and acted upon. Macros are also expanded literally, which can cause no end of problems when the expansion does not do what was wanted.

;-------------------------- TOO MUCH INFORMATION --------------------------

Short answer: REPT is short for REPeaT, and ENDM is short for END Macro.

Edited July 17, 2019 by Norman Sword
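The two directives boil down to text manipulation before assembly. The toy Python model below is a simplified illustration, not how any particular assembler is implemented: a parameterised macro is plain textual substitution, and REPT emits its body a given number of times, so nesting REPT inside `move` yields 8 x 4 = 32 lines with no loop counter.

```python
# Toy model of MACRO (parameter substitution) and REPT (repeat n times).
# Assumption: simplified illustration only; real assemblers do much more.

def expand_macro(body, params, args):
    """Replace each parameter name with its argument, line by line."""
    out = []
    for line in body:
        for name, value in zip(params, args):
            line = line.replace(name, value)
        out.append(line)
    return out


def rept(count, body):
    """REPT count ... ENDM: emit the body `count` times."""
    return [line for _ in range(count) for line in body]


SILLY = ["ld hl,param1", "ld de,param2", "ld bc,param3", "ldir"]
silly = expand_macro(SILLY, ["param1", "param2", "param3"], ["ATT0", "ATT1", "32"])
print(silly)   # ['ld hl,ATT0', 'ld de,ATT1', 'ld bc,32', 'ldir']

MOVE_BODY = ["ld a,(hl)", "ld (de),a", "inc hl", "inc d"]


def move(count):
    """move macro count: a REPT nested inside a MACRO."""
    return rept(count, MOVE_BODY)


print(len(move(8)))   # 32
```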
IRF, posted July 17, 2019 (edited)

Thanks for that explanation, Norman!

Going back to your version of the LDI method, did you manage to get it to work in conjunction with the Jagged Finger fix (with the rows of attributes being updated alongside the associated pixel rows)? Because if your subroutine is in the format you explained previously:

BLOCK_MOVE32:
        rept 32
        LDI
        endm
        DEC A
        JR NZ, BLOCK_MOVE32
        RET

...then the DEC A would affect the P/V (parity/overflow) flag, and therefore the JP PE,n_raster in the main routine wouldn't be responding to BC having counted down to zero, but to the decrement of A to zero (and thus the flag would always have the same status when the code RETurns to the main routine). How do you get around that?

Edited July 17, 2019 by IRF
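IRF's point can be checked with a tiny model. DEC A sets P/V as an overflow flag (only when A was $80 before the decrement), so whatever the LDIs left in P/V is overwritten, and the loop only exits when A goes from 1 to 0, which never overflows. This sketch models only A and P/V; the function names are made up.

```python
# Model of DEC A overwriting the P/V flag left by the LDIs.
# Assumption: only A and P/V are modelled.

def dec_a(a):
    """Return (new_a, pv) for DEC A: P/V is set only when A was $80."""
    pv = a == 0x80
    return (a - 1) & 0xFF, pv


def block_move32_exit_pv(a):
    """P/V on RET from a DEC A / JR NZ loop entered with A = a (1..255)."""
    while True:
        a, pv = dec_a(a)      # the P/V value from the last LDI is lost here
        if a == 0:            # JR NZ falls through only when A reaches 0
            return pv


print([block_move32_exit_pv(a) for a in (1, 4, 128)])   # [False, False, False]
```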
Norman Sword, posted July 17, 2019 (edited)

Each example I write is self-contained unless otherwise stated. The last example I wrote performs no check after the block of 32 LDIs and just returns. The code above, as used in post 54, is an old example which does use "a" as a counter.

Give me five minutes and I will test the example as I wrote it and get back to you, but I think it will work as written.

;-----------------------------------------------------------------------------

It works exactly as I said it would.

Edited July 17, 2019 by Norman Sword
IRF, posted July 17, 2019 (edited)

I was just thinking in terms of the code which copies the 'master copies' to the 'working copies' [for pixels and for attributes], which would run faster* as you originally wrote it, with the countdown of raster lines embedded within the subroutine containing the chain of LDIs.

(*For the reason I outlined earlier today, despite my initial thoughts last night: you only have to CALL the subroutine once for the pixels and once for the attributes. Putting the loop-counter commands in the Main Loop, as I suggested in post 44 (not 54), means executing a CALL and a RET for each raster line copied.)

I assume that your 32-LDI subroutine is available as common code for both purposes [master buffers -> working buffers, and then working buffers -> physical screen]? It would seem wasteful to have 32 x LDI / RET in one subroutine, and a separate subroutine which goes 32 x LDI / DEC A / JR NZ / RET.

Edited July 17, 2019 by IRF
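IRF's speed argument is easy to put into numbers. The sketch below uses the standard Z80 T-state counts (LDI 16, CALL 17, RET 10, DEC A 4, JR NZ 12 taken / 7 not taken) and ignores memory contention, so treat the totals as approximate; the point is the difference, which is one extra CALL/RET pair for every block after the first.

```python
# Rough T-state comparison for copying n blocks of 32 bytes.
# Assumption: standard uncontended Z80 timings; contention is ignored.
LDI, CALL, RET, DEC_A = 16, 17, 10, 4
JR_TAKEN, JR_NOT_TAKEN = 12, 7


def loop_inside(n):
    """One CALL; the DEC A / JR NZ loop sits inside the subroutine."""
    loops = n * (32 * LDI + DEC_A) + (n - 1) * JR_TAKEN + JR_NOT_TAKEN
    return CALL + loops + RET


def loop_outside(n):
    """The caller loops: n CALLs, each to 32 x LDI + RET."""
    calls = n * (CALL + 32 * LDI + RET)
    return calls + n * DEC_A + (n - 1) * JR_TAKEN + JR_NOT_TAKEN


n = 0x1000 // 32                          # a 4K pixel buffer: 128 blocks
print(loop_outside(n) - loop_inside(n))   # 3429, i.e. 27 * (n - 1)
```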
IRF, posted July 17, 2019 (edited)

Where I wrote "post 54" above, I meant post 44. Sorry if the typo caused any confusion!

BTW, thanks for checking that your code in post 49 works okay.

Edited July 17, 2019 by IRF
Norman Sword, posted July 17, 2019 (edited)

The original version, with the "a" register passing the size of the copy, was a response to the limited space available in the source code.

        ld hl,source
        ld de,destin
        ld bc,count
        ldir

is 11 bytes in size. It was modified to

        ld hl,source
        ld de,destin
        ld a,count/32
        call BLOCK_MOVE32       ; which uses "a" as a counter

which is also 11 bytes in size, and so fits easily into the same space.

IGNORING the raster copy routine for now, let's look at the format of the original BLOCK_MOVE32, which was:

BLOCK_MOVE32:
        ldi
BLOCK_MOVE31:
        ldi
        rept 30
        ldi
        endm
        dec a
        jr nz,BLOCK_MOVE32
        ret

This has two labels, BLOCK_MOVE32 and BLOCK_MOVE31. It could be written out to have every label from BLOCK_MOVE32 down to BLOCK_MOVE01. It doesn't, because the other labels would be used infrequently and I am lazy: I can't be bothered writing them all out.

Problems using the above routine to move odd amounts...

The bulk of block moves in the game are multiples of 32, and a call to BLOCK_MOVE32 does the job. However, we also have cases where a block is filled by a block-copy sequence, such as:

        ; this code will clear the screen attributes to black on black
        ld hl,ATT0
        ld de,ATT0+1
        ld bc,$2ff
        ld (hl),0
        ldir

This might not seem to be of the same format: we are copying $2ff bytes here, which is not a multiple of 32. But look again at the amount. I can rewrite $2ff in a different way, as $300-1: clearly the same value, and clearly very similar to the format used in the standard block move. So let's adapt that code:

        ld hl,ATT0
        ld de,ATT0+1
        ld (hl),0
        ld a,$300/32            ; MOVE $300 bytes
        call BLOCK_MOVE31       ; MOVE $300 bytes -1, i.e. move $2ff bytes   <<<<<<< NOTE BLOCK_MOVE31

This has moved $2ff bytes, which is what we wanted. In JSW the vast majority of block moves move either a multiple of 32 bytes or one byte short of a multiple of 32.
Which is why I only expanded BLOCK_MOVE32 and BLOCK_MOVE31 out in the routine that moves the data.

Going into JSW, we have the following. This is just a quick grab from the source code of all the big block moves (original code on the left, replacement on the right):

        LD HL,CHAR0          ;L869F ;$4000
        LD DE,CHAR0+1        ; 86A2 ;$4001
        LD BC,$1AFF          ; 86A5 ;        ld a,$1b00/32
        LD (HL),$00          ; 86A8 ;
        LDIR                 ; 86AA ;        call BLOCK_MOVE31

        LD HL,code_att       ; 86E8 ;L9B80
        LD DE,ATT8           ; 86EB ;$5900
        LD BC,$0080          ; 86EE ;        ld a,$80/32
        LDIR                 ; 86F1 ;        call BLOCK_MOVE32

        LD HL,CHAR0          ;L8813 ;$4000
        LD DE,CHAR0+1        ; 8816 ;$4001
        LD BC,$17FF          ; 8819 ;        ld a,$1800/32
        LD (HL),$00          ; 881C ;
        LDIR                 ;               call BLOCK_MOVE31

        LD HL,logo_att       ; 8820 ;L9800
        LD BC,$0300          ; 8823 ;        ld a,$300/32
        LDIR                 ; 8826 ;        call BLOCK_MOVE32

        ;recolour line 19
        LD HL,ATT19          ; 8828 ;$5A60
        LD DE,ATT19+1        ; 882B ;$5A61
        LD BC,$001F          ; 882E ;        ld a,$20/32
        LD (HL),$46          ; 8831 ;
        LDIR                 ;               call BLOCK_MOVE31

        LD HL,ATT19          ; 88B8 ;$5A60
        LD DE,ATT19+1        ; 88BB ;$5A61
        LD BC,$001F          ; 88BE ;        ld a,$20/32
        LD (HL),$4F          ; 88C1 ;
        LDIR                 ; 88C3 ;        call BLOCK_MOVE31

        LD HL,bottom_att     ; 8907 ;L9A00
        LD DE,ATT16          ; 890A ;$5A00
        LD BC,$0100          ; 890D ;        ld a,$100/32
        LDIR                 ;               call BLOCK_MOVE32

        LD DE,room_layout    ; 891A ;L8000
        LD BC,$0100          ; 891D ;        ld a,$100/32
        LDIR                 ;               call BLOCK_MOVE32

        LD HL,CHAR16         ; 8958 ;$5000
        LD DE,CHAR16+1       ; 895B ;$5001
        LD BC,$07FF          ; 895E ;        ld a,$800/32
        LD (HL),$00          ; 8961 ;
        LDIR                 ;               call BLOCK_MOVE31

        LD HL,att_master     ; 89B0 ;$5E00
        LD DE,att_work       ; 89B3 ;$5C00
        LD BC,$0200          ; 89B6 ;        ld a,$200/32
        LDIR                 ;               call BLOCK_MOVE32

        LD HL,char_master    ; 89BB ;$7000
        LD DE,char_work      ; 89BE ;$6000
        LD BC,$1000          ; 89C1 ;        ld a,$1000/32
        LDIR                 ;               call BLOCK_MOVE32

        LD HL,char_work      ;L89F5 ;$6000
        LD DE,CHAR0          ; 89F8 ;$4000
        LD BC,$1000          ; 89FB ;        ld a,$1000/32
        LDIR                 ; 89FE ;        call BLOCK_MOVE32

        LD HL,att_work       ; 8A1A ;$5C00
        LD DE,att_work+1     ; 8A1D ;$5C01
        LD BC,$01FF          ; 8A20 ;        ld a,$200/32
        LD (HL),A            ; 8A23 ;        <<<<<< a reg problem
        LDIR                 ;               call BLOCK_MOVE31

        LD HL,att_work       ;L8A26 ;$5C00
        LD DE,ATT0           ; 8A29 ;$5800
        LD BC,$0200          ; 8A2C ;        ld a,$200/32
        LDIR                 ; 8A2F ;        call BLOCK_MOVE32

        LD HL,bottom_att     ;L8B07 ;L9A00
        LD DE,ATT16          ; 8B0A ;$5A00
        LD BC,$0100          ; 8B0D ;        ld a,$100/32
        LDIR                 ;               call BLOCK_MOVE32

        LD HL,ATT0           ;L8C03 ;$5800
        LD DE,ATT0+1         ; 8C06 ;$5801
        LD BC,$01FF          ; 8C09 ;        ld a,$200/32
        LD (HL),A            ; 8C0C ;        <<<<<< problems here with the a register
        LDIR                 ;               call BLOCK_MOVE31

        LD HL,CHAR0          ;L8C4A ;$4000
        LD DE,CHAR0+1        ; 8C4D ;$4001
        LD BC,$0FFF          ; 8C50 ;        ld a,$1000/32
        LD (HL),$00          ; 8C53 ;
        LDIR                 ;               call BLOCK_MOVE31

        LD HL,att_master     ; 96F4 ;$5E00
        LD DE,ATT0           ; 96F7 ;$5800
        LD BC,$0200          ; 96FA ;        ld a,$200/32
        LDIR                 ; 96FD ;        call BLOCK_MOVE32

        LD HL,CHAR0          ; 96FF ;$4000
        LD DE,CHAR0+1        ; 9702 ;$4001
        LD BC,$0FFF          ; 9705 ;        ld a,$1000/32
        LD (HL),$18          ; 9708 ;
        LDIR                 ;               call BLOCK_MOVE31

The above illustrates why we have BLOCK_MOVE32 and BLOCK_MOVE31: in a game like JSW we are moving multiples of 32 (or one short of them) in the vast majority of cases. Only two of the above cases cause a pause and a need to work out how to preserve the "a" register (there may be more instances where the "A" register needs to be preserved). Both are very easy to change and very easy to work out.

The problem then comes with how we manage the raster copy, which in the recent posting uses a different BLOCK_MOVE.

Edited July 17, 2019 by Norman Sword
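As a cross-check on the listing, here is a small Python sketch that takes each LDIR byte count, picks the entry point it needs and confirms the block count fits in A. It also checks the 11-byte claim using standard Z80 instruction sizes, and shows why the count must be divided by 32: an undivided $100 or $800 would not fit in the 8-bit A register. The BC values are copied from the listing; `convert`, `bytes_moved` and `SIZES` are names made up for this sketch.

```python
# Cross-check of the LDIR -> BLOCK_MOVE32/31 conversions in the listing.
# Assumption: BC values copied from the listing; helper names are invented.

SIZES = {"ld hl,nn": 3, "ld de,nn": 3, "ld bc,nn": 3, "ldir": 2,
         "ld a,n": 2, "call nn": 3}          # standard Z80 sizes in bytes
ORIGINAL = ["ld hl,nn", "ld de,nn", "ld bc,nn", "ldir"]
REPLACEMENT = ["ld hl,nn", "ld de,nn", "ld a,n", "call nn"]

LDIR_COUNTS = [0x1AFF, 0x0080, 0x17FF, 0x0300, 0x001F, 0x001F, 0x0100,
               0x0100, 0x07FF, 0x0200, 0x1000, 0x1000, 0x01FF, 0x0200,
               0x0100, 0x01FF, 0x0FFF, 0x0200, 0x0FFF]


def bytes_moved(a, entry):
    """32 LDIs per pass; entering at BLOCK_MOVE31 skips one on pass one."""
    return 32 * a - (32 - entry)


def convert(bc):
    """Return (label, a) replacing 'ld bc,bc / ldir', or None if neither fits."""
    if bc % 32 == 0:
        label, entry, a = "BLOCK_MOVE32", 32, bc // 32
    elif (bc + 1) % 32 == 0:
        label, entry, a = "BLOCK_MOVE31", 31, (bc + 1) // 32
    else:
        return None
    if not 1 <= a <= 255:        # "ld a,n" holds one byte
        return None
    assert bytes_moved(a, entry) == bc
    return label, a


orig_size = sum(SIZES[op] for op in ORIGINAL)
new_size = sum(SIZES[op] for op in REPLACEMENT)
print(orig_size, new_size)       # 11 11
for bc in LDIR_COUNTS:
    print(f"${bc:04X} ->", convert(bc))
```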
Norman Sword, posted July 17, 2019 (edited)

Since the last posted raster copy is a lot faster without using the "a" register, and we don't want to supply two lots of LDI routines, let's just chop up the old LDI routine so that it returns without modifying the "a" register. Then we have only one LDI routine.

Chopping up the routine takes 5 bytes of code to modify it and five bytes to put it back. Since these two modifications are executed only once, this is a lot faster than having to re-use the "a" register inside the loop.

The block move from post #58:

BLOCK_MOVE32:
        ldi
BLOCK_MOVE31:
        ldi
        rept 30
        ldi
        endm
S_M_C_fast: equ $
        dec a                   ; change this opcode
        jr nz,BLOCK_MOVE32
        ret

;-----------------------------------------------
The raster copy is expanded to become:

; modify
        ld hl,S_M_C_fast        ; position to modify
        ld (hl),$C9             ; opcode value for ret
; >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
; copy work and attribute screens
        ld hl,att_work
        ld de,ATT0
;;;;;;  ld b,0                  ; this was set for usage in a different routine
        exx
        ld hl,ytable
        ld bc,128               ; must be a multiple of 32; this is 4*32
                                ; - that is, 4 raster lines before the attributes are written in
; loop executed 128 times on each game loop
raster:
        ld e,(hl)
        inc l
        push hl
        ld h,(hl)
        ld l,e
        ld d,h
        res 5,d
        call BLOCK_MOVE32       ; executed 128 times on each game loop
        jp pe,n_raster
        exx                     ; this code is executed 16 times on each game loop
;;;;    ld c,32                 ; this was set for usage in a different routine
        call BLOCK_MOVE32
        exx
        inc b
n_raster:
        pop hl
        inc l
        jr nz,raster
; >>>>>>>>>>>>>>>>>>>>>>>>>>>>
; restore back to old
        ld hl,S_M_C_fast
        ld (hl),$3d             ; opcode value for dec a
; rest of code

This leaves only one LDI routine. That's today's version...

Edited July 18, 2019 by Norman Sword
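The BC bookkeeping in the raster loop is easy to miss, so here is a small Python simulation of it (only BC is modelled; the memory copies are left out, and the function name is made up). BC starts at 128, each pixel line's 32 LDIs take 32 off it, and each time it reaches zero one attribute row is copied and INC B (with C already 0) turns BC into $0100, which is eight more lines' worth.

```python
# Simulation of BC in the raster loop. Assumption: only BC is modelled.

def raster_schedule(lines=128, bc=128):
    """Return the pixel lines after which an attribute row is copied."""
    rows_after = []
    for line in range(lines):
        bc = (bc - 32) & 0xFFFF           # call BLOCK_MOVE32: 32 LDIs
        if bc == 0:                       # P/V reset: JP PE,n_raster not taken
            rows_after.append(line)       # exx / 32-byte attribute copy / exx
            bc = (bc + 0x100) & 0xFFFF    # inc b with C = 0: BC becomes $0100
    return rows_after


rows = raster_schedule()
print(len(rows), rows[:3], rows[-1])      # 16 [3, 11, 19] 123
```

So the first attribute row goes in after four pixel lines and every later one eight lines after the previous, giving the 16 attribute copies per game loop mentioned in the comments.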
Norman Sword, posted July 17, 2019 (edited)

Since the source code, for me, is just a blank canvas to edit, I don't have to worry about whether a piece of code will fit in here or there. If I delete a byte in the source code, I immediately have that byte, and it can be used anywhere I want within the limits of the memory I am editing. The restrictions are removed.

Block move:

        ld hl,source
        ld de,destination
        ld bc,size
        ldir

Looking at this yet again, but from the no-restrictions point of view, the best solution is slightly different:

Shift_block32:                  ; new label to distinguish from all the other code
        ldi
Shift_block31:
        rept 31
        ldi
        endm
        jp pe,Shift_block32
        ret

We write the code out exactly as specified in JSW, but instead of LDIR we use one extra byte on each instance and call Shift_block32 or Shift_block31, depending on whether the value is divisible by 32 or one short. Like LDIR on its own, this does not use the "a" register. This method is also as close to a replacement for LDIR as we can generate in code, returning with HL, DE, BC and the P/V flag set as LDIR would leave them. So:

        LD HL,CHAR0          ;L869F ;$4000
        LD DE,CHAR0+1        ; 86A2 ;$4001
        LD BC,$1AFF          ; 86A5 ;
        LD (HL),$00          ; 86A8 ;
        LDIR                 ; 86AA ;        call Shift_block31

This is (marginally) faster again, but uses the same registers as the original.

I suppose you could modify this routine to use a raster copy routine similar to the one posted above, e.g.
;-----------------------------------------------------------------------------
; copy work and attribute screens
        ld hl,S_M_C_New_Mod     ; place to modify
        ld (hl),$c9             ; modify in a "ret"
        ld hl,att_work
        ld de,ATT0
;;;;    ld b,0                  ; this was set for usage in a different routine
        exx
        ld hl,ytable
        ld bc,128               ; must be a multiple of 32; this is 4*32
                                ; - that is, 4 raster lines before the attributes are written in
; loop executed 128 times on each game loop
raster:
        ld e,(hl)
        inc l
        push hl
        ld h,(hl)
        ld l,e
        ld d,h
        res 5,d
        call Shift_block32      ; executed 128 times on each game loop
        jp pe,n_raster
        exx                     ; this code is executed 16 times on each game loop
;;;;    ld c,32                 ; this was set for usage in a different routine
        call Shift_block32
        exx
        inc b
n_raster:
        pop hl
        inc l
        jr nz,raster
        ld hl,S_M_C_New_Mod     ; place to modify
        ld (hl),$ea             ; opcode for jp pe,xx
; rest of code
;-----------------------------------------------------------------------------
Shift_block32:                  ; new label to distinguish from all the other code
        ldi
Shift_block31:
        rept 31
        ldi
        endm
S_M_C_New_Mod equ $
        jp pe,Shift_block32     ; opcode toggled between "ret" and "jp pe,xx"
        ret

Edited July 18, 2019 by Norman Sword
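The finished routine can be checked with a Python model (a sketch, not the game code). It assumes memory is a bytearray and models only HL, DE, BC and P/V; the byte at S_M_C_New_Mod is passed in as either $EA (JP PE,nn) or $C9 (RET), and the helper names are made up. With $EA, a count one short of a multiple of 32 entered at Shift_block31 should behave exactly like LDIR; with $C9 the routine does a single 32-byte pass and leaves P/V for the raster loop's JP PE. Only valid counts are used: with any other BC the JP PE never sees zero, and both the real routine and this model would keep going.

```python
# Model of Shift_block32/31 with the self-modified byte at S_M_C_New_Mod.
# Assumptions: memory is a bytearray; only HL, DE, BC and P/V are modelled.
JP_PE, RET = 0xEA, 0xC9          # Z80 opcodes for JP PE,nn and RET


def ldi(mem, regs):
    """One LDI: (DE) <- (HL), HL+1, DE+1, BC-1, P/V set while BC != 0."""
    mem[regs["de"]] = mem[regs["hl"]]
    regs["hl"] += 1
    regs["de"] += 1
    regs["bc"] = (regs["bc"] - 1) & 0xFFFF
    regs["pv"] = regs["bc"] != 0


def shift_block(mem, regs, opcode, entry=32):
    """Run from Shift_block32 (entry=32) or Shift_block31 (entry=31)."""
    count = entry
    while True:
        for _ in range(count):
            ldi(mem, regs)
        if opcode == RET or not regs["pv"]:
            return               # patched RET, or JP PE not taken (BC = 0)
        count = 32               # JP PE,Shift_block32 restarts at the top


def ldir(mem, regs):
    """Reference LDIR: repeat LDI until BC reaches zero."""
    while True:
        ldi(mem, regs)
        if not regs["pv"]:
            return


def clear_screen(routine, **kw):
    """LD HL,$4000 / LD DE,$4001 / LD BC,$1AFF / LD (HL),0, shifted to 0."""
    mem = bytearray([0xFF]) * 0x1B10
    mem[0] = 0x00
    regs = {"hl": 0, "de": 1, "bc": 0x1AFF, "pv": True}
    routine(mem, regs, **kw)
    return mem, regs


# Normal state ($EA): matches LDIR, ending with BC = 0 and P/V reset.
fast = clear_screen(shift_block, opcode=JP_PE, entry=31)
slow = clear_screen(ldir)
print(fast == slow)               # True

# Patched state ($C9): one 32-byte pass, P/V left for JP PE,n_raster.
regs = {"hl": 0, "de": 0x100, "bc": 128, "pv": True}
shift_block(bytearray(0x200), regs, RET)
print(regs["bc"], regs["pv"])     # 96 True

# T-states per 32 bytes (uncontended): LDIR mid-run (21 per byte),
# 32 x LDI + JP PE (10), and 32 x LDI + DEC A (4) + JR NZ taken (12).
print(32 * 21, 32 * 16 + 10, 32 * 16 + 4 + 12)   # 672 522 528
```

The last line shows why the JP PE version is only marginally faster than the DEC A / JR NZ one (6 T-states per 32 bytes), while both are well ahead of LDIR.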