IRF Posted July 11, 2019 (edited)

For consistency and elegance, wouldn't it be best, in the case of operations which are self-modified by the code, to insert NOP command(s) (opcode #00) wherever they appear in the source code listing? That way the default value held at the pertinent address(es) would be zero, as is the case with the operands that are self-modified.

e.g. For your example of a direction label, list it in the source code as:

S_M_C_direction:
    NOP

And then use:

    LD A, #3C    [for INC A]
or
    LD A, #3D    [for DEC A]
or
    XOR A        [to restore the default NOP]

followed by:

    LD (S_M_C_direction), A

for movement in whichever direction (or no direction).

Edited July 11, 2019 by IRF
Norman Sword Posted July 11, 2019 Author (edited)

I would imagine you would end up with a large rule book.

Example 1:

S_M_C_counter1: equ $+1
    ld a,12
    inc a
    and 7
    or 8
    ld (S_M_C_counter1),a

Here the value varies between 8 and 15. Your example has failed: we never have a value of zero in the variable.

Example 2:

S_M_C_opcode:
    inc a

direction_switch equ $3c xor $3d    ; this is ("inc a") xor ("dec a")

    ld hl,S_M_C_opcode
    ld a,(hl)
    xor direction_switch
    ld (hl),a

Here the opcode varies between either "inc a" or "dec a". The code is switching direction. Again, never zero.

-----------------------------------------------------

The circumstances can change from one instance to another. The S_M_C_ is alerting you to code that is self-modifying. The $-$ is making the statement that the value will be changed before the opcode is executed. In a lot of instances we must have an opcode or an initial value. In those cases the value is inserted or the opcode written out.

I suppose it is similar to saying a block move is always in this format:

    ld hl,source
    ld de,destination
    ld bc,count
    LDIR

when the reality says it is a lot of the time, but the variations are vast.

Edited July 11, 2019 by Norman Sword
IRF Posted July 12, 2019 (edited)

Thanks Norman.

As it happens, in example 1, if the initial value held by S_M_C_counter1 was zero, then it would quickly be overwritten (by 9 on the first pass, since INC A, AND 7, OR 8 turns 0 into 9) and then it would settle into the intended pattern of the operand incrementing during each pass through the code (looping back from 15 to 8).

(The initial value of zero might have an adverse impact though, depending on the context - especially if the variable is picked up by the program before it is first modified. e.g. an out-of-range guardian crashing into a wall at the edge of a room?)

But I can see that in example 2, if you had a default value of zero stored at S_M_C_opcode, then execution of the code would never cause the labelled address to reach either of its intended operations (INC A or DEC A). Instead, the address S_M_C_opcode would toggle between acting as a NOP (00) and the 01 opcode (LD BC,nn) - which would have the unintended effect of picking up the pair of bytes which follow on from S_M_C_opcode, and loading those values into the BC register-pair!

Edited July 12, 2019 by IRF
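The behaviour of both examples can be checked with a small model. The sketch below is Python rather than Z80 and is not from the thread; the function names are made up for illustration. It replays example 1 starting from a stored value of zero, and example 2 starting from both INC A and a NOP default.

```python
INC_A = 0x3C  # Z80 opcode for INC A
DEC_A = 0x3D  # Z80 opcode for DEC A
DIRECTION_SWITCH = INC_A ^ DEC_A  # 0x01, as in "direction_switch equ $3c xor $3d"


def step_counter(value):
    """One pass of example 1: inc a / and 7 / or 8 applied to the stored operand."""
    return ((value + 1) & 7) | 8


def toggle_opcode(opcode):
    """One pass of example 2: xor the stored opcode byte with direction_switch."""
    return opcode ^ DIRECTION_SWITCH


def run(step, start, passes):
    """Apply `step` repeatedly and return the value stored after each pass."""
    values = []
    current = start
    for _ in range(passes):
        current = step(current)
        values.append(current)
    return values


counter_from_zero = run(step_counter, 0, 10)
opcode_from_inc = run(toggle_opcode, INC_A, 4)
opcode_from_nop = run(toggle_opcode, 0x00, 4)

print(counter_from_zero)
print([hex(b) for b in opcode_from_inc])
print([hex(b) for b in opcode_from_nop])
```

Starting from zero, the counter is within 8 to 15 after the first pass and stays there, while an opcode byte that starts at #00 only ever alternates between #00 (NOP) and #01 (LD BC,nn).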
IRF Posted July 16, 2019 (edited)

A query [EDIT: Which I think I've answered myself in subsequent posts!]:

In the Main Loop of a recent project, I have this arrangement (repeated four times, for copying pixels twice and attributes twice - primary to secondary buffer and then buffer to physical screen):

    LD HL, source
    LD DE, destination
    ; No need to define BC; it's not used now, so the LD BC, xxxx command has been deleted
    LD A, #80    or    LD A, #10    ; For copying the pixels (128 raster lines) or attributes (16 character rows) respectively
loop:
    CALL subroutine
    DEC A
    JR NZ, loop
    ; Once A reaches zero, flow of execution continues through the Main Loop

The subroutine which is CALLed consists of 32 consecutive LDI commands, followed by a RET.

This was obviously based on one of Norman Sword's suggestions (duly credited in the readme file for the project in question). However, there is a slight difference - Norman's subroutine incorporates the DEC A and JR NZ commands (after the final LDI and before the RET), whereas in my version, those commands are located in the Main Loop.

In terms of memory, Norman's version is obviously more efficient (because I have to repeat the DEC A and JR NZ commands four times within the Main Loop, rather than just once in Norman's subroutine).

However - and here is my query - would my version be slightly faster? [I don't mean the game as a whole - Norman has done lots of other things to speed up the game - I mean purely in terms of comparing the two variants of the LDI method like-for-like.]

My thinking is that the number of T-States which it takes to perform a relative jump is proportional to the distance through the code which has to be jumped - 67 bytes in Norman's case, and only 5 bytes in mine.

****

N.B. My method may complicate things in cases where a chunk of memory is being overwritten with a single value - where the first byte is overwritten directly and then the number of bytes to which the same value is to be copied in a loop is minus one. e.g. for attribute update with a single value (such as for a screen flash effect), use #01FF instead of #0200 to define the size of the loop.

Norman's code deals with such cases by CALLing a late entry point into his subroutine, coinciding with the second LDI command in the subroutine. (But the JR NZ at the end of the subroutine jumps back to the first LDI in the subroutine.)

In such cases, I think my method would unavoidably end up 'overshooting', and overwriting one more byte than it should. (But in the aforementioned project, I didn't actually use an LDI method for 'block fill' purposes, only for 'block move'.)

EDIT: For reference:
http://jswmm.co.uk/topic/375-a-total-rewrite-of-jsw-in-48k-using-matthews-core-code/page-4?do=findComment&comment=7745

Note also my comment/query here about a couple of presumed typos:
http://jswmm.co.uk/topic/375-a-total-rewrite-of-jsw-in-48k-using-matthews-core-code/page-6?do=findComment&comment=9047

Edited July 17, 2019 by IRF
IRF Posted July 17, 2019 (edited)

"My thinking is that the number of T-States which it takes to perform a relative jump is proportional to the distance through the code which has to be jumped - 67 bytes in Norman's case, and only 5 bytes in mine."

On reflection, my variant might not be faster after all - my subroutine is CALLed #10 or #80 times during every pass through each part of the Main Loop that performs a block copy operation. The number of T-States for that many CALL/RET commands (versus just one CALL/RET in Norman's code) may well outweigh the saving in T-States achieved by shortening the length of the relative jump!

Further investigation is required...

Edited July 17, 2019 by IRF
IRF Posted July 17, 2019 (edited)

"Further investigation is required..."

... And it seems I got completely the wrong end of the stick!

The number of T-States for a conditional relative jump loop is based on how many times the relative jump has to be executed (here determined by counting down the value of A, which doesn't change between Norman's method and mine), rather than the distance back through the code that each relative jump spans (the operand of the JR command), as I had previously understood to be the case. :blush:

So Norman's code (featuring only one CALL and RET per chunk of code copied) is certainly faster than my version!

Edited July 17, 2019 by IRF
IRF Posted July 17, 2019 (edited)

"So Norman's code (featuring only one CALL and RET) is certainly faster than my version!"

Compare and contrast (for each pass through the Main Loop):

Primary to secondary pixel buffer = 128 raster lines
Primary to secondary attribute buffer = 16 character rows

So my method uses 144 CALLs and RETs. Norman's method only requires 2 CALLs and RETs (for the pixel loop and for the attribute loop).

Unconditional CALL = 17 T-States
Unconditional RET = 10 T-States

So the difference in T-States is 142 x 27 = 3834 T-States (the amount by which Norman's method is faster than mine).

[There should be no difference in terms of the copying of the secondary buffers to the physical screen, because the Jagged Finger fix means that the data isn't copied contiguously (in terms of the way that it is stored in memory). So there are separate CALLs to the subroutine for each individual raster line, in both Norman's and my method.]

****

However, that 3834 is only a modest difference when you compare it with the overall saving achieved by abandoning LDIR in favour of the 32-consecutive-LDI method.

Norman worked out that copying the pixels (4096 bytes) between buffers is faster by 22528 T-States. For the 512 bytes of attributes across 16 character rows of the playable screen, there is an additional saving of 2816 T-States.

So the total saving (per Main Loop pass) achieved is 25344 T-States before you account for the time taken to perform CALLs and RETs.

Edited July 17, 2019 by IRF
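The arithmetic in this comparison can be worked through in a few lines. This is a minimal Python sketch, not from the thread; the variable names are illustrative, and the two LDIR-versus-LDI savings are taken as quoted from Norman rather than re-derived.

```python
CALL_T = 17  # unconditional CALL nn, in T-states
RET_T = 10   # unconditional RET, in T-states
PAIR_T = CALL_T + RET_T  # cost of one CALL/RET pair

PIXEL_LINES = 128  # raster lines copied between pixel buffers
ATTR_ROWS = 16     # character rows copied between attribute buffers

# One CALL/RET per raster line or character row (loop kept in the Main Loop).
calls_loop_outside = PIXEL_LINES + ATTR_ROWS
# One CALL/RET per buffer (loop kept inside the subroutine).
calls_loop_inside = 2

gross_call_cost = calls_loop_outside * PAIR_T
net_difference = (calls_loop_outside - calls_loop_inside) * PAIR_T

# Savings from replacing LDIR with runs of LDI, as quoted in the thread.
PIXEL_SAVING = 22528
ATTR_SAVING = 2816
total_saving = PIXEL_SAVING + ATTR_SAVING

print(gross_call_cost, net_difference, total_saving)
print(round(net_difference / total_saving, 3))
```

Note that 144 x 27 is the full CALL/RET cost of the per-line variant, whereas the difference between the two variants is 142 x 27, because the other variant still pays for its own two CALL/RET pairs.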
Norman Sword Posted July 17, 2019 Author (edited)

Re branching/jumping and calling.

The program counter is loaded with data during one of the clock cycles. This is the same with calls, jumps and JRs. The number of clock cycles needed to set the data up is different. Once the program counter is loaded, on the next clock cycle we move to the new address. What this means is that the speed is fixed no matter where the program counter is asked to move to. A relative jump of 0 bytes is executed at the same speed as a relative jump of 127 bytes. Calls and jumps (and I will also include RETs) are similar: the program counter is loaded and on the next clock cycle we execute the opcode pointed at by the (possibly) changed program counter. Each is acted on with no consideration of the amount of relative displacement from the old value.

----------------------------------------------------------------------------

I will re-read the posts above this one.... And perhaps comment further.

Edited July 17, 2019 by Norman Sword
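To make the point concrete, here is a minimal Python sketch (not from the thread) of the overhead of a DEC A / JR NZ countdown loop. The timings used are the standard Z80 figures: DEC A is 4 T-states, and JR NZ is 12 T-states when the jump is taken and 7 when it falls through. The function name is made up for illustration. The jump displacement does not appear anywhere in the calculation.

```python
DEC_A_T = 4          # DEC A
JR_TAKEN_T = 12      # JR NZ,e when the jump is taken
JR_NOT_TAKEN_T = 7   # JR NZ,e when execution falls through


def countdown_loop_overhead(iterations):
    """T-states spent on DEC A / JR NZ for a loop that runs `iterations` times.

    The jump is taken on every pass except the last, so the cost depends only
    on the iteration count, never on how far back the JR reaches.
    """
    if iterations < 1:
        raise ValueError("the loop body runs at least once")
    return (iterations * DEC_A_T
            + (iterations - 1) * JR_TAKEN_T
            + JR_NOT_TAKEN_T)


pixel_loop = countdown_loop_overhead(0x80)  # LD A,#80: 128 raster lines
attr_loop = countdown_loop_overhead(0x10)   # LD A,#10: 16 character rows
print(pixel_loop, attr_loop)
```

Whether the JR spans 5 bytes or 67 bytes, the same function applies, so the two loop layouts differ only in how many CALL/RET pairs they execute.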
Norman Sword Posted July 17, 2019 Author (edited)

Using a call and a ret to 32 consecutive LDIs.

A variation on my last version would probably do what you want.... And this assumption is based on a quick scan of all the changes listed in the above posts.

;copy work and attribute screens
    ld hl,att_work
    ld de,ATT0
;;;; ld b,0             ; this was set for usage in a different routine
    exx
    ld hl,ytable
    ld bc,128           ; must be a multiple of 32
                        ; this is 4*32 - that is 4 raster lines before the attributes are written in

;loop executed 128 times on each game loop
raster:
    ld e,(hl)
    inc l
    push hl
    ld h,(hl)
    ld l,e
    ld d,h
    res 5,d
    call BLOCKX_MOVE32  ;executed 128 times on each game loop
    jp pe,n_raster
    exx
; this code is executed 16 times on each game loop
;;;; ld c,32            ; this was set for usage in a different routine
    call BLOCKX_MOVE32
    exx
    inc b
n_raster:
    pop hl
    inc l
    jr nz,raster

;Note the a register is not used in either routine
---------------------------------------------------------------------
BLOCKX_MOVE32:
    rept 32
    ldi
    endm
    ret

ADDENDUM: There are multiple references throughout these posts to BLOCK_MOVE32 or BLOCK_MOVE31. I will go through all the posts and change the conflicting labels. In this post the label is now called BLOCKX_MOVE32.

Edited July 18, 2019 by Norman Sword
IRF Posted July 17, 2019

Thanks Norman! I believe that relies on the fact that the LDI command resets the Overflow Flag if (and only if) the value of BC reaches zero after the operation?
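The flag behaviour being asked about can be modelled without a Z80. The following is a rough Python sketch, not from the thread, that tracks only the main-set BC register and the P/V flag left by the last LDI of each 32-LDI block (set while BC is non-zero, reset when BC reaches zero). It ignores the data being copied and the alternate register set, and the function names are invented for illustration.

```python
def ldi_block(bc, count=32):
    """Model the BC side of `count` consecutive LDIs.

    Returns the new 16-bit BC and the P/V flag after the last LDI:
    True (PE) while BC is non-zero, False (PO) once BC has reached zero.
    """
    pv = True
    for _ in range(count):
        bc = (bc - 1) & 0xFFFF
        pv = bc != 0
    return bc, pv


def attribute_lines(raster_lines=128, initial_bc=128):
    """Return the raster-line indices after which an attribute row is copied."""
    bc = initial_bc
    lines = []
    for line in range(raster_lines):
        bc, pv = ldi_block(bc)        # call BLOCKX_MOVE32 for one raster line
        if pv:                        # jp pe,n_raster: BC not yet zero, skip
            continue
        lines.append(line)            # exx / call BLOCKX_MOVE32 / exx
        bc = (bc + 0x0100) & 0xFFFF   # inc b: BC goes from 0 to 256 (8 more lines)
    return lines


rows = attribute_lines()
print(len(rows), rows[:3], rows[-1])
```

Under these assumptions the first attribute row is copied after 4 raster lines and every 8 lines after that, which gives 16 attribute copies across the 128 raster lines, matching the comments in the listing.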