


A total rewrite of JSW in 48k using Matthews core code


52 replies to this topic

#31 IRF

IRF

    Advanced Member

  • Contributor
  • 4,309 posts

Posted 26 October 2017 - 08:25 AM

This type of modification has no effect if no value is placed in (ix+4).

So as long as the sprite data has (ix+4)=0 on all horizontal sprites, the game will run as normal.


In a project I'm working on, some of the guardians briefly wrap around the vertical screen-edge, so setting byte 4 to 00 would accidentally cause a match.

However, I think that setting Bit 5 or 6 of Byte 4 of such a guardian's definition should prevent a match from occurring unintentionally.

#32 Norman Sword

Norman Sword

    Advanced Member

  • Member
  • 239 posts

Posted 26 October 2017 - 09:40 AM

Part 3

 

The 'extending/retracting' platform at the top-left of The Bathroom reminds me of the moving platform in one of the Geoff Mode patches (in 'Willy Takes a Trip's room 10: 'A Quiet Corner to Rest in'), except that in The Bathroom it has 'asymmetrical logic':

- After the Geoff Mode moving platform has been moved along by a cell-column in either direction, the room cell which it has just vacated is restored with an Air block;

- In Norman's The Bathroom, the cells to the right of the platform are reverted to Air as the platform retracts leftwards, but when the platform is extending rightwards, the cells to the left of the rightmost end of the platform remain as floor/Water cells.
 

 

A quick explanation of terms

 

Master screen. Where the basic screen, minus sprites etc. is drawn

Working screen. The master screen is copied here and the sprites added

Real screen. Where the working screen is copied and is visible to the player

 

 

Moving a one-cell platform left or right takes very little effort or code, because the rooms are redrawn on each game loop from the master attribute copy and the master screen copy. Any additional graphics written only to the working copy screens are erased on each game loop by the copy from the master to the working screens. Thus moving a single square only requires writing a single graphic to the working screen on each loop.

 

Rather than working on the working copy screens, the platform data is written to the master screens. As above, the code writes just one graphic, either a background (erase) or a platform (write), after any necessary delay. The code was written to produce expanding and contracting floors, not a single moving block.

NOTE the difference in action: it looks the same, but the first method has to write the graphic on each and every game loop, while the second method only writes when there is a change. The cumulative addition of unneeded code is a major factor in the speed of game play.

 

 

For example. Unneeded game loop code.

 

Why is the object count displayed on each game loop? It only changes on object collection

Why is the time printed on each game loop? It only changes every 256 game loops

Why are the dancing willies drawn when they do not dance? 

 

And the biggest problem the game has.

Why move the data using LDIR? There are numerous methods that are quicker. 

 

---------------------------------------------------------

 

Side track on LDIR.

 

Each byte moved using LDIR takes 21 T-states (16 for the final byte). Removing all instances of this slow block movement is very easy.

 

First ignore the stack copy method, too many blocks of data and not flexible enough to slot into existing code.

 

Use the simpler LDI  method which only uses around 68 bytes and is very easy to insert into the code.

 

Start with the typical layout for a block move using LDI, e.g. set aside 68 bytes like so:

 

 

BLOCK32 LDI

BLOCK31 LDI

BLOCK30 LDI

BLOCK29 LDI

; ...and so on, one LDI per label, down to

 

BLOCK1 LDI

 DEC A

 JR NZ,BLOCK32

 RET

 

Next go through the original code and remove/change the LDIR code in this manner

 

typical block move

 

 LD HL,COPY

 LD DE,SCREEN

 LD BC,1024

 LDIR

 

Change to 

 

 LD HL,COPY

 LD DE,SCREEN

 LD A,1024/32

 CALL BLOCK32

 

A block fill gets changed from

 

 LD HL,SCREEN

 LD DE,SCREEN+1

 LD BC,4095

 LD (HL),0

 LDIR

 

To

 

 LD HL,SCREEN

 LD DE,SCREEN+1

 LD A,4096/32 ; NOTE THE VALUE IS 4096/32 (128 PASSES)

 LD (HL),0

 CALL BLOCK31 ; NOTE THIS CALLS BLOCK31, SO THE FIRST PASS MOVES 31 BYTES (31+127*32=4095)
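The byte counts in the two examples above can be checked with a small model. This is a Python sketch, not Z80, and the function name `bytes_moved` is made up for illustration. It assumes the routine is laid out exactly as listed (32 LDIs, then DEC A / JR NZ,BLOCK32 / RET), so the first pass moves as many bytes as the number in the entry label and every later pass moves 32.

```python
def bytes_moved(entry_label: int, passes: int) -> int:
    """Bytes copied when the routine is CALLed at BLOCK<entry_label> with A = passes.

    The first pass only runs the LDIs from the entry label down to BLOCK1;
    each remaining pass jumps back to BLOCK32 and runs all 32 of them.
    """
    if not 1 <= entry_label <= 32:
        raise ValueError("entry label must be between 1 and 32")
    if passes < 1:
        raise ValueError("passes must be at least 1")
    return entry_label + (passes - 1) * 32


print(bytes_moved(32, 1024 // 32))  # block move: CALL BLOCK32 with A = 32
print(bytes_moved(31, 4096 // 32))  # block fill: CALL BLOCK31 with A = 128
```

The block move comes out at 1024 bytes and the block fill at 4095 bytes, matching the BC values in the LDIR versions.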

 

It is possible to replace all the current LDIRs in the game with the above style of code. This will increase speed by over 20%.

 

The smaller block moves of 6 bytes etc. are left as they are (not worth the effort; no speed improvement).

#33 IRF

IRF

    Advanced Member

  • Contributor
  • 4,309 posts

Posted 26 October 2017 - 11:00 AM

In order to conserve bytes, could those 32 consecutive LDI commands be placed within a sub-loop? (With the shadow register A' used for the count.)

Or would doing that effectively undo the speed increase that you achieved when you wrote out the LDIR loops?

#34 Norman Sword

Norman Sword

    Advanced Member

  • Member
  • 239 posts

Posted 26 October 2017 - 01:13 PM

In order to conserve bytes, could those 32 consecutive LDI commands be placed within a sub-loop? (With the shadow register A' used for the count.)

Or would doing that effectively undo the speed increase that you achieved when you wrote out the LDIR loops?

 

LDIR takes 21 T-states for each byte it moves (16 for the final byte). When LDIR is used to move, say, 4096 bytes of data, it takes roughly 4096*21 = 86016 T-states in total.

Using a long line of 32 LDIs changes the timing to 16 T-states per byte. Note this is per byte.

There is an overhead of a DEC A and a JR NZ every 32 bytes; this extra overhead is 16 T-states (4 + 12).

But for every 32 bytes you have saved the difference between LDIR (21) and LDI (16), which is 5 T-states times 32 = 160 T-states.

The saving per 32 bytes is 160 T-states minus the loop overhead of 16 T-states, so a saving of 144 T-states for every move of 32 bytes.

Going back to the initial 4096 bytes moved with LDIR taking 86016 T-states: this is replaced by a repeating loop over the block LDI code. In this case it will loop 128 times, giving an overall saving of 128*144 = 18432 T-states. The CALL to and the RET from the routine are insignificant compared to these figures.

Block move of 32 bytes using LDIR = 32*21 T-states = 672 T-states

Block move using a long line of LDIs = 32*16+16 = 528 T-states

LDIR of 4096 bytes = 86016 T-states

LDI of 4096 bytes = 128*528 T-states = 67584 T-states, a saving of 18432 T-states.
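For anyone who wants to re-run the sums, here is a rough Python sketch of the comparison. The timing constants are assumptions taken from the published Z80 instruction tables (LDIR 21 T-states per repeated byte and 16 for the last, LDI 16, DEC A 4, JR NZ 12 when taken and 7 when not); changing them changes the totals. The function names are made up for illustration, and the CALL, RET and register set-up are left out.

```python
# Assumed timings in T-states, from the published Z80 instruction tables.
LDIR_REPEAT = 21     # LDIR, each iteration while BC is still non-zero
LDIR_FINAL = 16      # LDIR, last iteration (BC reaches zero)
LDI = 16
DEC_A = 4
JR_NZ_TAKEN = 12
JR_NZ_NOT_TAKEN = 7


def ldir_cost(n_bytes: int) -> int:
    """T-states for one LDIR that moves n_bytes (n_bytes >= 1)."""
    return LDIR_REPEAT * (n_bytes - 1) + LDIR_FINAL


def unrolled_ldi_cost(n_bytes: int, unroll: int = 32) -> int:
    """T-states for the BLOCK32-style loop, entered at its first LDI."""
    if n_bytes <= 0 or n_bytes % unroll:
        raise ValueError("n_bytes must be a positive multiple of unroll")
    passes = n_bytes // unroll
    body = unroll * LDI + DEC_A
    # Every pass except the last takes the jump back to the top.
    return passes * body + (passes - 1) * JR_NZ_TAKEN + JR_NZ_NOT_TAKEN


n = 4096
print("LDIR:", ldir_cost(n))
print("32 x LDI loop:", unrolled_ldi_cost(n))
print("saving:", ldir_cost(n) - unrolled_ldi_cost(n))
```

Counted exactly, the final LDIR iteration and the final untaken JR are each 5 T-states cheaper than the rounded per-byte figures, so the two totals come out at 86011 and 67579, and the saving is 18432 T-states.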

 

Enough time saving in the one loop to execute around another two  thousand op-codes

 

Since every game loop:-

 

It copies the master Att screen to the working Att screen

It copies the working Att screen to the real Att screen

It copies the master screen to the working screen

It copies the working screen to the real screen.

 

The game moves every loop an enormous amount of data.

 

The jagged finger code for screen copy that I use, incorporates the block move into its code.

 

Getting back to the original statement can I use some sort of sub loop?

Short answer is No .

We are dealing with tiny timing differences that accumulate into big differences due to the number of times they are executed

 

32 LDIs in line was a compromise between speed and size. It also happens to be the amount of data in one raster line, so I settled on that figure for that reason.
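The speed/size trade-off can be put into numbers with a short Python sketch. It assumes the same loop shape as the block routine (a run of LDIs followed by DEC A / JR NZ / RET) and the published Z80 timings; the function names are made up for illustration. A run of one LDI is close to the sub-loop idea, with the loop overhead paid on every byte.

```python
LDI = 16           # T-states per LDI (assumed from published Z80 timings)
DEC_A = 4
JR_NZ_TAKEN = 12
LDIR_REPEAT = 21   # T-states per byte for LDIR, for comparison


def t_states_per_byte(unroll: int) -> float:
    """Average cost per byte when `unroll` LDIs share one DEC A / JR NZ."""
    return LDI + (DEC_A + JR_NZ_TAKEN) / unroll


def routine_size(unroll: int) -> int:
    """Bytes of code: 2 per LDI, plus DEC A (1), JR NZ (2) and RET (1)."""
    return 2 * unroll + 1 + 2 + 1


for unroll in (1, 2, 4, 8, 16, 32):
    print(unroll, routine_size(unroll), t_states_per_byte(unroll))
```

Under these assumptions a single LDI per pass costs 32 T-states per byte, which is slower than LDIR; the loop only pulls ahead of LDIR from about four LDIs per pass, and 32 LDIs give 16.5 T-states per byte in 68 bytes of code.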

 

It is not unknown for games to use a vastly larger piece of code to try and improve the speed even more. But then we start to move into the realms of using Stack copy and the associated amount of memory that uses.

 

LDI is simple, easy to slot in, and makes a major difference to speed.

 

 

( since the figures listed above do not tally, there are mistakes in the arithmetic. This should not distract from the overall message conveyed)

 

(edited yet again to get the figures on the arithmetic to match)


Edited by Norman Sword, 27 October 2017 - 09:29 AM.


#35 Norman Sword

Norman Sword

    Advanced Member

  • Member
  • 239 posts

Posted 26 October 2017 - 01:14 PM

 

 

This is a deleted copy of the above.


Edited by Norman Sword, 26 October 2017 - 01:16 PM.


#36 IRF

IRF

    Advanced Member

  • Contributor
  • 4,309 posts

Posted 26 October 2017 - 06:10 PM

Getting back to the original statement can I use some sort of sub loop?
Short answer is No .


I thought that would be the case. Thanks for the considered response though.

#37 Norman Sword

Norman Sword

    Advanced Member

  • Member
  • 239 posts

Posted 26 October 2017 - 11:35 PM

Text has been edited to correct lots of errors --- see post following

 

Which highlighted a problem, which has been edited twice. 

 

Also edited to include a missing opcode

 

------------------------------------------------------------------------------------

 

 

Space to spare.

 

I read recently in a post that you were using the rope table space for data/code.

It is a very easy matter to delete most of the data and use the space saved to implement the LDI table as mentioned above.

Replace the data table with this data:

 

ROPE_TABLE
x8300 DEFB $60,$60,$60,$60,$60,$60,$60,$60
x8308 DEFB $60,$60,$60,$60,$60,$60,$60,$60
x8310 DEFB $60,$60,$60,$60,$60,$60,$60,$60
x8318 DEFB $60,$60,$60,$60,$60,$60,$60,$60
x8320 DEFB $61,$61,$61,$61,$61,$61,$61,$61
x8328 DEFB $61,$61,$61,$61,$62,$62,$62,$62
x8330 DEFB $42,$62,$62,$42,$62,$42,$62,$42
x8338 DEFB $62,$42,$42,$42,$62,$42,$42,$42
x8340 DEFB $42,$42,$41,$42,$42,$41,$41,$42
x8348 DEFB $41,$41,$42,$42,$43,$42,$43,$42
x8350 DEFB $43,$43,$43,$43,$43,$43

X8356 DEFB $21,$21

 

 

The continuous data space from $8358 up to $8400 becomes free to use.

 

Change the code as listed below to use this changed data table (assuming it will fit between $9316 and $9327).

If it won't fit, there is plenty of space created by deleting the second half of the rope table.

 

X9316 ADD A,(IX+rope1)                    ;$01

 

From here 

 

;remove the rope swing direction bit (bit 7)

 res 7,a                                ; it helps to include the full modifications

 ld l,a

 ld H,High ROPE_TABLE     ; $83

 

; extract the data, high nibble is Y-shift 

 ld a,(hl)                               ;grab data   hl=pointer to rope data

 rrca
 rrca
 rrca
 rrca
 and $0e                            ;$0e=14=00001110B 

;                                         ;instant crash if this value is odd (easier to just remove the possibility) 

 

; add Y-shift onto the Y-table offset 

 add  a,iyl
 ld iyl,a

 

;extract the data, low nibble is X-shift

 ld a,(hl)
 and $0f                             ;$0f=15=00001111B 

 

;To here

 

X9327 JR Z,L9350            ;Jump if so

X9329 LD B,A                   ;B is the count for rotations of the drawing byte (the rope drawing data bit)
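To see what the merged table bytes decode to, here is a small Python model of the extraction steps above. It is only a sketch: the function name is made up, and it assumes four RRCAs on an 8-bit value simply swap the two nibbles.

```python
def decode_rope_entry(value: int) -> tuple[int, int]:
    """Return (y_shift, x_shift) for one byte of the merged rope table."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    # Four RRCAs rotate the byte right by 4 bits, i.e. swap the nibbles.
    swapped = ((value >> 4) | (value << 4)) & 0xFF
    y_shift = swapped & 0x0E   # AND $0E: the low bit is dropped, so Y is always even
    x_shift = value & 0x0F     # AND $0F on the original byte
    return y_shift, x_shift


for entry in (0x60, 0x61, 0x42, 0x43):
    print(hex(entry), decode_rope_entry(entry))
```

So $60 gives a Y-shift of 6 with no X-shift (rope hanging straight down), and $43 gives a Y-shift of 4 with an X-shift of 3.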

 

;--------------------------------

 

From a casual look, it would appear that the extra RES 7,A makes this mod bigger than the available space.

 

If it MUST fit and you can be bothered altering the data, then the rope data can be changed to use 2 bits for the x offset and 2 bits for the y offset, e.g. 00000011b for the x offset and 00001100b for the y offset. This mod probably removes enough opcodes to make it fit.

 

The y-offset is always even and only takes the values 6 and 4. Two bits allow for the values 0, 2, 4 and 6.

 

 

x is extracted with

      and 3 (not and $0f)

The y is extracted with

     and 1100b

     rrca                        ; note only one rrca needed so 3 bytes shorter
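A quick Python sketch of that tighter packing, to show how the existing table bytes would be converted and read back. The helper names are made up for illustration; it assumes the y-offset is stored halved in bits 2-3, so that AND 1100b followed by a single RRCA gives it back.

```python
def pack_entry(value: int) -> int:
    """Convert a nibble-packed table byte (e.g. $43) to the 2-bit + 2-bit form."""
    y_shift = (value >> 4) & 0x0E
    x_shift = value & 0x0F
    if x_shift > 3 or y_shift > 6:
        raise ValueError("offsets do not fit in two bits each")
    return (y_shift << 1) | x_shift    # y/2 goes in bits 2-3, x in bits 0-1


def unpack_entry(packed: int) -> tuple[int, int]:
    """Return (y_shift, x_shift) the way the shortened Z80 code would."""
    y_shift = (packed & 0b1100) >> 1   # AND 1100b, then one RRCA (bit 0 is clear)
    x_shift = packed & 0b0011          # AND 3
    return y_shift, x_shift


for entry in (0x60, 0x62, 0x41, 0x43):
    packed = pack_entry(entry)
    print(hex(entry), bin(packed), unpack_entry(packed))
```

Every value in the table listed above fits, since the x-offsets never exceed 3 and the y-offsets are only 4 or 6.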

 

For me the extra effort was not worthwhile. I had no intention of only modifying the original code and restricting myself to the available space.


Edited by Norman Sword, 27 October 2017 - 01:42 PM.


#38 IRF

IRF

    Advanced Member

  • Contributor
  • 4,309 posts

Posted 27 October 2017 - 12:03 AM

Ah, I see what you've done there - very cunning!

Although I think a few of the lines of data were misaligned when you merged them - should it be this?:

X8328 DEFB $61,$61,$61,$61,$62,$62,$62,$62
X8330 DEFB $42,$62,$62,$42,$62,$42,$62,$42
X8338 DEFB $62,$42,$42,$42,$62,$42,$42,$42

****

(And just to clarify - the AND commands in your post above have operands expressed in that antiquated numbering system that I believe some luddites still stick to, known as 'decimal'? ;) In hexadecimal, we're talking about: AND #0E / AND #0F i.e. pick out the lower nybble in one instance, and Bits 1-3 in the other.)

Edited by IRF, 27 October 2017 - 08:43 AM.


#39 IRF

IRF

    Advanced Member

  • Contributor
  • 4,309 posts

Posted 27 October 2017 - 12:11 PM

I think the compressed Rope Animation Table, in full, should be as follows:

x8300 DEFB $60,$60,$60,$60,$60,$60,$60,$60
x8308 DEFB $60,$60,$60,$60,$60,$60,$60,$60
x8310 DEFB $60,$60,$60,$60,$60,$60,$60,$60
x8318 DEFB $60,$60,$60,$60,$60,$60,$60,$60
x8320 DEFB $61,$61,$61,$61,$61,$61,$61,$61
x8328 DEFB $61,$61,$61,$61,$62,$62,$62,$62
x8330 DEFB $42,$62,$62,$42,$62,$42,$62,$42
x8338 DEFB $62,$42,$42,$42,$62,$42,$42,$42
x8340 DEFB $42,$42,$41,$42,$42,$41,$41,$42
x8348 DEFB $41,$41,$42,$42,$43,$42,$43,$42
x8350 DEFB $43,$43,$43,$43,$43,$43

Note that the first 32 (#20) values should all be '$60'. This corresponds to the situation where the rope is hanging straight down (the Animation Frame Index = 00), so all 32 (#20) segments of the rope have zero horizontal displacement (i.e. the lower nybble of entries x8300 to x831F are all zero).

#40 IRF

IRF

    Advanced Member

  • Contributor
  • 4,309 posts

Posted 27 October 2017 - 12:24 PM

Actually, prior to this command:

X9319 LD L,A


wouldn't you now need to have an AND #7F operation? Otherwise, when the rope is left-of-centre, the Animation Frame Index (which is added to the Segment Counter to point at the appropriate entries in the rope table) would have values greater than #80.

The original code accounts for this by using SET 7, L and RES 7, L commands to ensure that it is always accessing the correct half of the table, but with the two halves merged, there is a need to force the routine to always look up the lower half of the table.
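To illustrate the point in Python (a sketch only; the function name and the example values are made up, chosen so the index lands on the last table entry at x8355):

```python
def rope_table_index(frame_index: int, segment: int) -> int:
    """Low byte of the table pointer after ADD A,(IX+1) and the masking step."""
    total = (frame_index + segment) & 0xFF   # 8-bit add
    return total & 0x7F                      # AND #7F, same effect as RES 7,A


# Rope right of centre: bit 7 of the frame index is clear.
print(hex(rope_table_index(0x36, 0x1F)))
# Rope left of centre: bit 7 is set, but the same entry must be used.
print(hex(rope_table_index(0xB6, 0x1F)))
```

Both calls give #55, so the left-of-centre case reads the same entry of the merged table as the right-of-centre one.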

Edited by IRF, 27 October 2017 - 12:47 PM.




