A total rewrite of JSW in 48k using Matthews core code

IRF · October 26, 2017

This type of modification has no effect if no value is placed in (ix+4).

So as long as the sprite data has (ix+4)=0 on all horizontal sprites, the game will run as normal.

In a project I'm working on, some of the guardians briefly wrap around the vertical screen-edge, so setting byte 4 to 00 would accidentally cause a match.

However, I think that setting Bit 5 or 6 of Byte 4 of such a guardian's definition should prevent a match from occurring unintentionally.

Norman Sword · October 26, 2017

Part 3

The 'extending/retracting' platform at the top-left of The Bathroom reminds me of the moving platform in one of the Geoff Mode patches (in 'Willy Takes a Trip's room 10: 'A Quiet Corner to Rest in'), except that in The Bathroom it has 'asymmetrical logic':

- After the Geoff Mode moving platform has been moved along by a cell-column in either direction, the room cell which it has just vacated is restored with an Air block;

- In Norman's The Bathroom, the cells to the right of the platform are reverted to Air as the platform retracts leftwards, but when the platform is extending rightwards, the cells to the left of the rightmost end of the platform remain as floor/Water cells.

A quick explanation of terms

Master screen: where the basic screen, minus sprites etc., is drawn.

Working screen: the master screen is copied here and the sprites are added.

Real screen: where the working screen is copied, and what is visible to the player.

Moving a one-cell platform left or right takes very little effort or code. The rooms are redrawn from the master attribute copy and the master screen copy on each game loop, so any additional graphics written only to the working copy screens are wiped on the next loop by the copy from the master to the working screens. Moving a single square this way therefore means writing one graphic to the working screen on every loop.

Rather than work on the working copy screens, I wrote the platform data to the master screens. As above, this writes just one graphic, either a background (erase) or a platform (write), after any necessary delay. The code was written to produce expanding and contracting floors rather than a single moving block.

Note the difference in action. It looks the same, but the first method has to write the graphic on each and every game loop, while the second method only writes when there is a change. The accumulation of unneeded code is a major factor in the speed of game play.
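The difference between the two methods can be modelled in a few lines. This is only a rough sketch in Python, not the game's code: the row width, the growth schedule and every function name are made up for illustration, and it counts graphic writes rather than T-states.

```python
# Toy model: one row of 8 cells; 0 = background, 1 = platform.
# All names and sizes here are illustrative assumptions.
WIDTH = 8

def length_at(loop):
    """Hypothetical schedule: the floor grows one cell every 16 loops, up to 4."""
    return min(4, loop // 16)

def working_copy_method(loops):
    """Draw the platform on the working copy, so it is redrawn on every loop."""
    master = [0] * WIDTH
    working = master[:]
    writes = 0
    for loop in range(loops):
        working = master[:]              # master -> working wipes the last drawing
        for x in range(length_at(loop)):
            working[x] = 1               # has to be written again each loop
            writes += 1
    return working, writes

def master_method(loops):
    """Write to the master copy, and only when the platform length changes."""
    master = [0] * WIDTH
    working = master[:]
    writes = 0
    current = 0
    for loop in range(loops):
        target = length_at(loop)
        while current < target:          # extending: write one platform cell
            master[current] = 1
            current += 1
            writes += 1
        while current > target:          # retracting: erase one cell
            current -= 1
            master[current] = 0
            writes += 1
        working = master[:]              # the normal per-loop copy does the rest
    return working, writes

if __name__ == "__main__":
    print(working_copy_method(64))
    print(master_method(64))
```

Both functions end with the same working screen; only the number of graphic writes differs, and the gap widens the longer the platform stays still.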

Examples of unneeded game loop code:

Why is the object count displayed on each game loop? It only changes on object collection

Why is the time printed on each game loop? It only changes every 256 game loops

Why are the dancing willies drawn when they do not dance?

And the biggest problem the game has:

Why move the data using LDIR? There are numerous methods that are quicker.

---------------------------------------------------------

Side track on LDIR.

Each byte moved using LDIR takes 22 T-states. Removing all instances of this slow block movement is very easy.

First, ignore the stack copy method: it needs too many blocks of data and is not flexible enough to slot into existing code.

Use the simpler LDI method, which only uses around 68 bytes and is very easy to insert into the code.

Start with the typical layout for a block move using LDI, e.g. set aside 68 bytes like so:

BLOCK32 LDI
BLOCK31 LDI
BLOCK30 LDI
BLOCK29 LDI
        ; ... and so on, down to ...
BLOCK1  LDI
        DEC A           ; one pass finished
        JR NZ,BLOCK32   ; more passes to do: run all 32 LDIs again
        RET

Next, go through the original code and change each LDIR in this manner.

A typical block move:

        LD HL,COPY
        LD DE,SCREEN
        LD BC,1024
        LDIR

changes to:

        LD HL,COPY
        LD DE,SCREEN
        LD A,1024/32    ; number of 32-byte passes
        CALL BLOCK32

A block fill changes from:

        LD HL,SCREEN
        LD DE,SCREEN+1
        LD BC,4095
        LD (HL),0
        LDIR

to:

        LD HL,SCREEN
        LD DE,SCREEN+1
        LD A,4096/32    ; note the value is 4096/32
        LD (HL),0
        CALL BLOCK31    ; note this calls BLOCK31: the first pass moves 31 bytes, so 31+127*32 = 4095 in total
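To check the byte counts in the two conversions above, here is a small Python model of the LDI chain. It is a sketch only: `ldi_chain` and its `entry` parameter are names invented for this illustration, and memory is just a `bytearray` rather than a Spectrum memory map.

```python
def ldi_chain(mem, hl, de, a, entry=32):
    """Model CALL BLOCK<entry> with A passes; returns the number of bytes moved.

    The first pass runs `entry` LDIs, because the CALL can land part-way
    down the chain. Every later pass jumps back to BLOCK32 and runs all 32.
    """
    moved = 0
    ldis = entry
    while True:
        for _ in range(ldis):
            mem[de] = mem[hl]        # LDI: (DE) <- (HL), then HL and DE step on
            hl += 1
            de += 1
            moved += 1
        a = (a - 1) & 0xFF           # DEC A on an 8-bit register
        if a == 0:                   # JR NZ,BLOCK32 falls through to RET
            return moved
        ldis = 32

if __name__ == "__main__":
    # Block move: 1024 bytes, A = 1024/32, entered at BLOCK32.
    mem = bytearray(0x2000)
    print(ldi_chain(mem, 0x0000, 0x1000, 1024 // 32))

    # Block fill: (HL) is zeroed first, then the overlapping copy spreads
    # that zero forward. A = 4096/32, entered at BLOCK31.
    mem = bytearray([0xAA]) * 0x2000
    mem[0] = 0
    print(ldi_chain(mem, 0, 1, 4096 // 32, entry=31))
```

Because A is an 8-bit counter, a single call in this form can cover at most 256 passes.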

It is possible to replace all the current LDIRs in the game with the above style of code. This will increase speed by over 20%.

The smaller block moves of 6 bytes etc. are left alone (not worth the effort, no speed improvement).

IRF · October 26, 2017

In order to conserve bytes, could those 32 consecutive LDI commands be placed within a sub-loop? (With the shadow register A' used for the count.)

Or would doing that effectively undo the speed increase that you achieved when you wrote out the LDIR loops?

Norman Sword · October 26, 2017

Quote (IRF): "In order to conserve bytes, could those 32 consecutive LDI commands be placed within a sub-loop? (With the shadow register A' used for the count.)

Or would doing that effectively undo the speed increase that you achieved when you wrote out the LDIR loops?"

The LDIR block move takes 22 T-states to move a single byte of data. When LDIR is used to move, say, 4096 bytes, it takes 4096*22 = 90112 T-states in total.

Using a long line of 32 LDIs changes the timing to around 16 T-states per byte. Note this is per byte.

There is an overhead of a DEC A and a JR every 32 bytes. This extra overhead is around 16 T-states.

But for every 32 bytes you have saved the difference between LDIR (22) and LDI (16), which is 6 T-states times 32 = 192 T-states.

The saving per 32 bytes is 192 T-states minus the loop overhead of 16 T-states, so a saving of 176 T-states for every move of 32 bytes.

Going back to the initial 4096 bytes moved with LDIR, taking 90112 T-states: this is replaced by a repeating loop over the block LDI code. In this case it will loop 128 times, giving an overall saving of 128*176 = 22528 T-states. The CALL to and RET from the routine are insignificant compared to these figures.

Block move of 32 bytes using LDIR = 32*22 = 704 T-states

Block move of 32 bytes using a long line of LDIs = 32*16+16 = 528 T-states

LDIR of 4096 bytes = 90112 T-states

LDI of 4096 bytes = 128*528 = 67584 T-states, a saving of 22528 T-states.

That is enough time saved in the one loop to execute around another two thousand opcodes.
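The arithmetic above is easy to re-run with different inputs. The Python sketch below uses the post's figure of 22 T-states per LDIR byte by default. Zilog's timing tables list 21 for each repeated byte, so that value is tried as well. Function names are invented for this example, and the model ignores contended-memory delays and the slightly cheaper final JR.

```python
LDI_T = 16            # one LDI
LOOP_T = 4 + 12       # DEC A (4) + JR NZ when taken (12)
CHAIN = 32            # LDIs per pass

def ldir_cost(n_bytes, per_byte=22):
    """T-states for an LDIR of n_bytes at the given per-byte figure."""
    return n_bytes * per_byte

def ldi_chain_cost(n_bytes):
    """T-states for the 32-LDI chain; assumes n_bytes is a multiple of 32."""
    passes = n_bytes // CHAIN
    return passes * (CHAIN * LDI_T + LOOP_T)

def saving(n_bytes, per_byte=22):
    return ldir_cost(n_bytes, per_byte) - ldi_chain_cost(n_bytes)

if __name__ == "__main__":
    for per_byte in (22, 21):
        print(per_byte, ldir_cost(4096, per_byte),
              ldi_chain_cost(4096), saving(4096, per_byte))
```

With 22 T-states per byte this reproduces the 90112, 67584 and 22528 figures. With 21 the saving is smaller, but it is still more than a fifth of the LDIR time.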

On every game loop the game:

- copies the master attribute screen to the working attribute screen

- copies the working attribute screen to the real attribute screen

- copies the master screen to the working screen

- copies the working screen to the real screen.

The game moves an enormous amount of data on every loop.
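To put a rough number on that, the Python sketch below totals the four copies. The buffer sizes are assumptions that the thread does not state (512 bytes per attribute copy and 4096 bytes per pixel copy), as are the dictionary and function names. The per-byte costs are the ones quoted in the posts.

```python
# ASSUMED sizes, not given in the thread: 512 attribute bytes and
# 4096 pixel bytes for each copy.
COPIES = {
    "master attributes -> working attributes": 512,
    "working attributes -> real attributes": 512,
    "master screen -> working screen": 4096,
    "working screen -> real screen": 4096,
}

LDIR_T = 22                 # per byte, as quoted in the post
PASS_T = 32 * 16 + 16       # one pass of the 32-LDI chain plus loop overhead
CPU_HZ = 3_500_000          # 48K Spectrum Z80 clock

def per_loop_costs(copies=COPIES):
    """Return (bytes moved, LDIR T-states, LDI-chain T-states) per game loop."""
    total = sum(copies.values())
    return total, total * LDIR_T, (total // 32) * PASS_T

if __name__ == "__main__":
    total, ldir, ldi = per_loop_costs()
    saved = ldir - ldi
    print(total, ldir, ldi, saved)
    print("ms saved per loop:", round(1000 * saved / CPU_HZ, 2))
```

If the real buffers are a different size, only the `COPIES` table needs changing.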

The jagged finger code for screen copy that I use incorporates the block move into its code.

Getting back to the original question: can I use some sort of sub-loop?

Short answer is no.

We are dealing with tiny timing differences that accumulate into big differences due to the number of times they are executed.

32 LDIs in line was a compromise between speed and size. It also happens to be the amount of data in one raster line, so I settled on that figure just for that reason.
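The speed against size compromise can be tabulated. The Python sketch below assumes the same loop shape as the BLOCK32 routine (n LDIs, then DEC A, JR NZ and RET) and the T-state figures quoted above. The function name is invented for the example. A chain of one LDI corresponds to the sub-loop idea.

```python
LDI_T = 16      # one LDI
LOOP_T = 16     # DEC A (4) + JR NZ taken (12), paid once per pass

def unrolled(n):
    """Return (code size in bytes, average T-states per byte) for n LDIs in line."""
    size = 2 * n + 1 + 2 + 1          # n LDIs (2 bytes each) + DEC A + JR NZ + RET
    return size, LDI_T + LOOP_T / n

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        size, per_byte = unrolled(n)
        print(f"{n:2d} LDIs: {size:2d} bytes, {per_byte:.1f} T-states per byte")
```

A single LDI in a loop comes out slower per byte than LDIR, which is why the sub-loop undoes the gain. By 32 LDIs the loop overhead has shrunk to half a T-state per byte, so longer chains buy very little more.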

It is not unknown for games to use a vastly larger piece of code to try to improve the speed even more. But then we start to move into the realms of stack copying and the associated amount of memory that uses.

LDI is simple, easy to slot in, and makes a major change in speed.

( since the figures listed above do not tally, there are mistakes in the arithmetic. This should not distract from the overall message conveyed)

(edited yet again to get the figures on the arithmetic to match)

Edited October 27, 2017 by Norman Sword

Norman Sword · October 26, 2017

This is a deleted copy of the above.

Edited October 26, 2017 by Norman Sword

IRF · October 26, 2017

Quote (Norman Sword): "Getting back to the original statement can I use some sort of sub loop?

Short answer is No ."

I thought that would be the case. Thanks for the considered response though.

Norman Sword · October 26, 2017

Text has been edited to correct lots of errors --- see post following

Which highlighted a problem, which has been edited twice.

Also edited to include a missing opcode

------------------------------------------------------------------------------------

Space to spare.

I read recently in a post that you were using the rope table space for data/code.

It is a very easy matter to delete most of the data and use the space saved to implement the LDI table as mentioned above.

Replace the data table with this data:

ROPE_TABLE
x8300 DEFB $60,$60,$60,$60,$60,$60,$60,$60
x8308 DEFB $60,$60,$60,$60,$60,$60,$60,$60
x8310 DEFB $60,$60,$60,$60,$60,$60,$60,$60
x8318 DEFB $60,$60,$60,$60,$60,$60,$60,$60
x8320 DEFB $61,$61,$61,$61,$61,$61,$61,$61
x8328 DEFB $61,$61,$61,$61,$62,$62,$62,$62
x8330 DEFB $42,$62,$62,$42,$62,$42,$62,$42
x8338 DEFB $62,$42,$42,$42,$62,$42,$42,$42
x8340 DEFB $42,$42,$41,$42,$42,$41,$41,$42
x8348 DEFB $41,$41,$42,$42,$43,$42,$43,$42
x8350 DEFB $43,$43,$43,$43,$43,$43

x8356 DEFB $21,$21

The continuous data space from $8358 up to $8400 becomes free to use.

Change the code as listed below to use this changed data table (assuming it will fit between $9316 and $9327).

If it won't fit, there is plenty of space created by deleting the second half of the rope table.

X9316   ADD A,(IX+rope1)     ; $01

; From here

; remove the rope swing direction bit (bit 7)
        res 7,a              ; keeps the lookup inside the merged table
        ld l,a
        ld h,High ROPE_TABLE ; $83

; extract the data: the high nibble is the Y-shift
        ld a,(hl)            ; grab data, hl = pointer to rope data
        rrca
        rrca
        rrca
        rrca
        and $0e              ; $0e = 14 = 00001110b
                             ; instant crash if this value is odd (easier to just remove the possibility)

; add the Y-shift onto the Y-table offset
        add a,iyl
        ld iyl,a

; extract the data: the low nibble is the X-shift
        ld a,(hl)
        and $0f              ; $0f = 15 = 00001111b

; To here

X9327   JR Z,L9350           ; jump if so
X9329   LD B,A               ; B is the count for rotations of the drawing byte (the rope drawing data bit)
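As a cross-check of the nibble unpacking, the same steps can be followed in Python. This is only an illustrative model of the listing above, with `rrca` and `decode` as names invented for the sketch. The sample bytes are taken from the compressed table.

```python
def rrca(a):
    """Z80 RRCA on an 8-bit value: rotate right, bit 0 wraps round to bit 7."""
    return ((a >> 1) | (a << 7)) & 0xFF

def decode(entry):
    """Return (y_shift, x_shift) for one byte of the compressed rope table."""
    a = entry
    for _ in range(4):           # four RRCAs bring the high nibble down
        a = rrca(a)
    y_shift = a & 0x0E           # AND $0E also forces the value to be even
    x_shift = entry & 0x0F       # the low nibble of the original byte
    return y_shift, x_shift

if __name__ == "__main__":
    for entry in (0x60, 0x61, 0x62, 0x42, 0x41, 0x43, 0x21):
        print(f"${entry:02X} ->", decode(entry))
```

For example, $60 decodes to a Y-shift of 6 with no X-shift, and $42 to a Y-shift of 4 with an X-shift of 2.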

;--------------------------------

From a casual look it would appear that the extra RES 7,A makes this mod bigger than the available space.

If it MUST fit, and you can be bothered altering the data, then the rope data can be changed to use 2 bits for the x-offset and 2 bits for the y-offset, e.g. 00000011b for the x-offset and 00001100b for the y-offset. This mod probably removes enough opcodes to make it fit.

The y-offset is always even and so only has the values 6 and 4. Two bits allow for the values 0, 2, 4 and 6.

x is extracted with

        and 3                ; not and $0f

and y is extracted with

        and 1100b
        rrca                 ; note only one rrca needed, so 3 bytes shorter
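A Python sketch of that 2-bit layout follows, in case anyone wants to regenerate the table. The function names are made up for the example, and `repack` assumes its input is a byte in the nibble format of the table earlier in this post.

```python
def pack(y_shift, x_shift):
    """Pack into 0000yyxx: y in bits 2-3 (stored as y/2), x in bits 0-1."""
    if y_shift not in (0, 2, 4, 6) or not 0 <= x_shift <= 3:
        raise ValueError("y must be 0, 2, 4 or 6 and x must be 0..3")
    return (y_shift << 1) | x_shift

def unpack(entry):
    """Mirror of the two extractions: AND 3 for x, AND 1100b then one RRCA for y."""
    x_shift = entry & 0b0011
    y_shift = (entry & 0b1100) >> 1   # bit 0 is clear after the AND, so RRCA is a plain shift
    return y_shift, x_shift

def repack(nibble_entry):
    """Convert one byte from the nibble format ($60, $42, ...) to the 2-bit format."""
    return pack((nibble_entry >> 4) & 0x0E, nibble_entry & 0x0F)

if __name__ == "__main__":
    for entry in (0x60, 0x61, 0x62, 0x42, 0x41, 0x43):
        print(f"${entry:02X} -> ${repack(entry):02X}")
```

Every X-shift in the compressed table is 3 or less, so each entry fits the two bits available.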

For me the extra effort was not worthwhile. I had no intention of only modifying the original code and restricting myself to the available space.

Edited October 27, 2017 by Norman Sword

IRF · October 27, 2017

Ah, I see what you've done there - very cunning!

Although I think a few of the lines of data were misaligned when you merged them - should it be this?:

X8328 DEFB $61,$61,$61,$61,$62,$62,$62,$62

X8330 DEFB $42,$62,$62,$42,$62,$42,$62,$42

X8338 DEFB $62,$42,$42,$42,$62,$42,$42,$42

****

(And just to clarify - the AND commands in your post above have operands expressed in that antiquated numbering system that I believe some luddites still stick to, known as 'decimal'? ;) In hexadecimal, we're talking about: AND #0E / AND #0F i.e. pick out the lower nybble in one instance, and Bits 1-3 in the other.)

Edited October 27, 2017 by IRF

IRF · October 27, 2017

I think the compressed Rope Animation Table, in full, should be as follows:

x8300 DEFB $60,$60,$60,$60,$60,$60,$60,$60

x8308 DEFB $60,$60,$60,$60,$60,$60,$60,$60

x8310 DEFB $60,$60,$60,$60,$60,$60,$60,$60

x8318 DEFB $60,$60,$60,$60,$60,$60,$60,$60

x8320 DEFB $61,$61,$61,$61,$61,$61,$61,$61

x8328 DEFB $61,$61,$61,$61,$62,$62,$62,$62

x8330 DEFB $42,$62,$62,$42,$62,$42,$62,$42

x8338 DEFB $62,$42,$42,$42,$62,$42,$42,$42

x8340 DEFB $42,$42,$41,$42,$42,$41,$41,$42

x8348 DEFB $41,$41,$42,$42,$43,$42,$43,$42

x8350 DEFB $43,$43,$43,$43,$43,$43

Note that the first 32 (#20) values should all be '$60'. This corresponds to the situation where the rope is hanging straight down (the Animation Frame Index = 00), so all 32 (#20) segments of the rope have zero horizontal displacement (i.e. the lower nybble of entries x8300 to x831F are all zero).

IRF · October 27, 2017

Actually, prior to this command:

X9319 LD L,A

wouldn't you now need to have an AND #7F operation? Otherwise, when the rope is left-of-centre, the Animation Frame Index (which is added to the Segment Counter to point at the appropriate entries in the rope table) would have values greater than #80.

The original code accounts for this by using SET 7, L and RES 7, L commands to ensure that it is always accessing the correct half of the table, but with the two halves merged, there is a need to force the routine to always look up the lower half of the table.
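A quick Python illustration of why the mask is needed. The frame index and segment values are hypothetical, chosen only so that bit 7 is set. Only the table base of $8300 comes from the listings in this thread.

```python
ROPE_TABLE = 0x8300

def table_address(frame_index, segment, mask=True):
    """Address read by LD A,(HL) after ADD A,(IX+1) / LD L,A / LD H,$83."""
    a = (frame_index + segment) & 0xFF   # 8-bit sum of frame index and segment counter
    if mask:
        a &= 0x7F                        # AND #7F, or equivalently RES 7,A
    return ROPE_TABLE | a

if __name__ == "__main__":
    # Hypothetical left-of-centre frame index (bit 7 set), fifth segment.
    print(hex(table_address(0x8C, 5, mask=False)))   # lands in the freed upper half
    print(hex(table_address(0x8C, 5)))               # stays in the merged table
```

Without the mask the lookup would read whatever has been placed in the freed upper half of the page.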

Edited October 27, 2017 by IRF
