PCSX2 Forums

PS1 -> PS2 shared problems?
Hi. I've spent some time on PS1 development (probably more than 50% of it reverse-engineering ePSXe).

I'm rather unfamiliar with the PS2 internals, but I have a short list of questions about whether the PS2 engineers kept some of the (bastard) things that gave PS1 devs trouble.

(These are probably not important)


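1) CPU division by zero

I saw that the R5900 core handles div by 0 (quotient of -1 or +1 depending on the dividend's sign, remainder = dividend), but the R3000 core doesn't.

The PS1 game Threads of Fate divides by zero (an infamous emulation bug). Does the PS2's R3000 core just ignore this case?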
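For reference, here is a minimal C sketch of the divide edge cases as they are commonly documented for the PS1's R3000A: DIV by zero leaves the dividend in HI and sets LO to -1 for a non-negative dividend or +1 for a negative one; DIVU by zero sets LO to 0xFFFFFFFF; and 0x80000000 / -1 gives LO = 0x80000000, HI = 0 instead of trapping. The function and type names are hypothetical, and the delay before LO/HI become readable is not modeled.

```c
#include <stdint.h>

/* Result of a DIV/DIVU: LO holds the quotient, HI the remainder. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} DivResult;

/* Signed DIV, including the cases the CPU defines instead of trapping. */
DivResult r3000_div(uint32_t rs, uint32_t rt)
{
    int32_t n = (int32_t)rs;
    int32_t d = (int32_t)rt;
    DivResult r;

    if (d == 0) {
        /* Divide by zero: remainder = dividend, quotient = -1 or +1 by sign. */
        r.hi = rs;
        r.lo = (n >= 0) ? 0xFFFFFFFFu : 1u;
    } else if (rs == 0x80000000u && d == -1) {
        /* INT32_MIN / -1 overflows in C; the CPU returns LO=0x80000000, HI=0. */
        r.lo = 0x80000000u;
        r.hi = 0;
    } else {
        /* C and MIPS both truncate toward zero. */
        r.lo = (uint32_t)(n / d);
        r.hi = (uint32_t)(n % d);
    }
    return r;
}

/* Unsigned DIVU: divide by zero gives LO=0xFFFFFFFF, HI=dividend. */
DivResult r3000_divu(uint32_t rs, uint32_t rt)
{
    DivResult r;
    if (rt == 0) {
        r.lo = 0xFFFFFFFFu;
        r.hi = rs;
    } else {
        r.lo = rs / rt;
        r.hi = rs % rt;
    }
    return r;
}
```

Keeping the zero and overflow checks ahead of the C division matters: both would be undefined behavior in C itself.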


2) CPU I-cache

I think I read in your R5900 core that writes to main memory flush the I-cache line, so it holds the correct opcodes?

On the PS1's R3000, main memory writes do not update the I-cache lines. Normally I wouldn't care, but Gameshark CDX and Formula One 2001 take advantage of this (running stale code from the I-cache while dynamically changing the code in memory).


Simple example:
$80000000
beq r0,r0,$80000000
sw r0,0(r0)

jal start_cd
nop


It never reaches the jal, because the CPU keeps running the beq/sw pair out of the I-cache and never sees the updated nop/sw in memory.

Just wondering if this affects the PS2 at all.
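To make the behavior concrete, here is a toy C model of a direct-mapped instruction cache with PS1-like geometry (4 KB, 256 lines of 16 bytes). Data writes go straight to RAM and never touch the cache, so a hot loop keeps fetching the stale beq even after memory holds a nop. Everything here is hypothetical and simplified: RAM is shrunk, tags use the raw address, and uncached (KSEG1) fetches and cache isolation are not modeled.

```c
#include <stdint.h>
#include <stdbool.h>

#define RAM_WORDS  (64 * 1024)     /* toy RAM size (words) for the sketch */
#define LINE_WORDS 4               /* 16-byte cache lines */
#define NUM_LINES  256             /* 256 lines * 16 bytes = 4 KB */

typedef struct {
    bool     valid;
    uint32_t tag;                  /* line number bits above the index */
    uint32_t words[LINE_WORDS];
} ICacheLine;

typedef struct {
    uint32_t   ram[RAM_WORDS];
    ICacheLine icache[NUM_LINES];
} Machine;

/* Data writes go to RAM only; the I-cache is deliberately NOT updated. */
void mem_write32(Machine *m, uint32_t addr, uint32_t value)
{
    m->ram[(addr >> 2) % RAM_WORDS] = value;
}

/* Instruction fetch: on a hit, return the cached (possibly stale) word. */
uint32_t ifetch32(Machine *m, uint32_t addr)
{
    uint32_t line_addr = addr >> 4;            /* 16-byte line number */
    uint32_t index     = line_addr % NUM_LINES;
    uint32_t tag       = line_addr / NUM_LINES;
    ICacheLine *line   = &m->icache[index];

    if (!line->valid || line->tag != tag) {
        /* Miss: refill the whole line from RAM. */
        for (int i = 0; i < LINE_WORDS; i++)
            line->words[i] = m->ram[((line_addr << 2) + i) % RAM_WORDS];
        line->valid = true;
        line->tag   = tag;
    }
    return line->words[(addr >> 2) & (LINE_WORDS - 1)];
}
```

An emulator that instead invalidated the matching line on every store would return the new nop on the second fetch, which is exactly what breaks the games above.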



I've got more to ask - must dig up my notes though.
(stuff like beq - beq; jr - jr - addi sp; lbu - beq - nop; jr - lbu --> sb)

Thanks for any insight!
Branches in branch delay slots aren't allowed on the R5900, and it has no load delay slots. We have no known instances of a game using self-modifying code (cache-instruction-based recompiler clearing is planned for other reasons, though).

For the divide by zero... I don't know much about that. If we already have the code for the R5900 and it's the same for the R3000, we could port it I suppose.
shalma, the PS2's R3000 should behave the same way as the one on the PS1. If the current code isn't doing that, then it's wrong. Our IOP (R3000) rec is pretty outdated, and afaik it's likely an old version of pcsx's recompiler at heart. We had someone working on a new IOP rec, but the project was never finished.

I have seen at least one game in the bug reports forum where the EE reports a branch in a branch delay slot. So although it may be illegal, some game with hand-coded MIPS asm might be evil enough to do it (many games end up doing it on the PS2's VU processors).

Self-modifying code is something we haven't seen in games, but that doesn't mean games don't do it, just that we don't know of any that do.

Also, if you feel like helping out with PCSX2, I can PM you our IRC channel where we have our dev discussions.
There's actually a lot of shared components between the PSX and the PS2.
The whole SPU is basically the same, just two of them glued together and running at 48 kHz.
We've had more games break because of SPU emulation quirks than because of EE or IOP ones.
Thanks for the IRC invite - I usually avoid chat rooms though (sucks up way too much of my time). :lol: But I'll keep it in mind if my heart changes.


Quote:shalma the ps2's r3000 should behave the same way as the one on the ps1 does. if the current code isn't doing that then its wrong.

The current PS2 R3000 code bluntly ignores the div(u) by 0 case. On the PS1 that was a major game breaker: it caused Rue / Mint (Threads of Fate) to keep walking into invisible walls.


Quote:Our IOP (r3000) rec is pretty outdated and afaik its likely an old version of pcsx's recompiler at-heart. we had someone working on a new IOP rec but the project was never finished.

There have been some discoveries since the old PCSX days: evil crap that no one was expecting. I'm sure I still have the notes, so I'll make sure to shore them up again.

The PCSX(R) recompiler is badly damaged too; it doesn't match up with the pcsxr interpreter. I believe I have notes on that as well. Making a new one would work better, though (imo).

Thing is, we didn't do much testing while I was actively around, so the current code may not be accurate either.


I don't plan on helping out (much) with pcsx2. Just sharing what I've learned and see if it spares your team a little pain.

After my time with pcsxr (no longer), I prefer to 'openly' muse about my thoughts as large blobs (for private reasons).

=======================================
=======================================

(dug up some notes)

Here's an awful one that confused me. It's about r3000 branch delays.
http://forums.ngemu.com/showpost.php?p=1...tcount=595
http://forums.ngemu.com/showpost.php?p=1...tcount=597

http://forums.ngemu.com/showpost.php?p=1...tcount=613
http://forums.ngemu.com/showpost.php?p=1...tcount=614
http://forums.ngemu.com/showpost.php?p=1...tcount=615
http://forums.ngemu.com/showpost.php?p=1...tcount=616
http://forums.ngemu.com/showpost.php?p=1...tcount=617


We do have 1 game that makes use of a double branch sequence (Threads of Fate). I think it was used somewhere in the town (geometry).

This one was researched and these were the findings (above).

=======================================
=======================================

I remember notaz mentioning Shadow Master
http://www.gp32x.com/board/index.php?/to..._p__934815

jr - jr wasn't handled by pcsx: it would skip the 3rd opcode and crash.


These were the 2 changes he made (interpreter only).
http://pcsxr.codeplex.com/SourceControl/...nges/64369
http://pcsxr.codeplex.com/SourceControl/...nges/62781
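A common way for interpreters to get the jr - jr sequence right is the pc/next_pc pair: a jump only rewrites next_pc, so the instruction already fetched behind it still runs. The toy C sketch below (a hypothetical mini instruction set, no real MIPS decoding) shows that with a jump in a jump's delay slot, exactly one instruction at the first target executes before control lands on the second target; an emulator that drops that instruction goes off the rails. Whether longer chains (the 'case 3/4' situations mentioned later in this thread) behave this way on real hardware is not established here.

```c
#include <stdint.h>
#include <stddef.h>

/* A toy instruction set: just enough to show branch-delay sequencing. */
typedef enum { OP_NOP, OP_JUMP, OP_HALT } Op;

typedef struct {
    Op       op;
    uint32_t target;   /* absolute byte address for OP_JUMP (stands in for jr) */
} Insn;

/*
 * Runs prog[] with the pc/next_pc pair. A jump only rewrites next_pc, so
 * the instruction behind it (the delay slot) still executes. If that delay
 * slot is itself a jump, one instruction at the first target runs before
 * the second jump takes effect. Executed addresses go into trace[].
 * Returns the number of instructions executed.
 */
size_t run(const Insn *prog, size_t prog_len, uint32_t *trace, size_t max_steps)
{
    uint32_t pc = 0, next_pc = 4;
    size_t steps = 0;

    while (steps < max_steps) {
        uint32_t cur = pc;
        size_t idx = cur / 4;
        if (idx >= prog_len || prog[idx].op == OP_HALT)
            break;

        /* Advance first, so a jump below only affects the fetch after next. */
        pc = next_pc;
        next_pc += 4;
        trace[steps++] = cur;

        if (prog[idx].op == OP_JUMP)
            next_pc = prog[idx].target;
    }
    return steps;
}
```

With a jump to 0x20 at address 0 and a jump to 0x40 in its delay slot, the trace is 0x00, 0x04, 0x20 and then execution continues at 0x40.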

=======================================
=======================================

I ask this for my own knowledge.

Code:
80018ce0 : 3C028002  LUI     00000000 (v0), 8002 (32770),

; load delay = stops battles from starting..!
80018ce4 : 90438CD0  LBU     80059ef4 (v1), 8cd0 (80020000 (v0)) [80018cd0]
80018ce8 : 10600005  BEQ     80059ef4 (v1), 00000000 (r0), 80018d00,
80018cec : 00000000  NOP    

; don't start battle
80018cf0 : 2484FFE9  ADDIU   0001a9f1 (a0), 0001a9f1 (a0), ffe9 (65513),
80018cf4 : 3C030004  LUI     00000000 (v1), 0004 (4),

PCSX handled this okay before.

In the above example, we introduced a load delay during development. This caused battles to stop working (the branch to 80018d00 was never taken).

The old code ignored the load delay for 'lbu - beq', and battles would start when the counter dropped. Why would the R3000 appear to 'wait' a cycle?

========================================
========================================

Thinking about it, pcsx doesn't handle (non-branch) load delays at all.

Agemo's Xenogears Chinese translation
Code:
80010000 : ORI 00000000 (t7), 00000000 (r0), fff8 (65528),
80010004 : SW 00000000 (r0), 0000 (0000fff8 (t7)) [0000fff8]
80010008 : ADDIU 00000000 (t8), 00000000 (r0), 0001 (1),

8001000c : LW 00000001 (t8), 0000 (0000fff8 (t7)) [0000fff8]

// due to load delay t8 should be 1 during this instruction
80010010 : ADDIU 007ffff8 (v0), 00000000 (t8), 0000 (0),

80010014 : JR 800118c4 (ra),
80010018 : NOP

This would make the translation detect the emulator and show a message (like d4s's Breath of Fire 2 translation does).


We didn't handle this (I can't remember if it was ever fixed), so it'll likely be in your R3000 core too.
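Here is a minimal C sketch of a one-instruction load delay that replays the Agemo sequence: LW writes into a pending slot that is committed only after the next instruction has read its operands, so the ADDIU in the delay slot still sees t8 = 1. The Cpu type and exec_* helpers are hypothetical, only two instructions are modeled, and what happens when the delay-slot instruction writes the same register as the load is an assumption flagged in the code.

```c
#include <stdint.h>

#define MEM_WORDS 0x4000u   /* toy 64 KB memory, enough for address 0xfff8 */

typedef struct {
    uint32_t reg[32];
    int      pending_reg;   /* register an in-flight load will write (0 = none) */
    uint32_t pending_val;
} Cpu;

/* Commit the load issued by the previous instruction, if any. */
static void commit_pending(Cpu *c)
{
    if (c->pending_reg != 0)
        c->reg[c->pending_reg] = c->pending_val;
    c->pending_reg = 0;
}

/* ADDIU rt, rs, imm: the operand is read BEFORE the pending load lands. */
void exec_addiu(Cpu *c, int rt, int rs, int16_t imm)
{
    uint32_t result = c->reg[rs] + (uint32_t)(int32_t)imm;
    commit_pending(c);
    /* Assumption: if rt equals the pending load's register, this write wins. */
    if (rt != 0)
        c->reg[rt] = result;
}

/* LW rt, off(base): the loaded value only becomes visible one instruction later. */
void exec_lw(Cpu *c, const uint32_t *mem, int rt, int base, int16_t off)
{
    uint32_t addr  = c->reg[base] + (uint32_t)(int32_t)off;
    uint32_t value = mem[(addr >> 2) % MEM_WORDS];
    commit_pending(c);      /* a load from the previous slot lands now */
    c->pending_reg = rt;
    c->pending_val = value;
}
```

Replaying 80010008-80010010 with this model leaves v0 = 1 and t8 = 0, which is the result the translation's emulator check expects.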

==========================================
==========================================

Quote:We've had more games break because of SPU emulation quirks than because of EE or IOP ones

Don't get me started on that one again! :lol:

It wasn't that bad on the PS1, though: the CD-ROM core accounted for >60% of PS1 game breakage.



While I'm on the topic, can I ask?
- Do any of the PS2 Final Fantasy games use the noise channel? Both SPU(2)s have the same number of noise frequency bits.

Final Fantasy 7, 8, 9 and Tactics, plus Vagrant Story, abuse the noise very well, even using noise as an fmod source.


There's more I'd like to ask, but I have to review the PCSX2 archive again before I create problems and make a fool of myself.
jr-jr is so evil that I'd be more inclined to patch the game than emulate it. As for the R3000A in general, it's just used to run services (sound, DVD, pads, memory cards, etc.) that have to be accessed from the IOP on the PS2, so there isn't really much in the way of MIPS abuse (in fact, except for poorly coded services, it idles almost all the time in most games).
The jr-jr behavior described in the first link posted is how I implemented it for the VUs, except I only handled 'case 1' and 'case 2'. I don't think it behaves correctly for 'case 3' and 'case 4' (when you have JR-JR jumping to other JRs...).
There actually are 1-2 games that seem to do this (not with JRs, but with a mixture of branches and conditional branches, which is even harder), but they appear to run fine with my implementation, which ignores the 3rd branch iirc...

If I were to design a new dynarec, I'd try to code it in a way that makes handling all those nasty cases easier, since it's a lot harder/messier to do it after the dynarec has been written.

I haven't looked into detail on the difficulties of the MIPs architecture for emulation; from what you're writing about, it looks like the load-delays require you to wait 1-cycle before using a register on loads from memory. Is this correct?
Also if implementing this load-delay behavior before a branch broke a game, could it be that conditional branches get the current value?
Or it is possible it has some quirky behavior like the VU's, where the branch will get the value the reg had 4-cycles before the branch instruction.
http://forums.pcsx2.net/Thread-blog-PS2-...#pid103928

Aside from that problem, is there anything else that would be considered a non-interlocked hazard? (i.e. any other cases that would read old values instead of stalling)
The non-interlocked stuff is what makes recompiler design harder and more complex. The more stuff that's interlocked, the easier it is to do things since you don't have to worry as much about the pipeline state and order of operations.
Thanks for the information.

Something I regret not doing when working on pcsx1 + spu1 is loading up the interpreter with tons of MessageBox calls, to trap and report all the stupid things one wouldn't expect to happen.

But that's all in my past now.


Quote:it idles almost all the time in most games

Wow. I imagined game programmers finding ways to offload non-critical slave chores to eke out some more cycles.

====================================
====================================

A few more queries for you guys, if you don't mind sharing some more of your time.

1) Mdec.cpp

....? Is this for PS1 backwards compatibility? I was kind of surprised to see it in the pcsx2 core.



2) IopDma.cpp

This one I'm more interested in. We've had a game or two start a DMA transfer and then, mid-way through, turn off the CHCR DMA bit before it could finish.

Example: Heart of Darkness (stages 1 and 6).

It would start a DMA0, then a DMA1 to get the MDEC-out data. Mid-way through, the game just does HW_DMA1_CHCR &= ~0x01000000 on us.


We originally had just this when the DMA timing finished:
Code:
HW_DMA1_CHCR &= ~0x01000000;
psxDmaInterrupt(1);

which caused a deadly hang. It was fixed when I changed it to this:
Code:
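if( HW_DMA1_CHCR & 0x01000000 ) {  // only if still busy (the game didn't abort it)
  HW_DMA1_CHCR &= ~0x01000000;     // clear the start/busy bit
  psxDmaInterrupt(1);              // raise the DMA1 completion interrupt
}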

It's not in pcsxr (the devs consider it an unverified fix), so I just included it in my ePSXe_shark fixes.

I think this applies to all DMA interrupts, though.
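Generalizing that guard to every channel could look like the C sketch below: the completion handler only clears the busy bit and raises the IRQ if the transfer is still marked busy, so a game that cancelled the transfer mid-way gets no spurious completion. The DmaState struct and IRQ counter are hypothetical stand-ins for pcsx's HW_DMAn_CHCR registers and psxDmaInterrupt(n), and applying it to all channels is the untested generalization suggested above, not a verified hardware rule.

```c
#include <stdint.h>

#define DMA_CHANNELS 7
#define CHCR_BUSY    0x01000000u   /* bit 24: start/busy, as in HW_DMA1_CHCR */

/* Hypothetical stand-in for the HW_DMAn_CHCR registers and psxDmaInterrupt(n). */
typedef struct {
    uint32_t chcr[DMA_CHANNELS];
    int      irqs_raised[DMA_CHANNELS];
} DmaState;

/* Called when the scheduled transfer time for channel `ch` has elapsed. */
void dma_complete(DmaState *s, int ch)
{
    /* If the game already cleared the busy bit (aborted the transfer),
       leave CHCR alone and do not raise a completion interrupt. */
    if (s->chcr[ch] & CHCR_BUSY) {
        s->chcr[ch] &= ~CHCR_BUSY;
        s->irqs_raised[ch]++;      /* stands in for psxDmaInterrupt(ch) */
    }
}
```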



2a) psxDma3/6

We had a few games that rely on non-zero DMA timing. Policenauts and Thousand Arms require some SPU DMA4 time to get things done (we use the number of word blocks as our rough timing).

I just happened to notice this with a few of the IOP DMAs and wanted to bring it up.
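A rough version of that "count the word blocks" timing, as a C sketch: in block mode the channel's BCR register holds the block size in words (low 16 bits) and the block count (high 16 bits), and the completion IRQ is scheduled that many words later instead of immediately. CYCLES_PER_WORD is a hypothetical tuning constant, and the zero-length encodings are not handled.

```c
#include <stdint.h>

/* Hypothetical tuning constant: cycles charged per 32-bit word moved. */
#define CYCLES_PER_WORD 1u

/* Block-mode BCR: bits 0-15 = words per block, bits 16-31 = number of blocks. */
uint32_t dma_block_words(uint32_t bcr)
{
    uint32_t block_size  = bcr & 0xFFFFu;
    uint32_t block_count = bcr >> 16;
    return block_size * block_count;
}

/* Cycles to wait before raising the completion IRQ (instead of zero). */
uint32_t dma_delay_cycles(uint32_t bcr)
{
    return dma_block_words(bcr) * CYCLES_PER_WORD;
}
```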



3) CDVD/Cdrom.cpp

.....Is this used for non-DVD (CD) reading? There's a good deal we've learned that could be ported over here in small, bite-sized chunks.

I imagine there aren't many normal CD PS2 games though. But I remember a lot of the scrooge'y details if you need the important ones patched up.



4) DMA content changing

This is another bad boy from the PS1. Vampire Hunter D would upload a large, fully blank list to the GPU via DMA2. About 60-70% of the way through the transfer, it would modify the last 10% of the list with menu graphics data.

We just send the whole thing at once, so we never see the change. There were about 3-4 solutions proposed, but no one could agree, so it's left broken.
(I still wonder what the ePSXe devs did with this though)


I remember cottonvibes mentioning lots of weird VU problems so I bring this up.



4a) Infinite transfer loops?

I don't think this applies here, but it's a curiosity item. We have DMA linked lists where the GPU chip follows a chain of pointers around.

Tekken 3 sets up an infinite chain during battle replays. If the GPU plugin can't detect this condition, the emu locks up (the whole list is processed in one shot at DMA start).
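One way a plugin could spot such a chain before trying to process it in one shot is Floyd's tortoise-and-hare cycle detection over the node addresses, sketched in C below. It assumes the usual PS1 list format (each node's header word holds the next node's address in its low 24 bits, with 0xFFFFFF as the end marker) and a 2 MB RAM mirror mask. What to do once a loop is found (stop, or process the list a slice at a time) is left open.

```c
#include <stdint.h>
#include <stdbool.h>

#define RAM_MASK 0x1FFFFCu   /* 2 MB main RAM, word aligned */
#define LIST_END 0xFFFFFFu   /* assumed end-of-list marker */

/* Next-node address from a node's header word (low 24 bits; top 8 = word count). */
static uint32_t next_node(const uint32_t *ram, uint32_t addr)
{
    return ram[(addr & RAM_MASK) >> 2] & 0xFFFFFFu;
}

/* True if following the chain from `start` never reaches LIST_END. */
bool gpu_list_has_cycle(const uint32_t *ram, uint32_t start)
{
    uint32_t slow = start & 0xFFFFFFu;
    uint32_t fast = slow;

    for (;;) {
        /* The hare moves two nodes per step; it hits the end first if there is one. */
        if (fast == LIST_END) return false;
        fast = next_node(ram, fast);
        if (fast == LIST_END) return false;
        fast = next_node(ram, fast);

        slow = next_node(ram, slow);   /* the tortoise moves one node */
        if (slow == fast) return true; /* they met inside a loop */
    }
}
```

Because the walk only stores two addresses, it costs nothing extra per node, unlike marking visited nodes in a 2 MB bitmap.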



5) GPU VRAM transfer timing?

I was looking around all the PS2 GPU units for something about this but couldn't find it. It's part of our PS1 voodoo timing hack notes.

Final Fantasy 4 / Rebel Assault 2 / Einhander would use 'memcpy transfer to/from VRAM' commands as part of the PS1 DMA2 transfer, so large chunks of data get copied over the bus.

Because we can't ask the GPU plugin how much data was transferred per memcpy command, our DMA times end up too short for the above games. This results in black screens and missing or partial backdrops.
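A first step toward charging DMA2 time for these commands would be to decode the transfer size from the GP0 packet itself instead of asking the plugin. The C sketch below does that for the three VRAM transfer commands, assuming the packet layouts and size wrapping documented for the PS1 GPU (width wraps at 1024, height at 512, with 0 meaning full size). Turning the pixel count into cycles would need a hypothetical cost constant that is not shown.

```c
#include <stdint.h>

/* Transfer size as the GPU decodes it: wraps at 1024 x 512, 0 means full size. */
static uint32_t xfer_w(uint32_t wh) { return (((wh & 0xFFFFu) - 1u) & 0x3FFu) + 1u; }
static uint32_t xfer_h(uint32_t wh) { return ((((wh >> 16) & 0xFFFFu) - 1u) & 0x1FFu) + 1u; }

/*
 * Pixels moved by a GP0 VRAM transfer packet starting at cmd[0], or 0 if the
 * packet is not one of these. Assumed layouts:
 *   0x80 VRAM->VRAM: cmd, src xy, dst xy, wh
 *   0xA0 CPU->VRAM : cmd, dst xy, wh, then (w*h+1)/2 data words
 *   0xC0 VRAM->CPU : cmd, src xy, wh
 */
uint32_t gp0_transfer_pixels(const uint32_t *cmd)
{
    switch (cmd[0] >> 24) {
    case 0x80: return xfer_w(cmd[3]) * xfer_h(cmd[3]);
    case 0xA0:
    case 0xC0: return xfer_w(cmd[2]) * xfer_h(cmd[2]);
    default:   return 0;
    }
}
```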

======================================
======================================

Quote:I haven't looked into detail on the difficulties of the MIPs architecture for emulation; from what you're writing about, it looks like the load-delays require you to wait 1-cycle before using a register on loads from memory. Is this correct?

Normally yes. Load delays = lb/lh/lw/lbu/lhu + mfc2 (mfc0?).
(edit: oops. These are our branch delay slots - beq/bne/blez/bgtz/bltz/bgez/(..) - should separate these)

That Xenogears battle problem with lbu - beq is the only exception I'm aware of (found by accident). And I think anytime lwl/lwr is used the cpu just waits.

Can't believe someone managed to use case 3-4 branches though. Geez.



Quote:Also if implementing this load-delay behavior before a branch broke a game, could it be that conditional branches get the current value?
Or it is possible it has some quirky behavior like the VU's, where the branch will get the value the reg had 4-cycles before the branch instruction.
http://forums.pcsx2.net/Thread-blog-PS2-...#pid103928

I think we have a max latency of 1 cycle, so nothing like the VU 4-cycle behavior. I've never seen anything where the value shows up 2 cycles later.

I've wondered about the 'conditional branches get the current value' thing, but didn't have anyone to stress test it.

lbu v0 / beq v0 (v0 = current value at beq): that's a must for Xenogears, the way Square did it.


Quote:Aside from that problem, is there anything else that would be considered a non-interlocked hazard? (i.e. any other cases that would read old values instead of stalling)

The non-interlocked stuff is what makes recompiler design harder and more complex. The more stuff that's interlocked, the easier it is to do things since you don't have to worry as much about the pipeline state and order of operations.

There's a famous example: Tekken 2. It does beq - mfc2 v0 - mtc2 v0.

The mtc2 must write the old v0 value (from before the mfc2) to the GTE. Otherwise the geometry gets blown apart (triangles extending everywhere).



There's the Skullmonkeys (ePSXe) hack.
Code:
800131f0 : 3C02800B  LUI     00000001 (v0), 800b (32779),
800131f4 : 2442E448  ADDIU   800b0000 (v0), 800b0000 (v0), e448 (58440),
800131f8 : 03E00008  JR      80013280 (ra),

; HACK = THIS DELAY SLOT IS NOT RUN!!
; - checks opcode at 80013280
; - reg_src = reg_dst --> r/w delay
;   DO NOT RUN THIS BRANCH DELAY SLOT

800131fc : 93820006  LBU     800131fc (v0), 0006 (800a59b0 (gp)) [800a59b6]
80013200 : 27BDFFE8  ADDIU   8009f3e4 (sp), 8009f3e4 (sp), ffe8 (65512),


(ret slot)
80013280 : 2442000F  ADDIU   800ae448 (v0), 800ae448 (v0), 000f (15),
80013284 : 2403FFF0  ADDIU   0000001b (v1), 00000000 (r0), fff0 (65520),
80013288 : 00431024  AND     800ae457 (v0), 800ae457 (v0), fffffff0 (v1),
8001328c : 3C010001  LUI     800a0000 (at), 0001 (1),

At 80013280 the ADDIU must use the old v0 value, not the lbu result. Getting this wrong crashes the game.



Another favorite of mine is Lode Runner.
Code:
8007d758 : 3C028012  LUI     000ded00 (v0), 8012 (32786),
8007d75c : 24429300  ADDIU   80120000 (v0), 80120000 (v0), 9300 (37632),

; (*******************) === PROBLEM OPCODE
8007d760 : 03E00008  JR      800134bc (ra),
; (JAL label calls here)
8007d764 : 1080001D  BEQ     801f8008 (a0), 00000000 (r0), 8007d7dc,


(ret slot)
800134bc : AFC20018  SW      80119300 (v0), 0018 (801fffd8 (fp)) [801ffff0]
800134c0 : 3C048001  LUI     801f8008 (a0), 8001 (32769),

We have no idea what the correct behavior is if that beq is taken. :o

It's known to crash the r3000 dynarec though.



I think those are all the famous ones.

There are also the pcsx1 dynarec bugs:
- RAM changes didn't clear the dynarec area (ex. recSB, recSH, ..)
- Didn't clear enough dynarec areas (something with a Pitfall 3D crash)
- DMA changes didn't clear the dynarec cache (ex. dma6)
- On 16-bit writes like $8002, we didn't clear the correct dynarec area ($8002-8005 was cleared instead of the word at $8000-8003 = crash)
- Others not listed

So that's why my confidence in the pcsx1 dynarec isn't too high. Not to mention it's inefficient and slow when clearing (it overdoes it).
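The $8002 bug comes down to rounding: a store can only affect the instruction word that contains it, so the range to invalidate starts at the address rounded down to 4 bytes. A C sketch with a hypothetical per-word "compiled" flag table (invalidating the whole recompiled block that contains the word, which a real dynarec also needs, is not shown):

```c
#include <stdint.h>

#define RAM_SIZE 0x200000u                 /* 2 MB main RAM */

/* Hypothetical: nonzero if recompiled code exists for that instruction word. */
static uint8_t block_valid[RAM_SIZE / 4];

/* First and last instruction-word addresses touched by a write of `size` bytes. */
void words_touched(uint32_t addr, uint32_t size, uint32_t *first, uint32_t *last)
{
    *first = addr & ~3u;                   /* word containing the first byte */
    *last  = (addr + size - 1) & ~3u;      /* word containing the last byte */
}

/* Invalidate recompiled code for an SB/SH/SW of `size` bytes at `addr`. */
void rec_invalidate_write(uint32_t addr, uint32_t size)
{
    uint32_t first, last;
    words_touched(addr & (RAM_SIZE - 1), size, &first, &last);
    for (uint32_t a = first; a <= last; a += 4)
        block_valid[(a & (RAM_SIZE - 1)) >> 2] = 0;
}
```

For an SH to $8002 this clears only the word at $8000 and leaves $8004 alone.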

I'm a near-total dummy when it comes to dynarecs, though. The micro-management is way too much for me to handle.



You could talk to notaz + Ari64 also (http://www.gp32x.com/board/index.php?/to...sx-rearmed) - they're well versed in MIPS.


Wrote a lot more than I expected. That's the large bulk of it though. :o

But I feel bad for the PCSX2 team: you have to deal with so many weird and HARD things. Fixing PS1 bugs drove me up the wall (plus fixing ePSXe 1.7.0), but those pale in comparison.
Sneak in a few more questions. :heh:

1) IPUDma.cpp

For the PS1 MDEC, we had DMA0 = MDEC-in and DMA1 = MDEC-out, which looks awfully similar to IPU0 + IPU1 DMA.

Normally games do:
- DMA0
- DMA1
- DMA1
(..)
- DMA1 (DMA0 source still not fully drained)


(Fear Effect 2)
The art gallery would keep issuing DMA1 until DMA0 was fully drained of input data. The resulting DMA1 stall told the game that the 24-bit bitmap was done decoding.

I couldn't tell whether the IPU logic handles this (it's more complex than ours).


(Area 51 / Maximum Force)
These would DMA1 first, then send the DMA0 later. The emu didn't detect this before and started trashing memory until DMA0 was sent. We now stall DMA1 until DMA0 comes in.


(Novastorm / Eggs of Steel)
These would DMA1, but at the end of each vertical strip, DMA1 would leave the FIFO half-full. The next DMA1 would transfer this leftover data first, then fill up the FIFO with new data.

I've seen a picture of the (PCSX2 0.9.8) Guncom 2 FMV and it looked like a jigsaw puzzle which reminded me partly of these 2 PS1 games.
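All three cases roughly fit a model where DMA1 can only drain output that decoding has actually produced: if there isn't enough, the request stalls (the channel stays busy and is retried later) instead of copying garbage, and leftover output carries over into the next DMA1. A hypothetical C sketch of just that bookkeeping, with the decoder reduced to a word counter:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical MDEC output FIFO, tracked only as a count of ready words. */
typedef struct {
    uint32_t out_words_ready;
} Mdec;

/* Called as DMA0 input gets decoded into `words` words of output. */
void mdec_produce(Mdec *m, uint32_t words)
{
    m->out_words_ready += words;
}

/*
 * Try to complete a DMA1 (MDEC-out) request of `words` words.
 * false = stall: leave the channel busy and retry after more input arrives.
 * Output that isn't consumed stays queued for the next request.
 */
bool mdec_dma1_try(Mdec *m, uint32_t words)
{
    if (m->out_words_ready < words)
        return false;
    m->out_words_ready -= words;
    return true;
}
```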


2) R3000 strangeness

Here are two twilight-zone problems whose behavior we don't understand. Do they affect your cores too?


Gameshark Sampler
Code:
801239a4 : 94620000  LHU     00000000 (v0), 0000 (1f801070 (v1)) [1f801070]
801239a8 : 00000000  NOP    
801239ac : 30420001  ANDI    00000000 (v0), 00000000 (v0), 0001 (1),
801239b0 : 1444FFFC  BNE     00000000 (v0), 00000001 (a0), 801239a4,
801239b4 : 00000000  NOP

This checks the VBlank pin. What makes it unusual is that VBlank masking is off (VBlank IRQs are allowed).

So the emu does VBlank interrupt handling and never lets the program see the VBlank pin: infinite loop.

(I think we stall VBlank interrupt handling for a few cycles, which probably isn't a good idea, but it works.)



Crash Bandicoot 2 / Hokuto no Ken / BIOS / (few others)
Code:
; Check GTE function on exception return
00000ccc : 8C620000  LW      00000000 (v0), 0000 (800426a0 (v1)) [800426a0]
00000cd0 : 00000000  NOP    
00000cd4 : 00021602  SRL     4a280030 (v0), 4a280030 (v0), 18 (24),
00000cd8 : 304200FE  ANDI    0000004a (v0), 0000004a (v0), 00fe (254),
00000cdc : 2401004A  ADDIU   ff400d42 (at), 00000000 (r0), 004a (74),
00000ce0 : 14410002  BNE     0000004a (v0), 0000004a (at), 00000cec,
00000ce4 : 00000000  NOP    

; creates Crash 2 bug (flicker triangles) - skips GTE opcode
00000ce8 : 20630004  ADDI    800426a0 (v1), 800426a0 (v1), 0004 (4),

When the CPU takes an interrupt on a GTE (COP2) instruction, most (all?) games seem to skip the instruction on rfe. This creates grey flickering triangles or missing textures.

Worse, a (MAME?) dev reported that doing this in the BIOS region causes a crash.


I think we block the interrupt until the CPU is past the GTE instruction, then throw it.
(That may not be a wise idea either, but it works.)
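The BIOS check above decodes to a simple predicate: COP2 (primary opcode 0x12) with the CO bit (bit 25) set means a top byte of 0x4A or 0x4B. The C sketch below reproduces it and expresses the "hold the interrupt until the GTE instruction has run" workaround as a hypothetical scheduling check; it is not a claim about what the real pipeline does.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * True if `insn` is a GTE command: primary opcode COP2 (0x12) with the CO
 * bit (bit 25) set, i.e. a top byte of 0x4A or 0x4B. This is the same test
 * the BIOS code performs: ((insn >> 24) & 0xFE) == 0x4A.
 */
bool is_gte_command(uint32_t insn)
{
    return ((insn >> 24) & 0xFEu) == 0x4Au;
}

/*
 * Hypothetical scheduling policy: hold a pending interrupt while the next
 * instruction is a GTE command, so the GTE op executes first and the
 * handler's "skip one instruction" path cannot swallow it.
 */
bool can_take_interrupt(uint32_t next_insn)
{
    return !is_gte_command(next_insn);
}
```

Note that MFC2/MTC2 (top byte 0x48) do not match, so only real GTE commands delay the interrupt.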



3) Counters.cpp

I saw one of Jake Stine's notes about vSyncInfoCalc (MGS3 vsync /4 vs /4).

We had a similar problem with InuYasha (used PS1 VSync / 2 hack before)
- InuYasha = needed longer CdlPause time + Interlace bit emulation (game would burn a frame while interlace was on)


(_cpuTestTarget)
Blade_Arma discovered that when the target is reached, it modifies the root counter flags (ps2 modeflag): RcUnknown10 ($0400) + RcCountEqTarget ($0800).

I think this fixed Lifeforce and a few other games like Final Fantasy also.


(_cpuTestOverflow)
Same for this one: RcUnknown10 ($0400) + RcOverflow ($1000)


(HSync counts per VBlank)
We originally had 262 HSyncs per VSync (NTSC). This broke Wipeout XL (menu model rotation).

Setting it to 263 fixed the models. I don't know how many PCSX2 counts, but you might want to be aware of this too.


(IRQ regeneration)
I was looking for this: RcIrqRegenerate = 0x0040. When this mode flag is set, the counter keeps firing IRQs; when it's off, it fires once.

Can't tell if PCSX2 does this or not.
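Put together, the counter flag behavior described above could look like this C sketch: target and overflow events both set $0400 plus their own flag, and with RcIrqRegenerate clear only the first event raises an IRQ. The flag names follow the post; the RootCounter struct and the irq_fired latch are hypothetical, re-arming the one-shot on a mode write is not shown, and the IRQ-enable bits are ignored.

```c
#include <stdint.h>
#include <stdbool.h>

/* Mode flag names as used in the post above. */
#define RC_IRQ_REGENERATE  0x0040u
#define RC_UNKNOWN10       0x0400u
#define RC_COUNT_EQ_TARGET 0x0800u
#define RC_OVERFLOW        0x1000u

typedef struct {
    uint32_t mode;
    bool     irq_fired;   /* hypothetical one-shot latch (re-arm on mode write not shown) */
} RootCounter;

/* Sets the status flags for an event; returns true if an IRQ should be raised. */
static bool rc_event(RootCounter *rc, uint32_t flag)
{
    rc->mode |= RC_UNKNOWN10 | flag;
    if (!(rc->mode & RC_IRQ_REGENERATE) && rc->irq_fired)
        return false;               /* one-shot mode: only the first event fires */
    rc->irq_fired = true;
    return true;
}

bool rc_target_reached(RootCounter *rc) { return rc_event(rc, RC_COUNT_EQ_TARGET); }
bool rc_overflowed(RootCounter *rc)     { return rc_event(rc, RC_OVERFLOW); }
```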



4) SIO.cpp

(Memcard detection)
This one is interesting to me. It seems different from the PS1 (ex. Dragon Warrior 7 / Star Ocean 2 / Valkyrie Profile / Metal Gear Solid / Lifeforce Tenka).

When the memcard is missing or stopped, $81 --> $ff (not there = Lifeforce Tenka).

When the memcard is inserted:
81 ---> 00 / 04 / 5a / 5d
81 ---> 00 / 08 / 5a / 5d
81 ---> 00 / 00 / 5a / 5d

$04 -> $08 -> $00 tells the above games that the card is starting -> ready. This lets us switch memcards correctly.

Otherwise we save or load the wrong data.
(not in pcsxr, just ePSXe_shark builds)
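The detection sequence above as a hypothetical C sketch of the byte an emulated card could return in the status position of the $81 reply (the one that reads 04 / 08 / 00 in the sequences listed): $ff while no card is present, standing in for the "$81 --> $ff" case, then $04, $08 and finally $00 on successive polls after insertion. What counts as one "poll" (per command or per frame) is an assumption left to the caller.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical memory card slot state. */
typedef struct {
    bool present;
    int  polls;          /* status polls since the card was inserted */
} Memcard;

void memcard_insert(Memcard *mc) { mc->present = true; mc->polls = 0; }
void memcard_remove(Memcard *mc) { mc->present = false; }

/*
 * Status byte for the next poll: 0xFF with no card, otherwise 0x04, then
 * 0x08, then 0x00 (ready) from then on.
 */
uint8_t memcard_status(Memcard *mc)
{
    static const uint8_t startup[] = { 0x04, 0x08, 0x00 };
    if (!mc->present)
        return 0xFF;
    uint8_t v = startup[mc->polls < 2 ? mc->polls : 2];
    if (mc->polls < 2)
        mc->polls++;     /* stop counting once the card reports ready */
    return v;
}
```

Re-running the sequence on every insert is what lets a game notice that the card was swapped.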



That's mostly it for my questions, I think.