Thanks for the information.
Something I regret not doing when working on pcsx1 + spu1 is loading up the interpreter with tons of MessageBox calls, to trap + notify on all the stupid things one wouldn't expect to happen.
But that's all in my past now.
Quote:it idles almost all the time in most games
Wow. I imagined game programmers finding ways to offload non-critical slave chores to eke out some more cycles.
====================================
====================================
A few more queries for you guys if you don't mind sharing some more of your time.
1) Mdec.cpp
....? Is this for PS1-backwards compatibility? Was kinda surprised to see this in the pcsx2 core.
2) IopDma.cpp
This one I'm more interested in. We've had a game or two start a DMA transfer. Mid-way through it would turn off the CHCR DMA bit before it could finish.
ex. Heart of Darkness (stage 1,6)
It would start a DMA0, then a DMA1 to get the MDEC-out data. Mid-way through, the game just does (HW_DMA1_CHCR &= ~0x01000000) on us.
We originally had just this when the DMA timing finished:
Code:
HW_DMA1_CHCR &= ~0x01000000;
psxDmaInterrupt(1);
which would cause a deadly hang. It became fixed when I changed it to this:
Code:
if( HW_DMA1_CHCR & 0x01000000 ) {
    HW_DMA1_CHCR &= ~0x01000000;
    psxDmaInterrupt(1);
}
It's not in pcsxr (devs consider it an unverified fix) so I just included it in my ePSXe_shark fixes.
I think this applies to all dma interrupts though.
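If it does apply to every channel, the check could live in one helper. Rough sketch only: chcr[], irq_count[] and dma_finish() are made-up stand-ins for the HW_DMAn_CHCR macros and psxDmaInterrupt, not names from pcsx or pcsx2.

```c
#include <stdint.h>

#define DMA_CHCR_BUSY 0x01000000u   /* same start/busy mask as above */

/* Hypothetical stand-ins for the per-channel CHCR registers and the
   interrupt call; the real emulators use macros and psxDmaInterrupt(). */
static uint32_t chcr[7];
static int irq_count[7];

static void dma_raise_irq(int ch) { irq_count[ch]++; }

/* Called when the emulated transfer time for channel `ch` runs out.
   Returns 1 if the completion interrupt was raised, 0 if the game had
   already cancelled the transfer by clearing the busy bit itself. */
static int dma_finish(int ch)
{
    if (!(chcr[ch] & DMA_CHCR_BUSY))
        return 0;
    chcr[ch] &= ~DMA_CHCR_BUSY;
    dma_raise_irq(ch);
    return 1;
}
```

Calling dma_finish(1) a second time, or after the game has cleared the bit, does nothing, which is the Heart of Darkness case.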
2a) psxDma3/6
We had a few games which rely on non-zero dma timing. Policenauts + Thousand Arms require some SPU DMA4 time to get things done (we count the # of word blocks as our rough time).
Just happened to notice this for a few IOP DMAs and just wanted to bring this up.
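For reference, the "count the words" timing looks roughly like this on our side. It's a sketch: the BCR split (block size in the low 16 bits, block count in the high 16) is the PS1 block-mode layout, but CYCLES_PER_WORD and the function names are placeholders I made up, not measured values.

```c
#include <stdint.h>

/* Word count for a block-mode transfer: BCR holds the block size (in
   words) in the low 16 bits and the number of blocks in the high 16. */
static uint32_t dma_word_count(uint32_t bcr)
{
    return (bcr >> 16) * (bcr & 0xffffu);
}

/* Made-up tuning constant, not a hardware figure. */
#define CYCLES_PER_WORD 1u

/* Rough completion delay in cycles. Never returns 0, so the interrupt
   cannot fire in the same instant the transfer is started. */
static uint32_t dma_delay_cycles(uint32_t bcr)
{
    uint32_t words = dma_word_count(bcr);
    return words ? words * CYCLES_PER_WORD : 1u;
}
```

The point is only that the delay scales with the transfer size instead of being zero.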
3) CDVD/Cdrom.cpp
.....Is this used for non-DVD reading? There's a good deal we've learned that can be ported over here in small bite-sized chunks.
I imagine there aren't many normal CD PS2 games though. But I remember a lot of the scrooge'y details if you need the important ones patched up.
4) DMA content changing
This is another badboy from PS1. Vampire Hunter D would upload a large fully blank list to the GPU via DMA2. About 60-70% the way through, it would modify the last 10% of the list with menu graphics data.
We just send the whole thing at once and so never detect this. There were about 3-4 solutions proposed but no one could agree, so it's left broken.
(I still wonder what the ePSXe devs did with this though)
I remember cottonvibes mentioning lots of weird VU problems so I bring this up.
4a) Infinite transfer loops?
I don't think this applies here but a curiosity item. We have dma linked lists where the GPU chip follows a chain of pointers around.
Tekken 3 sets up an infinite chain during battle replays. If the GPU plugin can't detect this condition, the emu locks up (1-shot processing at dma start).
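One way to catch the endless chain before walking it is a tortoise-and-hare check. This is a sketch, not what any GPU plugin actually ships: it assumes each node header keeps the next-node address in its low 24 bits and that 0xFFFFFF ends the list, and the function names are mine.

```c
#include <stdint.h>
#include <stddef.h>

#define LIST_END 0x00ffffffu

/* Next-node pointer lives in the low 24 bits of each node's header word
   (the top 8 bits are the node's word count and are ignored here). */
static uint32_t next_node(const uint32_t *ram, size_t ram_words, uint32_t addr)
{
    size_t idx = (addr & 0x00fffffcu) >> 2;
    if (idx >= ram_words)
        return LIST_END;            /* treat out-of-range pointers as the end */
    return ram[idx] & 0x00ffffffu;
}

/* Returns 1 if the chain starting at `start` loops back on itself,
   0 if it reaches the end marker. Floyd's cycle detection: `fast`
   moves two nodes per round, `slow` moves one, no extra memory. */
static int chain_is_endless(const uint32_t *ram, size_t ram_words, uint32_t start)
{
    uint32_t slow = start, fast = start;
    while (fast != LIST_END) {
        fast = next_node(ram, ram_words, fast);
        if (fast == LIST_END)
            break;
        fast = next_node(ram, ram_words, fast);
        slow = next_node(ram, ram_words, slow);
        if (fast == slow)
            return 1;
    }
    return 0;
}
```

If it returns 1, the emu could process one pass of the chain and bail out instead of locking up at dma start.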
5) GPU vram transfer timing?
Was looking around all the PS2 GPU units for something about this but couldn't find anything. It's part of our ps1 voodoo timing hack notes.
Final Fantasy 4 / Rebel Assault 2 / Einhander would use 'memcpy transfer to/from vram' commands as part of the ps1 dma2 transfer. Large chunks of data get copied over the bus.
Because we can't talk directly to the gpu plugin about how much data was transferred per memcpy command, our dma times end up too short for the above games. Results in black screens, missing or partial backdrops.
======================================
======================================
Quote:I haven't looked into detail on the difficulties of the MIPs architecture for emulation; from what you're writing about, it looks like the load-delays require you to wait 1-cycle before using a register on loads from memory. Is this correct?
Normally yes. Load delays = lb/lh/lw/lbu/lhu + mfc2 (mfc0?).
(edit: oops. These are our branch delay slots - beq/bne/blez/bgtz/bltz/bgez/(..) - should separate these)
That Xenogears battle problem with lbu - beq is the only exception I'm aware of (found by accident). And I think anytime lwl/lwr is used the cpu just waits.
Can't believe someone managed to use case 3-4 branches though. Geez.
Quote:Also if implementing this load-delay behavior before a branch broke a game, could it be that conditional branches get the current value?
Or it is possible it has some quirky behavior like the VU's, where the branch will get the value the reg had 4-cycles before the branch instruction.
http://forums.pcsx2.net/Thread-blog-PS2-...#pid103928
I think we have a max 1-latency. So nothing like the VU 4-cycle behavior. Never seen anything where the value shows up 2 cycles later.
I've wondered about the 'conditional branches get the current value' thing. Didn't have anyone to stress test it though.
lbu v0 / beq v0 (v0 = current value at beq) - that's a must for Xenogears the way Square did it.
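To show what I mean by max 1-latency, here's the plain load delay as a toy interpreter step. Sketch only: the Cpu struct, step() and the three toy ops are invented for illustration, it does not model the Xenogears lbu - beq case, and it ignores what happens when the delay-slot opcode writes the same register.

```c
#include <stdint.h>

typedef struct {
    uint32_t gpr[32];
    int      load_reg;   /* 0 = no load in flight (r0 is never a target) */
    uint32_t load_val;
} Cpu;

/* Run one instruction, then retire the load issued by the previous one.
   The instruction right after a load therefore still reads the old value. */
static void step(Cpu *c, void (*exec)(Cpu *))
{
    int      reg = c->load_reg;
    uint32_t val = c->load_val;
    c->load_reg = 0;
    exec(c);                 /* may queue a new load of its own */
    if (reg)
        c->gpr[reg] = val;
}

/* Toy "opcodes", for illustration only. */
static void op_load_v0(Cpu *c)       { c->load_reg = 2; c->load_val = 0x1234; }
static void op_copy_v0_to_v1(Cpu *c) { c->gpr[3] = c->gpr[2]; }
static void op_nop(Cpu *c)           { (void)c; }
```

Running op_load_v0 then op_copy_v0_to_v1 leaves v1 with the old v0; a second copy picks up the loaded value.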
Quote:Aside from that problem, is there anything else that would be considered a non-interlocked hazard? (i.e. any other cases that would read old values instead of stalling)
The non-interlocked stuff is what makes recompiler design harder and more complex. The more stuff that's interlocked, the easier it is to do things since you don't have to worry as much about the pipeline state and order of operations.
There's a famous example: Tekken 2. It does beq - mfc2 v0 - mtc2 v0.
The mtc2 uses the old value (writes to GTE). Otherwise the geometry gets blown apart (triangles extending everywhere).
There's the Skullmonkeys (ePSXe) hack.
Code:
800131f0 : 3C02800B LUI 00000001 (v0), 800b (32779),
800131f4 : 2442E448 ADDIU 800b0000 (v0), 800b0000 (v0), e448 (58440),
800131f8 : 03E00008 JR 80013280 (ra),
; HACK = THIS DELAY SLOT IS NOT RUN!!
; - checks opcode at 80013280
; - reg_src = reg_dst --> r/w delay
; DO NOT RUN THIS BRANCH DELAY SLOT
800131fc : 93820006 LBU 800131fc (v0), 0006 (800a59b0 (gp)) [800a59b6]
80013200 : 27BDFFE8 ADDIU 8009f3e4 (sp), 8009f3e4 (sp), ffe8 (65512),
(ret slot)
80013280 : 2442000F ADDIU 800ae448 (v0), 800ae448 (v0), 000f (15),
80013284 : 2403FFF0 ADDIU 0000001b (v1), 00000000 (r0), fff0 (65520),
80013288 : 00431024 AND 800ae457 (v0), 800ae457 (v0), fffffff0 (v1),
8001328c : 3C010001 LUI 800a0000 (at), 0001 (1),
80013280 = use old v0 value (not lbu). This is a game crasher.
Another favorite of mine is Lode Runner.
Code:
8007d758 : 3C028012 LUI 000ded00 (v0), 8012 (32786),
8007d75c : 24429300 ADDIU 80120000 (v0), 80120000 (v0), 9300 (37632),
; (*******************) === PROBLEM OPCODE
8007d760 : 03E00008 JR 800134bc (ra),
; (JAL label calls here)
8007d764 : 1080001D BEQ 801f8008 (a0), 00000000 (r0), 8007d7dc,
(ret slot)
800134bc : AFC20018 SW 80119300 (v0), 0018 (801fffd8 (fp)) [801ffff0]
800134c0 : 3C048001 LUI 801f8008 (a0), 8001 (32769),
We have no idea what the correct behavior is if that beq is taken. :o
It's known to crash the r3000 dynarec though.
I think those are all the famous ones.
There's the pcsx1 dynarec bugs
- Ram changes didn't clear the dynarec area (ex. recSB, recSH, ..)
- Didn't clear enough dynarec areas (something with Pitfall 3D crash)
- Dma changes didn't clear dynarec cache (ex. dma6)
- On 16-bit writes like $8002, we didn't clear the correct dynarec area ($8002-8005 was cleared = crash)
- Others not listed
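For the 16-bit write case, my reading is that the clear has to cover the aligned opcode word(s) the write touches ($8000-8003 for a write at $8002). A sketch under that assumption; rec_valid[] and rec_clear() are hypothetical names, not the pcsx1 dynarec tables.

```c
#include <stdint.h>

#define RAM_SIZE  0x200000u             /* 2 MB of PS1 main RAM */
#define RAM_WORDS (RAM_SIZE / 4)

/* Hypothetical table: one flag per 32-bit opcode slot, non-zero when
   compiled code exists for that word. */
static uint8_t rec_valid[RAM_WORDS];

/* Invalidate every opcode word overlapped by a write of `size` bytes. */
static void rec_clear(uint32_t addr, uint32_t size)
{
    if (size == 0)
        return;
    uint32_t start = addr & (RAM_SIZE - 1);   /* strip the segment bits */
    uint32_t end   = start + size - 1;        /* last byte written */
    if (end >= RAM_SIZE)
        end = RAM_SIZE - 1;
    for (uint32_t w = start >> 2; w <= (end >> 2); w++)
        rec_valid[w] = 0;
}
```

The same call would cover the store opcodes (size 1, 2 or 4) and a dma write (size = words * 4).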
So that's why my confidence isn't too high with pcsx1 dynarec. Not to mention it's inefficient when clearing and slow about it (overdoes it).
I'm a near-true dummy though when it comes to dynarec. The micro-management is way too much to handle.
You could talk to notaz + Ari64 also (http://www.gp32x.com/board/index.php?/to...sx-rearmed) - they're well versed in MIPS.
Wrote a lot more than I expected. That's the large bulk of it though. :o
But I feel bad for the PCSX2 team - you have to deal with so many weird + HARD things. Fixing PS1 bugs drove me up the wall (+ fixing ePSXe 170) but those pale in comparison.