Game rebuilding (Or: High-level emulation)
#11
Quote:Yes, data collected from experiments says a lot more than speculation. However, to be very honest, I won't even be able to compare it. About a year ago (or more than that), I attempted to compile PCSX2 and installed a natively 32-bit distro just to get around the issues of cross-compilation, but got stuck on an annoying dependency ("nvidia-cg-toolkit"; I didn't even have an NVidia video card). Someone else managed to compile it and made an installable RPM, so I just installed the readily available binary. And it was awful. The input lag was very noticeable (a button is pressed and two seconds later the invoked menu appears). I tried recently to compile it again, but cmake's output saying "this or that is missing" brings back all the bad memories from those days and defeats my willingness to keep going.
The Nvidia Cg toolkit is a shading language and compiler, like HLSL or GLSL. The target can be any brand (Intel/AMD/PS3?/others). It is the same for the Intel compiler, whose output can run on AMD hardware. The Mac compiler can also run on Linux. Etc...
The dependency has since been removed, and PCSX2 should normally be easier to compile now. Anyway, I think you can get a binary from Travis, a PPA, or Debian.

You don't need to compare the result with PCSX2 directly. You can simulate both propagation methods on your IR. One will stop at branches; the other will be complete.

Quote:Some entries on the developer's blog ("C++ exceptions can be an optimization"), game-specific hacks in the source code files, and the Windows-centric and Nvidia-centric development style made me deeply distrust the quality of PCSX2's source code.
There is no Windows/Intel/Nvidia-centric development style. It isn't our fault if AMD drivers are slower/buggier. Currently there is dedicated code for AMD's driver in GSdx. One issue was reported 3 years ago (with a test case), and we will get a fix in a future driver (and I hope for old cards too).

There are a couple of hacks/game fixes in the emulator and in GSdx (the latter just drop unsupported formats), but we don't have tons of hacks. Sometimes they are bad, sometimes they are the only good solution (full TLB support without a speed penalty).

Quote:So if the emulator isn't working as well as I expect it to work, I'm biased towards blaming the software rather than blaming something else like the hardware. And, even if I blame the software, the question remains: "Where in the software?" "Is the EE recompilation just fine, but the VU recompilation is the true problem?" "Or is it because the JIT compiler has to be called again and again because the instruction memory is constantly changing?" (Thus dynarec might make more sense than anything that needs to read the bytecode thoroughly before actually translating anything)

(And, please forgive me for the oranges-and-apples comparison here, but a very affordable motherboard with integrated Intel Celeron 1037u can run games on Dolphin with lagless video playback, on Linux. The recommended requirements of PCSX2 would cost more than tenfold that).
The recompiler misses various optimizations, at least on the EE side. It can be improved, and it will be improved if we switch to 64 bits. But the real and biggest issue is the emulation of floats. PS2 floats aren't the same as x86 floats: the number encoding is different, and the rounding is different. The accurate solution would be software FPU emulation, but it is very slow. And VU1 is 4-way SIMD (i.e. 128 bits) at 150MHz. The issue is that you have to emulate the EE @300MHz (MIPS3 int/FPU), 2 VUs @150MHz (purely float, I think) + 1 IOP (MIPS1 @33MHz). On top of that, the PS2 is sensitive to delays, and some games are likely not thread safe... Sound emulation needs to be really accurate delay-wise. I don't know the core well, but it is a multi-dimensional problem.
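
To make the encoding difference concrete, here is a minimal sketch in C of the cheap alternative to full software FPU emulation: compute with x86 floats, then clamp the result. It assumes (as far as I know) that the PS2 has no Inf/NaN encodings and treats denormals as zero. The names float_bits, bits_float and ps2_clamp are made up for this example, and this is not taken from the PCSX2 source. It is also only an approximation, because it does not address the rounding difference.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: view the raw bits of a float (no aliasing tricks). */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: build a float from raw bits. */
static float bits_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical clamp: force an IEEE 754 value into a PS2-like range.
 * Assumption: exponent 0xFF (Inf/NaN on x86) becomes the largest finite
 * magnitude, and exponent 0 (zero/denormal) becomes a signed zero. */
static float ps2_clamp(float x) {
    uint32_t u = float_bits(x);
    uint32_t sign = u & 0x80000000u;
    uint32_t exponent = (u >> 23) & 0xFFu;

    if (exponent == 0xFFu)
        return bits_float(sign | 0x7F7FFFFFu);
    if (exponent == 0u)
        return bits_float(sign);
    return x;
}
```

Calling ps2_clamp after every float operation is much cheaper than a software FPU, but each call still costs a few integer instructions, which adds up quickly on 4-way SIMD code.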

I could also talk about the GS complexity, but that is another topic ;)

Quote:Yes, I've been reading your commit comments in order to grasp what is going on, but I'm keeping distance from PCSX2's source code. When I was looking through the source code folders in order to find the recompiler in order to understand how much work it would require to move PCSX2 from x86 to x64 (or "noarch"), I found the "pcsx2/x86" folder. And the transformation of "EE bytecode" into "x86 bytecode" is so tightly coupled that uncoupling these two would be doing something from scratch.
On the 64-bit topic, the quantity of code is big but doable (at least EE/IOP). The trick is to not rush it (and to have free time ;) )

Quote:And, related to the loop problem you have presented:
.....
Notice how every variable on the source code above is assigned only once. There is no "core of the loop" or "before loop" on single static assignment. There are assignments and functions calling other functions. Haskell, by the way, is a single static assignment language (there is no mutability, thus a variable can be assigned only once). Yet it's possible to develop programs with it.

Have you developed something with a functional programming language, Gregory? Haskell has no do-while or for-loop. It looks crazy for someone used to imperative languages, where loops are used everywhere. And, all of a sudden, the words you would need to express your thoughts are gone. If you have only used imperative languages, functional languages will make your world go upside-down.
If you look at the EE ASM generated by GCC, you will find the same pattern. Actually, I used to code some Caml at uni, so I know a bit about functional programming languages. But when you look at the code executed by the CPU, you will find the loop again ;)

#12
Weekly commit. The first Haskell program (the one that completely disassembles the ELF file) completed its execution in a fair amount of time (15 seconds), but the newer code (the one that disassembles base procedures) is taking too long (minutes). So I've started rewriting it in C.

Quote:The recompiler misses various optimizations, at least on the EE side. It can be improved, and it will be improved if we switch to 64 bits. But the real and biggest issue is the emulation of floats. PS2 floats aren't the same as x86 floats: the number encoding is different, and the rounding is different. The accurate solution would be software FPU emulation, but it is very slow. And VU1 is 4-way SIMD (i.e. 128 bits) at 150MHz. The issue is that you have to emulate the EE @300MHz (MIPS3 int/FPU), 2 VUs @150MHz (purely float, I think) + 1 IOP (MIPS1 @33MHz). On top of that, the PS2 is sensitive to delays, and some games are likely not thread safe... Sound emulation needs to be really accurate delay-wise. I don't know the core well, but it is a multi-dimensional problem.

About thread-safety: I've read the source code of drivers (and code used in microcontrollers), and sometimes, the following code pattern is used:
Code:
/* Do thing. */
udelay(n); /* Waits for the hardware to do its thing */
/* Do another thing */
When you mention that the PS2 is sensitive to delay, is it because patterns like these are being used for synchronization? When a program waits for a completion signal, it's possible to emulate that completion signal. But this delay-based implicit synchronization would fail under a naive recompilation that ignores how many cycles have elapsed. If delay procedures ("udelay", "mdelay", ...) exist and are used, would it be feasible to pattern-match them and replace them with the correct high-level procedures?
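
To illustrate the concern, here is a minimal sketch in C of why the delay matters to an emulator. Everything in it is hypothetical (the Machine struct, start_transfer, emu_udelay, naive_udelay); it only models one transfer with a fixed latency in cycles and is not how PCSX2 does it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical emulated machine with a single in-flight transfer. */
typedef struct {
    uint64_t now;        /* emulated cycle counter */
    uint64_t done_at;    /* cycle at which the transfer completes */
    bool     pending;    /* a transfer has been started */
    bool     data_ready; /* the transfer result is visible to the guest */
} Machine;

/* "Do thing": start a transfer that takes `latency` cycles. */
static void start_transfer(Machine *m, uint64_t latency) {
    m->pending = true;
    m->data_ready = false;
    m->done_at = m->now + latency;
}

/* Advance emulated time and complete the transfer if its deadline passed. */
static void advance(Machine *m, uint64_t cycles) {
    m->now += cycles;
    if (m->pending && m->now >= m->done_at) {
        m->pending = false;
        m->data_ready = true;
    }
}

/* Cycle-aware replacement: the guest's delay becomes elapsed emulated time. */
static void emu_udelay(Machine *m, uint64_t cycles) {
    advance(m, cycles);
}

/* Naive translation: the delay is dropped, so no emulated time elapses. */
static void naive_udelay(Machine *m, uint64_t cycles) {
    (void)m;
    (void)cycles;
}
```

With emu_udelay the "another thing" step sees data_ready set; with naive_udelay it runs too early. Pattern-matching a delay procedure would amount to swapping the guest's busy loop for something like emu_udelay.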

I see that PCSX2 counts cycles for many things:
Code:
./pcsx2/Sif0.cpp:31: sif0.ee.cycles = 0;
./pcsx2/Sif0.cpp:32: sif0.iop.cycles = 0;
./pcsx2/Sif1.cpp:31: sif1.ee.cycles = 0;
./pcsx2/Sif1.cpp:32: sif1.iop.cycles = 0;
./pcsx2/sif2.cpp:31: sif2.ee.cycles = 0;
./pcsx2/sif2.cpp:32: sif2.iop.cycles = 0;
./pcsx2/Gif_Unit.h:288: gsPack.cycles += 2 + gifTag.cycles; // Tag + Len ee-cycles
./pcsx2/x86/microVU_Execute.inl:129: mVU.cycles = cycles;
./pcsx2/x86/microVU_Flags.inl:180: mFC.cycles = 0;

I have read the developer blog entry on SPU2 emulation.
#13
There are also the CPU cycles. Honestly, I don't know what is going on. It could just be a bug in a separate place.

Anyway, unlike on PC, thread ordering is based on a priority array/list. Once priority 5 is done, you check priority 6. A thread will run until it is blocked (wait/semaphore), finished, or interrupted (by an interrupt). You have 2 thread pools, one for each CPU (EE/IOP).
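
As a rough sketch in C of that selection rule (my own simplification, not the real kernel's data structures): the Thread struct, the State enum and pick_next are hypothetical names, and I assume a lower priority number runs first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thread states for this sketch. */
typedef enum { READY, BLOCKED, FINISHED } State;

/* Hypothetical thread record. Assumption: lower number = runs first. */
typedef struct {
    int   priority;
    State state;
} Thread;

/* Return the index of the ready thread with the lowest priority number,
 * or -1 when no thread is ready to run. */
static int pick_next(const Thread *threads, size_t count) {
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (threads[i].state != READY)
            continue;
        if (best < 0 || threads[i].priority < threads[best].priority)
            best = (int)i;
    }
    return best;
}
```

There is no time slicing here: the chosen thread keeps the CPU until its state changes or an interrupt hands control back to the scheduler, which then calls pick_next again.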

Let's imagine you ask for a transfer in thread 4. You then execute thread 5. You're aware that thread 5 requires the result of thread 4, but you know that the delay guarantees that thread 5 will have its data in time (a kind of implicit sync). If you want to optimize a game, you will want to take this kind of shortcut.

The previous example was on a single CPU, but you also have delay dependencies between the 2 CPUs. When the kernel reads a SIF register, it uses the following pattern. It means that the timing of the nops is important, and the IOP must run in between the EE code. Otherwise, v1 will always be equal to v2, whereas the PS2 doesn't give this guarantee.
Code:
start:
v1 = read(reg)
nop
nop
nop
nop
v2 = read(reg)
if (v2 != v1) goto start
else return v1
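
The same pattern as a small C sketch, with the IOP modelled as a function that may change the register while the EE executes its nops. The Sif struct, run_iop_slice and read_stable are hypothetical names for this illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared register plus a budget of pending IOP writes. */
typedef struct {
    uint32_t reg;          /* value seen by the EE */
    int      writes_left;  /* how many more times the IOP will change it */
} Sif;

/* Hypothetical stand-in for the IOP running while the EE executes nops. */
static void run_iop_slice(Sif *s) {
    if (s->writes_left > 0) {
        s->reg++;
        s->writes_left--;
    }
}

/* Retry until two reads separated by an IOP time slice agree.
 * `attempts` counts how many read pairs were needed. */
static uint32_t read_stable(Sif *s, int *attempts) {
    for (;;) {
        uint32_t v1 = s->reg;
        run_iop_slice(s);   /* the four nops: time passes on the IOP side */
        uint32_t v2 = s->reg;
        (*attempts)++;
        if (v1 == v2)
            return v1;
    }
}
```

If the emulator never runs the IOP between the two reads (that is, if run_iop_slice is skipped), the loop always exits on the first attempt and can return a value that the IOP was about to overwrite.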