(01-09-2018, 06:36 PM)Atomic83 Wrote: And if you use the Software render, you can use up to 16 cores if I remember the limit.
Splitting a process into too many threads can actually hurt performance more than using fewer threads in the first place. On a single core, your OS uses a scheduler to swap processes in and out of the execution pipeline. The scheduler itself takes time to do the swapping. If you have too many things going on at once, the scheduler ends up burning a fair amount of CPU time every second just juggling processes instead of actually executing their code. It can even reach the point where the scheduler eats more CPU time than the processes it is scheduling. If you ever used a desktop with an old single-core Pentium, you know what happens when you have enough RAM but too much running. That sucker still gets slooooow.
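To put rough numbers on that, here is a toy model in Python. It is only a sketch: the 10 ms scheduling period, the 5 µs switch cost and the `scheduler_overhead` function are all made-up assumptions for illustration, and no real OS scheduler is this simple.

```python
def scheduler_overhead(runnable, period_us=10_000, switch_cost_us=5):
    """Fraction of each scheduling period spent on context switches.

    Toy model (an assumption, not how any real OS scheduler works):
    every runnable task gets exactly one turn per period, and each
    turn costs one context switch at a fixed price.
    """
    switching = runnable * switch_cost_us
    # Overhead cannot exceed the whole period, so clamp at 100%.
    return min(1.0, switching / period_us)


if __name__ == "__main__":
    for n in (2, 16, 200, 1000, 2000):
        share = scheduler_overhead(n)
        print(f"{n:5d} runnable tasks -> {share:.1%} of CPU time spent switching")
```

With these invented numbers the juggling is negligible for a handful of tasks, and past 1000 runnable tasks it takes more than half of the period, which is the "scheduler exceeds the actual work" situation described above.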
And we can extend this logic to multithreading.
You can, theoretically, run a renderer in 16 threads. If you were rendering a movie, this would make sense. Each thread could be assigned a 30 second chunk of video, grind it out, and write the result to a block of memory, and the main thread picks it up and assembles the output file with it. This works because precise timing isn't necessary. The main thread can read an ID number when it sees a memory block marked as done and slide it into the right slot of the final file. In a game, especially an emulator, timing is your god.
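The movie case can be sketched like this. It is a minimal illustration, not how any real encoder works: `render_chunk` and `render_movie` are hypothetical names, the "rendering" is just building strings, and Python threads are used only to show the structure (hand out chunks, collect them in whatever order they finish, slot each one in by ID).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def render_chunk(chunk_id, frames):
    """Stand-in for real encoding work: 'renders' each frame to a string."""
    return chunk_id, [f"frame{n}" for n in frames]


def render_movie(total_frames, chunk_size, workers=16):
    # Split the movie into independent chunks of frame numbers.
    chunks = [range(start, min(start + chunk_size, total_frames))
              for start in range(0, total_frames, chunk_size)]
    slots = [None] * len(chunks)  # one slot per chunk in the final file

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(render_chunk, i, c) for i, c in enumerate(chunks)]
        # Chunks finish in arbitrary order; the ID says where each one belongs.
        for done in as_completed(futures):
            chunk_id, frames = done.result()
            slots[chunk_id] = frames

    # Flatten the slots in order to get the finished movie.
    return [frame for chunk in slots for frame in chunk]


if __name__ == "__main__":
    movie = render_movie(total_frames=10, chunk_size=3)
    print(movie)
```

Nothing here cares which chunk finishes first, and that is exactly the property a game or emulator doesn't have.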
The PS2 has several major components, but the ones that PCSX2 truly highlights are the EE, IOP, GS and VUs: Emotion Engine, Input Output Processor, Graphics Synthesizer, Vector Units. In a nutshell, the EE is the main processor, the IOP is a mini processor dedicated to memory cards, controllers and anything else that plugs into those ports, the GS is a GPU and the VUs are dedicated vector math processors. You have all these processors, and each one can only do its part once the others have prepared the information it needs to do its job.
An example: GS wants to put out the next video frame for a loading screen with a progress bar. GS has to get vector information from the VUs, which have to get information from the EE, which has to work with the IOP to do the actual load. And that's just for a loading bar, never mind the rest of the frame. While one unit is running and trying to hand its information to the next, that next unit has to either find something else to do, or just wait and waste time. When people talk about Ratchet and Clank having EE/VU starvation problems, this is what it is. One is sitting waiting for data, doing nothing, and the other is busting its balls to prepare it fast enough that the frame goes out in time.
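Here is a deliberately crude sketch of that chain for a single frame. Everything in it is an assumption made for illustration: the millisecond figures are invented rather than measured PS2 timings, the 60 Hz budget doesn't apply to every game, `simulate_frame` is a hypothetical helper, and real hardware overlaps a lot of this work instead of running it strictly one unit after another.

```python
FRAME_BUDGET_MS = 1000 / 60  # assumes a 60 Hz target


def simulate_frame(work_ms):
    """Walk a strictly serial dependency chain for one frame.

    work_ms maps each unit (in dependency order) to the time it needs
    once its input is ready. Returns the total frame time and how long
    each unit sat idle waiting on the units before it.
    """
    clock = 0.0
    idle = {}
    for unit, cost in work_ms.items():
        idle[unit] = clock  # time spent waiting on upstream units
        clock += cost       # now this unit does its own work
    return clock, idle


if __name__ == "__main__":
    # Invented numbers: the EE is the slow link in this frame.
    total, idle = simulate_frame({"IOP": 4.0, "EE": 9.0, "VU": 3.0, "GS": 2.0})
    for unit, waited in idle.items():
        print(f"{unit}: idle for {waited} ms")
    print(f"frame took {total} ms, budget is {FRAME_BUDGET_MS:.2f} ms")
```

With these made-up figures the GS only needs 2 ms of its own, but it sits idle for 16 ms first, and the frame lands at 18 ms against a budget of about 16.67 ms. That is the starvation picture: one unit waiting, another being the bottleneck.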
No idea if this is a worthwhile contribution to the discussion but I figure why not.