Support for 3 cores properly and Vulkan
#11
Yeah, I don't think he's really saying they're the same thing, or anything like that.
#StopRNG

#12
(01-10-2018, 01:38 AM)pandubz Wrote: Splitting a process into too many threads can actually cause more performance degradation than if you use fewer threads in the first place. On a single core, your OS uses a scheduler to swap processes in and out of the execution pipeline. The scheduler itself takes time to do the swapping. If you have too many things going on at once, the scheduler ends up taking a fair amount of CPU time per second just juggling processes instead of actually executing their code. This can even reach a point where the scheduler exceeds actual processes in CPU time. If you ever used a desktop with an old single core Pentium, you know what happens when you have enough RAM but too much running. That sucker still gets slooooow.

And we can extend this logic to multithreading.

You can, theoretically, run a renderer in 16 threads. If you were rendering a movie, this would make sense. Each thread could be assigned a 30 second chunk of video, grind it out, output the result to a memory address and the main thread picks it up and assembles the output file with it. This makes sense, because precise timing isn't necessary. The main thread can read an ID number when it sees a memory block marked as done and slide it into the right slot of the final file. In a game, especially an emulator, timing is your god.
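The movie case above can be sketched roughly like this. It's only an illustration: `render_chunk` and `render_movie` are made-up names, the "rendering" is a placeholder string, and futures indexed by chunk ID stand in for the "memory block marked as done" idea.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for "render a 30 second chunk of video".
std::string render_chunk(std::size_t chunk_id) {
    return "[chunk " + std::to_string(chunk_id) + "]";
}

// Launch one task per chunk, then assemble the results by chunk ID.
// Workers may finish in any order; the slot index fixes the final order.
std::string render_movie(std::size_t chunk_count) {
    std::vector<std::future<std::string>> jobs;
    jobs.reserve(chunk_count);
    for (std::size_t id = 0; id < chunk_count; ++id) {
        jobs.push_back(std::async(std::launch::async, render_chunk, id));
    }

    std::string output;
    for (auto& job : jobs) {
        output += job.get();  // waits only until this particular chunk is done
    }
    return output;
}
```

Nothing here cares when a chunk finishes, only that it eventually does. That's exactly the luxury an emulator doesn't have.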

The PS2 has several major components, but the ones that PCSX2 truly highlights are the EE, IOP, GS and VUs: Emotion Engine, Input Output Processor, Graphics Synthesizer and Vector Units. In a nutshell, the EE is the main processor, the IOP is a mini processor dedicated to memory cards, controllers and anything else that plugs into those ports, the GS is a GPU and the VUs are dedicated vector math processors. You have all these processors, and each one can only do its job once the others have prepared the information it needs.

An example: the GS wants to put out the next video frame for a loading screen with a progress bar. The GS has to get vector information from the VUs, which have to get information from the EE, which has to work with the IOP to do the actual load. And that's just for a loading bar, never mind the rest of the frame. While one unit is running and trying to hand its information to another, that other unit has to either find something else to do or just wait and waste time. When people talk about Ratchet and Clank having EE/VU starvation problems, this is what it is. One is sitting waiting for data, doing nothing, and the other is busting its balls to prepare it fast enough that the frame goes out in time.

No idea if this is a worthwhile contribution to the discussion but I figure why not.
Yes, you're right. GSdx SW fragment processing supports more threads, but it doesn't scale well. We could make it scale further with spinlocks, but then 8 cores would become the minimum to run PCSX2... And spinlocks mean less turbo on the CPU and more throttling, so it could be counter-productive too.
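
For anyone wondering why a spinlock keeps cores hot, here is a minimal test-and-set sketch. It is not GSdx code; `SpinLock` and `count_with_spinlock` are hypothetical names used only to show the busy-wait.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Minimal test-and-set spinlock. A waiting thread never sleeps: it keeps
// a core busy until the lock is free, which is where the extra heat and
// lost turbo headroom come from.
class SpinLock {
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait: burns CPU time instead of yielding to the scheduler
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical workload: each thread bumps a shared counter under the lock.
long count_with_spinlock(int thread_count, int increments_per_thread) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

The result is correct, but every thread that loses the race sits in that `while` loop at full speed. With more spinning threads than free cores, the spinners just steal time from the thread that actually holds the lock.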

(01-10-2018, 04:12 AM)dabore Wrote: mtvu is a totally different thing than what vulkan does. the idea of mtvu is to enable parallel execution of vu code while using the main thread to transfer data chunks to the gif, in case it can be done without too much path signaling between the 2 processes. that option will not be obsoleted by vulkan.

vulkan is entirely a graphics api and sits inside of gsdx. i've been looking into vulkan lately. i haven't really coded anything with it, but it theoretically uses 2 threads automatically: 1 thread for input processing and 1 for execution/sending the commands and payloads. i have no idea how much gsdx would benefit from it though. it depends on the amount and size of the data. lots of small data packages and signal exchange overhead will totally obliterate the idea of multithreading the gs.
Well, OpenGL drivers are multithreaded, and the DX driver likely is too. Time spent in GL calls on the GS thread is close to 0. However, issues arise when you need to wait for a result from the GPU.
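
A toy model of that behaviour, assuming a driver that just queues commands for a worker thread. `FakeDriver` is invented for illustration and is not a real OpenGL or Direct3D API: `draw` returns immediately, while `read_back` has to wait for everything queued before it.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Toy stand-in for a multithreaded driver: the caller only enqueues
// commands, and a worker thread executes them later, in order.
class FakeDriver {
public:
    FakeDriver() : worker_([this] { run(); }) {}
    ~FakeDriver() {
        submit([this] { done_ = true; });  // last command stops the worker
        worker_.join();
    }

    // Cheap for the caller: enqueue and return.
    void draw(int pixels) {
        submit([this, pixels] { pixels_drawn_ += pixels; });
    }

    // Expensive for the caller: blocks until all earlier commands have run.
    int read_back() {
        auto result = std::make_shared<std::promise<int>>();
        std::future<int> value = result->get_future();
        submit([this, result] { result->set_value(pixels_drawn_); });
        return value.get();
    }

private:
    void submit(std::function<void()> cmd) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            queue_.push(std::move(cmd));
        }
        ready_.notify_one();
    }

    void run() {
        while (!done_) {
            std::function<void()> cmd;
            {
                std::unique_lock<std::mutex> guard(mutex_);
                ready_.wait(guard, [this] { return !queue_.empty(); });
                cmd = std::move(queue_.front());
                queue_.pop();
            }
            cmd();
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> queue_;
    int pixels_drawn_ = 0;  // touched only on the worker thread
    bool done_ = false;     // touched only on the worker thread
    std::thread worker_;    // declared last so the other members exist first
};

// Usage: two cheap submissions followed by one blocking readback.
int draw_two_then_read() {
    FakeDriver gpu;
    gpu.draw(10);
    gpu.draw(32);
    return gpu.read_back();
}
```

The calling thread pays almost nothing for the two `draw` calls. The stall only appears at `read_back`, where it has to sit idle until the worker has drained the queue.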
#13
(01-10-2018, 04:22 PM)gregory Wrote: Well, OpenGL drivers are multithreaded, and the DX driver likely is too. Time spent in GL calls on the GS thread is close to 0. However, issues arise when you need to wait for a result from the GPU.

yeh. i figured the most performant setup is when you know where everything has to go. just open a command buffer per render target, spew it, and done. ofc you gotta wait for draw calls to finish when switching and compositing targets, and when you need data (to be transferred) from the gpu. apart from that, command assembly is fast. you don't wait for the target to switch. you can fill the next box with target and draw commands while waiting.