..:: PCSX2 Forums ::..

Full Version: [blog] Threading VU1
THANKS
Now I have a reason to buy an i7, hehe.
thanks again PCSX2
(11-02-2011, 06:30 AM)CobraSA Wrote: [ -> ]Does it only work with software mode with 3 threads or more, or also with hardware mode (where you can't choose threading)?

It has nothing to do with GSdx; it's a PCSX2 core feature, so if it's on, it works.
"The remaining problems of VU1 threading are handling the cases where the EE or other processors like VU0 ever need to read back from VU1. This happens very rarely, but in this situation all we need to do is call a “Wait on VU1 Thread” function immediately before trying to read from VU1 memory. If it happened often it would ruin any chance of speedups threading VU1 had, but luckily it rarely happens. The EE actually works very closely with VU0 as opposed to VU1, and so threading VU0 would not be a speedup because the EE would end up reading back from it too much. The good thing is that VU0 is rarely a bottleneck in games; this is evident because you can usually run VU0 interpreters and get a minimal speed-hit (if you try to run vu1 interpreters on the other hand, your speed will usually crawl to ~2fps)."
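The "wait before reading back" idea in the quote can be sketched roughly as below. This is only an illustration in Python, not PCSX2's actual code: `VU1Worker`, `execute`, `wait` and `read` are hypothetical names, and the 16-entry `memory` list is a stand-in for VU1 data memory.

```python
import queue
import threading


class VU1Worker:
    """Toy stand-in for a threaded VU1 (hypothetical, not PCSX2 code)."""

    def __init__(self):
        self.memory = [0] * 16            # assumed stand-in for VU1 memory
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Worker loop: take jobs in FIFO order and run them on "VU1 memory".
        while True:
            job = self._jobs.get()
            try:
                job(self.memory)
            finally:
                self._jobs.task_done()

    def execute(self, job):
        """Queue a job and return immediately, so the caller keeps running."""
        self._jobs.put(job)

    def wait(self):
        """The 'Wait on VU1 Thread' step: block until all queued jobs finish."""
        self._jobs.join()

    def read(self, addr):
        """Read back from VU1 memory; must synchronise with the worker first."""
        self.wait()
        return self.memory[addr]
```

If `read` were called after every `execute`, the caller would stall each time and the second thread would buy nothing, which is the same reason the quote gives for why frequent read-backs would ruin the speedup.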

I've found a game I believe uses the VU0 primarily over the VU1! I could be wrong, but check out Primal. I experimented by changing VU0 and VU1 settings a bit, switching them from microVU Recompiler to interpreter each separately. Switching the VU1 had an impact, a frame drop from 40-41fps to 10.5fps. But switching the VU0 into the interpreter had a massive impact, frame rate dropped like a rock down to 1.5fps.

I have the MTVU hack enabled right now and when I run Primal, in normal gameplay, the EE is constantly at 99-100%, whereas the VU never goes above 15% and goes as low as 5%. Enabling and disabling MTVU has a big impact on other games for my computer, but makes little to no difference in Primal.

Is it possible this game basically does what you describe but in reverse, with the VU0 used primarily over the VU1? In this case is there room for a MTVU0 hack in addition to our MTVU1 hack? Or maybe some kind of room for VU0 optimisations or hacks, etc?

I know the EE and VU0 work closely in most cases, whereas the VU1 is "used for coordinate, matrix, and vector calculations games need to do for 3d graphics", but is it possible this is a game that does this in reverse? Or uses the VU0 in a way that's unique?


.. or I could be talking rubbish, just throwing that out there. Anything to get a slightly higher frame rate on Primal would be awesome. I'm getting it to run slightly better each day, I can get about 35-42fps out of 50fps now. Getting closer! But it definitely is a game bogged down by the CPU. The GS can handle the game at anything from native to x6 resolution (game looks FANTASTIC at x6) with no impact on fps; it's just the EE/VU0 that seems to be the bottleneck for Primal.

My CPU for reference is an AMD Phenom II x6 1100T @ 3.7GHz.
The VU0 is a unit like the VU1, but it is also used as a coprocessor, hence the close link to the EE, which is what prevents us from separating it off. Games like Primal and Tekken Tag do tend to use it quite heavily, most likely to get quick, easy access to the information from the EE. It's highly possible the FPU (COP1) unit on the EE is used in tandem with it for generating the vertex information.
(08-21-2012, 10:37 AM)refraction Wrote: [ -> ]The VU0 is a unit like the VU1, but it is also used as a coprocessor, hence the close link to the EE, which is what prevents us from separating it off. Games like Primal and Tekken Tag do tend to use it quite heavily, most likely to get quick, easy access to the information from the EE. It's highly possible the FPU (COP1) unit on the EE is used in tandem with it for generating the vertex information.
Hm, I see. Any theories or ideas on what could be given a speed improvement there? Some optimisation for the FPU perhaps? Would the VU0 be less coupled to the EE when it's doing these VU1 style operations? Would there be a way of detecting when that's happening and activating a VU1 style multithreading, then turning it off when it's back to doing Co-Processor responsibilities?

I'll admit, I know next to nothing about PS2 emulation, but I do have experience programming, so forgive me if my ideas sound silly, just trying to transfer logic from the stuff I know to this, just throwing out ideas in the hope that someone interprets something grand and wonderful from my nonsense. X3
Unfortunately the coupling isn't a software restriction. In hardware the VU0 is linked directly into the EE, so they share functions/code in the emulator. The overhead from threading the VU0 would be so extreme, and so difficult to time properly, that it would most likely have a negative effect. It is also more likely that the coupling is taken advantage of when the VU0 is used in this manner, rather than the opposite. There is nothing we can really do on this front. We would have more luck multithreading everything else away from the EE/VU0, but it is unknown how much of a difference that would really make, if any.
Ahh, I see then. Well that rules out threading the VU0. Well, was curious anyway, thanks for the answers.
The precise reason you can't get benefits from threading the VU0 is that it can't do anything with the results of calculations by itself; all data has to go back. With the VU1 you can say "okay, you have a job for me? Let's pretend I did it just now so you can carry on while this worker thread does nothing you should worry about over there" much of the time.
Yeah I get that, fair enough. That's a shame. Out of curiosity, back onto the topic of VU1 threading, how did you handle the data being passed back and forth? Is it memory shared between both threads that's just locked before use, or is it literally passed back and forth between the threads, with the EE thread and VU1 thread having their own separate memory?
There is a ring buffer into which the data (and threading instructions) are placed. The other thread watches this buffer, waiting for commands and information; once something is in there, it starts processing it.
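For anyone wondering what that looks like in practice, here is a rough sketch of a single-producer, single-consumer ring buffer in Python. It is not the PCSX2 implementation: `RingBuffer`, `run_vu1_thread` and the `"exec"`/`"stop"` command names are made up for illustration, and it uses a lock plus condition variable for simplicity rather than whatever synchronisation the emulator really uses.

```python
import threading


class RingBuffer:
    """Fixed-size single-producer, single-consumer ring buffer (sketch)."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._read = 0       # next slot the consumer will read
        self._write = 0      # next slot the producer will fill
        self._count = 0      # number of filled slots
        self._cond = threading.Condition()

    def push(self, item):
        with self._cond:
            while self._count == len(self._slots):   # full: wait for consumer
                self._cond.wait()
            self._slots[self._write] = item
            self._write = (self._write + 1) % len(self._slots)  # wrap around
            self._count += 1
            self._cond.notify_all()

    def pop(self):
        with self._cond:
            while self._count == 0:                  # empty: wait for producer
                self._cond.wait()
            item = self._slots[self._read]
            self._read = (self._read + 1) % len(self._slots)    # wrap around
            self._count -= 1
            self._cond.notify_all()
            return item


def run_vu1_thread(ring, results):
    """Consumer loop: watch the buffer and process commands until 'stop'."""
    while True:
        command, payload = ring.pop()
        if command == "stop":
            break
        if command == "exec":
            results.append(sum(payload))   # placeholder for real VU1 work
```

The producer side (the EE thread in this analogy) just calls `ring.push(("exec", data))` and carries on; it only blocks if the buffer is full.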