Software mode threading
#11
gamerX1990:

Which CPU do you have? Core 2 Quad or i7? Please compare games with old and new builds of GSdx; if you find any being significantly slower, I'll try to find that game to see it myself.

To answer your question about why it cannot use everything up to 100%, there are two reasons. First, if the geometry is complex, the main thread cannot process the incoming data fast enough to feed the drawing threads. Second, when the distribution of the pixels is concentrated on one part of the screen, the threads don't get an equal number of tasks, so one finishes later than the others, making them wait and do nothing.

Edit:

I made some VTune shots to show how this works. This is from Metal Gear Solid 3's intro; at this place there are 120k vertices, probably the most ever in a PS2 game. On 6 threads it slows down to 23 fps. The top line is the main thread, distributing incoming data to the workers below. Overall it is using 5x CPU time (483% at the selection); there are idle sections where one thread must wait for another, and these are unavoidable. In this case the workers could reach higher CPU usage if the main thread were not overloaded, and vice versa.

[VTune screenshots]

Edit2:

Just want to add some fillrate numbers. The real hardware can do 8 textured pixels in one clock and runs at 150 MHz, which puts the max fillrate at 1200 Mpix/s. That can of course never be achieved, but at about 2/3 fullness it is still 800 Mpix/s. GSdx can do one average-complexity pixel in about 20-30 clocks; at 3 GHz that gives about 100 Mpix/s of fillrate per thread. As you add more cores, the main thread becomes the bottleneck; currently there is a limit around 400 Mpix/s.

#12
Okay, well, that makes much more sense now. I have the i7 920; my specs are:

-Core i7 920 @ 4.01GHz w/HT
-GIGABYTE EX58-UD4P
-OCZ 6GB (3x2GB) PC3-14400 Platinum XTC LV
-Sapphire HD 6970 2GB
-(RAID 0) 2x Seagate 1TB 7200.12 RPM SATA III
-PC Power & Cooling Silencer 750W CF Edition
-Windows 7 x64 SP1

I would have them in my sig if there was an option for a sig, but I don't have it for some reason. I will try to find the oldest GSdx possible that will run. I tried using different builds of PCSX2/GSdx before, and some that are "too old" simply do nothing when you try to run any game; it will just close.
-Core i7 6700k @ 4.5 GHz
-GIGABYTE Z170X-Gaming 5
-G.SKILL Ripjaws V 16GB DDR4 2400 @ 14-13-13-30-1T
-EVGA GTX 970 4GB @ 1380/1853 MHz
-Crucial MX100 512GB, Silicon Power S60 120GB, Toshiba 2TB 7200 RPM
-PC P & C Silencer 750 Quad
-Windows 7 x64

-----

-Core i7 4710MQ
-16GB DDR3 1866
-GTX 965M 4GB @ 1127/1353 MHz
-Mushkin ECO2 240GB, HGST 1TB 7200 RPM
-Windows 7 x64
#13
Nice tech details you're posting, Gabest. 20-30 cycles per pixel looks heavy. Why is that so much? Swizzle and palette, I guess?!
#14
(01-09-2012, 01:34 AM)Gabest Wrote: On my i7 2600, which is 4x2, 4 to 6 threads gives the most fps; actual cores - 1 was the limit before, but not anymore. CPU was maxed in old versions because the threads were not giving up the CPU to let it do other work; they were mostly not doing anything, just continuously checking for some flag, and Task Manager registered this as being active. Current builds use synchronization primitives, provided by the operating system, to wait, so you see the real CPU usage. There is some task-switching overhead (more on XP, less since Vista) when a thread goes to sleep and wakes up; since r4992 this should be compensated by the more uninterrupted work given to the workers and less synchronization. There are revisions after big rewrites which have unfinished code and don't run at full speed; the first good one is probably r5036, and today's r5063 should be pretty bug-free as well.


^this is why it didn't work properly back then. xD

(01-09-2012, 05:17 AM)gamerX1990 Wrote: Okay, well, that makes much more sense now. I have the i7 920; my specs are:

-Core i7 920 @ 4.01GHz w/HT
-GIGABYTE EX58-UD4P
-OCZ 6GB (3x2GB) PC3-14400 Platinum XTC LV
-Sapphire HD 6970 2GB
-(RAID 0) 2x Seagate 1TB 7200.12 RPM SATA III
-PC Power & Cooling Silencer 750W CF Edition
-Windows 7 x64 SP1

I would have them in my sig if there was an option for a sig, but I don't have it for some reason. I will try to find the oldest GSdx possible that will run. I tried using different builds of PCSX2/GSdx before, and some that are "too old" simply do nothing when you try to run any game; it will just close.

New users have it disabled to prevent signature spammers.
#15
(01-09-2012, 05:48 AM)xstyla Wrote: Nice tech details you're posting, Gabest. 20-30 cycles per pixel looks heavy. Why is that so much? Swizzle and palette, I guess?!

GSdx dynamically compiles the code for its own kind of "pixel shader", which usually consists of a few pages' worth of SSE instructions doing the pixel test, texture lookup, alpha blending, etc. It's already a miracle that the latest CPUs can eat that much code so fast. On the other hand, the real GS can pipeline this work on its specialized hardware. Using the mentioned tasks as an example:

Code:
1st pixel: ztest      texlookup  alphablend
2nd pixel:            ztest      texlookup  alphablend
3rd pixel:                       ztest      texlookup  alphablend

To process one pixel you need three clocks in this case, but once we reach the 3rd pixel, a new pixel gets finished on every clock. Our CPUs also pipeline the SSE code, but without the specialized hardware stages it takes far more than one clock per pixel.
#16
(01-09-2012, 05:59 PM)Gabest Wrote: GSdx dynamically compiles the code for its own kind of "pixel shader", which usually consists of a few pages' worth of SSE instructions doing the pixel test, texture lookup, alpha blending, etc. It's already a miracle that the latest CPUs can eat that much code so fast. On the other hand, the real GS can pipeline this work on its specialized hardware. Using the mentioned tasks as an example:

Code:
1st pixel: ztest      texlookup  alphablend
2nd pixel:            ztest      texlookup  alphablend
3rd pixel:                       ztest      texlookup  alphablend

To process one pixel you need three clocks in this case, but once we reach the 3rd pixel, a new pixel gets finished on every clock. Our CPUs also pipeline the SSE code, but without the specialized hardware stages it takes far more than one clock per pixel.

Is it possible to "combine" HW & SW mode at once to get the accuracy of SW mode with the speed of HW mode from GPU? Or tessellation support for DX 11?
#17
Tessellation is of no use for GS emulation; also, OpenCL/DirectCompute/etc. are all just going to slow it down because of the context latencies involved in swapping the shader tasks.
#18
I would say that OpenCL is a subset of OpenGL; if there is a difference, it is the removal of the hardware-accelerated units, and those hardware-accelerated units can't emulate some GS parts anyway. In the future some stuff could be moved into the shader: slower, but more programmable.
#19
I'm surprised the PS2 can pull off those kinds of numbers on a 150 MHz GPU with only 4 MB of VRAM. How does it store so many pixels at once with only 4 MB of VRAM?
#20
The GS has a max fillrate of 2.4 Gpixel/s, which is double that of a GeForce4 Ti 4600. You're mostly limited to particle effects if you want to use it all, though.