Comparing GSdx SSE2/SSSE3/SSE4.1/AVX/AVX2
#31
Well if that was the case then indeed SSE4 would always top SSSE3, SSSE3 would top SSE2 etc. I doubt it works like that...

Quote: But it shows that the test was at least valid

Calling a test valid because it produced what you expected is not a very good idea. It would be 'valid' if it gave us results on how the flavors generally behave in all games (that's usually the target of benchmarks).
#32
(09-24-2014, 09:35 PM)Bositman Wrote: Well if that was the case then indeed SSE4 would always top SSSE3, SSSE3 would top SSE2 etc. I doubt it works like that...

Well, I'll ask Rama, but I'm pretty sure that IS how it works. After all, no CPU supports SSE4.1 without also supporting SSSE3 and SSE2, none supports SSSE3 without SSE2, and so on.

Quote: Calling a test valid because it produced what you expected is not a very good idea. It would be 'valid' if it gave us results on how the flavors generally behave in all games (that's usually the target of benchmarks).

No, you misunderstand. I'm calling it valid because it produced a result. It showed a difference. It actually didn't show what I expected, because it showed that with extra threads the instruction set benefits almost disappear.

I'm not saying "it's valid as a test that shows what you can expect in all games in PCSX2". What I am saying is "it's valid as a test because it shows there IS a difference between the performance of the various versions of GSdx".

Games that don't use the optimized codepaths won't benefit in any case, as you have said.
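To make the "each set includes the older ones" point concrete, here is a minimal sketch. It assumes the CPU's feature flags are already available as a set of strings spelled the way Linux /proc/cpuinfo spells them; that spelling, the `FLAVORS` table, and the function name are illustrative assumptions, not anything GSdx itself uses. It only shows which build a CPU could run, not which one is faster.

```python
# Build flavors from the thread title, ordered oldest to newest.
# The flag spellings follow Linux /proc/cpuinfo (an assumption for this sketch).
FLAVORS = [
    ("SSE2", "sse2"),
    ("SSSE3", "ssse3"),
    ("SSE4.1", "sse4_1"),
    ("AVX", "avx"),
    ("AVX2", "avx2"),
]


def newest_supported_flavor(cpu_flags):
    """Return the newest flavor whose flag, and every older flag, is present."""
    best = None
    for name, flag in FLAVORS:
        if flag not in cpu_flags:
            break  # the chain is cumulative, so stop at the first gap
        best = name
    return best


# Hypothetical CPU that stops at SSE4.1.
print(newest_supported_flavor({"sse2", "ssse3", "sse4_1"}))
```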
Gaming Rig: Intel i7 6700k @ 4.8Ghz | GTX 1070 TI | 32GB RAM | 960GB(480GB+480GB RAID0) SSD | 2x 1TB HDD
#33
(09-24-2014, 08:47 PM)Blyss Sarania Wrote: Well, we compared with 2 threads for the test. Nominally I use 3. We chose 2 for the test because Nobbs' CPU is a 4-core, and we both wanted to use the same number of threads.

I understand that if extra rendering threads is set to 0, GSdx will use only 1 core (thread). So for quad-core CPUs, extra rendering threads should be set to 3. Do I understand this correctly?
#34
(09-24-2014, 09:55 PM)xemnas99 Wrote: I understand that if extra rendering threads is set to 0, GSdx will use only 1 core (thread). So for quad-core CPUs, extra rendering threads should be set to 3. Do I understand this correctly?

Yes and no.

Yes to the fact that with extra threads = 0, GSdx uses 1 thread for rendering.

No to using 3 on a quad core.

Every case is different, but generally it works like this. First consider hardware mode with no MTVU:

Core #0: EE recompiler
Core #1: GS recompiler
Core #2: Idle
Core #3: Idle

Now consider software mode with 0 extra threads:

Core #0: EE recompiler
Core #1: GS recompiler
Core #2: Software rendering thread #0
Core #3: Idle

Now consider software mode with 3 extra threads:

Core #0: EE recompiler + Software rendering thread #2
Core #1: GS recompiler + Software rendering thread #3
Core #2: Software rendering thread #0
Core #3: Software rendering thread #1

See how you are now sharing resources on two cores between the recompilers and the renderers? This can lead to a performance DECREASE because they are competing. Whether it does depends on the power of the CPU, but it can.

Generally, for a four-core CPU, 2 extra threads is best. In that case you have:

Core #0: EE recompiler + Software rendering thread #2
Core #1: GS recompiler
Core #2: Software rendering thread #0
Core #3: Software rendering thread #1

But this is okay because generally the EE thread is not maxing out anyway, and you gain more from another software thread than you lose from sharing resources.

If the game's core emulation is undemanding enough, 3 extra threads might be okay as well. But generally 2 is best for 4 cores.

Anyway, if you are having trouble staying at full speed in software mode, it's best to experiment. But never set extra threads higher than cores - 1, because then you have software threads competing with software threads, so any further gain is lost. Also, beyond 4 extra rendering threads the gains diminish seriously even with a lot of cores, because of cross-thread communication and such.
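The layouts above follow a simple pattern, which can be sketched in a few lines. This is only a model of the simplified picture in this post: it assumes threads stay on the cores shown (a real OS scheduler makes no such promise) and that N extra threads means N + 1 software rendering threads. The function names `layout` and `shared_cores` are made up for illustration.

```python
def layout(cores, extra_threads):
    """Map each core to the threads placed on it in the simplified model."""
    assignment = {core: [] for core in range(cores)}
    assignment[0].append("EE recompiler")
    assignment[1 % cores].append("GS recompiler")
    # N extra threads means N + 1 software rendering threads in total.
    # Thread #0 starts on the first core after EE and GS, then wraps around.
    for k in range(extra_threads + 1):
        assignment[(2 + k) % cores].append(f"Software rendering thread #{k}")
    return assignment


def shared_cores(cores, extra_threads):
    """Cores that host more than one thread, i.e. where threads compete."""
    placed = layout(cores, extra_threads)
    return [core for core, threads in placed.items() if len(threads) > 1]


for extra in (0, 2, 3):
    print(extra, "extra threads -> shared cores:", shared_cores(4, extra))
```

On a four-core CPU the model reports no shared cores with 0 extra threads, core 0 shared with 2, and cores 0 and 1 shared with 3, which matches the tables above.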
#35
Yep, I do get higher framerates in B3 with 3 threads just for the record. Though, you and I did do some testing with Ratchet and Clank Going Commando where 2 threads was faster for me but 3 was faster for you.
#36
That is exactly why I bolded the part at the end of the post. It can vary, and it also depends on how the OS schedules threads. But in theory, at the very least, you don't want your software threads competing with the core emulation threads.
#37
Yeah, but don't forget about MTVU, which puts VU1 on its own thread as well.
#38
I didn't, I specifically said "without MTVU" in my examples up there.

It's actually a lot more complicated than I made it sound, but I was going for simple. If the EE and GS are both low usage compared to software demand, then sharing the resources might still provide a boost. But if they are both really high, then sharing either could be slower than using just 1 extra thread on a 4-core. And where does the OS schedule threads? Logically it's like I showed, but Windows is hardly logical. If software mode is the bottleneck, then the EE and GS threads may have low usage, and you gain more from putting software threads with them than you lose. It all just depends.
#39
Thanks for the detailed explanation. Rep +1. I see that using 3 extra rendering threads could result in lower performance if the EE thread or GS thread (or both) maxes out its core. But I think it depends on multiple factors. If the CPU is powerful enough, it shouldn't affect performance, as the EE/GS threads won't max out their cores in most games.

Also, if there are 3 extra rendering threads, each thread will handle 25% of the load. But if there are 2 extra rendering threads, each thread will handle about 33.33% of the load. Using 2 extra rendering threads could result in lower performance than using 3 extra rendering threads as well depending on the load on other cores.

It's like this:

3 extra rendering threads:
Core #0: EE recompiler + Software rendering thread #2 (25%)
Core #1: GS recompiler + Software rendering thread #3 (25%)
Core #2: Software rendering thread #0 (25%)
Core #3: Software rendering thread #1 (25%)

2 extra rendering threads:
Core #0: EE recompiler + Software rendering thread #2 (33.33%)
Core #1: GS recompiler
Core #2: Software rendering thread #0 (33.33%)
Core #3: Software rendering thread #1 (33.33%)
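The percentages come from an even split, which is an idealization: it assumes the rendering work divides equally across N + 1 software threads for N extra threads. As a quick check of the arithmetic (the function name is only for illustration):

```python
def share_per_thread(extra_threads):
    """Percent of the rendering load per thread, assuming an even split."""
    total_threads = extra_threads + 1  # thread #0 plus the extra ones
    return 100.0 / total_threads


for extra in (0, 2, 3):
    print(f"{extra} extra threads: {share_per_thread(extra):.2f}% each")
```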
#40
(09-25-2014, 03:09 AM)xemnas99 Wrote: Thanks for the detailed explanation. Rep +1. I see that using 3 extra rendering threads could result in lower performance if the EE thread or GS thread (or both) maxes out its core. But I think it depends on multiple factors. If the CPU is powerful enough, it shouldn't affect performance, as the EE/GS threads won't max out their cores in most games.

Yes, that's right. It all depends on how heavy the load on the EE and GS threads is.

If the EE thread needs 80% of Core #0, and the software thread needs 40%, both will suffer. But if in that case the EE needs 80%, but the GS only needs 30% of its core, then adding the thread can help, because as you point out it will drop the needs of the software thread on the EE core. But if in that same case the GS needs 90% of its core, then adding that thread won't help.

That's why I said it was a lot more complicated than I made it sound. But in general, it's best to try not to share resources unless you have to.