Comparing GSdx SSE2/SSSE3/SSE4.1/AVX/AVX2
#1
So I've been curious about this for quite a while, and I finally got around to running some tests. Nobbs66 also ran the same tests on his rig, so we would have both AMD and Intel data.

Just how much does using a "Better" instruction set boost FPS in PCSX2? The results are in, and they are interesting.

First, our hardware:

Blyss Sarania:
Gigabyte GA-78LMT-USB3 AM3+
AMD FX 6300 @ 4.40 GHz
8GB (2 x 4GB) Kingston RAM @ 1600,9,9,9,27
HIS IceQ Radeon HD 7870 2GB @ 1300 core, 1200 mem
1TB Seagate Barracuda
Lite-On CDVD+/-RW

Nobbs66:
Intel Core i5 4440 @ 3.1 GHz
MSI GTX 650 1GB OC Edition
8GB RAM 1333 MHz
ASRock H81M-HDS
NZXT Phantom 410

NOTE: Only Nobbs66's results include AVX2, since my CPU does not support it.

Next, our test game. We used Burnout 3: Takedown. We used "Coupe 1" on the Silverlake southbound track, with no rivals. We held the brake as the event started, and tested from where our car stopped. We used totally default settings with D3D11.


Software mode:

First, with 0 extra threads:

Blyss Sarania:
SSE2: 20 FPS
SSSE3: 21 FPS
SSE4.1: 21.9 FPS
AVX: 23.2 FPS

We can see that the more advanced instruction sets do give a decent FPS gain. AVX is 16% faster than SSE2 for me (23.2 vs. 20 FPS).

Nobbs66:
SSE2: 22 FPS
SSSE3: 22.5 FPS
SSE4.1: 22 FPS
AVX: 23 FPS
AVX2: 24.5 FPS

Here we can see that the instruction set scaling is less even on the Intel CPU (SSE4.1 is no faster than SSE2), but generally using more advanced instructions gets you more FPS.
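For anyone who wants to check the percentages, here is a minimal Python sketch that computes each plugin's gain over SSE2 from the numbers above. The dictionary and function names are made up for illustration and are not part of PCSX2 or GSdx.

```python
# FPS figures copied from the 0-extra-thread software results above.
# All names here are illustrative only.
SW_0_THREADS = {
    "Blyss Sarania (AMD FX 6300)": {
        "SSE2": 20.0, "SSSE3": 21.0, "SSE4.1": 21.9, "AVX": 23.2,
    },
    "Nobbs66 (Intel i5 4440)": {
        "SSE2": 22.0, "SSSE3": 22.5, "SSE4.1": 22.0, "AVX": 23.0, "AVX2": 24.5,
    },
}


def gain_over_baseline(results, baseline="SSE2"):
    """Return each plugin's FPS gain over the baseline, in percent."""
    base = results[baseline]
    return {name: (fps / base - 1.0) * 100.0 for name, fps in results.items()}


for rig, results in SW_0_THREADS.items():
    print(rig)
    for name, pct in gain_over_baseline(results).items():
        print(f"  {name}: {pct:+.1f}% vs SSE2")
```

The baseline is a parameter, so the same function can compare against SSE4.1 or any other build instead.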

Now, with 2 extra threads:

Blyss Sarania:
SSE2: 31.2 FPS
SSSE3: 31.3 FPS
SSE4.1: 31.8 FPS
AVX: 31.8 FPS

Interesting, right? There is almost no difference between the instruction sets when you are using 2 extra rendering threads. My guess is that when using extra threads, cross-thread communication becomes the bottleneck, and any gains from the more advanced instruction sets get completely erased on my AMD chip. But for Nobbs66's Intel chip, the story is a bit different:

Nobbs66:
SSE2: 39 FPS
SSSE3: 39 FPS
SSE4.1: 39 FPS
AVX: 41 FPS
AVX2: 46 FPS

We see the same thing here. In the first 4 cases, the boost from the instructions is basically negated when using more threads. But look at AVX2! AVX2 is still about 18% faster than SSE2 here (46 vs. 39 FPS), showing that for some reason it scales better with more threads!
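To put a number on "almost no difference", the gap between the fastest and slowest plugin can be expressed as a percentage of the slowest. This is a rough sketch over the 2-extra-thread figures above; the names are hypothetical helpers, nothing from GSdx itself.

```python
# FPS figures copied from the 2-extra-thread software results above.
SW_2_THREADS = {
    "Blyss Sarania": {"SSE2": 31.2, "SSSE3": 31.3, "SSE4.1": 31.8, "AVX": 31.8},
    "Nobbs66": {"SSE2": 39.0, "SSSE3": 39.0, "SSE4.1": 39.0, "AVX": 41.0, "AVX2": 46.0},
}


def spread_percent(results, exclude=()):
    """Gap between fastest and slowest plugin, as a percentage of the slowest."""
    fps = [value for name, value in results.items() if name not in exclude]
    return (max(fps) / min(fps) - 1.0) * 100.0


for rig, results in SW_2_THREADS.items():
    print(f"{rig}: {spread_percent(results):.1f}% spread across all plugins")

# Same comparison for the Intel rig with AVX2 left out.
without_avx2 = spread_percent(SW_2_THREADS["Nobbs66"], exclude=("AVX2",))
print(f"Nobbs66 without AVX2: {without_avx2:.1f}% spread")
```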

So now you know. If you aren't using any extra threads, you will get a reasonable boost with AVX/AVX2 over SSE2. But if you are (and you should be unless you only have a dual core!) then it really doesn't matter, unless you can use AVX2! If your chip supports AVX2, make sure to use the AVX2 version of GSdx, as the performance gain is definitely there.


Hardware mode

Now, the hardware mode results. The setup was the same, except using D3D11 Hardware.

Blyss Sarania:
SSE2: 51.1 FPS
SSSE3: 51.3 FPS
SSE4.1: 55 FPS
AVX: 51.2 FPS

Notice that SSE4.1 is faster in HW mode on AMD! The others are all the same, but SSE4.1 is almost 8% faster!

Nobbs66:
SSE2: 66 FPS
SSSE3: 64 FPS
SSE4.1: 69 FPS
AVX: 66 FPS
AVX2: 68 FPS

For the Intel case we see a bit of weirdness (SSSE3 is actually slower than SSE2), but SSE4.1 is again the fastest. AVX2 does about the same, so it's a good choice for hardware and software.


So for hardware mode, we have determined that SSE4.1 is the fastest in both Intel and AMD cases.
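As a quick check of that conclusion, this small sketch picks the fastest plugin per rig from the hardware mode numbers above and reports its gain over SSE2. The function name is invented for this example.

```python
# FPS figures copied from the D3D11 hardware mode results above.
HW_D3D11 = {
    "Blyss Sarania": {"SSE2": 51.1, "SSSE3": 51.3, "SSE4.1": 55.0, "AVX": 51.2},
    "Nobbs66": {"SSE2": 66.0, "SSSE3": 64.0, "SSE4.1": 69.0, "AVX": 66.0, "AVX2": 68.0},
}


def fastest_plugin(results, baseline="SSE2"):
    """Return (plugin name, percent gain over baseline) for the highest FPS."""
    name = max(results, key=results.get)
    gain = (results[name] / results[baseline] - 1.0) * 100.0
    return name, gain


for rig, results in HW_D3D11.items():
    name, gain = fastest_plugin(results)
    print(f"{rig}: {name} is fastest, {gain:.1f}% over SSE2")
```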



Conclusion:

Well, this has provided some interesting data. We have learned that for hardware mode, SSE4.1 is the fastest (almost 8% over SSE2 on my AMD chip), but on the Intel side, AVX2 is basically the same. For software, the more advanced instruction sets provide a boost, but that boost is negated when using extra rendering threads except in the case of AVX2. AVX2 does well in PCSX2 in both hardware and software.

So what should you use? It depends on what your chip supports. Generally for software mode you should use the highest instruction set your chip supports. That's not really startling information. But for hardware mode, SSE4.1 is the fastest.

So it's like this:

Intel chip that supports AVX2: Use AVX2 in all cases. It's fastest in software mode, and about the same as SSE4.1 in hardware.

AMD chip that supports AVX: Use AVX for software, but use SSE4.1 for hardware.
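
The rule of thumb above could be written down roughly like this. It is only a sketch: `recommend_gsdx_build` is a hypothetical helper, the caller has to supply the list of supported instruction sets, and the fallback for a hardware-mode CPU without SSE4.1 is my assumption since we did not test that case.

```python
# Instruction sets in ascending order, as used in this thread.
ORDER = ["SSE2", "SSSE3", "SSE4.1", "AVX", "AVX2"]


def recommend_gsdx_build(supported, renderer):
    """Suggest a GSdx build from the results in this post.

    supported: iterable of instruction-set names the CPU supports.
    renderer: "software" or "hardware".
    """
    supported = set(supported)
    if renderer not in ("software", "hardware"):
        raise ValueError("renderer must be 'software' or 'hardware'")
    if "AVX2" in supported:
        return "AVX2"  # fastest in software, about the same as SSE4.1 in hardware
    if renderer == "hardware" and "SSE4.1" in supported:
        return "SSE4.1"  # fastest hardware-mode result on both rigs
    # Software mode, or hardware without SSE4.1 (untested assumption):
    # fall back to the highest supported instruction set.
    for name in reversed(ORDER):
        if name in supported:
            return name
    raise ValueError("no known instruction set in 'supported'")


fx_6300 = ["SSE2", "SSSE3", "SSE4.1", "AVX"]
print(recommend_gsdx_build(fx_6300, "software"))
print(recommend_gsdx_build(fx_6300, "hardware"))
```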
Gaming Rig: Intel i7 6700k @ 4.8Ghz | GTX 1070 TI | 32GB RAM | 960GB(480GB+480GB RAID0) SSD | 2x 1TB HDD

#2
Thanks for asking me to test. I always love putting my rig to the test.
#3
No problem. I'm glad we were able to get the AVX2 data, because that is a big find.
#4
I have to thank NarooN for giving me that plugin a while back.
#5
When I was doing GIT builds I included it in my builds, but only recently did the build bot start spitting out AVX2 plugins.
#6
That explains why AVX2 was there for me to use right away.
#7
Where exactly do you get almost no difference?

0 threads vs. 2 threads, from both of you, shows at least a 40% increase on all plugins, which is a nice boost.
#8
(09-24-2014, 01:20 AM)tsunami2311 Wrote: Where exactly do you get almost no difference?

0 threads vs. 2 threads, from both of you, shows at least a 40% increase on all plugins, which is a nice boost.

On my CPU with 2 extra threads:

Quote:Blyss Sarania:
SSE2: 31.2 FPS
SSSE3: 31.3 FPS
SSE4.1: 31.8 FPS
AVX: 31.8 FPS

I did not mean that the extra rendering threads didn't have a boost, I meant that WITH extra threads the INSTRUCTIONS don't provide a boost. Except AVX2.
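
To separate the two effects, here is a small sketch using my own software mode numbers from the first post. It computes the gain from adding threads for each plugin, and then the gain from switching plugins once the threads are already on. The variable names are just for this example.

```python
# Blyss Sarania's software mode FPS from the first post.
ZERO_THREADS = {"SSE2": 20.0, "SSSE3": 21.0, "SSE4.1": 21.9, "AVX": 23.2}
TWO_THREADS = {"SSE2": 31.2, "SSSE3": 31.3, "SSE4.1": 31.8, "AVX": 31.8}


def percent_gain(new, old):
    """Percentage increase going from old FPS to new FPS."""
    return (new / old - 1.0) * 100.0


# Effect of the extra threads, holding the plugin fixed.
thread_gain = {
    name: percent_gain(TWO_THREADS[name], ZERO_THREADS[name])
    for name in ZERO_THREADS
}

# Effect of the plugin (AVX vs SSE2), holding the thread count fixed.
plugin_gain_0 = percent_gain(ZERO_THREADS["AVX"], ZERO_THREADS["SSE2"])
plugin_gain_2 = percent_gain(TWO_THREADS["AVX"], TWO_THREADS["SSE2"])

for name, pct in thread_gain.items():
    print(f"{name}: {pct:+.1f}% from 2 extra threads")
print(f"AVX vs SSE2 with 0 extra threads: {plugin_gain_0:+.1f}%")
print(f"AVX vs SSE2 with 2 extra threads: {plugin_gain_2:+.1f}%")
```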
#9
Yes, they did provide a boost. You even showed it: each instruction set got at least a 35% increase with extra threads. Maybe we just have different ideas about what is and isn't a boost.

With no extra threads, the performance of all instruction sets is within 5% of each other; the same goes for HW.
#10
Blyss is saying that unless you use AVX2 the gs plugin doesn't really matter for software mode with extra rendering threads. The plugin difference is only apparent without extra rendering threads.