Comparing GSdx SSE2/SSSE3/SSE4.1/AVX/AVX2
#41
(09-24-2014, 06:28 AM)Blyss Sarania Wrote: Second, I don't think it's common knowledge that SSE 4.1 is faster in hardware mode. Most people just say the instruction sets help with software only. I had heard on the forums one single time that SSE 4.1 was faster in Hardware, and that is indeed the case. AVX2 and SSE4.1 both show improvements over all the others in HW. This is not common knowledge.

(09-24-2014, 06:34 AM)Blyss Sarania Wrote: ^ I disagree. Let's say I set mine to AVX and leave it (which is what I've done until now). Take Xenosaga I as an example. Cutscenes regularly drop to ~55 FPS for me in HW mode. Switching to SSE4.1 causes them to stay at 60 FPS. So if you are using something other than SSE4.1/AVX2, and you have a game that's running ~10% slow, switching to SSE4.1 or AVX2 will bring it to full speed.
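
For reference, the "~10% slow" figure can be checked with a little arithmetic. This is just a sketch; the helper names are made up for this example, and 60 FPS is taken as full speed because that is the figure used in the post.

```python
TARGET_FPS = 60  # full speed as used in the post above (assumed target)

def speed_percent(measured_fps, target_fps=TARGET_FPS):
    """Emulation speed as a percentage of full speed."""
    return measured_fps / target_fps * 100

def shortfall_percent(measured_fps, target_fps=TARGET_FPS):
    """How far below full speed the game is running, in percent."""
    return 100 - speed_percent(measured_fps, target_fps)

if __name__ == "__main__":
    print(f"55 FPS is {speed_percent(55):.1f}% speed, "
          f"{shortfall_percent(55):.1f}% short of full speed")
```

So a drop to 55 FPS is a bit over 8% short of full speed, which is in the same ballpark as the "~10%" quoted.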

(09-24-2014, 01:10 PM)nosisab Ken Keleh Wrote: Just an observation: it is technically incorrect to say "better" instruction set, because each newer set is an extension of the former. So when using the AVX plugin, one is just using a plugin expected to use AVX-defined instructions plus all the instructions from the previous sets.

That said, saying "better plugin" makes more sense, which must be what Blyss meant in the OP.

Edit: On the other hand, flagging a plugin as AVX/AVX2 effectively prevents it from being used by any non-supporting CPU, which is a BAD idea. For a long time we will see AVX extensions used in games only when SSEx options exist as well.

(09-24-2014, 08:05 PM)Bositman Wrote: The problem here is that the various SSE instructions are very, very case-bound. This means that, for example, game X could be using lots of SSE4-optimized code paths, making it much faster with that flavor of GSdx, while game Z uses none of them, making it run equally fast in both the SSE2 and SSE4 flavors.

In your case you analyzed how Burnout 3 behaves with the different versions of a plugin. A proper test (which doesn't exist, it would have to be synthetic) would use all code paths equally, so it would show which could potentially give the most speed boost. It would still not have any practical use though since the result could be radically different from game to game.

(09-24-2014, 09:04 PM)Blyss Sarania Wrote: 1. True, it doesn't show how a crap ton of different games will perform, you are right. But it shows that the test was at least valid, because there is an improvement between plugins. At the very least it shows that in software mode the newest instruction set will be faster than the older ones, or at least the same. So we can still draw the conclusion that for software you should use the highest one your CPU supports.

2. Based on how I understand it, it works like this:

Compile GSdx with SSE2: Some instructions get SSE2 optimized.
Compile GSdx with SSSE3: Some instructions get SSE2 optimized, and some more get SSSE3 optimized.
Compile GSdx with SSE4.1: Some get SSE2, some get SSSE3, and some more get SSE4.1

If the compiler is doing it "right", then GSdx SSSE3 should contain all the SSE2 optimizations that the SSE2 version does (except where SSSE3 optimizations would be faster than SSE2), plus the SSSE3 optimizations.

I'm almost certain that's how it works. I don't know if AVX is backwards inclusive or not though.
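
The cumulative model described above can be sketched roughly like this. It is only an illustration in Python, not how GSdx or its compiler actually picks code paths: the build list and the helper names (`implied_sets`, `can_run`, `best_build`) are made up for this example, and it assumes AVX and AVX2 are inclusive too, which the quoted post itself is unsure about.

```python
# Hypothetical ordering of the five GSdx builds discussed in this thread,
# oldest instruction set first. Assumption: each build may also use
# everything to its left.
GSDX_BUILDS = ["SSE2", "SSSE3", "SSE4.1", "AVX", "AVX2"]

def implied_sets(build):
    """Return the instruction sets a given build is allowed to use."""
    index = GSDX_BUILDS.index(build)
    return GSDX_BUILDS[: index + 1]

def can_run(build, cpu_supported):
    """A build only runs if the CPU supports every set it may use."""
    return all(name in cpu_supported for name in implied_sets(build))

def best_build(cpu_supported):
    """Pick the newest build the CPU can run, or None if there is none."""
    runnable = [b for b in GSDX_BUILDS if can_run(b, cpu_supported)]
    return runnable[-1] if runnable else None

if __name__ == "__main__":
    cpu = {"SSE2", "SSSE3", "SSE4.1", "AVX"}  # example CPU without AVX2
    print(implied_sets("SSSE3"))
    print(best_build(cpu))
```

Note that "newest runnable build" is not the same thing as "fastest build", which is exactly what the results in this thread call into question.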

May 05 [17:11:38] My question is, should I do SSE4 or AVX2?
May 05 [17:11:53] I always had that question when picking GSdx plugins
May 05 [17:12:06] I wasn't sure if by picking AVX2 it would use SSE4 for the non-SW renderer.
May 05 [17:12:11] or if I needed to pick it myself.
May 05 [17:12:17] AVX2 implies SSE4

OK so...what the hell is going on? GSdx-AVX2 should include all the previous instruction sets and yet there's cases where SSE4 outperforms it? The *****? Something must be wrong in how GSdx is being compiled or something.
Windows 10 Pro x64 Version 1909 | AMD Ryzen 5 5600X | GIGABYTE AORUS GeForce GTX 1080 Ti | Crucial 16GB (2x8GB) DDR4 3600 RAM | Samsung 850 EVO 500 GB SSD | WD Red Plus 8TB


#42
Quote:OK so...what the hell is going on? GSdx-AVX2 should include all the previous instruction sets and yet there's cases where SSE4 outperforms it? The *****? Something must be wrong in how GSdx is being compiled or something.

As I said in that post you quoted, I don't know if AVX is backwards inclusive like SSE is (e.g. SSE4.1 contains SSSE3 and SSE2, but does AVX? IDK).

pseudonym is implying there that it does; however, as you say, the results show otherwise.

I don't really know. I don't have an Intel CPU to run tests on. It could be that the game we used as a case study (Burnout 3: Takedown) is the reason why. Someone with AVX2 could run some tests in another game.

Edit: Also note the HW tests:

Quote:Nobbs66:
SSE2: 66 FPS
SSSE3: 64 FPS
SSE4.1: 69 FPS
AVX: 66 FPS
AVX2: 68 FPS

These are actually probably within the margin of error. We were able to duplicate them, but a 2-3 FPS difference is still not definitive (unlike the SW mode tests, where the difference was ~20% in some cases).
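
To put numbers on that, here is a quick sketch that expresses each HW result as a deviation from the average of the five. The FPS values are the ones quoted above; the function name is made up for this example, and comparing against the mean is just one reasonable way to look at it, not a proper statistical test.

```python
# HW-mode results quoted above (Nobbs66), in FPS.
hw_fps = {"SSE2": 66, "SSSE3": 64, "SSE4.1": 69, "AVX": 66, "AVX2": 68}

def deviations_from_mean(results):
    """Return each result's deviation from the mean, in percent of the mean."""
    mean = sum(results.values()) / len(results)
    return {name: (fps - mean) / mean * 100 for name, fps in results.items()}

if __name__ == "__main__":
    for name, deviation in deviations_from_mean(hw_fps).items():
        print(f"{name}: {deviation:+.1f}%")
```

Every build lands within about 4% of the 66.6 FPS average, so a single run per build is not enough to rank them in HW mode.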
Gaming Rig: Intel i7 6700k @ 4.8Ghz | GTX 1070 TI | 32GB RAM | 960GB(480GB+480GB RAID0) SSD | 2x 1TB HDD
#43
I think you forget the compiler... especially the optimization... That AVX includes SSE4.1 instructions (on code level) doesn't mean that they were both optimized in the same way (on assembler level). For stable releases we even have PGO...
#44
Quote:I think you forget the compiler... especially the optimization... That AVX includes SSE4.1 instructions (on code level) doesn't mean that they were both optimized in the same way (on assembler level).

Good point.
#45
[13:21:06] well 2 is not that much. I think it might be worth comparing SSE4.1 and AVX2 with 4 extra threads

Could we get vsub in here to test SSE4.1 vs AVX2?
#46
Margin of error is 2%-5% due to low-power features, cache access, and unexpected memory transfers. That is what I see on Linux, but it is likely the same on Windows.

Normally SSE is included in AVX, except when AVX replaces a full chunk of code, like the rasterizer in the SW renderer.

The FMA instruction was disabled for Intel, but maybe AMD can execute it faster. However, the result is strange: normally AVX2 allows computing 8 pixels per instruction instead of 4.
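
The "8 pixels instead of 4" figure follows from register width. A rough sketch, assuming 32 bits per pixel value (an assumption for this example; the post doesn't say which data width the rasterizer works on) and using made-up helper names:

```python
SSE_REGISTER_BITS = 128   # XMM registers, used by the SSE2..SSE4.1 builds
AVX2_REGISTER_BITS = 256  # YMM registers, used by the AVX2 build

def lanes(register_bits, element_bits=32):
    """How many elements of the given width fit in one register."""
    return register_bits // element_bits

def instructions_needed(pixel_count, register_bits, element_bits=32):
    """Instructions needed to touch every pixel once (ceiling division)."""
    per_instruction = lanes(register_bits, element_bits)
    return -(-pixel_count // per_instruction)

if __name__ == "__main__":
    print(lanes(SSE_REGISTER_BITS), lanes(AVX2_REGISTER_BITS))
    print(instructions_needed(640, SSE_REGISTER_BITS),
          instructions_needed(640, AVX2_REGISTER_BITS))
```

Halving the instruction count is only the theoretical best case; memory access and the rest of the pipeline decide how much of it shows up as FPS.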
#47
Cool thread. I just noticed a bug today in the new-generation GSdx plugins.
I don't know if it's a hardware compatibility problem or an emulator problem.

PCSX2 v1.5.0-dev-2226

I have tested all the v2 plugins, and they all have the same graphical glitch.
The whole room is blurry, and the models' aspect and graphical offsets look out of scale.


It doesn't matter if I change Interlacing, the stock native resolutions, or any other setting like CRC Hacks/Maps.
I tried changing the Emulation settings and Speedhacks to +0, but the problem still remains.
It looks like all the v2 plugins have a graphical offset problem.

The glitches exist in many other games with the same graphics engine, like MK Shaolin Monks and Onimusha.
God of War 2 has the same problem, but it is fixed when I use CRC Aggressive, though only in that game.

PCSX2 v1.5.0-dev-2226
Here are the pictures:

The problem is fixed only with the classic AVX version 1.
After I imported the AVX plugin from pcsx2-v1.5.0-dev-1676, the graphics were fixed.

I don't know if this is the right thread to post in; I can move my post.
Thanks!

My hardware:

CPU-Z INFO

Processor: AMD Ryzen 7 1700 OC 3965.0MHz
Memory: Corsair DDR4 Dual 16GB OC 3200MHz
GPU: nVidia Asus Strix GTX960 4GB OC 1227MHz
#48
You need to make a new thread for your issue. This is a really old and unrelated thread. But you should try the Half-Pixel Offset hack (in the Advanced Settings & Hacks menu of the GSdx plugin).
#49
^
^
^
RE4 is "notorious" for being blurry... don't use a custom resolution... use custom 1024x1024.
Already tried that because I have a 4K monitor.
Main PC1:i5-4670,HD7770(Active!)
Main PC2:i5-11600K,GTX1660Ti(Active!)
PCSX2 Discord server IGN:smartstrike
PCSX2 version uses:Custom compiled build 1.7.0 64-bit(to be update regularly)
smartstk's YouTube Channel
#50
(01-19-2018, 02:15 AM)FlatOut Wrote: You need to make a new thread for your issue. This is a really old and unrelated thread. But you should try the Half-Pixel Offset hack (in the Advanced Settings & Hacks menu of the GSdx plugin).

I think this is a megathread, so there's no need to close this thread.



