Comparing GSdx SSE2/SSSE3/SSE4.1/AVX/AVX2
#21
Really? I never saw that when we were testing. But still, the recommendation isn't to use SSE2, it's to use SSE4.1.
Gaming Rig: Intel i7 6700k @ 4.8Ghz | GTX 1070 TI | 32GB RAM | 960GB(480GB+480GB RAID0) SSD | 2x 1TB HDD
#22
Yeah, I was wrong. It's full boot. Makes me wonder how many times I used full boot on my old PC.
#23
It's better for those who have SSE4.1 and AVX2 instructions! Actually, AVX instructions are better for software rendering (I've known that for a long time)! But AVX isn't found in most low- to mid-end rigs! So if someone doesn't have SSE4.1 and AVX, then it's better to use SSE2!
Core i3 9100f 3.6Ghz
RAM=8GB
nvidia GT 1030
pcsx2 version-1.3.1  
#24
Thanks for the information. Does this mean for Intel CPUs that don't support AVX2, SSE4.1 should be used in hardware mode instead of AVX?

Also, why don't you use 3 extra rendering threads? Isn't it faster than using only 2 extra rendering threads on 4-core+ CPUs?
#25
Just an observation: it is technically incorrect to say "better" instruction set, because each newer set is an extension of the former. So when using the AVX plugin, one is just using a plugin expected to use the AVX-defined instructions plus all the instructions from the previous sets.

On the other hand, saying "better plugin" makes more sense, which must be what Blyss meant in the OP.

Edit: On the other hand, flagging a plugin as AVX/AVX2 effectively prevents it from being used on any non-supporting CPU, which is a BAD idea. For a long time we will see AVX extensions used in games only when SSEx options exist as well.
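
To make the point about flagging concrete, here is a minimal Python sketch. It assumes a strictly ordered list of plugin flavors in which each one requires every instruction set before it, and it takes the CPU feature flags as a hand-typed set. The names `FLAVORS`, `usable_flavors` and `best_flavor` are made up for illustration; nothing here detects the real CPU or reflects how PCSX2 itself picks a plugin.

```python
# Hypothetical ordering of plugin flavors, oldest to newest.
# Assumption: each flavor also needs every instruction set listed before it.
FLAVORS = ["sse2", "ssse3", "sse4.1", "avx", "avx2"]


def usable_flavors(cpu_features):
    """Return the flavors this CPU can run: the leading run of supported levels."""
    usable = []
    for flavor in FLAVORS:
        if flavor not in cpu_features:
            # A build for this level (or any newer one) may contain
            # instructions the CPU cannot execute, so stop here.
            break
        usable.append(flavor)
    return usable


def best_flavor(cpu_features):
    """Pick the newest usable flavor, or None if even SSE2 is missing."""
    usable = usable_flavors(cpu_features)
    return usable[-1] if usable else None


# Illustrative feature sets, typed in by hand (not detected).
old_cpu = {"sse2", "ssse3", "sse4.1"}
new_cpu = {"sse2", "ssse3", "sse4.1", "avx", "avx2"}

print(best_flavor(old_cpu))
print(best_flavor(new_cpu))
```

With only an AVX build on offer, `old_cpu` would have nothing to run, which is why shipping the SSEx flavors alongside it matters.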
Imagination is where we are truly real
#26
comment deleted
We're supposed to be working as a team, if we aren't helping and suggesting things to each other, we aren't working as a team.
- Refraction
#27
Nice testing to both of you, but you picked a pretty bad area, unfortunately.

The problem here is that the benefit of the various SSE instruction sets is very, very case-dependent. For example, game X could be using lots of SSE4-optimized code paths, making it much faster with that flavor of GSdx, while game Z uses none of them, making it run equally fast in both the SSE2 and SSE4 flavors.

In your case you analyzed how Burnout 3 behaves with the different versions of a plugin. A proper test (which doesn't exist, it would have to be synthetic) would use all code paths equally, so it would show which could potentially give the most speed boost. It would still not have any practical use though since the result could be radically different from game to game.
#28
(09-24-2014, 12:18 PM)xemnas99 Wrote: Thanks for the information. Does this mean for Intel CPUs that don't support AVX2, SSE4.1 should be used in hardware mode instead of AVX?

Also, why don't you use 3 extra rendering threads? Isn't it faster than using only 2 extra rendering threads on 4-core+ CPUs?

Well, we compared with 2 threads for the test. Normally I use 3. We chose 2 for the test because Nobbs' CPU is a 4-core, and we wanted to both use the same number of threads.

And yeah, based on this, unless your CPU supports AVX2, you should use SSE4.1 for hardware mode.

(09-24-2014, 08:05 PM)Bositman Wrote: Nice testing to both of you, but you picked a pretty bad area, unfortunately.

The problem here is that the benefit of the various SSE instruction sets is very, very case-dependent. For example, game X could be using lots of SSE4-optimized code paths, making it much faster with that flavor of GSdx, while game Z uses none of them, making it run equally fast in both the SSE2 and SSE4 flavors.

In your case you analyzed how Burnout 3 behaves with the different versions of a plugin. A proper test (which doesn't exist, it would have to be synthetic) would use all code paths equally, so it would show which could potentially give the most speed boost. It would still not have any practical use though since the result could be radically different from game to game.

I understand what you are saying, but I can also say that at least the fact that SSE4.1 is faster in hardware for me holds up across several different games.

The fact that the SSE code with no extra threads scales pretty linearly from SSE2 to SSSE3, to SSE4.1, to AVX shows that this is a pretty good test case for that too.

Also remember that SSSE3 is an "extension" of SSE2, which means SSSE3 contains the optimizations SSE2 does + more. SSE4.1 contains those + more.

Now, I don't know about AVX, but generally newer SSE should be faster than older, because it contains all the instructions of its predecessors + the new ones. At least that's how I understand it.

Wikipedia Wrote:SSE2, Willamette New Instructions (WNI), introduced with the Pentium 4, is a major enhancement to SSE. SSE2 adds new math instructions for double-precision (64-bit) floating point and also extends MMX integer instructions to operate on 128-bit XMM registers. Until SSE2, SSE integer instructions introduced with later SSE extensions could still operate on 64-bit MMX registers because the new XMM registers require operating system support. SSE2 enables the programmer to perform SIMD math on any data type (from 8-bit integer to 64-bit float) entirely with the XMM vector-register file, without the need to use the legacy MMX or FPU registers. It offers an orthogonal set of instructions for dealing with common data types.

SSE3, also called Prescott New Instructions (PNI), is an incremental upgrade to SSE2, adding a handful of DSP-oriented mathematics instructions and some process (thread) management instructions.

SSSE3, Merom New Instructions (MNI), is an incremental upgrade to SSE2, adding 16 new instructions which include permuting the bytes in a word, multiplying 16-bit fixed-point numbers with correct rounding, and within-word accumulate instructions. SSSE3 is often mistaken for SSE4 as this term was used during the development of the Core microarchitecture.

SSE4, Penryn New Instructions (PNI), is another major enhancement, adding a dot product instruction, additional integer instructions, a popcnt instruction, and more.

AVX (Advanced Vector Extensions), Gesher New Instructions (GNI), is an advanced version of SSE announced by Intel featuring a widened data path from 128 bits to 256 bits and 3-operand instructions (up from 2). Intel released processors in early 2011 with AVX support. AVX requires support from the operating system. AVX cannot be used on older operating systems like Windows XP or Windows Vista, even if the CPU supports AVX.
#29
(09-24-2014, 08:47 PM)Blyss Sarania Wrote: I understand what you are saying, but I can also say that at least the fact that SSE4.1 is faster in hardware for me holds up across several different games.

The fact that the SSE code with no extra threads scales pretty linearly from SSE2 to SSSE3, to SSE4.1, to AVX shows that this is a pretty good test case for that too.

Also remember that SSSE3 is an "extension" of SSE2, which means SSSE3 contains the optimizations SSE2 does + more. SSE4.1 contains those + more.

Now, I don't know about AVX, but generally newer SSE should be faster than older, because it contains all the instructions of its predecessors + the new ones. At least that's how I understand it.

Not really. Just because this game scales linearly from SSE2 to SSE4 tells us nothing about the general behavior of, let's say, 300 games. Some other game might get a 0% boost from SSE2->SSSE3 and a 10% boost from SSSE3->SSE4, or whichever other crazy scenario you can imagine.
I'm afraid that to draw any kind of realistic conclusions, a huge game base would have to be used, so you can deduce how most of the games tend to behave.

Not sure if newer->faster is true for GSdx, meaning whether in the SSE4 flavor the same optimizations that were done for SSSE3 are redone using SSE4 to make them faster, or whether you get totally different optimizations which could not be done with SSSE3.

I think rama knows some stuff about this, maybe ask him.
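
A rough sketch of what such a comparison over many games could look like, in Python. The frame rates are invented placeholders, not measurements from this thread or from any real game, and the helper names `step_boosts` and `boost_range_per_step` are hypothetical.

```python
# Frame rates below are invented placeholders for illustration only;
# they are NOT measurements from this thread or from any real game.
FLAVORS = ["SSE2", "SSSE3", "SSE4.1"]
fps_by_game = {
    "game_a": [40.0, 44.0, 48.0],  # gains at every step
    "game_b": [50.0, 50.0, 55.0],  # 0% then 10%, like the scenario described above
}


def step_boosts(fps):
    """Percentage gain of each flavor over the previous one."""
    return [(new - old) * 100.0 / old for old, new in zip(fps, fps[1:])]


def boost_range_per_step(results):
    """For each flavor step, the (min, max) gain seen across all games."""
    per_game = [step_boosts(fps) for fps in results.values()]
    return [(min(step), max(step)) for step in zip(*per_game)]


for game, fps in fps_by_game.items():
    print(game, [round(b, 1) for b in step_boosts(fps)])
print(boost_range_per_step(fps_by_game))
```

The wider the (min, max) range for a step, the less one game's result says about the others, so a single-game test can only suggest a trend.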
#30
(09-24-2014, 08:59 PM)Bositman Wrote: Not really. Just because this game scales linearly from SSE2 to SSE4 tells us nothing about the general behavior of, let's say, 300 games. Some other game might get a 0% boost from SSE2->SSSE3 and a 10% boost from SSSE3->SSE4, or whichever other crazy scenario you can imagine.
I'm afraid that to draw any kind of realistic conclusions, a huge game base would have to be used, so you can deduce how most of the games tend to behave.

Not sure if newer->faster is true for GSdx, meaning whether in the SSE4 flavor the same optimizations that were done for SSSE3 are redone using SSE4 to make them faster, or whether you get totally different optimizations which could not be done with SSSE3.

I think rama knows some stuff about this, maybe ask him.

1. True, it doesn't show how a crap ton of different games will perform; you are right. But it shows that the test was at least valid, because there is an improvement between plugins. At the very least it shows that for software mode the newest instruction set will be faster than the older ones, or at least the same. So we can still draw the conclusion that for software you should use the highest one your CPU supports.

2. Based on how I understand it, it works like this:

Compile GSdx with SSE2: Some instructions get SSE2 optimized.
Compile GSdx with SSSE3: Some instructions get SSE2 optimized, and some more get SSSE3 optimized.
Compile GSdx with SSE4.1: Some get SSE2, some get SSSE3, and some more get SSE4.1.

If the compiler is doing it "right", then the SSSE3 build of GSdx should contain all the SSE2 optimizations that the SSE2 build does (except where an SSSE3 optimization would be faster than the SSE2 one), plus the SSSE3 optimizations.

I'm almost certain that's how it works. I don't know if AVX is backwards inclusive or not though.
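
The model described above can be sketched in Python like this. It only illustrates that understanding and is not confirmed GSdx or compiler behavior. The routine names in `VARIANTS` are made up and are not GSdx functions, and the `LEVELS` ordering assumes each level includes everything before it.

```python
# Assumed ordering; each level is taken to include everything before it.
LEVELS = ["SSE2", "SSSE3", "SSE4.1"]

# Hypothetical routines and the levels at which an optimized variant exists.
# These names are made up for illustration; they are not GSdx functions.
VARIANTS = {
    "blend_pixels": ["SSE2", "SSE4.1"],
    "shuffle_bytes": ["SSE2", "SSSE3"],
    "clamp_colors": ["SSE2"],
}


def build(target):
    """For each routine, choose the newest variant that is not newer than the target."""
    allowed = LEVELS[: LEVELS.index(target) + 1]
    return {
        routine: max((lvl for lvl in levels if lvl in allowed), key=LEVELS.index)
        for routine, levels in VARIANTS.items()
    }


for level in LEVELS:
    print(level, build(level))
```

Under this model a higher target never picks an older variant than a lower target does, so a newer build would be at worst equal, provided each newer variant really is faster than the one it replaces.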