AMD SSE2 plugin speeds
#1
I'm shocked at the performance difference between the SSE2 and SSE3/SSSE3/SSE4 plugins. I have a Phenom II X2 555 BE clocked @ 4.4 GHz and it still doesn't run PCSX2 as smoothly as Intel CPUs with the SSE3/SSSE3/SSE4 plugins. Can't the devs create a plugin that supports AMD's SSE3 features?
Mobo: EVGA X58 SLI LE
CPU: Intel i7 920 C0 @ 4.2 Ghz 1.36v Cogage Arrow
Ram: 3x2GB OCZ Gold 1690 9-9-8-24 1.65v
GPU: MSI GTX580 Lightning @ 970/2200 1.09v+MSI GTX460 Hawk PhysX
HDD: Corsair Force GT 120, 2x F4 320GB Raid 0,F4 2TB, WD-G 1TB
PSU: Corsair HX850 80 PLUS SILVER Modular
Case: Antec 1200 EVGA Mod
#2
SSE3 has no features that work well in this case. Also, it's not just the instruction set(s) that give Intel the advantage over AMD's best architecture (for now).

Have you heard of AMD's Bulldozer? It adds the instruction sets AMD has lacked so far, plus a few other unique changes. Maybe you could look to that for PCSX2 salvation in June (if you're an AMD fan). :)
#3
My Intel CPU supports SSSE3 and I use it with GSdx, and the difference between that and SSE2 is almost nil most of the time.

You can check the link below to see that the SSE version doesn't really change the outcome. Newer Intel CPUs (i3/i5/i7), however, are simply faster than the older architectures (Phenom II and Core 2) at the same clocks. You're not missing out on much, but if you want more speed you may need to wait for the next generation of AMD CPUs (we're not even sure they'll be faster, but at least they should have SSSE3/SSE4.1, if that matters much to you).
http://forums.pcsx2.net/Thread-CPU-Bench...d-on-FFX-2
Core i5 3570k -- Geforce GTX 670  --  Windows 7 x64
#4
OK, so it's the Intel architecture, not the SSE capability. But it doesn't seem to make as much difference in PC games. I mean, a Phenom II dual core @ 4.4 GHz will outperform a Core 2 Duo @ 3.5 GHz running PC games, but with PCSX2 it's the other way around. Why is that? Just because it was designed to work with the Intel architecture in the first place?

And I do have a capable Intel system to run PCSX2 (look in sig), but I'm just shocked at the difference between Intel and AMD when it comes to PCSX2. I mean, @ 4.4 GHz I should be able to run any PS2 game.
#5
As mentioned, the instruction sets aren't the real reason Intel CPUs are faster. It's mainly just a better architecture, so even when the clock speed is lower, the CPU still performs better.

SSE3 and SSSE3 aren't very useful instruction sets, so it's hard to find places to make use of them.
SSE4.1 is pretty useful, however, and PCSX2 and GSdx use it to speed up some things.
The SSE4.1 optimizations probably give around a 5% speedup in games (some things, like VIF unpacks, were greatly optimized with SSE4.1), so it's probably equivalent to overclocking your CPU an extra 100~200 MHz.
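
To put rough numbers on that comparison, here is a minimal sketch. It assumes emulation speed scales linearly with clock speed, which is only an approximation, and `equivalent_overclock_mhz` is a made-up helper name, not anything from PCSX2.

```cpp
#include <cassert>

// Clock increase (in MHz) that would match a given fractional speedup,
// ASSUMING performance scales linearly with clock speed.
// Hypothetical helper for illustration only.
inline double equivalent_overclock_mhz(double base_ghz, double speedup_fraction) {
    assert(base_ghz > 0.0);
    return base_ghz * 1000.0 * speedup_fraction;
}
```

With a 5% speedup (`speedup_fraction = 0.05`) this gives about 100 MHz for a 2.0 GHz CPU and about 200 MHz for a 4.0 GHz CPU, which lines up with the 100~200 MHz range.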

Anyway, I don't have an SSE4.1 CPU, so I haven't actually tested this stuff myself, but I do know most of the places where PCSX2 uses SSE4.1 optimizations (I implemented most of them). Some common routines that take 3~5 instructions with SSE1+SSE2 can be done in 1~2 instructions with SSE4.
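
As an illustration of that kind of saving (not a routine taken from PCSX2, just a well-known case), a signed 32-bit packed minimum takes four SSE2 instructions but is a single `pminsd` with SSE4.1. The function names below are hypothetical, and the SSE4.1 version is only compiled when the compiler is targeting SSE4.1 (for example with `-msse4.1` on GCC/Clang).

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>   // SSE2 intrinsics
#ifdef __SSE4_1__        // defined by GCC/Clang when SSE4.1 is enabled
#include <smmintrin.h>   // SSE4.1 intrinsics
#endif

// Packed signed 32-bit minimum using only SSE2:
// pcmpgtd + pand + pandn + por (four instructions).
inline __m128i min_epi32_sse2(__m128i a, __m128i b) {
    __m128i a_gt_b = _mm_cmpgt_epi32(a, b);             // all-ones lanes where a > b
    __m128i take_b = _mm_and_si128(a_gt_b, b);          // b where a > b
    __m128i take_a = _mm_andnot_si128(a_gt_b, a);       // a where a <= b
    return _mm_or_si128(take_b, take_a);
}

#ifdef __SSE4_1__
// Same operation with SSE4.1: a single pminsd.
inline __m128i min_epi32_sse41(__m128i a, __m128i b) {
    return _mm_min_epi32(a, b);
}
#endif

// Convenience wrapper (hypothetical): element-wise min of two 4-element arrays.
inline void min4(const int32_t* a, const int32_t* b, int32_t* out) {
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), min_epi32_sse2(va, vb));
}
```

Both versions return the same result; the SSE2 one just needs a compare and three logic ops to get there.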
Check out my blog: Trashcan of Code
#6
(05-15-2011, 07:38 AM)vdgamer Wrote: OK, so it's the Intel architecture, not the SSE capability. But it doesn't seem to make as much difference in PC games. I mean, a Phenom II dual core @ 4.4 GHz will outperform a Core 2 Duo @ 3.5 GHz running PC games, but with PCSX2 it's the other way around. Why is that? Just because it was designed to work with the Intel architecture in the first place?

And I do have a capable Intel system to run PCSX2 (look in sig), but I'm just shocked at the difference between Intel and AMD when it comes to PCSX2. I mean, @ 4.4 GHz I should be able to run any PS2 game.

Hmm. Well, PCSX2 uses the DaZ (denormals-are-zero) and FtZ (flush-to-zero) flags for SSE, and relies heavily on SSE optimizations.

Setting the DaZ and FtZ flags is a huge speedup on Intel CPUs. On AMD CPUs it's a decent speedup, but not as huge as on Intel's (at least it wasn't with the AMD X2 architecture).

A lot of PC games probably don't set the DaZ/FtZ flags for SSE.
Also, PC games might not have as many SSE optimizations, since most of the code that gets executed is most likely high-level C/C++ compiled code. Compilers aren't very good at utilizing SSE themselves. Usually they use the x87 FPU for floating-point computations, and when they do use SSE it's just for some small stuff. You can't really expect the compiler to do clever vector optimizations as well as a human can.
PCSX2 has huge blocks of code that are pretty much pure SSE (the VU recompilers generate code that is probably around 90% SSE instructions).

So I believe that explains the difference in results.
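
For anyone wondering what setting those flags looks like, here is a minimal sketch using the standard SSE intrinsics macros from `<xmmintrin.h>` and `<pmmintrin.h>`. This is not PCSX2's actual code, and the helper names are made up. It assumes an x86 CPU that supports DaZ (all x86-64 CPUs do). The flags live in the MXCSR register, which is per-thread, so they have to be set on every thread that runs SSE code.

```cpp
#include <cassert>
#include <limits>
#include <xmmintrin.h>   // _MM_SET_FLUSH_ZERO_MODE, _mm_mul_ss, ...
#include <pmmintrin.h>   // _MM_SET_DENORMALS_ZERO_MODE

// Turn on FtZ (denormal results become 0) and DaZ (denormal inputs are
// read as 0) for the calling thread. Hypothetical helper names.
inline void enable_ftz_daz() {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}

// Restore the default IEEE behaviour (denormals are kept).
inline void disable_ftz_daz() {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_OFF);
}

// Halve the smallest normal float with a scalar SSE multiply.
// The exact result is a denormal, so what comes back depends on FtZ.
inline float half_of_smallest_normal() {
    // volatile keeps the compiler from folding the multiply at compile time
    volatile float tiny = std::numeric_limits<float>::min();
    __m128 r = _mm_mul_ss(_mm_set_ss(tiny), _mm_set_ss(0.5f));
    return _mm_cvtss_f32(r);
}
```

With the flags off, `half_of_smallest_normal()` returns a tiny non-zero denormal; after `enable_ftz_daz()` it returns exactly zero. Denormal inputs and results typically go through a slow path on x86 CPUs, which is where the speedup from these flags comes from.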
#7
Quote: OK, so it's the Intel architecture, not the SSE capability. But it doesn't seem to make as much difference in PC games. I mean, a Phenom II dual core @ 4.4 GHz will outperform a Core 2 Duo @ 3.5 GHz running PC games, but with PCSX2 it's the other way around. Why is that? Just because it was designed to work with the Intel architecture in the first place?

Tried playing StarCraft 2 lately?
How about Civilization 5?

Cross-platform garbage is made for the lowest common denominator; you can't measure a processor's might based on such games.

Intel dominates (currently available) AMD 3:1 in core-for-core performance, which is what matters most to PCSX2.
#8
Well, I think a Phenom II vs. a Core 2 Duo with that much of a clock difference is going to come out in AMD's favor in PCSX2, but the concept is correct.

Right on, cotton. The AMD vs. Intel gap in PCSX2 is very much about DaZ, as rama said.
#9
(05-15-2011, 07:38 AM)vdgamer Wrote: I mean, a Phenom II dual core @ 4.4 GHz will outperform a Core 2 Duo @ 3.5 GHz running PC games, but with PCSX2 it's the other way around.

It isn't the other way around. Did you check the CPU benchmark thread?

Here are the results closest to the clocks you mentioned, and the Phenom II still beats the Core 2:
Quote:
60.84 FPS - SLUS 20672 - AMD Phenom II X6 1090T - 4.2 GHz OC - UnrealChrisG
54.98 FPS - SLUS 20672 - Intel Core 2 Duo E7200 - 3.52 GHz OC - cyber

And at the same clocks they still have similar performance:
Quote:
58.61 FPS - SLUS 20672 - Intel Core 2 Duo E7200 - 3.8 GHz OC - pcsx2fan
57.66 FPS - SLUS 20672 - AMD Phenom II X4 955 - 3.8 GHz OC - Ryner Lute

In that case the Intel Core 2 has SSE4.1 and the Phenom II only runs the SSE2 build, yet the difference is less than 2% and could probably vanish with different RAM. Either way, the performance difference between Phenom II and Core 2 isn't large at all.

Again, the only ones with a huge gap between Intel and AMD are the Core i3/i5/i7, because of the newer architecture.
And the lowest results on a performance-per-clock basis are the Athlon 64 X2s, because they're an even older architecture. Intel's Pentium D/4 would also be down there competing for lowest performer, though :P
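
If you want to redo the performance-per-clock comparison yourself, here is a minimal sketch that uses only the four SLUS 20672 results quoted above. The struct and function names are made up for illustration, and FPS per GHz is a crude metric that ignores RAM, GPU and settings.

```cpp
#include <cassert>

// One row from the benchmark lists quoted above.
struct BenchResult {
    const char* cpu;
    double fps;
    double ghz;
};

// Crude performance-per-clock figure.
inline double fps_per_ghz(const BenchResult& r) {
    assert(r.ghz > 0.0);
    return r.fps / r.ghz;
}

// How much faster fps_a is than fps_b, in percent of fps_b.
inline double percent_faster(double fps_a, double fps_b) {
    assert(fps_b > 0.0);
    return (fps_a - fps_b) / fps_b * 100.0;
}

// The four NTSC (SLUS 20672) results from the quotes above.
const BenchResult kResults[] = {
    {"AMD Phenom II X6 1090T", 60.84, 4.20},
    {"Intel Core 2 Duo E7200", 54.98, 3.52},
    {"Intel Core 2 Duo E7200", 58.61, 3.80},
    {"AMD Phenom II X4 955",   57.66, 3.80},
};
```

All four land at roughly 14.5 to 15.6 FPS per GHz, and `percent_faster(58.61, 57.66)` for the 3.8 GHz pair comes out at about 1.6%.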



Edit: Now this one's interesting, from the PAL version:
Quote:
57.04 FPS - SLES 51815 - Intel Core i7 870 - 2.93 GHz Stock - hallmark
52.37 FPS - SLES 51815 - Intel Core 2 Duo E8400 - 3.6 GHz OC - bositman
52.12 FPS - SLES 51815 - AMD Phenom II X4 955 - 3.8 GHz OC - sakraycore
50.55 FPS - SLES 51815 - Intel Core 2 Duo E8300 - 3.6 GHz OC - boogerthe2nd

It seems boogerthe2nd has slightly worse RAM than bosit, or was running more background apps or something (the E8300 and E8400 aren't really different and should perform the same at the same clocks in the same environment). Anyway, the performance is still very comparable between Phenom II and Core 2, within a small margin of error, and the Core i7 simply beats them with ease, even at a much lower clock :P
#10
(05-15-2011, 07:56 AM)cottonvibes Wrote: Compilers aren't very good at utilizing SSE themselves. Usually they use the x87 FPU for floating-point computations, and when they do use SSE it's just for some small stuff. You can't really expect the compiler to do clever vector optimizations as well as a human can.
MS VC doesn't even do auto-vectorizing, but doesn't enabling SSE in the compiler make it do FP operations as scalar SSE operations, so it can use an XMM register instead of the x87 register stack, or something like that? This is how 64-bit "obsoletes" x87. GCC has an option to use both x87 and SSE so it can use both sets of registers.
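
For what it's worth, this is roughly what "scalar SSE" means, written out with intrinsics. It's a minimal sketch with made-up function names: `add_scalar_sse` does a single-float add in an XMM register via `addss`, which is the kind of instruction a compiler emits for a plain `a + b` when SSE floating point is enabled (for example `/arch:SSE2` on 32-bit MSVC or `-msse2 -mfpmath=sse` on 32-bit GCC; x86-64 compilers do this by default). The GCC option that allows both units is `-mfpmath=sse,387`.

```cpp
#include <cassert>
#include <xmmintrin.h>   // SSE: _mm_set_ss, _mm_add_ss, _mm_cvtss_f32

// Scalar float add done explicitly in an XMM register (addss).
// Only the lowest lane of the 128-bit register is used.
inline float add_scalar_sse(float a, float b) {
    return _mm_cvtss_f32(_mm_add_ss(_mm_set_ss(a), _mm_set_ss(b)));
}

// Plain C++ add: whether this becomes an x87 or an SSE instruction
// depends on the target and the compiler flags mentioned above.
inline float add_plain(float a, float b) {
    return a + b;
}
```

Both functions give the same answer for ordinary values; the difference is only in which register file and instructions end up doing the work.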