Poll: AVX1 64 bits vs AVX1 32 bits
slower : - 10%         2 votes (5.56%)
same : +/- 5%         10 votes (27.78%)
faster : + 10%        13 votes (36.11%)
much faster : + 20%    5 votes (13.89%)
on fire : + 50%        6 votes (16.67%)
Total                 36 votes (100%)

GS - Software mode: Wanna bet on 64 bits performance
#11
Well, obviously more bits means more performance :P /s

(I actually voted +10% b/c I'm hopeful)


#12
Even if it's just +5%, that's still better than nothing.
Model: Clevo P570WM Laptop
GPU: GeForce GTX 980M ~8GB GDDR5
CPU: Intel Core i7-4960X CPU +4.2GHz (12 CPUs)
Memory: 32GB Corsair Vengeance DDR3L 1600MHz, 4x8gb
OS: Microsoft Windows 7 Ultimate
#13
On fire baby!

I think it's gonna be slower


Are you saying 32-bit needs an extra mov instruction? Or 64-bit?
#StopRNG
#14
Quote: Are you saying 32-bit needs an extra mov instruction? Or 64-bit?
In 32-bit mode you only have 8 logical registers, so you need more mov instructions from/to memory than in 64-bit mode, which has 16 logical registers. Reality is more complex, and the CPU is able to remove some of those mov instructions.
#15
I think that over time 64 bits can become more useful than 32.
CPU: Intel Core i7 4790K @ 4 GHz
CPU Cooler: Corsair H75
MoBo: MSI Z97S SLI KRAIT  
GPU: AMD Radeon R9 290X
RAM: Corsair 16 GB 1600 MHz

Desktop
[Image: 1086677.png]

Laptop
[Image: 872520.png]
#16
Only time will tell for sure :)

I managed to implement linear filtering right yesterday. Unfortunately, a bug (or two if I'm unlucky) remains in some shaders. I need to fix that before running the benchmark.
#17
Bug found and corrected. Rendering is close to the 32-bit one (except that there is no mipmap support). I need to tune the code a bit to make better use of some registers. Then benchmark ^^
#18
And, sadly, the winner is FlatOut with -10%!

The code can potentially be optimized further, but it certainly won't be faster than 32 bits (at least not with AVX). So far the extra costs seem to be:
* the extra prefix for 64-bit operations. The overhead ought to be low here, as we barely use them.
* the overhead of computing the address [reg + offset]. A 64-bit addition is likely slower than a 32-bit one.
#19
By the way, I made an interesting 32 bits change (in my PR https://github.com/PCSX2/pcsx2/pull/1664 )
The AVX1/SSE4/SSSE3 selection (NOT AVX2) for the SW renderer will be made by runtime CPU detection rather than by plugin selection.

It would be nice to run some benchmarks of the SSE2 vs SSSE3 vs SSE4 vs AVX GSdx builds on both the HW and SW renderers.
SSSE3 seems rather worthless, and likely AVX too.
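
For readers who want to picture what runtime selection means here, below is a minimal sketch. It is not the code from the PR: every name in it (SimdLevel, detect_simd_level, select_scanline_fn, the fill_* kernels) is hypothetical, and it assumes a GCC or Clang build on x86, where the __builtin_cpu_supports builtin is available. On anything else it simply falls back to the SSE2 baseline. The kernels are plain scalar stand-ins so the sketch stays portable.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; nothing here is taken from GSdx.
enum class SimdLevel { SSE2, SSSE3, SSE41, AVX };

// Ask the CPU at run time which instruction sets it supports.
// Assumption: GCC/Clang on x86. Otherwise report the SSE2 baseline.
inline SimdLevel detect_simd_level() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))    return SimdLevel::AVX;
    if (__builtin_cpu_supports("sse4.1")) return SimdLevel::SSE41;
    if (__builtin_cpu_supports("ssse3"))  return SimdLevel::SSSE3;
#endif
    return SimdLevel::SSE2;
}

// One entry point per instruction set, all sharing the same signature.
using ScanlineFn = void (*)(std::uint32_t* dst, std::size_t count,
                            std::uint32_t color);

// Scalar stand-in. A real renderer would have one kernel per instruction
// set, each written with the matching intrinsics.
static void fill_scalar(std::uint32_t* dst, std::size_t count,
                        std::uint32_t color) {
    for (std::size_t i = 0; i < count; ++i) dst[i] = color;
}
static void fill_sse2(std::uint32_t* d, std::size_t n, std::uint32_t c)  { fill_scalar(d, n, c); }
static void fill_ssse3(std::uint32_t* d, std::size_t n, std::uint32_t c) { fill_scalar(d, n, c); }
static void fill_sse41(std::uint32_t* d, std::size_t n, std::uint32_t c) { fill_scalar(d, n, c); }
static void fill_avx(std::uint32_t* d, std::size_t n, std::uint32_t c)   { fill_scalar(d, n, c); }

// Map the detected level to a kernel. Intended to be called once at
// startup, with the returned pointer kept for all later drawing.
inline ScanlineFn select_scanline_fn(SimdLevel level) {
    switch (level) {
        case SimdLevel::AVX:   return fill_avx;
        case SimdLevel::SSE41: return fill_sse41;
        case SimdLevel::SSSE3: return fill_ssse3;
        case SimdLevel::SSE2:  return fill_sse2;
    }
    return fill_sse2;
}
```

Something like `ScanlineFn fn = select_scanline_fn(detect_simd_level());` would run once at startup. After that, drawing goes through `fn`, so the CPU check is not repeated inside the inner loop.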
#20
(11-19-2016, 05:55 PM)gregory Wrote: By the way, I made an interesting 32 bits change (in my PR https://github.com/PCSX2/pcsx2/pull/1664 )
The AVX1/SSE4/SSSE3 selection (NOT AVX2) for the SW renderer will be made by runtime CPU detection rather than by plugin selection.

It would be nice to run some benchmarks of the SSE2 vs SSSE3 vs SSE4 vs AVX GSdx builds on both the HW and SW renderers.
SSSE3 seems rather worthless, and likely AVX too.

If we can do that for all of them, it would be brilliant to cut down to just one plugin. Won't all those if statements kill the speed, though?
[Image: ref-sig-anim.gif]




