Well obviously more bits means more performance /s
(I actually voted +10% b/c I'm hopeful)
Poll: AVX1 64 bits vs AVX1 32 bits
| Option | Votes | Share |
|---|---|---|
| slower: -10% | 2 | 5.56% |
| same: ±5% | 10 | 27.78% |
| faster: +10% | 13 | 36.11% |
| much faster: +20% | 5 | 13.89% |
| on fire: +50% | 6 | 16.67% |
| Total | 36 | 100% |
GS - Software mode: Wanna bet on 64 bits performance
11-16-2016, 05:37 PM
Even if it's just +5%, it's still better than nothing.
Model: Clevo P570WM Laptop
GPU: GeForce GTX 980M ~8GB GDDR5
CPU: Intel Core i7-4960X CPU +4.2GHz (12 CPUs)
Memory: 32GB Corsair Vengeance DDR3L 1600MHz, 4x8GB
OS: Microsoft Windows 7 Ultimate
On fire baby!
Are you saying 32-bit needs an extra mov instruction, or 64-bit?
#StopRNG
11-17-2016, 10:04 AM
Quote: are you saying 32 bit needs an extra mov instruction? or 64 bit?
You only have 8 logical registers in 32-bit mode (both general-purpose and XMM/YMM), so you need more movs to and from memory than in 64-bit mode, which has 16 logical registers. Reality is more complex, and the CPU is able to remove some of those movs.
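To make the register-count point concrete, here is a small C++ sketch that keeps twelve independent values live inside one loop. The function `mix12` and its arithmetic are made up purely for illustration. Compiling it twice, for example with `g++ -O2 -S -m32` (needs multilib support) and `g++ -O2 -S -m64`, and comparing the loop bodies should typically show extra stack loads and stores (spills) in the 32-bit build. The exact output depends on the compiler, its version and the flags.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example: twelve running values stay live for the whole loop.
// Together with the pointer, the index and the loaded value, that is more than
// the 7 general-purpose registers usable in 32-bit x86 (esp is the stack
// pointer), so a 32-bit build has to keep some of them on the stack and reload
// them with extra mov instructions. x86-64 has 16 general-purpose registers,
// so most or all of them can stay in registers.
uint32_t mix12(const uint32_t* data, std::size_t n) {
    uint32_t a0 = 0, a1 = 1, a2 = 2, a3 = 3, a4 = 4, a5 = 5;
    uint32_t a6 = 6, a7 = 7, a8 = 8, a9 = 9, a10 = 10, a11 = 11;
    for (std::size_t i = 0; i < n; ++i) {
        const uint32_t v = data[i];
        a0 += v;
        a1 ^= v;
        a2 += v << 1;
        a3 ^= v >> 1;
        a4 += v * 3;
        a5 ^= v + 7;
        a6 += v ^ 0x55;
        a7 ^= v * 5;
        a8 += v >> 2;
        a9 ^= v << 3;
        a10 += v + a0;  // uses the already-updated a0
        a11 ^= v + a1;  // uses the already-updated a1
    }
    // Fold everything together so the compiler cannot discard any accumulator.
    return a0 ^ a1 ^ a2 ^ a3 ^ a4 ^ a5 ^ a6 ^ a7 ^ a8 ^ a9 ^ a10 ^ a11;
}
```

The arithmetic itself is meaningless. Only the number of simultaneously live values matters here.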
11-17-2016, 12:30 PM
I think that with time, 64 bits can become more useful than 32.
11-17-2016, 12:54 PM
Only time will tell for sure.
I managed to get linear filtering working correctly yesterday. Unfortunately, a bug (or two if I'm unlucky) remains in some shaders, and I need to fix it before running the benchmark.
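Linear (bilinear) filtering blends the four texels around the sample point, weighting each by the fractional part of the texture coordinate. As a rough scalar illustration only, here is a sketch that assumes a single-channel 8-bit texture, clamp-to-edge addressing and coordinates with 4 fractional bits. GSdx's software renderer does this with SIMD on several channels at once, and its fixed-point format may differ. `Texture` and `sample_bilinear` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical single-channel 8-bit texture, stored row-major.
struct Texture {
    const uint8_t* texels;
    int width;
    int height;

    // Clamp-to-edge addressing (an assumption for this sketch).
    uint8_t at(int x, int y) const {
        x = x < 0 ? 0 : (x >= width ? width - 1 : x);
        y = y < 0 ? 0 : (y >= height ? height - 1 : y);
        return texels[y * width + x];
    }
};

// Bilinear sample. u and v are non-negative fixed-point texel coordinates with
// 4 fractional bits (1/16 of a texel); the integer part selects the top-left texel.
uint8_t sample_bilinear(const Texture& t, int u, int v) {
    const int x0 = u >> 4, y0 = v >> 4;   // integer texel position
    const int fx = u & 15, fy = v & 15;   // fractional weights, 0..15
    const int c00 = t.at(x0, y0),     c10 = t.at(x0 + 1, y0);
    const int c01 = t.at(x0, y0 + 1), c11 = t.at(x0 + 1, y0 + 1);
    const int top    = c00 * (16 - fx) + c10 * fx;     // horizontal lerp, scaled by 16
    const int bottom = c01 * (16 - fx) + c11 * fx;
    const int value  = top * (16 - fy) + bottom * fy;  // vertical lerp, scaled by 256
    return static_cast<uint8_t>((value + 128) >> 8);   // round and drop the scale
}
```

For a 2x2 texture whose left column is 0 and right column is 255, sampling at u = 8 (half a texel) returns 128. Mipmapping is not covered here.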
11-17-2016, 11:24 PM
Bug found and fixed. Rendering is now close to the 32-bit output (except that there is no mipmap support). I need to tune the code a bit to make better use of some registers, then benchmark ^^
11-19-2016, 12:05 PM
And sadly, the winner is FlatOut, with -10%!
The code can potentially be optimized further, but it won't be faster than 32 bits for sure (at least not with AVX). So far, the extra cost seems to come from:
* the extra prefix for 64-bit operations. The overhead ought to be low here, as we barely use them.
* the overhead of computing the address [reg + offset]. A 64-bit addition is likely slower than a 32-bit one.
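To illustrate where the prefix cost comes from: in 64-bit mode the default operand size is still 32 bits. A 64-bit integer operation therefore needs a REX prefix with the W bit set, which adds one byte per instruction, and the same prefix is required to address r8 to r15. The sketch below hand-encodes a register-to-register ADD (opcode 0x01 /r) to show that byte. `encode_add_reg_reg` is a hypothetical helper, not the JIT's actual API, and it ignores memory operands. VEX-encoded AVX instructions fold these bits into the VEX prefix instead, although needing W, X or B forces the 3-byte VEX form rather than the 2-byte one.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: encodes "add dst, src" (ADD r/m, r; opcode 0x01 /r)
// with both operands in registers. Register numbers follow the x86 encoding:
// 0 = eax/rax, 1 = ecx/rcx, ..., 8..15 = r8..r15.
std::vector<uint8_t> encode_add_reg_reg(bool wide64, unsigned dst, unsigned src) {
    std::vector<uint8_t> bytes;
    const unsigned w = wide64 ? 1u : 0u;  // REX.W: 64-bit operand size
    const unsigned r = (src >> 3) & 1u;   // REX.R extends ModRM.reg (the source here)
    const unsigned b = (dst >> 3) & 1u;   // REX.B extends ModRM.rm (the destination here)
    if (w | r | b) {
        // REX prefix: 0100WRXB (X is unused without a SIB byte).
        bytes.push_back(static_cast<uint8_t>(0x40u | (w << 3) | (r << 2) | b));
    }
    bytes.push_back(0x01);  // ADD r/m32 (or r/m64 with REX.W), r32/r64
    // ModRM: mod = 11 (register direct), reg = src, rm = dst.
    bytes.push_back(static_cast<uint8_t>(0xC0u | ((src & 7u) << 3) | (dst & 7u)));
    return bytes;
}
```

For example, `add eax, ecx` encodes as 01 C8, `add rax, rcx` as 48 01 C8, and `add r8d, ecx` as 41 01 C8. The 64-bit forms are one byte longer, which slightly increases code size and instruction-fetch pressure.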
11-19-2016, 05:55 PM
By the way, I made an interesting change for 32 bits (in my PR https://github.com/PCSX2/pcsx2/pull/1664).
The AVX1/SSE4/SSSE3 selection (NOT AVX2) of the SW renderer will be done based on runtime detection rather than on plugin selection. It would be nice to benchmark the SSE2 vs SSSE3 vs SSE4 vs AVX GSdx builds on both the HW and SW renderers. SSSE3 seems rather worthless, and AVX likely is too.
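Here is a minimal sketch of runtime selection. It assumes the choice is made once at startup and stored as a function pointer, so the feature checks do not run again on every draw. All names (`CpuFeatures`, `select_isa`, `resolve_draw`, the `draw_*` stand-ins) are hypothetical, and GSdx's actual code may be organized differently. Detection uses GCC/Clang's `__builtin_cpu_supports` when compiling for x86. On other compilers or targets, the sketch just assumes the SSE2 baseline.

```cpp
// Hypothetical feature flags; real code would fill these from CPUID.
struct CpuFeatures {
    bool sse2;
    bool ssse3;
    bool sse41;
    bool avx;
};

enum class Isa { SSE2, SSSE3, SSE41, AVX };

// Pick the most capable path the CPU supports (AVX2 left out, as in the post).
Isa select_isa(const CpuFeatures& f) {
    if (f.avx) return Isa::AVX;
    if (f.sse41) return Isa::SSE41;
    if (f.ssse3) return Isa::SSSE3;
    return Isa::SSE2;
}

// Stand-ins for per-ISA renderer entry points; each returns an ID so the
// selection can be checked.
int draw_sse2() { return 2; }
int draw_ssse3() { return 3; }
int draw_sse41() { return 41; }
int draw_avx() { return 100; }

using DrawFn = int (*)();

// Resolve once; afterwards every call goes through the stored pointer.
DrawFn resolve_draw(Isa isa) {
    switch (isa) {
        case Isa::AVX: return draw_avx;
        case Isa::SSE41: return draw_sse41;
        case Isa::SSSE3: return draw_ssse3;
        case Isa::SSE2: break;
    }
    return draw_sse2;
}

// Runtime detection via GCC/Clang builtins on x86; SSE2 baseline elsewhere.
CpuFeatures detect_features() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return CpuFeatures{
        __builtin_cpu_supports("sse2") != 0,
        __builtin_cpu_supports("ssse3") != 0,
        __builtin_cpu_supports("sse4.1") != 0,
        __builtin_cpu_supports("avx") != 0,
    };
#else
    return CpuFeatures{true, false, false, false};
#endif
}
```

At startup, something like `DrawFn draw = resolve_draw(select_isa(detect_features()));` would be called once. In a layout like this, the per-draw cost is an indirect call rather than a chain of feature checks.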
11-19-2016, 06:49 PM
(This post was last modified: 11-19-2016, 06:49 PM by refraction.)
(11-19-2016, 05:55 PM) gregory Wrote: By the way, I made an interesting 32 bits change (in my PR https://github.com/PCSX2/pcsx2/pull/1664)
If we can do that for all of it, it would be brilliant to cut down to just one plugin. Won't all those if statements kill the speed, though?