11-16-2016, 05:35 PM
11-16-2016, 05:37 PM
even if it's just +5, still better than nothing.
11-17-2016, 08:40 AM
On fire baby!
i think it's gonna be slower
are you saying 32 bit needs an extra mov instruction? or 64 bit?
11-17-2016, 10:04 AM
Quote: are you saying 32 bit needs an extra mov instruction? or 64 bit?
You only have 8 logical registers on 32 bits, so you need more mov from/to memory than on 64 bits (which has 16 logical registers). Reality is more complex, and the CPU is able to remove some of those mov.
11-17-2016, 12:30 PM
I think that, with time, 64 bits can be more useful than 32.
11-17-2016, 12:54 PM
Only time will tell for sure.
I managed to implement linear filtering right yesterday. Unfortunately, a bug (or 2 if I'm unlucky) remains in some shaders. I need to fix it first before the benchmark.
11-17-2016, 11:24 PM
Bug found and corrected. Rendering is close to 32 bits (except there is no mipmap support). I need to tune the code a bit to make better use of some registers. Then benchmark ^^
11-19-2016, 12:05 PM
And, sadly, the winner is FlatOut with -10%!
Potentially the code can be optimized better. But it won't be faster than 32 bits for sure (at least not AVX). So far the extra costs seem to be:
* the extra prefix for 64-bit operations. Overhead ought to be low here as we barely use it.
* the overhead to compute the address [reg + offset]. A 64-bit addition is likely slower than a 32-bit one.
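To make the "extra prefix" point concrete, here is a small sketch with hand-assembled x86 encodings. The byte arrays and the `is_rex` helper are illustrative names, not code from GSdx, and nothing here is executed as machine code. It only shows that a 64-bit operand size, or a register from the upper half (xmm8 to xmm15), costs one extra REX byte per instruction.

```cpp
#include <cstddef>
#include <cstdint>

// Hand-assembled encodings, for illustration only (hypothetical names).
constexpr std::uint8_t mov_eax_mem[]  = {0x8B, 0x41, 0x08};        // mov eax, [rcx+8]
constexpr std::uint8_t mov_rax_mem[]  = {0x48, 0x8B, 0x41, 0x08};  // mov rax, [rcx+8]  (REX.W)
constexpr std::uint8_t movaps_x0_x1[] = {0x0F, 0x28, 0xC1};        // movaps xmm0, xmm1
constexpr std::uint8_t movaps_x8_x1[] = {0x44, 0x0F, 0x28, 0xC1};  // movaps xmm8, xmm1 (REX.R)

// In 64-bit mode, the bytes 0x40..0x4F are REX prefixes.
constexpr bool is_rex(std::uint8_t byte) { return (byte & 0xF0) == 0x40; }

// Each REX-prefixed form is exactly one byte longer than its plain form.
static_assert(sizeof(mov_rax_mem) == sizeof(mov_eax_mem) + 1, "REX.W adds one byte");
static_assert(sizeof(movaps_x8_x1) == sizeof(movaps_x0_x1) + 1, "REX.R adds one byte");
static_assert(is_rex(mov_rax_mem[0]) && is_rex(movaps_x8_x1[0]), "first byte is REX");
static_assert(!is_rex(mov_eax_mem[0]) && !is_rex(movaps_x0_x1[0]), "no prefix needed");
```

So the prefix cost is code size (one byte per affected instruction), which matters mostly through instruction cache and decode pressure. It says nothing about the address-computation cost from the second bullet.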
11-19-2016, 05:55 PM
By the way, I made an interesting 32 bits change (in my PR https://github.com/PCSX2/pcsx2/pull/1664 )
The AVX1/SSE4/SSSE3 selection (NOT AVX2) of the SW renderer will be done based on the runtime detection rather than plugin selection.
It would be nice to run some benchmarks of the SSE2 vs SSSE3 vs SSE4 vs AVX GSdx builds on both the HW and SW renderers.
SSSE3 seems rather worthless, and likely AVX too.
11-19-2016, 06:49 PM
(11-19-2016, 05:55 PM)gregory Wrote: [ -> ]By the way, I made an interesting 32 bits change (in my PR https://github.com/PCSX2/pcsx2/pull/1664 )
The AVX1/SSE4/SSSE3 selection (NOT AVX2) of the SW renderer will be done based on the runtime detection rather than plugin selection.
It would be nice to run some benchmarks of the SSE2 vs SSSE3 vs SSE4 vs AVX GSdx builds on both the HW and SW renderers.
SSSE3 seems rather worthless, and likely AVX too.
If we can do that for all of it, it would be brilliant to cut down to just 1 plugin. Won't all those if statements kill the speed, though?
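On the if-statement worry: one common way to keep runtime detection cheap is to test the CPU once at startup and store a function pointer, so the hot loop pays one indirect call instead of a branch per pixel. The sketch below is not the code from the PR. `Isa`, `detect_isa`, `fill_span`, `select_fill` and `active_fill` are hypothetical names, and the per-ISA variants are plain C++ stand-ins for routines that would really be built with different instruction sets. Detection assumes the GCC/Clang `__builtin_cpu_supports` builtin on x86 and falls back to the baseline elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical ISA levels, ordered from oldest to newest (AVX2 left out on purpose).
enum class Isa { SSE2, SSSE3, SSE41, AVX };

// Return the best level the running CPU supports.
inline Isa detect_isa() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))    return Isa::AVX;
    if (__builtin_cpu_supports("sse4.1")) return Isa::SSE41;
    if (__builtin_cpu_supports("ssse3"))  return Isa::SSSE3;
#endif
    return Isa::SSE2;  // baseline when no detection is available
}

using FillFn = void (*)(std::uint32_t* dst, std::size_t n, std::uint32_t color);

// Stand-in for one routine compiled once per ISA level.
// All variants must produce identical output; only the speed would differ.
template <Isa Level>
void fill_span(std::uint32_t* dst, std::size_t n, std::uint32_t color) {
    assert(dst != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) dst[i] = color;
}

// Map a detected level to its routine. This switch runs once, not per pixel.
inline FillFn select_fill(Isa level) {
    switch (level) {
        case Isa::AVX:   return &fill_span<Isa::AVX>;
        case Isa::SSE41: return &fill_span<Isa::SSE41>;
        case Isa::SSSE3: return &fill_span<Isa::SSSE3>;
        case Isa::SSE2:  break;
    }
    return &fill_span<Isa::SSE2>;
}

// Detection happens on the first call only; later calls return the cached pointer.
inline FillFn active_fill() {
    static const FillFn fn = select_fill(detect_isa());
    return fn;
}
```

A caller would fetch the pointer once, for example `FillFn fill = active_fill();`, and then call `fill(row, width, color)` inside the loop. Whether the indirect call is measurable depends on how much work each call does, so it would still need benchmarking.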