11-16-2016, 12:42 PM
I wanted to take a break from GSdx development, so I started looking at the 64-bit port of the GSdx software (SW) renderer.
Building on Gabest's initial, partial implementation, I have made huge progress. Here is the status so far:
* AVX1 only (as Gabest's initial code was AVX)
* Linux only (the Windows and Linux x64 ABIs aren't compatible, but it won't be too difficult to port)
* no mipmapping
* broken linear filtering, but I think I have spotted the issue
* other bugs (I suspect one is related to 16-bit frame buffers)
Nevertheless, some GS dumps already render correctly. Once I manage to fix the linear filtering, I will try to run some early benchmarks. Want to bet on the performance difference? Tests will be done with 0 rendering threads on AVX/Haswell @ 3 GHz.
For the record, here are the differences between the instruction sets on 32 bits:
SSE2: processes 4 pixels, requires extra movs and extra operations to extract/insert elements
SSE4.1: processes 4 pixels, requires extra movs (but can insert/extract elements to/from SSE registers directly)
AVX1: processes 4 pixels (close to SSE4.1, but uses 3-operand instructions to avoid the movs)
AVX2: processes 8 pixels
Steam hardware survey stats:
SSE2: 100%
SSE4.1: 87.20%
AVX1: 72.56%
Note: I'm not sure it's worth supporting SSE2/SSE4.1 JIT on x64 in GSdx. A C reference implementation is available.