DirectCompute/DX/OpenGL/OpenCL ... if you remove all the marketing BS and the limitations that only exist to sell new GPUs, what remains?
Just a hardware GPU with its constraints, nothing more.
The fastest solution would be to use a "sw renderer" running on the shaders for the few complex draw calls, and a "hw renderer" running on the hardware units for the numerous but basic draw calls. As you can see for yourself, the new accurate option is already much faster than it used to be.
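To make the split concrete, here is a minimal sketch of how a frame's draw calls could be routed to one path or the other. Everything in it is hypothetical: the `DrawCall` fields, the `Path` enum and the "complex" test are assumptions for illustration, not what the plugin actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one draw call; these fields are assumptions,
// not taken from the real renderer.
struct DrawCall {
    bool needs_dest_alpha_test;    // effect the basic hardware path cannot do exactly
    bool reads_own_render_target;  // texture sampled from the buffer being drawn
    std::size_t primitive_count;   // kept only to show that size is not the criterion
};

enum class Path { HardwareUnits, ShaderSoftware };

// Assumption: a draw is "complex" when it needs an effect the plain
// hardware path cannot reproduce; everything else stays on the fast path.
inline Path choose_path(const DrawCall& d) {
    const bool is_complex = d.needs_dest_alpha_test || d.reads_own_render_target;
    return is_complex ? Path::ShaderSoftware : Path::HardwareUnits;
}

struct Split {
    std::size_t hardware = 0;
    std::size_t software = 0;
};

// Count how many draws of a frame end up on each path.
inline Split classify(const std::vector<DrawCall>& frame) {
    Split s;
    for (const DrawCall& d : frame) {
        if (choose_path(d) == Path::HardwareUnits) {
            ++s.hardware;
        } else {
            ++s.software;
        }
    }
    return s;
}
```

The idea is only that the expensive path is reserved for the handful of draws that need it, so its cost is paid a few times per frame instead of on every draw.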
Quote: The GS Emulation over DirectX and OpenGL is full of hacks and workarounds to make specific effects look correct but it usually breaks effects in other games. All these workarounds slow stuff down and can probably make it slower than software emulation.
You're wrong. The code isn't full of hacks. It has a couple of clever solutions to emulate the GS memory behavior. There are 3 major issues:
1/ MEM <=> PCI transfer => it is an issue for dedicated GPUs (not related to DX/GL)
2/ Texture format conversion. OpenCL does it on the fly. SW & HW use a texture cache.
3/ Upscaling. You can no longer write the result of the rendering straight into the GS memory. The only possibility is to scale the rendering back down and convert it to the GS format. OpenCL & SW don't support upscaling, so it is easy for them.
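To illustrate point 3, here is a rough sketch of what "scale down and convert back" means for an upscaled render target. It is only an illustration under assumptions: the `Rgba8` struct, the box filter and the 5:5:5:1 bit layout are my choices for the example, and the block swizzling of the real GS memory is left out completely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical RGBA8 pixel of the upscaled host render target.
struct Rgba8 {
    std::uint8_t r, g, b, a;
};

// Average each scale x scale block of the upscaled image into one native pixel.
// Assumes the upscaled size is exactly the native size times an integer scale.
inline std::vector<Rgba8> downscale_box(const std::vector<Rgba8>& src,
                                        int native_w, int native_h, int scale) {
    assert(scale >= 1);
    assert(src.size() == static_cast<std::size_t>(native_w) * scale * native_h * scale);
    const std::size_t src_w = static_cast<std::size_t>(native_w) * scale;
    const unsigned n = static_cast<unsigned>(scale * scale);
    std::vector<Rgba8> dst(static_cast<std::size_t>(native_w) * native_h);
    for (int y = 0; y < native_h; ++y) {
        for (int x = 0; x < native_w; ++x) {
            unsigned r = 0, g = 0, b = 0, a = 0;
            for (int sy = 0; sy < scale; ++sy) {
                for (int sx = 0; sx < scale; ++sx) {
                    const std::size_t row = static_cast<std::size_t>(y * scale + sy);
                    const std::size_t col = static_cast<std::size_t>(x * scale + sx);
                    const Rgba8& p = src[row * src_w + col];
                    r += p.r; g += p.g; b += p.b; a += p.a;
                }
            }
            dst[static_cast<std::size_t>(y) * native_w + x] = Rgba8{
                static_cast<std::uint8_t>(r / n), static_cast<std::uint8_t>(g / n),
                static_cast<std::uint8_t>(b / n), static_cast<std::uint8_t>(a / n)};
        }
    }
    return dst;
}

// Pack one pixel into a 16-bit 5:5:5:1 word (red in the low bits, alpha in the
// top bit). The exact bit layout here is an assumption for illustration.
inline std::uint16_t pack_5551(Rgba8 p) {
    return static_cast<std::uint16_t>((p.r >> 3) | ((p.g >> 3) << 5) |
                                      ((p.b >> 3) << 10) | ((p.a >> 7) << 15));
}
```

Note that the averaging throws away exactly the extra detail the upscaling produced, and mixes neighbouring pixels that the game may expect to stay bit-exact. That is why this step is a problem for the HW renderer and a non-issue for OpenCL & SW.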
Remove upscaling and all those extra filtering options and it will be much easier to make the HW rendering accurate.