08-24-2011, 04:38 AM
(This post was last modified: 08-24-2011, 04:48 AM by Squall Leonhart.)
DaZ optimisations =/= SSE2's DaZ flag.
AMD and Intel both support DaZ with SSE2, but only Intel has implemented IEEE 754's hardware-level optimisations.
On supported Core 2 and i7 processors, Intel runs float ops faster where DaZ is required, because AMD did not implement the optimisations recommended in the IEEE 754 floating-point standard.
SSE's pre-existing instructions get faster simply because the hardware underneath them improved; that's the whole difference between AMD's DaZ and Intel's DaZ.
Last I heard from my old AMD source (who left after the ATI buyout), K8 and initial K10 had not actually implemented real hardware support for denormals-are-zero; they were faking it by trapping in software.
Finally found what I was looking for:
http://www.agner.org/optimize/instruction_tables.pdf
cottonvibes Wrote: Hmm. Well, PCSX2 uses the DaZ and FtZ flags for SSE, and relies heavily on SSE optimizations.
Setting the DaZ and FtZ flags is a huge speedup on Intel CPUs, and for AMD CPUs it's decent but not as huge of a speedup as Intel's (at least it wasn't with the AMD X2 architecture).
cottonvibes Wrote: Intel CPUs don't follow the spec better, they both follow it the same; performance-wise, however, Intel CPUs get a significant speedup with the DaZ + FtZ flags whereas AMD CPUs don't get as big of a speedup.
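For anyone wondering what "setting the DaZ and FtZ flags" looks like in practice, here is a rough sketch in C. It is not PCSX2's actual code. It assumes an x86-64 build (so float math goes through SSE rather than x87) and a CPU that supports the DaZ bit. The helper names (enable_daz_ftz, disable_daz_ftz, denormal_input_is_flushed) are made up for illustration; only _mm_getcsr and _mm_setcsr are real intrinsics from xmmintrin.h. DaZ is bit 6 and FtZ is bit 15 of the MXCSR register.

```c
#include <string.h>     /* memcpy */
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr */

/* MXCSR control bits: DaZ treats denormal inputs as zero,
   FtZ flushes denormal results to zero. */
#define MXCSR_DAZ (1u << 6)
#define MXCSR_FTZ (1u << 15)

/* Hypothetical helper: turn both flags on for the calling thread. */
void enable_daz_ftz(void)
{
    _mm_setcsr(_mm_getcsr() | MXCSR_DAZ | MXCSR_FTZ);
}

/* Hypothetical helper: turn both flags off again. */
void disable_daz_ftz(void)
{
    _mm_setcsr(_mm_getcsr() & ~(MXCSR_DAZ | MXCSR_FTZ));
}

/* Hypothetical check: multiply the smallest denormal float by a large
   normal value. With DaZ off the product is a small normal number;
   with DaZ on the denormal input is read as zero, so the product is 0.
   Returns 1 if the input was treated as zero, 0 otherwise. */
int denormal_input_is_flushed(void)
{
    unsigned int bits = 0x00000001u;  /* smallest positive denormal */
    float tiny;
    memcpy(&tiny, &bits, sizeof tiny);

    /* volatile keeps the compiler from folding the multiply away */
    volatile float a = tiny;
    volatile float b = 1.0e30f;
    volatile float r = a * b;
    return r == 0.0f;
}
```

MXCSR is per-thread state, so an emulator would need to set it on every thread that does SSE float work. This only shows the behaviour change, not the speed difference; the speedup discussed above would have to be measured separately on each CPU.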