[blog] Nightmare on Floating-Point Street
#1
It is very hard to emulate the floating-point calculations of the R5900 FPU and the Vector Units on an x86 CPU, because the PlayStation 2 does not follow the IEEE 754 standard. Multiplying two numbers on the FPU, the VU, and an x86 processor can give you 3 different results, all differing by a couple of bits! Operations like square root and division are even more imprecise.

Originally, we thought that a couple of bits shouldn't matter, and that game developers would be crazy to rely on such precise calculations. Floating-point values are mostly used for world transformations or interpolation, so no one would care if their Holy Sword of Armageddon was 0.00001 meters off from the main player's hand. In short, we were wrong, and game developers are crazier than we thought. Games started breaking just from changing the floating-point rounding mode!

While rounding mode is a problem, the bigger nightmare is floating-point infinities. The IEEE standard says that when a number overflows (that is, when its magnitude exceeds 3.4028234663852886E+38, the largest 32-bit float), the result becomes infinity. Infinity then contaminates everything it touches: any nonzero number multiplied by infinity is still infinity, and 0 * infinity is NaN. That sounds great until you figure out that the VUs don't support infinities (or NaNs) at all. Instead they clamp all large results to the largest representable float. This discrepancy breaks a lot of games!
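To make the IEEE side of that concrete, here is a tiny standalone sketch (plain standard C++ under the default round-to-nearest mode; the variable names are just for illustration, none of this is emulator code):

Code:
#include <cfloat>
#include <cstdio>

int main()
{
    volatile float big  = FLT_MAX;   // 3.4028234663852886E+38
    volatile float zero = 0.0f;

    float overflowed   = big * 2.0f;        // overflows -> +inf
    float divByZero    = 1.0f / zero;       // nonzero / 0 -> +inf
    float zeroTimesInf = zero * overflowed; // 0 * inf -> NaN
    // Prints "inf inf nan"; a VU would instead keep roughly FLT_MAX, FLT_MAX and 0.
    std::printf("%f %f %f\n", overflowed, divByZero, zeroTimesInf);
}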

For example, let's say a game developer tries to normalize a zero vector by dividing it by its length, which is 0. On the VU, the end result will be (0,0,0). On x86/IEEE, the result is a vector of infinities and NaNs. Now if the game developer uses this vector to perturb some faces for artificial hair or some kind of animation, all the final positions on the PS2 stay exactly where they were. All the final positions on x86 fly off to infinity or turn into NaN... and there go the game's graphics; now go figure out where the problem occurred.
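Here is roughly what that failure mode looks like in code. This is only an illustrative sketch: normalizeIEEE and normalizeVUStyle are made-up names, and the clamp in the second function merely approximates the VU behaviour described above rather than how the hardware (or PCSX2) actually does it.

Code:
#include <cfloat>
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

// IEEE behaviour: 1/sqrt(0) is +inf, and 0 * inf is NaN.
Vec3 normalizeIEEE(Vec3 v)
{
    float invLen = 1.0f / std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x * invLen, v.y * invLen, v.z * invLen };
}

// Rough VU-style behaviour: the blown-up reciprocal clamps to FLT_MAX,
// so multiplying the zero vector by it still gives zero.
Vec3 normalizeVUStyle(Vec3 v)
{
    float invLen = 1.0f / std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (!std::isfinite(invLen))
        invLen = FLT_MAX;   // no infinities or NaNs on the VU
    return { v.x * invLen, v.y * invLen, v.z * invLen };
}

int main()
{
    Vec3 zero = { 0.0f, 0.0f, 0.0f };
    Vec3 a = normalizeIEEE(zero);    // (nan, nan, nan)
    Vec3 b = normalizeVUStyle(zero); // (0, 0, 0)
    std::printf("%f %f %f\n%f %f %f\n", a.x, a.y, a.z, b.x, b.y, b.z);
}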

The simplest solution is to clamp the vector written by the current instruction. This takes 2 SSE operations and is SLOW, and it still doesn't work in some cases. To top it off, you can never rule out that the game was loading bad floating-point data into the VUs to begin with! Some games zero out vectors by multiplying them by zero, so the VU doesn't care what kind of garbage the original vector held; x86 very much does.
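For reference, that clamp boils down to one MINPS and one MAXPS against ±FLT_MAX, something along these lines (a simplified sketch; the function name is made up and PCSX2's real clamping code is more involved):

Code:
#include <cfloat>
#include <xmmintrin.h>

// Pull a freshly written result back into the finite float range.
// MINPS/MAXPS return their second operand when the first is NaN,
// so NaNs get squashed to FLT_MAX here as well.
static inline __m128 clampToFiniteRange(__m128 value)
{
    const __m128 maxPos = _mm_set1_ps( FLT_MAX);
    const __m128 maxNeg = _mm_set1_ps(-FLT_MAX);
    value = _mm_min_ps(value, maxPos); // +inf / huge values -> FLT_MAX
    value = _mm_max_ps(value, maxNeg); // -inf / huge negatives -> -FLT_MAX
    return value;
}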

These two problems make floating-point emulation very hard to do both fast and accurately. The bugs range from screen flickering during fades, to disappearing characters, to the spiky-polygon syndrome (the most common problem, widely known as SPS).

In the end PCSX2 does all its floating-point operations with SSE, since that makes it easier to cache the registers. Two different rounding modes are used for the FPU and the VUs. Whenever a divide or rsqrt occurs on the FPU, overflow is checked. Overflow is checked much more frequently on the VUs, and the fact that the VUs hold both integer and floating-point data in the same SSE register makes the check a little longer. In the future, PCSX2 will read the rounding-mode and overflow settings from the patch files, so that every game can be accommodated with the best/fastest settings.
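For the curious, switching the SSE rounding mode is just a matter of poking MXCSR. The helper below is a purely illustrative sketch (the name and structure are made up, and which mode each unit actually gets is a per-recompiler, and eventually per-game, choice):

Code:
#include <xmmintrin.h>

// Run a block of recompiled code under a specific SSE rounding mode,
// then restore whatever the host was using before.
void runWithRoundingMode(unsigned int mode, void (*recompiledBlock)())
{
    const unsigned int saved = _MM_GET_ROUNDING_MODE();
    _MM_SET_ROUNDING_MODE(mode); // e.g. _MM_ROUND_NEAREST or _MM_ROUND_TOWARD_ZERO
    recompiledBlock();
    _MM_SET_ROUNDING_MODE(saved);
}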

Moral of the blog: when comparing two floating-point numbers a and b, never use a == b. Instead use something along the lines of

Code:
fabs(a-b) < epsilon

where epsilon is some very small number.
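If you want that moral as a ready-made helper, it is just the line above wrapped in a function (absolute epsilon only; choosing a good epsilon, or using a relative tolerance, is a topic of its own):

Code:
#include <cmath>

// Treat two floats as equal when they differ by less than a small tolerance.
inline bool nearlyEqual(float a, float b, float epsilon = 1e-6f)
{
    return std::fabs(a - b) < epsilon;
}

// Use nearlyEqual(pos.x, target.x) where you would have been tempted
// to write pos.x == target.x after a chain of float math.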

#2
It never ceases to amaze me how accurate you have to be when writing emulators, especially with the CPU instruction set. Even something like a Game Boy would require almost every single instruction to be emulated near flawlessly, and that is what astounds me about PCSX2. It is also what attracts me to emulation: it is such an exact science :)
#3
Well, don't take that idea too far for PS2 and PSP emulation. One of the bigger challenges in emulating the PS2 and PSP is getting away from the "exact emulation" paradigm, because they are multi-CPU systems where "exact" emulation isn't really effective.

Programming for multi-core and multi-CPU systems (a.k.a. 'parallel programming') requires a different way of thinking and, likewise, so does emulating them efficiently. It is possible to do "exact" emulation, but it would result in an emulator that is incredibly slow -- recompiler strategies (a.k.a. binary translation) would be pretty much useless -- and the actual compatibility wouldn't be any better than with more clever strategies that use the concepts of parallel programming.
Jake Stine (Air) - Programmer - PCSX2 Dev Team
#4
(03-01-2010, 04:22 PM)Air Wrote: Well, don't take that idea too far for PS2 and PSP emulation. One of the bigger challenges in emulating the PS2 and PSP is getting away from the "exact emulation" paradigm, because they are multi-CPU systems where "exact" emulation isn't really effective.

Programming for multi-core and multi-CPU systems (a.k.a. 'parallel programming') requires a different way of thinking and, likewise, so does emulating them efficiently. It is possible to do "exact" emulation, but it would result in an emulator that is incredibly slow -- recompiler strategies (a.k.a. binary translation) would be pretty much useless -- and the actual compatibility wouldn't be any better than with more clever strategies that use the concepts of parallel programming.

Yes, although I have not (as yet) learnt all that much about binary translation, I am aware that there is a trade-off between accuracy and speed. As far as I know, a recompiler should keep the emulator running as fast as possible, even leaving certain things out if it can afford to. I guess the problem is figuring out what needs to be emulated exactly (or as precisely as possible) and what doesn't.

With regard to my previous comment, I would say that at the level I am at in writing emulators, speed is obviously far less of an issue, so I can pretty much afford to be as accurate as possible. But you are completely correct: I shouldn't apply my way of thinking to other emulators like PCSX2; it clearly requires other ways of thinking and problem solving. Optimizing for speed is obviously a very important part of PS2 emulation, and a very challenging one.

Anyway, I think the work you all do is incredible; that's really what I am trying to express here :)
#5
Interesting moral... I think there are a few others that can and should be drawn from it. Not sure if I am getting them all, but:

1. When trying to zero a variable, set it equal to zero rather than multiplying it by zero.

2. I am stumped. I know there are more, but it's a bit beyond me. Anyone care to help?
I do not have a superman complex; for I am God, not superman!

Rig: Q9400, 4GB DDR2, eVGA GTX260 SC, gigabyte EP35-DS3R. X25-M 80GB G2.
#6
(09-19-2010, 06:55 AM)taltamir Wrote: Interesting moral... I think there are a few others that can and should be drawn from it. Not sure if I am getting them all, but:

1. When trying to zero a variable, set it equal to zero rather than multiplying it by zero.

2. I am stumped. I know there are more, but it's a bit beyond me. Anyone care to help?

Eh, not really. In floating point it just depends on the kind of math you're doing. In games infinity isn't much use, so you'll usually want to assign 0 instead of multiplying by zero. In other kinds of mathematics, retaining the infinity status might be important, so you'd want the "mathematical" approach of multiplying by zero. Really the only strict moral is to never use direct equality tests (==) on floating-point variables. Everything in floating point should assume a range of error (a.k.a. epsilon), as noted in the blog.

Interestingly, a lot of this blog has since proven inaccurate anyway. Most games use 'greater than' and 'less than' comparisons on their floating-point results rather than exact equality, and the VUs/FPU and IEEE are pretty much dead-on accurate for everything except really huge or really small values.

Basic clamping applied correctly, plus bugfixes in the MAC/status flags, fixed most of the spiky polygons, and the speed hit wasn't really significant. Fixing some bugs in the VU pipeline management (the VUs have a complex 4-stage pipeline) fixed many others. Fixing VU0's access to VU1 registers fixed a few remaining SPS cases. SPS in several other games was actually caused by buggy VIFunpack emulation, something entirely unrelated to floating-point math.

I think there are still a couple of games that seem to exhibit minor bugs in AI or collision detection due to VU or FPU emulation inaccuracies, but the known list is very short.
Jake Stine (Air) - Programmer - PCSX2 Dev Team
#7
Although the list isn't very large, it's still very annoying to have to deal with the incompatibilities between PS2 floats and IEEE-standard floats.

On the few games where mVU gets SPS and sVU doesn't, it's very difficult to know whether it's due to an emulation error or to some floating-point precision problem that happens to work out better in sVU (because it clamps at different times and uses different instructions to emulate certain opcodes). These differences greatly hinder my ability to fix the bugs, and they are one reason I couldn't make mVU more compatible.

And really there is no good solution to the problem. I've thought of many and have been presented with others, and I don't think any of them is worth the time or would end up more compatible than what we have now.
Check out my blog: Trashcan of Code
#8
(07-25-2006, 01:23 AM)ZeroFrog Wrote: [...] The IEEE standard says that when a number overflows (that is, when its magnitude exceeds 3.4028234663852886E+38, the largest 32-bit float), the result becomes infinity. [...] The VUs don't support infinities (or NaNs) at all. Instead they clamp all large results to the largest representable float. This discrepancy breaks a lot of games! [...]

Maybe I am joking... but why not use double instead of float? You see, since double is "big enough", the value would probably stay the same even when it's greater than 3.4028234663852886E+38.
#9
One simple answer is that SSE only works on 32-bit floats ;)
#10
(12-04-2010, 11:40 PM)rama Wrote: One simple answer is that SSE only works on 32-bit floats ;)

SSE2 introduced double support for almost everything -- but the other almost-as-simple answer is that it's roughly 5x-6x slower than using floats. It would also require a near-complete rewrite of the VU recompilers, and you could expect games that currently run at 60-70 FPS to drop to 20. :)
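Part of that gap is simply register width: a 128-bit SSE register holds a whole 4-float VU vector but only two doubles, so every vector operation needs at least twice the instructions before you even count the float<->double conversions and the extra memory traffic. A purely illustrative sketch, not recompiler code:

Code:
#include <emmintrin.h> // SSE2

// One 4-component VU-style multiply in single precision: one instruction.
__m128 mulVecFloat(__m128 a, __m128 b)
{
    return _mm_mul_ps(a, b);
}

// The same vector in double precision needs two registers per operand
// and two multiplies, plus conversions at the boundaries.
void mulVecDouble(__m128d aLo, __m128d aHi, __m128d bLo, __m128d bHi,
                  __m128d* outLo, __m128d* outHi)
{
    *outLo = _mm_mul_pd(aLo, bLo);
    *outHi = _mm_mul_pd(aHi, bHi);
}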
Jake Stine (Air) - Programmer - PCSX2 Dev Team