09-09-2009, 02:51 AM
(This post was last modified: 09-09-2009, 02:52 AM by cottonvibes.)
(09-09-2009, 12:14 AM)frankdd89 Wrote: Hi CottonVibes!
Can I ask you why you didn't check whether the value is NaN or Inf before the clamp?
float ps2_sqrt(float value) {
value = clamp(value); // Clamp Value if NaN or Inf to an ordered/normal number
value = abs(value); // Make Positive
value = sqrt(value); // Get sqrt of now-positive value
return value;
}
You can save a little time if you check the value variable first, like this:
float ps2_sqrt(float value) {
if(value==NaN||value==Inf)
value = clamp(value);
if(value<0)
value = abs(value);
value = sqrt(value);
return value;
}
I think this solution would be more efficient because you are not calling abs and clamp functions if they are not necessary!
(I don't know how you've actually implemented it in the source code, anyway.)
I actually simplified the code above a bit to make it easier to understand.
In practice I do something like:
Code:
float ps2_sqrt(float value) {
    InvalidFlag = 0;
    if (*(u32*)&value & 0x80000000) { // check Sign bit to see if Negative
        InvalidFlag = 1;
        value = abs(value);
    }
    value = Positive_Clamp(value); // Clamp Value
    value = sqrt(value); // Get sqrt of now-positive value
    return value;
}
I use the conditional because the PS2 actually sets some flags when a square root of a negative number occurs.
There are other ways to code this without a conditional, but SQRT isn't called often enough to be worth optimizing specifically, and this code is a bit more readable.
Also, "Positive_Clamp()" is just 1 SSE instruction, whereas normal clamping is a minimum of 2 SSE instructions (a sign-preserving accurate clamp needs 5 SSE instructions).
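As a rough illustration of the sign check plus positive clamp described above, here is a minimal scalar sketch in plain C. The names invalid_flag, to_bits, from_bits, positive_clamp, and ps2_sqrt_operand are hypothetical and not taken from the pcsx2 source, and positive_clamp only models what a single MINPS against the max value does to one lane. The sketch stops just before the square root itself; the returned value is what would be handed to sqrtf().

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the emulator's "invalid" status flag. */
static int invalid_flag = 0;

/* Reinterpret float <-> raw IEEE-754 bits without breaking aliasing rules. */
static uint32_t to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float from_bits(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Scalar model of one MINPS lane against the largest finite float:
 * NaN fails the comparison, so NaN and +Inf both become 0x7f7fffff. */
static float positive_clamp(float value) {
    const float maxval = from_bits(0x7f7fffffu);
    return (value < maxval) ? value : maxval;
}

/* Prepare the operand: flag and strip a set sign bit, then clamp. */
float ps2_sqrt_operand(float value) {
    uint32_t bits = to_bits(value);
    invalid_flag = 0;
    if (bits & 0x80000000u) {                  /* sign bit set: negative input */
        invalid_flag = 1;
        value = from_bits(bits & 0x7fffffffu); /* clear the sign bit */
    }
    return positive_clamp(value);
}
```

Because the sign bit is already cleared before the clamp, a single min against the positive max value is enough here; no lower bound is needed.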
I guess I'll quickly list the 3 different kinds of clamping we do, and how they look in SSE:
Positive Clamp looks like this:
Code:
SSE_MINPS_M128_to_XMM(reg, (uptr)mVU_maxvals);
Normal Clamping looks like this:
Code:
SSE_MINPS_M128_to_XMM(reg, (uptr)mVU_maxvals);
SSE_MAXPS_M128_to_XMM(reg, (uptr)mVU_minvals);
What this actually does is:
If (reg == Positive_NaN) reg = 0x7f7fffff; // Floating Point Positive Max Value
If (reg == Negative_NaN) reg = 0x7f7fffff; // Floating Point Positive Max Value
If (reg == Positive_Infinity) reg = 0x7f7fffff; // Floating Point Positive Max Value
If (reg == Negative_Infinity) reg = 0xff7fffff; // Floating Point Negative Max Value
As you can see, it turns ALL NaNs into the greatest positive floating point number, regardless of whether the sign bit is positive or negative (i.e. it doesn't preserve the sign after conversion).
The fun thing is most of the time, games will be happy with this. They mainly just care about getting rid of the NaNs, and don't complain about the sign.
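To see why the two-instruction clamp behaves this way, here is a scalar model of one lane in plain C. It assumes the documented MINPS/MAXPS rule that the second (source) operand is returned whenever the comparison fails, which includes every comparison involving a NaN. The helper names (minps_lane, maxps_lane, normal_clamp) are made up for this sketch.

```c
#include <stdint.h>
#include <string.h>

static uint32_t to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float from_bits(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* One lane of MINPS dst, src: keeps dst only if dst < src, else takes src. */
static float minps_lane(float dst, float src) {
    return (dst < src) ? dst : src;
}

/* One lane of MAXPS dst, src: keeps dst only if dst > src, else takes src. */
static float maxps_lane(float dst, float src) {
    return (dst > src) ? dst : src;
}

/* Normal clamp: min against +max, then max against -max. */
float normal_clamp(float reg) {
    const float maxval = from_bits(0x7f7fffffu);
    const float minval = from_bits(0xff7fffffu);
    reg = minps_lane(reg, maxval); /* any NaN and +Inf become +max here */
    reg = maxps_lane(reg, minval); /* -Inf becomes -max here */
    return reg;
}
```

A negative NaN already turns into the positive max value in the first step, so the second step never sees it; that is where the sign gets lost.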
"Preserve-Sign" Clamping looks like this:
Code:
SSE_MOVAPS_XMM_to_XMM(regT1, reg);
SSE_ANDPS_M128_to_XMM(regT1, (uptr)mVU_signbit);
SSE_MINPS_M128_to_XMM(reg, (uptr)mVU_maxvals);
SSE_MAXPS_M128_to_XMM(reg, (uptr)mVU_minvals);
SSE_ORPS_XMM_to_XMM (reg, regT1);
What this does is:
If (reg == Positive_NaN) reg = 0x7f7fffff; // Floating Point Positive Max Value
If (reg == Negative_NaN) reg = 0xff7fffff; // Floating Point Negative Max Value
If (reg == Positive_Infinity) reg = 0x7f7fffff; // Floating Point Positive Max Value
If (reg == Negative_Infinity) reg = 0xff7fffff; // Floating Point Negative Max Value
Now this actually preserves the sign when a number is NaN, but as you can see it's over twice as slow! (5 instructions compared to 2)
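The five-instruction sequence can be modeled for a single lane in plain C as well. This is only a sketch under the same assumption about MINPS/MAXPS returning the source operand when the comparison fails; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

static uint32_t to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float from_bits(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

static float minps_lane(float dst, float src) { return (dst < src) ? dst : src; }
static float maxps_lane(float dst, float src) { return (dst > src) ? dst : src; }

/* Preserve-sign clamp: save the sign bit, clamp, then OR the sign back in. */
float preserve_sign_clamp(float reg) {
    const float maxval = from_bits(0x7f7fffffu);
    const float minval = from_bits(0xff7fffffu);
    uint32_t sign = to_bits(reg) & 0x80000000u; /* MOVAPS + ANDPS with the sign mask */
    reg = minps_lane(reg, maxval);              /* MINPS */
    reg = maxps_lane(reg, minval);              /* MAXPS */
    return from_bits(to_bits(reg) | sign);      /* ORPS */
}
```

For ordinary numbers the final OR changes nothing, because the clamped value still carries its original sign; it only matters for a negative NaN, which the min step has already turned into the positive max value.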
If you've ever wondered what "Extra + Preserve Sign" Clamp mode is actually doing, it's using the Preserve Sign clamping I posted above instead of the Normal Clamping.
But you might also have noticed that any game that needs "Extra + Preserve Sign" mode generally works with just "Extra" mode, which only uses Normal Clamping (I don't know any game that 'needs' the preserve sign clamping that doesn't work with Extra clamping).
The reason is what I explained earlier: games just seem to care about not having NaNs, and they don't really care if you convert them to a positive number.
(NaNs are very evil because they propagate to the result of any other instruction. So the only way to stop this is to make the number an ordered/normal number.)
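A small plain C sketch of that propagation, with made-up helper names: a single NaN poisons a running sum, while clamping each input first keeps the result an ordinary number.

```c
#include <stdint.h>
#include <string.h>

static float from_bits(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* NaN is the only value that compares unequal to itself. */
static int is_nan(float f) {
    return f != f;
}

/* Scalar model of the normal (min, then max) clamp on one value. */
static float clamp_value(float v) {
    const float maxval = from_bits(0x7f7fffffu);
    const float minval = from_bits(0xff7fffffu);
    v = (v < maxval) ? v : maxval; /* NaN fails the compare and becomes +max */
    v = (v > minval) ? v : minval;
    return v;
}

/* Sum n values, optionally clamping each input before it is used. */
float sum_values(const float *v, int n, int clamp_first) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += clamp_first ? clamp_value(v[i]) : v[i];
    }
    return sum;
}
```

Summing { 1.0f, NaN, 2.0f } without clamping gives NaN; with clamp_first set, the NaN is replaced by the max value before it can spread and the sum stays an ordered number.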
Also... when coding with SSE you generally try to avoid conditionals, not just because branch prediction misses are costly, but also because you might be working with 4 values at once (the 4 floats packed into one SSE register); you have to gear your thinking towards algorithms that can 'preserve' the state of some values while only modifying the ones you want.
Many times, if you don't think it out and you try to use conditionals with SSE, you'll end up with 16 different cases (4 values, and each one can either match the condition or not match the condition) and things can get very ugly, large, and slow.
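One common branch-free pattern is to turn the condition into a per-lane bit mask and blend two candidate results with AND/ANDN/OR. The plain C sketch below models that on four lanes; cmplt_mask and blend_lanes are hypothetical names standing in for what CMPLTPS and the ANDPS/ANDNPS/ORPS trio would do.

```c
#include <stdint.h>
#include <string.h>

static uint32_t to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float from_bits(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Model of CMPLTPS: each lane becomes all ones if a < b, else all zeros. */
void cmplt_mask(const float a[4], const float b[4], uint32_t mask[4]) {
    for (int i = 0; i < 4; i++) {
        mask[i] = (a[i] < b[i]) ? 0xffffffffu : 0u;
    }
}

/* Model of ANDPS/ANDNPS/ORPS: take if_true where the mask is set,
 * if_false elsewhere. Every lane runs the same instructions. */
void blend_lanes(const uint32_t mask[4], const float if_true[4],
                 const float if_false[4], float out[4]) {
    for (int i = 0; i < 4; i++) {
        uint32_t t = to_bits(if_true[i]) & mask[i];
        uint32_t f = to_bits(if_false[i]) & ~mask[i];
        out[i] = from_bits(t | f);
    }
}
```

For example, comparing v = { 1, -2, 3, -4 } against zero and blending zero into the matching lanes replaces only the negative values and leaves the others untouched, with no branch per lane.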
But sometimes conditionals are useful, and you can do very smart stuff with them. Another smart thing you can do when you can't work out the algorithm without jumps is to have a jump table with all 16 (or however many) possible results. I remember I implemented some MMI instruction using cool tricks like that (see the function 'void recQFSRV()' in pcsx2).
Doing tricks like that is often a lot of work to code though...
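To give a feel for the jump table idea, here is a toy sketch in plain C. It is not how recQFSRV works; it just assumes a 4-bit mask built from the sign bits of four lanes (similar to what MOVMSKPS produces) and uses it to index a 16-entry table of handlers. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

static uint32_t to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Build a 4-bit mask: bit i is set when lane i has its sign bit set. */
static int sign_mask(const float v[4]) {
    int mask = 0;
    for (int i = 0; i < 4; i++) {
        mask |= (int)(to_bits(v[i]) >> 31) << i;
    }
    return mask;
}

typedef void (*lane_handler)(float v[4], int mask);

/* Case 0: no lane matched, nothing to do. */
static void handle_none(float v[4], int mask) { (void)v; (void)mask; }

/* Case 15: every lane matched, negate them all. */
static void handle_all(float v[4], int mask) {
    (void)mask;
    for (int i = 0; i < 4; i++) v[i] = -v[i];
}

/* Cases 1..14: negate only the lanes whose bit is set. */
static void handle_mixed(float v[4], int mask) {
    for (int i = 0; i < 4; i++) {
        if (mask & (1 << i)) v[i] = -v[i];
    }
}

/* One entry per possible 4-bit mask value. */
static const lane_handler handlers[16] = {
    handle_none,  handle_mixed, handle_mixed, handle_mixed,
    handle_mixed, handle_mixed, handle_mixed, handle_mixed,
    handle_mixed, handle_mixed, handle_mixed, handle_mixed,
    handle_mixed, handle_mixed, handle_mixed, handle_all
};

/* Make all four lanes non-negative with a single table dispatch. */
void abs_lanes(float v[4]) {
    int mask = sign_mask(v);
    handlers[mask](v, mask);
}
```

In a real recompiler each of the 16 entries would typically be its own specialized code path; the shared handle_mixed here just keeps the sketch short.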
Also there's a bit of an overhead when using conditional jumps and SSE. Many times you might have to copy some info to a GPR like echosierra mentioned. And the actual setup to the jump can be a couple of instructions.
But this highly depends on what you're trying to do, and how you implemented it.
Anyways I guess I should also clear up that all the code I've posted here is the very high-level implementation of what we do.
When you're writing the low-level SSE/x86 asm code, it makes a bit more sense why we do it this way.