
**cottonvibes** · (This post was last modified: 09-09-2009, 02:52 AM by cottonvibes.)

(09-09-2009, 12:14 AM)frankdd89 Wrote: Hi CottonVibes!
Can I ask why you didn't check whether value is a NaN or Inf before the clamp?

float ps2_sqrt(float value) {
value = clamp(value); // Clamp Value if NaN or Inf to an ordered/normal number
value = abs(value); // Make Positive
value = sqrt(value); // Get sqrt of now-positive value
return value;
}

You can save a little time if you check the value variable first, like this:

float ps2_sqrt(float value) {
if(value==NaN||value==Inf)
value = clamp(value);
if(value<0)
value = abs(value);
value = sqrt(value);
return value;
}

I think this solution would be more efficient because you are not calling abs and clamp functions if they are not necessary!

(I don't know how you've actually implemented it in the source code, anyway.)

I actually simplified the code above a bit to make it easier to understand.

In practice I do something like:

Code:
float ps2_sqrt(float value) {
    InvalidFlag = 0;
    if (*(u32*)&value & 0x80000000) { // check Sign bit to see if Negative
        InvalidFlag = 1;
        value = abs(value);
    }
    value = Positive_Clamp(value); // Clamp Value
    value = sqrt(value); // Get sqrt of now-positive value
    return value;
}

I use the conditional because the PS2 actually sets some flags when a square root of a negative number occurs.
There are other ways to code this without a conditional, but SQRT isn't called often enough to be worth optimizing specifically, and this code is a bit more readable.
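For anyone who wants to try the sign-bit check in plain C, here is a minimal sketch of the steps before the actual square root. The names `invalid_flag`, `float_bits`, `bits_float` and `prepare_sqrt_operand` are made up for this example and are not PCSX2 identifiers. It uses `memcpy` instead of the `*(u32*)&value` cast, since the cast relies on the compiler tolerating that kind of pointer aliasing.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the emulated "invalid" status flag. */
static int invalid_flag;

/* Reinterpret a float as its raw 32-bit pattern, and back. */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Returns the value that would be handed to sqrt: sign removed, then
   clamped to the largest finite float. Sets invalid_flag for negative input. */
static float prepare_sqrt_operand(float value) {
    uint32_t bits = float_bits(value);
    invalid_flag = 0;
    if (bits & 0x80000000u) {       /* sign bit set: negative input */
        invalid_flag = 1;
        bits &= 0x7fffffffu;        /* clear the sign bit (abs) */
    }
    value = bits_float(bits);
    /* Positive clamp: written as (v < max ? v : max) so that a NaN,
       which fails every comparison, also ends up as max. */
    const float max = bits_float(0x7f7fffffu);
    return value < max ? value : max;
}
```

Calling `prepare_sqrt_operand(-9.0f)` gives back `9.0f` and leaves `invalid_flag` at 1; the result can then be passed to `sqrtf`.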

Also, "Positive_Clamp()" is just 1 SSE instruction, whereas normal clamping is a minimum of 2 SSE instructions (a sign-preserving accurate clamp needs 5 SSE instructions).

I guess I'll quickly list the 3 different kinds of clamping we do, and how they look in SSE:

Positive Clamp looks like this:

Code:
SSE_MINPS_M128_to_XMM(reg, (uptr)mVU_maxvals);

This will clamp values correctly 'if' the value is known to be positive. (Like after we use abs(value)).
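As a rough illustration, one lane of that MINPS can be modelled in scalar C. This is only a sketch: `minps_lane`, `positive_clamp` and the bit helpers are names invented here, and the real code works on all 4 lanes of the register at once.

```c
#include <stdint.h>
#include <string.h>

static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Models one lane of MINPS: if either input is NaN the comparison is
   false, so the second operand is returned. */
static float minps_lane(float a, float b) { return a < b ? a : b; }

/* Positive clamp: min(value, largest finite float). */
static float positive_clamp(float v) {
    return minps_lane(v, bits_float(0x7f7fffffu));
}
```

With this model, +Infinity and NaN both come out as 0x7f7fffff, while -Infinity passes through unchanged, which is why this form is only safe once the value is known to be positive.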

Normal Clamping looks like this:

Code:
SSE_MINPS_M128_to_XMM(reg, (uptr)mVU_maxvals);
SSE_MAXPS_M128_to_XMM(reg, (uptr)mVU_minvals);

What this actually does is:
If (reg == Positive_NaN) reg = 0x7f7fffff; // Floating Point Positive Max Value
If (reg == Negative_NaN) reg = 0x7f7fffff; // Floating Point Positive Max Value
If (reg == Positive_Infinity) reg = 0x7f7fffff; // Floating Point Positive Max Value
If (reg == Negative_Infinity) reg = 0xff7fffff; // Floating Point Negative Max Value

As you can see, it turns ALL NaNs into the greatest positive floating point number, regardless of whether the sign bit was set (i.e. it doesn't preserve the sign after conversion).
The fun thing is most of the time, games will be happy with this. They mainly just care about getting rid of the NaNs, and don't complain about the sign.
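The table above can be checked with a small scalar model of the MIN/MAX pair. This is a sketch of one lane only; `minps_lane`, `maxps_lane` and `normal_clamp` are hypothetical helper names, not functions from the emulator.

```c
#include <stdint.h>
#include <string.h>

static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* One lane of MINPS / MAXPS: both return the second operand
   whenever either input is a NaN. */
static float minps_lane(float a, float b) { return a < b ? a : b; }
static float maxps_lane(float a, float b) { return a > b ? a : b; }

/* Normal clamp: min against +max, then max against -max. */
static float normal_clamp(float v) {
    v = minps_lane(v, bits_float(0x7f7fffffu)); /* +Inf and any NaN -> +max */
    v = maxps_lane(v, bits_float(0xff7fffffu)); /* -Inf -> -max */
    return v;
}
```

Feeding it a NaN with the sign bit set (for example the pattern 0xffc00000) gives 0x7f7fffff, so the sign is lost exactly as described.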

"Preserve-Sign" Clamping looks like this:

Code:
SSE_MOVAPS_XMM_to_XMM(regT1, reg);
SSE_ANDPS_M128_to_XMM(regT1, (uptr)mVU_signbit);
SSE_MINPS_M128_to_XMM(reg,   (uptr)mVU_maxvals);
SSE_MAXPS_M128_to_XMM(reg,   (uptr)mVU_minvals);
SSE_ORPS_XMM_to_XMM  (reg, regT1);

What this does is:
If (reg == Positive_NaN) reg = 0x7f7fffff; // Floating Point Positive Max Value
If (reg == Negative_NaN) reg = 0xff7fffff; // Floating Point Negative Max Value
If (reg == Positive_Infinity) reg = 0x7f7fffff; // Floating Point Positive Max Value
If (reg == Negative_Infinity) reg = 0xff7fffff; // Floating Point Negative Max Value

Now this actually preserves the sign when a number is NaN, but as you can see it's over twice as slow! (5 instructions compared to 2)

If you've ever wondered what "Extra + Preserve Sign" Clamp mode is actually doing, it's using the Preserve Sign clamping I posted above instead of the Normal Clamping.
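Here is the same 5-step sequence modelled for a single lane in scalar C, as a sketch only. The function name `preserve_sign_clamp` and the helpers are invented for the example; the comments map each step to the SSE instruction it stands for.

```c
#include <stdint.h>
#include <string.h>

static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

static float minps_lane(float a, float b) { return a < b ? a : b; }
static float maxps_lane(float a, float b) { return a > b ? a : b; }

static float preserve_sign_clamp(float v) {
    uint32_t sign = float_bits(v);                 /* MOVAPS: copy the register */
    sign &= 0x80000000u;                           /* ANDPS: keep only the sign bit */
    v = minps_lane(v, bits_float(0x7f7fffffu));    /* MINPS: +Inf / NaN -> +max */
    v = maxps_lane(v, bits_float(0xff7fffffu));    /* MAXPS: -Inf -> -max */
    return bits_float(float_bits(v) | sign);       /* ORPS: put the sign back */
}
```

For ordinary negative numbers the final OR changes nothing, because their sign bit is already set after the clamp; it only matters for a negative NaN, which the MIN step has turned positive.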

You might also have noticed that any game that needs "Extra + Preserve Sign" mode generally works with just "Extra" mode, which only uses Normal Clamping (I don't know of any game that 'needs' the preserve sign clamping and doesn't work with Extra clamping).

The reason is what I explained earlier: games just seem to care about not having NaNs, and they don't really care if you convert them to a positive number.
(NaNs are very evil because they propagate to the result of any other instruction. So the only way to stop this is to make the number an ordered/normal number.)

Also... when coding with SSE you generally try to avoid conditionals, not just because branch mispredictions are costly, but also because you might be working with 4 values at once (the 4 floats packed into one vector register); you have to gear your thinking towards algorithms that can 'preserve' the state of some of those values while only modifying the ones you want.
Many times, if you don't think it out and you try to use conditionals with SSE, you'll end up with 16 different cases (4 values, and each one can either match the condition or not match it) and things can get very ugly, large, and slow.
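To make the "modify only the lanes you want" idea concrete, here is a small scalar sketch of the usual compare-and-mask pattern. The `vec4` struct and the function `clamp_above` are hypothetical stand-ins for an XMM register and a packed operation; the loop represents what a few packed instructions (a compare, then AND / ANDN / OR) would do to all 4 lanes in one go, without a branch per lane.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit register holding 4 floats. */
typedef struct { float lane[4]; } vec4;

static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Replace every lane greater than limit with limit, leaving the rest alone. */
static vec4 clamp_above(vec4 v, float limit) {
    vec4 out;
    for (int i = 0; i < 4; i++) {
        /* Compare step: all-ones mask where the condition holds, else zero. */
        uint32_t mask = (v.lane[i] > limit) ? 0xffffffffu : 0u;
        uint32_t keep = float_bits(v.lane[i]) & ~mask; /* ANDN: untouched lanes */
        uint32_t repl = float_bits(limit) & mask;      /* AND: replaced lanes */
        out.lane[i] = bits_float(keep | repl);         /* OR: merge both */
    }
    return out;
}
```

The same code path handles all 16 combinations of "lane matches / lane doesn't match", which is the point of avoiding per-lane conditionals.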

But sometimes conditionals are useful, and you can do very smart stuff with them. Another smart thing you can do when you can't work out the algorithm without jumps is to have a jump table with all 16 (or however many) possible results. I remember I implemented some MMI instruction using cool tricks like that (see the function 'void recQFSRV()' in pcsx2).
Doing tricks like that is often a lot of work to code though...

Also there's a bit of an overhead when using conditional jumps and SSE. Many times you might have to copy some info to a GPR like echosierra mentioned. And the actual setup to the jump can be a couple of instructions.
But this highly depends on what you're trying to do, and how you implemented it.

Anyways I guess I should also clear up that all the code I've posted here is the very high-level implementation of what we do.
When doing the low-level SSE/x86 asm code, it makes a bit more sense why we do it this way :)

dralor · 09-09-2009, 03:19 AM

Okay, here you lost me. Maybe it's because I never coded in SSE, but why would the one take 5 instructions vs 2? If I understand this correctly, in preserve sign there are 3 cases: the normal float case, the NaN or Inf positive case, and the NaN or Inf negative case. In the normal case it does nothing, as there's no point in clamping. In the positive case, wouldn't you just mov 0x7f7fffff into the reg, and in the negative case mov 0xff7fffff into the reg? While there may be issues with moving a constant directly, that can usually be accomplished with an xor on most systems that don't allow a constant to be loaded into certain registers.

**Air** · 09-09-2009, 03:27 AM

(09-09-2009, 03:19 AM)dralor Wrote: Okay here you lost me. Maybe because I never coded in SSE but why would the one take 5 instructions vs 2? If I understand this correctly in preserve sign there are 3 cases.

You understand incorrectly. I shall quote from Cotton:

cottonvibes Wrote: Also... when coding with SSE you generally try to avoid conditionals, not just because branch mispredictions are costly, but also because you might be working with 4 values at once (the 4 floats packed into one vector register); you have to gear your thinking towards algorithms that can 'preserve' the state of some of those values while only modifying the ones you want.

Many times, if you don't think it out and you try to use conditionals with SSE, you'll end up with 16 different cases (4 values, and each one can either match the condition or not match it) and things can get very ugly, large, and slow.

In other words, most of the time there are not merely the three cases you accounted for. Any one of the four components of the vector could be NaN, Inf, or valid. Therefore, each of the four components of the VU register must have its clamping performed independently of the other 3. And doing that without coding 16 different cases requires clever use of SSE's packed operations.

dralor · (This post was last modified: 09-09-2009, 03:34 AM by dralor.)

Well, 12, but I think I follow now. It's the parallel nature of vector instructions that is causing the issues.

Well, maybe not 12; my statistics wants me to say 3^4, whatever that is off the top of my head, which is much larger than 16.

**cottonvibes** · (This post was last modified: 09-09-2009, 06:32 PM by cottonvibes.)

(09-09-2009, 03:19 AM)dralor Wrote: Okay, here you lost me. Maybe it's because I never coded in SSE, but why would the one take 5 instructions vs 2? If I understand this correctly, in preserve sign there are 3 cases: the normal float case, the NaN or Inf positive case, and the NaN or Inf negative case. In the normal case it does nothing, as there's no point in clamping. In the positive case, wouldn't you just mov 0x7f7fffff into the reg, and in the negative case mov 0xff7fffff into the reg? While there may be issues with moving a constant directly, that can usually be accomplished with an xor on most systems that don't allow a constant to be loaded into certain registers.

with the normal clamping:
SSE_MINPS_M128_to_XMM(reg, (uptr)mVU_maxvals);
SSE_MAXPS_M128_to_XMM(reg, (uptr)mVU_minvals);

The first instruction is saying
reg = Min(reg, 0x7f7fffff);

When using SSE Min/Max, Infinities behave as normal numbers, so if reg is positive infinity (0x7f800000), it is greater than 0x7f7fffff, and so min(pos_infinity, 0x7f7fffff) is 0x7f7fffff.

But SSE's Min/Max act pretty funny when a value is a NaN.
They will always return the second operand if either of the operands is a NaN.
So once the first Min is done, if there was a NaN it will 'always' have been converted to 0x7f7fffff.
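One consequence of the "second operand" rule is that operand order matters, which a tiny scalar model makes visible. This is a sketch with a made-up helper name (`minps_lane`); the first argument plays the role of reg and the second the role of the memory constant.

```c
#include <stdint.h>
#include <string.h>

static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* One lane of MINPS: any comparison involving a NaN is false,
   so the second operand falls out. */
static float minps_lane(float first, float second) {
    return first < second ? first : second;
}

/* reg is the first operand, the constant is the second: a NaN in reg is replaced. */
static float nan_in_reg(void) {
    return minps_lane(bits_float(0x7fc00000u), bits_float(0x7f7fffffu));
}

/* Swapped order: the NaN is now the second operand, so it survives. */
static float nan_in_constant_slot(void) {
    return minps_lane(bits_float(0x7f7fffffu), bits_float(0x7fc00000u));
}
```

So the clamp only works because the value being cleaned up sits in the first operand and the known-good constant sits in the second.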

The next instruction is saying:
reg = Max(reg, 0xff7fffff);

The only reason we need this second instruction is because if reg is Negative Infinity then the first Min() would have left it as Negative infinity, and this second Max() will clamp Negative Infinity to 0xff7fffff.

To sum it up:

Code:
SSE_MINPS_M128_to_XMM(reg, (uptr)mVU_maxvals); // reg = Min(reg, 0x7f7fffff);
// Which Does:
// if (reg == normal_number) reg = reg;
// if (reg == Neg_Inf) reg = reg;
// if (reg == Pos_Inf) reg = 0x7f7fffff;
// if (reg == NaN) reg = 0x7f7fffff;

Code:
SSE_MAXPS_M128_to_XMM(reg, (uptr)mVU_minvals); // reg = Max(reg, 0xff7fffff);
// Which Does:
// if (reg == normal_number) reg = reg;
// if (reg == Neg_Inf) reg = 0xff7fffff;
// if (reg == Pos_Inf) This Case is not possible due to Min() above
// if (reg == NaN) This Case is not possible due to Min() above

Now the Preserve Sign Clamp function uses extra instructions to copy the sign bit, and re-apply it to the result.
So that guarantees that the sign-bit will be preserved after the clamping (but that's also why it takes 3 more SSE instructions!)

dralor · 09-09-2009, 04:21 AM

Ahh, well it sounds to me as if the SSE instruction set or the IEEE 754 standard is just as quirky as the PS2 in this case, which causes more headaches for the conversion than if the behavior of the two were straightforward.

**cottonvibes** · 09-09-2009, 04:45 AM

I should actually add that with SSE4.1 it's possible to do the 'preserve sign' clamp in 2 instructions.

I'm actually going to be working on implementing a lot of clamp stuff to mVU this week.
I'll make sure to add the SSE4.1 optimized code as well.
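The post doesn't say which two instructions are meant. One plausible approach, shown here purely as an assumption, is to treat the float bit patterns as integers and use the packed integer minimums that SSE4.1 added (PMINSD for signed, PMINUD for unsigned). The scalar sketch below models one lane; `clamp_preserve_sign_int` is a name invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Works on the raw 32-bit pattern of a float. */
static uint32_t clamp_preserve_sign_int(uint32_t bits) {
    /* Step 1, like one lane of PMINSD against 0x7f7fffff:
       as a signed integer, +Inf and positive NaNs are larger than
       0x7f7fffff, while every negative float is a negative integer
       and is left alone. */
    int32_t s;
    memcpy(&s, &bits, sizeof s);
    if (s > 0x7f7fffff) s = 0x7f7fffff;

    /* Step 2, like one lane of PMINUD against 0xff7fffff:
       as an unsigned integer, -Inf and negative NaNs are larger than
       0xff7fffff, while every positive float is below 0x80000000
       and is left alone. */
    uint32_t u;
    memcpy(&u, &s, sizeof u);
    if (u > 0xff7fffffu) u = 0xff7fffffu;
    return u;
}
```

In this model a negative NaN such as 0xffc00000 ends up as 0xff7fffff and a positive NaN as 0x7f7fffff, matching the preserve-sign table with only two steps.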

dralor · 09-09-2009, 04:52 AM

Too bad my Q6600 doesn't support it and isn't getting upgraded anytime soon. Oh well, the downside of the forward march of technology is that there is always something new and better just over the horizon. Nice to know they worked on fixing things like that, though.

**rama** · 09-09-2009, 05:49 AM

Some very rare SPS in Katamari Damacy need extra + preserve sign clamp. I have a savestate of that ;)

frankdd89 · (This post was last modified: 09-09-2009, 10:39 AM by frankdd89.)

I didn't know that SSE instructions could handle the clamp function :)

btw my suggestion came about because I didn't know that the use of conditionals was so expensive in those cases. :D

Very nice explanation anyway; it is really interesting to discuss these coding things.
