[Blog] Explanation of impossible blend
#1
This explanation was originally written by Gregory:


The goal of blending is to combine two colors. The general equation on a modern GPU is:
Code:
coefficient1 * color1 +/- coefficient2 * color2
Color1/Color2 are either the source color or the destination color.
Coefficient1/Coefficient2 are either the alpha value (transparency) of source/source2/destination, 1 - alpha, or a constant. The GPU will clamp the coefficients to [0;1].
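
As a rough illustration of the GPU equation above, here is a minimal Python sketch for a single color channel with values normalized to [0;1]. The function names are made up for this example, and clamping the final result assumes a normalized render target; it is a model of the behavior described here, not actual driver or emulator code.

```python
def clamp01(x):
    """Clamp a value to the [0, 1] range."""
    return max(0.0, min(1.0, x))


def gpu_blend(coef1, color1, coef2, color2, subtract=False):
    """coefficient1 * color1 +/- coefficient2 * color2 for one channel.

    Both coefficients are clamped to [0, 1] before use, which is the
    limitation discussed in this post.
    """
    c1 = clamp01(coef1)
    c2 = clamp01(coef2)
    if subtract:
        result = c1 * color1 - c2 * color2
    else:
        result = c1 * color1 + c2 * color2
    # Assumption: the render target is normalized, so the output is clamped.
    return clamp01(result)


# Ordinary 50/50 mix of source (1.0) and destination (0.0).
half_mix = gpu_blend(0.5, 1.0, 0.5, 0.0)

# Asking for a coefficient of 2.0 silently behaves like 1.0.
not_doubled = gpu_blend(2.0, 0.4, 0.0, 0.0)
```

Note that the second call returns 0.4 rather than 0.8, because the coefficient never gets past 1.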

The general equation on the PS2 however is:
Code:
(Color1 - Color2) * Coefficient + Color3
Color1/Color2/Color3 are either the source or destination color or zero.
Coefficient is the alpha value (transparency) of the source or destination, or a constant.

The issues we have with this are as follows:
1. On the PS2, the coefficient can range over [0;2] (fortunately this almost never happens).
2. If Color3 and Color1 are the same input, the equation can be factored into:
Code:
Color1 * (1 + Coefficient) - Color2 * Coefficient
which means the first coefficient, (1 + Coefficient), is larger than 1 whenever Coefficient is nonzero. This is a problem because the GPU clamps coefficients to 1. This is why this type of blending is impossible on the fixed function unit of a PC's GPU.
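
To make the second case concrete, here is a small worked example in Python. It is only a sketch for one channel with normalized values; `ps2_blend` and `gpu_fixed_function` are hypothetical names, and the numbers are picked so the arithmetic is exact.

```python
def clamp01(x):
    """Clamp a value to the [0, 1] range."""
    return max(0.0, min(1.0, x))


def ps2_blend(color1, color2, coefficient, color3):
    """PS2 form: (Color1 - Color2) * Coefficient + Color3, output clamped."""
    return clamp01((color1 - color2) * coefficient + color3)


def gpu_fixed_function(src, dst, coefficient):
    """Factored form Color1 * (1 + Coefficient) - Color2 * Coefficient,
    but with each coefficient clamped to [0, 1] as the GPU does."""
    return clamp01(clamp01(1.0 + coefficient) * src - clamp01(coefficient) * dst)


src, dst, coef = 0.25, 0.5, 0.5

# Color3 == Color1 == source: (0.25 - 0.5) * 0.5 + 0.25 = 0.125
expected = ps2_blend(src, dst, coef, src)

# The GPU turns (1 + 0.5) into 1.0: 1.0 * 0.25 - 0.5 * 0.5 = 0.0
actual = gpu_fixed_function(src, dst, coef)
```

The two results differ (0.125 versus 0.0) only because the 1.5 coefficient was clamped to 1.0.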

Our recent update fixed the second case. Since it is impossible to do that blending on the fixed function unit, we instead emulate it in the GPU's fragment shader. Fragment shaders run on what are effectively very small dedicated CPUs, so it is quite easy to do a few small operations on them.

There is a catch, however. Fragment shaders (like any CPU) are relatively slow. To compensate for this, the fragments are executed out of order. For example, if you do a draw call consisting of 2 triangles, it is possible that the second triangle will be computed before the first one. This is quite annoying because blending is an in-order operation. However, as long as the primitives in a draw call don't overlap each other, only a single fragment shader invocation runs for each pixel and therefore there is no issue with order.

Great. At this point we just need to split the draw call into N draw calls without primitive overlap. It's not free performance-wise to do this, but it remains reasonable in some cases.
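
One way to picture the splitting step is the greedy sketch below. It is an illustration only, not the emulator's actual code: primitives are represented by hypothetical axis-aligned bounding boxes `(x0, y0, x1, y1)`, which is a conservative stand-in for real triangle overlap, and a new batch is started whenever the next primitive touches anything already in the current batch. Submission order is preserved.

```python
def boxes_overlap(a, b):
    """True if two (x0, y0, x1, y1) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def split_into_batches(boxes):
    """Split an ordered list of primitives into consecutive batches
    in which no two primitives overlap."""
    batches = []
    current = []
    for box in boxes:
        if any(boxes_overlap(box, other) for other in current):
            # Overlap found: close the current batch and start a new one.
            batches.append(current)
            current = []
        current.append(box)
    if current:
        batches.append(current)
    return batches


primitives = [
    (0, 0, 2, 2),  # first primitive
    (3, 0, 5, 2),  # does not touch the first one
    (1, 1, 4, 3),  # overlaps both, so it needs its own draw call
]
batches = split_into_batches(primitives)
```

Here the three primitives end up in two draw calls. In the worst case every primitive overlaps the previous one and N equals the primitive count, which is where the performance cost comes from.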

Moving on, we need to read the destination value to compute the final value. However, the GPU has a texture cache. The texture cache is read-only, so that there is no coherency issue. The render target can be written, but subsequent reads of it can be wrong because the cache still holds the old data.

Getting back to the input texture case: the texture is read-only during a draw, but it could change on the following draw. There must be a way to invalidate the cache if you upload a new texture to the same location. The driver has this ability, but until recently applications did not. GL 4.5 changes this: a function is provided to invalidate the cache. The end result is that we can implement basic blending in the fragment shader instead of relying on the limited fixed function unit.
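
The cache problem can be modelled with a toy Python class. This is purely a thought model with invented names (`CachedTarget`, `invalidate`), not how a GPU is programmed. The post does not name the GL 4.5 function; it is presumably `glTextureBarrier`, but that is an assumption.

```python
class CachedTarget:
    """Toy model of a render target read through a read-only cache."""

    def __init__(self, pixels):
        self.memory = list(pixels)  # what is really stored
        self.cache = {}             # what texture reads see

    def read(self, index):
        # Reads go through the cache and fill it on a miss.
        if index not in self.cache:
            self.cache[index] = self.memory[index]
        return self.cache[index]

    def write(self, index, value):
        # Writes go to memory only; the cache is not updated.
        self.memory[index] = value

    def invalidate(self):
        # Stand-in for the cache invalidation exposed by GL 4.5.
        self.cache.clear()


target = CachedTarget([0.2])
first_read = target.read(0)   # 0.2, now cached
target.write(0, 0.8)          # blended result written back
stale_read = target.read(0)   # still 0.2: the cache is out of date
target.invalidate()
fresh_read = target.read(0)   # 0.8 after invalidation
```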
[Image: XTe1j6J.png]
Gaming Rig: Intel i7 6700k @ 4.8Ghz | GTX 1070 TI | 32GB RAM | 960GB(480GB+480GB RAID0) SSD | 2x 1TB HDD

#2
Quote:which will result in the first half of this equation always being larger than 1. This is a problem because the GPU is limited to 1.

Dumb question: Why can't we multiply the needed parameters by 0.5 before sending them to the unit, then divide the output by 0.5?

Is it an accuracy and rounding issue, or are the extra operations too expensive?
#3
I can't answer that, but maybe gregory can. In theory it makes sense, but I'm sure he woulda thought of it if it were that simple.
#4
That is why I'm asking why it is not possible, not whether it is possible.
#5
weird topic indeed. looked it up a lil. that formula

Quote:Color1 * (1+Coefficient) - Color2 *Coefficient

looks wrong tho. how did you do that? that's not a permutation of the general one. i don't get there. and is the coefficient really 0-2? sure easy to figure if 1 and 3 are same on that

Quote:(Color1 - Color2) * Coefficient + Color3

a ps2dev post i googled real quick even states

Quote:Cout=(A-B)*C>>7+D

(white - black) * 1 + white = white + white. the result is sorta x2. a color boost. is that being clamped or wrapped tho? the rest is regular subtractive and additive blending. multiplicative actually not possible. the read coefficient is really tricky tho. i have two formulas here. the ps2dev one does only 1 alpha step (highest nibble = 128+). yours does more. so.. i dunno about that and...

wonder tho if that works with multitexturing and the temporary register? d3d has something like that. been a while i did that. first stage for subtractive. second stage is multiplyadd the temporary with the coefficient and the additive. there's alphareplicate. where you could put the alpha source into.

the actual problem nonetheless is to read the coefficient from the destination. the "impossible"? you'd have to do a render trick. use the output framebuffer as texture at the same time. does that not work? i'll have to dig up some code tho...

so... don't take anything in that post for real coder stuff. :D
#6
You can control the input of blending but you can't apply any operation after the blending unit. So it isn't possible.

The PS2 blending unit can either clamp or wrap the output. Modern GPUs can only clamp.

@dabore,
if color1 == color3, you can factorize the equation.

You're wrong: C>>7 can be above 1, because C can range from 0 to 255.
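
A quick sketch of the integer form quoted earlier in the thread, Cout=(A-B)*C>>7+D, together with the clamp and wrap output modes. It assumes 8-bit channel values and the grouping ((A-B)*C >> 7) + D. The helper names are invented, and how the real hardware rounds negative intermediate values is not covered (Python's >> rounds toward negative infinity).

```python
def ps2_blend_8bit(a, b, c, d):
    """((A - B) * C >> 7) + D on 8-bit channels, before clamp or wrap.

    C = 128 corresponds to a coefficient of 1.0.
    """
    return (((a - b) * c) >> 7) + d


def clamp8(value):
    """Clamp output mode: saturate to [0, 255]."""
    return max(0, min(255, value))


def wrap8(value):
    """Wrap output mode: keep only the low 8 bits."""
    return value & 0xFF


# With C in 0..255 the coefficient C / 128 reaches almost 2.
max_coefficient = 255 / 128  # 1.9921875

# (white - black) * 1.0 + white overflows the 8-bit range.
boosted = ps2_blend_8bit(255, 0, 128, 255)  # 510
clamped = clamp8(boosted)                   # 255
wrapped = wrap8(boosted)                    # 254
```

The wrap case is the one a clamping-only GPU blend unit cannot reproduce.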

Quote:Primitives have only color information - this is enough for a small surface but not efficient to draw a big surface
Sure, it isn't impossible in absolute terms, because I did it (at least for some cases) :P However, it is impossible with the fixed function unit.



Quote: Coefficient1/Coefficient2 are either the alpha value (transparency) of source/source2/destination; 1- alpha or a constant. The GPU will clamp the coefficients to [0;1]
In short, you can use any input as the alpha coefficient ;)

Quote:The main situation it doesn't work in is splitting the draw call.
I don't understand it myself :P I wanted to say that splitting the draw call is not enough to make it work.
#7
(06-04-2015, 09:20 AM)gregory Wrote: @dabore,
if color1 == color3, you can factorize the code.

yeh... maybe... but... just look at it. you can see already that it's slower. a multiplication and an addition more. as a shader i'd rather put that direct in there rather than "complicated". and... this way or that way... to fix that blend i presume you always have to use two stages... somehow... that's a given. however you do it. the blend unit has that constraint with the addition. shaders have that "copy problem" due to the parallel computation glitches. i actually dunno if the old fixed function multitexture would have done that. neither have i really tried to plug a texture rendertarget trick into a shader. that's as far as i looked over it and can see real quick. most of it is foggy to me tho. :D
#8
The GPU blending unit has 2 inputs (the GS has 3). They explain the operation to you with + and *, but the hardware unit does it in a different way.

On the shader, the biggest issue is not the 2/4 extra operations. First, shaders are out-of-order. Second, shaders (due to the cache) can't (always) read the destination. Multitexture won't change that. The issue is the hardware. A copy is not a solution: it is too expensive, and you need to copy again after every update.

The other solution is to store all fragments, then sort them, then do the blending. It is much more complex to implement so I didn't do it.
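
For readers wondering what "store, sort, blend" would look like, here is a CPU-side Python sketch of the idea. It is not something the plugin implements (as said above). Fragments are hypothetical `(pixel, order, color, alpha)` tuples, and the blend used is the common (Cs - Cd) * As + Cd case of the PS2 equation on one normalized channel.

```python
from collections import defaultdict


def clamp01(x):
    """Clamp a value to the [0, 1] range."""
    return max(0.0, min(1.0, x))


def resolve(fragments, framebuffer):
    """Blend stored fragments into the framebuffer in submission order.

    fragments: iterable of (pixel, order, color, alpha), in any order.
    framebuffer: dict mapping pixel -> destination color.
    """
    per_pixel = defaultdict(list)
    for pixel, order, color, alpha in fragments:
        per_pixel[pixel].append((order, color, alpha))

    for pixel, frags in per_pixel.items():
        frags.sort(key=lambda f: f[0])  # restore the original draw order
        dest = framebuffer[pixel]
        for _, color, alpha in frags:
            dest = clamp01((color - dest) * alpha + dest)
        framebuffer[pixel] = dest
    return framebuffer


# Two fragments for pixel 0 that arrived out of order.
arrived = [
    (0, 1, 1.0, 0.5),  # drawn second
    (0, 0, 0.5, 1.0),  # drawn first
]
result = resolve(arrived, {0: 0.0})
```

Blending in arrival order would give 0.5 for this pixel; sorting first gives 0.75, which is what an in-order blend would produce.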