Boolean Branches in shader



techwinder
04-03-2016, 05:00 AM
Hi everyone,
Being fairly new to modern OpenGL, I have a very general question: is it good practice to have conditional branches based on uniform bools in the fragment shader?



uniform bool lightOn;
uniform bool hasTexture;
...
if (lightOn)
{
    if (hasTexture)
    {
        MaterialAmbientColor = vec4(texture2D(textureSampler, UV).rgb * LightAmbient, vertexcolor.a);
        MaterialDiffuseColor = vec4(texture2D(textureSampler, UV).rgb * LightDiffuse, vertexcolor.a);
    }
    else
    {
        MaterialAmbientColor = vec4(vertexcolor.rgb * LightAmbient, vertexcolor.a);
        MaterialDiffuseColor = vec4(vertexcolor.rgb * LightDiffuse, vertexcolor.a);
    }
    ...


I understand that this will lead to more GPU work due to the branch being executed for each pixel, but this kind of instruction is usually very fast on a CPU processor. Having these branches avoids the trouble of managing four shaders for all combinations of light on/off and textures on/off.

Also is there such a thing as a uniform bool, or should it be integers only?

Thanks in advance for any insight.

Spoops
04-03-2016, 07:08 AM
You're usually safe when branching on uniform variables. On most Nvidia cards, shaders are recompiled when uniform values change (or rather re-assembled, since drivers often optimize that step), so no branch is left in the code that actually runs. I don't know about AMD or Intel cards, though.

GClements
04-03-2016, 12:28 PM
I understand that this will lead to more GPU work due to the branch being executed for each pixel, but this kind of instruction is usually very fast on a CPU processor.

The main issue with branches on a GPU is that they may result in the GPU executing both sides of the branch. However, this only occurs if the branch test yields a different result for different invocations which are executed concurrently. E.g. for a fragment shader, if the test is true for some pixels and false for others which are being evaluated concurrently, both branches will be executed.

If the test depends only upon uniform variables, this can't happen.

If a test in a fragment shader depends only upon uniform variables or "flat"-qualified inputs, this isn't an issue for modern hardware. It can be an issue for older hardware which lacked branch instructions and where uniform branches were optimised by re-compiling the shader if the test changed.
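
To make the "both branches" point concrete, here is a minimal sketch of a toy cost model in C++. It assumes a group of invocations runs in lockstep and pays for a side of the branch whenever at least one invocation takes it. The names `BranchCost` and `groupCost` and the cost numbers are made up for illustration; real hardware is more complicated than this.

```cpp
#include <vector>

// Toy model (an assumption for illustration, not a description of any
// specific GPU): invocations in a group execute in lockstep, so the group
// pays for a side of the branch if at least one invocation takes that side.
struct BranchCost {
    int thenCost;  // hypothetical instruction count of the "if" side
    int elseCost;  // hypothetical instruction count of the "else" side
};

// Returns the modelled cost for one group, given the branch condition as
// seen by each invocation in that group.
int groupCost(const std::vector<bool>& conditionPerInvocation, BranchCost cost) {
    bool anyTrue = false;
    bool anyFalse = false;
    for (bool c : conditionPerInvocation) {
        if (c) {
            anyTrue = true;
        } else {
            anyFalse = true;
        }
    }

    int total = 0;
    if (anyTrue) total += cost.thenCost;   // someone needs the "if" side
    if (anyFalse) total += cost.elseCost;  // someone needs the "else" side
    return total;
}
```

In this model, a condition taken from a uniform gives every invocation the same value, so the group only ever pays for one side. A per-pixel condition that differs within the group makes it pay for both.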



Having these branches avoids the trouble of managing four shaders for all combinations of light on/off and textures on/off.

As the shader becomes more complex, you may start to run into issues with the fact that all of the variables within the shader are "live", even if there's no combination of uniforms which will use all of them. In that situation, using individual shaders may be more efficient. But it would need to be far more complex than your example before that becomes an issue.

Another issue is that the implementation may use memory bandwidth to feed vertex attributes which aren't actually used (e.g. texture coordinates in the case where texturing is disabled). I don't know how "smart" current implementations are at handling this, but there are cases where it can't reasonably determine what's required and what isn't (e.g. if texturing was enabled or disabled on a per-primitive basis according to a flat-qualified vertex attribute).

On the other hand, using a single shader may allow you to coalesce more primitives into a single draw call (but that requires using attributes rather than uniforms). Also, the cost of updating internal state when changing shaders may be more than for changing uniforms.

IOW, there are reasons why multiple shaders may be faster, there are reasons why they may be slower. If you want a definitive answer, you have to profile the program on the target hardware.
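
If you do end up wanting separate shaders, you don't necessarily have to maintain four source files by hand. One common approach is to keep a single GLSL source and generate each variant by inserting `#define` lines before compiling, with the shader using `#ifdef` instead of `if`. Below is a rough C++ sketch of that idea. The function name `makeVariant` and the macro names `LIGHT_ON` and `HAS_TEXTURE` are hypothetical, and it assumes that any `#version` directive sits at the very start of the source.

```cpp
#include <cstddef>
#include <string>

// Builds one shader variant from a shared GLSL source by inserting
// "#define" lines. LIGHT_ON and HAS_TEXTURE are hypothetical macro names;
// the shader source would test them with #ifdef.
std::string makeVariant(const std::string& source, bool lightOn, bool hasTexture) {
    std::string defines;
    if (lightOn) defines += "#define LIGHT_ON\n";
    if (hasTexture) defines += "#define HAS_TEXTURE\n";

    // GLSL only allows comments and whitespace before #version, so the
    // defines go right after that line. Assumption: if there is a #version
    // directive, it starts at the first character of the source.
    if (source.compare(0, 8, "#version") == 0) {
        std::size_t eol = source.find('\n');
        if (eol == std::string::npos) {
            return source + "\n" + defines;
        }
        return source.substr(0, eol + 1) + defines + source.substr(eol + 1);
    }

    // No #version line: the defines can simply go first.
    return defines + source;
}
```

Each of the four combinations of `lightOn` and `hasTexture` then gives its own source string to compile and link as usual. Whether this is actually faster than the uniform branches is still something to profile.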

techwinder
04-03-2016, 12:54 PM
That's interesting information, thanks.
I have been struggling to understand why one of these branch statements behaves correctly on a modern GPU and erratically on a five-year-old ATI 6750 M.

It can be an issue for older hardware which lacked branch instructions and where uniform branches were optimised by re-compiling the shader if the test changed.
This may or may not be the cause, but I'll implement two different shaders to rule it out just in case, and also to support old hardware.