Performance: alternative for if ( ... ) { }

Hi there,

Some years ago I read that if-statements should be avoided in shader source to get better performance.


if (UniformTwoSided && Dot < 0.0)
{
  Dot *= Flip;
  FragNormal *= Flip;
}

Is that still the case? And if so, would the following be a faster alternative to the code above?


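// shader (GLSL); UniformTwoSidedFlipVec is a uniform vec2
float Flip = UniformTwoSidedFlipVec[int(step(0.0, Dot))]; // step(): 0.0 if Dot < 0.0, else 1.0
Dot *= Flip;
FragNormal *= Flip;

// client side (C): {1, 1} = one-sided, {-1, 1} = two-sided
static const float FlipVec[2][2] = { {1.0f, 1.0f}, {-1.0f, 1.0f} };
glUniform2fv(FlipVecLocation, 1, FlipVec[(int) TwoSided]);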

Basically, the second version uses a vector and a client-side switch
to achieve the same result, but it makes the code less readable.
I'm not even sure it is faster, so your comments are really welcome!
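A quick way to convince yourself that the two versions compute the same thing is to emulate them on the CPU. The C sketch below is only such an emulation, not shader code: `step_f` mimics GLSL `step()`, `FlipVec` is the table from the client code above, and since the post never shows where `Flip` comes from in the first version, it is assumed to be the constant -1.0. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Mimics GLSL step(edge, x): 0.0 if x < edge, else 1.0. */
static float step_f(float edge, float x) { return x < edge ? 0.0f : 1.0f; }

/* Version 1: the original branch. Assumption: Flip == -1.0f. */
static void shade_branch(bool two_sided, float *dot, float normal[3]) {
    const float flip = -1.0f;
    if (two_sided && *dot < 0.0f) {
        *dot *= flip;
        for (int i = 0; i < 3; ++i) normal[i] *= flip;
    }
}

/* Table the client picks from; flip_vec below plays the role of the vec2
   uniform set with glUniform2fv(..., FlipVec[(int) TwoSided]). */
static const float FlipVec[2][2] = { {1.0f, 1.0f}, {-1.0f, 1.0f} };

/* Version 2: branch-free lookup, always multiplies. */
static void shade_lookup(const float flip_vec[2], float *dot, float normal[3]) {
    int idx = (int) step_f(0.0f, *dot);   /* 0 when dot < 0, else 1 */
    assert(idx == 0 || idx == 1);
    float flip = flip_vec[idx];
    *dot *= flip;
    for (int i = 0; i < 3; ++i) normal[i] *= flip;
}

/* True if both versions give identical results for one input. */
static bool versions_agree(bool two_sided, float dot, float nx, float ny, float nz) {
    float d1 = dot, n1[3] = { nx, ny, nz };
    float d2 = dot, n2[3] = { nx, ny, nz };
    shade_branch(two_sided, &d1, n1);
    shade_lookup(FlipVec[(int) two_sided], &d2, n2);
    return d1 == d2 && n1[0] == n2[0] && n1[1] == n2[1] && n1[2] == n2[2];
}
```

This says nothing about speed; it only checks that the lookup is a faithful replacement, including at Dot == 0.0, where neither version changes the sign.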

Thanks!

The compiler looks at the size and complexity of the code in the if/else blocks, in this case the "Dot *= Flip; FragNormal *= Flip;" part. If it's small and calculation-only, predicated instructions will be generated (much like the conditional instructions on ARM CPUs). Otherwise it uses dynamic branching, which can halve your performance at that jump if there's no coherence. In some cases you want an early return from a shader; there you're almost always at a win.

If you avoid if-statements like the plague, you may end up adding 5-10 arithmetic instructions to do what 2 predicated instructions can do.
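To make the difference concrete, here is a rough CPU model in C; it is an illustration, not what any particular compiler emits. Predication is modeled as: compute the predicate once, issue both multiplies unconditionally, and keep each result only where the predicate is true. The second function is the "no if at any price" variant that folds the condition into arithmetic; it gives the same results for finite inputs but needs extra operations just to build the factor. The function names and the `flip` parameter are assumptions for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of predication: the multiplies are always issued; the predicate
   only decides whether each result is written back. */
static void shade_predicated(bool two_sided, float flip, float *dot, float normal[3]) {
    bool p = two_sided && *dot < 0.0f;          /* set predicate once */
    *dot = p ? *dot * flip : *dot;              /* predicated multiply */
    for (int i = 0; i < 3; ++i)
        normal[i] = p ? normal[i] * flip : normal[i];
}

/* "Avoid if at any price": turn the condition into arithmetic. */
static void shade_arithmetic(bool two_sided, float flip, float *dot, float normal[3]) {
    float mask = (float) two_sided * (float) (*dot < 0.0f);  /* 1.0 or 0.0 */
    assert(mask == 0.0f || mask == 1.0f);
    float f = 1.0f + mask * (flip - 1.0f);                   /* flip or 1.0 */
    *dot *= f;
    for (int i = 0; i < 3; ++i) normal[i] *= f;
}
```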

Some early programmable GPUs evaluated both the if and the else part of a branch and ignored one of the results; at that time every if slowed you down. Nowadays GPUs jump over a block if the condition is false for all fragments/vertices that are processed together (for example, batches of 32 on NVIDIA GPUs). If some elements take the if and some don't, all cores step through the same instructions (i.e. both paths), with the inactive ones waiting. So for fragment shaders the worst case would be:


if (int(gl_FragCoord.x) % 2 == 1) { // integer % needs GLSL 1.30+
...
} else {
...
}

While something like this won’t get you into trouble:


if (uniformValue == 1) {
...
} else {
...
}

It all boils down to the question, does your condition evaluate differently for elements that are (likely) processed together?
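That question can be modeled with a small C sketch. It assumes, purely for illustration, a batch of 32 lanes (the NVIDIA figure mentioned above) where lane i shades the pixel centered at x = i + 0.5 in a single row; real GPUs group fragments in 2D tiles whose exact layout is not covered here. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH 32   /* batch size quoted in the thread for NVIDIA GPUs */

enum { PATH_IF = 1, PATH_ELSE = 2 };

/* Bitmask of the paths a batch must step through: one path if all lanes
   agree, both paths if they disagree. */
static int paths_executed(const bool cond[], int n) {
    assert(n > 0 && n <= BATCH);
    int mask = 0;
    for (int i = 0; i < n; ++i)
        mask |= cond[i] ? PATH_IF : PATH_ELSE;
    return mask;
}

/* Worst case: condition on pixel parity, so neighbouring lanes disagree. */
static int paths_for_parity_test(int first_x) {
    bool cond[BATCH];
    for (int i = 0; i < BATCH; ++i) {
        float frag_x = (float) (first_x + i) + 0.5f;   /* pixel center */
        cond[i] = ((int) frag_x) % 2 == 1;
    }
    return paths_executed(cond, BATCH);
}

/* Harmless case: condition on a uniform, identical for every lane. */
static int paths_for_uniform_test(int uniform_value) {
    bool cond[BATCH];
    for (int i = 0; i < BATCH; ++i)
        cond[i] = uniform_value == 1;
    return paths_executed(cond, BATCH);
}
```

The parity test yields both paths for every batch, while the uniform test always yields exactly one.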

Edit: Removed, since it stated exactly the same thing as menzel. That’s what I get for not reading carefully enough.

thokra: you’re right, but that’s also what I said: older GPUs always executed both paths. Newer GPUs execute both paths only if the elements (vertices/fragments) of one batch take different branches (and a batch can be as large as 32 elements on some NVIDIA GPUs, or even larger).

If this is your use case, you could have a look at subroutines, which in theory were introduced specifically for this kind of use case (different code paths depending on a uniform value).
I don't know much about the performance profile of the feature, though.

Thank you all for the comments!

Since I have only two lines in the if-block, it's possible that predicated instructions are used.

Are if-statements without an else-block generally less “dangerous” regarding performance?

Well, in many cases, yes. Think about it: if predicated instructions are used, you probably have fewer of them when there is no else branch. If actual branching is used, the worst case is that the block behind the “if” is always executed, even if only one thread took that path, instead of having to execute two branches (the “if” and the “else”) in case of divergence.

However, this all depends on how large your “if” and “else” blocks are and how divergent your branching is. It’s not the “else” itself that could cost you much performance.
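The same reasoning as a toy cost model in C. The costs are made-up instruction counts rather than measurements, branch overhead is ignored, and the function names are hypothetical; `any_if`/`any_else` say whether at least one lane of the batch takes that path.

```c
#include <assert.h>
#include <stdbool.h>

/* Predication: both blocks are always issued for the whole batch. */
static int cost_predicated(int if_cost, int else_cost) {
    return if_cost + else_cost;
}

/* Dynamic branching: the batch pays for every path at least one lane takes. */
static int cost_branching(bool any_if, bool any_else, int if_cost, int else_cost) {
    assert(any_if || any_else);   /* a non-empty batch takes some path */
    int cost = 0;
    if (any_if)   cost += if_cost;
    if (any_else) cost += else_cost;
    return cost;
}
```

With `else_cost` set to 0, a diverged batch under branching pays at most `if_cost`, while a diverged if/else pays for both blocks; under predication an empty else likewise adds nothing.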

Got it, thanks!

Coherence is a big deal when it comes to branching. Take a look at http://bps11.idav.ucdavis.edu/, the talk entitled “Real-Time Rendering Architectures” by Mike Houston, page 27, “What about branches?”. The upshot is essentially this: if your branching is coherent across the screen, things are great, but if there is little coherence, the GPU will often execute both branches. In a nutshell, the GPU runs the fragment shader on a “block” of pixels (somewhere between 4x4 and 16x16; I don't remember the exact numbers, but I'm fairly sure 16x16 is much bigger than the real size). If the condition is the same for the whole block, only that code gets executed; if it varies, both branches get “executed” (not literally, but as far as clock cycles are concerned, they are). Some GPUs also have a multi-resolution scheme where the block size is reduced, down to a certain minimum (I think 2x2), to avoid “executing” both branches. As a side note, if you have a branch that depends on a uniform, either do as _kyle suggests and use subroutines, or just use a different shader and write #if something instead of if (something).

In my experience, if and #if are both evaluated at compile time when the value used as the condition is a constant expression (or evaluates to a constant), at least on NVIDIA. if () results in “much” more readable code.
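For the “different shader with #if” route, one common approach (a sketch, not the only one) is to keep a single shader body and prepend a #define per variant. glShaderSource() concatenates the strings it is given, and #version has to come first, so the define goes in the second string. The GLSL body, its input/output names (Normal, LightDir, Color) and the TWO_SIDED macro are hypothetical; only the string assembly is shown, and the GL calls are left out.

```c
#include <assert.h>
#include <string.h>

/* Shared fragment shader body; TWO_SIDED is resolved by the GLSL
   preprocessor, so each compiled variant contains only one code path. */
static const char *kFragmentBody =
    "in vec3 Normal;\n"
    "in vec3 LightDir;\n"
    "out vec4 Color;\n"
    "void main() {\n"
    "    vec3 FragNormal = normalize(Normal);\n"
    "    float Dot = dot(FragNormal, normalize(LightDir));\n"
    "#if TWO_SIDED\n"
    "    if (Dot < 0.0) { Dot = -Dot; FragNormal = -FragNormal; }\n"
    "#endif\n"
    "    Color = vec4(vec3(max(Dot, 0.0)), 1.0);\n"
    "}\n";

/* Fills `sources` with the three strings you would pass to
   glShaderSource(shader, 3, sources, NULL). #version must come first. */
static void make_variant_sources(int two_sided, const char *sources[3]) {
    assert(two_sided == 0 || two_sided == 1);
    sources[0] = "#version 330 core\n";
    sources[1] = two_sided ? "#define TWO_SIDED 1\n" : "#define TWO_SIDED 0\n";
    sources[2] = kFragmentBody;
}
```

You would compile one program per variant and bind the one matching TwoSided at draw time instead of setting a uniform. Per the post above, a plain if on a compile-time constant may be folded just the same on NVIDIA, which keeps the source more readable.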
