shader conditionals



ugluk
09-21-2011, 03:44 AM
While reading posts here as well as on gamedev, I often find two kinds of advice regarding shader conditionals:

- some people combine all shaders into one megashader, then configure it, similar to the fixed pipeline;

- others say that adding a single conditional statement (i.e. an if or a switch) to their shader spoiled performance considerably. This comes up often when people are discussing mobile platforms.

Obviously both approaches are correct, but they result in different performance. Are there rules of thumb for how many conditionals are too many in a shader? And how can I find out when the cost of switching a shader (which, I suppose, is fixed) outweighs the cost of a certain number of conditionals (where the cost of each conditional is probably not fixed)?

sqrt[-1]
09-21-2011, 05:46 AM
It is obviously going to depend on the hardware you are running the shader on, but one thing to keep in mind is what variables are involved in the conditional statement.

e.g.

If it is a constant - if (0), etc. - it will get compiled out.

If it is a direct uniform access - if (uniformBool) {} - the branch can often have minimal cost.

If it is a branch based on some dynamic variable, it is typically more expensive.

If it is a branch in a pixel shader whose condition varies a lot per pixel (true/false/true/false, etc.), it will be quite expensive.
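The four cases can be sketched in a single fragment shader (uEnableFog and vIntensity are made-up names, just to illustrate each kind of condition):

```glsl
#version 330 core
uniform bool uEnableFog;   // uniform condition: same for every invocation in a draw call
in float vIntensity;       // dynamic, per-fragment value
out vec4 fragColor;

void main() {
    vec4 color = vec4(vIntensity);

    if (false) {           // constant condition: compiled out entirely
        color = vec4(1.0);
    }

    if (uEnableFog) {      // direct uniform access: near-free on most hardware
        color = mix(color, vec4(0.5), 0.3);
    }

    if (vIntensity > 0.5) {  // dynamic condition: cost depends on per-pixel coherence
        color *= 2.0;
    }

    fragColor = color;
}
```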

aqnuep
09-21-2011, 05:59 AM
To clarify the last point a little:

Shader instances usually run in lockstep on groups of shader cores (16, 32, 64), so if the condition does not evaluate to the same value on all of these instances, both branches are executed: writing the results is masked out in the "if" branch for the invocations that evaluated the condition as "false", and the results of the "else" branch (if one exists) are masked out for the invocations that evaluated it as "true".

So the main point is that conditionals that are coherent across shader invocations are faster.
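A minimal fragment-shader sketch of the difference (uLit is an assumed uniform; the per-pixel checkerboard test is a deliberately worst-case condition):

```glsl
#version 330 core
uniform bool uLit;   // same value for every invocation in a draw call
out vec4 fragColor;

void main() {
    // Coherent branch: every invocation in the group takes the same path,
    // so only one branch is actually executed.
    if (uLit) {
        fragColor = vec4(1.0, 1.0, 0.9, 1.0);
    } else {
        fragColor = vec4(0.1, 0.1, 0.1, 1.0);
    }

    // Divergent branch (worst case): the condition alternates per pixel,
    // so the group executes BOTH paths and masks out the inactive lanes.
    if ((int(gl_FragCoord.x) + int(gl_FragCoord.y)) % 2 == 0) {
        fragColor *= 0.5;
    }
}
```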

ugluk
09-21-2011, 06:46 AM
What interesting insights! Probably the mobile GPUs don't have a great many cores and hence the poor perf results from conditionals.

Suppose one writes a mega-shader. Is it useful to avoid unnecessary changes to uniform variables? So far, I haven't done that, as I consider uniform changes to be cheap and it would complicate batching. I simply update all the uniforms after switching.

Aleksandar
09-21-2011, 08:51 AM
What interesting insights! Probably the mobile GPUs don't have a great many cores and hence the poor perf results from conditionals.
You didn't get the point. Read again what aqnuep has written.

aqnuep
09-21-2011, 09:30 AM
What interesting insights! Probably the mobile GPUs don't have a great many cores and hence the poor perf results from conditionals.
You've misunderstood me, as was mentioned already.
Anyway, conditionals perform poorly on mobile GPUs because they use an architecture much closer to early Shader Model 2.0 desktop GPUs, which also suffered from poor performance with conditionals: evaluating the condition stalled the cores until the results were available.

This is not really an issue on modern desktop GPUs, because one core can have several shader instances active. If one of these shaders is waiting for, e.g., the result of a texture fetch or the evaluation of a conditional, another shader instance that has nothing to wait for can take its place. This is the so-called "latency hiding" mechanism implemented in modern GPUs.


Suppose one writes a mega-shader. Is it useful to avoid unnecessary changes to uniform variables? So far, I haven't done that, as I consider uniform changes to be cheap and it would complicate batching. I simply update all the uniforms after switching.
Uniform variable changes are not that lightweight, especially when they come in big numbers (that's the reason we have uniform buffers now), but they are not the only reason to use a mega-shader. Also, with Shader Model 5.0 hardware we have shader subroutines, which work like function pointers and allow splitting the execution path of a shader without conditionals.
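For reference, a minimal sketch of the GLSL 4.0 subroutine mechanism mentioned above (the names are made up for illustration). The active implementation is selected from the application with glUniformSubroutinesuiv, so no conditional appears in the shader body:

```glsl
#version 400 core
in vec3 vNormal;
out vec4 fragColor;

// A subroutine type, two implementations, and a subroutine uniform.
subroutine vec4 ShadeFunc(vec3 n);

subroutine(ShadeFunc) vec4 shadeLit(vec3 n) {
    return vec4(max(dot(normalize(n), vec3(0.0, 0.0, 1.0)), 0.0));
}

subroutine(ShadeFunc) vec4 shadeFlat(vec3 n) {
    return vec4(0.5);
}

subroutine uniform ShadeFunc shade;  // selected via glUniformSubroutinesuiv

void main() {
    fragColor = shade(vNormal);      // dispatches without a branch in GLSL
}
```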

Anyway, when and how to use conditionals depends heavily on the target GPU generation, on whether we are talking about desktop GL or GL ES, and on many other things. On mobile GPUs I would rather not use conditionals for now, but if you target GL3+ capable hardware, I would not worry that much about when to use conditionals, as it can sometimes be advantageous to skip a few expensive operations with them (e.g. in skeletal animation, skipping the calculations for bone matrices that would have zero weight anyway).
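The skeletal-animation case can be sketched like this (the attribute and uniform names are assumptions): vertices influenced by fewer than four bones skip the unused matrix multiplies, and neighbouring vertices tend to take the same path, so the branch stays fairly coherent:

```glsl
#version 330 core
const int MAX_BONES = 64;

uniform mat4 uBones[MAX_BONES];
in vec3  aPosition;
in vec4  aWeights;   // up to 4 bone weights, zero-padded
in ivec4 aIndices;   // matching bone indices

void main() {
    vec4 skinned = vec4(0.0);
    for (int i = 0; i < 4; ++i) {
        float w = aWeights[i];
        if (w > 0.0) {  // skip bones with zero weight entirely
            skinned += w * (uBones[aIndices[i]] * vec4(aPosition, 1.0));
        }
    }
    gl_Position = skinned;
}
```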

sqrt[-1]
09-21-2011, 03:03 PM
Just a note on changing uniforms: on earlier NVIDIA hardware (GeForce 6/7), changing uniforms in a pixel shader was very slow - almost as if it were re-compiling the shader itself.

Aleksandar
09-21-2011, 03:10 PM
Just a note on changing uniforms, on earlier Nvidia hardware (Geforce 6/7) changing uniforms in a pixel shader was very slow - almost like it was re-compiling the shader itself.
What did you mean by the previous claim? A uniform is a constant in the shader and cannot be changed.

Alfonse Reinheart
09-21-2011, 04:05 PM
He's talking about how changing certain uniforms to certain values (0 and 1, for example) caused NVIDIA drivers to recompile the shader. This happened mostly with uniforms used in conditional logic.

Dark Photon
09-21-2011, 04:08 PM
What did you mean with the previous claim? The uniform is a constant in the shader and cannot be changed.
See these threads:

* nVidia FP uniforms driver optimization lags (http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=169125#Post169125)
* glUniform is slow? (http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=250009#Post250009)

from these links on down.

Aleksandar
09-22-2011, 02:40 AM
Thank you both for the clarification. I knew about the problem, but I think it is not related to the hardware architecture but rather to the driver implementation. I have to admit that SM 3.0 cards have a smaller number of registers, and that probably led to some kind of optimization.

The current state of NVIDIA drivers concerning optimization is quite good. Last night I tested uniform usage, and here are the conclusions:
- the drivers eliminate uniform changes if there are no drawing calls for the current program (shader),
- the drivers also eliminate superfluous uniform setups (setting a uniform to the same value it already has).

Everything was tested on an 8600M GT graphics card (SM 4.0) with R266 drivers.

Several months ago I took some time to optimize a shader of mine with lots of trigonometric function calls by caching calculated values and removing recalculations of the same values. Can you guess what performance boost I achieved? None! The GLSL compiler is highly optimized. I wonder what would happen if we could disable optimizations in both the GLSL compiler and the drivers? It would be much harder for us, the programmers, but maybe we could squeeze out a little more performance and remove potential bugs in the drivers' optimizations. But currently I have no objections to NVIDIA's driver optimizations. ;)
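As an illustration of the kind of rewrite that turns out to be a no-op (a hypothetical 2-D rotation helper, not the shader from the post): an optimizing GLSL compiler merges the duplicate sin/cos calls via common-subexpression elimination, so both forms should compile to the same code:

```glsl
// Naive form: sin(angle) and cos(angle) each appear twice.
vec2 rotateNaive(vec2 p, float angle) {
    return vec2(p.x * cos(angle) - p.y * sin(angle),
                p.x * sin(angle) + p.y * cos(angle));
}

// Hand-cached form: the compiler typically produces this anyway.
vec2 rotateCached(vec2 p, float angle) {
    float s = sin(angle);
    float c = cos(angle);
    return vec2(p.x * c - p.y * s,
                p.x * s + p.y * c);
}
```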

sqrt[-1]
09-23-2011, 05:01 AM
I am fairly certain it is a hardware limitation in that generation of Nvidia hardware - I have worked a lot on the PS3, which uses the same chip.
Basically, there are no uniform registers in that generation of pixel shader hardware - only inline constants that have to be updated.