Performance considerations for branching in shaders

Is it good practice to avoid branching statements (if, else, switch) in shaders? Shaders are generally optimized for high throughput on data-parallel workloads. I have read several OpenGL books and books on graphics hardware. OpenGL books never mention performance hits caused by unnecessary branches, but books on graphics hardware suggest shader cores deal poorly with branches.
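For concreteness, this is the kind of per-fragment branch I mean (a made-up fragment shader; the texture and variable names are invented for illustration):

```glsl
#version 330 core

in vec2 vUV;
out vec4 fragColor;

uniform sampler2D uTexture;  // hypothetical input texture

void main() {
    vec4 base = texture(uTexture, vUV);

    // Data-dependent branch: neighboring fragments can take different
    // paths depending on the texel each one sampled.
    if (base.a < 0.5) {
        fragColor = vec4(base.rgb * 0.25, base.a);
    } else {
        fragColor = base;
    }
}
```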

For example, on modern NVIDIA hardware the CUDA cores run below peak efficiency when they encounter branches: as I understand it, when threads within a warp take different paths, the warp is no longer fully utilized.

Also, what about recursion? AFAIK CUDA and ATI shader cores do not support it. Is recursion permitted in GLSL? My understanding is that shader cores these days are simple and lack the call-stack and context-switching support needed for recursive calls.

I am not an OpenGL expert so I would appreciate your insights.

Recursion is not allowed in GLSL, just as it is not in OpenCL; the GLSL specification forbids it in any form, direct or indirect. (Recent CUDA hardware is a partial exception in allowing device-side recursion, but GPUs remain poorly suited to it.)
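To illustrate with a sketch of my own (the function and uniform names are invented, and it assumes a fragment shader receiving a position from the vertex stage): a computation that would naturally be recursive has to be rewritten as a loop with a fixed bound.

```glsl
#version 330 core

in vec2 vPos;        // position handed over by the vertex shader (assumed)
out vec4 fragColor;
uniform vec2 uC;     // hypothetical per-draw constant

// The recursive form below is illegal in GLSL and will not compile:
//
//   int escape(vec2 z, vec2 c, int depth) {
//       if (depth == 0 || dot(z, z) > 4.0) return depth;
//       return escape(vec2(z.x*z.x - z.y*z.y, 2.0*z.x*z.y) + c, c, depth - 1);
//   }
//
// The GLSL-legal version expresses the same computation iteratively:
int escapeSteps(vec2 z, vec2 c) {
    for (int i = 0; i < 64; ++i) {
        z = vec2(z.x * z.x - z.y * z.y, 2.0 * z.x * z.y) + c;  // z -> z*z + c
        if (dot(z, z) > 4.0) {
            return i;  // the loop exit stands in for the recursive base case
        }
    }
    return 64;
}

void main() {
    float t = float(escapeSteps(vPos, uC)) / 64.0;
    fragColor = vec4(vec3(t), 1.0);
}
```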

As you point out, the branching issue stems in part from the nature of the hardware. GLSL doesn't change what your hardware is doing, so the divergence penalties you see in CUDA will generally still apply.
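For small, cheap branches there is a widely used workaround (a general pattern, not something the hardware requires): select between the two results arithmetically with step() and mix(). Continuing the alpha-test branch from the question above:

```glsl
#version 330 core

in vec2 vUV;
out vec4 fragColor;
uniform sampler2D uTexture;  // hypothetical input texture

void main() {
    vec4 base = texture(uTexture, vUV);

    // Branchless equivalent of:
    //   if (base.a < 0.5) rgb = base.rgb * 0.25; else rgb = base.rgb;
    // step(0.5, base.a) is 0.0 when base.a < 0.5 and 1.0 otherwise,
    // and mix() selects between the two candidate results.
    // Note that both candidates are always computed, so this only pays
    // off when each side of the original branch is cheap.
    float keep = step(0.5, base.a);
    vec3 rgb = mix(base.rgb * 0.25, base.rgb, keep);
    fragColor = vec4(rgb, base.a);
}
```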

However, just as with OpenCL or CUDA, if your conditions have strong locality, that is, if most of the fragments (or vertices) that take the same branch are near each other, then branching probably won't be a big issue.
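At the extreme end of that locality argument, a condition that is identical for every fragment in the draw call (a uniform) never diverges at all, so a branch like the following sketch costs essentially nothing (the uniform name is mine):

```glsl
#version 330 core

in vec2 vUV;
out vec4 fragColor;
uniform sampler2D uTexture;
uniform bool uDebugView;  // hypothetical flag: same value for every fragment

void main() {
    vec4 base = texture(uTexture, vUV);

    // Every fragment evaluates this condition to the same value, so no
    // warp/wavefront ever splits across the two paths.
    if (uDebugView) {
        fragColor = vec4(vUV, 0.0, 1.0);  // visualize the UVs
    } else {
        fragColor = base;
    }
}
```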

As with all performance issues, profile before trying to optimize.

OpenGL books never mention performance hits caused by unnecessary branches.

And why should they? OpenGL books are about OpenGL, not the hardware that runs it. OpenGL defines what the API does, not how fast it does it; that is determined by the hardware.