Performance considerations for branching in shaders
Is it good practice to avoid branching statements (if, else, switch) in shaders? Shaders are generally optimized for high throughput on data-parallel workloads. I have read several OpenGL books and several books on graphics hardware. The OpenGL books never mention performance hits caused by unnecessary branches, but the books on graphics hardware suggest that shader cores deal poorly with branches.
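For example, on modern NVIDIA hardware running CUDA, my understanding is that the CUDA cores operate less than optimally when they encounter branches: if the threads of a warp take different paths, the warp diverges and is no longer fully populated on either path (or something like that).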
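To make the question concrete, here is a minimal sketch of the kind of rewrite I have in mind. It is written in C rather than GLSL so that it can be compiled and checked on a CPU. It shows a two-way select written with an if/else, and the same select written arithmetically in the style of GLSL's `mix(a, b, step(edge, x))`. The names `select_branchy`, `select_branchless`, and `step_f` are made up for illustration; only `step` and `mix` are real GLSL built-ins. Whether the second form is actually faster on a given GPU is exactly what I am unsure about, since a compiler may turn a small if/else into a select anyway.

```c
/* Branchy form: chooses between a and b with an explicit if/else. */
static float select_branchy(float x, float edge, float a, float b)
{
    if (x < edge) {
        return a;
    } else {
        return b;
    }
}

/* Hypothetical C stand-in for GLSL step(edge, x):
 * 0.0 when x < edge, 1.0 otherwise. The comparison yields 0 or 1 in C. */
static float step_f(float edge, float x)
{
    return (float)(x >= edge);
}

/* Branchless form: the same choice expressed as a linear blend,
 * mirroring GLSL mix(a, b, t) = a * (1 - t) + b * t with t in {0, 1}.
 * Assumes a and b are finite (an infinity or NaN times 0 gives NaN). */
static float select_branchless(float x, float edge, float a, float b)
{
    float t = step_f(edge, x);
    return a * (1.0f - t) + b * t;
}
```

For finite inputs both functions return the same value, for instance `a` when `x` is below `edge` and `b` otherwise. The branchless version always evaluates both operands, so it only makes sense when both sides are cheap to compute.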
Also, what about recursion? AFAIK, CUDA and ATI shader cores do not support recursion. Is recursion permitted in GLSL? My understanding is that shader cores these days are generally simple and do not include the context-switching and stack support needed for recursive calls.
I am not an OpenGL expert, so I would appreciate your insights.