Thread: performance considerations for branching in shaders

  1. #1
    Junior Member Newbie
    Join Date: May 2013

    performance considerations for branching in shaders

    Is it good practice to avoid branching statements (if, else, switch) in shaders? Shaders are generally optimized for high throughput on data-parallel workloads. I have read several OpenGL books and several books on graphics hardware. The OpenGL books never mention performance hits caused by unnecessary branches, but the books on graphics hardware suggest that shader cores deal poorly with branches.

    For example, on modern NVIDIA hardware the CUDA cores operate less than optimally (warps not fully populated, or something like that) when they encounter branches.

    Also, what about recursion? AFAIK CUDA and ATI shader cores do not support recursion. Is recursion permitted in GLSL? My understanding is that shader cores these days are generally simple and do not include the context-switching and stack support needed for recursive calls.

    I am not an OpenGL expert so I would appreciate your insights.

  2. #2
    Senior Member OpenGL Lord
    Join Date: May 2009

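    Recursion is not allowed in GLSL, just as it is not allowed in OpenCL C. CUDA is the exception here: its device code has supported recursion since compute capability 2.0.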
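
    Because GLSL has no recursion, a recursive algorithm is usually rewritten as a loop over an explicit, fixed-size stack. Here is a rough sketch of that rewrite. It is in Python only so it can be run on a CPU; the names (STACK_CAPACITY, sum_tree_iterative) and the tree layout are made up for illustration, and a real shader would use a constant-sized array and a bounded loop instead.

```python
# Hypothetical example: sum the values in a small tree without recursion.
# STACK_CAPACITY is an assumed limit; a GLSL array would need a constant size too.
STACK_CAPACITY = 32


def sum_tree_iterative(children, values, root=0):
    """Sum node values by walking the tree with an explicit fixed-size stack."""
    stack = [0] * STACK_CAPACITY  # preallocated, like a fixed-size shader array
    top = 0                       # number of entries currently on the stack

    stack[top] = root
    top += 1

    total = 0
    while top > 0:
        # Pop the next node to visit.
        top -= 1
        node = stack[top]
        total += values[node]

        # Push its children instead of calling ourselves recursively.
        for child in children[node]:
            if top >= STACK_CAPACITY:
                raise OverflowError("explicit stack too small for this tree")
            stack[top] = child
            top += 1
    return total


# Node 0 has children 1 and 2; node 1 has child 3.
children = [[1, 2], [3], [], []]
values = [1, 2, 3, 4]
print(sum_tree_iterative(children, values))  # 10
```

    The fixed capacity is the price of dropping recursion: you have to pick an upper bound on how much pending work there can be and decide what to do when it is exceeded.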

    As you point out, the branching issue is in part due to the nature of the hardware. GLSL doesn't change what your hardware is doing, so that will still generally be true.

    However, just as with OpenCL or CUDA, if your conditions have strong locality, that is, if the fragments (or vertices) that take the same branch are mostly near each other, then it probably won't be too big of an issue.
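
    To make the locality point concrete, here is a toy CPU-side model, not a measurement of any real GPU. It assumes invocations are executed in fixed-size SIMD groups (GROUP_WIDTH = 32 is an assumption that matches NVIDIA warps; other hardware differs) and that a group pays for both sides of a branch only when its lanes disagree. Real hardware groups fragments in 2D tiles, so the 1D layout here is a simplification.

```python
# Assumed SIMD group width; 32 matches an NVIDIA warp, other hardware differs.
GROUP_WIDTH = 32


def divergent_groups(conditions, width=GROUP_WIDTH):
    """Count groups in which some lanes take the branch and some do not."""
    count = 0
    for start in range(0, len(conditions), width):
        group = conditions[start:start + width]
        if any(group) and not all(group):
            count += 1
    return count


n = 1024  # number of simulated invocations (32 groups of 32)

# Strong locality: the first half takes the branch, the second half does not.
coherent = [i < n // 2 for i in range(n)]

# No locality: the same number of invocations take the branch,
# but they alternate with invocations that do not.
interleaved = [i % 2 == 0 for i in range(n)]

print(divergent_groups(coherent))     # 0
print(divergent_groups(interleaved))  # 32
```

    Both inputs send exactly half of the invocations down the branch, yet in this model only the interleaved one makes any group diverge. That is why the same if statement can be nearly free in one scene and costly in another.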

    As with all performance issues, profile before trying to optimize.

    "OpenGL books never mention performance hits caused by unnecessary branches."
    And why should they? OpenGL books are about OpenGL, not the hardware that runs it. OpenGL defines what the API does, not how fast it goes. That's defined by the hardware.
