Thread: Performance question: Switch shaders, or use empty texture

  1. #1
    Junior Member Newbie
    Join Date
    Nov 2017
    Posts
    3

    Performance question: Switch shaders, or use empty texture

    Hi,

    I have a rendering engine with tiles; a frame may consist of a few hundred tiles. Most tiles are rendered with a single texture, while a few are rendered with an extra texture bound to a different texture unit.

    Which is faster:

    1. Using two different shaders where one has two samplers and the other only one, and switch shaders according to which type of tile it is.
    2. Using the same shader all the time, but binding a 1x1 transparent pixel texture to the second texture unit for most of the tiles.

    Is there an alternative 3?

    I can probably sort the tiles to reduce the number of shader switches in alternative 1, but there are already some other sort criteria on these tiles, so it may not work.

    Cheers

  2. #2
    Senior Member OpenGL Guru
    Join Date
    Jun 2013
    Posts
    2,466
    Quote Originally Posted by eldritch972 View Post
    I can probably sort the tiles to reduce the number of shader switches in alternative 1
    You shouldn't need more draw calls than there are shader programs. If you have, then reducing the number of draw calls is going to make more difference than whether you have one shader or two.

  3. #3
    Junior Member Newbie
    Join Date
    Nov 2017
    Posts
    3
    Quote Originally Posted by GClements View Post
    You shouldn't need more draw calls than there are shader programs. If you have, then reducing the number of draw calls is going to make more difference than whether you have one shader or two.
    Putting the whole tile set in one render call is unfortunately not an option due to transforms and other uniforms that differ from tile to tile. So you're saying that in this case it doesn't matter if I'm switching shaders between tiles?

  4. #4
    Senior Member OpenGL Guru
    Join Date
    Jun 2013
    Posts
    2,466
    Quote Originally Posted by eldritch972 View Post
    Putting the whole tile set in one render call is unfortunately not an option due to transforms and other uniforms that differ from tile to tile.
    If values differ from tile to tile, they probably shouldn't be uniforms.

    Quote Originally Posted by eldritch972 View Post
    So you're saying that in this case it doesn't matter if I'm switching shaders between tiles?
    I'm saying that the main factor is the number of draw calls. The number of shaders sets a lower bound on the number of draw calls (you can't change shaders within a draw call). But using two shaders and thus two draw calls is likely to be preferable to a hundred draw calls all using the same shader.

    Switching shaders between draw calls has some cost relative to the same set of draw calls with the same shader. That cost isn't necessarily any higher than changing uniforms between draw calls (older hardware optimised uniform branches by compiling different variants of the shader for different values of the uniforms, so changing uniforms may actually switch shaders).

  5. #5
    Junior Member Newbie
    Join Date
    Nov 2017
    Posts
    3
    First of all, thanks for taking the time to answer and for giving valuable input; it is highly appreciated!

    Quote Originally Posted by GClements View Post
    If values differ from tile to tile, they probably shouldn't be uniforms.
    The values that change are typically a transformation matrix for each tile (this is a 3D globe model, and the tiles are placed in a huge coordinate system with a local origin that moves from frame to frame), and some texture transformation parameters (scale/translation encoded in a vec4). Given that I need to target OpenGL ES 3.0 and possibly even 2.0, is there a better way to set these values than using uniforms?

    Quote Originally Posted by GClements View Post
    I'm saying that the main factor is the number of draw calls. The number of shaders sets a lower bound on the number of draw calls (you can't change shaders within a draw call). But using two shaders and thus two draw calls is likely to be preferable to a hundred draw calls all using the same shader.

    Switching shaders between draw calls has some cost relative to the same set of draw calls with the same shader. That cost isn't necessarily any higher than changing uniforms between draw calls (older hardware optimised uniform branches by compiling different variants of the shader for different values of the uniforms, so changing uniforms may actually switch shaders).
    OK, so in my case, given that I can't reduce the number of draw calls any further, I guess it is better to switch shaders between tiles than to use a dummy texture?

  6. #6
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,151
    Quote Originally Posted by eldritch972 View Post
    OK, so in my case, given that I can't reduce the number of draw calls any further, I guess it is better to switch shaders between tiles than to use a dummy texture?
    It's going to depend on your driver, but I suspect it'd be faster to use the same shader with a dummy texture than to switch shaders.

    Realizing that you may not be targeting NVidia GPUs and GL drivers, see slide 48 here for the relative costs of various state changes on NVidia GL drivers: Beyond Porting (NVidia). However, check the "OpenGL ES Developer's Guide" published by the GPU vendors you're targeting for specific details on their hardware and drivers.

    Also, I would add this to GClements' comments: It's not all about the number of draw calls. Yes, assuming the same number and type of pipeline state changes, fewer draw calls can be faster, but they won't necessarily be. It's really most important to minimize the number of state changes, and the more expensive a particular type of state change is, the harder you should work to reduce it (see the above relative cost table for instance). For instance, if you're changing render targets a lot each frame, particularly on mobile, your performance is going to suffer. And on mobile, you need to be very careful to avoid state changes that will trigger texture ghosting and know exactly which operations on your GLES driver will cause implicit synchronization (i.e. halting your draw thread's execution until some driver-internal condition is met).
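One way to picture "minimize the most expensive state changes first" is a sort key that puts the costliest state in the most significant position. The sketch below is only an illustration: the `Draw` record, the field names, and the cost ranking in `STATE_ORDER` (render target, then program, then texture) are assumptions, and the real ranking should come from the vendor guide for the target hardware.

```python
from dataclasses import dataclass

# Assumed cost ranking, most expensive first. Real costs are driver-specific.
STATE_ORDER = ("render_target", "program", "texture")

@dataclass(frozen=True)
class Draw:
    """Hypothetical record of the state one draw call needs."""
    tile: str
    render_target: int
    program: int
    texture: int

def state_key(draw):
    # Tuples compare element by element, so the costliest state dominates.
    return tuple(getattr(draw, name) for name in STATE_ORDER)

def count_changes(draws):
    """Count how often each kind of state differs between consecutive draws."""
    changes = {name: 0 for name in STATE_ORDER}
    for prev, cur in zip(draws, draws[1:]):
        for name in STATE_ORDER:
            if getattr(prev, name) != getattr(cur, name):
                changes[name] += 1
    return changes

draws = [
    Draw("a", 0, 1, 10),
    Draw("b", 0, 2, 11),
    Draw("c", 0, 1, 10),
    Draw("d", 0, 2, 12),
    Draw("e", 0, 1, 13),
]
sorted_draws = sorted(draws, key=state_key)
before = count_changes(draws)
after = count_changes(sorted_draws)
```

If other sort criteria must take priority (for example, draw order for blending), they can be placed ahead of these fields in the key; the program and texture fields then only reduce switches within each group.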

    Once you've minimized your state changes (e.g. by grouping batches -- aka draw calls -- with the same state), then consider merging batches with the same state if possible. This can be a win but won't necessarily be. Example: if you're sending batches containing lots of triangles that are out of the view frustum down the pipe, you're going to be wasting lots and lots of GPU cycles letting it cull them out at the triangle level. If you have a lot of content in your scene, it's better to group your batches spatially and then perform coarse-grain culling on the CPU to send only the batches that at least partially overlap the view frustum. To support this, you'll probably end up with > 1 batch per state permutation. So I wouldn't obsess about having to have only one batch per unique pipeline state. That's probably only going to be true for toy scenes, or scenes where you are dynamically generating draw indirect batches (which you probably won't be). Rendering the 2nd and subsequent batches with the same pipeline state is often pretty efficient, so don't sweat trying to have only one batch per state combination if that's not easily done.
    Last edited by Dark Photon; 11-06-2017 at 05:20 AM.
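The coarse-grain CPU culling described above could look roughly like the following sketch. It assumes each batch carries a bounding sphere and that the frustum is given as six planes `(nx, ny, nz, d)` with unit normals pointing inward; the `Batch` type and the box-shaped stand-in frustum are hypothetical and only there to make the example self-contained.

```python
from dataclasses import dataclass

@dataclass
class Batch:
    """Hypothetical spatially grouped batch with a bounding sphere."""
    name: str
    center: tuple  # (x, y, z)
    radius: float

def sphere_intersects_frustum(center, radius, planes):
    """Conservative test: rejects a sphere only if it lies fully behind a plane.

    planes: iterable of (nx, ny, nz, d), unit normals pointing into the frustum.
    Spheres near frustum corners may be kept even though they are outside.
    """
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        if nx * cx + ny * cy + nz * cz + d < -radius:
            return False
    return True

def visible_batches(batches, planes):
    # Only these batches would be submitted as draw calls.
    return [b for b in batches if sphere_intersects_frustum(b.center, b.radius, planes)]

# Stand-in "frustum": an axis-aligned box covering -10..10 on each axis.
planes = [
    (1, 0, 0, 10), (-1, 0, 0, 10),
    (0, 1, 0, 10), (0, -1, 0, 10),
    (0, 0, 1, 10), (0, 0, -1, 10),
]
batches = [
    Batch("inside", (0.0, 0.0, 0.0), 1.0),
    Batch("straddling", (10.5, 0.0, 0.0), 1.0),
    Batch("outside", (20.0, 0.0, 0.0), 1.0),
]
visible = visible_batches(batches, planes)
```

The test is deliberately cheap and conservative: a batch that is kept by mistake still gets culled per triangle on the GPU, so the result stays correct.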

  7. #7
    Senior Member OpenGL Guru
    Join Date
    Jun 2013
    Posts
    2,466
    Quote Originally Posted by eldritch972 View Post
    The values that change are typically a transformation matrix for each tile (this is a 3D globe model, and the tiles are placed in a huge coordinate system with a local origin that moves from frame to frame), and some texture transformation parameters (scale/translation encoded in a vec4). Given that I need to target OpenGL ES 3.0 and possibly even 2.0, is there a better way to set these values than using uniforms?
    What exactly is a "tile" here?

    If you need to support ES 2, there aren't any good options for the more complex possibilities. For desktop OpenGL, you can coalesce chunks of geometry by putting the uniforms into an array (one entry for each chunk) then adding an integer attribute to identify the chunk.
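The "uniform array plus integer attribute" idea can be sketched on the CPU side as follows. This is a minimal illustration, not the poster's code: the tile layout, the use of a simple translation instead of a full matrix, and the `vertex_stage` function (a stand-in for what the vertex shader would do) are all assumptions.

```python
def merge_tiles(tiles):
    """Merge several tiles into one vertex list plus one per-tile parameter array.

    tiles: list of dicts with 'offset' (x, y, z) and 'positions' (list of (x, y, z)).
    Returns (vertices, tile_params) where each vertex is (x, y, z, tile_index)
    and tile_params[tile_index] holds that tile's data (the would-be uniform array).
    """
    vertices = []
    tile_params = []
    for index, tile in enumerate(tiles):
        tile_params.append(tile["offset"])
        for (x, y, z) in tile["positions"]:
            vertices.append((x, y, z, index))  # index is the extra integer attribute
    return vertices, tile_params

def vertex_stage(vertex, tile_params):
    """Stand-in for the vertex shader: look up per-tile data by index."""
    x, y, z, index = vertex
    ox, oy, oz = tile_params[index]
    return (x + ox, y + oy, z + oz)

tiles = [
    {"offset": (0.0, 0.0, 0.0), "positions": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]},
    {"offset": (10.0, 0.0, 0.0), "positions": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]},
]
vertices, tile_params = merge_tiles(tiles)
transformed = [vertex_stage(v, tile_params) for v in vertices]
```

With this layout both tiles can go out in a single draw call, since nothing has to change between them. The number of tiles per call is bounded by how large a uniform array the implementation allows, so a few hundred tiles may still need to be split into several calls.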

    Splitting draw calls to change textures can be avoided by using array textures or arrays of samplers.
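For the array-texture route, each tile would carry a layer index instead of requiring a texture bind. The sketch below shows only the bookkeeping, under assumptions: the texture names are made up, and layer 0 is reserved for a fully transparent layer that plays the role of the "dummy texture" from the original question. Since all layers of an array texture share one size and format, that dummy layer would be the same size as the others rather than 1x1.

```python
def assign_layers(tile_textures, dummy="transparent"):
    """Map texture names to array-texture layers.

    tile_textures: one entry per tile, a texture name or None for "no extra texture".
    Returns (layers, per_tile): layers maps name -> layer index (dummy is layer 0),
    per_tile holds the layer index each tile should sample.
    """
    layers = {dummy: 0}
    per_tile = []
    for tex in tile_textures:
        name = dummy if tex is None else tex
        if name not in layers:
            layers[name] = len(layers)  # next free layer
        per_tile.append(layers[name])
    return layers, per_tile

# Hypothetical tile set: most tiles have no second texture.
tile_textures = [None, "clouds", None, "roads", "clouds"]
layers, per_tile = assign_layers(tile_textures)
```

The per-tile layer index can then travel with the other per-tile parameters, so every tile uses the same shader and the same bound texture.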
