OpenGL.org

Thread: Modern OpenGL features and GPU bottleneck

  1. #1
    Junior Member Newbie
    Join Date
    Jun 2014
    Posts
    5

    Modern OpenGL features and GPU bottleneck

    Hello.
    I am working on a real-time GPU ray tracer using OpenGL.

    I am targeting OpenGL 4+ GPUs (including Nvidia Fermi, which doesn't support bindless graphics).
    My engine is always GPU-bottlenecked. It uses plain uniforms (no uniform blocks), glDrawElements, and traditional texture bindings.
    Hardware: Intel Core i7 950 + Nvidia GeForce GTX 690.
    Frame times: CPU 3-4 ms, GPU 15-25 ms (global illumination).

    I plan to add support for uniform buffers, MultiDrawIndirect, and texture arrays to reduce GPU overhead.
    Can these features improve GPU performance, or are they just CPU optimizations that will move some work to the GPU and reduce its performance?

  2. #2
    Senior Member OpenGL Guru Dark Photon
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,475
    Quote: Originally posted by Feature420
    I am targeting OpenGL 4+ GPUs (including Nvidia Fermi, which doesn't support bindless graphics).
    Wait. NV bindless "is" supported on Fermi cards. In fact, it's even supported on cards that are pre-Fermi all the way back to at least GeForce 8.

    My engine is always GPU-bottlenecked. It uses plain uniforms (no uniform blocks), glDrawElements, and traditional texture bindings.
    Hardware: Intel Core i7 950 + Nvidia GeForce GTX 690.
    Frame times: CPU 3-4 ms, GPU 15-25 ms (global illumination).

    I plan to add support for uniform buffers, MultiDrawIndirect, and texture arrays to reduce GPU overhead.
    Can these features improve GPU performance, or are they just CPU optimizations that will move some work to the GPU and reduce its performance?
    Before you jump to techniques, it sounds like you first need to do a bottleneck analysis. First, what is your performance goal? Once you reach it, stop! Next, which part of your processing consumes the largest share of frame time? Are you compute bound? Are you memory bound? Are your threads very divergent? And specifically, what about what you're doing makes that the bottleneck?

    Once you have that, you can look at techniques to optimize that bottleneck.
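
    One rough way to frame the compute-bound vs. memory-bound question is a roofline-style check: compare your kernel's arithmetic intensity (FLOPs per byte moved) against the card's ratio of peak FLOP rate to peak memory bandwidth. The sketch below is only an illustration of that comparison; the function names are made up for this post, and the inputs have to come from your own profiler counters and your card's spec sheet.

```cpp
#include <cassert>

// Arithmetic intensity: floating-point operations performed per byte
// moved to or from GPU memory (both measured over the same pass).
double arithmeticIntensity(double flops, double bytes) {
    assert(bytes > 0.0);
    return flops / bytes;
}

// Machine balance: the intensity at which a card with the given peak
// rates stops being limited by bandwidth and starts being limited by ALU.
double machineBalance(double peakFlopsPerSec, double peakBytesPerSec) {
    assert(peakBytesPerSec > 0.0);
    return peakFlopsPerSec / peakBytesPerSec;
}

// Hypothetical helper: true when the pass sits below the machine balance,
// i.e. bandwidth runs out before arithmetic throughput does.
bool likelyMemoryBound(double flops, double bytes,
                       double peakFlopsPerSec, double peakBytesPerSec) {
    return arithmeticIntensity(flops, bytes) <
           machineBalance(peakFlopsPerSec, peakBytesPerSec);
}
```

    For example, with placeholder peaks of 1e12 FLOP/s and 1e11 B/s (not the specs of any real card), a pass doing 2e9 FLOPs over 1e9 bytes has an intensity of 2 against a balance of 10, so it would be flagged as memory bound. This ignores caches and latency effects, so treat it as a first-pass estimate only.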

    Since you said ray tracing, I'll venture a guess. Past the primary rays, full ray tracing is (generally speaking) very divergent, and GPUs aren't good at that. So my first guess is that you might have a big problem with thread divergence. Also, general ray tracing involves lots of spatial queries, which are very memory-bandwidth intensive. GPUs hide memory latency by keeping many threads in flight and swapping in runnable threads while others wait on memory accesses. However, if all your threads are waiting on memory reads, ...well, you get the idea. So you could be memory bound.
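
    To put a number on the divergence cost, here is a toy model (an assumption for illustration, not a measurement of any real GPU): threads in a warp execute in lockstep, so the warp keeps running until its slowest thread finishes its traversal loop, and lanes that finished early sit idle. The function name and the per-thread step counts are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy SIMT model: each entry is the number of traversal steps one thread
// of a warp needs. The warp issues max(steps) iterations for every lane,
// so the result is the fraction of issued lane-steps that did useful work.
double warpEfficiency(const std::vector<int>& stepsPerThread) {
    assert(!stepsPerThread.empty());
    long long useful = 0;
    int longest = 0;
    for (int s : stepsPerThread) {
        useful += s;
        longest = std::max(longest, s);
    }
    if (longest == 0) return 1.0;  // no work issued, nothing wasted
    const double issued =
        static_cast<double>(longest) * static_cast<double>(stepsPerThread.size());
    return static_cast<double>(useful) / issued;
}
```

    In this model a 32-thread warp where every ray takes 10 steps is fully utilized, while a warp where 31 rays take 10 steps and one ray takes 100 does useful work in only about 13% of its lane-steps. Coherent primary rays look like the first case; scattered secondary rays tend to look like the second.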

    So anyway, I'd suggest you profile first (have you tried Nsight?). Then once you know the largest bottleneck, figure out what you can do about it.

    As to the techniques you mention, the first two probably won't help you much; they mainly reduce CPU overhead and avoid GPU pipeline bubbles. The third is TBD: it depends on whether your algorithm and your card are faster with texture arrays than with a bunch of individual textures.
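
    The arithmetic behind that, as a deliberately simplified sketch: assume CPU submission and GPU execution overlap completely, so the frame time is set by whichever side is slower. That full-overlap assumption and the function name are mine; real pipelines also have sync points and bubbles that this ignores.

```cpp
#include <algorithm>

// Simplified model (assumed): CPU and GPU work on a frame fully in
// parallel, so the slower of the two determines the frame time.
double frameTimeMs(double cpuMs, double gpuMs) {
    return std::max(cpuMs, gpuMs);
}
```

    Plugging in numbers from the ranges you quoted, frameTimeMs(4.0, 20.0) is 20 ms. Halving the CPU cost gives frameTimeMs(2.0, 20.0), still 20 ms, whereas trimming the GPU side to 15 ms gives frameTimeMs(4.0, 15.0), which is 15 ms. Under this model, CPU-side savings only start to matter once the GPU time drops below the CPU time.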
    Last edited by Dark Photon; 06-30-2014 at 05:15 PM.
