Part of the Khronos Group
OpenGL.org


Thread: Bad performance (possibly FBO related) on NVIDIA

  1. #1

    Bad performance (possibly FBO related) on NVIDIA

    Hi,
    I'm developing a 3D engine which is both OpenGL & Direct3D capable.

    I'm seeing quite bad OpenGL performance (possibly pipeline stalls) when running Linux with the NVIDIA driver: up to 50 ms/frame. When the same machine is booted into Windows, performance is as expected (below 10 ms/frame).

    I'm seeing this both on a laptop with a GeForce GT540M and on a desktop machine with a GTX580.

    On Mac OS X, the same OpenGL rendering code also works without performance issues on NVIDIA hardware. Also, Linux + AMD hardware seems to work fine.

    The performance issue seems to be proportional to the number of times I change the surfaces bound to the FBO (I use a single FBO). Forward rendering without shadows therefore works fine, but adding shadows or postprocessing, or doing deferred rendering, starts to bog down performance.

    Anyone else seen something like this?

    (btw. the engine code is public at Google Code: http://urho3d.googlecode.com)

  2. #2
    Junior Member Newbie
    Join Date
    May 2010
    Posts
    28

    Re: Bad performance (possibly FBO related) on NVIDIA

    Have you tested with multiple FBOs, one for each surface?

  3. #3

    Re: Bad performance (possibly FBO related) on NVIDIA

    Not yet, I plan to.

    Actually, I've narrowed things down a bit: it is not the number of surface changes after all, but rather the number of draw calls that go to the FBO instead of the backbuffer.

    For example a forward-rendered, complex scene without bloom post-effect has no problems, as it goes directly to the backbuffer. But the same scene with bloom on must be rendered to the FBO first so that it can be operated on, and for a complex scene that causes a > 20ms performance hit.

  4. #4
    Dark Photon (Senior Member, OpenGL Guru)
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    2,882

    Re: Bad performance (possibly FBO related) on NVIDIA

    Quote Originally Posted by cadaver
    I'm seeing quite bad OpenGL performance ... when running Linux & the NVIDIA driver (up to 50ms/frame.) ... with GTX580 ... it is ... the amount of drawcalls that go to the FBO instead of the backbuffer. ... Anyone else seen something like this?
    Only in one specific scenario. And I've been using a number of FBOs to render frames with NVIDIA on Linux for many years, on GTX580s, GTX480s, GTX285s (and others), just like you are.

    The only time I've seen anything like this is when you're hitting up against (or flat blowing past) GPU memory capacity. When you do, that means the driver can/will start tossing textures and such off the board to try to make room so it can keep everything it needs for rendering batches on there, and that can result in massive frame time hits as it tries frantically to play musical chairs with CPU and GPU memory to render your frame. This includes your shadow textures, which may be swapped off the board to make room for other things when you're not rendering to them.

    So check how much memory you're using. Use the NVX_gpu_memory_info extension. It is trivial and well worth your while. In my experience, you should never see the "evicted" number > 0 (on Linux). If you do, you're blowing past GPU memory. Shut down/restart X via logout/login or Ctrl-Alt-Backspace (or just reboot) to reset the count to 0.
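
    A minimal sketch of that check, under a few assumptions: the token values are the ones listed in the GL_NVX_gpu_memory_info extension spec, the reported sizes are in KB, and `GetIntegerFn`, `GpuMemoryInfo`, `QueryGpuMemoryInfo` and `HasEvictions` are hypothetical names made up for this example. The getter is passed in so the sketch compiles without GL headers; in a real program it would forward to `glGetIntegerv` with a current context, after checking that the extension is advertised.

```cpp
#include <cassert>

// Token values as listed in the GL_NVX_gpu_memory_info extension spec.
constexpr unsigned kDedicatedVidmemNVX        = 0x9047;
constexpr unsigned kTotalAvailableMemoryNVX   = 0x9048;
constexpr unsigned kCurrentAvailableVidmemNVX = 0x9049;
constexpr unsigned kEvictionCountNVX          = 0x904A;
constexpr unsigned kEvictedMemoryNVX          = 0x904B;

// Hypothetical stand-in for glGetIntegerv(GLenum, GLint*).
using GetIntegerFn = void (*)(unsigned pname, int* value);

// All sizes are in KB; eviction_count is a plain counter.
struct GpuMemoryInfo {
    int dedicated_kb = 0;
    int total_available_kb = 0;
    int current_available_kb = 0;
    int eviction_count = 0;
    int evicted_kb = 0;
};

// Reads the five values exposed by the extension through the given getter.
GpuMemoryInfo QueryGpuMemoryInfo(GetIntegerFn get) {
    assert(get != nullptr);
    GpuMemoryInfo info;
    get(kDedicatedVidmemNVX, &info.dedicated_kb);
    get(kTotalAvailableMemoryNVX, &info.total_available_kb);
    get(kCurrentAvailableVidmemNVX, &info.current_available_kb);
    get(kEvictionCountNVX, &info.eviction_count);
    get(kEvictedMemoryNVX, &info.evicted_kb);
    return info;
}

// True once the driver has had to evict anything since the counters were reset.
bool HasEvictions(const GpuMemoryInfo& info) {
    return info.eviction_count > 0 || info.evicted_kb > 0;
}
```

    Calling `QueryGpuMemoryInfo` once per frame and logging when `HasEvictions` first turns true should be enough to tell whether the frame time spikes line up with the driver evicting resources.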

    Also, if you've got one of those GPU memory and performance wasting desktop compositors enabled, disable it (for KDE, use the kcontrol GUI to disable effects/compositing, or just Shift-Alt-F12).

    As far as controlling which resources get kicked off first, glPrioritizeTextures is generally said to be a no-op. And while NVIDIA hasn't updated their GPU programming guide in a good while (3 years), the advice there may give some clue as to how to influence texture/render target GPU residency priority (see below). But the best advice is to just never fill up GPU memory; then you don't have to worry about this.

    Quote Originally Posted by NV GPU Prog Guide, GF8 Edition
    In order to minimize the chance of your application thrashing video memory, the best way to allocate shaders and render targets is:

    1. Allocate render targets first
       a. Sort the order of allocation by pitch (width * bpp).
       b. Sort the different pitch groups based on frequency of use. The surfaces that are rendered to most frequently should be allocated first.
    2. Create vertex and pixel shaders
    3. Load remaining textures
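
    Step 1 of the quoted advice can be sketched as a sort. This is only one possible reading of the guide, with several assumptions: "bpp" is taken as bytes per pixel, pitch is used only to form groups, a group is ranked by its busiest member, and `RenderTargetDesc`, `uses_per_frame`, `Pitch` and `AllocationOrder` are hypothetical names. The sketch only computes an order; the actual texture or renderbuffer creation is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of a render target that has not been created yet.
struct RenderTargetDesc {
    std::string name;
    int width;            // in pixels
    int bytes_per_pixel;  // assumption: "bpp" in the guide means bytes per pixel
    int uses_per_frame;   // assumption: how often the surface is rendered to
};

// Pitch as defined in the quoted guide: width * bpp.
int Pitch(const RenderTargetDesc& rt) {
    assert(rt.width > 0 && rt.bytes_per_pixel > 0);
    return rt.width * rt.bytes_per_pixel;
}

// Returns the targets in the order they would be allocated: surfaces with the
// same pitch stay together, and the group containing the most frequently used
// surface comes first.
std::vector<RenderTargetDesc> AllocationOrder(std::vector<RenderTargetDesc> targets) {
    // Highest use count found in each pitch group.
    std::map<int, int> group_use;
    for (const RenderTargetDesc& rt : targets) {
        int& best = group_use[Pitch(rt)];
        best = std::max(best, rt.uses_per_frame);
    }
    std::stable_sort(targets.begin(), targets.end(),
        [&group_use](const RenderTargetDesc& a, const RenderTargetDesc& b) {
            const int pa = Pitch(a), pb = Pitch(b);
            const int ga = group_use.at(pa), gb = group_use.at(pb);
            if (ga != gb) return ga > gb;  // busier group first
            if (pa != pb) return pa > pb;  // equally busy groups: keep each group contiguous
            return a.uses_per_frame > b.uses_per_frame;  // busiest surface first within a group
        });
    return targets;
}
```

    Shaders and the remaining textures would then be created after walking this list, as in steps 2 and 3.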

  5. #5

    Re: Bad performance (possibly FBO related) on NVIDIA

    The problem indeed seems to be using a single FBO. As a test, I switched to using another FBO for shadow map rendering (switching between shadowmaps and the main view is the most frequent rendertarget change for me), and most of the "unexpected" performance hit went away. The rendering as a whole is still some constant factor slower than on Windows & OpenGL, but it's much more consistent now.

    Now just to implement the multiple-FBO mechanism properly and transparently to the caller

    Thanks to all who replied!

  6. #6
    Dark Photon

    Re: Bad performance (possibly FBO related) on NVIDIA

    Quote Originally Posted by cadaver
    The problem indeed seems to be using a single FBO. As a test, I switched to using another FBO for shadow map rendering (switching between shadowmaps and the main view is the most frequent rendertarget change for me), and most of the "unexpected" performance hit went away. The rendering as a whole is still some constant factor slower than on Windows & OpenGL, but it's much more consistent now.

    Now just to implement the multiple-FBO mechanism properly and transparently to the caller

    Thanks to all who replied!
    Over the course of a frame, in rebinding different render targets to the FBO, do you ever change the resolution and/or pixel format of the FBO?

    I don't know if it still is, but it used to be that this was a slow path in the NVIDIA driver (circa GeForce 7 days). And yeah, the solution was to avoid doing that: use multiple FBOs.

  7. #7

    Re: Bad performance (possibly FBO related) on NVIDIA

    Yes, the shadow maps (and possibly the post-processing buffers) have different sizes and formats.

    Could it be that the Linux driver is still using older code?

  8. #8
    Dark Photon

    Re: Bad performance (possibly FBO related) on NVIDIA

    Quote Originally Posted by cadaver
    Yes, the shadow maps (or possible post-processing buffers) are different size and format.

    Would it possibly be that the Linux driver is still using older code?
    The implication of your statement is that there's a newer, improved version. However, IIRC from the NVIDIA post, it's not that this is a path that was written inefficiently, just that it's a slow path. It said that reconfiguring the resolution or internal format of an FBO was expensive, and to avoid doing that a lot.

  9. #9

    Re: Bad performance (possibly FBO related) on NVIDIA

    I can confirm further improvement (on Linux) by implementing a map of FBOs, where the resolution and format form the search key.

    However, what is curious is that on Windows performance was always fine with the same hardware, and it did not improve over the initial code, which was just binding all surfaces to the same FBO.
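
    A minimal sketch of such a map, not the engine's actual code. `FboKey`, `FboCache` and the injected `create` callback are hypothetical names for this example, and using a single format value in the key is an assumption (a real key might need both color and depth formats). The callback stands in for whatever generates a framebuffer object name, so the sketch compiles without GL headers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <tuple>
#include <utility>

// Search key: render target resolution plus its format.
struct FboKey {
    int width;
    int height;
    unsigned format;  // assumption: one format value describes the target

    bool operator<(const FboKey& other) const {
        return std::tie(width, height, format) <
               std::tie(other.width, other.height, other.format);
    }
};

// Hands out one FBO name per distinct (resolution, format) combination, so an
// existing FBO is never reconfigured to a different size or format.
class FboCache {
public:
    using CreateFn = std::function<unsigned()>;

    explicit FboCache(CreateFn create) : create_(std::move(create)) {
        assert(create_);
    }

    // Returns the FBO for this configuration, creating it on first use.
    unsigned Get(int width, int height, unsigned format) {
        const FboKey key{width, height, format};
        auto it = fbos_.find(key);
        if (it == fbos_.end()) {
            it = fbos_.emplace(key, create_()).first;
        }
        return it->second;
    }

    std::size_t Size() const { return fbos_.size(); }

private:
    CreateFn create_;
    std::map<FboKey, unsigned> fbos_;
};
```

    In a renderer, the callback would presumably wrap `glGenFramebuffers`, and the caller would bind the returned name with `glBindFramebuffer` before attaching the surfaces for that pass. Surfaces of the same size and format then share one FBO, while shadow maps and the main view each get their own.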

  10. #10
    Dark Photon

    Re: Bad performance (possibly FBO related) on NVIDIA

    Quote Originally Posted by cadaver
    However, what is curious that on Windows performance was always fine with the same hardware, and it did not improve over the initial code, which was just binding all surfaces to the same FBO.
    That is interesting. I wonder if the Windows driver is doing the same FBO virtualization under the covers that we're both doing in the app.
