Part of the Khronos Group
OpenGL.org


Thread: NV FBO Performance, Part 2

  1. #1
    Senior Member OpenGL Guru (Dark Photon)
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    2,891

    NV FBO Performance, Part 2

    In this test case, I'm seeing a 5950 Ultra and even a 5900 XT smoke a 6800 Ultra, so I'd sure appreciate some insight into what I'm doing wrong.

    The usage scenario is volume particle lighting, à la Harris. For example:

    1. Set up a 32x32 color-only FBO
    2. For all 183188 particles:
      • Read back a 1x1 to 8x8 pixel region centered on the particle location (glReadPixels)
      • Render the particle as a QUAD into the buffer with alpha blending
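    To make the data dependency in the loop above concrete, here is a CPU-only model of it in C. It is a sketch under stated assumptions, not the original app: a single-channel float buffer stands in for the 32x32 FBO, the readback is reduced to the average of the region, and the particle quad is assumed to be black so the alpha blend simply attenuates the light behind it. `LightBuffer`, `Particle`, and all function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical CPU stand-in for the 32x32 light buffer: one float channel
   per pixel, 1.0 = fully lit, 0.0 = fully shadowed. */
#define BUF_W 32
#define BUF_H 32

typedef struct {
    float px[BUF_H][BUF_W];
} LightBuffer;

/* Hypothetical particle record: buffer position, footprint, opacity. */
typedef struct {
    int x, y;     /* center of the particle in buffer pixels */
    int size;     /* readback / quad footprint, 1..8 pixels per side */
    float alpha;  /* particle opacity used for the blend */
} Particle;

static void clear_light(LightBuffer *b, float value)
{
    for (int y = 0; y < BUF_H; ++y)
        for (int x = 0; x < BUF_W; ++x)
            b->px[y][x] = value;
}

/* Step 2a: mean light in a size x size window centered on (cx, cy),
   clipped to the buffer. Stands in for the glReadPixels readback. */
static float read_incident_light(const LightBuffer *b, int cx, int cy, int size)
{
    assert(size >= 1 && size <= 8);
    int x0 = cx - size / 2, y0 = cy - size / 2;
    float sum = 0.0f;
    int n = 0;
    for (int y = y0; y < y0 + size; ++y)
        for (int x = x0; x < x0 + size; ++x)
            if (x >= 0 && x < BUF_W && y >= 0 && y < BUF_H) {
                sum += b->px[y][x];
                ++n;
            }
    return n > 0 ? sum / (float)n : 0.0f;
}

/* Step 2b: blend a black quad of opacity alpha over the same window.
   With glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) and a black
   source color, dst = 0 * alpha + dst * (1 - alpha). */
static void attenuate_quad(LightBuffer *b, int cx, int cy, int size, float alpha)
{
    int x0 = cx - size / 2, y0 = cy - size / 2;
    for (int y = y0; y < y0 + size; ++y)
        for (int x = x0; x < x0 + size; ++x)
            if (x >= 0 && x < BUF_W && y >= 0 && y < BUF_H)
                b->px[y][x] *= 1.0f - alpha;
}

/* The per-particle loop: each readback must see every quad drawn before
   it, so the steps cannot be reordered or batched. Particles are assumed
   to be pre-sorted in the order they should be lit. */
static void light_particles(LightBuffer *b, const Particle *p, int count, float *incident)
{
    for (int i = 0; i < count; ++i) {
        incident[i] = read_incident_light(b, p[i].x, p[i].y, p[i].size);
        attenuate_quad(b, p[i].x, p[i].y, p[i].size, p[i].alpha);
    }
}
```

    Two half-transparent particles at the same spot see incident light of 1.0 and then 0.5. Because every iteration reads pixels the previous iteration just wrote, the GL version of this loop needs a full CPU/GPU round trip per particle: a plain glReadPixels into client memory cannot return until the preceding quad has been rendered.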
    Now, I haven't even tried any optimization yet because I'm completely baffled by the stats I'm getting. Here they are:
    • 12960 ms - 5950 Ultra (FBO)
    • 23274 ms - 6800 Ultra (FBO)
    • 5268 ms - 5950 Ultra (system frame buffer)
    • 5547 ms - 6800 Ultra (system frame buffer)

    The first two are rendered to a 32x32 color-only FBO. The latter two are rendered to the bottom-left 32x32 corner of the default frame buffer (MSAA disabled of course).
    This immediately prompts two questions:
    • Why is the older card faster in each technique?
    • Why is the system framebuffer path faster than FBOs?
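    The four timings can be reduced to ratios, which makes the two questions easier to size up. This is only arithmetic on the numbers posted above; the macro and function names are made up for illustration.

```c
#include <assert.h>

/* Timings from the post, in milliseconds, for 183188 particles. */
#define PARTICLES       183188
#define FX5950U_FBO_MS  12960.0
#define GF6800U_FBO_MS  23274.0
#define FX5950U_SYS_MS  5268.0
#define GF6800U_SYS_MS  5547.0

/* Ratio of two run times: a result > 1 means the first run was slower. */
static double slowdown(double slower_ms, double faster_ms)
{
    assert(faster_ms > 0.0);
    return slower_ms / faster_ms;
}

/* Average wall time per particle, in microseconds. */
static double us_per_particle(double total_ms)
{
    return total_ms * 1000.0 / PARTICLES;
}
```

    Worked through, the 6800 Ultra is about 1.80x slower than the 5950 Ultra with FBOs but only about 1.05x slower with the system framebuffer, and the FBO path is roughly 2.5x (5950 Ultra) to 4.2x (6800 Ultra) slower than the system framebuffer. The 5950 Ultra FBO run works out to about 71 µs per particle.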

    This is all on the same system with the same app and same rendering path -- only the graphics card has been changed.
    Anyone have an idea what's going on here? --Thanks!

    NVidia Driver: 1.0-7667
    NVidia Cfg: AGPGART, 8x, Fast Writes, SBA

  2. #2
    Senior Member OpenGL Guru
    Join Date
    Dec 2000
    Location
    Reutlingen, Germany
    Posts
    2,052

    Re: NV FBO Performance, Part 2

    Did you uninstall and reinstall the driver when you changed the gfx cards?
    GLIM - Immediate Mode Emulation for GL3

  3. #3
    Senior Member OpenGL Pro
    Join Date
    May 2000
    Location
    Naarn, Austria
    Posts
    1,142

    Re: NV FBO Performance, Part 2

    Is the color attachment of the FBO a texture or a renderbuffer?

    If it's a texture, I guess the glReadPixels call could be a lot slower than it is on the system framebuffer. I don't think FBOs are optimized for readback.

    Have you tried the same without glReadPixels? Try to achieve the same effect with RTT if possible...

  4. #4
    Senior Member OpenGL Guru (Dark Photon)
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    2,891

    Re: NV FBO Performance, Part 2

    Jan:
    Did you uninstall and reinstall the driver when you changed the gfx cards?
    No, I installed the driver with the 6800 Ultra in place (so it should have been optimal, but wasn't); then I just dropped in the 5950 Ultra.

    Is the driver install card-specific?

  5. #5
    Senior Member OpenGL Guru
    Join Date
    Dec 2000
    Location
    Reutlingen, Germany
    Posts
    2,052

    Re: NV FBO Performance, Part 2

    Well, I don't know whether the install is card-specific. Maybe not, but I wouldn't simply assume it isn't.

    Jan.

  6. #6
    Senior Member OpenGL Guru
    Join Date
    Mar 2001
    Posts
    3,768

    Re: NV FBO Performance, Part 2

    The first two are rendered to a 32x32 color-only FBO.
    Um, why 32x32? I could imagine that some hardware would have trouble rendering to hyper-small framebuffers.

    Why is the system framebuffer path faster than FBOs?
    Rendering to a texture will be slower than rendering to a framebuffer. If you just need a render target for reading or something, you should use a renderbuffer, not a texture. Use a texture only if you need to texture the results onto something else.

    Plus, you're using a 32x32 target, which, as I mentioned, may not be well accelerated.

  7. #7
    Junior Member Newbie
    Join Date
    Jun 2005
    Posts
    3

    Re: NV FBO Performance, Part 2

    Hi,

    It seems that glReadPixels is faster on a GeForce 5900 XT than on a GeForce 6800 Ultra.

    We made this observation during our PBO tests, too.

  8. #8
    Senior Member OpenGL Guru (Dark Photon)
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    2,891

    Re: NV FBO Performance, Part 2

    Thanks to Overmind, Korval, and Jan for the suggestions.

    Here are the results of the latest tests. Still no silver bullet:

    1. Use a renderbuffer instead of a texture

      RESULT: No difference.
    2. Render to a larger texture than 32x32

      RESULT: No difference when rendering to the lower-left 32x32 region of a 256x256 or 512x512 texture (rendering to a larger region of the texture would mean more readback bandwidth).
    3. Use glGetTexImage to read the texture instead of glReadPixels
      So far I haven't been able to get reasonable pixel data back from this.

  9. #9
    Senior Member OpenGL Guru (Dark Photon)
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    2,891

    Re: NV FBO Performance, Part 2

    Overmind: Is the color attachment of the FBO a texture or a renderbuffer?
    A texture. Changing it to a renderbuffer didn't affect performance.

    Have you tried the same without glReadPixels?
    Simple timers around glReadPixels reveal that 91-95% of the latency is in (or masked by) the glReadPixels call, and commenting it out reduces the total time to 5-17% of the original. So it is mostly the sheer latency of glReadPixels.
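    A quick back-of-the-envelope check supports the latency reading. This sketch makes two assumptions not stated in the post: every particle reads the largest (8x8) region, and pixels are 4 bytes each (e.g. GL_RGBA / GL_UNSIGNED_BYTE). It takes 93%, the middle of the 91-95% range, as the readback share of the 5950 Ultra FBO run. Function names are hypothetical.

```c
#include <assert.h>

/* Worst-case bytes read back over the whole run: every particle reads a
   region x region window at bytes_per_pixel (assumed format). */
static long long readback_bytes(long long particles, int region, int bytes_per_pixel)
{
    assert(region >= 1 && bytes_per_pixel >= 1);
    return particles * region * region * bytes_per_pixel;
}

/* Average time per glReadPixels call in microseconds, given the total run
   time and the fraction of it attributed to the readback. */
static double us_per_readback(double total_ms, double readback_fraction, long long particles)
{
    assert(particles > 0);
    return total_ms * readback_fraction * 1000.0 / (double)particles;
}

/* Effective transfer rate in MB/s (1 MB = 1e6 bytes) if those bytes took
   readback_fraction of total_ms to move. */
static double effective_mb_per_s(long long bytes, double total_ms, double readback_fraction)
{
    double seconds = total_ms * readback_fraction / 1000.0;
    assert(seconds > 0.0);
    return (double)bytes / 1e6 / seconds;
}
```

    With these assumptions the run reads back at most about 46.9 MB, an effective rate of roughly 3.9 MB/s, while each call averages about 66 µs. Numbers like that point at per-call synchronization cost rather than raw transfer bandwidth, which fits the timer results.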

    However, none of this yet explains why an NV3x smokes a top-of-the-line NV40 on this test (by 2X when reading from an FBO), or why a glReadPixels from an FBO is 2X-4X slower than from the system framebuffer.

    Try to achieve the same effect with RTT if possible...
    Yeah, I've been thinking about that. Even if I was doing a 4-pass 8x8 downsample per particle on the GPU before rendering the particle, it probably wouldn't be any slower.
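    The GPU downsample idea can be modeled on the CPU as repeated 2x2 box-filter reductions, where each halving would correspond to one render-to-texture pass on the GPU. This is a hypothetical sketch, not code from the thread: `reduce_2x2` and `region_mean` are illustrative names, the region is assumed to be square with a power-of-two side of at most 8, and no actual GL calls are shown.

```c
#include <assert.h>

/* One reduction pass: average each 2x2 block of an n x n image into an
   (n/2) x (n/2) image. */
static void reduce_2x2(const float *src, int n, float *dst)
{
    assert(n >= 2 && n % 2 == 0);
    int h = n / 2;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < h; ++x)
            dst[y * h + x] = 0.25f * (src[(2 * y) * n + 2 * x] + src[(2 * y) * n + 2 * x + 1]
                                    + src[(2 * y + 1) * n + 2 * x] + src[(2 * y + 1) * n + 2 * x + 1]);
}

/* Reduce an n x n region (n a power of two, at most 8) to its mean by
   repeated halving; reports the number of passes through *passes. */
static float region_mean(const float *region, int n, int *passes)
{
    float a[64], b[64];
    assert(n >= 1 && n <= 8 && (n & (n - 1)) == 0);
    for (int i = 0; i < n * n; ++i)
        a[i] = region[i];
    float *cur = a, *next = b;
    int count = 0;
    while (n > 1) {
        reduce_2x2(cur, n, next);
        float *tmp = cur; cur = next; next = tmp;  /* ping-pong buffers */
        n /= 2;
        ++count;
    }
    if (passes)
        *passes = count;
    return cur[0];
}
```

    In this model an 8x8 region needs three halving passes to reach a single value. The real GPU version would also need a way for the particle draw to sample that 1x1 result, which this sketch does not cover.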

    The unfortunate but intuitive feature of this lighting algorithm is that the readback is needed before rendering each particle to determine how much light has made it through the volume to this point. So it essentially ping-pongs back and forth between the CPU and the GPU (for all particles: get lighting from texture area; attenuate lighting in texture behind particle).

  10. #10
    Senior Member OpenGL Guru (Dark Photon)
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    2,891

    Re: NV FBO Performance, Part 2

    Henry Jones:
    It seems that glReadPixels is faster on a GeForce 5900 XT than on a GeForce 6800 Ultra ... We made this observation during our PBO tests, too
    Interesting. Thanks! Did you happen to work out any stats (e.g. % diff for some transfer size)?

    Could be that's the 5% time increase I see with this algorithm when reading from the system framebuffer (5950U vs. 6800U). Though the 80% increase with FBOs suggests something else is at work here...
