Part of the Khronos Group
OpenGL.org

Thread: FBO performance: who is right?

  1. #1
    Intern Newbie
    Join Date: Feb 2005
    Posts: 30

    FBO performance: who is right?

    Hello,

    Years ago, I saw Simon Green's presentation with recommendations for FBO performance. The relevant slide is no. 29 of http://http.download.nvidia.com/developer/presentations/2005/GDC/OpenGL_Day/OpenGL_FrameBuffer_Object.pdf .
    He recommends using a single FBO and switching between textures.

    But then Valve comes along and tells a very different story. The slides can be found here: https://developer.nvidia.com/sites/d...to%20Linux.pdf (the relevant one is no. 65). It says "Do not create a single FBO and then swap out attachments on it."

    Now, I realize that (a) nothing replaces doing your own benchmarks, and (b) Green's slides are years older than Valve's. So perhaps changes in the hardware have caused Green's preferred option to no longer be the fastest one?

    Your thoughts?

  2. #2
    Advanced Member Frequent Contributor
    Join Date: Dec 2007
    Location: Hungary
    Posts: 985
    FBOs were designed to encapsulate the whole framebuffer state and provide a way to replace it all at once, so I would highly recommend using multiple FBOs rather than a single one whose attachments you keep swapping. Why?

    1. This is how the API was designed to be used
    2. It usually results in fewer API calls
    3. The driver doesn't have to re-validate framebuffer completeness and other internal state (these states are cached inside an FBO)
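    Point 3 can be illustrated with a minimal sketch, assuming a hypothetical driver that caches the completeness check per FBO and re-runs it lazily at the next draw after an attachment has changed. ToyFbo, ToyStats, run_single_fbo, run_fbo_per_target and the texture names are all made up for illustration; real drivers may track and validate framebuffer state quite differently, and the model counts calls and checks, not time.

```c
#include <string.h>

/* Hypothetical toy model, NOT any real driver: an FBO remembers whether its
   completeness check is still valid. Changing an attachment marks it dirty;
   a draw re-validates only if it is dirty. */

#define TOY_MAX_COLOR 4   /* assumed limit: colors must be <= this */
#define TOY_MAX_TARGETS 8 /* assumed limit: targets must be <= this */

typedef struct {
    unsigned color[TOY_MAX_COLOR]; /* texture name per color attachment */
    unsigned depth;                /* texture name of the depth attachment */
    int dirty;                     /* 1 = completeness must be re-checked */
} ToyFbo;

typedef struct {
    long api_calls;   /* binds + attaches issued by the application */
    long validations; /* completeness checks performed by the toy driver */
} ToyStats;

static void toy_init(ToyFbo *f) {
    memset(f, 0, sizeof *f);
    f->dirty = 1; /* a new FBO has never been validated */
}

static void toy_bind(ToyStats *s) { s->api_calls++; }

/* Attaching a different texture invalidates the cached completeness. */
static void toy_attach(unsigned *point, unsigned tex, ToyFbo *f, ToyStats *s) {
    s->api_calls++;
    if (*point != tex) { *point = tex; f->dirty = 1; }
}

/* Lazy validation: only re-check completeness if something changed. */
static void toy_draw(ToyFbo *f, ToyStats *s) {
    if (f->dirty) { s->validations++; f->dirty = 0; }
}

/* Hypothetical texture names belonging to render target t. */
static unsigned color_tex(int t, int c) { return (unsigned)(1 + t * TOY_MAX_COLOR + c); }
static unsigned depth_tex(int t) { return (unsigned)(1000 + t); }

static void attach_target(ToyFbo *f, int t, int colors, ToyStats *s) {
    for (int c = 0; c < colors; c++)
        toy_attach(&f->color[c], color_tex(t, c), f, s);
    toy_attach(&f->depth, depth_tex(t), f, s);
}

/* Strategy A: one FBO, attachments swapped for every render target. */
static ToyStats run_single_fbo(int frames, int targets, int colors) {
    ToyStats s = {0, 0};
    ToyFbo fbo;
    toy_init(&fbo);
    for (int fr = 0; fr < frames; fr++) {
        toy_bind(&s);
        for (int t = 0; t < targets; t++) {
            attach_target(&fbo, t, colors, &s);
            toy_draw(&fbo, &s);
        }
    }
    return s;
}

/* Strategy B: one FBO per render target, attachments set once at startup. */
static ToyStats run_fbo_per_target(int frames, int targets, int colors) {
    ToyStats s = {0, 0};
    ToyFbo fbos[TOY_MAX_TARGETS];
    for (int t = 0; t < targets; t++) {
        toy_init(&fbos[t]);
        toy_bind(&s);
        attach_target(&fbos[t], t, colors, &s);
    }
    for (int fr = 0; fr < frames; fr++)
        for (int t = 0; t < targets; t++) {
            toy_bind(&s);
            toy_draw(&fbos[t], &s);
        }
    return s;
}
```

    With made-up parameters of 100 frames, 3 render targets and 2 color attachments each, strategy A issues 1000 calls and triggers 300 validations, while strategy B issues 312 calls and validates each FBO only once (3 validations). With a single render target the two strategies converge, since nothing is ever swapped.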

    Sure, doing your own benchmarks is the best way to determine what's best for you, but even if the option I've suggested is currently not the most efficient on some driver implementation, it is the one more likely to be optimized further, so in the long run it should be faster.

    (Off-topic: That's why I didn't really understand Valve's presentation about VAOs, as they do not recommend using them. In practice, VAOs are in fact faster if you have more than one or two attributes, and they should be preferred for the same reasons: that is their purpose, they mean fewer API calls, and state can be cached in the object.)
    Disclaimer: This is my personal profile. Whatever I write here is my personal opinion and none of my statements or speculations are anyhow related to my employer and as such should not be treated as accurate or valid and in no case should those be considered to represent the opinions of my employer.
    Technical Blog: http://www.rastergrid.com/blog/

  3. #3
    Senior Member OpenGL Pro
    Join Date: Apr 2010
    Location: Germany
    Posts: 1,099
    Quote Originally Posted by aqnuep
    That's why I didn't really understand Valve's presentation about VAOs, as they do not recommend using them.
    Well, it doesn't really matter for developers targeting recent core GL anyway.

  4. #4
    Junior Member Regular Contributor
    Join Date: Mar 2009
    Posts: 152
    Quote Originally Posted by thokra
    Well, it doesn't really matter for developers targeting recent core GL anyway.
    It matters. True, in the core profile you need a VAO, but you can generate and bind one at application startup and never unbind it. Then, before each draw call, configure your arrays with GL_ARB_vertex_attrib_binding. This is a fast path in the NV driver.
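    One rough way to compare this approach with the alternatives discussed in the thread is to count the state-setting calls the application issues before each draw. This is a hypothetical sketch: it assumes every mesh shares one attribute layout (so no glEnableVertexAttribArray/glDisableVertexAttribArray calls are needed between draws), and it counts calls only, not driver cost, which is exactly what the benchmarks mentioned here disagree about. MeshLayout and the calls_* functions are made-up names.

```c
/* Hypothetical per-draw call-count model; it says nothing about how
   expensive each call is inside a driver. */

typedef struct {
    int attribs; /* active vertex attributes */
    int buffers; /* distinct vertex buffers feeding them */
    int indexed; /* 1 if drawn with an element (index) buffer */
} MeshLayout;

/* One VAO kept bound; per draw: glBindBuffer(GL_ARRAY_BUFFER) per buffer,
   glVertexAttribPointer per attribute, and
   glBindBuffer(GL_ELEMENT_ARRAY_BUFFER) if indexed. */
static int calls_attrib_pointer(MeshLayout m) {
    return m.buffers + m.attribs + m.indexed;
}

/* One VAO kept bound, formats set once at startup with glVertexAttribFormat
   and glVertexAttribBinding (GL_ARB_vertex_attrib_binding, core since 4.3);
   per draw: glBindVertexBuffer per buffer, plus the element buffer if indexed. */
static int calls_attrib_binding(MeshLayout m) {
    return m.buffers + m.indexed;
}

/* One VAO per mesh, fully set up in advance; per draw: one glBindVertexArray. */
static int calls_vao_per_mesh(MeshLayout m) {
    (void)m; /* layout is already captured in the VAO */
    return 1;
}
```

    For example, an interleaved mesh with five attributes and an index buffer ({5, 1, 1}) needs 7 calls on the glVertexAttribPointer path, 2 on the attrib-binding path and 1 with a VAO per mesh. Fewer calls do not by themselves guarantee less driver work, which is the crux of the disagreement below.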

  5. #5
    Senior Member OpenGL Pro
    Join Date: Apr 2010
    Location: Germany
    Posts: 1,099
    Well, having just read the very brief passage in the Valve slides, I find a claim like "Slower than glVertexAttribPointer on all implementations" without any substantial data to back it up rather amusing - just like them calling a VAO a vertex attribute object. Someone clearly has a love for the spec terminology.

    The idea of issuing more API calls and still reaching higher performance, even if true on all current implementations, just baffles me and again leaves me wondering: why the hell is it so difficult with OpenGL to have mandatory concepts perform as well as or better than approaches that apparently do exactly what those concepts were supposed to avoid?

    I agree that binding as few VAOs as possible per frame is good, but binding a single VAO just to conform to the spec and then setting up your arrays every frame? That's simply messed up.

  6. #6
    Member Regular Contributor (malexander)
    Join Date: Aug 2009
    Location: Ontario
    Posts: 303
    I believe the problem with VAOs is the lack of distinction between "bind for modify" and "bind for draw". Because the driver has no idea what you're going to do, it has to set up all the state for every vertex attribute location (16 or 20 of them), whether that location is enabled or not, so that all the vertex queries and state are available to you for a modify operation. But if there were a specialized "bind VAO for draw" call, the driver could skip setting up the state for disabled locations and simply mark them as disabled.

    That's probably why calling glVertexAttribPointer() on a small subset of locations is faster than a VAO switch, which sets up the state for all locations.
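    The hypothesis above can be made concrete with a small counting model. It is a hypothetical sketch of the argument, not of any real driver (the next reply argues that drivers are unlikely to behave like bind_any_purpose at all); ToyVao, BindWork, bind_any_purpose, bind_for_draw and pointer_calls are invented names.

```c
/* Hypothetical model of the "bind for modify" vs "bind for draw" argument:
   count how many attribute locations get a full state setup versus only a
   cheap "disabled" mark when a VAO is bound. */

#define TOY_MAX_ATTRIBS 16 /* minimum GL_MAX_VERTEX_ATTRIBS required by GL 3.x/4.x */

typedef struct {
    unsigned enabled_mask; /* bit i set = attribute location i enabled */
} ToyVao;

typedef struct {
    int full_setups;  /* locations whose complete state is loaded */
    int disable_only; /* locations that are only marked as disabled */
} BindWork;

/* Today's bind, per the hypothesis: the driver cannot know the intent,
   so it loads the full state of every location. */
static BindWork bind_any_purpose(ToyVao v) {
    BindWork w = {TOY_MAX_ATTRIBS, 0};
    (void)v; /* enabled state is ignored in this worst case */
    return w;
}

/* Hypothetical "bind VAO for draw": full setup only for enabled locations. */
static BindWork bind_for_draw(ToyVao v) {
    BindWork w = {0, 0};
    for (int i = 0; i < TOY_MAX_ATTRIBS; i++) {
        if (v.enabled_mask & (1u << i))
            w.full_setups++;
        else
            w.disable_only++;
    }
    return w;
}

/* glVertexAttribPointer on k locations: only those k are touched. */
static BindWork pointer_calls(int k) {
    BindWork w = {k, 0};
    return w;
}
```

    With only locations 0 and 1 enabled (mask 0x3), bind_any_purpose reports 16 full setups, bind_for_draw reports 2 full setups plus 14 disable-only marks, and two glVertexAttribPointer calls touch just 2 locations.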

  7. #7
    Advanced Member Frequent Contributor
    Join Date: Dec 2007
    Location: Hungary
    Posts: 985
    There isn't really a "bind for modify" for VAOs except at the beginning, when you set them up, so that argument sounds irrelevant. Also, assuming that binding a VAO will internally set up the state of all 16 vertex attributes even if, e.g., 14 of them are disabled is pretty naive; it is unlikely that any driver does so.

    I think a more plausible reason why VAOs look slower than VertexAttribPointer is that most apps use only one or two vertex attributes, and that drivers probably have more optimizations in place for VertexAttribPointer calls than for VAO binds, simply because more applications use VertexAttribPointer. But once again, the theoretical maximum performance of VAO binds is definitely higher than that of individual VertexAttribPointer calls; it might just be that VAO binds are so underused that they are not well optimized.

    This applies generally, even to core profile vs. compatibility profile. A lot of people have already benchmarked them and reported that the compatibility profile looks faster. Why? Because most applications use the compatibility profile, so naturally those paths are more mature and better optimized at the moment. That doesn't mean they are any better; in fact, it's quite the opposite.

  8. #8
    Junior Member Regular Contributor
    Join Date: Mar 2009
    Posts: 152
    Quote Originally Posted by aqnuep
    There isn't really a "bind for modify" for VAOs except at the beginning, when you set them up, so that argument sounds irrelevant.
    VAO state can be changed at any time, so the driver has to treat each bind as both "bind for edit" and "bind for draw". AFAIK it is not possible to optimize VAOs in the NV driver; if it were, they would have done it.
    Last edited by randall; 04-19-2013 at 11:20 AM.

  9. #9
    Member Regular Contributor (malexander)
    Join Date: Aug 2009
    Location: Ontario
    Posts: 303
    Quote Originally Posted by aqnuep
    Also, assuming that binding a VAO will internally set up the state of all 16 vertex attributes even if, e.g., 14 of them are disabled is pretty naive; it is unlikely that any driver does so.
    Yet several threads have shown that VAOs are measurably slower in many cases, so this doesn't give me much faith that the driver is doing something smart. It almost seems as if VAOs have been implemented as a client-side convenience macro that sets up global vertex state, rather than as a referenced server-side object. At the very least, using a VAO should be just as fast as setting up the state manually, seeing as it's an integral part of GL now.

  10. #10
    Advanced Member Frequent Contributor
    Join Date: Dec 2007
    Location: Hungary
    Posts: 985
    Quote Originally Posted by randall
    VAO state can be changed at any time, so the driver has to treat each bind as both "bind for edit" and "bind for draw". AFAIK it is not possible to optimize VAOs in the NV driver; if it were, they would have done it.
    Why? You think the NV driver has optimized every single feature to the level that it cannot be improved any further? That's a pretty naive assumption.