Part of the Khronos Group
OpenGL.org

Thread: VAOs slower than glVertexAttribPointer on all implementations

  1. #1
    Intern Contributor Godlike's Avatar
    Join Date
    May 2004
    Location
    Greece
    Posts
    67

    VAOs slower than glVertexAttribPointer on all implementations

    I was reading the Valve/NVIDIA presentation on the lessons learned when Valve ported the Source game engine to OpenGL (https://developer.nvidia.com/sites/d...to%20Linux.pdf).

    One thing that caught my eye is the statement that VAOs are slower than glVertexAttribPointer on all implementations, with a recommendation to skip them (page 57). So the question is: why are they slower?

    If VAOs are indeed slower, what is the preferable way to avoid using them on GL3 core, where there is no default VAO? Maybe create a single one, bind it once and use it all the time? Will this be efficient?

    Thanks!

    PS: The presentation is a very interesting read, by the way.

  2. #2
    Senior Member OpenGL Pro Aleksandar's Avatar
    Join Date
    Jul 2009
    Posts
    1,144
    Quote Originally Posted by Godlike
    Maybe create a single one, bind it once and use it all the time? Will this be efficient?
    That's the way some of us already use VAOs.

    AFAIK a VAO cannot beat a single (or several) glVertexAttribPointer calls. But if you have a dozen or several dozen glVertexAttribPointer calls, then a VAO should be the better solution.
    It would be nice to see some concrete numbers from real-world applications comparing execution time with and without VAO.
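
The "single VAO, bound once" approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not code from the presentation or from any poster. So that it runs without a GL context, the `gl*` functions here are call-counting stand-ins that only mirror the signatures of the real entry points. `Mesh`, `CallLog`, `init_global_vao`, and `draw_mesh` are hypothetical names, and the interleaved position + UV layout is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- Stand-ins for GL (NOT the real library): same signatures as the
 *      real entry points, but they only count calls. ---- */
typedef unsigned int  GLuint;
typedef unsigned int  GLenum;
typedef int           GLint;
typedef int           GLsizei;
typedef unsigned char GLboolean;
#define GL_FALSE        0
#define GL_TRIANGLES    0x0004
#define GL_FLOAT        0x1406
#define GL_ARRAY_BUFFER 0x8892

typedef struct {
    unsigned gen_vao, bind_vao, enable_attrib, bind_buffer, attrib_pointer, draw;
} CallLog;
CallLog g_calls; /* zero-initialized global call counters */

void glGenVertexArrays(GLsizei n, GLuint *arrays)
{
    for (GLsizei i = 0; i < n; i++) arrays[i] = (GLuint)(i + 1);
    g_calls.gen_vao++;
}
void glBindVertexArray(GLuint array) { (void)array; g_calls.bind_vao++; }
void glEnableVertexAttribArray(GLuint index) { (void)index; g_calls.enable_attrib++; }
void glBindBuffer(GLenum target, GLuint buffer)
{
    (void)target; (void)buffer;
    g_calls.bind_buffer++;
}
void glVertexAttribPointer(GLuint index, GLint size, GLenum type,
                           GLboolean normalized, GLsizei stride,
                           const void *pointer)
{
    (void)index; (void)size; (void)type;
    (void)normalized; (void)stride; (void)pointer;
    g_calls.attrib_pointer++;
}
void glDrawArrays(GLenum mode, GLint first, GLsizei count)
{
    (void)mode; (void)first; (void)count;
    g_calls.draw++;
}

/* ---- The pattern itself ---- */

/* Hypothetical mesh: one VBO of interleaved position (3 floats) + UV (2 floats). */
typedef struct {
    GLuint  vbo;
    GLsizei vertex_count;
} Mesh;

/* Call once after context creation: a core profile needs some VAO bound,
 * so create one and leave it bound for the lifetime of the program. */
void init_global_vao(void)
{
    GLuint vao;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);
    /* Enable flags live in the VAO, so they persist across draws. */
    glEnableVertexAttribArray(0);
    glEnableVertexAttribArray(1);
}

/* Per draw: re-specify the attribute pointers instead of switching VAOs. */
void draw_mesh(const Mesh *m)
{
    assert(m != NULL);
    const GLsizei stride = (GLsizei)(5 * sizeof(float));
    glBindBuffer(GL_ARRAY_BUFFER, m->vbo);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, stride, (const void *)0);
    glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, stride,
                          (const void *)(uintptr_t)(3 * sizeof(float)));
    glDrawArrays(GL_TRIANGLES, 0, m->vertex_count);
}
```

In a real program the stand-in section would be replaced by the GL headers and a function loader; only `init_global_vao` and `draw_mesh` illustrate the approach. Whether this is faster than one VAO per mesh is exactly the open question in this thread, so it would need to be measured.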

  3. #3
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,948
    Considering that the same paper had this chestnut:

    Quote Originally Posted by Insanity
    * Don’t use MapBuffer—because it returns a pointer, it causes driver serialization.

    * Even worse, it probably causes a CPU-GPU sync point.

    * Instead, use BufferSubData on subsequent regions, then BufferData when it’s time to discard.
    I would consider most of what's in that paper to be suspect on that basis alone. Not necessarily wrong. Just suspect.

    Also, the NVIDIA bias is really showing (what with the heavy shilling of DSA and all).

  4. #4
    Junior Member Newbie
    Join Date
    Aug 2012
    Location
    Switzerland
    Posts
    14
    I was wondering how you would even manage to implement VAOs slower than the "manual setup"? Wouldn't the most naive implementation be to just "replay" the same command sequence every time the VAO is bound?

  5. #5
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,948
    I was wondering how you would even manage to implement VAOs slower than the "manual setup"?
    Let's assume for the moment that all hardware works exactly like D3D does. That is, vertex formats are fixed (i.e., changing them is costly), but the source data for those formats is not fixed (changing it is cheap). Given that, how would you implement `glVertexAttribPointer`?

    This function has to be able to change the vertex format and the source data. But a lot of people reuse the same format. Therefore, it would be reasonable to use a simple hashing method (i.e., hash the format parameters passed to `glVertexAttribPointer`) to check whether they're changing the format. If that attribute's format is the same in this new call, then don't change the internal format. Just change the source data.

    Therefore, a series of rendering calls where you use the same vertex format, but with different source data, would perform optimally.

    Now consider binding a VAO for rendering purposes. Here, you have many attributes' worth of vertex format data. Plus, thanks to OpenGL's "bind to modify", you can't even be sure that the user intends to render with that format data until they actually do so. So when you switch from one VAO to another, it's easier to just change all the format state, even if the format state didn't actually change from one VAO bind to another.

    Therefore, a series of rendering calls where you use VAOs with the same format would not perform optimally.

    Plus, NVIDIA doesn't like VAOs. So they have no reason to make VAOs fast. And every reason to make VAOs slow. That way, they can create a self-fulfilling prophecy: "Look at this profiling data: VAOs are slow. Don't use them."
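
The format-caching argument earlier in this post can be made concrete with a toy model. This is only a sketch of the hypothetical driver described here, not how any real driver is known to work. `ToyDriver`, `toy_attrib_pointer`, and `toy_bind_vao` are invented names, a field-by-field comparison stands in for the hash, and grouping the stride with the source data is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ATTRIBS 16

/* Format half of an attribute: assumed costly to change. */
typedef struct {
    int      size;        /* components per vertex, 1..4 */
    unsigned type;        /* component type enum, e.g. 0x1406 (GL_FLOAT) */
    bool     normalized;
} AttribFormat;

/* Source half of an attribute: assumed cheap to change. */
typedef struct {
    unsigned buffer;      /* buffer object name */
    size_t   offset;      /* byte offset into that buffer */
    int      stride;      /* grouped with the source in this sketch */
} AttribSource;

typedef struct {
    AttribFormat format[MAX_ATTRIBS];
    AttribSource source[MAX_ATTRIBS];
} AttribState;

/* Toy driver: current hardware state plus counters for both kinds of change. */
typedef struct {
    AttribState   current;
    unsigned long format_changes;  /* the costly operation */
    unsigned long source_changes;  /* the cheap operation */
} ToyDriver;

bool same_format(const AttribFormat *a, const AttribFormat *b)
{
    return a->size == b->size && a->type == b->type &&
           a->normalized == b->normalized;
}

/* glVertexAttribPointer-like path: compare first, touch the format only
 * when it really differs, and always update the source. */
void toy_attrib_pointer(ToyDriver *d, unsigned index,
                        AttribFormat f, AttribSource s)
{
    assert(index < MAX_ATTRIBS);
    if (!same_format(&d->current.format[index], &f)) {
        d->current.format[index] = f;
        d->format_changes++;
    }
    d->current.source[index] = s;
    d->source_changes++;
}

/* VAO-bind-like path as imagined in this post: reapply all format and
 * source state for every attribute without checking what changed. */
void toy_bind_vao(ToyDriver *d, const AttribState *vao, unsigned attrib_count)
{
    assert(attrib_count <= MAX_ATTRIBS);
    for (unsigned i = 0; i < attrib_count; i++) {
        d->current.format[i] = vao->format[i];
        d->format_changes++;
        d->current.source[i] = vao->source[i];
        d->source_changes++;
    }
}
```

With three meshes that share one single-attribute format, the first path records one format change and three source changes, while the second records three of each. Nothing stops a driver from doing the same comparison at VAO bind time; the sketch only shows the "easier" implementation described above.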

  6. #6
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    3,213
    Quote Originally Posted by Godlike
    One thing that caught my eye is the statement that VAOs are slower than glVertexAttribPointer on all implementations, with a recommendation to skip them (page 57). So the question is: why are they slower?
    Hmm. I too would be curious as to what use cases would make VAOs slower than raw pointer and enable calls. Changing the VAO state every time perhaps? I wonder if he actually applied them properly.

    His statement differs from my experience when testing VAOs. VAOs alone (for batch attr/index setup, one per static batch, with the data in VBOs) yield some speedup on NVIDIA (despite Alfonse's and the Valve quote's implication), but NV bindless (for batch attr/index setup) alone yields even more speedup. And last I checked, VAOs+bindless together were slower than bindless alone (which makes sense).

    Now IINADE (I am not a driver engineer), but I suspect the reason for this is that with VAOs, you're collecting state data in a single (likely contiguous) state struct in the driver. That helps, and perhaps the driver can front-load some batch setup work by caching private state in the VAO. But every time you bind a VAO, the driver still has to look it up in main memory among a bazillion others, taking the cache misses needed to make it available in the GL driver (i.e., this state data is not part of your app-side batch storage object, which you've already pulled into cache). Whereas with bindless, you actually store exactly what the driver needs on the app side in your batch storage object (i.e., 64-bit GPU addresses), and there's no reason to do other memory lookups just to bind and enable your attr and index lists. Further, you can get the batch VBO data into a state where it is hot and ready to render with up front, once, and then render with it many times.
    Last edited by Dark Photon; 04-05-2013 at 05:32 AM.

  7. #7
    Intern Contributor Godlike's Avatar
    Join Date
    May 2004
    Location
    Greece
    Posts
    67
    There is a video of the presentation. It didn't help clear things up.

    http://www.youtube.com/watch?feature...&v=btNVfUygvio
