Part of the Khronos Group
OpenGL.org

Thread: Fastest way of moving and drawing VBOs

  1. #1
    Junior Member Newbie
    Join Date
    Aug 2012
    Posts
    23

    Fastest way of moving and drawing VBOs

    Hello all,

    I am creating a GUI system with colored geometry (i.e. the only textures in the GUI will be bitmap text). I am using OpenGL 3.1+. Since everything will be built from quads, and since it's a GUI, it needs to be fast. How should I approach this?

    Should I create a VAO for the entire GUI system and a VBO for each of the quads? Do I then move and resize the quads by updating the VBO's data with glBufferSubData()? I've never used VAOs before, so I have no real idea of the benefits.

    Or maybe I could have a single VBO quad and instance it with transformations?

    I don't know, I'm a total beginner to OpenGL, so maybe these aren't viable options. In that case, is there a best way of doing this for my specific purposes? For the GUI, I only have to be able to create quads, move them and resize them.

    Also, what about drawing the quads? From what info I found, DrawElements seems to be the best way, assuming I use an element array buffer and don't upload the indices every frame.

    Please, could I get an opinion on this?
    Last edited by Inagawa; 08-28-2012 at 10:25 AM. Reason: Added some info

  2. #2
    Junior Member Newbie
    Join Date
    Jul 2008
    Posts
    16
    I have heard of about 4 different ways to get data into VBOs and each is fastest on a different set of hardware. I like using GL_AMD_pinned_memory or glFlushMappedBufferRange.

  3. #3
    Junior Member Newbie
    Join Date
    Jul 2008
    Posts
    16
    Using AMD's pinned memory is really the best because it lets the GPU see CPU memory directly. That way you can edit the CPU memory and the GPU already sees it; just use one buffer and cycle through it, like a ring buffer. In case you wonder, the GPU can read a VBO in CPU memory faster than it can process the vertices, so no speed is lost by using it. However, it is only available on AMD; I wish it were on NVIDIA too.
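
The "one buffer, cycled like a ring buffer" idea can be sketched on the CPU side as a small offset allocator. This is only a minimal sketch: `RingAllocator` and its members are made-up names, and it deliberately leaves out the synchronization (for example a fence per region) that real code would need so it never overwrites data the GPU is still reading.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: hands out byte offsets inside one large streaming buffer.
// It does NOT check whether the GPU has finished with a region before reuse.
class RingAllocator {
public:
    explicit RingAllocator(std::size_t capacity) : capacity_(capacity) {}

    // Returns the byte offset at which `size` bytes may be written.
    // If the remaining tail is too small, allocation wraps back to offset 0.
    std::size_t allocate(std::size_t size) {
        assert(size <= capacity_ && "request larger than the whole buffer");
        if (head_ + size > capacity_) {
            head_ = 0;
            ++wrapCount_;
        }
        const std::size_t offset = head_;
        head_ += size;
        return offset;
    }

    // How many times the allocator has wrapped around so far.
    std::size_t wrapCount() const { return wrapCount_; }

private:
    std::size_t capacity_;
    std::size_t head_ = 0;
    std::size_t wrapCount_ = 0;
};
```

Each frame you would ask for as many bytes as the changed quads need, write them at the returned offset (through the pinned pointer, a mapped range, or glBufferSubData), and draw from that offset.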

  4. #4
    Junior Member Newbie
    Join Date
    Jul 2008
    Posts
    16

  5. #5
    Advanced Member Frequent Contributor
    Join Date
    Jan 2007
    Posts
    965
    For this kind of drawing, vertex submission is highly unlikely to be anything near your primary bottleneck. Having said that, I have seen cases where draw calls can mount up in terms of performance overhead, but these are generally limited to rather extreme and unlikely cases which won't be encountered in real-world apps (e.g. running a few hundred character quads using D3D - not OpenGL - under VMWare's display driver), so again it's not something you need to worry overmuch about.

    The three main options that seem viable to me are: (1) streaming VBO with 4 verts per quad, (2) streaming VBO with 1 vert per quad and geometry shader, or (3) streaming VBO with 1 vert per quad and instancing. Of these, (2) is to be avoided, as having the geometry shader stage active will more than wipe out any gains you may get from just 1 vert per quad. That leaves (1) or (3), and generally it's a total wash - they come out roughly equal in performance. I've a preference for (3) as it results in less C/C++ code, but that's the only reason.

    Option (1) is the only one of these where there is a choice between glDrawArrays and glDrawElements, otherwise we're using glDrawArrays always. With option (1) you get to choose between GL_TRIANGLES with 6 indexes per quad or GL_TRIANGLE_FAN/STRIP with 5 per quad (and primitive restart enabled). GL_TRIANGLES may be slightly faster on slightly older hardware that may not have the best support for primitive restart. Either way your index buffer can be completely static so long as it's big enough.
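
To make the two index layouts for option (1) concrete, here is a rough sketch that fills a static index array once for up to N quads. The function names, the 16-bit index type, the 0xFFFF restart value, and the per-quad vertex order (bottom-left, bottom-right, top-left, top-right) are all assumptions for illustration, not something prescribed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed restart marker; it would have to match glPrimitiveRestartIndex.
constexpr std::uint16_t kRestartIndex = 0xFFFF;

// GL_TRIANGLES layout: 6 indices per quad, two triangles (0,1,2) and (2,1,3).
std::vector<std::uint16_t> buildTriangleListIndices(std::size_t quadCount) {
    assert(quadCount * 4 <= 65536 && "16-bit indices address at most 65536 vertices");
    static const std::uint16_t kCorners[6] = {0, 1, 2, 2, 1, 3};
    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t q = 0; q < quadCount; ++q) {
        const std::size_t base = q * 4;
        for (std::uint16_t corner : kCorners) {
            indices.push_back(static_cast<std::uint16_t>(base + corner));
        }
    }
    return indices;
}

// GL_TRIANGLE_STRIP layout: 4 indices plus one restart marker = 5 per quad.
std::vector<std::uint16_t> buildStripRestartIndices(std::size_t quadCount) {
    assert(quadCount * 4 <= 65535 && "vertex indices must not collide with the restart marker");
    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * 5);
    for (std::size_t q = 0; q < quadCount; ++q) {
        const std::size_t base = q * 4;
        for (std::size_t corner = 0; corner < 4; ++corner) {
            indices.push_back(static_cast<std::uint16_t>(base + corner));
        }
        indices.push_back(kRestartIndex);
    }
    return indices;
}
```

Either array would be uploaded once to the element array buffer and reused every frame; only the vertex data changes when quads move or resize.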

    Option (3) is, as I said, my preferred approach. This doesn't need indexes, involves less code, and has a submission of 1 vert per quad but avoids the overhead of having a geometry shader stage enabled. You don't even need a VBO for your single quad either, as you can use the gl_VertexID builtin instead; so all that you do is set up x, y, w, h and s-low, t-low, s-high, t-high as per-instance data in a streaming buffer, then use gl_VertexID in your vertex shader to figure out the final values for each vertex, and you're done.
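
As a sketch of what "use gl_VertexID to figure out the final values" could look like, the snippet below mirrors the shader arithmetic on the CPU so it can be checked without a GL context, and carries a GLSL version of the same logic as a string. The struct and function names, the `projection` uniform, the attribute locations, and the `#version 330 core` line are assumptions; on a 3.1 context the per-instance attributes would need the instanced-arrays extension.

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// Per-instance data, one entry per quad (field names are illustrative).
struct QuadInstance {
    float x, y, w, h;      // position and size
    float s0, t0, s1, t1;  // s-low, t-low, s-high, t-high
};

struct CornerOutput { Vec2 position; Vec2 texcoord; };

// CPU mirror of the vertex-shader logic for vertexId 0..3 of a triangle strip.
CornerOutput expandCorner(const QuadInstance& q, int vertexId) {
    assert(vertexId >= 0 && vertexId < 4);
    const float cx = static_cast<float>(vertexId & 1);   // 0 = low x, 1 = high x
    const float cy = static_cast<float>(vertexId >> 1);  // 0 = low y, 1 = high y
    return {
        { q.x + cx * q.w, q.y + cy * q.h },
        { q.s0 + cx * (q.s1 - q.s0), q.t0 + cy * (q.t1 - q.t0) }
    };
}

// The same arithmetic as a GLSL sketch (assumed names and locations).
const char* const kQuadVertexShader = R"glsl(
#version 330 core
layout(location = 0) in vec4 rect;    // x, y, w, h      (per instance)
layout(location = 1) in vec4 uvRect;  // s0, t0, s1, t1  (per instance)
uniform mat4 projection;
out vec2 uv;
void main() {
    vec2 corner = vec2(gl_VertexID & 1, gl_VertexID >> 1);
    gl_Position = projection * vec4(rect.xy + corner * rect.zw, 0.0, 1.0);
    uv = mix(uvRect.xy, uvRect.zw, corner);
}
)glsl";
```

With both attributes given a divisor of 1, a single glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, quadCount) call would then draw every quad.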

  6. #6
    Junior Member Newbie
    Join Date
    Jul 2008
    Posts
    16
    I have had a little time to play with glMultiDrawElementsIndirect, and it works very well for simplifying draw commands. I can draw everything in the whole game in one call. Everything comes from buffers: an indirect command buffer, element buffers, vertex attrib buffers, and texture buffers. The buffers can be updated using the fastest method in that pdf I mentioned. I agree, stay away from the geometry shader; it gets wonky with serializing a vertex stream made of chunks of unknown size, and it really messes with performance. I think instancing would be best; remember to use glVertexAttribDivisor/glVertexBindingDivisor to get the one-attrib-per-instance setup to work. But with glMultiDrawElementsIndirect or glMultiDrawArraysIndirect you can group attribs by class, draw lots of disparate classes, and jump all over the place with instance counts and vertex and instance offsets, all in one call. You can even source and change the buffers on the GPU and draw with null commands in the list to completely remove the CPU from the loop.
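
For readers who have not seen the indirect path, here is a small sketch of how the command list for one glMultiDrawElementsIndirect call might be assembled. The five-field command layout follows the OpenGL 4.3 specification; `WidgetClass` and `buildCommands` are hypothetical names, and treating a zero instance count as the "null command" is my reading of the post, not something it spells out. Note that this needs GL 4.3 or the multi-draw-indirect extension, well beyond the 3.1 baseline in the question.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One entry of the indirect command buffer (five 32-bit fields, tightly packed).
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices to draw
    std::uint32_t instanceCount;  // 0 makes the command draw nothing
    std::uint32_t firstIndex;     // offset into the element buffer, in indices
    std::int32_t  baseVertex;     // value added to every index
    std::uint32_t baseInstance;   // first instance for per-instance attributes
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "unexpected padding");

// Hypothetical description of one group of widgets that share geometry.
struct WidgetClass {
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    std::uint32_t instanceCount;
};

// Builds one command per class; instances are laid out back to back.
std::vector<DrawElementsIndirectCommand> buildCommands(
        const std::vector<WidgetClass>& classes) {
    std::vector<DrawElementsIndirectCommand> commands;
    commands.reserve(classes.size());
    std::uint32_t nextInstance = 0;
    for (const WidgetClass& c : classes) {
        assert(nextInstance <= UINT32_MAX - c.instanceCount && "instance total overflows");
        commands.push_back({c.indexCount, c.instanceCount, c.firstIndex,
                            c.baseVertex, nextInstance});
        nextInstance += c.instanceCount;
    }
    return commands;
}
```

The resulting array would be uploaded to the buffer bound to GL_DRAW_INDIRECT_BUFFER and consumed by a single multi-draw call.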

  7. #7
    Junior Member Newbie
    Join Date
    Aug 2012
    Posts
    23
    Awesome, although I understand about half of what you guys wrote, you've given me something to think about. I also have read most of the pdf from codepilot - thanks a lot for that, it's packed full with information.

    "streaming VBO with 4 verts per quad"
    That is what I'm doing now. I am drawing a GL_TRIANGLE_STRIP with glDrawElements. I only have 4 indices, in the order 0, 1, 2, 3.
    Well, at least I'm doing part of it; I don't understand what you mean by streaming. Do I have as many VBOs as there are GUI elements and only update those that change their position/size?

    And can/should I try to put all of these VBOs in a VAO? Would it actually have any benefits? I've read this post http://www.opengl.org/discussion_boa...=1#post1183855
    Last edited by Inagawa; 08-29-2012 at 05:27 AM.

  8. #8
    Junior Member Newbie
    Join Date
    Jul 2008
    Posts
    16
    Say you are drawing 100 buttons, which are really just rectangles, I guess. You don't want 100 vertex buffers. It is so much easier to have 1 vertex buffer with room for 400 vertices. Simply transfer any changes to the buttons into that 1 buffer object using one of the six or so methods in the pdf, and draw again. I really like the pinned memory method because you can just write the changes to the buffer and the GPU sees them automatically.
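
A minimal sketch of the "1 buffer, 400 vertices" layout, assuming a made-up `Vertex` format and a hypothetical `QuadBatch` class: each button owns 4 consecutive vertices in a CPU-side array, and changing one button yields the byte range that would need to be re-uploaded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed vertex format: 2D position plus an RGBA color.
struct Vertex {
    float x, y;
    unsigned char r, g, b, a;
};

struct ByteRange { std::size_t offset, size; };

class QuadBatch {
public:
    explicit QuadBatch(std::size_t quadCount) : vertices_(quadCount * 4) {}

    // Rewrites the positions of one quad (order: bottom-left, bottom-right,
    // top-left, top-right) and returns the byte range that changed.
    ByteRange setRect(std::size_t quad, float x, float y, float w, float h) {
        assert(quad * 4 + 4 <= vertices_.size() && "quad index out of range");
        Vertex* v = &vertices_[quad * 4];
        v[0].x = x;     v[0].y = y;
        v[1].x = x + w; v[1].y = y;
        v[2].x = x;     v[2].y = y + h;
        v[3].x = x + w; v[3].y = y + h;
        return { quad * 4 * sizeof(Vertex), 4 * sizeof(Vertex) };
    }

    const Vertex* data() const { return vertices_.data(); }
    std::size_t byteSize() const { return vertices_.size() * sizeof(Vertex); }

private:
    std::vector<Vertex> vertices_;
};
```

With the buffer bound, the returned range maps directly onto glBufferSubData(GL_ARRAY_BUFFER, range.offset, range.size, pointer to the same offset in data()), or onto a write through a mapped or pinned pointer.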

  9. #9
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,726
    Originally posted by codepilot:
    I have had a little time to play with glMultiDrawElementsIndirect, and it works very well for simplifying draw commands. I can draw everything in the whole game in one call. Everything comes from buffers: an indirect command buffer, element buffers, vertex attrib buffers, and texture buffers. The buffers can be updated using the fastest method in that pdf I mentioned. I agree, stay away from the geometry shader; it gets wonky with serializing a vertex stream made of chunks of unknown size, and it really messes with performance. I think instancing would be best; remember to use glVertexAttribDivisor/glVertexBindingDivisor to get the one-attrib-per-instance setup to work. But with glMultiDrawElementsIndirect or glMultiDrawArraysIndirect you can group attribs by class, draw lots of disparate classes, and jump all over the place with instance counts and vertex and instance offsets, all in one call. You can even source and change the buffers on the GPU and draw with null commands in the list to completely remove the CPU from the loop.
    Does this actually improve performance in any noticeable way?

  10. #10
    Junior Member Newbie
    Join Date
    Jul 2008
    Posts
    16
    Well, I'm calling OpenGL from Node.js. If I called glDrawElementsIndirect a bunch of times, it would make a lot of trips from JavaScript land to C++ land, and that is really expensive in Node.js. Using just 1 call saves that overhead. I would be interested to see how much of a speedup there is in plain C code. I imagine the multi commands are faster, and couldn't be slower, but by how much, I don't know.
