Performance of VAOs and array buffers

I’m working on some code that creates Vertex Array Objects (VAOs) and the array buffers that go into them. I don’t have much experience in these areas yet. My code works, but I don’t have a clue which choices are best for performance. So basically I’m looking for some general guidelines for array buffers and VAOs that give good performance in most situations. Personal experiences are very welcome, and pointers to good online resources would be appreciated as well.

Questions that come to my mind when thinking about these things are:

  • What is considered a good size for an array buffer? I guess too small isn’t good, but too large might not be good either?

  • What is the difference between GL_DYNAMIC_DRAW and GL_STREAM_DRAW? Am I correct in thinking that dynamic draw is for data that changes every now and then but is used many times between changes, and that stream draw is for data that changes between almost every draw?

  • Is interleaved data for vertices, colors, normals and texture coordinates always better? What if they don’t have to be updated at the same time? For example, say that most of the time I only have to update the color. Would it still be better to interleave the color? Or would it be better to put the color data at the end of the array buffer? Or to use a second array buffer specifically for the color?

  • Which function should I use for updating data in array buffers? I understood that glBufferSubData should be considered even when changing all of the data. What about glMapBuffer? I found a blog on the internet that said glMapBuffer is more efficient when a buffer holds more than 32k data points (see this blog post from 2007). Is this still considered valid?

  • In old code I used VBOs. But I think these are completely replaced by VAOs in OpenGL 3.0/3.1 now? Or is there still a use for VBOs?

  • I guess I’ll mostly use a GL_ELEMENT_ARRAY_BUFFER as well, to draw using indices. Are there any particular situations in which this might be a bad choice? When drawing with indices, does it matter a lot whether I draw triangles or triangle strips?

Anything else I should know about VAOs or array buffers for gaining good performance? Perhaps good to mention that I’m targeting OpenGL 3.0 class hardware.

  1. Bigger is a bit better, so it’s better to pack smaller objects (rocks, bricks and so on) into a larger buffer, but you don’t have to overdo it.

  2. Yes, though they are only usage hints, so it may not matter which one you choose.

  3. I don’t think it matters, though interleaved data might be easier to load from a file.
    That said, if you plan on updating only one of the attributes, interleaving may be a bit harder to get right.

  4. Yes, it’s still valid, though it depends a bit on how you get the data. If you already have it ready to upload, use glBufferSubData since it’s easier; glMapBuffer is good for loading huge amounts of data directly from the HDD.

  5. Vertex array objects are just a way to abstract VBOs; they make it easier to bind VBOs and set them up for rendering.
    Visit http://www.flashbang.se/ for a more detailed explanation.
    For 3.0/3.1 in forward-compatible mode you do have to use glVertexAttribPointer instead of the old glVertexPointer, so it is pretty different, but it is still a VBO.

  6. I don’t know, I never use it.

  7. Use larger arrays if possible, and use texture arrays so you don’t have to rebind all the time; that lets you pack a lot of geometry, especially static geometry, into larger arrays.
    Macro-level culling on a large scale is good if you know how to do it. Ideally, if you have a street scene, each house is a single VBO that you cull when it’s not visible.
    VAOs tend to help as well. I seem to get at least a few extra FPS, though it was already running at 300 FPS; my guess is that if you have a lot of objects it would be a bonus to your project.
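To make the "pack small objects into a larger buffer" advice from answers 1 and 7 a bit more concrete, here is a minimal sketch of the bookkeeping it implies. It is plain C with no OpenGL calls, and the names `MeshRange`, `VertexPacker` and `packer_reserve` are made up for illustration; they are not part of any library. The idea is only that each small mesh remembers where it starts in the shared buffer and how many vertices it owns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one small mesh stored in a shared vertex buffer. */
typedef struct {
    int first;  /* index of the mesh's first vertex in the shared buffer */
    int count;  /* number of vertices that belong to the mesh */
} MeshRange;

/* Hypothetical packer: only tracks how much of the shared buffer is in use. */
typedef struct {
    int capacity;  /* total number of vertices the shared buffer can hold */
    int used;      /* number of vertices handed out so far */
} VertexPacker;

/* Reserve `count` vertices at the end of the used region.
   Returns 1 and fills `out` on success, 0 if the request does not fit. */
static int packer_reserve(VertexPacker *p, int count, MeshRange *out)
{
    assert(p != NULL && out != NULL);
    if (count < 0 || count > p->capacity - p->used)
        return 0;
    out->first = p->used;
    out->count = count;
    p->used += count;
    return 1;
}

/* With the shared buffer's VAO bound once, each mesh could then be drawn as
       glDrawArrays(GL_TRIANGLES, range.first, range.count);
   so that many small objects need no extra buffer or VAO binds. */
```

Whether this actually helps depends on the driver and the scene, so it is worth measuring before restructuring everything around it.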

Thank you for your answers. I guess most things are pretty obvious, but when I’m at the point of getting those last few frames out of my application, I should just start testing what works best.

One of your answers I find remarkable, though: do you really think it doesn’t matter whether you use interleaved arrays or not? My guess was that this could have a huge influence on the GPU cache hit/miss ratio, but it is a bit of a wild guess because I’m not really into these kinds of things.

Perhaps the data on the GPU is always stored in a certain format, no matter how you upload it (that would be strange as well… would only cause extra work for the driver, which should be done by the programmer I’d say).

I don’t think it’s stored in a specific way, probably just the way the programmer specifies it, but I do know that it’s not the main bottleneck for the GPU (fillrate is and should always be the bottleneck), so it really doesn’t matter, as there is always extra room for more polys.

From what I know about how modern GPU caches work, I suspect you’ll have fewer stalls with interleaved vertices. The vertices in VRAM have the same layout that your application specifies to OpenGL with those offset and stride arguments.
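As a rough illustration of those offset and stride arguments, the sketch below computes them for two of the layouts discussed in this thread: color interleaved into every vertex, and color stored as a separate block at the end of the same buffer. It is plain C without any OpenGL calls. The `Vertex` layout and all helper names are assumptions made up for this example, and the byte counts in the comments assume 4-byte floats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved vertex: everything, including color, per record. */
typedef struct {
    float position[3];
    float normal[3];
    float texcoord[2];
    unsigned char color[4];
} Vertex;

/* Hypothetical vertex for the second layout: color is stored elsewhere. */
typedef struct {
    float position[3];
    float normal[3];
    float texcoord[2];
} VertexNoColor;

/* Where one attribute lives: these two values correspond to the pointer
   (byte offset) and stride arguments of glVertexAttribPointer. */
typedef struct {
    size_t offset;  /* byte offset of the first element in the buffer */
    size_t stride;  /* bytes between consecutive elements */
} AttribLayout;

/* Layout A: color sits inside every interleaved vertex record. */
static AttribLayout interleaved_color_layout(void)
{
    AttribLayout l = { offsetof(Vertex, color), sizeof(Vertex) };
    return l;
}

/* Layout B: a tightly packed color block placed after `vertex_count`
   records that hold all the other attributes. */
static AttribLayout trailing_color_layout(size_t vertex_count)
{
    AttribLayout l = { vertex_count * sizeof(VertexNoColor),
                       sizeof(unsigned char[4]) };
    return l;
}

/* Number of bytes between the first and last color byte (inclusive) for
   `vertex_count` vertices, i.e. the range a full color update has to cover. */
static size_t color_update_span(AttribLayout l, size_t vertex_count)
{
    if (vertex_count == 0)
        return 0;
    return (vertex_count - 1) * l.stride + sizeof(unsigned char[4]);
}
```

With layout B, updating all colors covers one contiguous range that could be passed to a single glBufferSubData call using `offset` and the span as its offset and size. With layout A, the same update is spread across the whole buffer. Which layout is faster to draw from is a separate question that this sketch does not answer.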