About small-size VBOs

I am just wondering whether there will be a huge performance penalty for using lots of (for example, 10,000) small VBOs.

Thanks.

Maybe, but the thing is you can merge those VBOs into one or more large ones and then just render the sections you like.

I’d suggest the latter as well; I got a performance gain from this method. Allocate bigger VBOs and fill them with multiple objects. If you pad everything to the biggest vertex size, you can mix different vertex formats/sizes within the same VBO as well.

Just make sure you have some sort of mechanism that doesn’t call the glVertexPointer… functions again while the same VBO is still bound.
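The padding part looks roughly like this (just a sketch of how I read the idea; the helper name and MAX_VERTEX_SIZE are made up, and it assumes GL 1.5-style entry points):

```c
#include <stdlib.h>
#include <string.h>
#include <GL/gl.h>

#define MAX_VERTEX_SIZE 32  /* assumed: size in bytes of the biggest vertex format */

/* Copy one object's vertices into the shared VBO, padding each vertex up
   to MAX_VERTEX_SIZE so every object lands on the same stride. */
static void upload_padded(GLuint vbo, GLintptr firstVertex,
                          const void *verts, size_t vertexSize, size_t count)
{
    char *staging = malloc(count * MAX_VERTEX_SIZE);
    size_t i;
    for (i = 0; i < count; ++i)
        memcpy(staging + i * MAX_VERTEX_SIZE,
               (const char *)verts + i * vertexSize, vertexSize);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferSubData(GL_ARRAY_BUFFER,
                    firstVertex * MAX_VERTEX_SIZE,
                    (GLsizeiptr)(count * MAX_VERTEX_SIZE), staging);
    free(staging);
    /* at draw time, pass MAX_VERTEX_SIZE as the stride to the gl*Pointer calls */
}
```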

Thanks for your comments.

I agree that grouping small VBOs into a larger one will help. My only concern is how to decide the size of the large VBO, since the number of renderables may not be fixed. What’s confusing me is whether it’s possible to dynamically change the size of an existing VBO. I am not very clear on what GL_STATIC_DRAW_ARB/GL_DYNAMIC_DRAW_ARB mean. Do they mean just that the content of the VBO is static/dynamic, or can its size also be dynamic?

Thanks for any comments, and sorry for my possibly naive questions.

STATIC_DRAW and DYNAMIC_DRAW are hints to OpenGL about how frequently you will be altering the data in the buffer. STATIC_DRAW is supposedly for data that changes very infrequently (less often than once per frame), while DYNAMIC_DRAW is intended to better support multiple random updates per frame (several glBufferSubData calls or a mapped buffer). These hints may have some impact on what type of memory the buffer is stored in, but the OpenGL driver is free to ignore them and do its own thing. STATIC_DRAW is typically your best bet, even if you are updating the buffer, but it is worth trying STREAM_DRAW and DYNAMIC_DRAW if you think they apply to your situation.

There is no way that I am aware of to resize a buffer except to completely respecify it with a glBufferData call. You could try allocating a larger buffer in advance to limit the number of times you must manually resize it.
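As a rough sketch (the helper and its arguments are made up), a “resize” ends up being a respecify-and-refill:

```c
#include <GL/gl.h>

/* Grow "vbo" to newSize bytes; cpuCopy/usedBytes stand in for whatever
   shadow copy of the vertex data you keep around. */
static void grow_vbo(GLuint vbo, GLsizeiptr newSize,
                     const void *cpuCopy, GLsizeiptr usedBytes)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* Respecify: passing NULL allocates fresh storage of the new size
       without filling it... */
    glBufferData(GL_ARRAY_BUFFER, newSize, NULL, GL_STATIC_DRAW);
    /* ...so the old contents have to be uploaded again by hand. */
    glBufferSubData(GL_ARRAY_BUFFER, 0, usedBytes, cpuCopy);
}
```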

There is a size limit beyond which some OpenGL implementations will see a significant slowdown, so just be aware of that… Also, you might want to limit the size of each chunk allocated in your VBOs so that you can take advantage of unsigned short indices.

I understand better now :) Thanks, AlexN and all the other gurus.

Originally posted by CrazyButcher:
Just make sure you have some sort of mechanism that doesn’t call the glVertexPointer… functions again while the same VBO is still bound.
Well no, you’ll still need to call glVertexPointer per object even if they do share the same VBO, unless you’re using the index buffer to decide where objects are in a VBO… which means you’ll always be using 32-bit indices.
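In other words, something like this per object (just a sketch; the struct is made up, and it assumes position-only vertices and a bound element array buffer):

```c
#include <GL/gl.h>

typedef struct {                /* made-up bookkeeping, one per object */
    GLintptr vertexByteOffset;  /* where its vertices start in the VBO */
    GLintptr indexByteOffset;   /* where its indices start in the index buffer */
    GLsizei  indexCount;
} Object;

/* The shared VBO and index buffer stay bound; only the pointer offset
   changes per object, so each object's indices start at 0 and fit in
   unsigned shorts. Assumes glEnableClientState(GL_VERTEX_ARRAY) is set. */
static void draw_object(const Object *obj)
{
    glVertexPointer(3, GL_FLOAT, 0, (const void *)obj->vertexByteOffset);
    glDrawElements(GL_TRIANGLES, obj->indexCount, GL_UNSIGNED_SHORT,
                   (const void *)obj->indexByteOffset);
}
```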

I use shorts, and it works well so far, but I don’t have tons of vertices. Once I go beyond the short limit, a new chunk is used.

Originally posted by knackered:
unless you’re using the index buffer to decide where objects are in a VBO… which means you’ll always be using 32-bit indices.
Huh? Why should it not be possible to use 16-bit indices?

What about internal (driver-level) resource management? Suppose we have many objects and they share the same vertex buffer. If some objects currently don’t need to be rendered (say, they are invisible), the driver could free some memory for other purposes. But it can’t, because the buffer is bound again each frame. Direct3D lets us use “managed” resources if we expect such behavior, but a GL driver must do this automatically (I suppose all GL resources are managed, because there is no “lost device” situation).

Originally posted by holdeWaldfee:
[quote]Originally posted by knackered:
unless you’re using the index buffer to decide where objects are in a VBO… which means you’ll always be using 32-bit indices.
Huh? Why should it not be possible to use 16-bit indices?
[/QUOTE]You have multiple objects sharing the same VBO, and you want to avoid calling glVertexPointer to set the current offset, so each object’s indices effectively have its VBO offset pre-baked into them. Now, unless we’re talking about very simple geometry for each object (simple for the current generation of cards, that is), you’re going to have to use 32-bit indices to address vertices above 65535. My scenes typically have 1 to 1.5 million triangles (let alone vertices) in them; what about yours?
Of course, the sensible approach is to bucket your geometry and call glVertexPointer for every bucket, but that’s not always practical.
A few years ago I did ask for a glDrawElements offset parameter to be introduced as an extension (the vertex fetch unit would add the offset to each index as it processed it), but this was obviously ignored because people have workarounds.
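For concreteness, the pre-baked scheme looks roughly like this (a sketch; the struct is made up):

```c
#include <GL/gl.h>

typedef struct {               /* made-up bookkeeping, one per object */
    GLsizei  indexCount;
    GLintptr indexByteOffset;  /* offset into the bound index buffer */
} BakedObject;

/* The indices already include each object's position in the big VBO, so
   glVertexPointer is set only once; the price is GL_UNSIGNED_INT indices
   as soon as the VBO holds more than 65536 vertices. */
static void draw_all_baked(const BakedObject *objs, int count)
{
    int i;
    glVertexPointer(3, GL_FLOAT, 0, (const void *)0);  /* base of the VBO */
    for (i = 0; i < count; ++i)
        glDrawElements(GL_TRIANGLES, objs[i].indexCount, GL_UNSIGNED_INT,
                       (const void *)objs[i].indexByteOffset);
}
```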

Originally posted by Nikolai Timofeew:
What about internal (driver-level) resource management? Suppose we have many objects and they share the same vertex buffer. If some objects currently don’t need to be rendered (say, they are invisible), the driver could free some memory for other purposes. But it can’t, because the buffer is bound again each frame. Direct3D lets us use “managed” resources if we expect such behavior, but a GL driver must do this automatically (I suppose all GL resources are managed, because there is no “lost device” situation).
glDrawRangeElements tells the driver which areas of the VBO are being accessed, so there are opportunities for the driver to page out unused regions of the buffer based on the previous frame’s overall min/max elements.
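For reference, a call looks something like this (values are arbitrary; the extra start/end parameters are the min/max vertex index the indices will touch):

```c
#include <GL/gl.h>

/* glDrawRangeElements is glDrawElements plus a promise about which
   vertices the indices use; that range is the hint the driver gets. */
static void draw_slice(void)
{
    GLuint  first = 1000, last = 1999;  /* min/max vertex index used */
    GLsizei indexCount = 600;
    glDrawRangeElements(GL_TRIANGLES, first, last, indexCount,
                        GL_UNSIGNED_SHORT,
                        (const void *)0 /* offset into bound index buffer */);
}
```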

DrawRangeElements in combination with VBOs was slower for me; I think this is also mentioned in the NVIDIA PDF about VBOs.
And you are right, I am not pushing millions of vertices, so this approach works for me.

Actually, I use large VBOs and put many objects into them, and I still use 16-bit indices. At the right places I need to call the gl*Pointer functions. Also, I offset the indices myself: every 65,000 vertices the indices start at 0 again, and I point the gl*Pointer functions at the new starting point.
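Roughly like this (just a sketch; the struct and sizes are made up):

```c
#include <GL/gl.h>

#define CHUNK_VERTS  65000                  /* stay under the ushort limit */
#define VERTEX_BYTES (3 * sizeof(GLfloat))  /* assumed vertex size */

typedef struct {               /* made-up bookkeeping, one per chunk */
    GLsizei  indexCount;
    GLintptr indexByteOffset;  /* offset into the bound index buffer */
} Chunk;

/* Indices restart at 0 in every chunk, so they stay 16-bit; the
   gl*Pointer call is re-pointed at each chunk's first vertex. */
static void draw_chunks(const Chunk *chunks, int count)
{
    int i;
    for (i = 0; i < count; ++i) {
        glVertexPointer(3, GL_FLOAT, 0,
                        (const void *)(size_t)(i * CHUNK_VERTS * VERTEX_BYTES));
        glDrawElements(GL_TRIANGLES, chunks[i].indexCount,
                       GL_UNSIGNED_SHORT,
                       (const void *)chunks[i].indexByteOffset);
    }
}
```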

CrazyButcher, there is a maximum number of indices that should be sent with DrawRangeElements. Maybe you went above this?

This is my point: VBOs should not be used like this, it’s mental, but you are forced into these schemes by the cost of calling glVertexPointer. For me, VBO should be an abstraction over a memory allocator, where calls to create a buffer are analogous to malloc; instead, most people write their own allocators on top of VBO, which seems just plain daft. The only time I write my own allocator in C is when I’m writing a small-object allocator, where the size of each allocation is known to me, so I can get a jump on the standard functions. With VBO, the driver is best placed to know how to organise the memory based on hints. That way it can page out buffers that aren’t in use whenever it wants.
It should not be an expensive operation to switch from one VBO to another. It just shouldn’t.

Originally posted by knackered:
You have multiple objects sharing the same VBO, and you want to avoid calling glVertexPointer to set the current offset, so each object’s indices effectively have its VBO offset pre-baked into them. Now, unless we’re talking about very simple geometry for each object (simple for the current generation of cards, that is), you’re going to have to use 32-bit indices to address vertices above 65535. My scenes typically have 1 to 1.5 million triangles (let alone vertices) in them; what about yours?
Of course, the sensible approach is to bucket your geometry and call glVertexPointer for every bucket, but that’s not always practical.

Is there really a meaningful difference between setting a new offset and switching the VBO?
16-bit indices are faster than 32-bit ones.
And when you batch per object type, you won’t get that many VBO switches anyway.

A few years ago I did ask for a glDrawElements offset parameter to be introduced as an extension (the vertex fetch unit would add the offset to each index as it processed it), but this was obviously ignored because people have workarounds.
What kind of workarounds?
How can you switch index offsets within one draw batch?

Well, surely there’s a significant difference between changing the ‘vertex declaration’ and changing the buffer origin (when I say vertex declaration, I mean the positions within a vertex of all the attributes; it’s D3D terminology). Surely the hardware can switch the buffer origin quicker than setting up its DMA transfers to take data from different address offsets.

By workarounds I mean the stuff vman mentioned: manually adding the offset to all the indices.

glDrawRangeElements tells the driver which areas of the VBO are being accessed, so there are opportunities for the driver to page out unused regions of the buffer based on the previous frame’s overall min/max elements.
That’s a bit backwards.

glDrawRangeElements allows the driver to specifically upload the used portion of the range. If it’s already been uploaded, there’d be no point in paging it out, because the driver has no clue as to whether or not you’re going to follow your current glDRE call with a second glDRE call to another part of that/those buffer(s).

And that’s assuming the driver actually treats glDRE differently from a regular glDrawElements.

Is there really a meaningful difference between setting a new offset and switching the VBO?
Yes. A VBO switch (and use) can provoke an upload of the vertex data, while a gl*Pointer switch usually does not (if you use glDRE exclusively, you could provoke one if the driver worked that way).

Surely the hardware can switch the buffer origin quicker than setting up its DMA transfers to take data from different address offsets.
Why? That makes no sense; you’re still going to have to call gl*Pointer again when you call up your new buffer object, so that the gl*Pointer calls will be bound to the correct buffer. So in one case you’re just making some gl*Pointer calls, while in the other case you’re making a buffer change as well as gl*Pointer calls.

All that being said, I don’t think there’s much to be gained from using a VBO as, basically, a heap and then providing your own memory allocator on top of it. Drivers are generally optimized around reasonably sized buffer objects, one or so per mesh.

Indeed, the “heap” approach is just going to keep the driver from properly optimizing your buffer data. It’s going to force the driver to keep a big portion of vertex data resident in video memory, thus keeping you from having more room for textures/shaders.

It is also discouraged as a practice by both ATi and nVidia.

Now, if you have a bunch of really small objects (say, 250 verts apiece), it might be prudent to concatenate them into groups of buffer objects.

Originally posted by Korval:
That’s a bit backwards.
glDrawRangeElements allows the driver to specifically upload the used portion of the range. If it’s already been uploaded, there’d be no point in paging it out, because the driver has no clue as to whether or not you’re going to follow your current glDRE call with a second glDRE call to another part of that/those buffer(s).
And that’s assuming the driver actually treats glDRE differently from a regular glDrawElements.

I meant (and said) that the driver could keep a running min/max for the whole frame, and based on a number of frames it could, like an unused texture, page that portion of the VBO back out if need be… It’s just one random way the range information could be used, not necessarily should be.

Why? That makes no sense; you’re still going to have to call gl*Pointer again when you call up your new buffer object, so that the gl*Pointer calls will be bound to the correct buffer. So in one case you’re just making some gl*Pointer calls, while in the other case you’re making a buffer change as well as gl*Pointer calls.
I’m not sure what you’re talking about or how it’s relevant to what I said… My main thrust was that if (as is documented) it’s an expensive operation to call glVertexPointer(offset) for whatever reason, then an alternative would be to introduce an offset that is always added to fetched indices by the card.

My main thrust was that if (as is documented) it’s an expensive operation to call glVertexPointer(offset) for whatever reason, then an alternative would be to introduce an offset that is always added to fetched indices by the card.
They’re expensive because it is the gl*Pointer calls that fundamentally bind a buffer object to the renderer. Calling glBindBuffer alone doesn’t make the connection between a buffer object and the renderer.

An offset would not change this.