About small-size VBOs



opengl_fan
08-11-2006, 04:43 PM
I am just wondering if there will be a huge performance penalty for using lots of small-size VBOs (for example, 10000).

Thanks.

zeoverlord
08-11-2006, 06:25 PM
Maybe, but the thing is you can merge those VBOs into one or more large ones and then just render the sections you like.
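For example, roughly like this. This is just a sketch, not a drop-in implementation: it assumes a GL 1.5 context (or the ARB-suffixed equivalents) with the entry points loaded, e.g. via GLEW, uses position-only vertices for brevity, and the struct names are made up.

```cpp
// Sketch: pack two meshes into one shared VBO/IBO, then draw each one as a
// "section" of the big buffers. Indices stay relative to each mesh's first
// vertex, so 16-bit indices are still enough per mesh.
#include <GL/glew.h>
#include <cstddef>

struct Mesh {
    const GLfloat*  verts;       // xyz per vertex
    const GLushort* indices;     // relative to this mesh's first vertex
    GLsizei         vertCount;
    GLsizei         indexCount;
};

struct PackedMeshes {
    GLuint  vbo, ibo;
    GLsizei vertStartB, indexStartB;   // where mesh B begins (in elements)
};

PackedMeshes packTwoMeshes(const Mesh& a, const Mesh& b)
{
    PackedMeshes p;

    glGenBuffers(1, &p.vbo);
    glBindBuffer(GL_ARRAY_BUFFER, p.vbo);
    glBufferData(GL_ARRAY_BUFFER,
                 (a.vertCount + b.vertCount) * 3 * sizeof(GLfloat),
                 0, GL_STATIC_DRAW);                         // allocate only
    glBufferSubData(GL_ARRAY_BUFFER, 0,
                    a.vertCount * 3 * sizeof(GLfloat), a.verts);
    glBufferSubData(GL_ARRAY_BUFFER, a.vertCount * 3 * sizeof(GLfloat),
                    b.vertCount * 3 * sizeof(GLfloat), b.verts);

    glGenBuffers(1, &p.ibo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, p.ibo);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER,
                 (a.indexCount + b.indexCount) * sizeof(GLushort),
                 0, GL_STATIC_DRAW);
    glBufferSubData(GL_ELEMENT_ARRAY_BUFFER, 0,
                    a.indexCount * sizeof(GLushort), a.indices);
    glBufferSubData(GL_ELEMENT_ARRAY_BUFFER, a.indexCount * sizeof(GLushort),
                    b.indexCount * sizeof(GLushort), b.indices);

    p.vertStartB  = a.vertCount;
    p.indexStartB = a.indexCount;
    return p;
}

// Draw one section: point the vertex array at the section's first vertex and
// source the indices from the right part of the shared index buffer.
void drawSection(const PackedMeshes& p, GLsizei firstVert,
                 GLsizei firstIndex, GLsizei indexCount)
{
    glBindBuffer(GL_ARRAY_BUFFER, p.vbo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, p.ibo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0,
                    (const GLvoid*)(firstVert * 3 * sizeof(GLfloat)));
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT,
                   (const GLvoid*)(firstIndex * sizeof(GLushort)));
}
```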

CrazyButcher
08-12-2006, 04:38 AM
I'd suggest the latter (merging into larger VBOs) as well; I got a performance gain from doing it that way. Allocate bigger VBOs and fill them with multiple objects. If you pad to the biggest vertex size, you can mix different vertex formats/sizes within the same VBO as well.

Just make sure you have some sort of mechanism that doesn't call the glVertexPointer... functions again while the same VBO is still bound with the same offsets.
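For example, a trivial guard along these lines (just a sketch; the cached globals and the function name are made up):

```cpp
// Sketch of a redundancy guard: only rebind / re-issue the pointer setup
// when the VBO, the byte offset, or the stride actually changes.
#include <GL/glew.h>

static GLuint   s_vbo    = 0;
static GLintptr s_offset = -1;
static GLsizei  s_stride = -1;

void setPositionSource(GLuint vbo, GLintptr byteOffset, GLsizei stride)
{
    if (vbo == s_vbo && byteOffset == s_offset && stride == s_stride)
        return;                              // same state, skip the GL calls

    if (vbo != s_vbo)
        glBindBuffer(GL_ARRAY_BUFFER, vbo);

    glVertexPointer(3, GL_FLOAT, stride, (const GLvoid*)byteOffset);

    s_vbo    = vbo;
    s_offset = byteOffset;
    s_stride = stride;
}
```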

opengl_fan
08-12-2006, 11:18 AM
Thanks for your comments.

I agree that grouping small VBOs into a larger one will help. My only concern is how to decide the size of the large VBO, since the number of renderables may not be fixed. What confuses me is whether it's possible to dynamically change the size of an existing VBO. I am also not very clear about what GL_STATIC_DRAW_ARB/GL_DYNAMIC_DRAW_ARB mean. Do they mean only that the content of the VBO is static/dynamic, or can its size also be dynamic?

Thanks for any comments, and sorry if my questions are naive.

AlexN
08-12-2006, 11:57 AM
STATIC_DRAW and DYNAMIC_DRAW are hints to OpenGL about how frequently you will be altering the data in the buffer. STATIC_DRAW is supposedly for data that changes very infrequently (less often than once per frame), while DYNAMIC_DRAW is intended to better support multiple random updates per frame (several glBufferSubData calls or a mapped buffer). These hints may have some impact on what type of memory the buffer is stored in, but the OpenGL driver is free to ignore them and do its own thing. STATIC_DRAW is typically your best bet, even if you are updating the buffer, but it is worth checking STREAM_DRAW and DYNAMIC_DRAW if you think they apply to your situation.

There is no way that I am aware of to resize a buffer except to completely respecify the buffer with a glBufferData call. You could try allocating a larger buffer in advance to limit the number of times you must manually resize it.

There is a size limit beyond which some OpenGL implementations will see a significant slowdown, so just be aware of that... Also, you might want to limit the size of each chunk allocated in your VBOs so that you can take advantage of unsigned short indices.
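For example, a sketch of the over-allocate-and-respecify idea (the growth policy and names are arbitrary; assumes GL 1.5 entry points, e.g. via GLEW):

```cpp
// Sketch: a VBO cannot be resized in place, so "growing" it means calling
// glBufferData again with a larger size (the old contents are lost and must
// be re-uploaded). Over-allocating keeps this rare. The usage hint only
// describes intent; the driver may ignore it.
#include <GL/glew.h>

static GLuint  g_vbo     = 0;
static GLsizei g_vboSize = 0;          // bytes currently allocated

void ensureVboCapacity(GLsizei neededBytes)
{
    if (g_vbo == 0)
        glGenBuffers(1, &g_vbo);

    if (neededBytes <= g_vboSize)
        return;                        // already big enough

    GLsizei newSize = g_vboSize > 0 ? g_vboSize : 64 * 1024;
    while (newSize < neededBytes)      // double until it fits
        newSize *= 2;

    glBindBuffer(GL_ARRAY_BUFFER, g_vbo);
    glBufferData(GL_ARRAY_BUFFER, newSize, 0, GL_STATIC_DRAW);
    g_vboSize = newSize;               // caller re-uploads with glBufferSubData
}
```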

opengl_fan
08-12-2006, 03:07 PM
I understand better now :) Thanks, AlexN and all other gurus.

knackered
08-12-2006, 05:12 PM
Originally posted by CrazyButcher:
Just make sure you have some sort of mechanism that doesn't call the glVertexPointer... functions again while the same VBO is still bound with the same offsets.
Well no, you'll still need to call glVertexPointer per object even if they do share the same VBO, unless you're using the index buffer itself to decide where objects sit in the VBO... which means you'll always be using 32-bit indices.

CrazyButcher
08-13-2006, 03:50 AM
I use shorts, and it works well so far, but I don't have tons of vertices. Once beyond the short limit, a new chunk is used.

holdeWaldfee
08-13-2006, 06:06 AM
Originally posted by knackered:
unless you're using the index buffer itself to decide where objects sit in the VBO... which means you'll always be using 32-bit indices.
Huh? Why should it not be possible to use 16-bit indices?

Nikolai Timofeev
08-13-2006, 11:14 PM
What about internal (driver-level) resource management? Suppose we have many objects and they all share the same vertex buffer. If some objects currently don't need to be rendered (say, they are invisible), the driver could free some memory for other purposes. But it can't, because the buffer gets bound again every frame. Direct3D lets us use "managed" resources when we expect such behavior, but a GL driver must do it automatically (I suppose all GL resources are managed, since there is no "lost device" situation).

knackered
08-14-2006, 03:21 AM
Originally posted by holdeWaldfee:

Originally posted by knackered:
unless you're using the index buffer itself to decide where objects sit in the VBO... which means you'll always be using 32-bit indices.
Huh? Why should it not be possible to use 16-bit indices?
You have multiple objects sharing the same VBO, and you want to avoid calling glVertexPointer to set the current offset, so each object's indices effectively have the VBO offset pre-baked into them. Now, unless we're talking about very simple geometry for each object (simple for the current generation of cards), you're going to have to use 32-bit indices to address vertices above 65535. My scenes typically have 1 to 1.5 million triangles (let alone vertices) in them; what about yours?
Of course, the sensible approach is to bucket your geometry and call glVertexPointer for every bucket, but that's not always practical.
A few years ago I did ask for a glDrawElements offset parameter to be introduced as an extension (the vertex fetch unit would add the offset to each index as it processed it), but this was obviously ignored because people have workarounds.

knackered
08-14-2006, 03:29 AM
Originally posted by Nikolai Timofeev:
What about internal (driver-level) resource management? Suppose we have many objects and they all share the same vertex buffer. If some objects currently don't need to be rendered (say, they are invisible), the driver could free some memory for other purposes. But it can't, because the buffer gets bound again every frame. Direct3D lets us use "managed" resources when we expect such behavior, but a GL driver must do it automatically (I suppose all GL resources are managed, since there is no "lost device" situation).
glDrawRangeElements tells the driver which areas of the VBO are being accessed, so there are opportunities for the driver to page out unused regions of the buffer based on the previous frame's overall min/max elements.
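For reference, the call just adds an explicit vertex range to an ordinary indexed draw; something like this (a sketch, parameter names made up):

```cpp
// glDrawRangeElements is glDrawElements plus an explicit [start, end] hint
// telling the driver which vertices in the bound VBO this call can touch.
#include <GL/glew.h>

void drawObjectRange(GLsizei indexCount, GLintptr indexByteOffset,
                     GLuint minVertexUsed, GLuint maxVertexUsed)
{
    glDrawRangeElements(GL_TRIANGLES,
                        minVertexUsed, maxVertexUsed,     // the range hint
                        indexCount, GL_UNSIGNED_SHORT,
                        (const GLvoid*)indexByteOffset);
}
```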

CrazyButcher
08-14-2006, 05:18 AM
glDrawRangeElements in combination with VBOs was slower for me; I think this is also mentioned in the NVIDIA PDF about VBOs.
And you are right, I am not pushing millions of vertices, so this approach works for me.

V-man
08-14-2006, 05:50 AM
Actually, I use large VBOs, put many objects into them, and still use 16-bit indices. At the right places, I need to call the gl*Pointer functions. Also, I offset the indices myself: every 65000 vertices or so, the indices start at 0 again and I point the gl*Pointer functions at the new starting offset.
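Roughly, the packing bookkeeping looks like this (just a sketch; the real thing tracks more attributes than positions):

```cpp
// Sketch of the packing bookkeeping: vertices go into one big array in
// chunks of at most 65536; each object's indices are rebased so they are
// relative to its chunk and therefore always fit in an unsigned short.
#include <cstddef>
#include <vector>

struct PackedObject {
    std::size_t chunkFirstVertex;   // vertex where this object's chunk begins
    std::size_t firstIndex;         // where its indices start in the big array
    std::size_t indexCount;
};

struct Packer {
    std::vector<float>          vertices;   // xyz triples, one big array
    std::vector<unsigned short> indices;    // 16-bit, chunk-relative
    std::size_t                 chunkStart; // first vertex of current chunk

    Packer() : chunkStart(0) {}

    PackedObject add(const std::vector<float>&    verts,      // xyz triples
                     const std::vector<unsigned>& objIndices) // 0-based
    {
        const std::size_t vertCount = verts.size() / 3;
        const std::size_t inChunk   = vertices.size() / 3 - chunkStart;
        if (inChunk + vertCount > 65536)          // wouldn't fit in shorts,
            chunkStart = vertices.size() / 3;     // so start a new chunk

        PackedObject out;
        out.chunkFirstVertex = chunkStart;
        out.firstIndex       = indices.size();
        out.indexCount       = objIndices.size();

        const std::size_t base = vertices.size() / 3 - chunkStart;
        vertices.insert(vertices.end(), verts.begin(), verts.end());
        for (std::size_t i = 0; i < objIndices.size(); ++i)
            indices.push_back((unsigned short)(base + objIndices[i]));
        return out;
    }
};
// At draw time, glVertexPointer is pointed at chunkFirstVertex (times the
// vertex size in bytes) and the 16-bit indices are used unchanged.
```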

CrazyButcher, there is a maximum number of indices that should be sent with glDrawRangeElements. Maybe you went above it?

knackered
08-14-2006, 07:50 AM
This is my point, VBO should not be used like this, it's mental - but you are forced into these schemes by the cost of calling glvertexpointer. For me, VBO should be an abstraction of a memory allocator, where calls to create are analogous to malloc - whereas most people write their own allocators on top of vbo, which seems just plain daft. The only time I write my own allocators in C is when I'm writing a small object allocator, where the size of each allocation is known to me, so I can get a jump on the standard functions. With VBO, the driver is best placed to know how to organise the memory based on hints. That way it can page out buffers that aren't in use whenever it wants.
It should not be an expensive operation to switch from one VBO to another. It just shouldn't.

holdeWaldfee
08-14-2006, 08:11 AM
Originally posted by knackered:
You have multiple objects sharing the same VBO, and you want to avoid calling glVertexPointer to set the current offset, so each object's indices effectively have the VBO offset pre-baked into them. Now, unless we're talking about very simple geometry for each object (simple for the current generation of cards), you're going to have to use 32-bit indices to address vertices above 65535. My scenes typically have 1 to 1.5 million triangles (let alone vertices) in them; what about yours?
Of course, the sensible approach is to bucket your geometry and call glVertexPointer for every bucket, but that's not always practical.
Is there really a meaningful difference between setting a new offset and switching the VBO?
16-bit indices are faster than 32-bit ones.
And when you batch per object type, you won't get that many VBO switches anyway.


A few years ago I did ask for a glDrawElements offset parameter to be introduced as an extension (the vertex fetch unit would add the offset to each index as it processed it), but this was obviously ignored because people have workarounds.
What kind of workarounds?
How can you switch index offsets within one draw batch?

knackered
08-14-2006, 09:05 AM
Well, surely there's a significant difference between changing the 'vertex declaration' and changing the buffer origin (when I say vertex declaration, I mean the positions within a vertex of all the attributes; it's D3D terminology). Surely the hardware can switch the buffer origin quicker than setting up its DMA stuff to take data from different address offsets.

By workarounds I mean the stuff V-man mentioned: manually adding the offset to all the indices.

Korval
08-14-2006, 01:44 PM
glDrawRangeElements tells the driver which areas of the VBO are being accessed, so there are opportunities for the driver to page out unused regions of the buffer based on the previous frame's overall min/max elements.
That's a bit backwards.

glDrawRangeElements allows the driver to specifically upload the used portion of the range. If it's already been uploaded, there'd be no point in paging it out, because the driver has no clue as to whether or not you're going to follow your current glDRE call with a second glDRE call to another part of that/those buffer(s).

And that's assuming the driver actually treats glDRE differently from a regular glDrawElements.


Is there really a meaningful difference between setting a new offset and switching the VBO?
Yes. A VBO switch (and use) can provoke an upload of the vertex data, while a gl*Pointer switch does not. Usually, anyway (if you used glDRE exclusively, you could provoke one if the driver worked that way).


Surely the hardware can switch the buffer origin quicker than setting up its DMA stuff to take data from different address offsets.
Why? That makes no sense; you're still going to have to call gl*Pointer again when you call up your new buffer object, so that the gl*Pointer calls will be bound to the correct buffer. So, in one case, you're just making some gl*Pointer calls, while in the other case, you're making a buffer change as well as the gl*Pointer calls.

All that being said, I don't think there's much to be gained from using VBO as, basically, a heap and then providing your own memory allocator on top of it. Drivers are generally optimized around having more reasonably sized buffer objects, one or so per mesh.

Indeed, the "heap" approach is just going to keep the driver from properly optimizing your buffer data. It's going to force the driver to keep a big portion of vertex data resident in video memory, thus keeping you from having more room for textures/shaders.

It is also discouraged as a practice by both ATI and NVIDIA.

Now, if you have a bunch of really small objects (say, 250 verts a piece), it might be prudent to concatenate them into groups of buffer objects.

knackered
08-14-2006, 03:58 PM
Originally posted by Korval:
That's a bit backwards.
glDrawRangeElements allows the driver to specifically upload the used portion of the range. If it's already been uploaded, there'd be no point in paging it out, because the driver has no clue as to whether or not you're going to follow your current glDRE call with a second glDRE call to another part of that/those buffer(s).
And that's assuming the driver actually treats glDRE differently from a regular glDrawElements.
I meant (and said) that the driver could keep a running min/max for the whole frame, and based on a number of frames it could, like an unused texture, page that portion of the VBO back out if need be... It's just one random way the range information *could* be used, not necessarily *should* be.


Why? That makes no sense; you're still going to have to call gl*Pointer again when you call up your new buffer object, so that the gl*Pointer calls will be bound to the correct buffer. So, in one case, you're just making some gl*Pointer calls, while in the other case, you're making a buffer change as well as the gl*Pointer calls.
I'm not sure what you're talking about or how it's relevant to what I said... My main thrust was that if (as is documented) it's an expensive operation to call glVertexPointer(offset) for whatever reason, then an alternative would be to introduce an offset which is always added to fetched indices by the card.

Korval
08-14-2006, 08:23 PM
My main thrust was that if (as is documented) it's an expensive operation to call glVertexPointer(offset) for whatever reason, then an alternative would be to introduce an offset which is always added to fetched indices by the card.
They're expensive because it is the gl*Pointer calls that fundamentally bind a buffer object to the renderer. Calling glBindBuffer alone doesn't make the connection between a buffer object and the renderer.

An offset would not change this.

knackered
08-15-2006, 03:21 AM
Yes, I know that nothing heavy happens until you call glVertexPointer.
"An offset would not change this."
Yes it would. With an offset, you could bind a single VBO, set the buffer offsets using calls to gl*Pointer, and then for each object you render you would just call glElementOffset(m_objectIndexOrigin). No more buffer setup is needed. But as I said, this won't be introduced, because the offset can be pre-baked into the indices, at the expense of making them 32-bit rather than 16-bit. It's very rare for a single draw call to exceed 65535 vertices, but common for a whole scene.
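In other words, rendering would look something like this. To be clear, glElementOffset is the hypothetical call being asked for here, it does not exist in GL, and Object/sharedVbo etc. are made-up names:

```cpp
// Hypothetical usage: glElementOffset() is NOT a real GL entry point, it is
// the extension being proposed. Everything else is ordinary GL 1.5.
glBindBuffer(GL_ARRAY_BUFFER, sharedVbo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, sharedIbo);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, 0);                  // set up once

for (size_t i = 0; i < objects.size(); ++i)
{
    const Object& obj = objects[i];
    glElementOffset(obj.indexOrigin);                // hypothetical: added to
                                                     // every fetched index
    glDrawElements(GL_TRIANGLES, obj.indexCount, GL_UNSIGNED_SHORT,
                   (const GLvoid*)obj.firstIndexByte);
}
```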

Jan
08-15-2006, 03:46 AM
I have the same issue. I was doing terrain rendering a while ago and I subdivided the terrain into patches, which would use different levels of detail. Pretty much the same way Far Cry does it.

The thing is, I precalculated and stored the relative indices needed to render each patch at a given LOD.

I COULD have stored these in an index buffer and rendered each patch with 2 lines of code, if there were such functionality to set an index offset. But since there isn't, for each patch I need to take the offset myself, add it to each index in the precalculated relative index array, thus generating a new buffer, this time with 32-bit indices, that I then need to send over the bus to the GPU.

If there were functionality to set an index offset, I could have stored all the data on the GPU. It would reduce memory footprint, CPU cycles and bus bandwidth.
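Concretely, what has to happen per patch right now is something like this (a rough sketch; names are made up):

```cpp
// Current workaround: widen the precalculated 16-bit relative indices to
// 32 bits on the CPU, adding the patch's vertex offset, and submit them
// from system memory every time the patch is drawn.
#include <GL/glew.h>
#include <cstddef>
#include <vector>

void drawPatch(const std::vector<unsigned short>& lodIndices,  // relative
               unsigned patchVertexOffset)          // patch start in the VBO
{
    if (lodIndices.empty())
        return;

    std::vector<GLuint> absolute(lodIndices.size());
    for (std::size_t i = 0; i < lodIndices.size(); ++i)
        absolute[i] = patchVertexOffset + lodIndices[i];   // CPU cost, 32 bit

    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);  // indices come from client memory
    glDrawElements(GL_TRIANGLES, (GLsizei)absolute.size(),
                   GL_UNSIGNED_INT, &absolute[0]);
}
```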

Jan.

knackered
08-15-2006, 04:10 AM
Cool, another good use of an index offset.
Just did a search for the discussion we had when I first suggested it, can you believe it was 2 years ago? Still nothing added to the API.
http://www.opengl.org/discussion_boards/ubb/ultimatebb.php?ubb=get_topic;f=3;t=012219

Korval
08-15-2006, 12:10 PM
No more buffer setup is needed.
And is basic buffer setup (i.e. the buffer object(s) are not changing, so they're already loaded, and you're just dropping a few tokens into the bitstream) an actual performance issue?


If there were functionality to set an index offset, I could have stored all the data on the GPU.
Unless, of course, the GPU didn't support it, in which case it'd just be the API making the gl*Pointer calls for you.

There's no guaranteed performance benefit from having this offset. It may be just as heavyweight as the gl*Pointer calls, in which case you have gained nothing.

knackered
08-15-2006, 01:20 PM
But you could gain a lot. If the GL driver were a simple GL-to-D3D wrapper (just *if*), it would be searching for a matching vertex declaration, sending the vertex-declaration change in the bitstream, configuring the streaming stuff, etc., even though the application *knows* it's the same vertex format, just a different area of the vertex buffer.

I don't get it, Korval; you were in favour of this when it was last discussed. What made you change your mind?
We know this offset is already in hardware because of the existence of the parameter in d3d->DrawIndexedPrimitive().

Korval
08-15-2006, 03:27 PM
I don't get it, Korval; you were in favour of this when it was last discussed. What made you change your mind?
I'm not against it; what I'm against is getting your hopes up that this is going to be a big performance win.

Maybe it will be, maybe it won't. There's no way to know (which I said on the original thread), and the people who do know won't say anything about it.

If it's there, I'll use it. But if it's not, I won't complain about its absence.


We know this offset is already in hardware because of the existence of the parameter in d3d->DrawIndexedPrimitive().
A fair point.

Komat
08-15-2006, 03:38 PM
Originally posted by knackered:

We know this offset is already in hardware because of the existence of the parameter in d3d->DrawIndexedPrimitive().
The existence of that parameter does not mean that the hw must have explicit support for it. In the case of an indexing offset, that feature can easily be emulated by the driver. All the driver has to do is advance the address it gives to the hw as the buffer beginning by offset * stride. Since many things that might need to be validated were already validated when the buffer was initially selected, a change of offset might be significantly cheaper than a full bind.

Komat
08-15-2006, 03:42 PM
Originally posted by opengl_fan:

I am just wondering if it's possible to use the function "glVertexAttribPointer" to set vertex data for conventional attributes.
It is not possible. You have to use the gl*Pointer functions corresponding to the individual conventional attributes.
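For example, for an interleaved position/normal/texcoord vertex the conventional setup looks something like this (just a sketch; assumes the entry points are loaded, e.g. via GLEW):

```cpp
// Conventional attributes go through their dedicated pointer functions,
// not through glVertexAttribPointer. Layout here: 3 floats position,
// 3 floats normal, 2 floats texcoord, interleaved (stride = 8 floats).
#include <GL/glew.h>
#include <cstddef>

void setConventionalPointers(GLuint vbo)
{
    const GLsizei stride = 8 * sizeof(GLfloat);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);

    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, stride, (const GLvoid*)0);

    glEnableClientState(GL_NORMAL_ARRAY);
    glNormalPointer(GL_FLOAT, stride, (const GLvoid*)(3 * sizeof(GLfloat)));

    glClientActiveTexture(GL_TEXTURE0);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glTexCoordPointer(2, GL_FLOAT, stride,
                      (const GLvoid*)(6 * sizeof(GLfloat)));
}
```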

opengl_fan
08-15-2006, 04:27 PM
Originally posted by Komat:

Originally posted by opengl_fan:

I am just wondering if it's possible to use the function "glVertexAttribPointer" to set vertex data for conventional attributes.
It is not possible. You have to use the gl*Pointer functions corresponding to the individual conventional attributes.
Thanks. I figured out the answer myself too and deleted my naive post :) What confused me is that I found the function "glGetActiveAttribARB" will also return the "conventional" attribs, though this is useful for identifying what kind of vertex format a shader is using (really only possible when some app-to-shader attribute naming conventions are used).

It would be wonderful if OpenGL could reserve some fixed attribute indices for the conventional attribs so that we could use just one common API function :)

Korval
08-15-2006, 06:19 PM
The existence of that parameter does not mean that the hw must have explicit support for it.
Wow, I was all set to rebut this, but then I realized the fault in my logic.

D3D already has a gigantic performance penalty from every Draw* call. A little thing like offsetting the pointer to the bound vertex buffers would be pretty meaningless compared to a switch to kernel mode.

In short, it could be emulated without the performance penalty seen in OpenGL, because OpenGL is already faster ;)

So, I guess it goes back to the "we don't have enough information, but would like to have it if it's available" stance.

Jan
08-16-2006, 01:23 AM
So far, OpenGL's philosophy seems to be that if a feature is useful and a driver could emulate it, then the feature is added. This way, on hardware that does support it you benefit from it, and on everything else it is at least not a performance issue.

Also, when we have a feature that is very useful but not yet widely supported in hardware, the vendors will think about implementing it directly in their hardware.

Even if D3D sometimes needs to emulate it, there is still the possibility that some hardware can speed it up.

When VBOs were introduced, one "feature" was that switching VBOs should be "lightweight". OK, so far so good, but if the gl*Pointer calls are still heavyweight and we need to call them every time we switch a VBO, then it's all a bit pointless, IMO.

Jan.

V-man
08-16-2006, 09:08 AM
Originally posted by Korval:
D3D already has a gigantic performance penalty from every Draw* call. A little thing like offsetting the pointer to the bound vertex buffers would be pretty meaningless compared to a switch to kernel mode.

In short, it could be emulated without the performance penalty seen in OpenGL, because OpenGL is already faster ;)

So, I guess it goes back to the "we don't have enough information, but would like to have it if it's available" stance.
It was my impression that D3D only supports what the hw supports. I'm guessing that since this DrawIndexedPrimitive offset was added in DX9, all DX9 GPUs support offsetting.
On lesser GPUs, they may emulate it.

In general, D3D emulates almost nothing (that is, the HAL device doesn't). The REF device does emulate.

Also, people keep repeating that glVertexPointer is expensive. I only saw an NV document say it was expensive for their own drivers, and that was long ago. It may have been 3 years ago.

knackered
08-16-2006, 12:08 PM
Even if glVertexPointer did not cause any expensive operation to happen on the GPU, it's very rare that people render meshes from only one attribute (position); therefore, in order to respecify a position in the vertex buffer, the app has to issue 4 or 5 gl*Pointer calls. Now, considering most apps are CPU bound, is it not a good idea to reduce the call overhead when switching meshes?
idr stated in the other thread that glVertexPointer is almost the equivalent of swapping textures. I can imagine there's a unit on the GPU that asynchronously caches the next part of the currently specified area of the vertex buffer, so a switch to another area of the vertex buffer means interrupting that parallel process and telling it about the change, whereas encountering an index that is beyond the cache is a much more natural and efficient mechanism, sort of autonomous... I'm guessing here, blindly guessing.

Korval
08-16-2006, 12:28 PM
It was my impression that D3D only supports what the hw supports. I'm guessing that since this DrawIndexedPrimitive offset was added in DX9, all DX9 GPUs support offsetting.
That's a possibility, but there's no guarantee of that. The driver itself can emulate the functionality by offsetting the pointers before actually executing the draw calls. The D3D layer doesn't need to know anything about it.


Also, people keep repeating that glVertexPointer is expensive. I only saw an NV document say it was expensive for their own drivers, and that was long ago. It may have been 3 years ago.
That's a fair point too; we don't have recent tests that demonstrate the problem. And even when we did, it was limited to NVIDIA hardware/drivers.

If the case for this is going to be made to the ARB, it'd be a good idea to have some actual profiling data (and appropriate test cases) to show the IHVs.