dynamic VBO vs VA

What is the difference between using a dynamic VBO vs. VA? I want to update my vertices/normals/texcoords often, but I don’t have to update every frame.

Why should I compute my vertices and write to my own RAM, then glBufferSubData when I can use VA?

Which is preferable, and why?

VBO will almost certainly be faster. Indeed, that is the entire point of VBO.

You should also consider that, in Longs Peak, there won’t be non-VBO vertex arrays. So I wouldn’t bother much with that code if you’re going to make the switch.

Hi,

Using VA, you only reduce the number of API calls compared to simple glVertex.

When using VBO, you reduce the number of API calls and you also put the vertices on the video card (or at a place nearer to the video card than your system memory, which is too far away; this place is chosen by the driver). To be efficient, you need large batches of data.

nVidia paper: http://developer.nvidia.com/attach/6427
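A minimal sketch of that dynamic-VBO path, for reference. This assumes a GL 1.5 context; the names (`vbo`, `verts`, `NUM_VERTS`, the two functions) and sizes are placeholders of mine, not anything from the paper:

```c
#include <GL/gl.h>

enum { NUM_VERTS = 30 };

static GLuint  vbo;
static GLfloat verts[3 * NUM_VERTS];   /* positions computed on the CPU */

static void init_dynamic_vbo(void)
{
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* allocate once; GL_DYNAMIC_DRAW hints that we respecify often */
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), NULL, GL_DYNAMIC_DRAW);
}

static void update_and_draw(void)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* re-upload only when the CPU-side data actually changed */
    glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(verts), verts);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, (void *)0);  /* offset into the VBO */
    glDrawArrays(GL_TRIANGLES, 0, NUM_VERTS);
}
```

With a plain VA you would pass `verts` itself to glVertexPointer and the driver would have to re-read it on every draw.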

You do know that you can map your VBO and write to it directly, don’t you?
In that case you don’t even need to copy the data with glBufferSubData.
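A sketch of that mapping path, assuming a GL 1.5 context with the buffer already created and bound (`compute_vertices` is a hypothetical function that fills the store in place):

```c
#include <GL/gl.h>

/* Map the buffer store and write vertices straight into it, instead of
 * building them in system RAM first and copying with glBufferSubData. */
GLfloat *p = (GLfloat *)glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
if (p) {
    compute_vertices(p);             /* hypothetical: writes sequentially */
    /* glUnmapBuffer returns GL_FALSE if the store was lost meanwhile */
    if (!glUnmapBuffer(GL_ARRAY_BUFFER)) {
        /* data corrupted (e.g. screen mode change): regenerate it */
    }
}
```

One caveat, echoed later in this thread: the mapped pointer may be write-combined, uncached memory, so write sequentially and never read from it.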

I’ve heard on different occasions that it’s not good to write to a mapped vertex buffer. Something about non-cached memory.

Secondly, reading that nVidia paper, it looks like I’m supposed to call glBufferData instead of glBufferSubData, since I don’t care about the previous content.

I still don’t see the advantage of VBO over VA in this case.

You should also consider that, in Longs Peak, there won’t be non-VBO vertex arrays. So I wouldn’t bother much with that code if you’re going to make the switch.
Let’s ignore that for a moment :slight_smile:

But writing to a mapped buffer gives you asynchronous access to GPU RAM, or am I mistaken?

Let us consider two ways the driver could transfer VA data to the GPU (I can’t think of anything else):

  1. It can copy the whole dataset to an internal VBO buffer (creating such a buffer if needed). In this case, a manual copy to a VBO would probably be a bit faster (as the memory is reserved beforehand).

  2. Divide the VA into parts and send them sequentially. Again, I believe a manual VBO copy would be faster, as the data can be read directly by the GPU.

If you call glDrawElements multiple times, for example, then each time you call it the driver will have to copy all the vertices from your pointers. It has no way (with basic VA’s) to know whether you’ve changed the vertices or not. With VBO it’s clearly defined when you’ve changed the vertices.

Originally posted by Zengar:
But writing to a mapped buffer gives you asynchronous access to GPU RAM, or am I mistaken?

Sure, but the same applies to VA. The driver might copy the vertices to a VBO secretly anyway and then the GPU accesses them async.

I will do one call to glDrawRangeElements.
This is just some dynamic object that I compute on the CPU, so the number of vertices is low.

The driver might copy the vertices to a VBO secretly anyway and then the GPU accesses them async.
But the driver will only do this copy once. As you point out, your usage pattern is that you don’t update the data every frame.

Furthermore, you don’t know the driver is doing a copy. It might give you the RAM outright. There’s no way to know, so best give the driver the chance to do the right thing.

Yes.
I guess I was just wondering more about what happens if you update every frame. I would expect either an advantage for VA over VBO there, or no win at all.

I don’t have any cases where I have to update every frame, especially when the FPS is high enough.

Vertex buffers are one of the core items you should abstract in OpenGL. It’s simple, and it means that if VBOs are supported you’re using the fastest path. It’s a no-brainer.

Pretend you are already using Longs Peak and just stop using client vertex arrays. :slight_smile:

VBO is better because the GPU can pull directly from the array. With VA, the driver has to copy the data and send it to the GPU. It’s much more efficient when the GPU pulls.

You may think: why not copy from the VA into a temporary location from which the GPU can pull? Because the driver doesn’t know how much data to copy; the VBO has an upper limit, but the VA has only a base pointer and no upper bound.

You may think: with DrawArrays you know the upper limit, so you know how much to copy. But with DrawElements you only know the upper limit if you make a pass through the element list and compute the maximum index value. DrawElements is used much more frequently than DrawArrays.

If everything I’ve said isn’t reason enough, consider that if you ever wish to reuse data, VA forces the driver to copy the data from client memory on every draw call. Caching is impractical.

VAs are dead. Long live VBOs!

…and that’s from the horse’s mouth. (No offence, Mr. Gold.)

From my experience (on ATI, using interleaved data and glDrawElements), it’s a lot faster (8 ms vs. 48 ms rendering time) to upload the vertex data into a buffer object before rendering. Different usage enums make almost no difference.
Best is to preallocate a buffer object and do one or more glBufferSubData calls into it, or to upload static data into a static BO and never change it.
Also, using glMapBuffer/glUnmapBuffer was almost as slow as plain VA, and allocating more than 54 MB for one BO failed.

So yeah, VBOs are the way to go today.

Would be nice to test this on nVidia as well, as they had problems with speed and VBOs in the past.

EDIT: The testing was done updating the VBO every frame.

I only use glDrawRangeElements. Never tried glDrawElements. Not even glDrawArrays and the others.

I was using glBufferSubData since I have also heard that ATI doesn’t like glMapBuffer.

Reading that nVidia doc mentioned by hseb, I noticed it says it’s better to call glBufferData, since that indicates you don’t care about the previous data. It’s like the D3D LOCK_DISCARD flag.

I wish I had access to all sorts of video cards as well.

Vertex buffers are one of the core items you should abstract in OpenGL. It’s simple, and it means that if VBOs are supported you’re using the fastest path. It’s a no-brainer.
It’s abstracted in the sense that it puts many static models into one VBO. I could do the same for dynamic objects. The class also generates the offset indices.

I did a quick test of glBufferSubData vs. glBufferData and have to say that glBufferSubData is, at least on my ATI system, a tiny bit faster (~0.5%).

Do you have some newer nVidia hardware to confirm this?

If you want to discard the previous data and still upload your new data in several chunks, you can of course first do a glBufferData(NULL) to discard the old data and allocate a new buffer, and then use glBufferSubData to fill the new buffer in several steps (e.g. if your data is not interleaved and you want to copy it from several source arrays into your VBO).
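A sketch of that pattern, assuming a GL 1.5 context (`vbo`, `BUF_SIZE`, and the source arrays are placeholders of mine):

```c
#include <GL/gl.h>

glBindBuffer(GL_ARRAY_BUFFER, vbo);
/* orphan the old store: same size and usage, NULL data, so the driver
 * may hand out fresh memory if the old data is still being rendered */
glBufferData(GL_ARRAY_BUFFER, BUF_SIZE, NULL, GL_DYNAMIC_DRAW);
/* then fill the new store in chunks, e.g. positions and normals
 * coming from two separate, non-interleaved source arrays */
glBufferSubData(GL_ARRAY_BUFFER, 0,         pos_bytes,  positions);
glBufferSubData(GL_ARRAY_BUFFER, pos_bytes, norm_bytes, normals);
```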

Jan.

Why would you want to discard data in a buffer and then write data to the same buffer? This doesn’t make sense for VBOs if you are not changing the size or usage of the BO. Remember: this is not DX, and it will probably lead to poor performance if you try to discard data like this.

With glBufferData you are telling the driver that you don’t care about the current content of the buffer anymore. The driver might therefore allocate another chunk of memory if the previous data is still in use by pending rendering operations.

Now, if your data is not in one big chunk in RAM, so you cannot copy everything with one big glBufferData call and want to use glBufferSubData instead, it is still an advantage to first tell the driver (via glBufferData(NULL)) that you are going to replace the whole buffer. That way the driver can decide whether to let you reuse the same memory or to give you a new chunk, since the old one might still be in use by the graphics card.

As V-Man already pointed out, this is the advised way to replace whole buffer contents, since it allows the driver to optimize for parallelism. So although it seems like unnecessary overhead, it is actually an optimization.

Jan.

Yeah, my fault. I totally forgot that. I read about this in some nVidia paper some time ago.