dynamic VBO vs VA

What is the difference between using a dynamic VBO vs. VA? I want to update my vertices/normals/texcoords often, but I don’t have to update every frame.

Why should I compute my vertices and write to my own RAM, then glBufferSubData when I can use VA?

Which is preferable, and why?

VBO will almost certainly be faster. Indeed, that is the entire point of VBO.

You should also consider that, in Longs Peak, there won’t be non-VBO vertex arrays. So I wouldn’t bother much with that code if you’re going to make the switch.

Hi,

Using VA, you only reduce the number of API calls compared to simple glVertex.

When using VBO, you reduce the number of API calls and you also put the vertices on the video card (or at a place nearer to the video card than your system memory, which is too far away; this place is chosen by the driver). To be efficient, you need large batches of data.

nVidia paper: http://developer.nvidia.com/attach/6427
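A minimal sketch of that dynamic-VBO path, for reference. This assumes a GL 1.5 context; the names (`vbo`, `verts`, `NUM_VERTS`, the two functions) and sizes are placeholders of mine, not anything from the paper:

```c
#include <GL/gl.h>

enum { NUM_VERTS = 30 };

static GLuint  vbo;
static GLfloat verts[3 * NUM_VERTS];   /* positions computed on the CPU */

static void init_dynamic_vbo(void)
{
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* allocate once; GL_DYNAMIC_DRAW hints that we respecify often */
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), NULL, GL_DYNAMIC_DRAW);
}

static void update_and_draw(void)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* re-upload only when the CPU-side data actually changed */
    glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(verts), verts);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, (void *)0);  /* offset into the VBO */
    glDrawArrays(GL_TRIANGLES, 0, NUM_VERTS);
}
```

With a plain VA you would pass `verts` itself to glVertexPointer and the driver would have to re-read it on every draw.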

You do know that you can map your VBO and write to it directly, don’t you?
In that case you don’t even need to copy the data with glBufferSubData.
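A sketch of that mapping path, assuming a GL 1.5 context with the buffer already created and bound (`compute_vertices` is a hypothetical function that fills the store in place):

```c
#include <GL/gl.h>

/* Map the buffer store and write vertices straight into it, instead of
 * building them in system RAM first and copying with glBufferSubData. */
GLfloat *p = (GLfloat *)glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
if (p) {
    compute_vertices(p);             /* hypothetical: writes sequentially */
    /* glUnmapBuffer returns GL_FALSE if the store was lost meanwhile */
    if (!glUnmapBuffer(GL_ARRAY_BUFFER)) {
        /* data corrupted (e.g. screen mode change): regenerate it */
    }
}
```

One caveat, echoed later in this thread: the mapped pointer may be write-combined, uncached memory, so write sequentially and never read from it.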

I’ve heard on different occasions that it’s not good to write to a mapped vertex buffer. Something about non-cached memory.

Secondly, reading that nVidia paper, it looks like I’m supposed to call glBufferData instead of glBufferSubData, since I don’t care about the previous content.

I still don’t see the advantage of VBO over VA in this case.

You should also consider that, in Longs Peak, there won’t be non-VBO vertex arrays. So I wouldn’t bother much with that code if you’re going to make the switch.
Let’s ignore that for a moment :slight_smile:

But writing to a mapped buffer gives you asynchronous access to GPU RAM, or am I mistaken?

Let us consider two ways the driver could transfer VA data to the GPU (I can’t think of anything else):

  1. It can copy the whole dataset to an internal VBO buffer (creating such a buffer if needed). In this case, a manual copy to a VBO would probably be a bit faster (as the memory is reserved beforehand).

  2. Divide the VA into parts and send them sequentially. Again, I believe a manual VBO copy would be faster, as the data can be read directly by the GPU.

If you call glDrawElements multiple times, for example, then each time you call it the driver will have to copy all the vertices from your pointers. It has no way (with basic VA’s) to know whether you’ve changed the vertices or not. With VBO it’s clearly defined when you’ve changed the vertices.

Originally posted by Zengar:
But writing to a mapped buffer gives you asynchronous access to GPU RAM, or am I mistaken?

Sure, but the same applies to VA. The driver might copy the vertices to a VBO secretly anyway and then the GPU accesses them async.

I will do one call to glDrawRangeElements.
This is just some dynamic object that I compute on the CPU, so the number of vertices is low.

The driver might copy the vertices to a VBO secretly anyway and then the GPU accesses them async.
But the driver will only do this copy once. As you point out, your usage pattern is that you don’t update the data every frame.

Furthermore, you don’t know the driver is doing a copy. It might give you the RAM outright. There’s no way to know, so best give the driver the chance to do the right thing.

Yes.
I guess I was just wondering more about what happens if you update every frame. I would expect either an advantage for VA over VBO there, or no win at all.

I don’t have any cases where I have to update every frame, especially when the FPS is high enough.

Vertex buffers are one of the core items you should abstract in OpenGL. It’s simple, and it means that if VBOs are supported you’re using the fastest path. It’s a no-brainer.

Pretend you are already using Longs Peak and just stop using client vertex arrays. :slight_smile:

VBO is better because the GPU can pull directly from the array. With VA, the driver has to copy the data and send it to the GPU. It’s much more efficient when the GPU pulls.

You may think: why not copy from the VA into a temporary location from which the GPU can pull? Because the driver doesn’t know how much data to copy; the VBO has an upper limit, but the VA has only a base pointer and no upper bound.

You may think: with DrawArrays you know the upper limit, so you know how much to copy. But with DrawElements you only know the upper limit if you make a pass through the element list and compute the maximum index value. DrawElements is used much more frequently than DrawArrays.

If everything I’ve said isn’t reason enough, consider that if you ever wish to reuse data, VA forces the driver to copy the data from client memory on every draw call. Caching is impractical.

VAs are dead. Long live VBOs!

…and that’s from the horse’s mouth. (No offence, Mr. Gold.)

From my experience (on ATI, using interleaved data and glDrawElements), it’s a lot faster (8 ms vs. 48 ms rendering time) to upload the vertex data into a buffer object before rendering. Different usage enums make almost no difference.
Best is to preallocate a buffer object and do one or more glBufferSubData calls into it, or to upload static data into a static BO and never change it.
Also, using glMapBuffer/glUnmapBuffer was almost as slow as plain VA, and allocating more than 54 MB for one BO failed.

So yeah, VBOs are the way to go today.

Would be nice to test this on nVidia as well, as they had problems with speed and VBOs in the past.

EDIT: The testing was done updating the VBO every frame.

I only use glDrawRangeElements. Never tried glDrawElements. Not even glDrawArrays and the others.

I was using glBufferSubData since I have also heard that ATI doesn’t like glMapBuffer.

Reading that nVidia doc mentioned by hseb, I noticed it says it’s better to call glBufferData, since that indicates you don’t care about the previous data. It’s like the D3D LOCK_DISCARD flag.

I wish I had access to all sorts of video cards as well.

Vertex buffers are one of the core items you should abstract in OpenGL. It’s simple, and it means that if VBOs are supported you’re using the fastest path. It’s a no-brainer.
It’s abstracted in the sense that it puts many static models into one VBO. I could do the same for dynamic objects. The class also generates the offset indices.

I did a quick test of glBufferSubData vs. glBufferData and have to say that glBufferSubData is, at least on my ATI system, a tiny bit faster (~0.5%).

Do you have some newer nVidia hardware to confirm this?

If you want to discard the previous data and still upload your new data in several chunks, you can of course first do a glBufferData(NULL) to discard the old data and allocate a new buffer, and then use glBufferSubData to fill the new buffer in several steps (e.g. if your data is not interleaved and you want to copy it from several source arrays into your VBO).
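A sketch of that pattern, assuming a GL 1.5 context (`vbo`, `BUF_SIZE`, and the source arrays are placeholders of mine):

```c
#include <GL/gl.h>

glBindBuffer(GL_ARRAY_BUFFER, vbo);
/* orphan the old store: same size and usage, NULL data, so the driver
 * may hand out fresh memory if the old data is still being rendered */
glBufferData(GL_ARRAY_BUFFER, BUF_SIZE, NULL, GL_DYNAMIC_DRAW);
/* then fill the new store in chunks, e.g. positions and normals
 * coming from two separate, non-interleaved source arrays */
glBufferSubData(GL_ARRAY_BUFFER, 0,         pos_bytes,  positions);
glBufferSubData(GL_ARRAY_BUFFER, pos_bytes, norm_bytes, normals);
```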

Jan.

Why would you want to discard data in a buffer and then write data to the same buffer? This doesn’t make sense for VBOs if you are not changing the size or usage of the BO. Remember: this is not DX, and it will probably lead to poor performance if you try to discard data like this.

With glBufferData you are telling the driver that you don’t care about the current content of the buffer anymore. The driver might therefore allocate another chunk of memory if the previous data is still in use by pending rendering operations.

Now, if your data is not in one big chunk in RAM, so you cannot copy everything with one big glBufferData call and want to use glBufferSubData instead, it is still an advantage to first tell the driver (via glBufferData(NULL)) that you are going to replace the whole buffer. That way the driver can decide whether to let you reuse the same memory or to give you a new chunk, since the old one might still be in use by the graphics card.

As V-Man already pointed out, this is the advised way to replace whole buffer contents, since it allows the driver to optimize for parallelism. So although it seems like unnecessary overhead, it is actually an optimization.

Jan.

Yeah, my fault. I totally forgot that. I read about this in some nVidia paper some time ago.