dynamic VBO vs VA



V-man
05-09-2007, 10:42 PM
What is the difference between using a dynamic VBO and a VA? I want to update my vertices/normals/texcoords often, but I don't have to update every frame.

Why should I compute my vertices and write to my own RAM, then glBufferSubData when I can use VA?

Which is preferable, and why?

Korval
05-10-2007, 12:00 AM
VBO will almost certainly be faster. Indeed, that is the entire point of VBO.

You should also consider that, in Longs Peak, there won't be non-VBO vertex arrays. So I wouldn't bother much with that code if you're going to make the switch.

hseb
05-10-2007, 12:03 AM
Hi,

Using VA, you only reduce the number of API calls compared with plain glVertex calls.

When using VBO, you reduce the number of API calls and you also put the vertices on the video card (or somewhere closer to the video card than your system memory, which is too far away; the driver chooses the location). To be efficient, you need large batches of data.

nVidia paper : http://developer.nvidia.com/attach/6427

Jan
05-10-2007, 02:35 AM
You do know that you can map your VBO and write to it directly, don't you?
In that case you don't even need to copy the data with glBufferSubData.

V-man
05-10-2007, 05:21 AM
I've heard on several occasions that it's not good to write to a mapped vertex buffer. Something about non-cached memory.

Secondly, reading that nVidia paper, it looks like I'm supposed to call glBufferData instead of glBufferSubData, since I don't care about the previous content.

I still don't see the advantage of VBO over VA in this case.


Originally posted by Korval:
You should also consider that, in Longs Peak, there won't be non-VBO vertex arrays. So I wouldn't bother much with that code if you're going to make the switch.

Let's ignore that for a moment :)

Zengar
05-10-2007, 05:51 AM
But writing to a mapped buffer will allow you asynchronous access to GPU RAM, or am I mistaken?

Let us consider two ways the driver can transfer VA data to the GPU (I can't think of any others):

1. It can copy the whole dataset to an internal VBO (creating such a buffer if needed). In this case, a manual copy to a VBO would probably be a bit faster (as the memory is reserved beforehand).

2. It can divide the VA into parts and send them one after another. Again, I believe a manual VBO copy would be faster, as the data can be read directly by the GPU.

knackered
05-10-2007, 06:31 AM
If you call glDrawElements multiple times, for example, then each time you call it the driver will have to copy all the vertices from your pointers. It has no way (with basic VAs) to know whether you've changed the vertices or not. With VBO it's clearly defined when you've changed the vertices.

V-man
05-10-2007, 05:50 PM
Originally posted by Zengar:
But writing to a mapped buffer will allow you asynchronous access to GPU RAM, or am I mistaken?
Sure, but the same applies to VA. The driver might copy the vertices to a VBO secretly anyway and then the GPU accesses them async.

I will do one call to glDraw{R}Elements.
This is just a dynamic object that I will compute on the CPU, so the number of vertices is low.

Korval
05-10-2007, 07:04 PM
Originally posted by V-man:
The driver might copy the vertices to a VBO secretly anyway and then the GPU accesses them async.

But the driver will only do this copy once. As you point out, your usage pattern is that you don't update the data every frame.

Furthermore, you don't know the driver is doing a copy. It might give you the RAM outright. There's no way to know, so best give the driver the chance to do the right thing.

V-man
05-11-2007, 12:23 AM
Yes.
I guess I was just wondering more about what happens if you update every frame. I think in that case VA would either have an advantage over VBO, or there would be no win either way.

I don't have any cases where I have to update every frame, especially when the FPS is high enough.

knackered
05-11-2007, 02:22 AM
Vertex buffers are one of the core items that you should abstract in OpenGL. It's simple, and means that if VBOs are supported you're using the fastest path. It's a no-brainer.

Michael Gold
05-11-2007, 07:11 AM
Pretend you are already using Longs Peak and just stop using client vertex arrays. :)

VBO is better because the GPU can pull directly from the array. With VA the driver has to copy the data and send it to the GPU. It's much more efficient when the GPU pulls.

You may think, why not copy from the VA into a temporary location from which the GPU can pull? Because the driver doesn't know how much data to copy: the VBO has an upper limit, but the VA has only a base pointer and no upper bound.

You may think, with DrawArrays you know the upper limit, so you know how much to copy. But with DrawElements you only know the upper limit if you make a pass through the elements list and compute the maximum value. DrawElements is used much more frequently than DrawArrays.
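To make the DrawElements point concrete, here is a minimal sketch in plain C of the extra pass over the index list. It is not driver code: the function names vertices_referenced and bytes_to_copy are made up for illustration, and 16-bit indices (as with GL_UNSIGNED_SHORT) are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of vertices an indexed draw can touch,
   i.e. (largest index + 1). Returns 0 for an empty index list. */
size_t vertices_referenced(const unsigned short *indices, size_t count)
{
    size_t max_index = 0;

    assert(indices != NULL || count == 0);
    if (count == 0)
        return 0;

    /* This full scan is the cost described above: without it there is
       no upper bound on how much of the client array is needed. */
    for (size_t i = 0; i < count; ++i) {
        if (indices[i] > max_index)
            max_index = indices[i];
    }
    return max_index + 1;
}

/* Bytes that would have to be copied out of a client array whose
   vertices are `stride` bytes apart. */
size_t bytes_to_copy(const unsigned short *indices, size_t count,
                     size_t stride)
{
    return vertices_referenced(indices, count) * stride;
}
```

For the index list {0, 1, 2, 2, 1, 7} and a 32-byte vertex, this reports 8 vertices and 256 bytes, even though only four distinct vertices are used. glDrawRangeElements takes explicit start and end parameters, which lets the application supply that bound instead of having it computed.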

If everything I've said isn't reason enough, consider that if you *ever* wish to reuse data, VA forces the driver to copy the data from client memory on every draw call. Caching is impractical.

VAs are dead. Long live VBOs!

knackered
05-11-2007, 07:59 AM
..and that's from the horse's mouth. (no offence Mr. Gold).

Jens Scheddin
05-12-2007, 03:27 AM
From my experience (on ATI, using interleaved data and glDrawElements), it's a _lot_ faster (8ms rendering time vs. 48ms) to upload the vertex data into a buffer object, before rendering. Different usage enums make almost no difference.
Best is to preallocate a buffer object and do one or more glBufferSubData calls to it or to upload static data into a static BO and never change it.
Also, using gl(Un)MapBuffer was almost as slow as general VA and allocating more than 54MB for one BO failed.

So yeah, VBOs are the way to go today.

Would be nice to test this on nVidia as well, as they had problems with speed and VBOs in the past.

EDIT: The testing was done updating the VBO every frame.

V-man
05-13-2007, 01:28 AM
I only use glDrawRangeElements. Never tried glDrawElements. Not even glDrawArrays and the others.

I was doing glBufferSubData since I have also heard that ATI doesn't like glMapBuffer.

Reading that nVidia doc mentioned by hseb, I noticed it said it's better to call glBufferData since that indicates you don't care about previous data. It's like the D3DLOCK_DISCARD flag in D3D.

I wish I had access to all sorts of video cards as well.


Originally posted by knackered:
Vertex buffers are one of the core items that you should abstract in OpenGL. It's simple, and means that if VBOs are supported you're using the fastest path. It's a no-brainer.

It's abstracted in the sense that it puts many static models into 1 VBO. I could do the same for dynamic objects. The class also generates the offset indices.
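A rough sketch of that kind of packing, in plain C with no GL calls, might look like the following. The actual class is not shown in this thread, so everything here is assumed for illustration: the SharedBuffer struct, the Vertex layout, and 16-bit indices. The plain arrays stand in for the storage of one VBO and one index buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { float x, y, z; } Vertex;   /* hypothetical vertex layout */

typedef struct {
    Vertex         *vertices;        /* stands in for one shared VBO */
    unsigned short *indices;         /* stands in for one index buffer */
    size_t          vertex_count, vertex_capacity;
    size_t          index_count,  index_capacity;
} SharedBuffer;

/* Appends one mesh and rewrites its indices so they point at the mesh's
   position inside the shared vertex storage. Returns the first index
   slot used by the mesh, or (size_t)-1 if it does not fit. */
size_t shared_buffer_append(SharedBuffer *sb,
                            const Vertex *verts, size_t nverts,
                            const unsigned short *idx, size_t nidx)
{
    assert(sb && verts && idx);

    if (sb->vertex_count + nverts > sb->vertex_capacity ||
        sb->index_count + nidx > sb->index_capacity ||
        sb->vertex_count + nverts > 65536)      /* 16-bit index limit */
        return (size_t)-1;

    size_t base  = sb->vertex_count;   /* offset added to every index */
    size_t first = sb->index_count;

    memcpy(sb->vertices + base, verts, nverts * sizeof *verts);
    for (size_t i = 0; i < nidx; ++i) {
        assert(idx[i] < nverts);       /* indices are local to the mesh */
        sb->indices[first + i] = (unsigned short)(idx[i] + base);
    }

    sb->vertex_count += nverts;
    sb->index_count  += nidx;
    return first;
}
```

Appending the same triangle twice gives the second copy the indices 3, 4, 5, so each model can later be drawn by pointing at its own index range in the shared buffers.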

Jens Scheddin
05-13-2007, 01:51 AM
I did a quick test with glBufferSubData vs. glBufferData and have to say that glBufferSubData is, at least on my ATI system, a tiny bit faster (~0.5%).

Do you have some newer nVidia hardware to confirm this?

Jan
05-13-2007, 04:31 AM
If you want to discard the previous data and still upload your new data in several chunks, you can of course first call glBufferData with a NULL pointer to discard the old data and allocate a new buffer, and then use glBufferSubData to fill the new buffer in several steps (e.g. if your data is not interleaved and you want to copy it from several source arrays into your VBO).

Jan.
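As a small illustration of the "several steps" part, the sketch below works out where each non-interleaved source array would land in the buffer. It is plain C with no GL calls; the Chunk struct and plan_chunks function are hypothetical names, and laying the arrays back to back is just one possible layout.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t offset;   /* byte offset of this array inside the buffer */
    size_t size;     /* byte size of this array */
} Chunk;

/* Lays n source arrays back to back. Fills out[i] for each array and
   returns the total buffer size in bytes. */
size_t plan_chunks(const size_t *sizes, size_t n, Chunk *out)
{
    size_t offset = 0;

    assert(n == 0 || (sizes != NULL && out != NULL));
    for (size_t i = 0; i < n; ++i) {
        out[i].offset = offset;
        out[i].size   = sizes[i];
        offset += sizes[i];
    }
    return offset;
}
```

With 100 vertices stored as separate position (1200 bytes), normal (1200 bytes) and texcoord (800 bytes) arrays, the total is 3200 bytes and the offsets are 0, 1200 and 2400. The total would be the size passed to glBufferData with a NULL data pointer, and each chunk's offset and size would go to one glBufferSubData call along with the matching source pointer.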

Jens Scheddin
05-14-2007, 12:20 AM
Why would you want to discard data in a buffer and then write data to the same buffer? This doesn't make sense for VBOs if you are not changing the size or usage of the BO. Remember: this is not DX, and it will probably lead to poor performance if you try to discard data like this.

Jan
05-14-2007, 12:36 AM
With glBufferData you are telling the driver that you don't care about the current content of the buffer anymore. Therefore, the driver might allocate another chunk of memory if the previous data is still in use by pending rendering operations.

Now, if your data is not in one big chunk in RAM, you cannot copy everything with one big glBufferData call and have to use glBufferSubData instead. It is still an advantage to first tell the driver (via glBufferData with a NULL pointer) that you are going to replace the whole buffer. The driver can then decide whether it lets you reuse the same memory or gives you a new chunk, since the old one might still be in use by the graphics card.

As V-Man already pointed out, this is the advised way to replace whole buffer contents, since it allows the driver to optimize parallelism. Therefore, although it seems to be unnecessary overhead, it actually is an optimization.

Jan.
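The decision described above can be pictured with a toy model in plain C. This is only an illustration of the idea, not how any real driver is implemented: ToyBuffer, toy_orphan, toy_sub_data and toy_gpu_finished are invented names, the gpu_busy flag stands in for "pending rendering operations", and the model tracks at most one retired block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *storage;    /* memory the application writes into */
    unsigned char *in_flight;  /* retired memory a pending draw still reads */
    size_t         size;
    int            gpu_busy;   /* nonzero while a draw still reads storage */
} ToyBuffer;

/* Models "discard the whole buffer". Returns 1 on success, 0 on failure. */
int toy_orphan(ToyBuffer *b)
{
    assert(b && b->storage);
    if (!b->gpu_busy)
        return 1;              /* nothing pending: reuse the same memory */
    if (b->in_flight != NULL)
        return 0;              /* toy limit: one retired block at a time */

    unsigned char *fresh = malloc(b->size);
    if (fresh == NULL)
        return 0;
    b->in_flight = b->storage; /* the pending draw keeps its old data */
    b->storage   = fresh;      /* the application gets a new chunk */
    b->gpu_busy  = 0;
    return 1;
}

/* Models a partial update. Returns 0 if the range is out of bounds. */
int toy_sub_data(ToyBuffer *b, size_t offset, const void *data, size_t n)
{
    assert(b && b->storage && data);
    if (offset > b->size || n > b->size - offset)
        return 0;
    memcpy(b->storage + offset, data, n);
    return 1;
}

/* Models the pending draw completing: the retired block is released. */
void toy_gpu_finished(ToyBuffer *b)
{
    assert(b);
    free(b->in_flight);
    b->in_flight = NULL;
}
```

In this model, orphaning a busy buffer hands back different memory, so the new frame's data can be written immediately while the old data stays intact for the pending draw. Orphaning an idle buffer just keeps the same memory. Without the orphan step, writing into a busy buffer would have to wait for the draw to finish or risk overwriting data still in use.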

Jens Scheddin
05-14-2007, 07:52 AM
Yeah, my fault. I totally forgot that. I read about this in some nVidia paper some time ago.

AnselmG
05-15-2007, 01:10 AM
I just downloaded the NEHE lesson45 which demonstrates VBO rendering. I am using a Quadro FX 1500 board and here's the interesting thing:

When I activate VBOs I get 60 fps while rendering 526000 triangles. By using VAs I get 260 fps with the same demo!

Can anyone explain this to me?

Jens Scheddin
05-15-2007, 08:31 AM
Originally posted by AnselmG:
I just downloaded the NEHE lesson45 which demonstrates VBO rendering. I am using a Quadro FX 1500 board and here's the interesting thing:

When I activate VBOs I get 60 fps while rendering 526000 triangles. By using VAs I get 260 fps with the same demo!

Can anyone explain this to me?

Why so many triangles? The standard tut has only 32k triangles. Maybe you exceeded the maximum size of the buffer object. The reason for bad performance with VBO could be too much or too little data in a single BO.

V-man
05-17-2007, 04:11 AM
Originally posted by Jens Scheddin:
I did a quick test with glBufferSubData vs. glBufferData and have to say that glBufferSubData is, at least on my ATI system, a tiny bit faster (~0.5%).

Do you have some newer nVidia hardware to confirm this?

Yes, no significant difference between those 2 methods. I'm using Catalyst 7.4.

However, glMapBuffer sucks slightly more: it's a few percent slower.