using VBO for dynamic data

I'm trying to use VBOs to render terrain data. I'm currently using an algorithm similar to ROAM, so the terrain data changes every frame.

What's the most efficient way to use VBOs to render this kind of data? The only way I can think of right now is to call glBufferDataARB every frame, since I need to copy the new vertex data each frame. I think there should be a better way … hmmm …

thanks

You can also map/unmap the buffer and write to it directly. This saves you from having to keep a system-memory copy, but you'll have to use a dynamic VBO anyway.

Y.

Hi grunt123,

You’ll want to call BufferData with a proper size and NULL data pointer. Usage should be STREAM_DRAW.

Then use Map/Unmap to get a pointer to the data, and write sequentially and contiguously into the buffer before unmapping it.
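For concreteness, here is a minimal sketch of that recipe, assuming a current GL context with the ARB_vertex_buffer_object entry points already loaded; the function and variable names are illustrative:

```c
/* Respecify-then-map update, per the recipe above. Assumes a current
   GL context, ARB_vertex_buffer_object entry points already loaded,
   and a buffer `vbo` created earlier with glGenBuffersARB. */
#include <string.h>
#include <GL/gl.h>
#include <GL/glext.h>

void update_terrain_vbo(GLuint vbo, const GLfloat *verts, size_t bytes)
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);

    /* NULL data pointer: allocate storage but discard old contents. */
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, bytes, NULL, GL_STREAM_DRAW_ARB);

    /* Map write-only, fill sequentially and contiguously, unmap. */
    GLfloat *dst = (GLfloat *) glMapBufferARB(GL_ARRAY_BUFFER_ARB,
                                              GL_WRITE_ONLY_ARB);
    if (dst) {
        memcpy(dst, verts, bytes);
        glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);
    }
}
```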

Thanks -
Cass

Do not use anything other than WRITE_ONLY if you want any performance improvement over immediate mode. glMapBufferARB is probably your best bet. Since mapping is a time-consuming operation, I would suggest you subdivide your data into multiple VBOs; then, when one portion changes, you can map just that VBO, modify it, and unmap it. This should give you better performance than one big VBO. You will especially lose out if you try to map and unmap everything each time you render.
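Here is a minimal sketch of that chunking idea; the chunk count, struct layout, and dirty-flag scheme are illustrative assumptions, not code from this thread:

```c
/* Terrain split across several VBOs; only chunks the tessellator
   touched this frame get mapped and rewritten. Assumes a current GL
   context with the ARB_vertex_buffer_object entry points loaded. */
#include <string.h>
#include <GL/gl.h>
#include <GL/glext.h>

#define NUM_CHUNKS 16  /* illustrative subdivision */

typedef struct {
    GLuint   vbo;
    GLfloat *cpu_verts;  /* latest CPU-side vertices for this chunk */
    size_t   bytes;
    int      dirty;      /* set when the tessellator changes this chunk */
} TerrainChunk;

void upload_dirty_chunks(TerrainChunk chunks[NUM_CHUNKS])
{
    for (int i = 0; i < NUM_CHUNKS; ++i) {
        if (!chunks[i].dirty)
            continue;  /* clean chunks keep drawing from old data */

        glBindBufferARB(GL_ARRAY_BUFFER_ARB, chunks[i].vbo);
        GLfloat *dst = (GLfloat *) glMapBufferARB(GL_ARRAY_BUFFER_ARB,
                                                  GL_WRITE_ONLY_ARB);
        if (dst) {
            memcpy(dst, chunks[i].cpu_verts, chunks[i].bytes);
            glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);
        }
        chunks[i].dirty = 0;
    }
}
```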

I want to elaborate a bit on what Cass and maximian said. The issue is that you want to achieve as much parallelism between the GPU and the CPU as possible. Calling BufferData with a NULL data pointer tells the driver, "I don't need the data in that buffer anymore. Throw it away when the hardware is done with it."

When MapBuffer is called, you will probably get a freshly allocated buffer. It’s up to the driver. If the hardware is all done with the old data, you may get the old buffer. If the hardware is not done with the old buffer, you will get a new block of memory to fill.

If you do not call BufferData with a NULL pointer and the hardware is not done with the buffer when you call MapBuffer, the driver will have to wait in the MapBuffer call until the hardware is done. This will kill performance.

Hope this helps.

thanks for the replies …

About having multiple VBOs … so performance doesn't degrade when you have multiple VBOs? (I heard switching between multiple VAR ranges could kill performance.)

And since I'm dealing with a set of data that changes every frame, I don't think there is any way to avoid mapping/unmapping every frame? (Even if you subdivide the data into multiple VBOs, you will need to map/unmap at least one VBO per frame anyway.)

[This message has been edited by grunt123 (edited 12-02-2003).]

I am not sure if multiple VBOs are slower. I only recommended it since it might allow you to, say, modify one VBO while keeping the rest drawing.

One approach might be to do your VBO modification in a thread while instructing the GPU to draw the other VBOs; depending on the relative speeds, you could then issue a draw command for the just-modified VBO.

Not sure if this works. Personally, I think VBOs should just be for static data; their implementation for dynamic data seems quite poor. Not that VAR is any better, mind you.

Originally posted by grunt123:
About having multiple VBOs … so performance doesn't degrade when you have multiple VBOs? (I heard switching between multiple VAR ranges could kill performance.)

And since I'm dealing with a set of data that changes every frame, I don't think there is any way to avoid mapping/unmapping every frame? (Even if you subdivide the data into multiple VBOs, you will need to map/unmap at least one VBO per frame anyway.)

Yes, VBOs are intended to be lightweight, very much unlike VAR, where you wanted to avoid changing your array range at all costs.

I believe the ISSUES section of the spec discusses this in some detail.

In general it's better to err on the side of too many VBOs than too few, though going bananas in either direction will hurt. VBOs that are too large make synchronization and buffer replacement hard for the driver; too many VBOs, and you increase the driver overhead of dealing with them.

Don't worry too much about using lots of VBOs, but try not to go berserk with it.

The downside of synchronization problems (one really large VBO) is much worse than the overhead of too many VBOs.

Thanks -
Cass

Could we get an update on this topic? There are so many threads about it, but obviously the answer from Cass is the one I trust the most, since he works for NVIDIA.

In this thread it's stated that the best way to update a VBO is to call glBufferDataARB with a NULL pointer and then use glMapBufferARB.

But in this tech paper:
http://www.ati.com/developer/gdc/PerformanceTuning.pdf

it's said, for both NVIDIA and ATI, that it's better just to use glBufferDataARB and glBufferSubDataARB. So I guess things have changed since this thread was originally posted?
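For reference, a minimal sketch of the two update paths being compared, under the same ARB_vertex_buffer_object assumptions as above; the function names are illustrative:

```c
/* The two whole-buffer update paths under discussion. Assumes a
   current GL context with ARB_vertex_buffer_object entry points. */
#include <GL/gl.h>
#include <GL/glext.h>

/* (a) Respecify the whole store each frame. The driver is free to
   orphan the old memory, so this needn't stall on a buffer the GPU
   is still reading from. */
void update_respecify(GLuint vbo, const GLfloat *verts, GLsizeiptrARB bytes)
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, bytes, verts, GL_STREAM_DRAW_ARB);
}

/* (b) Overwrite the existing store in place. */
void update_in_place(GLuint vbo, const GLfloat *verts, GLsizeiptrARB bytes)
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, 0, bytes, verts);
}
```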

Also, if I’m updating the entire buffer, is it better to use glBufferDataARB or glBufferSubDataARB?

Thanks

Malcolm

You are almost certainly better off not doing something ROAM-like, because it requires CPU involvement at too fine a granularity. Hugues Hoppe had an interesting paper (several, actually) at SIGGRAPH 2004 called Geometry Clipmaps, which makes more sense for modern GPUs. I think all you need is hardware-accelerated vertex shaders.

-Won

Sorry Won, maybe I did a no-no by reviving a dead thread but this thread is almost a year old so I doubt the original poster is looking for thoughts on ROAM. I brought it back to get updated info on VBO while keeping the information in the same spot for future searches.

Malcolm

GL can be used as both an abstract and a hw/IHV-dependent API. I think VBOs have landed in the wrong camp, meaning they should go into the hw/IHV camp: the reason for VBO was speed, i.e. server-side resources, and yet this falls flat on its face since end users can't figure out an efficient way to use it without relying on IHVs to tell them. Oh, the irony. Maybe we're putting forth the idealism of abstractions without realizing that not everything should be abstract, especially when dealing with speed-critical applications like games. Maybe we're overdoing this abstraction thingy a bit.

I think VBOs have landed in the wrong camp, meaning they should go into the hw/IHV camp: the reason for VBO was speed, i.e. server-side resources, and yet this falls flat on its face since end users can't figure out an efficient way to use it without relying on IHVs to tell them. Oh, the irony.
We already know the efficient way to use VBO: don't upload data. It's a basic fact that the hardware doesn't like uploads. ROAM and its ilk will never be as fast as static geometry.

Don’t blame VBO for not letting less efficient algorithms be faster than more efficient ones.

Originally posted by JD:
Maybe we’re overdoing this abstraction thingy a bit.
Agreed. This STATIC/DYNAMIC/STREAM business is confusing.

In the engine I'm currently using, the mesh is dynamic; it changes once every few frames. Now, I know that the mesh should be in video memory for best performance, so I use STATIC_DRAW because that has the best chance of giving me video memory. Having to think in this counterintuitive manner to fool the driver into giving me what I want is silly. Perhaps what I should really be asking for is STREAM_DRAW, but I'm not entirely sure what that is even after reading the spec. I'm guessing it's 'in between' DYNAMIC and STATIC?

I don't want to have to translate what I want before I call the API, i.e. 'I want video memory, therefore I have to use STATIC_DRAW.'

Perhaps the problem is that I'm coming from having used the VAR extension, which did it the right way, IMO.

With VAR you could not only ask for video or AGP memory, you could also find out with a few lines of code how much was available. You would know whether the memory given to you was video or AGP.

Say you want 30MB of video memory, and say your target card will only give you 29MB. With VBO you get that buffer all in AGP, and you would be oblivious to the fact until some poor sod with a slow AGP bus tries to run your app. With VAR you would KNOW it was in AGP and could change your app to ask for 29MB if you wished. Surely the VAR approach was preferable.
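For anyone who never used it, this is roughly what that VAR-era control looked like; a sketch based on NV_vertex_array_range, where the fallback logic and the frequency/priority values are illustrative:

```c
/* NV_vertex_array_range-style allocation on Windows: a priority near
   1.0 requests video memory, lower values request AGP, and a NULL
   return means the request failed. The entry point is fetched with
   wglGetProcAddress in real code. */
#include <windows.h>
#include <GL/gl.h>

typedef void * (APIENTRY *PFNWGLALLOCATEMEMORYNVPROC)(GLsizei size,
                GLfloat readFreq, GLfloat writeFreq, GLfloat priority);
extern PFNWGLALLOCATEMEMORYNVPROC wglAllocateMemoryNV;

void *alloc_var_memory(GLsizei bytes, int *got_video)
{
    /* Try video memory first... */
    void *mem = wglAllocateMemoryNV(bytes, 0.25f, 0.25f, 1.0f);
    *got_video = (mem != NULL);
    if (!mem)  /* ...then fall back to AGP, so we KNOW which we got. */
        mem = wglAllocateMemoryNV(bytes, 0.25f, 0.25f, 0.5f);
    return mem;
}
```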

If we were talking about a 10% difference in performance, then perhaps you could argue that it's not worth worrying about and the driver should control the memory management. But as I showed in my previous post, we are, on some systems, talking about a 300% difference in performance. So it is critical that the programmer has more control, and at least knows what the driver has done.

When I was getting 30M tris/sec instead of the 90 I expected, how was I supposed to know that the problem was related to the memory the driver had given me? It was only because I knew I should be getting 90 that I bothered investigating at all.

Edit: On reflection, in the case of a card with 29MB of video memory available for VBOs and 30MB requested, the driver might internally split the buffer into 29MB video and 1MB AGP; but with no mechanism to find out, how do we know?

Agreed. This STATIC/DYNAMIC/STREAM business is confusing. <etc>
I’m sure people said something similar about not being able to directly manage texture objects too. Where are those people now? They don’t have a problem because it ultimately isn’t a problem. Even D3D is moving in that direction.

I don't want to have to translate what I want before I call the API, i.e. 'I want video memory, therefore I have to use STATIC_DRAW.'
Would you prefer an API that changes every time the hardware goes through some kind of alteration? What happens if there is no video memory, if there is some other kind of scheme involved instead? Is PCIe memory the same as AGP memory, or, due to the increased bandwidth, would you consider it more acceptable? It’s better to let the driver decide such things.

More importantly, you believe that STATIC_DRAW gives you video memory. You believe this due to empirical data based on the performance characteristics of today's drivers (nVidia drivers; nothing has been said about how ATi cards handle this). What about the next driver release, which might decide to dump those buffers to AGP? What about the driver release that puts DYNAMIC_DRAW buffers into video RAM?

Perhaps what I should really be asking for is STREAM_DRAW, but I'm not entirely sure what that is even after reading the spec.
Even if the spec is unclear, the nVidia document is very clear: n-draw-to-n-uploads. A buffer that is frequently uploaded to, typically once per use. This changes frequently enough that sync issues may crop up and the driver may want to double-buffer stuff, or orphan the buffer when you respecify it. Dynamic is less frequent than stream.

Remember, these hints aren’t simple, “Put me in X-RAM,” things; they are hints about the expected user behavior. They tell the driver how you intend to use the buffer so that it can make this usage pattern more optimal for you. It’s important for a proper abstraction to know what it is you intend to do with the memory.
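For instance, the hint is just the final argument when the storage is specified. A minimal sketch (the buffer names and sizes are made up):

```c
/* The usage hint is the last argument to BufferData: it describes
   intended update frequency, not a memory pool. Assumes a current
   GL context with ARB_vertex_buffer_object entry points. */
#include <GL/gl.h>
#include <GL/glext.h>

void specify_buffer_storage(GLuint level_vbo, GLuint ui_vbo,
                            GLuint stream_vbo)
{
    /* Written once, drawn many times. */
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, level_vbo);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, 1 << 20, NULL, GL_STATIC_DRAW_ARB);

    /* Rewritten occasionally, drawn many times between rewrites. */
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, ui_vbo);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, 1 << 16, NULL, GL_DYNAMIC_DRAW_ARB);

    /* Rewritten about once per use, e.g. per-frame geometry. */
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, stream_vbo);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, 1 << 20, NULL, GL_STREAM_DRAW_ARB);
}
```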

The only real question is when you switch from one to the other. How many uploads per second does it take before you should use DYNAMIC rather than STATIC, and how many before you should use STREAM rather than DYNAMIC?

Perhaps the problem is that I'm coming from having used the VAR extension, which did it the right way, IMO.
VAR is very clearly not the right way to do it, for any number of reasons. Besides its horrible usage hint mechanism (the fact that nVidia had to tell you which hints to use to get specific memory should be a clue that it’s a bad hint mechanism), there’s the fact that the application is poking directly at driver-side memory, which is never a good thing except under specific conditions.

Say you want 30MB of video memory, and say your target card will only give you 29MB.
You're thinking like an opponent of driver-managed textures again. Equally importantly, you're allocating 30MB of space. With 64-byte vertices, that's almost 500,000 vertices all in one VBO. This is not a good idea right from the start. Huge VBOs are a bad idea, precisely due to problems like this. I hope you're not treating VBO like VAR, where you allocate a big block and manage it yourself.

Equally importantly, the amount of video memory available to you is not fixed; it varies depending on texture needs. You have two systems competing for the same resource. And remember, you can't texture from AGP, but you can read vertices from there. So, ultimately, textures are more important. The more video memory you take up with VBOs (if the driver doesn't page them out to AGP), the fewer resident textures you can have, and the greater the likelihood that, in real-world situations (as opposed to a demo), you'll start thrashing your textures. This is why you want the driver to manage your VBOs.

On reflection, in the case of a card with 29MB of video memory available for VBOs and 30MB requested, the driver might internally split the buffer into 29MB video and 1MB AGP; but with no mechanism to find out, how do we know?
VBOs almost certainly cannot be split internally. One more reason not to treat VBO like VAR.

You have two choices. One, you can poke at the system and hope that it gives you what you think you want, and hope that it doesn't change in another driver revision. Or two, you can trust the driver to know what's best for you in real-world situations, and if you find yourself with a piece of AGP memory, just accept it as being better overall for you. Oh, and telling your users to set up their AGP aperture properly isn't a bad idea either.

VBO is at exactly the right place for an abstraction of vertex arrays. It's general enough to cover all kinds of memory, and it allows the driver to manage the memory properly (moving it from video to AGP and back when needed). Meanwhile, it allows the user to inform the driver of how he intends to use the memory. Not even texture objects let you do that.

I suggest holding the terrain vertex data in a static VBO (you may need to load the data from disk dynamically, and to divide the terrain into small chunks; in my terrain engine, the chunk size is 64 x 64).
Then you only need to create a dynamic index buffer for rendering LOD terrain (see Game Programming Gems 2).
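A minimal sketch of that split for a single 64 x 64 chunk; the names are illustrative, and the LOD index generation itself is left out:

```c
/* Static vertex VBO plus a per-frame dynamic index buffer, as
   suggested above. Assumes a current GL context with the
   ARB_vertex_buffer_object entry points; the LOD scheme supplies
   lod_indices each frame. */
#include <GL/gl.h>
#include <GL/glext.h>

#define CHUNK_VERTS (64 * 64)
#define MAX_INDICES (63 * 63 * 6)  /* worst case: full-detail grid */

void create_chunk(GLuint vert_vbo, GLuint index_vbo,
                  const GLfloat *verts /* CHUNK_VERTS * 3 floats */)
{
    /* Vertices never change: STATIC_DRAW, uploaded once. */
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vert_vbo);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB,
                    CHUNK_VERTS * 3 * sizeof(GLfloat),
                    verts, GL_STATIC_DRAW_ARB);

    /* Indices are respecified per frame: STREAM_DRAW, no data yet. */
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, index_vbo);
    glBufferDataARB(GL_ELEMENT_ARRAY_BUFFER_ARB,
                    MAX_INDICES * sizeof(GLushort),
                    NULL, GL_STREAM_DRAW_ARB);
}

void draw_chunk_lod(GLuint vert_vbo, GLuint index_vbo,
                    const GLushort *lod_indices, GLsizei count)
{
    /* Respecify (orphan) then fill this frame's index set. */
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, index_vbo);
    glBufferDataARB(GL_ELEMENT_ARRAY_BUFFER_ARB,
                    MAX_INDICES * sizeof(GLushort),
                    NULL, GL_STREAM_DRAW_ARB);
    glBufferSubDataARB(GL_ELEMENT_ARRAY_BUFFER_ARB, 0,
                       count * sizeof(GLushort), lod_indices);

    /* Draw from the static vertices using this frame's indices. */
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vert_vbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *) 0);
    glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_SHORT,
                   (const GLvoid *) 0);
    glDisableClientState(GL_VERTEX_ARRAY);
}
```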

Korval, so what you're basically saying is that we should be looking at AGP size, i.e. 32/64/128, and then use one of those numbers as the target for our app? OK, but what about people with 256MB video cards? Or 512MB? And do I always rely on AGP speeds to define my optimums? Can I not rely on the fat pipe on the card? So when I'm making a level, how much memory can I use? How big can my VBOs be? How about index buffers? Can they be a gigabyte in size? Why not? The API certainly doesn't constrain me. So when my app works as expected on one piece of hardware and then not on another, how do I know where my problems are? Do I just keep knocking off polys until the app runs at speed? Where are the limits? So an end user calls me telling me his app doesn't run fast; what now? All he can tell me is that he has a black box. Don't you see that you need some hardware information, to be dependent on a hardware minimum or cut-off point? Surely, if this is the correct way of doing it, as I think everyone agrees, then we are hardware dependent and shouldn't pretend to be pushing hardware-abstraction idealism. It's not a silver bullet; in fact it can be detrimental to our task.

Korval, so what you're basically saying is that we should be looking at AGP size, i.e. 32/64/128, and then use one of those numbers as the target for our app?
I didn't say anything about looking at the size of memory to determine anything.

So when I'm making a level, how much memory can I use?
Whatever you feel is appropriate for the minimum hardware you’re shooting for.

How big can my VBOs be?
Not very. As in 4-8MB or less.

Can they be a gigabyte in size? Why not?
Because 1GB is a ridiculous number to begin with, and you know it. Much like you wouldn't expect to get very far using a 4096x4096, 128-bit floating-point texture (256MB for one texture), you shouldn't expect ludicrous things out of your hardware just because the API says it's OK to ask.

Do I just keep knocking off polys until the app runs at speed? Where are the limits?
Once again, look back at the texture object question. How can you tell when you’re using too many textures? How can you tell when you’re going to thrash? You can’t. All you can do is make a reasonable guess and see what happens.

Your “limits” should be those of reasonability. Is it reasonable to use 30MB of vertex data for a single object? Probably not. Is it reasonable to have a 1GB buffer? Almost certainly not. It’s a question of using basic common sense, likely informed through some experimentation.

I would welcome a more stringent, better-defined GL spec that favors us, not the IHVs. The spec is too loose right now, and we have to rely on IHVs to tell us how their drivers do things. The point of the GL spec is thus diminished. BTW, can anyone tell me how long a C++ function name can be? (I can't find it in the C++ spec.) Sort of the same problem with C++: too many undefined behaviors.

I would welcome a more stringent, better-defined GL spec that favors us, not the IHVs.
Feel free to use D3D, then. Though they're going to switch to a more OpenGL-like paradigm in their next revision, so you only have a fairly short window of opportunity to get at those low-level guts.

BTW, can anyone tell me how long a C++ function name can be? (I can't find it in the C++ spec.) Sort of the same problem with C++: too many undefined behaviors.
Is this a serious question?

The technical answer is that it is compiler dependent; C++ doesn’t define a maximum length. However, most compilers either have no limit or have a reasonably large one (256 characters is plenty).