Memory consumption of vertex arrays with VBOs

I work on an app that is memory intensive. We use lots of schemes to keep our footprint under the OS max, and of course we also have performance issues. I’ve recently put in some low-res mesh code that I’m caching on the card using VBOs. I have the impression, which I’ll verify myself pending your feedback, that the graphics driver not only caches these arrays on the card but also makes an additional copy in system memory. I would only really expect it to do this if the card had no more memory and needed to swap the data out, but it looks like it does this regardless.

So here are the questions.

  1. Does the driver make an extra copy of the VBOs in system memory?
  2. Is there a simple way to determine if there is enough fast memory available for a VBO before sending it to the card?

Thx,

–Vincent

My understanding is that VBOs aren’t managed like textures are in the GL: if your card is outta memory and you request a VBO, it just fails. Anyone know better?

What usage flag are you setting when creating your VBO: static or dynamic? It gives the driver a hint about how you intend to use the buffer, which may affect whether it keeps a local copy or not.

  1. Does the driver make an extra copy of the VBOs in system memory?

Yes. Or, at least, you should assume it does. It has to on XP, but it doesn’t have to on Vista (due to the virtualization of video memory in the Vista driver model).

  2. Is there a simple way to determine if there is enough fast memory available for a VBO before sending it to the card?

No. Creating a buffer object can fail with a GL error, but that would be the only way to tell if you had exceeded some limit. And drivers will reposition your buffer object anyway, so it isn’t something you should be concerned about.

…Yes. Or, at least, you should assume it does.

Yikes! So the advent of 1 GB+ cards is basically useless if I’m not running 64-bit or Vista. I suppose it’s a justification for moving on to Vista… I’ll try to run some tests to confirm this.

Thx --Vincent

Kinda-sorta, yes, as the ARB didn’t (or hasn’t yet) delivered on the intention to make buffers more D3D-like (application code refilling lost buffers, instead of the rendering API managing them).

or Vista. I suppose it’s a justification for moving on to Vista…
Vista would AFAIK get you nothing (unless you also swap to D3D/DDraw; the memory management is AFAIK in DDraw). In OpenGL the promise is that buffers are valid after creation. This implies that, to even be able to swap out from VRAM to application memory, the “driver” needs to reserve that address space in the context of the calling process (so it can allocate it and swap the data there if needed), which implies that address space gets exhausted just as fast on Vista as on XP or earlier (actually, due to OS overhead and other issues, Vista could chew up your memory at an even greater rate).

The only options I see (so far) are either a 64-bit OS (Vista, XP, or one of the many non-MS OSes), or a gfx API whose implementation does not reserve memory for lost buffers (i.e. DX :( ).

In OpenGL the promise is that buffers are valid after creation. This implies that, to even be able to swap out from VRAM to application memory, the “driver” needs to reserve that address space in the context of the calling process (so it can allocate it and swap the data there if needed), which implies that address space gets exhausted just as fast on Vista as on XP or earlier

That doesn’t make any sense. Vista does VRAM virtualization, so it can swap out video memory whenever it feels like. It also swaps it back in. The fact that buffers are valid after creation is meaningless, because after creation, they should be connected to a piece of VRAM (if the driver put it there).

Korval, I stand partially corrected.

While the “driver” (in Vista’s case that may be driver+OS, but I read something…) indeed doesn’t have to reserve that space in the context of the calling application, it has to reserve it somewhere (to be able to swap, to fulfill the promise that an allocated buffer stays). For NT-class operating systems there are but two areas available: user mode and kernel mode. Swapping is an MMU-setup thing, meaning the address space has to be available somewhere.

If the driver had the option of creating n processes, each with their own address space, and making them reserve/allocate page-file space on behalf of the kernel driver (i.e. splitting the resource load, which in this case is virtual address space), it could indeed allow a total allocation larger than the address space available to either kernel mode or user mode.

AFAIK that’s not the case.

Also, when it comes to buffers, they are not “connected” to a piece of VRAM. Even if you successfully map a buffer, it could still (theoretically) be pointing to an area of system RAM, not VRAM (heck, in the UMA case it is :) ).

I hope it makes sense now.

I have the same problem.

If I use glBufferDataARB with STATIC usage, OpenGL reserves memory somewhere, so I can safely delete my data buffer (i.e. my application’s copy of the vertex data) and everything still works?

Am I correct?

Thank you

Lord Cir: If you’re asking if you can delete your local copy, then yes of course you can.

So the VBO data is duplicated in local system memory, huh? Would be nice if there were a way to access it, or to tell GL where it can be accessed, so that when it needs a GPU copy it can grab it from the location I give it… otherwise I’m duplicating everything in RAM, since I need a copy for collision detection…

The VBO buffers are getting quite hefty in my project, upwards of 300 MB combined if the settings are put way up (procedural content). That’s with only 8 bytes per vert and ushorts for indices.

otherwise I’m duplicating everything in RAM since I need a copy for collision detection…

Well, I would hope that your collision detection data stores the data in a way that is optimal for collision detection (some kind of BSP or something) rather than simply using data in the form used for rendering.

So when I map a vertex buffer and write “directly” into it, what does the driver do? I always thought this allowed writing directly into GPU memory (which is why one was supposed to memcpy in multiples of 64 bytes). If the driver duplicated it in system memory, then either I’m actually mapping the sys-mem copy and the driver copies the data to the GPU when I unmap (but then writing in 64-byte multiples wouldn’t be necessary, I think), or it copies the data back from the GPU at some point. All in all, the whole concept of mapping a buffer for “direct” access would be kinda pointless.

Jan.

I always thought this allowed writing directly into GPU memory (which is why one was supposed to memcpy in multiples of 64 bytes). If the driver duplicated it in system memory, then either I’m actually mapping the sys-mem copy and the driver copies the data to the GPU when I unmap (but then writing in 64-byte multiples wouldn’t be necessary, I think), or it copies the data back from the GPU at some point. All in all, the whole concept of mapping a buffer for “direct” access would be kinda pointless.

The driver will do exactly and only what it feels is necessary to comply with the request. It could give you an AGP memory pointer, a video memory pointer, or a system memory pointer. It can even copy the buffer into a newly-allocated block of memory that it will upload into the buffer when you call Unmap.

The only assumption you should make is that it will be no slower than writing to your own temporary buffer and calling glBufferSubData.

Thank you for your replies.

Ya, but I need the raw data to feed to the collision library in the first place, and while I could do this right when it’s first generated, that would be slow (not to mention use far too much memory). So instead I want to just feed it the geometry when something is actually near it (its bounding box is hit or whatever). I guess I’ll have to perform a readback from the GPU or something, since I don’t want to waste all that memory.

Ya, but your collision library will likely store its own optimized representation of your geometry, so you’ll probably end up with little piles of geometry scattered about no matter how you slice it. If your world is large enough to really worry about memory, you might chunk it up into manageable pieces ahead of time, then page them in and out as the player buzzes about…

Another option is to generate more of your content/detail procedurally. Collision objects are usually decimated versions of the ones you render anyway, so you could get really spiffy procedurally while leaving your collision primitives relatively simple.

Thank you for the replies!