PBOs and Memory

henniman · December 26, 2007, 11:28am

Hi all,

I heard somewhere that PBOs use up to
three times the memory they were allocated
with.

Can someone clarify this? Is all this memory allocated
in video, AGP, or system memory?

best,
hendrik

link:
http://www.mathematik.uni-dortmund.de/~goeddeke/gpgpu/tutorial3.html

Zengar · December 26, 2007, 1:35pm

It all depends on the driver. It may choose the optimal memory type for you, based on the buffer usage flag. Maybe there is also some memory mirroring going on. Still, the specification is silent about implementation details, so your question can’t be answered for sure, except by people who have actually written drivers. I am very surprised at this claim and would like to know where the author got his evidence.

henniman · January 6, 2008, 9:14am

That leads to a question about
the memory pointers returned by glMapBuffer():
are they actually video / AGP memory mapped into
my program’s address space, or are they a newly allocated
chunk of memory that will be transferred via DMA to
the ‘real’ location?

.h

Zengar · January 6, 2008, 10:51am

You can never know. Again, it depends on the driver. Maybe the Nvidia/ATI/Intel driver teams can answer this question if you ask them directly (provided you are of a certain importance to those companies).

tamlin · January 6, 2008, 12:43pm

As OpenGL can run over a network, mapping a buffer only gives you a pointer to local memory. Whether it really points to a mapped area of the local graphics card, to unified memory (e.g. Intel), or to a temporary buffer that is sent over the bus once it’s unmapped, you never know.

You are however right that OpenGL may use three times the amount of memory. In the networking case it’s not likely, but say we’re on Win32: the driver has to map the card’s memory into kernel-mode address space (a good reason the cards don’t expose a single 1GB PCI memory area), and that memory area (section) then has to be made writable for the process that has been given the right to use it. The driver also has to make sure the contents remain available in case Windows switches resolution (or something) and invalidates almost all of the memory on the video card, and the only way to do that is to allocate another buffer (in the context of the calling process) and copy the contents to it before disaster strikes. This applies to all buffers.

This is one of the areas where OpenGL 3.x will help (I’m told), as it will finally be possible to tell it “I don’t want you to ‘manage’ this buffer. If it somehow gets invalidated, tell me and I’ll upload the data again”. That is what DirectX has done since inception (but on that side you had no option).

I suspect this is where you got “3x buffer memory required” from.

arekkusu · January 6, 2008, 12:55pm

Apple has an extension for this today: APPLE_object_purgeable

imported_jwatte · January 6, 2008, 9:41pm

The mappings of the hardware registers don’t take any physical RAM, only virtual address space (which is bad enough on 512 MB+ cards – 64-bit, here we come!).

The GL driver doesn’t HAVE to keep the memory in system RAM if you’re not on Windows 98/ME. On Windows XP and up, the driver model will give you enough warning about mode switches that you can copy the data back from the card. However, the problem is that you’re not guaranteed to be able to allocate all that memory in system RAM at that point, which means you’re hosed in low-memory situations.

Thus, GL drivers likely keep one copy of the data in system RAM, and another copy in VRAM, for local graphics adapters. If you are streaming a lot of data (texture uploads, buffer objects, etc.), then chances are that the driver will transiently use more memory than that, because it has to store both the new data waiting to be used and a copy of the old data until the new data has replaced it.
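To put numbers on that accounting, here is a minimal sketch in C. It is only a model of the copies described above, not anything a driver exposes: the `BufferCopies` struct and `buffer_footprint` function are made-up names for illustration, and a real driver may keep fewer or more copies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping, not a driver API: which copies of one
   buffer object exist at a given moment. */
typedef struct {
    bool vram_copy;       /* copy resident on the graphics card */
    bool system_shadow;   /* backup copy the driver keeps in system RAM */
    bool pending_update;  /* new data staged while the old data is still in use */
} BufferCopies;

/* Total bytes held for a buffer of `size` bytes under this model. */
size_t buffer_footprint(size_t size, BufferCopies copies)
{
    size_t count = 0;
    if (copies.vram_copy)      count++;
    if (copies.system_shadow)  count++;
    if (copies.pending_update) count++;
    return size * count;
}
```

For a 4 MiB PBO this model gives 8 MiB in the steady state (VRAM plus system RAM shadow) and 12 MiB while an update is in flight, which is one way to arrive at the "up to three times" figure.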

When you map the buffer, you may get a pointer to some hardware or AGP memory, or you may just get a pointer to a local bulk data buffer that has been mapped for future DMA to the card. Which you get varies by hardware, driver, operating system, and sometimes may even vary within the same run of the program based on local resource availability.
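Since you cannot know what kind of memory the pointer refers to, the safe pattern is to treat it as a write-only destination and to check the result of the unmap. glUnmapBuffer() returns GL_FALSE if the buffer contents were lost while it was mapped (a screen mode change is the usual example), in which case the data has to be written again. The sketch below shows that retry loop. So that it runs without a GL context, the GL calls are replaced by function pointers: `BufferMapper`, `upload_with_retry`, and the `FakeBuffer` test double are hypothetical names for illustration, not part of OpenGL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the GL entry points:
   map   ~ glMapBuffer(target, GL_WRITE_ONLY), returns NULL on failure
   unmap ~ glUnmapBuffer(target), returns false if the contents were lost */
typedef struct {
    void *(*map)(void *ctx);
    bool  (*unmap)(void *ctx);
    void  *ctx;
} BufferMapper;

/* Copy `size` bytes into the buffer, repeating the upload whenever unmap
   reports that the contents were lost. Returns true once an upload sticks. */
bool upload_with_retry(const BufferMapper *m, const void *src,
                       size_t size, int max_attempts)
{
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        void *dst = m->map(m->ctx);
        if (dst == NULL)
            return false;        /* mapping failed, nothing to unmap */
        memcpy(dst, src, size);  /* write only, never read back through dst */
        if (m->unmap(m->ctx))
            return true;         /* contents are intact */
        /* contents were lost while mapped: map again and rewrite */
    }
    return false;
}

/* Test double: plain memory that reports a lost buffer a set number of times. */
typedef struct {
    unsigned char storage[64];
    int unmaps_to_fail;
    int map_calls;
} FakeBuffer;

void *fake_map(void *ctx)
{
    FakeBuffer *f = ctx;
    f->map_calls++;
    return f->storage;
}

bool fake_unmap(void *ctx)
{
    FakeBuffer *f = ctx;
    if (f->unmaps_to_fail > 0) {
        f->unmaps_to_fail--;
        return false;
    }
    return true;
}
```

In a real program, `map` and `unmap` would be thin wrappers around glMapBuffer() and glUnmapBuffer() on the bound PBO target. The attempt limit is a design choice of this sketch, so that a persistently failing unmap cannot spin forever.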