Pixelbuffers and what exactly is going on

First off, I have been reading up on pixel buffer objects in various sources but have failed to find the concrete details I am looking for. If I was just looking in the wrong places, then please feel free to redirect me.

In my understanding, calling something like
glBufferData(GL_PIXEL_PACK_BUFFER_ARB, bufferSize, NULL, GL_STATIC_READ_ARB);
will allocate bufferSize bytes somewhere in system memory and associate the currently bound buffer with this data. This will cost about as much as any normal memory allocation.

Then calling something like
glMapBuffer(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB)
will give a pointer to this system memory buffer.

When I call something like glReadPixels, data will be transferred into the bound pixel buffer’s area in system memory, and this transfer will use DMA.
The read into system memory is required to complete before a following call to glUnmapBuffer returns, but can otherwise be delayed by the driver.

Is that a correct understanding? Originally I believed that glMapBuffer was actually mapping a portion of device memory into I/O address space, so a read would come directly from the device, but other sources indicate that this is not the case.

If the mapping is to an actual buffer area in system memory, is it then possible to let a pixel buffer object DMA its data to another physical card without going through system memory? I have a video card (not a graphics card, but one that generates SDI video) whose SDK documentation claims that

1: p=glMapBuffer(…)
2: sdk_copy_asynch_to_video

would copy pixels directly between the cards. How would that be possible with the system memory buffer?

As far as I know, glBufferData allocates the memory on the GPU, then glMapBuffer copies the data to system memory and gives you a pointer to the copied data if mapping for reading only. If mapping for writing only, it copies the written system buffer to the GPU. This guess is reinforced by the fact that the pointer returned by mapping is invalid after a call to unmap, which suggests memory allocated on the fly (when mapping, not when calling glBufferData).
glReadPixels will read into GPU memory, but it is a mistake to call glReadPixels while the buffer is mapped. (From the reference pages: “A mapped data store must be unmapped with glUnmapBuffer before its buffer object is used. Otherwise an error will be generated by any GL command that attempts to dereference the buffer object’s data store.”)
But I think that in recent OpenGL versions you can call glMapBuffer asynchronously, that is, without waiting for a preceding call to glReadPixels to finish.
To answer your question: I think that no, it is not possible to DMA to your other card without going through system memory. Mapping copies the GPU buffer to system memory, through DMA and all, but still to system memory, so the example you posted actually does that.

glBufferData allocates driver memory. Whether this is actual GPU memory or not is not something you know or should care about. What matters is that it should be efficient for your needs.

glMapBuffer, likewise, does not necessarily return a pointer directly to this buffer. It may, but it may not.

“would copy pixels directly between the cards. How would that be possible with the system memory buffer?”

Well, it’s not possible regardless of what the map pointer is. Even if the mapped pointer is a GPU memory address, “sdk_copy_asynch_to_video” would still have to download its data from the MPEG card, store it temporarily on the CPU, and then write it to this pointer. I’m pretty sure PCI cards cannot directly communicate with one another.

“glMapBuffer, likewise, does not necessarily return a pointer directly to this buffer. It may, but it may not.”

By this, are you referring to what would happen if the GPU allocates device memory and maps this to its IO address space in which case it is still an indirect pointer?

“By this, are you referring to what would happen if the GPU allocates device memory and maps this to its IO address space in which case it is still an indirect pointer?”

I mean it could be anything. The pointer could be address mapped space. It could just be regular old memory allocated on the client: if you map to read, then the driver does a DMA before passing the pointer to you. If you map to write, the driver does a DMA when you unmap it.

The point is that you don’t know what this pointer is. And in general, you shouldn’t care.
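
To make the "regular old client memory" possibility concrete, here is a toy model in plain C. It is only a sketch of one strategy a driver might use, not how any real driver works: an ordinary array stands in for the driver-owned storage, memcpy stands in for DMA, and every name (toy_buffer, toy_map, toy_unmap, TOY_READ, TOY_WRITE) is made up for illustration. It also assumes a writer overwrites the whole buffer, which real mapping does not require.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: nothing here is an OpenGL call. */
enum toy_access { TOY_READ = 1, TOY_WRITE = 2 };

typedef struct {
    unsigned char *device;  /* stands in for driver-owned storage */
    unsigned char *staging; /* client-side copy, valid only while mapped */
    size_t size;
    int access;             /* flags of the current mapping, 0 if unmapped */
    int copies;             /* number of simulated DMA transfers so far */
} toy_buffer;

/* Allocate zero-filled "device" storage; returns 1 on success. */
int toy_init(toy_buffer *b, size_t size) {
    b->device = calloc(size, 1);
    b->staging = NULL;
    b->size = size;
    b->access = 0;
    b->copies = 0;
    return b->device != NULL;
}

/* Map: hand out a staging copy. Reading costs a transfer up front. */
void *toy_map(toy_buffer *b, int access) {
    if (b->access != 0 || access == 0)
        return NULL; /* already mapped, or no access requested */
    b->staging = calloc(b->size, 1);
    if (b->staging == NULL)
        return NULL;
    if (access & TOY_READ) {
        memcpy(b->staging, b->device, b->size); /* "DMA" device -> client */
        b->copies++;
    }
    b->access = access;
    return b->staging;
}

/* Unmap: writing costs a transfer now; the pointer becomes invalid. */
int toy_unmap(toy_buffer *b) {
    if (b->access == 0)
        return 0; /* not mapped */
    assert(b->staging != NULL);
    if (b->access & TOY_WRITE) {
        memcpy(b->device, b->staging, b->size); /* "DMA" client -> device */
        b->copies++;
    }
    free(b->staging);
    b->staging = NULL;
    b->access = 0;
    return 1;
}

/* Release all storage held by the toy buffer. */
void toy_free(toy_buffer *b) {
    free(b->staging);
    free(b->device);
    b->staging = NULL;
    b->device = NULL;
}
```

In this model, mapping with TOY_READ | TOY_WRITE pays for a transfer in both directions, while a driver that hands out a direct pointer would pay for neither. From the application side the two cases look the same, which is why the pointer should be treated as opaque.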

Thanks for the replies. I am just as happy with it this way as if it was direct. I just needed to know for sure what happens when, or at least which parts are unspecified, to understand how to optimize.

I have contacted the video card manufacturer with this new information and asked for an explanation about how they see this transfer happening without going through system memory.

So what happens if you do either of the following?

  • glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_WRITE);
    or
  • glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_READ_WRITE);

Will the driver first do a DMA from device memory, return
a pointer to driver memory, then do a DMA to device memory
after the pointer is unmapped?

“So what happens if you do either of the following?”

Again:

“The point is that you don’t know what this pointer is. And in general, you shouldn’t care.”

If you’re running on an AMD Fusion CPU/GPU combo, I’d imagine that it just gives you a pointer. Who knows what it does when running on a discrete GPU. You can’t control the behavior of mapping; you can only assume that the driver is doing the best it can and leave it at that.

Alfonse: thanks for the info. You’re right that implementation details should not bother us. It’s just great to be able to at least have an inkling of what actually happens behind the scenes. Any info is better than no info.

Absolutely agree with you. Using buffer objects can be a total Ouija board, with lots of performance potholes to fall into if you don’t have a good idea of what’s going on behind the scenes or decent guidance from a driver developer.

I’d highly recommend reading this thread thoroughly: “VBOs strangely slow?”. Possibly twice.