Pixel Buffer Object
What they are not
There are many misconceptions about what PBOs are, so this section starts by explaining what they are not.
PBOs are not connected to textures. They are only used to perform pixel transfers; the buffer objects used in this process do not become connected to the texture in any way.
In standard pixel transfer operations, the pixel transfer functions are not permitted to return until the client-side memory is no longer in use. For uploading (pixel unpack), this means that, at a minimum, the OpenGL implementation must copy the memory into an internal buffer in order to do an asynchronous DMA transfer. For downloading (pixel pack), this is much worse, as the entire download operation must take place immediately. If the source of the download is still in use, like a render target, this forces a partial or full flush.
By allowing OpenGL to manage the memory used as the source or destination of pixel transfer operations, OpenGL is able to avoid explicit synchronization until the user accesses the buffer object. This means that the application can be doing other things while the driver is downloading or uploading pixel data. Fence sync objects can be used to ask whether the process is complete without stalling the CPU.
Every function that performs a pixel transfer operation can use buffer objects instead of client memory. Functions that perform an upload operation (a pixel unpack) use the buffer object bound to the GL_PIXEL_UNPACK_BUFFER target. Functions that perform a download operation (a pixel pack) use the buffer object bound to the GL_PIXEL_PACK_BUFFER target.
These functions only use buffer objects if one is bound to that particular binding point when the function is called. If a buffer is bound, then the pointer value that those functions take is interpreted not as a pointer, but as a byte offset from the beginning of that buffer.
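The offset-as-pointer convention can be sketched with a small helper. This is only an illustration: the name pbo_offset is hypothetical and not part of OpenGL, and the commented call assumes a 2D RGBA texture and a buffer object named pbo that already holds the pixel data.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode a byte offset into the bound pixel buffer as the "pointer"
 * argument that pixel transfer functions take. The application never
 * dereferences this value; OpenGL reads it back as an offset. */
const void *pbo_offset(size_t byte_offset)
{
    return (const void *)(uintptr_t)byte_offset;
}

/* Hypothetical use, with a buffer bound to GL_PIXEL_UNPACK_BUFFER:
 *
 *   glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
 *   glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
 *                   GL_RGBA, GL_UNSIGNED_BYTE, pbo_offset(0));
 *   glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
 *
 * Unbinding afterwards matters: with no buffer bound, the same argument
 * is treated as a real client-memory pointer again.
 */
```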
PBOs are primarily a performance optimization (though in the days before Transform Feedback, they were a way to rasterize data directly to a buffer object). In particular, PBOs are a way to improve asynchronous behavior between the application and OpenGL.
Therefore, the first thing that you need to do in order to properly take advantage of them is to actually have something to do while you are waiting for the transfer to complete. If you are downloading pixel data, and you map the buffer for reading immediately after calling glReadPixels, you aren't getting anything from PBOs.
PBOs are used in two circumstances: uploading and downloading. Each has different needs.
In general, uploads are a fire-and-forget operation. You hand OpenGL some pixel data to store in a texture, and that's the end of it. The benefits of PBOs in this case are less pronounced, as most OpenGL drivers optimize client-side pixel transfers by copying the data to internal memory anyway. Most of what you gain is the ability to load data directly into the PBO itself, which means that OpenGL won't need to copy it. You may even be able to stream data directly from disk into a mapped buffer.
There are two key things to do here. The first is proper formatting of the data, as discussed in the Pixel Transfer Best Practices article. Those tips apply equally well to PBOs; indeed, they are even more vital here.
The second key is to make sure that you do not start overwriting a buffer before it has finished uploading. You can have multiple buffers that you switch between (but not one per texture). You can also employ some buffer object streaming techniques, as well as using sync objects to detect when a transfer is finished.
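One way to picture switching between multiple buffers is a small ring of slots, each standing for one PBO. The sketch below tracks only the bookkeeping; UploadRing, ring_acquire, and ring_release are hypothetical names, and the ring size of 3 is an arbitrary assumption. In a real application, the point at which a slot is released would be decided by a sync object (for example, a fence created with glFenceSync after the transfer call and polled with glClientWaitSync using a zero timeout).

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 3  /* assumed number of PBOs in the ring */

typedef struct {
    bool busy[RING_SIZE]; /* true from hand-out until the transfer is reported complete */
    size_t next;          /* slot to try first on the next acquire */
} UploadRing;

/* Return the index of a slot that is safe to overwrite, or -1 if every
 * slot still has a pending transfer. */
int ring_acquire(UploadRing *ring)
{
    for (size_t i = 0; i < RING_SIZE; ++i) {
        size_t slot = (ring->next + i) % RING_SIZE;
        if (!ring->busy[slot]) {
            ring->busy[slot] = true;
            ring->next = (slot + 1) % RING_SIZE;
            return (int)slot;
        }
    }
    return -1;
}

/* Mark a slot reusable once its transfer is known to have finished. */
void ring_release(UploadRing *ring, int slot)
{
    ring->busy[slot] = false;
}
```

When ring_acquire returns -1, the application can either do other work and retry later or fall back to waiting on the oldest pending transfer; which is better depends on the workload.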
Also, remember that you can upload multiple mipmaps into a buffer and transfer all of them with a quick succession of transfer calls. You can do this for several textures at a time.
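Packing several mipmaps into one buffer comes down to computing where each level starts. The following is a minimal sketch, assuming an uncompressed format with a fixed number of bytes per pixel and tightly packed rows; the function name mip_chain_offsets and its parameters are hypothetical.

```c
#include <stddef.h>

/* Compute the byte offset of each mipmap level inside one tightly packed
 * buffer. Writes at most max_levels entries to offsets[], stores the total
 * size in bytes in *total_size, and returns the number of levels written.
 * Each level is half the size of the previous one, clamped to 1. */
unsigned mip_chain_offsets(unsigned width, unsigned height,
                           size_t bytes_per_pixel,
                           size_t *offsets, unsigned max_levels,
                           size_t *total_size)
{
    unsigned level = 0;
    size_t offset = 0;

    while (level < max_levels) {
        offsets[level] = offset;
        offset += (size_t)width * height * bytes_per_pixel;
        ++level;
        if (width == 1 && height == 1)
            break; /* the 1x1 level ends the chain */
        width = width > 1 ? width / 2 : 1;
        height = height > 1 ? height / 2 : 1;
    }

    *total_size = offset;
    return level;
}
```

Each level would then be submitted with its own transfer call, passing offsets[i] as the pointer argument while the buffer is bound to GL_PIXEL_UNPACK_BUFFER. For formats whose row size is not a multiple of the unpack alignment, the tight-packing assumption only holds if GL_UNPACK_ALIGNMENT is set accordingly.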
Of course, PBOs take up memory, so there is a practical limit on how many you can keep around.
Downloads are really where PBOs shine, performance-wise. The savings when using PBOs for downloads are substantial, again provided that you have something else to do while the transfer is in progress.
OpenGL implementations are very asynchronous. In rendering-heavy cases, you may already be sending rendering commands for the next frame while the GPU is still rendering the current one. This will happen even if you swapped buffers between them.
This is all well and good. The problem is that, if you start downloading a render target, then immediately clear that same buffer and start rendering to it again, you will introduce a pipeline stall. OpenGL will have to wait for the DMA from the buffer to finish before it can issue new rendering commands. That's not as bad as downloading without PBOs, but it isn't optimal.
Therefore, if you are going to read from the render target every frame, you may want to consider having two render targets and switching between them. Obviously, with resolutions being as large as they are these days, this will take up a lot of memory. It's something of a memory vs. performance tradeoff.
However, it is also likely that you will be able to render to other things anyway. For example, if you are doing some shadow mapping, you have to render to the shadow map, then use that to shadow objects in the actual scene. Well, while you're rendering to your shadow map, you're not rendering to that buffer you want to read from. So creating a second render target may not even be necessary. You can hide the DMA in the shadow render pass, as long as you don't modify the main render target once you start the pixel transfer.