Using PBO and FBO for pixel readback

I have an application where I do rendering and then read back the rendered image. I do not need to use the readback data for rendering. Previously I used PBOs and framebuffers to read back the image, i.e.,

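while (running)
{
   render();
   swapBuffers();

   // start an asynchronous read of the current frame into pbo1
   glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo1);
   glReadPixels(..., 0);   // with a pack buffer bound, the last argument is a byte offset, not a pointer

   // map pbo2, which holds the previous frame
   glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo2);
   ptr = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
   // read back image from ptr
   glUnmapBuffer(GL_PIXEL_PACK_BUFFER);

   // swap pbo1 and pbo2 for next iteration
}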

Now I want to use an FBO instead of the default framebuffer. If I use a single FBO and a renderbuffer, I suspect that the GPU will not start rendering the next frame until the renderbuffer data has been copied to the PBO, thus causing a stall. I’m thinking of creating two renderbuffers and rendering to one while reading from the other (like in my previous method with framebuffers), but an FBO only has one depth attachment point. I can use 2 FBOs, but I read somewhere that switching between FBOs is slow.
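To make the two-target idea concrete, here is a minimal sketch of just the bookkeeping, with no OpenGL calls so it can be compiled on its own. It assumes two FBOs and two PBOs indexed 0 and 1; the names ReadbackState, readback_init and readback_step are made up for illustration, and the comments only indicate where the real GL calls would go. Whether this actually avoids a stall depends on the driver and still has to be measured.

```c
#include <assert.h>

#define NUM_SLOTS 2   /* assumption: two render targets and two PBOs */

/* Hypothetical bookkeeping: which frame each PBO currently holds (-1 = none). */
typedef struct {
    int pbo_frame[NUM_SLOTS];
} ReadbackState;

void readback_init(ReadbackState *s) {
    for (int i = 0; i < NUM_SLOTS; ++i) {
        s->pbo_frame[i] = -1;
    }
}

/* Runs one loop iteration for `frame` and returns the number of the frame
   whose pixels the CPU would read this iteration, or -1 if none is ready. */
int readback_step(ReadbackState *s, int frame) {
    assert(frame >= 0);

    int write_slot = frame % NUM_SLOTS;        /* rendered and packed this frame */
    int read_slot  = (frame + 1) % NUM_SLOTS;  /* PBO filled on the previous frame */

    /* Real code would do roughly:
       glBindFramebuffer(GL_FRAMEBUFFER, fbo[write_slot]); render();
       glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[write_slot]); glReadPixels(...); */
    s->pbo_frame[write_slot] = frame;

    /* Real code would do roughly:
       glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[read_slot]);
       glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY); copy; glUnmapBuffer(...); */
    return s->pbo_frame[read_slot];
}
```

Calling readback_step for frames 0, 1, 2, ... after readback_init shows the cost of the scheme: the first iteration has nothing to read, and every later iteration reads the frame rendered one iteration earlier, so the readback lags rendering by one frame.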

Speed is crucial in my application, so stalls have to be avoided. There is also not much for the CPU to do while waiting for the GPU to transfer the pixel data.

Any suggestions on what I can do? Thanks in advance.

There is also not much for the CPU to do while waiting for the GPU to transfer the pixel data.

If that’s true, then why use PBOs at all? The only reason to use a PBO is for asynchronous pixel data transfer. So if you’re not going to get anything from having asynchronous pixel data transfer, you may as well not bother.

Any suggestions on what I can do?

These statements contain a lot of speculation: “I suspect that…”, “I read somewhere that…”, etc.

The best way to answer any of these questions is to try them yourself and see what performance you get. Even with fairly synthetic tests, you should be able to get a feel for what will work for your needs.
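For such a synthetic test, a small timing helper is usually enough. The sketch below is one possible shape, not a definitive benchmark: frame_fn, average_ms and count_call are hypothetical names, and count_call is only a stand-in workload. In a real test the callback would run one render-plus-readback iteration, and it would probably need to end with glFinish() so that the measurement covers the GPU work and not just command submission.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical callback type: one iteration of the code path being measured. */
typedef void (*frame_fn)(void *user);

/* Calls fn `iterations` times and returns the average wall-clock time
   per call in milliseconds. Uses C11 timespec_get. */
double average_ms(frame_fn fn, void *user, int iterations) {
    assert(iterations > 0);

    struct timespec start, end;
    timespec_get(&start, TIME_UTC);
    for (int i = 0; i < iterations; ++i) {
        fn(user);
    }
    timespec_get(&end, TIME_UTC);

    double elapsed_ms = (double)(end.tv_sec - start.tv_sec) * 1e3
                      + (double)(end.tv_nsec - start.tv_nsec) / 1e6;
    return elapsed_ms / iterations;
}

/* Stand-in workload: only counts how often it was called. */
void count_call(void *user) {
    ++*(int *)user;
}
```

Running the same helper over each variant (single FBO, two renderbuffers, two FBOs) for a few hundred frames should give a rough per-frame cost to compare.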

Correct me if I’m wrong, but if I don’t use PBOs, the rendering thread will be blocked by glReadPixels() until data transfer is done. With PBOs, the thread can proceed to render the next frame while the data from the current frame is being copied to CPU memory simultaneously.

I agree the best way would be to try it out myself, but relative beginners like me take some time to figure things out and write the code, and we might not have much time to spare to try everything. That’s why I’m asking for experts’ opinions here on the theoretically best approach first.

Correct me if I’m wrong, but if I don’t use PBOs, the rendering thread will be blocked by glReadPixels() until data transfer is done. With PBOs, the thread can proceed to render the next frame while the data from the current frame is being copied to CPU memory simultaneously.

True. But the rendering thread is on the CPU. You said that you didn’t have any other CPU work to do; issuing OpenGL commands counts as CPU work :wink:

This is actually not true. Using a PBO “should” always be faster (if the driver is written correctly). The reason is that with a PBO the driver can use DMA to copy directly from GPU memory to CPU memory. If no PBO is used and you call glReadPixels, the DMA goes from GPU memory into a driver-local buffer, followed by a memcpy from that local buffer into your memory.

Explanation: Drivers cannot DMA directly into your buffer because its memory is not guaranteed to be locked (pinned). Doing so could crash the system if a page fault (page swap operation) occurs during the transfer.