My program streams vertices into the GPU and streams the rendered (and antialiased) framebuffer back.
For vertex streaming I use the technique recommended by Rob: http://www.opengl.org/wiki/Buffer_Object_Streaming
Now I want to maximise the performance of the framebuffer readback.
This is what I currently do:
Init:
create the first renderbuffer, with multisampling (the AA renderbuffer)
create a framebuffer and attach the AA renderbuffer to it
create a second renderbuffer, without multisampling
create another framebuffer and attach the non-AA renderbuffer to it
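In GL calls, the init steps above look roughly like this. This is only a sketch, not runnable on its own (it assumes a current GL 3.x context, and WIDTH, HEIGHT and SAMPLES are placeholder constants):

```c
/* Placeholder names; a current core-profile context is assumed. */
GLuint aaRbo, aaFbo, resolveRbo, resolveFbo;

/* Multisampled renderbuffer + framebuffer (rendered into each frame). */
glGenRenderbuffers(1, &aaRbo);
glBindRenderbuffer(GL_RENDERBUFFER, aaRbo);
glRenderbufferStorageMultisample(GL_RENDERBUFFER, SAMPLES, GL_RGBA8, WIDTH, HEIGHT);
glGenFramebuffers(1, &aaFbo);
glBindFramebuffer(GL_FRAMEBUFFER, aaFbo);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, aaRbo);

/* Single-sampled renderbuffer + framebuffer (target of the resolve blit). */
glGenRenderbuffers(1, &resolveRbo);
glBindRenderbuffer(GL_RENDERBUFFER, resolveRbo);
glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA8, WIDTH, HEIGHT);
glGenFramebuffers(1, &resolveFbo);
glBindFramebuffer(GL_FRAMEBUFFER, resolveFbo);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, resolveRbo);

/* Both framebuffers should report GL_FRAMEBUFFER_COMPLETE here. */
```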
Frame start:
Bind AA framebuffer to GL_FRAMEBUFFER
Clear to white
~Draw stuff~
Frame end:
Bind non-AA framebuffer to GL_DRAW_FRAMEBUFFER
Bind AA framebuffer to GL_READ_FRAMEBUFFER
Resolve the AA framebuffer into the non-AA one using glBlitFramebuffer
Bind non-AA framebuffer to GL_FRAMEBUFFER
Bind output pixel buffer to GL_PIXEL_PACK_BUFFER
Copy non-AA framebuffer to the PBO using glReadPixels
Map output PBO
memcpy the output image from the PBO to a heap-allocated buffer
Unmap output PBO
~Do something with output~
Repeat
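The frame-end steps can be sketched as follows. Again, this is a fragment, not a complete program: it assumes the objects from init plus a pre-created PBO (`pbo`, sized WIDTH*HEIGHT*4 bytes, usage GL_STREAM_READ) and a destination buffer `output`:

```c
/* Resolve the multisampled framebuffer into the single-sampled one. */
glBindFramebuffer(GL_READ_FRAMEBUFFER, aaFbo);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, resolveFbo);
glBlitFramebuffer(0, 0, WIDTH, HEIGHT, 0, 0, WIDTH, HEIGHT,
                  GL_COLOR_BUFFER_BIT, GL_NEAREST);

/* Read back into the PBO.  With a buffer bound to GL_PIXEL_PACK_BUFFER,
 * the last argument to glReadPixels is a byte offset into that buffer,
 * and the call can return without waiting for the transfer to finish. */
glBindFramebuffer(GL_READ_FRAMEBUFFER, resolveFbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glReadPixels(0, 0, WIDTH, HEIGHT, GL_RGBA, GL_UNSIGNED_BYTE, 0);

/* Mapping is where the CPU may stall if the transfer is still in flight;
 * this is what the PBO ping-ponging below is meant to hide. */
void *src = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
memcpy(output, src, WIDTH * HEIGHT * 4);
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
```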
Currently my ‘do something with output’ is just saving the result to a PNG file, but later will change to storage (as a PNG) in a memory or file cache.
So what algorithms, techniques, tricks are out there for speeding this up?
The only one I have found is to have two PBOs and ping-pong them. Does using more than two provide a further boost, or does the driver internally keep several buffers when you ‘orphan’ the PBO?
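For what it's worth, the usual generalisation of ping-ponging is a ring of N PBOs: each frame you issue glReadPixels into the current buffer and map the one that was queued N-1 frames ago, so the GPU has that many frames to finish the transfer before the CPU touches the data. The GL calls aside, the bookkeeping is just modular arithmetic; all names and the choice of N here are illustrative:

```c
#include <assert.h>

#define NUM_PBOS 3  /* illustrative; worth tuning per driver/hardware */

/* Ring-buffer bookkeeping for NUM_PBOS in-flight PBOs.  Each frame:
 * issue glReadPixels into pbo[pbo_write_index()], then map
 * pbo[pbo_read_index()], which was filled NUM_PBOS-1 frames earlier
 * and should have finished transferring by now. */
typedef struct {
    unsigned frame;  /* number of frames issued so far */
} PboRing;

/* Index of the PBO to issue this frame's glReadPixels into. */
static unsigned pbo_write_index(const PboRing *r) {
    return r->frame % NUM_PBOS;
}

/* Index of the oldest in-flight PBO, i.e. the one safe to map. */
static unsigned pbo_read_index(const PboRing *r) {
    return (r->frame + 1) % NUM_PBOS;
}

/* The read index is only meaningful once the ring has been primed,
 * i.e. after the first NUM_PBOS-1 frames have been issued. */
static int pbo_ring_primed(const PboRing *r) {
    return r->frame >= NUM_PBOS - 1;
}
```

Each frame you would then call glReadPixels with `pbo[pbo_write_index(&r)]` bound, map `pbo[pbo_read_index(&r)]` if `pbo_ring_primed(&r)`, and increment `r.frame`.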
What about multiple sets of framebuffer/renderbuffers?
The fastest I managed to get so far was by using several PBOs (I think it was 5).
Another interesting thing: when I tried calling glBufferData with NULL (to orphan the PBO) before glReadPixels, it was a lot slower than not doing it.