Is anyone seeing faster performance with the NVIDIA FBO implementation than with render-to-frame-buffer + copy-to-texture? Or have you found any tips for making it run fast?
I'm rendering 17 FBO textures per frame (two 512x512, thirteen 256x256, two 128x128), alpha blending 2000-8000 particles per texture, and using the “single FBO, glFramebufferTexture” switching method. This is slower, by 3ms per frame, than rendering them to a corner of the back buffer and copying them off to textures with glCopyTexSubImage2D.
These are standard RGBA (8888) color textures bound to COLOR_ATTACHMENT0 with no depth buffer.
spasi:
Attaching a texture with different dimensions is one of the slow cases for FBOs.
That made a big difference. With a separate FBO per texture resolution, it’s now about 1ms faster than the copy-to-texture path on a 6800 Ultra. Amazing improvement considering how few FBO reconfigurations are involved.
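For anyone wanting to try the same change, here is a minimal sketch of the bookkeeping behind "one FBO per texture resolution". It is not the code from my renderer: RenderTarget, FboPerSizeCache and threadWorkload are hypothetical names, and the GL calls appear only in comments (assuming the EXT_framebuffer_object entry points) so the sketch compiles without a GL context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical description of one render target.
// textureId stands in for a GL texture name.
struct RenderTarget {
    std::uint32_t textureId;
    int width;
    int height;
};

// Hands out one FBO slot per distinct (width, height), so a given FBO
// only ever sees attachments of a single resolution.
class FboPerSizeCache {
public:
    // Returns the slot index of the FBO reserved for this resolution,
    // creating the slot on first use.
    // Assumption: a real renderer would call glGenFramebuffersEXT here
    // and store the resulting FBO name instead of a plain index.
    std::size_t fboFor(const RenderTarget& rt) {
        assert(rt.width > 0 && rt.height > 0);
        const auto key = std::make_pair(rt.width, rt.height);
        auto it = slots_.find(key);
        if (it == slots_.end()) {
            it = slots_.emplace(key, slots_.size()).first;
        }
        // Assumption: the caller then does
        //   glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fboName);
        //   glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT,
        //       GL_COLOR_ATTACHMENT0_EXT, GL_TEXTURE_2D, rt.textureId, 0);
        return it->second;
    }

    std::size_t fboCount() const { return slots_.size(); }

private:
    std::map<std::pair<int, int>, std::size_t> slots_;
};

// The texture mix from this thread: 2 at 512x512, 13 at 256x256, 2 at 128x128.
std::vector<RenderTarget> threadWorkload() {
    std::vector<RenderTarget> targets;
    std::uint32_t nextId = 1;
    auto add = [&](int count, int size) {
        for (int i = 0; i < count; ++i) {
            targets.push_back({nextId++, size, size});
        }
    };
    add(2, 512);
    add(13, 256);
    add(2, 128);
    return targets;
}
```

Running the 17 targets through fboFor leaves the cache holding three slots, one per resolution, which is the configuration described above.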
Is there much overhead in switching FBOs, or is most of the overhead in changing attachment bindings?
Thanks for the help.
There is a performance penalty in binding FBOs, although it is much smaller than the dreadful context switches. Having said that, for optimal performance it is recommended to have a single FBO with multiple color attachments. You can verify my claims by reading this PDF file.
Dark Photon:
…whether there would be any value in sorting the “work queue” by texture size to minimize FBO switches. (Thanks again)
FWIW, with sorting and this test view, eliminating 5 out of 17 FBO switches, I didn’t see any measurable difference – certainly nothing like the 4ms gain in just segmenting textures by size to different FBOs.
Further testing suggests FBO switches are very cheap compared to rebinding textures of different sizes to an FBO.
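In case it helps someone repeat the experiment, the sorting test can be sketched roughly as below. This is an illustration only: Job, countSizeChanges and sortedBySize are hypothetical names, and the sketch only counts how often consecutive jobs differ in resolution (each such change being one FBO switch in the per-size setup). It says nothing about timing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical work-queue entry: one render-to-texture job.
struct Job {
    int width;
    int height;
};

// Counts how often consecutive jobs differ in resolution. With one FBO
// per resolution, each change corresponds to one FBO switch.
std::size_t countSizeChanges(const std::vector<Job>& queue) {
    std::size_t changes = 0;
    for (std::size_t i = 1; i < queue.size(); ++i) {
        if (queue[i].width != queue[i - 1].width ||
            queue[i].height != queue[i - 1].height) {
            ++changes;
        }
    }
    return changes;
}

// Returns a copy of the queue ordered largest resolution first.
// stable_sort keeps the original order among jobs of equal size.
std::vector<Job> sortedBySize(std::vector<Job> queue) {
    assert(std::all_of(queue.begin(), queue.end(), [](const Job& j) {
        return j.width > 0 && j.height > 0;
    }));
    std::stable_sort(queue.begin(), queue.end(),
                     [](const Job& a, const Job& b) {
                         if (a.width != b.width) return a.width > b.width;
                         return a.height > b.height;
                     });
    return queue;
}
```

Comparing countSizeChanges(queue) with countSizeChanges(sortedBySize(queue)) for a given view shows how many switches sorting removes. Whether removing them is worth the sort has to be measured, and in my test view it was not.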