FBO Performance

Is anyone seeing faster performance with the NVIDIA FBO implementation than with render-to-frame-buffer + copy-to-texture? Or have you found any tips for making it run fast?

Rendering 17 FBO textures per frame (2 at 512x512, 13 at 256x256, 2 at 128x128), alpha blending 2000-8000 particles per texture, and using the “single FBO, glFramebufferTexture” switching method, I’m seeing slower performance (by 3ms per frame) than when rendering them to a corner of the back buffer and copying them off to textures with glCopyTexSubImage2D.

These are standard RGBA (8888) color textures bound to COLOR_ATTACHMENT0 with no depth buffer.

Originally posted by Dark Photon:
Rendering 17 FBO textures per frame (2 512x512, 13 256x256, 2 128x128) alpha blending 2000-8000 particles per texture, and using the “single FBO, glFramebufferTexture” switching method
Are you using a single fbo for all the textures, or one per size (for a total of three)? If it’s only one, attaching a texture with different dimensions is one of the slow cases for fbo.

spasi:
Are you using a single fbo for all the textures, or one per size (for a total of three)? If it’s only one, attaching a texture with different dimensions is one of the slow cases for fbo.
Thanks for the tip! I’ll work with this and follow up.

spasi:
attaching a texture with different dimensions is one of the slow cases for fbo.
That made a big difference. With a separate FBO per texture res, it’s now about 1ms faster than the copy-to-texture path on a 6800 Ultra. Amazing improvement considering how few FBO reconfigs are involved.
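
A minimal sketch of why splitting FBOs by resolution helps, written as a plain Python simulation rather than real GL code. The texture counts come from the first post; the interleaved render order, the `simulate` helper, and the `fbo_key` parameter are all hypothetical and only illustrate the bookkeeping. It counts how often a texture is attached to an FBO that last held a texture of a different size, which is the slow case spasi describes.

```python
# Square textures identified by side length: 2 x 512, 13 x 256, 2 x 128,
# as in the first post. The interleaved order itself is made up.
ORDER = [512, 256, 256, 128, 256, 256, 512, 256, 256, 256,
         128, 256, 256, 256, 256, 256, 256]


def simulate(render_order, fbo_key):
    """Return (fbo_binds, resize_attachments) for a render order.

    fbo_key maps a texture size to the FBO that texture is rendered through.
    A "resize attachment" means attaching a texture whose dimensions differ
    from the texture that FBO held last.
    """
    last_size = {}   # FBO key -> size of the texture last attached to it
    bound = None     # FBO key currently bound
    binds = 0
    resizes = 0
    for size in render_order:
        key = fbo_key(size)
        if key != bound:
            binds += 1
            bound = key
        if key in last_size and last_size[key] != size:
            resizes += 1
        last_size[key] = size
    return binds, resizes


# One shared FBO for every texture.
single_fbo = simulate(ORDER, lambda size: "shared")

# One FBO per texture resolution (three in total).
per_size_fbo = simulate(ORDER, lambda size: size)
```

For this made-up order the shared FBO gets 7 attachments that change dimensions, while the per-size layout gets none and pays for 8 FBO binds instead. Whether that trade is a win depends on the driver; the timings in this thread suggest it was on the 6800 Ultra.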

Is there much overhead in switching FBOs, or is most of the overhead in changing attachment bindings?

Thanks for the help.

Originally posted by Dark Photon:
Is there much overhead in switching FBOs, or is most of the overhead in changing attachment bindings?
I don’t think this can be answered right now, since we don’t have a final, mature implementation yet. You could try a few different configurations though (e.g. an fbo per texture for the 512x512 & 128x128 ones and a single 256x256 for the others).

There is a performance penalty in binding FBOs, although much less than the dreadful context switches. Having said that, for optimal performance it is recommended to have a single FBO with multiple color attachments to it; you can verify my claims by reading this PDF file.

spasi:
You could try a few different configurations though (e.g. an fbo per texture for the 512x512 & 128x128 ones and a single 256x256 for the others).
Oh, no, I wasn’t referring to having fewer than one FBO per texture size, but rather whether there would be any value in sorting the “work queue” by texture size to minimize FBO switches. (Thanks again)

Java Cool Dude:
for optimal performance it is recommended to have a single FBO with multiple color attachments to it; you can verify my claims by reading this PDF file.
I actually developed using Simon’s tips in that PDF as a guide. The problem is that this suggestion by itself is inadequate to ensure sufficient performance to make FBOs a win over pure bulk copying of data around on the graphics card. Kudos to spasi for shedding some light on that.

Dark Photon:
…whether there would be any value in sorting the “work queue” by texture size to minimize FBO switches. (Thanks again)
FWIW, with sorting and this test view, eliminating 5 out of 17 FBO switches, I didn’t see any measurable difference – certainly nothing like the 4ms gain in just segmenting textures by size to different FBOs.

Further testing suggests FBO switches are very cheap compared to rebinding textures of different sizes to an FBO.
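
For reference, the work-queue sort being tested here can be sketched as follows. This is a plain Python simulation, not GL code, and it assumes one FBO per texture size and that the textures can be rendered in any order; the queue contents and the `count_fbo_binds` helper are hypothetical.

```python
# Hypothetical work queue: side length of each square render target,
# in the order the application would otherwise process them.
WORK_QUEUE = [512, 256, 256, 128, 256, 256, 512, 256, 256, 256,
              128, 256, 256, 256, 256, 256, 256]


def count_fbo_binds(queue):
    """Count FBO binds needed when each texture size has its own FBO."""
    binds = 0
    bound = None  # size whose FBO is currently bound
    for size in queue:
        if size != bound:
            binds += 1
            bound = size
    return binds


# Sorting groups equal sizes together, so each per-size FBO is bound once.
sorted_queue = sorted(WORK_QUEUE, reverse=True)

unsorted_binds = count_fbo_binds(WORK_QUEUE)
sorted_binds = count_fbo_binds(sorted_queue)
```

With this made-up queue, sorting cuts the binds from 8 to 3. The measurements above suggest the saving is barely visible in practice, because the bind itself is cheap next to reattaching textures of a different size.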

Dark Photon:
…eliminating 5 out of 17 FBO switches
Typo. 5 out of 9.