FBO switching overhead

Dear all,

I noticed in my program that when I use multiple FBOs and switch between them for post-processing effects, my CPU usage can get pretty high (80%+). For some simple effects with no FBO switching, the CPU usage is below 4%. I know there should be some overhead for such switching, but this seems too high. I’m not sure whether I should use an attachment-switching approach instead (you know, I don’t want to do that if I don’t have to :)

Any suggestion is welcome. Thanks.

With a CPU load that high it sounds like you’ve fallen off the fast path, i.e. something is being done in software rather than on the GPU.

What formats are you using for the FBO and any associated textures?

Measured perf in this area might be a good guiding factor, but it has been suggested that in general this is not a good overall strategy for FBO usage.

Thanks for your reply first.

I am just using RGBA & unsigned char pixel format, although the size is kind of big (1920x1200).

Measured perf in this area might be a good guiding factor, but it has been suggested that in general this is not a good overall strategy for FBO usage.

Thanks for your reply first.

But according to

http://www.gamedev.net/reference/programming/features/fbo2/page2.asp

“However if you can stay within these limits then it is possible to use one FBO to render to multiple textures, which is faster than switching between FBOs. While this isn’t an overly slow operation, avoiding unneeded operations is often good practise.”

So it seems using one FBO and switching attachments is better.

I personally want to stick with my current multiple-FBO implementation since it makes the code more elegant, as long as the FBO switching overhead is not that high :) Not sure if it’s a driver problem.

This is what NVIDIA says in a 2005 presentation, “The OpenGL Framebuffer Object Extension”:

FBO Performance Tips
• Don’t create and destroy FBOs every frame
• Try to avoid modifying textures used as rendering destinations using TexImage, CopyTexImage etc.

In order of increasing performance:
– Multiple FBOs
  • create a separate FBO for each texture you want to render to
  • switch using BindFramebuffer()
    – can be 2x faster than wglMakeCurrent() in beta NVIDIA drivers
– Single FBO, multiple texture attachments
  • textures should have same format and dimensions
  • use FramebufferTexture() to switch between textures
– Single FBO, multiple texture attachments
  • attach textures to different color attachments
  • use glDrawBuffer
I don’t understand though, why multiple FBOs should be slower than constantly re-attaching different textures to the same FBO. FBOs are containers that were invented to keep a certain (validated) state of render buffers. Constantly (de)attaching textures causes repeated validation work, which is what FBOs ought to avoid.

Anyway, in my experience, the way of switching render targets (any of the known ways) almost never impacts performance, unless you’re doing hundreds or thousands of such switches in one frame.

Can anyone show an example where they got actual performance gains from one way over another?

I think like so many additions to core GL, the FBO has to weather the eons, and that necessitates a design flexible enough to handle unforeseen complications or optimization opportunities that may arise in the future. (If your FBO lasts longer than 10 years, call your doctor.)

On Apple drivers (I’m still not sure what hardware you are on), RGBA is slower than BGRA because of swizzling. This may have an impact, perhaps around 10% or so. Also (although this may have been partly related to a driver bug) I was under the impression that GL_UNSIGNED_BYTE is now frowned upon; I know GL_RGB is. For all texture operations and buffer operations I stick with GL_BGRA and GL_UNSIGNED_INT_8_8_8_8_REV religiously, and it took me from a 25fps mess back to 60fps. It is certainly a better set of formats to use, but I think the stunning difference it made for me was in part due to an ATI driver issue at the time.

I think it’s more likely that for some reason your texture size is the issue. Maybe lots of swizzling, or simply a size it does not like and punts back to the CPU side.

Referring to what others have posted: we’ve been discussing ‘bindings’ generally on the Apple OpenGL list. My experience so far (and this is limited to ATI) is that multiple FBOs and VBOs really are not a problem, and you can bind and unbind them to your heart’s content. This surprised me, as I thought bindings were considered expensive. I am waiting on some new NVIDIA hardware later this month and am glad to see their notes seem to tie up with what I now understand.

First I want to give my sincere thanks to all the people who have tried to help me and all your suggestions are really valuable to me.

Meanwhile, I also want to ask your forgiveness :)

The problem I reported turned out to be caused by my own stupid bug: my program always created a new FBO and attached texture instead of using the cached ones.

Now using multiple FBOs causes no CPU usage problem any more.
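For anyone hitting the same thing, a minimal sketch of the kind of cache that avoids it, assuming render targets are keyed by size only. `RenderTarget`, `RenderTargetCache` and the injected `Factory` are hypothetical names, not the code from my program; the factory stands in for the real FBO and texture creation so the sketch can be checked without a GL context.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical handle pair: one FBO plus the color texture attached to it.
struct RenderTarget {
    unsigned fbo;
    unsigned texture;
};

// Keeps one render target per (width, height) so that per-frame code
// reuses existing objects instead of creating new ones every time.
class RenderTargetCache {
public:
    using Key = std::pair<int, int>;
    // Assumption: the caller supplies the function that really creates
    // the FBO and texture (for example with glGenFramebuffers and friends).
    using Factory = std::function<RenderTarget(int width, int height)>;

    explicit RenderTargetCache(Factory create) : create_(std::move(create)) {}

    // Returns the cached target for this size, creating it only on a miss.
    const RenderTarget& get(int width, int height) {
        assert(width > 0 && height > 0);
        const Key key(width, height);
        auto it = targets_.find(key);
        if (it == targets_.end()) {
            it = targets_.emplace(key, create_(width, height)).first;
            ++creations_;
        }
        return it->second;
    }

    // Number of times the factory was actually invoked.
    int creations() const { return creations_; }

private:
    Factory create_;
    std::map<Key, RenderTarget> targets_;
    int creations_ = 0;
};
```

Logging `creations()` once per frame is a cheap way to spot the bug: after warm-up the number should stop growing.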

Really sorry for my stupid bug; I should have hunted it down earlier.

(Admin, if you like, you can delete this thread.)

No, don’t delete the thread; some useful info has come out of the discussion, particularly skynet’s post.