8800 GTX + Vista + FBO = low performance

I’m using FBOs to render to a chain of N textures (N=12 in my test). I’m getting unexpectedly low performance with an 8800 GTX, 162.22 driver under Vista 32-bit (note: the problem already existed on older driver versions).

It does not matter whether I render anything at all to the FBO; even without a single glClear, performance is horrible.

The interesting thing is that the slowdown seems to be independent of the resolution of the texture I’m rendering to. For 12 textures, I always get around 65-75 fps (vsync disabled, of course), whether I render to 64x64 textures or to 1024x1024 textures.

Anti-aliasing is disabled, and threaded optimization in the Nvidia driver is disabled.

There is no error when checking the framebuffer status (returns GL_FRAMEBUFFER_COMPLETE_EXT).

The same code running on a 7800 GTX under Win XP gives me hundreds of fps.

The texture is created before the FBO and uses RGBA8 as its internal format. No mipmaps; the filter is GL_LINEAR/GL_LINEAR.
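
For reference, the texture setup is roughly this (a sketch of what my wrapper does; the variable names are just for illustration):

GLuint texId;
glGenTextures(1, &texId);
glBindTexture(GL_TEXTURE_2D, texId);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
// RGBA8 internal format, level 0 only (no mipmaps)
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, m_width, m_height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);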

The FBO code looks like this:

glGenFramebuffersEXT(1, &m_fbo);
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, m_fbo);

// Attach the texture as color attachment 0
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT, GL_TEXTURE_2D, glTex2D->getGLObject(), 0);

// Create and attach a 24-bit depth renderbuffer
glGenRenderbuffersEXT(1, &m_depthRB);
glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, m_depthRB);
glRenderbufferStorageEXT(GL_RENDERBUFFER_EXT, GL_DEPTH_COMPONENT24, m_width, m_height);
glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT, GL_RENDERBUFFER_EXT, m_depthRB);

GLenum status = glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT);

At render-to-FBO time:

glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, m_fbo);

// Re-attach the current target texture and the shared depth renderbuffer
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT, GL_TEXTURE_2D, glTex2D->getGLObject(), 0);
glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT, GL_RENDERBUFFER_EXT, m_depthRB);

GLenum status = glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT);
glViewport(0, 0, m_width, m_height);

Any help or idea is appreciated, thanks.

Y.

Hi,

It looks like a performance bug, so you should write a simple test case and post it here; otherwise we cannot see where the problem is. If nobody can find an error, send a bug report to NVIDIA.

This looks like it may be an issue that has already been addressed internally at NVIDIA.

The problem comes from the fact that on XP/Linux flushes were almost free, and were thus used rather liberally throughout the driver. In particular, the driver would flush basically whenever FBO bindings changed. On Vista flushes require a kernel transition as well as some additional processing by the driver and the OS, so they should be used much more sparingly than they were on XP/Linux. Things like ping-ponging between FBOs or between an FBO and the default framebuffer performed especially poorly.
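
To illustrate, a per-frame pattern like the following (just a sketch, not your exact code; the helper names are made up) hits that flush path on every bind or attachment change:

// Each of these FBO state changes could trigger an internal flush,
// which on Vista also implies a kernel transition.
for (int pass = 0; pass < numPasses; ++pass)
{
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);                  // bind change -> flush
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                              GL_TEXTURE_2D, passTexture[pass], 0); // attachment change
    drawFullScreenQuad();                                           // hypothetical draw helper
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);                    // back to the window -> flush again
}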

This has already been addressed, and the fix should be in an upcoming driver release (I’m not sure exactly when). If reducing the number of FBO changes you do helps the performance then I’d say just wait for the fix to be released. If you think this isn’t the problem you’re seeing then go ahead and file a bug report (simple sample apps help a lot).

Thanks. I’ll try to produce a minimal app to replicate the problem. As far as I can see, the cost seems to be constant. I only have one FBO in the whole system, so it cannot be FBO switches (unless you count binding back the main framebuffer as a switch).

12 RTTs is not particularly high; I’m using them for a bloom effect. The scene is rendered to a texture first, then the bright areas are extracted, and then there are a couple of passes to downsample and blur (horizontally, then vertically) the texture. I can’t really reduce that number (nor should I have to). Actually, I’m pretty sure that in the final program I’ll have a lot more passes, since tons of effects are still missing.
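
Roughly, the chain of passes looks like this (pseudo-code only; setTarget() and drawQuad() are hypothetical helpers, and setTarget() ends up doing the glBindFramebufferEXT + glFramebufferTexture2DEXT calls shown above):

// Sketch of the bloom chain; names and pass counts are illustrative
setTarget(sceneTex);        renderScene();               // scene into a texture
setTarget(brightTex);       drawQuad(brightPassShader);  // extract bright areas
for (int i = 0; i < numLevels; ++i)
{
    setTarget(downTex[i]);  drawQuad(downsampleShader);  // downsample
    setTarget(blurHTex[i]); drawQuad(blurHShader);       // horizontal blur
    setTarget(blurVTex[i]); drawQuad(blurVShader);       // vertical blur
}
setTarget(NULL);            drawQuad(composeShader);     // final compose to the main framebuffer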

Y.

Switching back and forth between the FBO and the main framebuffer does incur this performance cost, as does changing the FBO’s attachments while it is bound. I certainly agree that you shouldn’t have to reduce the number of FBO changes you’re doing, but you will have to wait a little while before you can get full performance doing this on Vista.

Yes, I understand, and your theory is probably the right one, but I’m really surprised at how slow it is to switch from an FBO to the main framebuffer. Apparently it’s taking an average of 1 millisecond each time I do this operation. I understand that a kernel switch and some CPU overhead/state changes are necessary… but 1 millisecond? Are those operations really so slow? How do DX9/DX10 compare in this area? Don’t they have to do a kernel switch too?
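
A crude way to measure it (just a sketch; timer() stands in for any high-resolution counter, and the glFinish calls are only there to isolate the cost, so the numbers are approximate):

glFinish();                                   // make sure all previous GL work is done
double t0 = timer();                          // hypothetical high-resolution timer
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);  // switch back to the main framebuffer
glFinish();                                   // force the driver to process the switch
double t1 = timer();
printf("FBO -> main framebuffer switch: %.3f ms\n", (t1 - t0) * 1000.0);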

Y.

I have to say that this is a major disappointment to hear. With 64-bit OSes and quad cores, there is still a kernel-mode switch that is terribly slow.

Well, to be honest, it makes OpenGL (or at least NVidia’s implementation; I don’t know how ATI compares) absolutely unusable in practice until it’s fixed, if you’re trying to release a game with any modern/next-gen effects. I’m not that worried right now, because I’m still years away from a release, but every time there is a problem it feels a bit like a mirage: each driver has a serious flaw/bug that supposedly gets fixed in a later version, but then a new generation of video cards arrives, the old drivers become low priority, and the ones for the new cards are not mature yet. At least, that’s been my experience over the past 2 years.

Y.

Originally posted by Ysaneya:
It feels a bit like a mirage: each driver has a serious flaw/bug that supposedly gets fixed in a later version, but then a new generation of video cards arrives, the old drivers become low priority, and the ones for the new cards are not mature yet. At least, that’s been my experience over the past 2 years.
I just want to say that this is exactly how I feel too, though I’d extend 2 years to… effectively infinity. This applies to both of the big two (I find it interesting that Intel seems to be the best of them all at writing short and correct bugfix release notes for driver revisions).

FYI, I have uploaded a test program that replicates the problem. The framerate should skyrocket to thousands of FPS, but on Vista I get around 80 fps:

http://www.fl-tw.com/opengl/NVidiaFBOTest.rar

If ATI/Nvidia can’t get drivers done right with their current staff, then they need to hire more programmers or fire the ones they have that are sub-par… I agree, it seems like you get one bug fixed and then another one hits you in the face… Very annoying even for a programmer doing it as a hobby; I can’t imagine the pain for a coder who does it for a living.

That’s stupid logic. You don’t get software done better or faster by hiring more programmers. That might work when building a house, but software development is totally different.

If I were a driver writer, I would feel very offended by this comment. I am sure they would make better drivers if it were easy. The fact that nVidia, ATI and Intel all have problems shows that it is not.

Jan.

Give an infinite number of coders an infinite amount of time and they will create Linux.

Therefore I’m sure hiring more programmers on any project would reduce all bugs to nothing.

Originally posted by Jan:
[b] That’s stupid logic. You don’t get software done better or faster by hiring more programmers. That might work when building a house, but software development is totally different.

If I were a driver writer, I would feel very offended by this comment. I am sure they would make better drivers if it were easy. The fact that nVidia, ATI and Intel all have problems shows that it is not.

Jan. [/b]
And you are entitled to your opinion, as I am to mine. But given the choice between many sloppy coders and many sharp coders, I will take the sharp coders. I guess you are a coder by profession.

Originally posted by pudman:
Give an infinite number of coders an infinite amount of time and they will create Linux.

Since that is true for monkeys too, I do not think it has any practical relevance to software development.


Therefore I’m sure hiring more programmers on any project would reduce all bugs to nothing.

Have you ever led a bigger project? Each project has a limit after which adding more programmers actually makes things worse (unless you send them on vacation).

You don’t get software done better or faster by hiring more programmers. That might work when building a house
It doesn’t even work when building a house. There’s a point beyond which only so many people can work on the foundation before they start getting in each other’s way. And people can’t start on the second floor until the first floor is finished, etc.

nVidia’s and ATi’s problems in this area are not merely a matter of time, but of actual implementation. They have to come up with appropriate solutions to the problem that don’t break other things. They both started off in the right place: get it functional, even if it is slow. The next phase is performance testing and optimization. That takes time, not people.

Each project has a limit after which adding more programmers actually makes things worse
The Mythical Man-Month in action.

Korval: You are, of course, right. There’s always a limit on the number of workers for a project. It’s just that in software development that limit is often reached much earlier than in most other professions.

As long as you can’t break a project up into many mostly independent parts, it is usually difficult to have more than 5 people working on it. And your lead programmer needs to be a very skilled person to manage his co-workers.

Just imagine taking over someone else’s project because he got fired. It will take you weeks or even months to understand how all the code works before you can even continue your predecessor’s work.

Jan.

Since that is true for monkeys too, I do not think it has any practical relevance to software development.
My point was that coders are monkeys. I think. The relevance being that when I code I prefer to be assisted by monkeys.

Have you ever led a bigger project? Each project has a limit after which adding more programmers actually makes things worse (unless you send them on vacation).
Maybe I mis-emphasized my statement and should have used sarcasm markup.

I work at a big corporation and am well aware of the benefits of one good coder versus any number of not-so-good coders.

Going back sort of to the topic: I’m not sure it’s safe to assume ATI/nVidia have bad coders just because they can’t work out all of their driver issues. With a piece of hardware as complicated as a modern-day GPU, the coordination it takes to create the driver must be incredible.

My point was that coders are monkeys.
I’m not sure, but I think I resent the implication.

You are lucky: I’m not only using the 8800 and Vista, but my Vista is 64-bit! I get a whopping 0.5 FPS on your test program. I found your post after experiencing the same thing in the program I am writing. Are you using GLEE by any chance? I suspected that my GLEE was doing something wrong with late initialization. I actually have been able to use the FBO perfectly fine (seemingly), but it seems to be down to chance; it only works right when there is a blue moon, a solar eclipse, and a planetary alignment. When it does work, it is wonderful, but when it doesn’t, it can take me up to 20 seconds per frame.