Extremely weird performance issue



eldritch
06-13-2007, 03:40 AM
Hi folks. We have been struggling with a very strange performance issue in our flight simulation app.

We have gotten reports of very bad frame rates on GeForce 7 series cards with various driver versions. It also seems to happen primarily on dual-core CPUs.

The expected frame rate is about 70-100 fps when looking horizontally over the terrain. The low frame rates we have seen are a seemingly constant 20 fps.

Now, what we just found out is that if you look straight up for about 5 seconds, the frame rate suddenly pops back to where it should be, even when looking down at the terrain again. It seems that once you manage to keep the frame rate above 60 for a few seconds, the performance problem corrects itself.

Our test computer for this issue is a Dell XPS 1710 with GeForce 7950. Nvidia driver version is 94.22, but we've tried with different versions.

This only happens on Windows. On the same computer running Linux, we get full performance all the time.

We have tried with and without VBOs and display lists for the terrain rendering.

Any ideas why this happens?

Cheers!

mfort
06-13-2007, 04:20 AM
1. Go into Regedit and determine the current primary display card by looking in HKEY_LOCAL_MACHINE\HARDWARE\DeviceMap\Video. Note the GUID (globally unique identifier assigned by Windows) for the entry "\Device\Video0", which is the long string in brackets { } at the end of the entry.

2. Go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Video\{guid}\0000, where {guid} is the string noted in the previous step.

3. Under the "0000" key, add a new DWORD called OGL_ThreadControl and give it a value of 2. This will disable multithreading in the driver for all OpenGL [OGL] applications.

It is due to a driver "optimization" in NV drivers dating from the 8x.xx series. It is described in the NV driver release notes.

eldritch
06-13-2007, 09:23 AM
Thanks! That fixed it. But most of our users are not that comfortable with editing registry settings.

We could of course do the registry changes in the application, but I don't know if I'm very comfortable with that either.

Is there any other way we could programmatically disable this threading locally in our application?

Cheers

mfort
06-13-2007, 09:31 AM
This setting is system wide, but you can modify the registry from your code via the Windows API (a rough sketch is below). In some recent drivers I saw a possibility to change it from the Display control panel: select Performance & Quality Settings, then Advanced Settings, and look for "Threaded optimization". I have this option on my Quadro NV card.
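
If you do go the Windows API route, the sketch below is roughly what I mean (untested here, error handling trimmed; it just automates steps 1 and 3 from my earlier post, needs admin rights, and is still a system-wide change):

#include <windows.h>
#include <string.h>

// Sketch only: look up the \Device\Video0 mapping (step 1), then write
// OGL_ThreadControl = 2 into that key (step 3). Link against Advapi32.lib.
BOOL DisableOglDriverThreading(void)
{
    HKEY map, video;
    char path[512];
    DWORD size = sizeof(path), type = 0, value = 2;

    if (RegOpenKeyExA(HKEY_LOCAL_MACHINE, "HARDWARE\\DeviceMap\\Video",
                      0, KEY_READ, &map) != ERROR_SUCCESS)
        return FALSE;
    LONG status = RegQueryValueExA(map, "\\Device\\Video0", NULL, &type,
                                   (BYTE*)path, &size);
    RegCloseKey(map);
    if (status != ERROR_SUCCESS || type != REG_SZ)
        return FALSE;

    // The stored string looks like "\Registry\Machine\System\...\Video\{guid}\0000";
    // strip the "\Registry\Machine\" prefix so the path is relative to HKLM.
    const char* prefix = "\\Registry\\Machine\\";
    const char* relative = path;
    if (_strnicmp(path, prefix, strlen(prefix)) == 0)
        relative = path + strlen(prefix);

    if (RegOpenKeyExA(HKEY_LOCAL_MACHINE, relative, 0, KEY_SET_VALUE, &video) != ERROR_SUCCESS)
        return FALSE;
    LONG written = RegSetValueExA(video, "OGL_ThreadControl", 0, REG_DWORD,
                                  (const BYTE*)&value, sizeof(value));
    RegCloseKey(video);
    return written == ERROR_SUCCESS;
}

Reading the current value first (RegQueryValueExA) and writing it back on exit would at least keep the change reversible.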

ebray99
06-13-2007, 09:40 AM
I've seen this problem too, but I haven't tried the registry setting (I'll try this later though). It showed up for me whenever I started rendering to a cube map via FBO. Sometimes the framerate pops back to where it should be and sometimes it just stays low. Not sure if this information helps anyone, but I figured I'd throw it out there in case it's related to cubemaps or FBOs somehow. =)
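
For reference, the kind of setup I mean is roughly this (just a sketch; names like cubeTex/fbo and the GLEW include are my own, EXT_framebuffer_object entry points assumed):

#include <GL/glew.h>   // assumed loader for the EXT_framebuffer_object entry points

// Sketch of the setup that triggers it for me: one FBO, rendering into each
// face of a cube map texture. Sizes and names are just examples.
void RenderSceneToCubeMap(GLuint cubeTex, GLuint fbo, int size)
{
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
    for (int face = 0; face < 6; ++face)
    {
        glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                                  GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, cubeTex, 0);
        glViewport(0, 0, size, size);
        // ... set the view matrix for this face and draw the scene ...
    }
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);   // back to the window framebuffer
}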

Kevin B

Jan
06-13-2007, 11:35 AM
You could read out the registry setting and restore it upon application exit.

OR you could have an option in your settings, "disable driver threading" or similar, with a small explanation, so that the user can choose whether to disable it.

Other than that, I don't know; it's a bad situation.

Jan.

knackered
06-13-2007, 11:57 AM
nvidia should be profiling their drivers at runtime and flipping the switch themselves if it's messing things up.

pixelpaul
06-14-2007, 03:05 AM
One issue with NV 'profiling' (especially with visual simulation apps) is that the developer/integrator doesn't want the performance to alter from one frame to the next by too much. Predictable 'medium' performance is much easier to deal with and manage than occasional 'fast' and intermittent 'slow' for the same scene content.
As well as the OpenGL threading setting, try selecting a 'visual simulation' profile, as it will reduce the amount of intelligence and analysis that the driver applies to the OpenGL command stream.

MarcusL
06-15-2007, 05:54 AM
You can control the NVIDIA driver thread by setting the process affinity to one CPU when you create the window & OpenGL context, then restoring it afterwards (use the sysMask).

This is the only way I've found to control nvidia's threading from the app. Not nice, but it works pretty well.

I.e.:

// Pin the process to one core while the window and GL context are created...
::SetProcessAffinityMask(::GetCurrentProcess(), 0x1);

CreateWindowEx(...);
wglCreateContext(...);

// ...then restore the full system affinity mask afterwards.
DWORD_PTR procMask, sysMask;
::GetProcessAffinityMask(::GetCurrentProcess(), &procMask, &sysMask);
::SetProcessAffinityMask(::GetCurrentProcess(), sysMask);

Ysaneya
06-15-2007, 08:09 AM
In this day and age, when everybody expects your programs to take advantage of dual/quad-core machines, I don't think setting the process affinity is a good decision.

Y.

MarcusL
07-02-2007, 05:49 AM
Since I restore it afterwards, it's not an issue for the rest of the app. (We do run a bunch of intensive threads and use multi-core.)

I just want some control over NVIDIA's GL driver, since its auto-threading feature gives me problems.

wendaddy
07-03-2007, 09:56 AM
eldritch,
Could you post a link to your application so NVIDIA engineers can try to fix this?
Using the ThreadControl regkey is the best solution right now.

barthold
07-03-2007, 05:20 PM
eldritch, can you get me a repro application showing the problem? We'll take a look.

Thanks,
Barthold
NVIDIA

Keith Z. Leonard
07-05-2007, 12:00 PM
Our game shows a similar problem, in our setup though, we have complete control of the target computers, so I just disable thread optimizations from the driver control panel.

yooyo
07-06-2007, 05:20 AM
Does your game use QueryPerformanceCounter for timing? On dual-core systems each core has its own performance counter, those counters are not synced, and you never know which core executes the QueryPerformanceCounter call. In some cases you can get a negative delta time between two frames (which should be impossible, right?). This negative delta time might screw up your physics, AI, and other calculations. There is a fix for XP; Google for KB896256.
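
Besides the hotfix, a common workaround is to keep the timer reads on one core (or at least clamp negative deltas). Roughly something like this (just a sketch; the function name is made up):

#include <windows.h>

// Sketch: pin the calling thread to core 0 for the duration of the read, so
// consecutive samples come from the same core's counter and the delta between
// frames cannot go negative.
double ReadTimerSeconds(void)
{
    LARGE_INTEGER freq, now;
    QueryPerformanceFrequency(&freq);

    DWORD_PTR oldMask = SetThreadAffinityMask(GetCurrentThread(), 0x1);  // core 0 only
    QueryPerformanceCounter(&now);
    SetThreadAffinityMask(GetCurrentThread(), oldMask);                  // restore

    return (double)now.QuadPart / (double)freq.QuadPart;
}

Clamping the delta time to zero when it comes out negative is a cheaper safety net if you don't want to touch thread affinity.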

Int13h
07-11-2007, 10:02 AM
We had the exact same problem and it had been haunting me for a long time. MarcusL's SetProcessAffinityMask code fixed it straight away. Thanks a lot.

I will send you a mail, barthold, and I can supply a repro application if you have not yet received one.

Greetz David

barthold
08-03-2007, 04:21 PM
Guys, please take a look at this. For optimal performance, try not to do many queries into the OpenGL driver each frame, for example glGetError(), glGetFloatv(), etc.

http://developer.nvidia.com/object/multi-thread-gdc-2006.html
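
A trivial illustration of the idea (not from those slides): query values once at startup and cache them, rather than calling glGet* every frame.

#include <GL/gl.h>

// Query driver limits once at context creation and cache them; per-frame
// glGet*/glGetError calls force a synchronization with the driver's worker thread.
static GLint maxTextureSize = 0;

void CacheGLLimitsAtStartup(void)
{
    glGetIntegerv(GL_MAX_TEXTURE_SIZE, &maxTextureSize);
}

// In the render loop, read maxTextureSize directly and keep glGetError()
// checks in debug-only code paths.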

Barthold

V-man
08-04-2007, 01:09 AM
It reminds me of the Apple solution: using the Core Duo to improve performance in games like WoW. I think they reported some huge improvement like 400%, or maybe my memory is failing me.

MarcusL
08-08-2007, 01:03 AM
I've read that before. It's interesting, but it doesn't say what happens if all cores are already working at 100%. I'm not certain there is a benefit in a driver thread then. (But I'm not sure; multi-threading is not intuitive... yet. :))

Our app does a fair number of glGet() calls, and it's not something I can easily fix at the moment.

However, I don't see why I should be getting variations in performance just because of that. Is it because the GL driver dynamically switches multi-threading on and off?

Btw, our render thread is pretty much only doing rendering, so I'm not convinced that an async layer in the GL driver helps that much. At least not in our case, since our current usage pattern implies a lot of synchronization.

MarcusL
10-29-2007, 07:59 AM
Opening this thread again, since the latest NVIDIA drivers for XP (162.18) seem to circumvent my fix and use threading no matter what.

So CPU usage jumped by 20-25% on my quad-core (i.e. almost a whole extra core was used) and the FPS is still the same.

Arrgh. And there's no option to disable this in the control panel either. Jeez.

Jackis
10-29-2007, 12:59 PM
Marcus,

the new control panel has a "Threaded optimization" option. Disabling it has the same behaviour as changing the registry key manually.

MarcusL
10-30-2007, 03:03 AM
Not on my computer/card. :(

Dell 490/WinXp (4 cores) with GF 7800 GTX & 162.18.

Is it somehow missing because it's showing in Swedish? (Well, it's not listed in the help, which is in English, while the other options are, so I guess I'm out of luck somehow.)

I found the setting on another system we have here, but that one was running a Quadro card.

And I still don't like poking the NVIDIA registry keys to make our apps run well, when they could have provided some sort of API for this.

knackered
10-30-2007, 05:45 AM
they have. join the nv developer programme.

MarcusL
10-30-2007, 06:15 AM
I've tried several times, but apparently a medical simulation company of 60 people that only does about 15 MUSD/year isn't worth their time.

Sigh. I need to stop bitching. I just have a bad mood day. :-/

CatDog
12-12-2007, 04:10 PM
I have the same problem: poor performance on GF7 when NVidia's multithreading support is enabled. The current version of nHancer (http://www.nhancer.com) has a checkbox for switching this feature (Compatibility->OpenGL Control->Multithreading).

With threading enabled I get 40 fps; with threading turned off, 150 fps.

Here's an observation I made using Sysinternals Process Explorer (PE), on a dual-core Pentium D 3.2GHz and a GeForce 7950GX2.

When multithreading is disabled (via nHancer), PE displays only one active thread: the application exe itself. nvoglInt.dll is also loaded, but it doesn't consume any CPU time. Running the app with heavy load results in 50% CPU load for the app thread. Since there's no multithreading in my app, that's what I expected.

But: by default, NVidia enables its multithreading feature. And here, nvoglInt.dll obviously has its own thread. When running my app, this thread consumes about 15-25% CPU and my app also consumes only 15-25%. The total load NEVER exceeds 50%!

It looks like the two threads are running on the same core. I would expect the CPU load to rise above 50% when the driver has its own thread, but that is not the case here!

Any ideas are VERY appreciated!

CatDog

Ysaneya
12-13-2007, 03:30 AM
150 fps down to 40 fps? You should be happy. On my 2 machines, I go from 80 fps down to 2-5 fps when multithreading is enabled. I just don't get what NVidia's doing with it, because it's been enabled in their drivers for months (if not years) now, and I don't see any sign of improvement.

I would love to get more details about this multi-threading optimization from NVidia itself. Does anybody know of a PDF with some recommendations/explanations about it?

Y.

CatDog
12-13-2007, 04:42 AM
Hmm, so is this an issue for all OpenGL apps running on dual-core/GF7 hardware?

I spent days and weeks trying to find out what I was doing wrong. I rewrote half of the startup code, but nothing helped.

Oh, btw, during testing I made another interesting observation. In a very special situation, GLIntercept logged an error after wglSetPixelFormat(). That situation was:

1. create a context (ChoosePixelFormat, SetPixelFormat, wglCreateContext)
2. make it current
3. do not deactivate that context by calling wglMakeCurrent(0,0)
4. try to create a second context just as above
At the last step, GLIntercept logs something like this:

wglSetPixelFormat() failed, glGetError() = GL_INVALID_OPERATION

(Note that I called GDI32's SetPixelFormat, so that call to the wgl routine seems to come from there. Also, SetPixelFormat does return a valid pixel format! As I see it, that failure indicates an internal error in the OpenGL driver.)
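
In code, that situation is roughly this (simplified sketch, not my actual startup code; hwnd1/hwnd2 stand for the two windows):

#include <windows.h>
#include <GL/gl.h>

// Simplified repro: the first context is left current while the second window's
// pixel format is set, i.e. there is no wglMakeCurrent(0,0) in between.
void ReproTwoContexts(HWND hwnd1, HWND hwnd2)
{
    PIXELFORMATDESCRIPTOR pfd;
    ZeroMemory(&pfd, sizeof(pfd));
    pfd.nSize = sizeof(pfd);
    pfd.nVersion = 1;
    pfd.dwFlags = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32;

    HDC dc1 = GetDC(hwnd1);
    SetPixelFormat(dc1, ChoosePixelFormat(dc1, &pfd), &pfd);
    HGLRC rc1 = wglCreateContext(dc1);
    wglMakeCurrent(dc1, rc1);                // first context stays current

    HDC dc2 = GetDC(hwnd2);                  // no wglMakeCurrent(0,0) before this
    SetPixelFormat(dc2, ChoosePixelFormat(dc2, &pfd), &pfd);  // GLIntercept flags GL_INVALID_OPERATION here
    HGLRC rc2 = wglCreateContext(dc2);
    (void)rc2;
}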

That GL_INVALID_OPERATION only occurs when multithreading is enabled, but it is not directly related to the performance issue, because creating just a single context is slow too. It's just a weird symptom.

And if wglMakeCurrent(0,0) is called before SetPixelFormat, the error vanishes as well, but that doesn't fix the performance problem either.

CatDog

CatDog
12-13-2007, 06:23 AM
Here are the two GLIntercept logs:

Multithreading DISABLED using nHancer (http://links.mycelium.de/gliInterceptLog1.xml)

Multithreading ENABLED, being default driver setting (http://links.mycelium.de/gliInterceptLog2.xml)

Any comments?

sqrt[-1]
12-13-2007, 08:36 PM
I would not rule out a bug in GLIntercept in this case, but it does seem weird.

Have you tried running with the GLIntercept FullDebug profile to see if you get any different results?

CatDog
12-14-2007, 12:50 PM
Hm... maybe you are right. Here is the FullDebug-Log: XML (http://links.mycelium.de/gliInterceptLog_Full.xml) TXT (http://links.mycelium.de/gliInterceptLog_Full.txt)

It doesn't show any error. The previous logs were done with the "Author's Profile", and I can reproduce them perfectly.

Note this call
"glGetIntegerv(GL_MAX_DRAW_BUFFERS,...);"
that is missing in the full debug version. (I'm not doing this from my app!)

Anyway, even if it's a bug in GLIntercept, it is a bug in the driver too, because I can only see it when multithreading is enabled. I'm pretty sure that the driver messes things up, and GLIntercept just doesn't know how to handle that mess. :)

After all, I only mentioned this to demonstrate that there is a malicious impact on the application, and that it's related to the multithreading option. It does something evil!

CatDog

sqrt[-1]
12-14-2007, 07:17 PM
That call to
glGetIntegerv(GL_MAX_DRAW_BUFFERS,...)
is an internal GLIntercept call; I fixed this about a week after the 0.5 release.

(I really should do another release with all the bug fixes)

CatDog
12-15-2007, 04:43 AM
I'd appreciate the new release to see if it changes anything. (Btw, thanks for GLIntercept! It's a very useful tool!)

Apart from that, I'm curious about the reason for the reported errors, because after all this seems to be a way to expose the driver bug from the application.

CatDog

tamlin
12-17-2007, 11:58 AM
Not sure if GLIntercept does this already, but could it be useful to (have a switch to turn on to) display the calling thread ID?

On a tangent, I have myself been bitten by both of the two largest vendors' implementations in the past when it comes to SMP systems (one loves kernel-mode spinlocks, the other unsuccessfully tries to implement its own version in user mode), and the performance and power consumption of any system often go straight to hell (I tried to find a more polite way to say it, but failed).

I think it's (way over) time they realized that the only way to improve overall system performance is to use system-provided locking primitives (on SMP systems) - mutexes and semaphores come to mind. Sucking insane amounts of CPU cycles (and therefore power) just so their power-hungry drivers get a 0.1% edge in a 3D benchmark is absurd, especially when it makes the remainder of an application run like on a C64 because of their busy-waiting spinlocks.

CatDog
03-05-2008, 06:00 PM
It seems that I accidentally found a solution to the problem!

Here is what I did. (http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=235320#Post235320)

I don't know which change in particular solved it, or if it was the combination of all of them. While messing with the triangles, I had "threaded optimization" turned off all the time. After finishing all the changes, I was curious and switched it back on - and couldn't believe my eyes! Absolutely no lagging anymore; Process Explorer reports both cores working when rendering. And I'm getting a significant speed increase with scenes that contain many batches.

Comments appreciated!

CatDog