View Full Version : NVidia multithreading problems

12-17-2008, 04:28 AM
I'm currently researching methods of using multiple GPUs in one system. My test setup is two Quadro FX 1700 cards in a dual-core machine. Using NV_gpu_affinity, I am able to set up two rendering threads, each rendering on a separate GPU.
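For reference, the per-thread affinity setup looks roughly like this (a simplified sketch: error handling is omitted, and it assumes the WGL_NV_gpu_affinity entry points have already been loaded via wglGetProcAddress on a temporary context):

```cpp
// Sketch: create a context restricted to one GPU (one of these per thread).
// Assumes wglEnumGpusNV / wglCreateAffinityDCNV / wglDeleteDCNV are loaded.
HGPUNV gpuList[2] = { 0 };          // NULL-terminated list of GPU handles
HGPUNV gpu;
if (wglEnumGpusNV(gpuIndex, &gpu))  // gpuIndex = 0 or 1, one per thread
{
    gpuList[0] = gpu;
    HDC   affinityDC = wglCreateAffinityDCNV(gpuList);
    HGLRC rc         = wglCreateContext(affinityDC);
    wglMakeCurrent(affinityDC, rc);

    // ... all rendering in this thread now goes to this GPU only ...

    wglMakeCurrent(NULL, NULL);
    wglDeleteContext(rc);
    wglDeleteDCNV(affinityDC);
}
```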

I'm making heavy use of occlusion queries, with up to 512 queries in flight at the same time. Unfortunately, after a few seconds, the driver either crashes or locks up in glGetQueryObjectivARB(query, GL_QUERY_RESULT_ARB, &result). Without the occlusion queries, the program works as expected.
The lockup happens regardless of the "Threaded Optimization" setting in the driver settings.

A second phenomenon I'm seeing is that the whole thing slows down much more than expected. I expected that with two GPUs and two views, it would run slightly slower than just one view on one GPU, but the performance is much worse. A first profiling run showed that a great deal of time is spent in nvoglnt.dll (24%), ntoskrnl.dll (17%) and ntdll.dll (11%); only 6% is spent in my own code. The time in ntoskrnl.dll and ntdll.dll in particular suggests that a lot of thread synchronization is going on...
But my rendering threads do no locking/synching while rendering, so I assume it's happening somewhere inside the driver :-/

Has anyone had similar experiences? How do you harness the full power of two separate GPUs rendering in two separate threads?

thanks in advance!

12-17-2008, 05:15 AM
I suspect this is a driver issue - have you talked to nVidia about this?

One thing to try (if possible) is to use multiple processes. I've heard that this can be faster in certain codebases.

12-17-2008, 08:18 AM
I've tried something similar, with 3 GTX 260s running in parallel under Linux, and found that performance does not scale as you'd expect. Each GPU was controlled from a separate thread with no resource sharing between contexts. The best I could do was a properly placed sleep or glFinish that would give back most of the expected performance, but not all of it (glFinish is a bit heavy-handed).

There was a memory leak with occlusion queries in the NVIDIA drivers; it may not be fixed in released drivers yet. As a possible workaround, try checking the query status with GL_QUERY_RESULT_AVAILABLE_ARB after issuing the query, and only fetch the result once it is available.
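Something like this polling loop, i.e. never blocking inside the result fetch (a sketch; `query` is an already-ended query object):

```cpp
// Workaround sketch: poll GL_QUERY_RESULT_AVAILABLE_ARB instead of
// blocking in glGetQueryObjectivARB(..., GL_QUERY_RESULT_ARB, ...).
glFlush();                 // make sure the query commands reach the GPU,
                           // otherwise "available" may never become true
GLint available = 0;
while (!available)
{
    glGetQueryObjectivARB(query, GL_QUERY_RESULT_AVAILABLE_ARB, &available);
    if (!available)
        Sleep(0);          // yield instead of spinning at 100%
}
GLuint result = 0;
glGetQueryObjectuivARB(query, GL_QUERY_RESULT_ARB, &result);
```

With many queries in flight, it is usually better to poll each query once per frame and consume whichever results are ready, rather than busy-waiting on a single query like this.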

12-17-2008, 08:44 AM
Instead of the heavy-handed glFinish(), maybe glFlush() can help?

12-17-2008, 09:45 AM
skynet, can you provide me with a repro case (source code preferable)? We'll take a look.


(with my NVIDIA hat on)

12-17-2008, 12:43 PM
I forgot to mention that I'm using the "Quadro Release 178" 178.46 drivers on WinXP64.

Some new findings:

1. The occlusion-query lockup also appears on a single-GPU setup (no NV_gpu_affinity involved) where two threads (two separate contexts) render on one GPU at the same time.

2. NV_gpu_affinity _only_ enumerates both GPUs if the driver settings are set to "Multi-Display Performance Mode". In all other modes, just one GPU is found. Why is that?

3. I created a double-buffered window, took its pixel format ID (GetPixelFormat()) and created an affinity DC with this pixel format. Now glGetIntegerv(GL_DOUBLEBUFFER) returns '1', even when this affinity DC is made current (despite it having no window-provided framebuffer!). As soon as I call wglSwapLayerBuffers() or SwapBuffers() on the affinity DC, the application crashes inside the driver. This should not happen; I would expect the call to be ignored or to return an error, but not a crash.
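Until that is fixed, my defensive workaround (a sketch; the `g_affinityDCs` set and `safeSwap` helper are names I made up) is to track which DCs came from wglCreateAffinityDCNV and never swap on them:

```cpp
#include <set>

// Workaround sketch: only window-backed DCs get swapped. The driver
// should reject SwapBuffers on an affinity DC, but currently crashes.
std::set<HDC> g_affinityDCs;   // filled wherever wglCreateAffinityDCNV is called

void safeSwap(HDC dc)
{
    if (g_affinityDCs.count(dc) == 0)
        SwapBuffers(dc);       // real window DC: swap as usual
    // else: affinity DC, nothing to present
}
```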

One thing to try (if possible) is to use multiple processes. I've heard that this can be faster in certain codebases.

I prefer threads, because I need both render threads to share resources in main memory.

Instead of the heavy-handed glFinish(), maybe glFlush() can help?

I don't understand how inserting one of these would improve performance or stability.