CPU needs of a GPU computation...

I’m experimenting with my GPGPU framework to load balance a problem across CPUs as well. Each processor (or core of a processor, in this case) works on a division of the data set with the GPU usually taking a larger chunk. I’m testing on a dual-core AMD X2 4400 and a GeForce 7900 GTX, although I’ve previously tried this on a dual Xeon 3.0GHz with a GeForce 6800 GT as well.
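To make the division concrete, here is a minimal sketch of one way to split a data set between the GPU and the CPU worker threads. The name `split_work`, the `Chunk` struct and the fixed `gpu_share` fraction are all hypothetical, made up for illustration; the framework described above may divide the work quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range [begin, end) of element indices given to one processor.
struct Chunk {
    std::size_t begin;
    std::size_t end;
};

// Hypothetical helper: the GPU takes `gpu_share` (0..1) of the elements and
// the rest is divided as evenly as possible among `cpu_workers` threads.
// Element 0 of the result is the GPU chunk; the others are the CPU chunks.
std::vector<Chunk> split_work(std::size_t total, double gpu_share,
                              std::size_t cpu_workers) {
    assert(gpu_share >= 0.0 && gpu_share <= 1.0);

    // With no CPU workers the GPU has to take everything.
    std::size_t gpu_count =
        cpu_workers == 0 ? total
                         : static_cast<std::size_t>(total * gpu_share);

    std::vector<Chunk> chunks;
    chunks.push_back({0, gpu_count});

    std::size_t remaining = total - gpu_count;
    std::size_t begin = gpu_count;
    for (std::size_t i = 0; i < cpu_workers; ++i) {
        // Leftover elements go one each to the first few workers.
        std::size_t count =
            remaining / cpu_workers + (i < remaining % cpu_workers ? 1 : 0);
        chunks.push_back({begin, begin + count});
        begin += count;
    }
    return chunks;
}
```

For example, `split_work(1000, 0.75, 2)` gives the GPU elements 0 to 749 and each core 125 of the remaining elements. In practice the share would probably be tuned from measured timings rather than hard-coded.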

If I run the GPU alone on its chunk of the data set it’s nice and fast. If I throw one core into the mix, the GPU computation time rises a negligible amount. But when I throw the second core in as well, the GPU computation time skyrockets from about 450ms to over 3000ms. In both cases, the CPU computation time seems almost unaffected (which weakens my suspicions about memory bandwidth/cache saturation).

The computation is a synchronous OpenGL render in the main thread, with one additional thread for each core. I can understand how saturating both cores reduces the CPU time available to the OpenGL thread, but why does it suffer so badly? The render only consists of a few function calls, and the majority of its time should be spent blocked in glTexImage2D/glReadPixels.

I hacked up an asynchronous implementation earlier with PBOs, just to see if it makes a difference, but it didn’t seem very asynchronous; the thread was blocked initiating the texture uploads and computation, and while reading the result back. I didn’t really gain much at all.

Any ideas?

Just a follow up on this…

I thought I’d play with thread priorities this morning. On Windows I set the GPU thread to THREAD_PRIORITY_HIGHEST but this didn’t improve the situation. However, once I’d also set the CPU threads to THREAD_PRIORITY_LOWEST there was a huge improvement in GPU performance. As good as using only one core!
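For anyone wanting to try the same thing, a rough sketch of the idea follows. `SetThreadPriority`, `GetCurrentThread` and `THREAD_PRIORITY_LOWEST` are the real Win32 names; `lower_current_thread_priority` and `run_balanced` are hypothetical helpers invented for this example, and the `gpu_work` callback merely stands in for the synchronous OpenGL render. On non-Windows builds the priority call is deliberately a no-op, so this only shows the structure, not the actual code from the framework.

```cpp
#include <cassert>
#include <thread>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// Drop the calling thread to the lowest regular priority.
// Returns true if the request succeeded. Only the Windows path is
// implemented; on other platforms this does nothing and returns false.
bool lower_current_thread_priority() {
#ifdef _WIN32
    return SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_LOWEST) != 0;
#else
    return false;
#endif
}

// Hypothetical driver: runs cpu_work(i) on num_workers low-priority threads
// while gpu_work() runs on the calling (main) thread.
template <typename CpuWork, typename GpuWork>
void run_balanced(int num_workers, CpuWork cpu_work, GpuWork gpu_work) {
    assert(num_workers >= 0);
    std::vector<std::thread> workers;
    for (int i = 0; i < num_workers; ++i) {
        workers.emplace_back([i, &cpu_work] {
            // Each CPU worker lowers its own priority before starting.
            lower_current_thread_priority();
            cpu_work(i);
        });
    }
    gpu_work();  // stands in for the synchronous OpenGL render
    for (auto& t : workers) {
        t.join();
    }
}
```

Each worker lowers its own priority before it starts, so the main thread and anything the driver runs at normal priority or above should get scheduled ahead of the CPU workers.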

The CPU thread performance degraded by only a tiny amount, so I think I’ve found my solution. :)

We had a related discussion a while ago. It was about a realtime thread deadlocking with a render thread.

It seems that the display driver uses an extra thread with its priority fixed somewhere between highest and normal. This would also explain your observed behaviour.