Not my post, but thought I’d toss in my understanding/suggestion anyway:
You HAVE to do the flush AFTER the calculations, or else you get no parallelism at all. Strictly speaking, it's glFinish that forces resynchronization between the GPU and CPU; glFlush just pushes any buffered commands to the GPU without blocking (though it can stall if the command queue is full). Either way, if you draw out your scene and then immediately flush/finish before your calculations, you just stall the CPU until the GPU can catch up. Then the GPU sits there doing nothing while the CPU updates the AI. That's why the AI calculations occurred BEFORE the glFlush.
Actually, while I know that the GPU can do a lot of work without further CPU intervention, there must be some limit to it, and once that limit is hit the other glXxxxx commands will be forced to stall along the way. I wonder if it wouldn't be better to interleave your processing between large portions of your rendering code to distribute the load. Of course, the better way to do that would be to have your rendering code and your parallel code in separate threads, so that if the rendering code had to stall waiting for the GPU, the parallel code could still get work done. If you want frame synchronization, you can use a mutex/semaphore/condition variable/event or whatever else is handy to keep the two threads synchronized on a per-frame basis.
An extremely vague version would look like this:
void RenderThread(void)
{
    while (!done)
    {
        RenderWorld();
        glFlush();
        SwapBuffers(hdc);                        /* on Win32, SwapBuffers takes the device context */
        WaitForSingleObject(hAiReady, INFINITE); /* auto-reset event, so it re-arms each frame */
        SetEvent(hRenderReady);
    }
}

void AIThread(void)
{
    while (!done)
    {
        UpdateAiSystems();
        SetEvent(hAiReady);
        WaitForSingleObject(hRenderReady, INFINITE);
    }
}
OK, there’s probably all kinds of hideousness there. It would probably be better to have a third thread that received notifications as Render and AI became ready, and then released an event when both were ready. I was just trying to get the shortest possible version out there.
PLEASE NOTE : if you try to use the above code, note that the two threads perform their wait/signal in opposite orders. This is necessary to avoid deadlock. I do not know if the order I chose is “optimal”, or if it even matters. This approach does not scale up nicely to more than 2 threads. There really should be some kind of “manager” thread, like I said…
Also, thread programming is not for the faint of heart. Debugging gets entertaining. I wouldn’t recommend adding threading to an otherwise single-threaded application just for this parallelism. But if you’re multithreaded anyway, what the heck.
Lastly, it is possible that thread context switching will be so slow that you won’t be able to get anything useful done during the short stalls anyway. You’d just have to test and find out. Compare the render/calculate/flush single threaded performance to the multithreaded performance and see which is better.
As for SwapBuffers doing an implicit glFlush: I believe most implementations do flush there, since the frame has to be complete before it can be displayed, but I don't know that it's actually guaranteed. I can imagine a system that avoided it, though timing synchronization on such a system would be fairly difficult. I chose to ignore the issue in the preceding code.
Of course, I’m just a hobbyist OpenGL programmer, so any of you who do this “for real” can feel free to point out all of the things I’ve overlooked or misunderstood.
Mac