to glFlush() or not to glFlush()...

I’m trying to figure out the conditions/reasons for using the glFlush() command. I started this topic on another development site and am not totally satisfied with the answers I’m getting…

Here is the URL for the forum: CLICKY

Anyone here have any ideas/suggestions? Thank you in advance…

As noted in the other forum, glFlush() is used to force the execution of queued commands. You have absolutely no way to know whether any commands are actually queued: that is implementation-dependent and cannot be queried.

EDIT: To elaborate a bit, the fact that glFlush() returns immediately means it could be used, to a certain extent, to parallelize work between the CPU and the GPU. Example:

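void loop()
{
  lots_of_rendering_commands();  /* queue the GL commands for this frame */
  glFlush();                     /* ask the driver to start executing them */
  lots_of_other_tasks();         /* CPU work that can overlap with the GPU */
  SwapBuffers();                 /* present the frame (platform-specific call) */
}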

With this loop, you hope that the calls in your rendering function are non-blocking and mostly executed by the GPU, so that once you’ve sent the commands, you can use the CPU for other tasks.
Now, an implementation that receives a lot of commands obviously won’t sit idle forever (nobody wants jerky animation because of command queueing). Even if it is still waiting, the fact that you send no more rendering commands for a relatively long time (while the other tasks run) should eventually trigger execution of the queued commands. Lastly, when the SwapBuffers() call is executed, it can be assumed that the driver has to complete pending commands before swapping the buffers.
To sum up, the glFlush() call in the code snippet above is mostly cosmetic, i.e. all the pseudo-parallelism you hope to get would probably be exactly the same without it.
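To put rough numbers on the overlap being hoped for, here is a minimal sketch in C. It is an idealized model, not a measurement of any real driver: the function names are made up for illustration, and it assumes the GPU starts on the queued commands as soon as they are sent.

```c
/* Idealized per-frame cost, in milliseconds. Hypothetical helpers for
   illustration only; real drivers and GPUs are not this simple. */

/* No overlap: the CPU waits for rendering to finish before its other tasks. */
double frame_ms_serial(double render_ms, double other_ms)
{
    return render_ms + other_ms;
}

/* Full overlap: the GPU renders while the CPU does its other tasks, so the
   frame costs as much as the slower of the two. */
double frame_ms_overlapped(double render_ms, double other_ms)
{
    return render_ms > other_ms ? render_ms : other_ms;
}
```

With 8 ms of rendering and 6 ms of other work, the serial case costs 14 ms per frame and the overlapped case 8 ms. Whether you actually get the overlap depends on when the driver starts executing the commands, which it will probably do with or without the explicit glFlush().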

glFlush() can’t be used for synchronization either: you know that work on the GL side starts, but you never know when it finishes (command execution is only guaranteed to “complete in finite time”), so you can’t efficiently interleave CPU work with GL commands.

glFinish(), on the other hand, ensures that all previous work is completed before it returns. But of course it is a blocking call, so even if the work executed by the GL is done on the GPU, you can’t take advantage of the free CPU time.
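The difference between the two calls can be pictured with a toy command queue. This is only a sketch in C under loud assumptions: ToyQueue and every toy_* function are hypothetical, and nothing here reflects how a real driver is built. It only shows that a flush hands work over and returns, while a finish does not return until the work is done.

```c
/* Toy model of a driver command queue (hypothetical, for illustration). */
typedef struct {
    int queued;     /* commands buffered on the CPU side, not yet submitted */
    int in_flight;  /* commands submitted to the "GPU", not yet completed */
    int completed;  /* commands the "GPU" has finished */
} ToyQueue;

/* Record one command; nothing is submitted yet. */
void toy_enqueue(ToyQueue *q)
{
    q->queued++;
}

/* Like glFlush(): submit everything buffered so far and return at once,
   without waiting for any of it to complete. */
void toy_flush(ToyQueue *q)
{
    q->in_flight += q->queued;
    q->queued = 0;
}

/* The "GPU" completes up to n submitted commands; stands in for time passing. */
void toy_gpu_step(ToyQueue *q, int n)
{
    if (n > q->in_flight)
        n = q->in_flight;
    q->in_flight -= n;
    q->completed += n;
}

/* Like glFinish(): submit everything, then block until all of it is done. */
void toy_finish(ToyQueue *q)
{
    toy_flush(q);
    while (q->in_flight > 0)
        toy_gpu_step(q, 1);
}
```

After five toy_enqueue() calls and a toy_flush(), the model has queued = 0, in_flight = 5, completed = 0: the work has started, but the caller knows nothing about when it ends. After toy_finish(), in_flight is 0, at the cost of the caller having waited for every command.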

I believe that glFlush() is the same as SwapBuffers(). You use glFlush() when you are only using one drawing buffer, whereas SwapBuffers() is used for double buffering.

From this point of view, it would be glFinish() being equivalent to SwapBuffers(), not glFlush(). As I suggested in my previous post, SwapBuffers() does an implicit glFinish() before actually swapping the buffers.

I agree with geohoffman49431: The only time you should have to do a glFlush() is when you’re rendering to the front buffer and you want to see your results. In all other cases you’re probably better off letting the driver figure out the best times to flush commands to achieve parallelization.

kehziah: SwapBuffers() doesn’t have to do a glFinish() before swapping buffers. True, all commands must finish rendering before the buffers are swapped; however, the swap request can be enqueued just like any other rendering command, so it’s most accurate to say that SwapBuffers() does an implicit glFlush() but not a glFinish().

glFlush() is supposed to send queued commands to the hardware, and that’s it. Strictly speaking, it only guarantees that your results will be displayed in finite time (in the front-buffer rendering case you mention). You don’t even know when it will return: immediately, at some indeterminate point in the processing of the queued commands, or when all the work is done.
Bottom line: glFlush() is so loosely defined that it’s barely useful in anything but the networked case; otherwise you have to rely on assumptions about the implementation.

You’re right, SwapBuffers() can be queued as well, which makes it (somewhat) equivalent to a glFlush(). I was wrong.

The problem with queueing SwapBuffers() is latency. How many full frames will you queue before blocking? This could be tied to the size of the FIFOs, but those are quite large by now, so how many frames get queued is more an implementor’s choice than a technical limitation.

If you’re after benchmark numbers, chances are the more queueing the better. When you have the full list of commands for a frame, you have plenty of opportunities to optimize, and even more if you have several frames (e.g. analyzing which data is used most in order to put it in fast memory).

If you’re after accuracy, as when what you draw is tied to some time-driven process (e.g. user input or a physics simulation), you might not want to queue that many frames, because what is displayed at a given time will be the graphical result of an event that happened a noticeable amount of time earlier.
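As a back-of-the-envelope illustration of that delay, here is a small C sketch. The function name is hypothetical, and the model is a simplifying assumption: one queued frame is shown per display refresh, and nothing else adds latency.

```c
/* Rough estimate, in milliseconds, of how old the input behind a displayed
   frame is when `queued_frames` complete frames were queued ahead of it and
   one frame is shown per refresh. Hypothetical helper for illustration;
   returns -1.0 for invalid input. */
double queued_latency_ms(int queued_frames, double refresh_hz)
{
    if (queued_frames < 0 || refresh_hz <= 0.0)
        return -1.0;
    return queued_frames * (1000.0 / refresh_hz);
}
```

Under these assumptions, three queued frames on a 60 Hz display put roughly 50 ms between an event and its appearance on screen, which is already noticeable for user input.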

As I said, it’s a choice the implementor has, and seeing how most people are after insignificant framerate “improvements”, you can guess how most implementations work in this area.