glFlush & glFinish

Hello all,

Do I need to call either/both of these at the end of my game tick (after the rendering is done)?

I’ve seen a lot of conflicting opinions on the issue…

cheers,
g.

glFlush forces all previously issued GL commands to begin executing, whereas glFinish also waits until all of them have completed.

So it depends on your needs, but there is no need for both, since glFinish already flushes.

The Red Book says:
glFinish() flushes buffers and forces commands to begin execution as glFlush() does, but glFinish() blocks other OpenGL commands and waits for all execution to complete. Consequently, glFinish() does not return to your program until all previously called commands are completed.

glFinish() might be used to synchronize tasks or to measure the exact time that certain OpenGL commands take to execute.
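To make the flush/finish distinction concrete, here is a toy model in Python. It is not OpenGL code and not how any particular driver works: commands sit in a driver-side buffer until a flush hands them to the GPU, and the simulated GPU then executes one command per tick. The class and method names are hypothetical; `flush` and `finish` only mimic the glFlush/glFinish semantics quoted above. Real drivers also flush on their own (for example when a buffer fills, or on swap), which this model deliberately ignores.

```python
from dataclasses import dataclass


@dataclass
class ToyCommandQueue:
    """Toy model, not OpenGL: how many commands are in each stage."""
    buffered: int = 0   # recorded by the app, still in the driver's buffer
    submitted: int = 0  # handed to the GPU, not yet executed
    completed: int = 0  # executed by the GPU

    def issue(self, n: int) -> None:
        """Record n commands; nothing reaches the GPU yet."""
        self.buffered += n

    def gpu_tick(self) -> None:
        """One unit of GPU time: execute at most one submitted command."""
        if self.submitted > 0:
            self.submitted -= 1
            self.completed += 1

    def flush(self) -> None:
        """Analogue of glFlush: hand buffered work to the GPU, return at once."""
        self.submitted += self.buffered
        self.buffered = 0

    def finish(self) -> int:
        """Analogue of glFinish: flush, then block until the GPU has drained.

        Returns how many GPU ticks the caller spent blocked.
        """
        self.flush()
        waited = 0
        while self.submitted > 0:
            self.gpu_tick()
            waited += 1
        return waited


q = ToyCommandQueue()
q.issue(5)
q.gpu_tick()          # nothing was flushed, so the GPU has no work
print(q)              # ToyCommandQueue(buffered=5, submitted=0, completed=0)
q.flush()             # returns immediately; work has only been handed off
q.gpu_tick()
print(q)              # ToyCommandQueue(buffered=0, submitted=4, completed=1)
q.issue(3)
print(q.finish(), q)  # 7 ToyCommandQueue(buffered=0, submitted=0, completed=8)
```

The first `gpu_tick` doing nothing is the same effect ehart describes later for front buffer rendering: unflushed commands may simply sit in the buffer.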

The only use I had for glFinish was when measuring the performance of my rendering frame.

Actually you want to avoid both of them. Flush doesn’t seem to do anything at all and Finish produces a stall.

Many commercial games suffer from not using glFinish.
All OpenGL commands issued by the application are placed in a command queue. If the queue is large enough to store rendering calls for multiple animation frames, then your CPU will be ahead of the GPU by a few frames.
So placing glFinish just before you issue the rendering commands for the next frame can be a good thing.

I never needed glFlush or glFinish, except for an extremely simple scene which was rendered at a ridiculous framerate, which somehow caused my app to stop for a short moment every 5 seconds or so. Calling glFlush solved the problem, but wasn’t necessary for larger scenes.
So I think glFlush isn’t really necessary, but won’t hurt.
Not sure about glFinish, though.

Both glFlush and glFinish have their uses, as others have mentioned above. One particular use for glFlush is front buffer rendering. Without calling glFlush, there is no guarantee that the commands will ever be executed (swap implicitly flushes). Some applications have found themselves in the following situation:

Render to back-buffer
swap
render a small amount of UI to front buffer
repeat

The problem is that they saw little, if any, of the UI. The reason was that the front buffer rendering didn’t start until the next swap had been issued, and by the time it showed up, it was being obliterated by that swap. The right way to do it is:

Render to back-buffer
swap
render a small amount of UI to front buffer
flush
repeat

As for glFinish, I highly recommend avoiding the glFinish-per-frame trick mentioned above. In very specific cases, it might be helpful to do what is mentioned to closely synchronize input and results. However, the method described above has a very high cost. The CPU stalls waiting for the graphics engine to catch up, and then the GPU stalls waiting for the CPU to build a large enough unit of work for it. The result is that you have hurt the efficiency of both the CPU and the GPU.

-Evan

As long as you swap you should be OK.

Flush will dispatch graphics calls you have made that may be sitting in a command queue awaiting dispatch to the graphics hardware/firmware. So if you’re not going to send graphics calls for a while, but you want to ensure that drawing is complete when you come back, or that you haven’t wasted graphics idle time in the meantime, it is useful to flush.

Finish ensures all graphics rendering has been completed, even to the pixel level, and will block the thread’s execution until this is the case. So if you want to sync your CPU to the graphics and ensure that all rendering you have performed up to that point is absolutely done, this is your call. Some graphics calls implicitly finish, for example glReadPixels from the buffer you’re rendering to (technically this is implementation-specific, and one could imagine overly complex optimizations where a finish is not essential).

glFinish can be used to reduce the latency of the input and runtime loop, and glFlush can ensure that some particularly fill-heavy operation, like a skybox, is issued to keep the card busy before you hog the CPU. Usually they will hurt your framerate, but used together they can be effective. For example:

glFinish
clear screen
poll input
update eye
draw skybox
flush
update most game calculations
draw scene
swap

Of course, if your scene is geometry heavy, you’ve lost a chunk of T&L that might have been dispatched or transformed into an internal FIFO. The guys who make the graphics cards are overly obsessed with fps and buffering multiple frames; you as a software developer can worry about the overall user experience and latency. It is worth trading some fps for lower latency, so long as you know what you’re doing.

To be fair, most of this has to do with parallel execution on a single-CPU system. If you have graphics calls vying for CPU cycles with your game code, then your easiest way to keep it efficient is simply to avoid blocking in the graphics driver and hope the application can outpace the driver. That way you’re in reasonably good shape w.r.t. keeping the GPU busy. If you dick around with this, you’re asking for trouble unless you know what you’re doing or are confident in your priorities for your user experience.

ehart, that’s a horrible piece of graphics code. Aside from the questionable wisdom of switching to front-buffered rendering like that on buffer-copy platforms, the fix risks at least occasional flicker. Heck, if you were vsynced, you might be OK or guaranteed to have a problem, depending on how much you were drawing and where on the screen it was. Bad, bad code, and the sort of stuff that breaks mysteriously, leaving the poor intern who inherits it to clean up the mess. Maybe there’s marginal justification for it on something like PerfHUD.

I seem to have left out some important details in my front buffer suggestion. I was not recommending the general code above; my point was that if you are rendering to the front buffer (for a valid reason), it can be important to call glFlush after doing so to ensure that your results get displayed. This can be especially true on compositing desktops, because the window system needs some notification that you rendered. As dorbie pointed out, front buffer rendering has some general issues, and I would generally suggest avoiding it for a better user experience.

As for the suggestions on latency and glFlush/glFinish, I have a couple of alterations to suggest to the loop dorbie proposed.

  1. Try to call glFinish as close to when you are to send additional drawing as possible. This minimizes the chances of running the GPU at full speed while the CPU is idle and vice versa.

  2. Avoid rendering your skybox first. It is much more efficient to draw it after all opaque geometry in the scene, because 50% or more of its pixels will end up being overwritten anyway.

  3. Be careful about throwing flushes into the stream. As dorbie mentioned, there is cost to them.

Finally, I would like to offer an alternative synchronization mechanism to the heavy-handed finish others are recommending. If you find you need improved latency, a reliable way to get it without forcing a complete synchronization of the GPU and CPU is to send occlusion queries. If you send a single query per frame, you can wait for that query’s result at a later time. The synchronous wait lets you synchronize to a point in the rendering stream other than the last command. This allows you to ensure that frame n-2 has completed before you start frame n. Meanwhile, the GPU does not need to sit idle, and it can process frame n-1. This can improve CPU and GPU utilization greatly over the glFinish method.
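The trade-off between a deep command queue, a glFinish before every frame, and the wait-on-frame-n-2 scheme can be sketched with a toy pipeline model in Python. It is not OpenGL code and it makes strong assumptions: fixed per-frame CPU and GPU costs (the 5 ms / 10 ms figures below are made up), in-order GPU execution, no vsync, and latency measured from the input sample at the start of a frame to the moment the GPU finishes that frame. `simulate` and `max_in_flight` are hypothetical names: 1 corresponds to calling glFinish before each frame, 2 to waiting on frame n-2’s query before starting frame n, and larger values to a driver that buffers several frames.

```python
def simulate(cpu_ms, gpu_ms, max_in_flight, frames=20):
    """Toy CPU/GPU pipeline model, not OpenGL.

    Each frame, the CPU samples input, spends cpu_ms building commands and
    queues them; the GPU executes queued frames in order, gpu_ms each.
    Frame i may not start on the CPU until frame i - max_in_flight has
    finished on the GPU (max_in_flight must be >= 1, frames >= 2).
    Returns the last frame's input-to-done latency and the interval
    between the last two frame completions.
    """
    gpu_done = []   # completion time of each frame on the GPU
    cpu_free = 0.0  # when the CPU finished queuing the previous frame
    latency = 0.0
    for i in range(frames):
        start = cpu_free
        if i >= max_in_flight:
            # Block (glFinish or a query wait) until an older frame completes.
            start = max(start, gpu_done[i - max_in_flight])
        submit = start + cpu_ms
        cpu_free = submit
        # The GPU starts this frame once it is queued and the previous one is done.
        gpu_start = max(submit, gpu_done[-1] if gpu_done else 0.0)
        gpu_done.append(gpu_start + gpu_ms)
        latency = gpu_done[-1] - start  # input was sampled at `start`
    return latency, gpu_done[-1] - gpu_done[-2]


for n in (1, 2, 3):
    print(n, simulate(5.0, 10.0, n))
# 1 (15.0, 15.0)
# 2 (20.0, 10.0)
# 3 (30.0, 10.0)
```

With these made-up costs, a glFinish per frame gives the lowest latency but leaves the GPU idle for 5 ms out of every 15, which is the double stall described earlier in the thread. A three-frame queue keeps the GPU saturated but doubles the latency, while waiting only on frame n-2 keeps the GPU saturated at 20 ms.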

Again, I do not recommend adding these sorts of things from the beginning. I suggest doing these only if you determine you need them. Even if you do determine you need it, I suggest leaving it as optional.

-Evan

I suggest doing these only if you determine you need them. Even if you do determine you need it, I suggest leaving it as optional.
I would say that you shouldn’t try to determine whether you need them; you should assume some of your application’s users will need them and leave them optional (turned off by default). You never know whether future GPUs will be able to queue more frames than the ones you develop on.

It’s true that glFinish can be costly. I removed glFinish from my game last month and gained 2-3% in performance. It didn’t introduce any latency, since I have glGetTexImage in my code (I read one 64x64 block per frame), which keeps latency under 1/3 of a frame (that’s the gap I left between rendering to this texture and reading it). NVIDIA drivers are capable of sending me the texture contents in the middle of some other rendering operation without stalling the CPU.
If not for glGetTexImage I would probably use your suggestion about occlusion queries or look for another, more backward-compatible solution.

I think that using occlusion queries for synchronization is a good idea. I have one suggestion, though: you can place an occlusion query to cover most of the frame (from the beginning to about 70-80%) and wait for the result at the end of the frame. This way, the CPU is informed that the current frame is nearly complete and can go on with the next one.
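A steady-state sketch of this suggestion, as a toy model in Python rather than OpenGL code. It assumes fixed, made-up per-frame costs, in-order execution, and that the query result arrives exactly when the GPU reaches the end of the covered part of the frame. `partial_frame_wait` and `fraction` are hypothetical names; `fraction` is the share of the frame the query covers, and 1.0 behaves like a glFinish at the end of every frame.

```python
def partial_frame_wait(cpu_ms: float, gpu_ms: float, fraction: float):
    """Toy steady-state model, not OpenGL.

    After submitting frame i, the CPU blocks until the GPU has processed
    `fraction` of frame i (what a query spanning that share of the frame
    would report), then samples input and spends cpu_ms building frame i + 1.
    Returns (frame interval, GPU idle time per frame, input-to-done latency).
    """
    # Frame i + 1 reaches the GPU fraction * gpu_ms + cpu_ms after frame i
    # started; if that is later than gpu_ms, the GPU sits idle in between.
    interval = max(float(gpu_ms), fraction * gpu_ms + cpu_ms)
    idle = interval - gpu_ms
    # Input is sampled fraction * gpu_ms into frame i; frame i + 1 finishes
    # one interval plus one full GPU frame after frame i started.
    latency = interval + gpu_ms - fraction * gpu_ms
    return interval, idle, latency


for f in (1.0, 0.75, 0.5):
    print(f, partial_frame_wait(5.0, 10.0, f))
# 1.0 (15.0, 5.0, 15.0)
# 0.75 (12.5, 2.5, 15.0)
# 0.5 (10.0, 0.0, 15.0)
```

Under these assumptions the GPU stays fully busy only when the CPU’s per-frame time fits in the uncovered (1 - fraction) share of the GPU frame. With 5 ms of CPU and 10 ms of GPU work, covering 75% of the frame still leaves the GPU idle for 2.5 ms per frame, while covering 50% removes the idle time at the same 15 ms latency.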