Synchronization

From OpenGL Wiki

Synchronization is the process of ensuring that the OpenGL rendering pipeline has fully issued or executed the commands that you have given it.

The OpenGL specification usually defines the behavior of commands such that they will execute in the order given, and that every subsequent command must behave as if all prior commands have completed and their contents are visible. However, it is important to keep in mind that this is how OpenGL is specified and behaves, not how it is implemented.

Asynchronous action

OpenGL Rendering Commands are assumed to be asynchronous. If you call any of the glDraw* functions to initiate rendering, it is not at all guaranteed that the rendering has finished by the time the call returns. Indeed, it is perfectly legitimate for rendering to not even have started when the function returns. The OpenGL specification gives implementations the freedom to get around to rendering commands whenever it is best for them.

This is not a weakness of OpenGL; it is a strength. It allows for many optimizations of the rendering command pathway. Issuing a command to the internal rendering command buffer can be a fairly slow process, because it typically requires the CPU to transition from user mode into kernel mode. This transition eats up a lot of cycles, so if the driver can store 30 rendering commands and then issue all of them with only one transition, this is faster than making one transition for each of the 30 rendering calls.
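The saving from batching can be put into rough numbers. The following is only a toy cost model, not driver code: the type toy_cost_model, the function toy_submit_cost and every cycle count passed to it are hypothetical, chosen purely to illustrate the arithmetic.

```c
#include <assert.h>

/* Toy cost model. Both fields are made-up numbers supplied by the caller;
   they do not describe any real driver or CPU. */
typedef struct {
    unsigned long transition_cost; /* assumed cycles per mode transition */
    unsigned long command_cost;    /* assumed cycles to write one command */
} toy_cost_model;

/* Total cost of submitting `commands` commands when each submission
   carries at most `batch_size` commands and pays for one transition. */
unsigned long toy_submit_cost(const toy_cost_model *m,
                              unsigned long commands,
                              unsigned long batch_size)
{
    assert(batch_size > 0);
    /* Number of transitions is ceil(commands / batch_size). */
    unsigned long transitions = (commands + batch_size - 1) / batch_size;
    return transitions * m->transition_cost + commands * m->command_cost;
}
```

With an assumed 1000 cycles per transition and 10 cycles per command, 30 commands cost 30300 cycles when issued one at a time and 1300 cycles when issued as a single batch. The real ratio depends entirely on the platform and driver.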

The OpenGL API, however, is defined to be synchronous; each command acts as if all prior commands had completed entirely. As such, the OpenGL API will appear synchronous even if it executes asynchronously. If you issue a command which depends on the results of a prior command, the implementation will prevent any execution overlap between those two commands.

There are several OpenGL functions that can pull data directly from client-side memory, or push data directly into client-side memory, such as glTexSubImage2D, glReadPixels, and glBufferSubData.

Because OpenGL is defined to be synchronous, when any of these functions have returned, they must have finished with the client memory. When glReadPixels returns, the pixel data is in your client memory (unless you are reading into a buffer object). When glBufferSubData returns, you can immediately modify or delete whatever memory pointer you gave it, as OpenGL has already read as much as it wants.
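The guarantee about client memory can be illustrated without a GL context. The sketch below is a stand-in, not the OpenGL API: toy_buffer and toy_buffer_sub_data are hypothetical names, and a real implementation may stage or defer the transfer internally. It only mimics the visible contract, namely that the source bytes have been consumed by the time the call returns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TOY_BUFFER_SIZE = 64 };

/* Stand-in for a buffer object's storage; not an OpenGL type. */
typedef struct {
    unsigned char storage[TOY_BUFFER_SIZE];
} toy_buffer;

/* Mimics the contract of glBufferSubData: the source bytes are copied
   before the function returns, and the caller's pointer is not kept. */
void toy_buffer_sub_data(toy_buffer *buf, size_t offset, size_t size,
                         const void *data)
{
    assert(offset <= TOY_BUFFER_SIZE && size <= TOY_BUFFER_SIZE - offset);
    memcpy(buf->storage + offset, data, size);
}
```

After toy_buffer_sub_data returns, the caller can overwrite or free the source array and the bytes held by the buffer are unaffected, which is the behavior the paragraph above describes for the real function.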

Legacy Note: The only OpenGL functions that behave differently are functions that end in Pointer. When these use client-side memory (which is no longer permitted in core OpenGL), the pointers are stored for a period of time. During that period, they must remain valid.

These pointers are usually consumed by rendering calls. Once such a rendering call has returned, the client memory it needed has already been read. Modifications to client memory after the rendering call will only affect future rendering calls, not those that have already been made.

This is generally why Buffer Objects are better than using client memory for rendering. A rendering call with buffers does not have to handle the possibility of the user changing the memory later, so it can simply write a few tokens into the command stream. With client memory, it must copy all of the vertices out of the client-side arrays.

All of this means that OpenGL is a very asynchronous renderer, even if it is defined synchronously.

Command state

An OpenGL Rendering Command can be in one of three conceptual states: unissued, issued but not complete, and complete.

A command is unissued if the command has been given to the OpenGL driver, but the driver has not yet given the command to the hardware to actually execute. When the rendering hardware starts running out of actually issued commands to process, the OpenGL driver can take some of the unissued commands and issue them.

An issued but not complete command is one that has been given to the hardware, but the full results of the command are not yet ready. The hardware has a queue of these commands; unless there is a hardware fault of some kind, the hardware will execute all of the commands in that queue.

A command is complete when it is out of the pipeline entirely. For rendering commands, this means that its effects have been written to the Framebuffer, Transform Feedback buffers, or other outputs, as appropriate to the current state. For pixel transfers to buffer objects, this means that the pixel data is now stored in the buffer object as requested. For pixel transfers from buffer objects, this means that the pixel data is now stored in the texture object that was uploaded to. And so forth.
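Because commands are processed in order, the three states can be pictured as two watermarks moving through the stream. The sketch below is a conceptual model only: toy_stream_counts, toy_command_state and toy_state_of are hypothetical names, and OpenGL exposes no such counters.

```c
#include <assert.h>

typedef enum {
    TOY_UNISSUED, /* held by the driver, not yet given to the hardware */
    TOY_ISSUED,   /* in the hardware queue, results not yet ready */
    TOY_COMPLETE  /* out of the pipeline; effects have been written */
} toy_command_state;

/* Progress of an ordered command stream, expressed as counts. */
typedef struct {
    unsigned submitted; /* commands handed to the driver */
    unsigned issued;    /* of those, how many reached the hardware queue */
    unsigned completed; /* of those, how many have left the pipeline */
} toy_stream_counts;

/* State of the command at `index` (0 is the oldest command). */
toy_command_state toy_state_of(const toy_stream_counts *s, unsigned index)
{
    /* In-order processing means completed <= issued <= submitted. */
    assert(s->completed <= s->issued && s->issued <= s->submitted);
    assert(index < s->submitted);
    if (index < s->completed) return TOY_COMPLETE;
    if (index < s->issued)    return TOY_ISSUED;
    return TOY_UNISSUED;
}
```

For a stream with 10 commands submitted, 6 issued and 2 completed, commands 0 and 1 are complete, commands 2 through 5 are issued but not complete, and commands 6 through 9 are unissued.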

Synchronization options

Asynchronous rendering is nice. However, it is often useful to synchronize your actions with OpenGL, and OpenGL provides several alternatives for doing so.

Explicit synchronization

OpenGL provides two simple mechanisms for explicit synchronization: glFinish and glFlush.

The simplest to understand is glFinish. It will not return, stopping the current CPU thread, until all rendering commands that have been sent have completed.

The behavior of glFlush is less simple to define.

Conceptually, the GPU has something called a "command queue". This is a list of commands written by the OpenGL driver at the behest of the user. Just about every OpenGL function will map to one or more commands that will be added to the command queue. Any command that is placed in the command queue will be read by the GPU and executed.

However, the command queue has a finite length. If you add too many commands in a short space of time, the driver cannot write them all to the GPU's command queue. What the driver can do is write them to internal memory. These commands are in the "unissued" state. Sometime later, the unissued commands are added to the GPU's queue. When the driver does this is the question.

The driver may set up some kind of asynchronous message that tells it when the GPU's queue is nearly empty so that it can add more if possible. However, this is generally not the case. OpenGL allows the driver to have the freedom to not make this check until you actually execute an OpenGL call. And even then, not all calls will make this check.

What this means is that it is theoretically possible for OpenGL to be sitting there, with lots of unissued commands in the driver's buffer, but with the GPU command queue being totally empty. The driver knows that there is work, but if you don't execute another OpenGL command (any command) for a while, it never gets the chance to make this check.

Normally this isn't a problem. But imagine this circumstance. You render a lot of stuff, all in a short space of time. You sort your data to achieve maximum efficiency, so submitting the data takes less time than rendering it.

Because you add a lot of commands in a short space of time, the driver has to buffer many of these commands. But if you don't make any OpenGL calls after you have submitted all of the data (because you're off doing non-OpenGL stuff), then the driver never has the chance to push these buffered commands into the command queue. Obviously, you will be issuing commands next frame, but you'd probably rather not wait that long for the GPU to start rendering.

The purpose of glFlush is to tell OpenGL to sit there and wait, halting the current CPU thread, until all commands have been added to the GPU's command queue. This won't take as long as glFinish, but it can still be time consuming.
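The difference between the two calls, as described above, can be sketched with a small simulation. This is a toy model under stated assumptions, not how any driver is implemented: the queue capacity of 4, the toy_driver type and all toy_* functions are hypothetical, and the "GPU" here makes progress only when the model asks it to.

```c
#include <assert.h>

enum { TOY_QUEUE_CAPACITY = 4 }; /* hypothetical GPU queue length */

typedef struct {
    unsigned unissued;  /* buffered in driver memory */
    unsigned queued;    /* in the GPU command queue, not yet executed */
    unsigned completed; /* executed by the GPU */
} toy_driver;

/* In this model, submitting a command only buffers it in the driver. */
void toy_submit(toy_driver *d) { d->unissued++; }

/* The GPU executes the oldest queued command, freeing one queue slot. */
static void toy_gpu_execute_one(toy_driver *d)
{
    assert(d->queued > 0);
    d->queued--;
    d->completed++;
}

/* Models glFlush as described above: return once nothing is unissued.
   Commands may still be waiting in the GPU queue afterwards. */
void toy_flush(toy_driver *d)
{
    while (d->unissued > 0) {
        if (d->queued == TOY_QUEUE_CAPACITY)
            toy_gpu_execute_one(d); /* wait for a slot to free up */
        d->unissued--;
        d->queued++;
    }
}

/* Models glFinish: return once every submitted command has completed. */
void toy_finish(toy_driver *d)
{
    toy_flush(d);
    while (d->queued > 0)
        toy_gpu_execute_one(d);
}
```

Starting from an empty toy_driver and submitting 10 commands, toy_flush returns with 0 unissued, 4 still queued and 6 completed, while toy_finish returns only when all 10 have completed. In the model, as in the text, the flush is the cheaper of the two because it does not wait for the queue to drain.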

Implicit synchronization

Some operations implicitly force a full glFinish synchronization, while others force the user to halt until all commands up to a certain point have completed. And some force a glFlush.

Swapping the back and front buffers on the Default Framebuffer may cause some form of synchronization (though the actual moment of synchronization may be delayed until later GL commands), if there are still commands affecting the default framebuffer that have not yet completed. Swapping buffers only technically needs to sync to the last command that affects the default framebuffer, but it may perform a full glFinish.

Any attempt to read from a framebuffer to CPU memory (not to a buffer object) will halt until all rendering commands affecting that framebuffer have completed. Most attempts to write to a buffer object, either with glBufferSubData or mapping, will halt until all rendering commands using that part of the buffer object have completed. However, if you invalidate the buffer object before uploading to it, the implementation will be able to allocate new storage for the buffer and simply orphan the old one (deleting it later when it is no longer used). This will allow the buffer object to be immediately available for uploading new data. For more details, see this page on buffer streaming.
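The effect of invalidating before uploading can be sketched as bookkeeping. The model below is hypothetical throughout (toy_stream_buffer and the toy_* functions are not OpenGL names) and ignores partial ranges. In real OpenGL, invalidation is typically requested with glInvalidateBufferData, or by calling glBufferData with a null data pointer; whether the implementation actually orphans the storage is up to the driver.

```c
#include <assert.h>

typedef struct {
    unsigned storage_id;    /* which backing allocation is current */
    unsigned pending_reads; /* unfinished GPU commands reading it */
    unsigned stalls;        /* times an upload had to wait */
} toy_stream_buffer;

/* A draw call that reads from the buffer's current storage. */
void toy_draw(toy_stream_buffer *b)
{
    assert(b != 0);
    b->pending_reads++;
}

/* Uploading into storage the GPU is still reading must wait first. */
void toy_upload(toy_stream_buffer *b)
{
    assert(b != 0);
    if (b->pending_reads > 0) {
        b->stalls++;          /* the CPU blocks until the GPU is done */
        b->pending_reads = 0;
    }
    /* New data would be copied into the current storage here. */
}

/* Invalidating swaps in fresh storage. The old allocation is orphaned
   and can be freed once the GPU has finished reading it. */
void toy_invalidate(toy_stream_buffer *b)
{
    assert(b != 0);
    b->storage_id++;
    b->pending_reads = 0;     /* nothing reads the new storage yet */
}
```

In this model, a draw followed directly by an upload records one stall, while a draw followed by an invalidate and then an upload records none, because the upload targets storage that no command is reading.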

If you use the GL_MAP_UNSYNCHRONIZED_BIT flag with glMapBufferRange, OpenGL will forgo any synchronization operations; you are on your own however as to the consequences of modifying any parts of that buffer that may be in use.

Similarly, attempts to change texture data from CPU memory with commands like glTexSubImage2D can block until commands that use that texture have finished. They may not block, as some implementations will just allocate some CPU memory, copy the user's pixel data into it, and perform the DMA to the texture some time later. Modifying texture data with Pixel Buffer Objects will not force a synchronization; the transfer from the buffer to the texture will simply be deferred until the texture is no longer in use.

There may be a few commands that, on some implementations, cause a synchronization to some point in the command stream. OpenGL does not require these commands to do so, but implementations are free to do so if they deem it necessary. Framebuffer object binding and rendering may cause a sync to the last command that affected the previously bound framebuffer object.

Sync Objects

glFinish is a decent start on synchronization. However, it is often useful to be able to do the kind of synchronization that OpenGL itself does implicitly. That is, being able to sync to a specific point in the command stream.

You do this by creating a fence object. This is a token in the command stream that you can test to see if it has been completed. Since the stream is an ordered list, if the fence has completed, then every command issued before that fence was issued has also completed.
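The ordering argument can be made concrete with a small model. The code below is a conceptual sketch, not the sync object API: toy_fence_stream, toy_fence and the toy_* functions are hypothetical. In real OpenGL the corresponding operations are glFenceSync to insert the fence and glClientWaitSync or glGetSynciv to wait on or query it.

```c
#include <assert.h>

/* An ordered command stream, tracked by two counters. */
typedef struct {
    unsigned submitted; /* commands placed in the stream so far */
    unsigned completed; /* commands the GPU has finished, in order */
} toy_fence_stream;

/* A fence remembers how many commands preceded it in the stream. */
typedef struct {
    unsigned position;
} toy_fence;

/* Insert a fence after everything submitted so far. */
toy_fence toy_insert_fence(const toy_fence_stream *s)
{
    toy_fence f = { s->submitted };
    return f;
}

/* The fence has passed once all commands before it have completed.
   Commands submitted after the fence do not affect the answer. */
int toy_fence_signaled(const toy_fence_stream *s, toy_fence f)
{
    assert(s->completed <= s->submitted);
    return s->completed >= f.position;
}
```

If a fence is inserted after 5 commands and 3 more commands are submitted later, the fence reports as signaled as soon as the first 5 have completed, regardless of the 3 that follow it. This is what makes a fence finer-grained than glFinish, which waits for everything.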

Reference