Difference between revisions of "Memory Model"
(Page for OpenGL memory model.)
Revision as of 14:57, 9 November 2012
A Memory Model defines the rules under which writes to various stored object data become visible to later reads of that data. For example, the memory model defines when writes to a Texture attached to a Framebuffer Object become visible to subsequent reads from that texture.
Under normal circumstances, OpenGL rigidly enforces a coherent memory model. In general, if you write to an object, any later read of that object will see the written data. Because OpenGL behaves "as if" all operations happened in a specific sequence, it is up to the OpenGL implementation to make sure that all prior writes have completed when you issue a read.
For example, if you issue a rendering command, OpenGL may not have finished executing that command by the time the function returns. If you then issue a glReadPixels operation (without doing an asynchronous read) to read from the framebuffer, it is up to the OpenGL implementation to synchronize with all outstanding write operations. OpenGL will wait until all rendering is done, then perform the read and return.
Therefore, the basic rule of OpenGL is this: if you execute a command that writes some data or changes some state, the result is immediately visible to any subsequent command that reads this data.
What follows will be a list of exceptions to this basic rule.
Contexts and object state
Incoherent memory access
There are a number of advanced operations that perform what we call "incoherent memory accesses":
- Writes (atomic or otherwise) via Image Load Store
- Writes (atomic or otherwise) via Shader Storage Buffer Objects
- Writes to variables declared as shared (but not patch)
When you perform any of these operations, subsequent reads from almost anywhere are not guaranteed to see the written data. "Almost anywhere" includes (but is not limited to):
- Image load operations to that memory location from anywhere other than this particular shader invocation, using the specific image variable used to write the data.
- SSBO reading operations to that memory location from anywhere other than this particular shader invocation, using the specific buffer variable used to write the data.
- Reads from the texture via glGetTexImage, or if it is bound to an FBO, glReadPixels.
- Texture reads via samplers.
- If the image was a buffer texture, any form of reading from that buffer, such as using it for a Vertex Buffer Object or Uniform Buffer Object.
In short, you get almost nothing. Everything is asynchronous, and OpenGL will not protect you from this fact. All of these can be alleviated, but only specifically at the request of the user. It will not happen automatically.
Despite the above, OpenGL does provide some protections. What follows is a list of guarantees that the specification makes about when data written by incoherent memory accesses will be visible.
First, within a single shader invocation, if you perform an incoherent memory write, later reads through that same variable in that invocation will always see it. You need not do anything special to make this happen. However, it is possible that, between the write and the read, another invocation may have overwritten that value.
Second, if a shader invocation is being executed, then the shader invocations that it depends on must have completed. For example, in a fragment shader, you can assume that the vertex shaders that computed the vertices of the primitive being rasterized have completed. Such an invocation is called a dependent invocation; dependent invocations have special privileges in terms of ordering.
Third, sometimes a fragment shader invocation is executed for the sole purpose of computing derivatives for other fragment shader invocations. All memory writes performed by such an invocation, incoherent or otherwise, are ignored.
Invocation order and count
One problem with the above is what defines "subsequent invocations". OpenGL allows implementations a lot of leeway on the ordering of shader invocations, as well as the number of invocations. Here is a list of the rules:
- You may not assume that a vertex shader will be executed exactly once for every vertex you pass it. It may be executed multiple times for the same vertex. Conversely, in indexed rendering, a re-used index may not cause the vertex shader to execute a second or third time.
- The same applies to tessellation evaluation shaders.
- The number of fragment shader invocations generated from rasterizing a primitive depends on the pixel ownership test, whether early depth test is enabled, and whether the rendering is to a multisample buffer. When not using per-sample shading, the number of fragment shader invocations is undefined within a pixel area, but it must be between 1 and the number of samples in the buffer.
- Invocations of the same shader stage may be executed in any order, even within the same draw call. This includes fragment shaders; writes to the framebuffer are ordered, but the actual fragment shader execution is not.
- Outside of invocations which are dependent (as defined above), invocations between stages may be executed in any order. This includes invocations launched by different rendering commands. While it is perhaps unlikely that two vertex shaders from different rendering operations could be running at the same time, it is possible, so OpenGL provides no guarantees.
The term "visibility" describes when a value written by an incoherent memory access in one shader invocation can safely be read by someone else. There are two tools for ensuring visibility, each suited to a different situation: the coherent qualifier and the glMemoryBarrier function.
coherent is used on image or buffer variables, such that writes to coherent qualified variables will be read correctly by coherent qualified variables in another invocation. Note that this requires the coherent qualifier on both the writer and the reader; if one of them doesn't have it, then nothing is guaranteed.
Note that coherent does not override the prior rules. In order for a write to become visible to an invocation, it must first have happened. Therefore, coherent can only really work if you know that the writing invocation has executed, which usually means dependent invocations, as stated above.
There are other times you can know that a write has happened. In Compute Shaders, the barrier function ensures that all other invocations in a work group have reached that point in the computation. This works for Tessellation Control Shaders as well, for all of the invocations in a patch. Once barrier returns, you know that all invocations in the work group/patch have reached that point, and therefore that all of their prior writes have been issued. You still need the coherent qualifier on both the reading and writing variable, but it works.
coherent alone is not enough, however. You also need to use a memory barrier, to effectively let OpenGL know that you're finished writing a batch of things and want to make them visible to someone else. The GLSL functions that do this have the word "memoryBarrier" in them (no relation to the glMemoryBarrier API function). The specific function determines which reads or writes the barrier operates on:
- memoryBarrier: Provides a barrier for all of the below operations. This is the only function that doesn't require GL 4.3 or some 4.3 core extension.
- memoryBarrierAtomicCounter: Provides a barrier for Atomic Counters.
- memoryBarrierImage: Provides a barrier for image variables.
- memoryBarrierBuffer: Provides a barrier for buffer variables.
- memoryBarrierShared: Provides a barrier for Compute Shader shared variables.
- groupMemoryBarrier: Provides a limited barrier. It creates visibility for all incoherent memory operations, but only within a Compute Shader work-group. This can only be used in Compute Shaders.
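As a rough illustration of how coherent, a memoryBarrier* call and barrier might be combined, here is a minimal sketch of a compute shader in which each invocation writes one element and then reads the element written by its neighbor in the same work group. The GLSL source is carried as a C string, as it would be passed to glShaderSource. The names Scratch, Result, values, rotated, kRotateWithinGroupCS and rotate_group_count are hypothetical and chosen only for this example; it assumes GL 4.3 and a local size of 64.

```c
#include <assert.h>

/* GLSL 4.30 compute shader source (illustrative sketch, not from the
   specification). Block and variable names are hypothetical. */
static const char *const kRotateWithinGroupCS =
    "#version 430\n"
    "layout(local_size_x = 64) in;\n"
    "// coherent: this block is written and read by different invocations\n"
    "layout(std430, binding = 0) coherent buffer Scratch { uint values[]; };\n"
    "layout(std430, binding = 1) buffer Result { uint rotated[]; };\n"
    "void main() {\n"
    "    uint self = gl_GlobalInvocationID.x;\n"
    "    uint base = gl_WorkGroupID.x * gl_WorkGroupSize.x;\n"
    "    uint next = base + (gl_LocalInvocationID.x + 1u) % gl_WorkGroupSize.x;\n"
    "    values[self] = self * 2u;      // incoherent write\n"
    "    memoryBarrierBuffer();         // make this buffer write visible\n"
    "    barrier();                     // whole work group has now written\n"
    "    rotated[self] = values[next];  // read a neighbor's write\n"
    "}\n";

/* barrier() must be reached by every invocation of a work group, so the
   shader above has no bounds-check early return. This sketch therefore
   assumes the item count is a multiple of the local size (64). */
static unsigned rotate_group_count(unsigned item_count)
{
    assert(item_count % 64u == 0u);
    return item_count / 64u;
}
```

The value returned by rotate_group_count would be passed as the first argument of glDispatchCompute. Only reads within the same work group are covered here; invocations in other work groups have no such ordering.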
Atomic Counter operations are always effectively coherent, due to their atomic nature (nothing can interfere with the read/modify/write operation). Memory barriers can still be employed if you wish to ensure the ordering between two separate atomic operations. But most uses of atomic counters don't need that.
Cross shader visibility
coherent is only useful for shader-to-shader reading/writing where you can be certain of invocation order. If you want to establish visibility between two different rendering commands (which, as previously stated, have no ordering guarantees), you must use a much more powerful mechanism, this OpenGL function:
void glMemoryBarrier(GLbitfield barriers);
This function is a way of ensuring the visibility of incoherent memory access operations with a wide variety of OpenGL operations, as listed on the documentation page. The thing to keep in mind about the various bits in the bitfield is this: they represent the operation you want to make the incoherent memory access visible to. This is the operation you want to see the results.
For example, if you do some image store operations on a texture, and then want to read it back onto the CPU via glGetTexImage, you would use the GL_TEXTURE_UPDATE_BARRIER_BIT. If you did image load/store to a buffer, and then want to use it for vertex array data, you would use GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT. That's the idea.
Note that if you want image load/store operations from one command to be visible to image load/store operations from another command, you use GL_SHADER_IMAGE_ACCESS_BARRIER_BIT. There are other similar bits for other incoherent memory accesses.
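The rule "pick the bit for the operation that will read the data" can be sketched as a small lookup. This is only an illustration: the Consumer enumeration and barrier_bit_for are hypothetical helpers, not part of OpenGL, and the constants are redefined here solely to keep the sketch self-contained. A real program would take them from its OpenGL header or loader.

```c
#include <assert.h>

/* Values as found in the OpenGL core header (glcorearb.h). Redefined here
   only so the sketch compiles on its own. */
#define GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT 0x00000001
#define GL_TEXTURE_FETCH_BARRIER_BIT       0x00000008
#define GL_SHADER_IMAGE_ACCESS_BARRIER_BIT 0x00000020
#define GL_TEXTURE_UPDATE_BARRIER_BIT      0x00000100
#define GL_SHADER_STORAGE_BARRIER_BIT      0x00002000

typedef unsigned int GLbitfield;

/* Hypothetical: the operation that will READ the incoherently written data. */
enum Consumer {
    CONSUMER_VERTEX_ARRAY,      /* buffer used as vertex attribute source  */
    CONSUMER_TEXTURE_SAMPLER,   /* texture read through a sampler          */
    CONSUMER_IMAGE_LOAD_STORE,  /* image load/store in a later command     */
    CONSUMER_GET_TEX_IMAGE,     /* CPU readback via glGetTexImage          */
    CONSUMER_SHADER_STORAGE     /* SSBO access in a later command          */
};

/* Returns the glMemoryBarrier bit that makes prior incoherent writes
   visible to the given kind of reader. */
static GLbitfield barrier_bit_for(enum Consumer consumer)
{
    switch (consumer) {
    case CONSUMER_VERTEX_ARRAY:     return GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT;
    case CONSUMER_TEXTURE_SAMPLER:  return GL_TEXTURE_FETCH_BARRIER_BIT;
    case CONSUMER_IMAGE_LOAD_STORE: return GL_SHADER_IMAGE_ACCESS_BARRIER_BIT;
    case CONSUMER_GET_TEX_IMAGE:    return GL_TEXTURE_UPDATE_BARRIER_BIT;
    case CONSUMER_SHADER_STORAGE:   return GL_SHADER_STORAGE_BARRIER_BIT;
    }
    assert(!"unknown consumer");
    return 0;
}
```

In use, the call glMemoryBarrier(barrier_bit_for(CONSUMER_GET_TEX_IMAGE)) would sit between the command that performs the image stores and the glGetTexImage call. Bits can be combined with bitwise OR when several kinds of reader follow.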
Guidelines and use cases
Here are some basic use cases and how to synchronize them properly.
- Read-only variables
- If a shader only reads, then it does not need any form of synchronization for visibility. Even if you modify objects via OpenGL commands (glTexSubImage2D, for example), OpenGL requires that reads remain properly synchronized.
- barrier invocation write/read
- Use coherent and an appropriate memoryBarrier* or groupMemoryBarrier call if you use a mechanism like barrier to synchronize between invocations. Remember that shared variables are incoherent and so need this treatment. Tessellation Control Shader outputs (per-vertex and per-patch), however, are coherent, so for those you don't need a memory barrier on top of the barrier.
- Dependent invocation write/read
- If you have one invocation which is dependent on another (for example, a fragment shader invocation and the vertex shader invocations that generated its primitive), then you need to use coherent on the variables and invoke an appropriate memoryBarrier* after you finish writing to the memory of interest.
- Shader write/read between rendering commands
- One rendering command (or compute shader invocation) writes incoherently, and the other reads. There is no need for coherent here at all. Just use glMemoryBarrier with the appropriate access bit.
- Shader writes, other OpenGL operations read
- Again, coherent is not necessary. You must use a glMemoryBarrier appropriate to the operation of interest.
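To make the dependent invocation case above more concrete, here is a minimal sketch of a vertex shader that writes to a buffer variable and a fragment shader that reads the value back. The GLSL sources are carried as C strings. All names (PerVertex, tint, provokingVertex, kWriterVS, kReaderFS, stage_source) are hypothetical, and the sketch assumes GL 4.3 on an implementation that supports shader storage blocks in the vertex stage, which is not guaranteed.

```c
#include <assert.h>

/* Hypothetical stage selector for this example only. */
enum Stage { STAGE_VERTEX, STAGE_FRAGMENT };

/* Writer: each vertex shader invocation stores a value for its vertex.
   Repeated invocations for the same vertex store the same value, so the
   result does not depend on how often the shader runs. */
static const char *const kWriterVS =
    "#version 430\n"
    "layout(location = 0) in vec4 position;\n"
    "layout(std430, binding = 0) coherent buffer PerVertex { vec4 tint[]; };\n"
    "flat out int provokingVertex;\n"
    "void main() {\n"
    "    tint[gl_VertexID] = abs(position);  // incoherent write\n"
    "    memoryBarrierBuffer();              // done writing; make it visible\n"
    "    provokingVertex = gl_VertexID;\n"
    "    gl_Position = position;\n"
    "}\n";

/* Reader: depends on the vertex shader invocations of its primitive, so
   their writes have happened. coherent is used on this side as well. */
static const char *const kReaderFS =
    "#version 430\n"
    "layout(std430, binding = 0) coherent buffer PerVertex { vec4 tint[]; };\n"
    "flat in int provokingVertex;\n"
    "layout(location = 0) out vec4 color;\n"
    "void main() {\n"
    "    color = tint[provokingVertex];\n"
    "}\n";

/* Returns the GLSL source for the requested stage. */
static const char *stage_source(enum Stage stage)
{
    switch (stage) {
    case STAGE_VERTEX:   return kWriterVS;
    case STAGE_FRAGMENT: return kReaderFS;
    }
    assert(!"unknown stage");
    return 0;
}
```

Because the index is passed as a flat varying, the fragment shader only reads the element written for a vertex of its own primitive. Reading elements written for unrelated vertices would fall outside the dependent invocation guarantee.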