I am currently trying to understand the differences between the coherent and volatile qualifiers. First, some quotes (in code tags for better formatting) from the ARB_shader_image_load_store extension doc:
Short description of coherent and volatile:
Code :
Qualifier     Meaning
------------  -------------------------------------------------
coherent      memory variable where reads and writes are coherent
              with reads and writes from other shader invocations
volatile      memory variable whose underlying value may be changed
              at any point during shader execution by some source
              other than the current shader invocation
Long description of coherent:
Code :Memory accesses to image variables declared using the "coherent" storage qualifier are performed coherently with similar accesses from other shader invocations. In particular, when reading a variable declared as "coherent", the values returned will reflect the results of previously completed writes performed by other shader invocations. When writing a variable declared as "coherent", the values written will be reflected in subsequent coherent reads performed by other shader invocations. As described in the Section 2.20.X of the OpenGL Specification, shader memory reads and writes complete in a largely undefined order. The built-in function memoryBarrier() can be used if needed to guarantee the completion and relative ordering of memory accesses performed by a single shader invocation. When accessing memory using variables not declared as "coherent", the memory accessed by a shader may be cached by the implementation to service future accesses to the same address. Memory stores may be cached in such a way that the values written may not be visible to other shader invocations accessing the same memory. The implementation may cache the values fetched by memory reads and return the same values to any shader invocation accessing the same memory, even if the underlying memory has been modified since the first memory read. While variables not declared as "coherent" may not be useful for communicating between shader invocations, using non-coherent accesses may result in higher performance.
Long description of volatile:
Code :Memory accesses to image variables declared using the "volatile" storage qualifier must treat the underlying memory as though it could be read or written at any point during shader execution by some source other than the executing shader invocation. When a volatile variable is read, its value must be re-fetched from the underlying memory, even if the shader invocation performing the read had previously fetched its value from the same memory. When a volatile variable is written, its value must be written to the underlying memory, even if the compiler can conclusively determine that its value will be overwritten by a subsequent write. Since the external source reading or writing a "volatile" variable may be another shader invocation, variables declared as "volatile" are automatically treated as coherent.
Code :(26) What sort of qualifiers should we provide relevant to memory referenced by image variables? RESOLVED: We will support the qualifiers "coherent", "volatile", "restrict", and "const" to be used in image variable declarations. "coherent" is used to ensure that memory accesses from different shader invocations are cached coherently (i.e., one invocation will be able to observe writes from another when the other invocation's writes complete). This coherence may mean the use of "coherent"-qualified image variables may perform more slowly than of otherwise equivalent unqualified variables. "volatile" behaves as in C, and may be needed if an algorithm requires reading image memory that may be written asynchronously by other shader invocations.
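To make the quotes above more concrete, here is a minimal sketch of how I read the declarations and the memoryBarrier() call. It assumes GLSL 4.20 (where image load/store is core); the image names, binding points and the r32ui format are my own placeholders, not something taken from the extension doc.

```glsl
#version 420 core

// Hypothetical images: names, bindings and formats are assumptions.
// "coherent": accesses should be visible to other invocations once complete.
layout(binding = 0, r32ui) coherent uniform uimage2D sharedData;
// "volatile": every read must be re-fetched, every write must go to memory.
layout(binding = 1, r32ui) volatile uniform uimage2D pollFlag;

out vec4 fragColor;

void main() {
    ivec2 texel = ivec2(gl_FragCoord.xy);

    // Write through the coherent image variable.
    imageStore(sharedData, texel, uvec4(42u));

    // Ask for completion and ordering of this invocation's memory accesses
    // before anything issued after the barrier.
    memoryBarrier();

    // Read through the volatile image variable; per the quote above the
    // value has to be re-fetched from the underlying memory.
    uint flag = imageLoad(pollFlag, ivec2(0)).x;

    fragColor = vec4(float(flag));
}
```

This only shows the syntax as I understand it; it does not by itself prove when another invocation actually observes the write.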
My understanding of their uses:
coherent:
- only useful for dependent shader invocations (e.g. fragment shader invocations generated from a complete primitive after the vertex shader has processed its vertices)
- the memoryBarrier() function goes hand in hand with this qualifier: it flushes caches/shared memory for coherent-qualified variables and determines the order of memory accesses, so you can say when to flush (by the way: is there an implicit memoryBarrier() call at the end of the shader when there are coherent-qualified variables and no memoryBarrier() was specified in the shader?)
- variables not qualified as coherent might be L-cached or resident in shared memory, and hence (dependent) threads spawned on other SIMD processors might not observe their values directly
- use-case: e.g. reading values from an image in a dependent shader invocation that were written by an invocation in a previous shader stage (the values might still be cached, so they have to be flushed via memoryBarrier())
volatile:
- implies coherent (volatile variables are automatically treated as coherent)
- always fetches values directly from global memory (no caching)
- always writes values directly to global memory (no caching)
- might be more expensive than coherent (no temporary caching or memory access optimizations are allowed at all)
- use-case: e.g. atomically incrementing a texel in a volatile-qualified image from independent shader invocations (the shaders might be executed on different SIMD processors and use shared memory for the atomic increment)
- memoryBarrier() only useful for avoiding (compiler) memory access reordering, since volatile already guarantees a direct write to global memory
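The atomic-increment use-case from the list above could look roughly like the following. This is only a sketch under my own assumptions: GLSL 4.20, a hypothetical r32ui counter image bound at unit 0, and the image declared coherent. Whether volatile is additionally required for the atomic case is exactly one of the points I am unsure about.

```glsl
#version 420 core

// Hypothetical per-pixel counter; name, binding and format are assumptions.
layout(binding = 0, r32ui) coherent uniform uimage2D fragmentCount;

out vec4 fragColor;

void main() {
    ivec2 texel = ivec2(gl_FragCoord.xy);

    // Atomically add 1 to the texel; the built-in returns the value
    // that was stored before the addition.
    uint previous = imageAtomicAdd(fragmentCount, texel, 1u);

    // Visualize how many fragments hit this pixel before the current one.
    fragColor = vec4(float(previous) / 255.0);
}
```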
Are there other important points I forgot? Or are there any errors in my understanding of the doc? Let us collect some more facts for a better understanding of those two qualifiers.