Image Load Store

Revision as of 10:52, 2 November 2012 by Alfonse (talk | contribs) (Guidelines and usecases)

Core in version 4.5
Core since version 4.2
Core ARB extension ARB_shader_image_load_store
EXT extension EXT_shader_image_load_store

Image load/store is the ability of Shaders to more-or-less arbitrarily read from and write to images.


The idea with image load/store is that the user can bind one of the images in a Texture to a number of image binding points (which are separate from texture image units). Shaders can read information from these images and write information to them, in ways that they cannot with textures.

This can allow for a number of powerful features, including relatively cheap order-independent transparency.

If you think that this is a great feature, remember that there is no such thing as a free lunch. The cost of using image load/store is user-specified memory coherency. By using image load/store, you take on the responsibility for managing what OpenGL would otherwise manage for you with regular texture reads and FBO writes.

Image variables

Formats and compatibility

Basic load store

Atomic operations

Memory coherency

Rendering is a very asynchronous process, but the OpenGL specification defines all of these operations to happen sequentially. Therefore, hardware and drivers jump through a few hoops in order to ensure sanity for the user.

For example, uploading data to an image will often be deferred to whenever the driver wants to get around to it. If you then render with that texture, the rendering operation itself will be deferred to when the texture upload is done. Similarly, if you write to images bound to a Framebuffer Object in one rendering operation, you can immediately issue a rendering operation that reads from those images (and writes to some other images, of course). The OpenGL implementation will automatically delay the second rendering command until the first has completed and flush internal caches so that texture reads will see the written data properly. And so forth.

This is brought up here because, by using image load/store, you are signing away your right to all of this. You must now manage this all yourself.

When you write to an image via an image store operation, subsequent reads from almost anywhere are not guaranteed to see the written data. And by "almost anywhere", this includes:

In short, you get almost nothing. Everything is asynchronous, and OpenGL will not protect you from this fact. All of these can be alleviated, but only specifically at the request of the user. It will not happen automatically.


Despite the above, there are some protections that OpenGL provides. What follows is a list of things that the specification does require image load/store operations to guarantee about when data will be accessible.

First, within a single shader invocation, if you write a value through an image variable, a later read through that same variable will always see it. You need not do anything special to make this happen. However, it is possible that, between the write and the read, another invocation has overwritten that value.

Second, if a shader invocation is being executed, then the shader invocations necessary to execute it must have taken place. For example, in a fragment shader, you can assume that the vertex shaders that computed the vertices of the primitive being rasterized have completed. An invocation related to earlier ones in this way is called a dependent invocation. Dependent invocations get special privileges in terms of ordering.

Warning: This only applies to the shader invocations directly responsible for this shader invocation. Being in a fragment shader does not mean that all vertex shaders in a rendering command have completed. Only the ones needed for this particular fragment shader invocation have been executed.
Note: Geometry shaders have a caveat here. A GS may write multiple vertices and primitives. Therefore, you may only assume that the GS has executed far enough to write the vertices needed to render the fragment shader's primitive.

Third, sometimes a fragment shader invocation is executed for the sole purpose of computing derivatives for other invocations. All image store and atomic operations performed by such an invocation are ignored.

Invocation order and count

One problem with the above is what defines "subsequent invocations". OpenGL allows implementations a lot of leeway on the ordering of shader invocations, as well as the number of invocations. Here is a list of the rules:

  1. You may not assume that a vertex shader will be executed exactly once for every vertex you pass it. It may be executed multiple times for the same vertex. Conversely, in indexed rendering, a re-used index may not cause the vertex shader to run a second or third time, because the earlier result can be reused.
  2. The same applies to tessellation evaluation shaders.
  3. The number of fragment shader invocations generated from rasterizing a primitive depends on the pixel ownership test, whether early depth test is enabled, and whether the rendering is to a multisample buffer. When not using per-sample shading, the number of fragment shader invocations is undefined within a pixel area, but it must be between 1 and the number of samples in the buffer.
  4. Invocations of the same shader stage may be executed in any order, even within the same draw call. This includes fragment shaders; writes to the framebuffer are ordered, but the actual fragment shader execution is not.
  5. Outside of invocations which are dependent (as defined above), invocations between stages may be executed in any order. This includes invocations launched by different rendering commands. It may seem unlikely that two vertex shaders from different rendering operations would be running at the same time, but it can happen, so OpenGL provides no guarantees.

Ensuring visibility

The term "visibility" refers to when a value written to an image by a shader invocation can safely be accessed by someone else. There are two tools for ensuring visibility, each used in a different context: the coherent qualifier and the glMemoryBarrier function.

coherent is used on image variables, such that writes through coherent-qualified variables will be read correctly through coherent-qualified variables in another invocation. Note that this requires the coherent qualifier on both the writer and the reader; if either lacks it, nothing is guaranteed.

Note that coherent does not override the prior rules. In order for a write to become visible to an invocation, it must first have happened. Therefore, coherent can only really work if you know that the writing invocation has executed, which primarily means dependent invocations, as stated above.

There are other times you can know that a write has happened. In Compute Shaders, the barrier function ensures that all other invocations in a work group have reached that point in the computation. This works for Tessellation Control Shaders as well, for all of the invocations in a patch. Once barrier returns, every invocation in the work group or patch has reached that point, so all of their prior writes have been issued. You still need the coherent qualifier on both the reading and writing variable, but it works.

coherent alone is not enough, however. You also need to use a memory barrier, to effectively let OpenGL know that you have finished writing a batch of things and want to make them visible to someone else. The functions for this are of the form memoryBarrier* (no relation to the glMemoryBarrier API function). This is a small suite of functions, each representing a different barrier case. For image load/store, memoryBarrierImage is used to order image writes. Alternatively, you can use memoryBarrier to order all of these special writes.

Note that memoryBarrierImage requires GL 4.3/ARB_compute_shader.
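To make the barrier-based case concrete, here is a minimal sketch of a compute shader in which each invocation writes a value to an image and then reads the value written by its neighbour in the same work group. The shader is shown as a C string, the way it might be handed to glShaderSource. The image names `scratch` and `result`, the constant `kNeighbourExchangeCS`, the work group size of 64 and the helper `expected_result` are all assumptions made for this example, not anything prescribed by OpenGL.

```c
#include <assert.h>

/* Must match local_size_x in the shader source below. */
#define LOCAL_SIZE_X 64u

/* GLSL 4.30 compute shader (hypothetical example). `scratch` is both written
   and read, so it carries the coherent qualifier. */
static const char *const kNeighbourExchangeCS =
    "#version 430\n"
    "layout(local_size_x = 64) in;\n"
    "layout(binding = 0, r32f) coherent uniform image1D scratch;\n"
    "layout(binding = 1, r32f) writeonly uniform image1D result;\n"
    "void main() {\n"
    "    uint me   = gl_LocalInvocationID.x;\n"
    "    int  base = int(gl_WorkGroupID.x * gl_WorkGroupSize.x);\n"
    "    imageStore(scratch, base + int(me), vec4(float(me)));\n"
    "    memoryBarrierImage(); // order this invocation's image writes\n"
    "    barrier();            // wait for the rest of the work group\n"
    "    uint other = (me + 1u) % gl_WorkGroupSize.x;\n"
    "    float v = imageLoad(scratch, base + int(other)).x;\n"
    "    imageStore(result, base + int(me), vec4(v));\n"
    "}\n";

/* CPU-side reference: the value result[base + local_id] should hold once the
   dispatch has finished, if the synchronization above did its job. */
static float expected_result(unsigned local_id)
{
    assert(local_id < LOCAL_SIZE_X);
    return (float)((local_id + 1u) % LOCAL_SIZE_X);
}
```

Without the coherent qualifier, or without the memoryBarrierImage call ahead of barrier, the imageLoad would not be guaranteed to see the neighbour's store. Reading `result` back on the CPU afterwards additionally needs a glMemoryBarrier on the API side.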

Atomic operations are always effectively coherent, due to their atomic nature (nothing can interfere with the read/modify/write operation). Memory barriers can still be employed if you wish to ensure the ordering between two separate atomic operations, but they are not required.

coherent is only useful in cases of shader-to-shader reading/writing where you can be certain of invocation order. If you want to establish visibility between two different rendering commands, you must use a much more powerful mechanism:

void glMemoryBarrier(GLbitfield barriers);

This function is a way of ensuring the visibility of image load/store operations with a wide variety of OpenGL operations, as listed on the documentation page. The thing to keep in mind about the various bits in the bitfield is this: they represent the operation you want to perform after making the image load/store results visible. That is, each bit names the operation that needs to see the load/store results, not the operation that produced them.

For example, if you do some image load/store operations on a texture, and then want to read it back to the CPU (with glGetTexImage, for instance), you would use GL_TEXTURE_UPDATE_BARRIER_BIT. If you did image load/store to a buffer, and then want to use it for vertex array data, you would use GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT. That's the idea.

Note that if you want image load/store operations from one command to be visible to image load/store operations from another command, you use GL_SHADER_IMAGE_ACCESS_BARRIER_BIT.
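The "name the consumer, not the producer" rule can be sketched as a small C lookup. The `NextUse` enum and the `barrier_bits_for` helper are hypothetical names invented for this example; only the GL_*_BARRIER_BIT tokens are real. They are defined locally here, with the values from the Khronos headers, purely so the sketch compiles without a GL loader. A real program would take them from its GL headers and pass the result straight to glMemoryBarrier.

```c
#include <assert.h>

/* Normally supplied by your GL headers; repeated here only so the sketch
   compiles on its own. */
typedef unsigned int GLbitfield;
#define GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT 0x00000001
#define GL_TEXTURE_FETCH_BARRIER_BIT       0x00000008
#define GL_SHADER_IMAGE_ACCESS_BARRIER_BIT 0x00000020
#define GL_TEXTURE_UPDATE_BARRIER_BIT      0x00000100

/* Hypothetical: what the application wants to do NEXT with data that a
   shader wrote through image load/store. */
typedef enum NextUse {
    NEXT_USE_IMAGE_LOAD_STORE,   /* another command's image loads/stores  */
    NEXT_USE_TEXTURE_SAMPLING,   /* ordinary texture fetches in a shader  */
    NEXT_USE_TEXTURE_READBACK,   /* e.g. glGetTexImage back to the CPU    */
    NEXT_USE_VERTEX_ATTRIBUTES,  /* buffer consumed as vertex array data  */
    NEXT_USE_COUNT
} NextUse;

/* Returns the barrier bit that describes the consuming operation. */
static GLbitfield barrier_bits_for(NextUse next_use)
{
    assert((unsigned)next_use < (unsigned)NEXT_USE_COUNT);
    switch (next_use) {
    case NEXT_USE_IMAGE_LOAD_STORE:  return GL_SHADER_IMAGE_ACCESS_BARRIER_BIT;
    case NEXT_USE_TEXTURE_SAMPLING:  return GL_TEXTURE_FETCH_BARRIER_BIT;
    case NEXT_USE_TEXTURE_READBACK:  return GL_TEXTURE_UPDATE_BARRIER_BIT;
    case NEXT_USE_VERTEX_ATTRIBUTES: return GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT;
    default:                         return 0;
    }
}
```

Because the parameter is a bitfield, several consumers can be covered by one call by OR-ing their bits together, for example `barrier_bits_for(NEXT_USE_IMAGE_LOAD_STORE) | barrier_bits_for(NEXT_USE_TEXTURE_SAMPLING)`.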

Guidelines and usecases

Here are some basic use cases and how to synchronize them properly.

Read-only image variables
If a shader only reads images, then it does not need any form of synchronization for visibility. Even if you modify the underlying objects via OpenGL commands (glTexSubImage2D, for example), OpenGL requires that image reads remain properly synchronized.
barrier invocation write/read
Use coherent and an appropriate memoryBarrier or memoryBarrierImage call if you use a mechanism like barrier to synchronize between invocations.
Dependent invocation write/read
If you have one invocation which is dependent on another (the vertex shaders used to generate a primitive used for a fragment shader), then you need to use coherent on the variables and call memoryBarrier or memoryBarrierImage, as appropriate, after you finish writing to the images of interest.
Shader image write/read between rendering commands
There is no need for coherent here. Just use glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT) between the two rendering/dispatch commands.
Shader image writes, read by other OpenGL operations
Again, coherent is not necessary. You must use a glMemoryBarrier appropriate to the operation of interest.
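For the "between rendering commands" case, the required call order can be sketched in C as below. The `GlCalls` table, `run_two_passes` and the recording stand-ins are hypothetical constructs for this example. In a real program the three pointers would refer to glUseProgram, glDispatchCompute and glMemoryBarrier. Here they point at stubs so the ordering can be checked without a GL context. The two program handles are assumed to be compute programs where the first stores to an image and the second loads from it.

```c
#include <assert.h>
#include <stddef.h>

/* Normally supplied by your GL headers. */
typedef unsigned int GLbitfield;
typedef unsigned int GLuint;
#define GL_SHADER_IMAGE_ACCESS_BARRIER_BIT 0x00000020

/* Hypothetical table holding the entry points this sketch needs. */
typedef struct GlCalls {
    void (*UseProgram)(GLuint program);
    void (*DispatchCompute)(GLuint x, GLuint y, GLuint z);
    void (*MemoryBarrier)(GLbitfield barriers);
} GlCalls;

/* Pass 1 writes an image with imageStore; pass 2 reads it with imageLoad.
   The barrier between them names the consumer: shader image access. */
static void run_two_passes(const GlCalls *gl, GLuint writer_program,
                           GLuint reader_program, GLuint groups_x,
                           GLuint groups_y)
{
    assert(gl != NULL);
    gl->UseProgram(writer_program);
    gl->DispatchCompute(groups_x, groups_y, 1);
    gl->MemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
    gl->UseProgram(reader_program);
    gl->DispatchCompute(groups_x, groups_y, 1);
}

/* Recording stand-ins: they log the call order instead of touching a GPU. */
enum { LOG_CAPACITY = 8 };
static char call_log[LOG_CAPACITY + 1];
static size_t call_count;
static GLbitfield last_barrier_bits;

static void log_call(char tag)
{
    if (call_count < LOG_CAPACITY) {
        call_log[call_count++] = tag;
        call_log[call_count] = '\0';
    }
}
static void fake_use_program(GLuint program) { (void)program; log_call('U'); }
static void fake_dispatch(GLuint x, GLuint y, GLuint z)
{
    (void)x; (void)y; (void)z;
    log_call('D');
}
static void fake_memory_barrier(GLbitfield barriers)
{
    last_barrier_bits = barriers;
    log_call('B');
}

/* Runs the sequence against the stand-ins and returns the recorded order. */
static const char *dry_run_two_passes(void)
{
    const GlCalls fake = { fake_use_program, fake_dispatch, fake_memory_barrier };
    call_count = 0;
    call_log[0] = '\0';
    run_two_passes(&fake, 1u, 2u, 8u, 8u);
    return call_log;
}
```

The same shape applies when the consumer is some other OpenGL operation: only the bit passed to the barrier call changes, and neither shader needs coherent.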