Difference between revisions of "Image Load Store"

From OpenGL.org
(Memory coherency)
(Moved the coherency section to its own page, since it's shared.)
(2 intermediate revisions by the same user not shown)
Line 6: Line 6:
  
 
'''Image load/store''' is the ability of [[Shader]]s to more-or-less arbitrarily read from and write to images.
 
 +
 +
{{stub}}
  
 
== Overview ==
 
Line 13: Line 15:
 
This can allow for a number of powerful features, including relatively cheap order-independent transparency.
 
  
If you think that this is a great feature, remember that there is no such thing as a free lunch. The cost of using image load/store is in [[#Memory coherency|user-specified memory coherency.]] By using image load/store, you take up the responsibility to manage what OpenGL would manage for you using regular texture reads/[[Framebuffer Object|FBO]] writes.
+
If you think that this is a great feature, remember that there is no such thing as a free lunch. The cost of using image load/store is that all of its write operations are [[Memory Model#Incoherent memory access|not automatically coherent]]. By using image load/store, you take up the responsibility to manage what OpenGL would manage for you using regular texture reads/[[Framebuffer Object|FBO]] writes.
  
 
== Image variables ==
 
Line 24: Line 26:
  
 
== Memory coherency ==
 
 +
{{main|Memory Model#Incoherent memory access}}
  
Rendering is a [[Synchronization|very asynchronous process, but the OpenGL specification defines all of these operations to happen sequentially]]. Therefore, hardware and drivers jump through a few hoops in order to ensure sanity for the user.
+
Writes and atomic operations via image variables are not automatically coherent. Therefore, you must synchronize explicitly to ensure that writes have completed and are visible before you read those values.
 
+
For example, uploading data to an image will often be deferred to whenever the driver wants to get around to it. If you then render with that texture, the rendering operation itself will be deferred to when the texture upload is done. Similarly, if you write to images bound to a [[Framebuffer Object]] in one rendering operation, you can immediately issue a rendering operation that reads from those images (and writes to some ''other'' images, of course). The OpenGL implementation will automatically delay the second rendering command until the first has completed and flush internal caches so that texture reads will see the written data properly. And so forth.
+
 
+
This is brought up here because, by using image load/store, you are ''signing away your right to all of this.'' You must now manage this all yourself.
+
 
+
When you write to an image via an image store operation, any subsequent reads from almost anywhere are ''not'' guaranteed to see the written value. "Almost anywhere" includes:
+
 
+
* Image load operations from anywhere other than this particular shader invocation.
+
* Reads from the texture via {{apifunc|glGetTexImage}}, or if it is bound to an FBO, {{apifunc|glReadPixels}}.
+
* Texture reads via [[Sampler (GLSL)|samplers]].
+
* If the image was a buffer texture, ''any'' form of reading from that buffer, such as using it for a [[Vertex Buffer Object]] or [[Uniform Buffer Object]].
+
 
+
In short, you get almost nothing. Everything is asynchronous, and OpenGL will not protect you from this fact. All of these can be alleviated, but [[#Ensuring visibility|only ''specifically'' at the request of the user]]. It will not happen automatically.
+
 
+
=== Guarantees ===
+
 
+
Despite the above, there are some protections that OpenGL provides. What follows is a list of things that the specification does require image load/store operations to guarantee about when data will be accessible.
+
 
+
First, within a single shader invocation, if you write something to an image variable, it will always be visible to that variable for reading. You need not do anything special to make this happen. However, it is possible that, between writing and reading, another invocation may have stomped on that value.
+
 
+
Second, if a shader invocation is being executed, then the shader invocations necessary to execute it must have taken place. For example, in a fragment shader, you can assume that the vertex shaders that computed the vertices of the primitive being rasterized have completed. An invocation that relies on earlier invocations in this way is called a ''dependent invocation'', and such invocations get special privileges in terms of ordering.
+
 
+
{{warning|This only applies to the shader invocations ''directly responsible'' for this shader invocation. Being in a fragment shader does not mean that ''all'' vertex shaders in a rendering command have completed. Only the ones needed for this particular fragment shader invocation have been executed.}}
+
 
+
{{note|Geometry shaders have a caveat here. A GS may write multiple vertices and primitives. Therefore, you may only assume that the GS executed ''just far enough'' to write enough vertices needed to render the fragment shader's primitive.}}
+
 
+
Third, sometimes a fragment shader invocation is executed for the sole purpose of computing derivatives for neighboring fragment shader invocations. All image store and atomic operations performed by such an invocation are ignored.
+
 
+
=== Invocation order and count ===
+
 
+
One problem with the above is what defines "subsequent invocations". OpenGL allows implementations a ''lot'' of leeway on the ordering of shader invocations, as well as the number of invocations. Here is a list of the rules:
+
 
+
# You may not assume that a vertex shader will be executed exactly once for every vertex you pass it. It may be executed multiple times for the same vertex. Conversely, in indexed rendering scenarios, re-used indices may well not execute the vertex shader a second or third time.
+
# The same applies to tessellation evaluation shaders.
+
# The number of fragment shader invocations generated from rasterizing a primitive depends on the pixel ownership test, whether early depth test is enabled, and whether the rendering is to a multisample buffer. When not using per-sample shading, the number of fragment shader invocations is undefined within a pixel area, but it must be between 1 and the number of samples in the buffer.
+
# Invocations of the same shader stage may be executed in ''any'' order. Even within the same draw call. This includes fragment shaders; writes to the framebuffer are ordered, but the actual fragment shader execution is not.
+
# Outside of invocations which are dependent (as defined above), invocations between stages may be executed in any order. This ''includes'' invocations launched by different rendering commands. While it is perhaps unlikely that two vertex shaders from different rendering operations could be running at the same time, it is also very ''possible'', so OpenGL provides no guarantees.
+
 
+
=== Ensuring visibility ===
+
 
+
The term "visibility" represents when someone can safely access the value written to an image from a shader invocation. There are two tools to ensure visibility; they are used to ensure visibility from two different contexts. There is the {{code|coherent}} qualifier and there is the {{apifunc|glMemoryBarrier}} function.
+
 
+
{{code|coherent}} is used on image variables, such that writes to {{code|coherent}} qualified variables will be read correctly by {{code|coherent}} qualified variables in another invocation. Note that this requires the {{code|coherent}} qualifier on ''both'' the writer and the reader; if one of them doesn't have it, then nothing is guaranteed.
+
 
+
Note that {{code|coherent}} does not ''ignore'' all of the prior rules. In order for a write to become visible to an invocation, it must first ''have happened''. Therefore, {{code|coherent}} can only really work if you know that the writing invocation has executed. Which primarily means dependent invocations, as stated above.
+
 
+
There are other times you can know that a write has happened. In [[Compute Shader]]s, the {{code|barrier}} function ensures that all other invocations in a work group have reached that point in the computation. This works for [[Tessellation Control Shader]]s as well, for all of the invocations in a patch. Once every invocation in the work group or patch has reached that point, all of the writes they issued before the barrier have been executed. You still need the {{code|coherent}} qualifier on both the reading and the writing variable, but it works.
+
 
+
However, {{code|coherent}} alone is not enough. You also need to use a memory barrier, to let OpenGL know that you have finished writing a batch of values and want to make them visible to other invocations. The GLSL functions for this are of the form {{code|memoryBarrier*}} (no relation to the {{apifunc|glMemoryBarrier}} API function). This is a small suite of functions, each covering a different kind of memory. For image load/store, {{code|memoryBarrierImage}} orders image writes. Alternatively, {{code|memoryBarrier}} orders all such incoherent writes, whatever kind of memory they target.
+
 
+
Note that {{code|memoryBarrierImage}} requires GL 4.3/{{extref|compute_shader}}.
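As a minimal sketch of the {{code|barrier}} case described above, the following C snippet holds the GLSL source of a compute shader in which each invocation writes one texel and then reads the texel written by a neighbour in the same work group. The image names {{code|scratch}} and {{code|result}}, the binding points, the work group size of 64 and the C identifier are illustrative assumptions, not part of any API. Both images are assumed to hold at least 64 texels per dispatched work group.

```c
/* GLSL source for a compute shader, stored as a C string so that it can be
   passed to glShaderSource. Names, bindings and sizes are illustrative. */
const char *const kNeighbourExchangeSrc =
    "#version 430\n"
    "layout(local_size_x = 64) in;\n"
    "\n"
    "// Written by one invocation and read by another: needs coherent.\n"
    "layout(r32ui, binding = 0) coherent uniform uimage1D scratch;\n"
    "layout(r32ui, binding = 1) writeonly uniform uimage1D result;\n"
    "\n"
    "void main() {\n"
    "    int self = int(gl_LocalInvocationID.x);\n"
    "    int base = int(gl_WorkGroupID.x) * 64;\n"
    "\n"
    "    imageStore(scratch, base + self, uvec4(uint(self)));\n"
    "    memoryBarrierImage(); // order this invocation's image write\n"
    "    barrier();            // wait until the whole work group got here\n"
    "\n"
    "    // The neighbour is in the same work group, so it has written.\n"
    "    int neighbour = (self + 1) % 64;\n"
    "    uint value = imageLoad(scratch, base + neighbour).x;\n"
    "    imageStore(result, base + self, uvec4(value));\n"
    "}\n";
```

The order matters here: the memory barrier comes before {{code|barrier}}, so that each invocation's write is ordered before the point at which the other invocations are released to read it.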
+
 
+
Atomic operations are always effectively {{code|coherent}}, due to their atomic nature (nothing can interfere with the read/modify/write operation). Memory barriers can still be employed if you wish to ensure the ordering between two separate atomic operations, but it is not necessary.
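To illustrate, here is a hedged sketch of a fragment shader that counts how many fragments touch each pixel using an image atomic, with no {{code|coherent}} qualifier and no shader-side barrier. It is stored as a C string; the names {{code|overdraw}} and {{code|color}}, the binding point and the divisor of 8 are assumptions made for the example, and a single-sampled framebuffer is assumed. Reading the counters from a ''later'' rendering command would still require {{apifunc|glMemoryBarrier}}.

```c
/* GLSL fragment shader source as a C string. All names are illustrative. */
const char *const kOverdrawCountSrc =
    "#version 420\n"
    "// One 32-bit counter per pixel; image atomics need r32ui or r32i.\n"
    "layout(r32ui, binding = 0) uniform uimage2D overdraw;\n"
    "out vec4 color;\n"
    "\n"
    "void main() {\n"
    "    // Atomically add 1; the return value is the count before the add.\n"
    "    uint before = imageAtomicAdd(overdraw, ivec2(gl_FragCoord.xy), 1u);\n"
    "    color = vec4(float(before) / 8.0);\n"
    "}\n";
```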
+
 
+
{{code|coherent}} is ''only'' useful in cases of shader-to-shader reading/writing where you can be certain of invocation order. If you want to establish visibility between two different rendering commands, you must use a much more powerful mechanism:
+
 
+
void {{apifunc|glMemoryBarrier}}(GLbitfield {{param|barriers}});
+
 
+
This function is a way of ensuring the visibility of image load/store operations with a wide variety of OpenGL operations, as listed on the documentation page. The thing to keep in mind about the various bits in the bitfield is this: they represent the operation you want to be able to do ''after'' making the image load/store visible. This is the operation you want to ''see'' the load/store results.
+
 
+
For example, if you do some image load/store operations on a texture, and then want to read it back onto the CPU, you would use the {{enum|GL_TEXTURE_UPDATE_BARRIER_BIT}}. If you did image load/store to a buffer, and then want to use it for vertex array data, you would use {{enum|GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT}}. That's the idea.
+
 
+
Note that if you want image load/store operations from one command to be visible to image load/store operations from another command, you use {{enum|GL_SHADER_IMAGE_ACCESS_BARRIER_BIT}}.
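The "name the operation that comes next" rule can be sketched as a small C helper. The {{code|NextUse}} enumeration and {{code|barrier_bit_for}} function are hypothetical names invented for this example. The fallback {{code|#define}}s carry the standard enumerant values and exist only so that the snippet compiles without an OpenGL loader header. {{enum|GL_TEXTURE_FETCH_BARRIER_BIT}} is not discussed above; it is the bit for reads through samplers.

```c
/* Fallback definitions so the sketch builds without a GL loader header. */
#ifndef GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT
#define GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT 0x00000001
#endif
#ifndef GL_TEXTURE_FETCH_BARRIER_BIT
#define GL_TEXTURE_FETCH_BARRIER_BIT 0x00000008
#endif
#ifndef GL_SHADER_IMAGE_ACCESS_BARRIER_BIT
#define GL_SHADER_IMAGE_ACCESS_BARRIER_BIT 0x00000020
#endif
#ifndef GL_TEXTURE_UPDATE_BARRIER_BIT
#define GL_TEXTURE_UPDATE_BARRIER_BIT 0x00000100
#endif

/* Hypothetical enumeration: what will consume the image writes next. */
typedef enum {
    NEXT_USE_IMAGE_LOAD_STORE, /* image loads/stores in a later command */
    NEXT_USE_SAMPLER_FETCH,    /* texture reads through a sampler       */
    NEXT_USE_GET_TEX_IMAGE,    /* CPU readback with glGetTexImage       */
    NEXT_USE_VERTEX_ATTRIB     /* buffer sourced as vertex array data   */
} NextUse;

/* Return the glMemoryBarrier bit that names the consuming operation. */
unsigned int barrier_bit_for(NextUse use)
{
    switch (use) {
    case NEXT_USE_IMAGE_LOAD_STORE: return GL_SHADER_IMAGE_ACCESS_BARRIER_BIT;
    case NEXT_USE_SAMPLER_FETCH:    return GL_TEXTURE_FETCH_BARRIER_BIT;
    case NEXT_USE_GET_TEX_IMAGE:    return GL_TEXTURE_UPDATE_BARRIER_BIT;
    case NEXT_USE_VERTEX_ATTRIB:    return GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT;
    }
    return 0; /* unknown use: no bits selected */
}
```

In a real program, the result would be passed to {{apifunc|glMemoryBarrier}} between the command that writes and the operation that reads. Bits can be combined with bitwise OR when several kinds of consumer follow.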
+
 
+
=== Guidelines and use cases ===
+
 
+
Here are some basic use cases and how to synchronize them properly.
+
 
+
; Read-only image variables
+
: If a shader only reads images, then it does not need any form of synchronization for visibility. Even if you modify the underlying objects via OpenGL commands ({{apifunc|glTexSubImage2D}}, for example), OpenGL requires that image reads remain properly synchronized.
+
; {{code|barrier}} invocation write/read
+
: Use {{code|coherent}} if you use a mechanism like {{code|barrier}} to synchronize between invocations.
+
; Dependent invocation write/read
+
: If you have one invocation which is dependent on another (for example, a fragment shader and the vertex shaders that generated its primitive), then you need to use {{code|coherent}} on the variables and call {{code|memoryBarrier}} or {{code|memoryBarrierImage}}, as appropriate, after you finish writing to the images of interest.
+
; Shader image write/read between rendering commands
+
: There is no need for {{code|coherent}} here. Just use {{apifunc|glMemoryBarrier|(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT)}} between the two rendering/dispatch commands.
+
; Shader image writes, read by other OpenGL operations
+
: Again, {{code|coherent}} is not necessary. You must use a {{apifunc|glMemoryBarrier}} appropriate to the operation of interest.
+
  
 
== Limitations ==
 
 
{{stub}}
 
  
 
[[Category:OpenGL Shading Language]]
 
 
[[Category:Shaders]]
 

Revision as of 16:03, 9 November 2012

Image Load Store
Core in version 4.5
Core since version 4.2
Core ARB extension ARB_shader_image_load_store
EXT extension EXT_shader_image_load_store

Image load/store is the ability of Shaders to more-or-less arbitrarily read from and write to images.

Overview

The idea with image load/store is that the user can bind one of the images in a Texture to a number of image binding points (which are separate from texture image units). Shaders can read information from these images and write information to them, in ways that they cannot with textures.

This can allow for a number of powerful features, including relatively cheap order-independent transparency.

If you think that this is a great feature, remember that there is no such thing as a free lunch. The cost of using image load/store is that all of its write operations are not automatically coherent. By using image load/store, you take up the responsibility to manage what OpenGL would manage for you using regular texture reads/FBO writes.

Image variables

Formats and compatibility

Basic load store

Atomic operations

Memory coherency

Writes and atomic operations via image variables are not automatically coherent. Therefore, you must synchronize explicitly to ensure that writes have completed and are visible before you read those values.

Limitations