[QUOTE=obfuscator;1260667]I want to store a computed value for every quad into an image via vertex shader. The store should only be executed for the first vertex of a quad…
Now the question: Is it safe to assume (defined behaviour) that this stored value is visible for all fragments of the triangle strip / quad or can it happen that it is only visible for one triangle of the strip and the other uses an obsolete or undefined value?[/QUOTE]
I’m certainly no expert on side-effect synchronization, but from skimming ARB_shader_image_load_store, I don’t see anything that guarantees that side-effect writes in a vertex shader will be synchronized before reads from any resulting fragment shader. This text is relevant:
Shader Memory Access Ordering
The order in which texture or buffer object memory is read or written by shaders is largely undefined…
- While a vertex … shader will be executed at least once for each unique vertex specified by the application …, it may be executed more than once for implementation-dependent reasons…
- The relative order of invocations of different shader types is largely undefined. However, when executing a shader whose inputs are generated from a previous programmable stage, the shader invocations from the previous stage are guaranteed to have executed far enough to generate final values for all next-stage inputs…
Shader Memory Access Synchronization
…To permit cases where textures or buffers may be read or written in different pipeline stages without the overhead of automatic synchronization, buffer object and texture stores performed by shaders are not automatically synchronized with other GL operations using the same memory…
- SHADER_IMAGE_ACCESS_BARRIER_BIT: Memory accesses using shader image load, store, and atomic built-in functions issued after the barrier will reflect data written by shaders prior to the barrier. Additionally, image stores and atomics issued after the barrier will not execute until all memory accesses (e.g., loads, stores, texture fetches, vertex fetches) initiated prior to the barrier complete.
…
The following guidelines may be helpful in choosing when to use coherent memory accesses and when to use barriers
…
- Data written by one shader invocation and consumed by other shader invocations launched as a result of its execution (“dependent invocations”) should use coherent variables in the producing shader invocation and call memoryBarrier() after the last write. The consuming shader invocation should also use coherent variables.
I’d pay close attention to the last one. Also this text from the GLSL 4.4 Specification/User’s Manual seems pertinent:
8.17 Shader Memory Control Functions [aka memoryBarrier* functions]
…void memoryBarrierImage() - Control the ordering of memory transactions to images issued within a single shader invocation…
When these functions return, the results of any memory stores performed using coherent variables performed prior to the call will be visible to any future coherent access to the same memory performed by any other shader invocation. In particular, the values written this way in one shader stage are guaranteed to be visible to coherent memory accesses performed by shader invocations in subsequent stages when those invocations were triggered by the execution of the original shader invocation (e.g., fragment shader invocations for a primitive resulting from a particular geometry shader invocation).
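So it sounds like a fragment shader can (with the special sauce described above: coherent variables plus memoryBarrier() after the write) read image data stored by one of the vertex shader invocations that caused that fragment to be generated. But if the store came from a vertex that isn’t part of the fragment’s primitive, you can’t make that assumption. The GPU rasterizes triangles, not quads, and in a 4-vertex strip the first vertex belongs only to the first triangle, so I think fragments of the second triangle fall outside that guarantee and you’d be running afoul of the rules with what you’re trying to do.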
A couple of alternatives come to mind. First, just generate the data in the vertex shader and send it down normally using interpolators. Second, send a single point down the pipe, generate your data in the vertex shader (the data you were going to image-store for the fragment shader executions), and then use a geometry shader to spread the point into a quad. That gives all of the resulting vertices (and their fragments) access to the shared data through normal interpolators, without any image load/store hocus pocus.