barrier() in tess control shaders



prideout
10-15-2011, 08:42 PM
Some questions about the barrier() instruction:

(1) It implies that the control shader invocations within a given patch do not necessarily share a program counter, which goes against my intuition. Given the 32-vertex limit on patch size, I assumed that each patch is processed in a single warp.

(2) The only reason you'd need to synchronize threads is if the shader had RW access to a shared memory space. The "patch" qualifier can be applied only to "out" variables, not to temporaries. Wouldn't it be useful to apply "patch" to temporaries?

(3) What's the equivalent to barrier() in D3D hull shaders?

(4) When I tried to write a highly efficient control shader that makes use of patch-level shared memory (i.e., patch out) and barrier(), I ran into driver issues with both major vendors. Has anyone out there had any better luck than me?

prideout
10-17-2011, 02:07 PM
My post was horribly written. Let me rephrase:

What is the raison d'être of the barrier() instruction?

The answer in my head was something like this:

Given a "patch out float foo", barrier() can provide a synchronization point between invocations that write to foo, and invocations that read from foo.

However, this answer implies that using "patch out" as scratch space is a supported operation. AMD and NVIDIA both seem to be allocating intermediate results into per-thread registers.
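
A minimal sketch of that usage, assuming a 4-vertex quad patch. Every name other than foo, and the 8.0 scale factor, is made up for illustration:

```glsl
#version 400 core

// Hypothetical 4-vertex quad patch.
layout(vertices = 4) out;

in vec3 vPosition[];     // assumed name for the vertex shader output
out vec3 tcPosition[];   // per-vertex output
patch out float foo;     // per-patch output used as shared scratch space

void main()
{
    tcPosition[gl_InvocationID] = vPosition[gl_InvocationID];

    // Only invocation 0 writes the shared per-patch value.
    if (gl_InvocationID == 0) {
        foo = distance(vPosition[0], vPosition[2]);
    }

    // barrier() has to be called directly in main(), outside any control flow.
    barrier();

    // After the barrier, every invocation reads what invocation 0 wrote.
    float level = clamp(foo * 8.0, 1.0, 64.0);   // 8.0 is an arbitrary scale
    gl_TessLevelOuter[gl_InvocationID] = level;
    if (gl_InvocationID == 0) {
        gl_TessLevelInner[0] = level;
        gl_TessLevelInner[1] = level;
    }
}
```

By my reading of the spec this should be legal, since per-patch outputs are readable as well as writable in the control shader.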

Dark Photon
10-17-2011, 04:19 PM
What is the raison d'être of the barrier() instruction? ... The answer in my head was something like this: Given a "patch out float foo", barrier() can provide a synchronization point between invocations that write to foo, and invocations that read from foo.

Here's the spec language:


A tessellation control shader may also read the per-vertex outputs of other tessellation control shader invocations, as well as read and write shared per-patch outputs. The tessellation control shader invocations for a single patch effectively run as a group. A built-in barrier() function is provided to allow synchronization points where no shader invocation will continue until all shader invocations have reached the barrier.

So that sounds a lot like what you said, and like what I'd expect. It is similar to barrier( CLK_LOCAL_MEM_FENCE ) in OpenCL, __syncthreads() in CUDA, and memoryBarrier() in ARB_shader_image_load_store (http://www.opengl.org/registry/specs/ARB/shader_image_load_store.txt) / OpenGL 4.2.

When you have some threads potentially operating on the data previously read/written by other threads, you need a way to ensure the data is in place (a sync point) before you let the threads continue and start using the data.
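
As a rough sketch of that write / sync / read sequence using per-vertex outputs (which the spec text above says other invocations may read), assuming a 3-vertex triangle patch. The names vPosition, tcPosition and tcEdgeLen and the 4.0 scale factor are invented for illustration, and this has not been run against any particular driver:

```glsl
#version 400 core

// Hypothetical 3-vertex triangle patch.
layout(vertices = 3) out;

in vec3 vPosition[];     // assumed name for the vertex shader output
out vec3 tcPosition[];   // per-vertex output
out float tcEdgeLen[];   // hypothetical: length of the edge starting at this vertex

void main()
{
    int next = (gl_InvocationID + 1) % 3;

    // Each invocation writes only its own per-vertex outputs,
    // indexed by gl_InvocationID as the language requires.
    tcPosition[gl_InvocationID] = vPosition[gl_InvocationID];
    tcEdgeLen[gl_InvocationID] = distance(vPosition[gl_InvocationID], vPosition[next]);

    // Sync point: no invocation continues until all have written their outputs.
    barrier();

    // Now it is safe to read the values written by the other invocations.
    float longest = max(tcEdgeLen[0], max(tcEdgeLen[1], tcEdgeLen[2]));
    float level = clamp(longest * 4.0, 1.0, 64.0);   // 4.0 is an arbitrary scale

    gl_TessLevelOuter[gl_InvocationID] = level;
    if (gl_InvocationID == 0) {
        gl_TessLevelInner[0] = level;
    }
}
```

Without the barrier() call, the reads of tcEdgeLen[] written by the other invocations would have undefined results.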

prideout
10-17-2011, 05:17 PM
Cool, thanks Dark Photon.

With both AMD and NVIDIA, writing to a "patch out" variable from invocation A and reading it back from invocation B does not work, even with an intervening barrier(). The shader compiles without errors, but I see garbled rendering.

Their compilers are probably (incorrectly) allocating intermediary temporaries into a per-thread register space.