Compute Shader - correct memory barrier usage



ewanRi
08-16-2016, 11:36 AM
I'm hoping to read and write to potentially the same elements of an SSBO as part of a fluid sim using compute shaders, but I'm having trouble with syncing. I have a test shader that is run 16 times, with three options below that hopefully show what I'm trying to do.


layout (std430, binding=8) coherent buffer Debug
{
int debug[ ];
};

shared int sharedInt;

layout (local_size_x = 16, local_size_y = 1, local_size_z = 1) in;

void main()
{
/////// 1. ///////
sharedInt = debug[0];
memoryBarrierShared();
barrier();
debug[0] = sharedInt + 1;
memoryBarrierShared();
barrier();

// Print debug[0]: 1


/////// 2. ///////
atomicAdd(debug[0], 1);

// Print debug[0]: 16


/////// 3. ///////
sharedInt = debug[0];
memoryBarrierShared();
barrier();
atomicExchange(debug[0], debug[0]+1);
memoryBarrierShared();
barrier();

// Print debug[0]: 1
}

*Just to be clear, I'm not running the above code as it is; I comment out all but one option at a time.

In all three cases I want debug[0] to equal 16; I print it on the CPU side after the shader has been invoked. I need something like the 1st or 3rd option in my simulation because I need to read and write to the SSBO in the same thread. Even though the 2nd option prints 16, if I try to read from debug[0] it is still the initial value of 0.

I'm not sure I understand the role of the shared variable. As I understand it, memoryBarrierShared() should make the reads and writes of sharedInt visible to every thread in the work group, but the result is the same even when I make sure only one work group is dispatched.

Thanks for any help.

ewanRi
08-17-2016, 02:45 AM
I had this cleared up for me a bit with the following explanation, though I'm still not quite sure how to do what I'm trying to do.


/////// 1. ///////

// all invocations read debug[0] into the shared variable (presumably 0)
sharedInt = debug[0];

// syncing. but since all invocations wrote the same value into the shared variable, they would all
// read the same value from it anyway, since values written in one invocation are always visible in that same invocation.
memoryBarrierShared();
barrier();

// all invocations read the shared variable and add 1 to it (without writing the result back to the shared variable),
// then they all write the result of the addition (1) to the SSBO
debug[0] = sharedInt + 1;

// another sync, which does not help if the shader ends here.
memoryBarrierShared();
barrier();

// since they all write 1, there is no other output possible than a 1 in the SSBO.
// Print debug[0]: 1


/////// 2. ///////
// all invocations tell the "atomic memory unit" (whatever that is exactly)
// to atomically add 1 to the SSBO.
// that unit will now, sixteen times, read the value that is in the SSBO,
// add 1, and write it back. and because it does so atomically,
// these additions "just work" and don't use old values or the like,
// so you have a 16 in your SSBO.
atomicAdd(debug[0], 1);

// Print debug[0]: 16


/////// 3. ///////

// as above, but this has even less effect since you don't read from sharedInt :)
sharedInt = debug[0];
memoryBarrierShared();
barrier();

// all invocations read from debug[0], reading 0.
// they all add 1 to the read value, so they now have 1 in their registers.
// now they tell the "atomic memory unit" to exchange whatever there is in
// debug[0] with a 1. so you write a 1 sixteen times into debug[0] and end up with a 1.
atomicExchange(debug[0], debug[0]+1);
memoryBarrierShared();
barrier();

// Print debug[0]: 1
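
For what it's worth, here is a minimal sketch of one way to read and write the same SSBO element from every invocation, based on the explanation above. It relies on two things: atomicAdd() returns the value that was stored before the addition, and atomicCompSwap() only writes if the element still holds the value that was read. The applyUpdate() helper and the use of debug[1] are assumptions made purely for illustration (the buffer is assumed to hold at least two ints and to start zeroed); this is not tested against the actual fluid sim.

```glsl
#version 430

layout (std430, binding = 8) coherent buffer Debug
{
    int debug[];
};

layout (local_size_x = 16, local_size_y = 1, local_size_z = 1) in;

// Hypothetical update rule, standing in for whatever the simulation computes.
int applyUpdate(int oldValue, int contribution)
{
    return oldValue + contribution;
}

void main()
{
    // atomicAdd returns the value debug[0] held just before this invocation's
    // addition, so the read and the write happen as one indivisible step and
    // every invocation gets a distinct value back.
    int previous = atomicAdd(debug[0], 1);

    // General read-modify-write on debug[1] (assumed to exist): retry until
    // no other invocation changed the element between our read and our write.
    int expected;
    int desired;
    do
    {
        expected = debug[1];
        desired  = applyUpdate(expected, previous);
    }
    while (atomicCompSwap(debug[1], expected, desired) != expected);
}
```

The difference from option 3 is that the read and the write are tied together: atomicExchange(debug[0], debug[0]+1) reads debug[0] non-atomically first, so every invocation can see the same old value, whereas the compare-and-swap loop retries whenever another invocation got in between.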