Do I need to call it before reads or after writes (it’s not clear from the api tbh, as it says memory accesses) on SSBOs/Images?
You issue the barrier on the source invocation. The barrier causes memory accesses to become visible to subsequent invocations (or later parts of the current one, if you just need ordering).
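To make "barrier on the source invocation" concrete, here is a minimal sketch of a compute shader in which each work group writes a value to an SSBO, issues the barrier after its write, and then signals with an atomic counter. The buffer names, the bindings and the "last group sums everything" logic are all made up for illustration, and it assumes the application zeroes done before the dispatch.

```glsl
#version 430
layout(local_size_x = 1) in;

// Hypothetical buffers; names and bindings are assumptions for this sketch.
layout(std430, binding = 0) coherent buffer Partials { uint partial_[]; };
layout(std430, binding = 1) coherent buffer Result   { uint done; uint total; };

void main() {
    uint wg = gl_WorkGroupID.x;

    partial_[wg] = wg + 1u;   // the write we want others to see
    memoryBarrierBuffer();    // issued by the writer, after the write

    // Signal that this group's write has been published.
    uint ticket = atomicAdd(done, 1u);

    // Whoever takes the last ticket knows every other group has already
    // passed its barrier, so it can read what they wrote.
    if (ticket == gl_NumWorkGroups.x - 1u) {
        uint sum = 0u;
        for (uint i = 0u; i < gl_NumWorkGroups.x; ++i)
            sum += partial_[i];
        total = sum;
    }
}
```

The barrier only orders and publishes the writer's accesses. The reader still needs some way of knowing that the write has happened, which is what the counter is for here.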
Is this scoped on the shader invocation only or is this valid between shader invocations.
They’re scoped on the kind of memory access. If you use memoryBarrierImage, then any writes via image load/store will become visible to subsequent readers in subsequent invocations (or later in the current one). If you use memoryBarrier, then all writes from anywhere will become visible.
The only one that scopes based on the type of invocation is groupMemoryBarrier, which is like memoryBarrier, but only within a compute shader workgroup. So any changes you have made via any operation are visible to others in the work group, but not necessarily to anyone else. memoryBarrierShared technically is scoped to a work group, but only because shared variables are work group local.
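As a rough sketch of the work-group-local case: each invocation stores a value into a shared array, and after a memoryBarrierShared plus barrier it reads its neighbour's slot. The buffer name, binding and group size of 64 are assumptions for illustration, and the sketch assumes the dispatch covers the buffer exactly (there is no bounds check).

```glsl
#version 430
layout(local_size_x = 64) in;

// Hypothetical buffer; assumed to hold exactly 64 * (number of groups) floats.
layout(std430, binding = 0) buffer Data { float values[]; };

shared float tile[64];

void main() {
    uint lid = gl_LocalInvocationID.x;
    uint gid = gl_GlobalInvocationID.x;

    tile[lid] = values[gid];  // write to work-group-local memory

    memoryBarrierShared();    // make the shared write visible to the group
    barrier();                // wait until every invocation has got this far

    // Now it is safe to read a slot that another invocation wrote.
    uint neighbour = (lid + 1u) % 64u;
    values[gid] = tile[lid] + tile[neighbour];
}
```

The memory barrier handles visibility and barrier() handles execution: without the latter, the neighbour might simply not have executed its write yet.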
since images are coherent
Images are not coherent unless you declare them with the coherent qualifier. Furthermore:
since images are coherent aren’t reads and writes synchronized?
Coherency only applies to visibility. Atomic operations perform a read/modify/write as a single, atomic action. That requires ordering. Coherency has nothing to do with ordering; it only means that writes will be visible. Coherency does not say anything about the order in which they become visible.
If you do two writes to the same memory, even coherently, even in the same invocation, you have no guarantee that other invocations will see the second as happening after the first. You need a memory barrier to provide ordering.
Atomic operations not only require ordering, but locking. They have to ensure that nothing can happen to that memory after the read and before the write. Again, coherency is insufficient to ensure that.
why are there no atomic operations on floats?
1: Because some hardware doesn’t support it. Floating point math is very complex. Being able to perform a read/increment/write on IEEE-754 floats is non-trivial, much more difficult than for two’s complement integers.
2: There are extensions for hardware that can, such as NV_shader_atomic_float.
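Without such an extension, a common workaround (not something from the answer above, just a well-known pattern) is to store the float's bit pattern in a uint and retry with atomicCompSwap until the swap succeeds. A minimal sketch, with a made-up buffer name, binding and helper function:

```glsl
#version 430
layout(local_size_x = 64) in;

// Hypothetical accumulator: a float stored as its raw bits in a uint.
layout(std430, binding = 0) coherent buffer Accum { uint sumBits; };

// Emulated float add; atomicAddFloat is a helper invented for this sketch.
void atomicAddFloat(float value) {
    uint expected = sumBits;
    for (;;) {
        uint desired = floatBitsToUint(uintBitsToFloat(expected) + value);
        // Returns the value that was in memory before the operation.
        uint found = atomicCompSwap(sumBits, expected, desired);
        if (found == expected)
            break;            // nobody changed it in between; the add landed
        expected = found;     // someone else got there first; retry
    }
}

void main() {
    atomicAddFloat(1.0);
}
```

Only the compare-and-swap itself is atomic; the float addition happens outside it, which is why the loop has to retry. Under heavy contention this can be slow, and the order in which the additions land (and so the rounding) is not deterministic.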