I am trying to read back to host memory something that’s computed by a compute shader, and I want it to be non-blocking. So, I have two threads like this:
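The original code block appears to be missing, so here is a minimal sketch of the setup I mean. All names (`mapped_ptr`, `fence`, sizes) are mine; the buffer is assumed to be created with `glBufferStorage(..., GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT)` and persistently mapped with `glMapBufferRange`:

```c
/* Thread 1 (owns the GL context that issues the work): */
glDispatchCompute(groups_x, 1, 1);   /* compute shader writes into the buffer */
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
/* hand `fence` over to Thread 2 */

/* Thread 2 (a context sharing the sync object is current here): */
glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, timeout_ns);
memcpy(result, mapped_ptr, size);    /* read the computed data through the
                                        persistent mapping */
```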
Now, assuming I don’t use GL_MAP_COHERENT_BIT, I understand I need to call glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT). The question is, where does it go? Does it belong in Thread 1, before glFenceSync(), or in Thread 2, after glClientWaitSync()?
Will I get non-blocking behaviour at all, given that the documentation on GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT says it may cause additional synchronization operations?
Finally, will using GL_MAP_COHERENT_BIT give me a coherent memory read in the above case without using a barrier?
The glMemoryBarrier() documentation is too vague here, but the glBufferStorage() reference page does provide the answer (emphasis mine):
> If GL_MAP_COHERENT_BIT is not set and the server performs a write, the application must **call glMemoryBarrier with the GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT set and then call glFenceSync** with GL_SYNC_GPU_COMMANDS_COMPLETE (or glFinish). Then the CPU will see the writes after the sync is complete.
Therefore, in my example, glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT) goes to Thread 1, before glFenceSync().
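In other words, the producing thread issues the barrier between the compute dispatch and the fence (a minimal sketch; names are mine):

```c
/* Thread 1 (GL context thread): */
glDispatchCompute(groups_x, 1, 1);                     /* GPU writes the mapped buffer  */
glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT);  /* make those writes visible to
                                                          the client-side mapping       */
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

/* Thread 2: once glClientWaitSync(fence, ...) returns, reading through the
   persistently mapped pointer is safe. */
```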
As for whether GL_MAP_COHERENT_BIT is sufficient, in the context of my example, to get rid of that barrier: the answer is yes, according to the same glBufferStorage() documentation:
> If GL_MAP_COHERENT_BIT is set and the server does a write, the app must **call FenceSync** with GL_SYNC_GPU_COMMANDS_COMPLETE (or glFinish). Then the CPU will see the writes after the sync is complete.
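So with GL_MAP_COHERENT_BIT requested at buffer-creation time (it must be set both in glBufferStorage and in the matching glMapBufferRange call), the barrier can be dropped and the fence alone suffices. A sketch, with names assumed:

```c
/* Buffer creation, once: coherent persistent mapping for readback. */
GLbitfield flags = GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
glBufferStorage(GL_SHADER_STORAGE_BUFFER, size, NULL, flags);
void *mapped_ptr = glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, size, flags);

/* Thread 1: no glMemoryBarrier needed before the fence. */
glDispatchCompute(groups_x, 1, 1);
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
```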