Usage of GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT

Hi,

I am trying to read back to host memory something that’s computed by a compute shader, and I want it to be non-blocking. So, I have two threads like this:

========== Thread 1 ===========

glDispatchCompute(…);

syncObj = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

========== Thread 2 ===========

glClientWaitSync(syncObj, 0, ~GLuint64(0));

memcpy(…, mappedSsboMemory, …);

===========================

Now, assuming I don’t use GL_MAP_COHERENT_BIT, I understand I need glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT). The question is, where does it go? Does it go in Thread 1 before glFenceSync(), or in Thread 2 after glClientWaitSync()?

Will I get non-blocking behaviour at all given that documentation on GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT says it may cause additional synchronization operations?

Finally, will using GL_MAP_COHERENT_BIT give me a coherent memory read in the above case without using a barrier?

OK, I’ve found the answers to my own questions.

The glMemoryBarrier() documentation is too vague; however, the glBufferStorage() documentation does provide the answer (emphasis mine):

If GL_MAP_COHERENT_BIT is not set and the server performs a write, the application must call glMemoryBarrier with the GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT set and then call glFenceSync with GL_SYNC_GPU_COMMANDS_COMPLETE (or glFinish). Then the CPU will see the writes after the sync is complete.

Therefore, in my example, glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT) goes in Thread 1, before glFenceSync().
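
To make the ordering concrete, here is a minimal sketch of the two threads for the non-coherent case. So that it compiles without a GL context, the GL entry points are replaced by stand-in stubs that only record the call order; in a real program they would come from your GL loader. The function names submitReadback() and waitForReadback(), the log g_calls, and the dispatch group counts are hypothetical. The glFlush() after the fence is my own addition, on the assumption that Thread 2 waits from a different (shared) context and the fence therefore has to be flushed first.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-in types and constants so the sketch builds without GL headers.
// The numeric values are the ones used by the Khronos headers.
using GLbitfield = std::uint32_t;
using GLenum     = std::uint32_t;
using GLuint     = std::uint32_t;
using GLuint64   = std::uint64_t;
using GLsync     = const void*;
constexpr GLbitfield GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT = 0x00004000;
constexpr GLenum     GL_SYNC_GPU_COMMANDS_COMPLETE       = 0x9117;
constexpr GLenum     GL_ALREADY_SIGNALED                 = 0x911A;
constexpr GLenum     GL_CONDITION_SATISFIED              = 0x911C;

// Hypothetical call log: each stub appends its own name.
std::vector<std::string> g_calls;

// Stubs standing in for the real GL entry points (assumption: same signatures).
void glDispatchCompute(GLuint, GLuint, GLuint) { g_calls.push_back("glDispatchCompute"); }
void glMemoryBarrier(GLbitfield)               { g_calls.push_back("glMemoryBarrier"); }
void glFlush()                                 { g_calls.push_back("glFlush"); }
GLsync glFenceSync(GLenum, GLbitfield) {
    static int token;  // dummy object standing in for a real sync object
    g_calls.push_back("glFenceSync");
    return &token;
}
GLenum glClientWaitSync(GLsync, GLbitfield, GLuint64) {
    g_calls.push_back("glClientWaitSync");
    return GL_CONDITION_SATISFIED;  // the stub always reports "signaled"
}

// Thread 1: dispatch, then barrier, then fence (the order the docs require).
GLsync submitReadback() {
    glDispatchCompute(1, 1, 1);  // placeholder group counts
    glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT);
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush();  // assumption: the waiter is in another context
    return fence;
}

// Thread 2: wait for the fence; only then read the mapped pointer.
bool waitForReadback(GLsync fence) {
    const GLenum result = glClientWaitSync(fence, 0, ~GLuint64(0));
    // On success it is safe to memcpy from the mapped SSBO memory here.
    return result == GL_CONDITION_SATISFIED || result == GL_ALREADY_SIGNALED;
}
```

The point is only the order inside submitReadback(): the barrier sits between the dispatch and the fence, and Thread 2 needs nothing beyond the wait.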

As for whether GL_MAP_COHERENT_BIT alone is sufficient in my example to get rid of that barrier, the answer is yes, according to the same glBufferStorage() documentation:

If GL_MAP_COHERENT_BIT is set and the server does a write, the app must call FenceSync with GL_SYNC_GPU_COMMANDS_COMPLETE (or glFinish). Then the CPU will see the writes after the sync is complete.
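
For reference, a small sketch of how the two variants differ on the buffer side, assuming the SSBO is persistently mapped for reading (GL_MAP_COHERENT_BIT is only valid together with GL_MAP_PERSISTENT_BIT). The helper names readbackMapFlags() and needsClientMappedBarrier() are hypothetical, and the constants are redefined locally with the Khronos header values so the snippet builds without GL headers.

```cpp
#include <cstdint>

// Stand-in type and constants (values as in the Khronos headers).
using GLbitfield = std::uint32_t;
constexpr GLbitfield GL_MAP_READ_BIT       = 0x0001;
constexpr GLbitfield GL_MAP_PERSISTENT_BIT = 0x0040;
constexpr GLbitfield GL_MAP_COHERENT_BIT   = 0x0080;

// Hypothetical helper: flags for a persistently mapped readback buffer.
// The same value would be passed to both calls, e.g.
//   glBufferStorage(GL_SHADER_STORAGE_BUFFER, size, nullptr, flags);
//   void* ptr = glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, size, flags);
constexpr GLbitfield readbackMapFlags(bool coherent) {
    return GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT |
           (coherent ? GL_MAP_COHERENT_BIT : 0u);
}

// Hypothetical helper encoding the rule quoted from the documentation:
// only the non-coherent mapping needs
// glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT) before the fence.
constexpr bool needsClientMappedBarrier(bool coherent) {
    return !coherent;
}
```

Either way the fence (or glFinish) is still required; the coherent bit only removes the explicit barrier.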

Thx for the information! That was something I was curious about myself.