atomic counter buffer mapping

I am encountering some strange behaviour when working with atomic counter buffers.
If I only write to them, creating them with GL_DYNAMIC_DRAW and mapping them with GL_MAP_WRITE_BIT, they work perfectly.
If I also try to read what is in them, creating them with GL_DYNAMIC_COPY and mapping with GL_MAP_READ_BIT | GL_MAP_WRITE_BIT, the mapping time varies. It increases linearly with the number of fragments that access the atomic counter on the GPU side.

The app works fine, with no GL errors.
I really don’t understand this difference in behaviour.

Buffer creation (once):


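glGenBuffers(1, &atomicbuffer);
// Binding to an indexed point also binds the generic GL_ATOMIC_COUNTER_BUFFER target.
glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 0, atomicbuffer);
// Allocate storage for a single counter, with no initial data.
glBufferData(GL_ATOMIC_COUNTER_BUFFER, sizeof(GLuint), NULL, GL_DYNAMIC_COPY);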

Buffer usage (per frame):


glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, atomicbuffer);
// Slow: mapping time grows with the number of fragments.
GLuint* ptr = (GLuint*)glMapBufferRange(GL_ATOMIC_COUNTER_BUFFER, 0, sizeof(GLuint), GL_MAP_WRITE_BIT | GL_MAP_READ_BIT);
// Fast (commented out): write-only mapping.
//GLuint* ptr = (GLuint*)glMapBufferRange(GL_ATOMIC_COUNTER_BUFFER, 0, sizeof(GLuint), GL_MAP_WRITE_BIT);
memory_fragmentcount = ptr[0];   // counter value read back from the buffer
memory_necessary = memory_fragmentcount * FRAGMENTSIZE;
ptr[0] = 0;                      // reset the counter for the next frame
glUnmapBuffer(GL_ATOMIC_COUNTER_BUFFER);
glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, unit_atomic, atomicbuffer);
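
One detail the per-frame code leaves out is that glMapBufferRange can return NULL when the mapping fails. A minimal sketch of the read-and-reset step as a helper that tolerates this is shown here. The name consume_fragment_count is hypothetical, and the GLuint typedef is a local stand-in (an assumption) so that the sketch compiles without the GL headers:

```c
#include <stddef.h>

/* Local stand-in for the GL typedef (assumption); real code gets it
   from the OpenGL headers. */
typedef unsigned int GLuint;

/* Hypothetical helper: reads the counter through the mapped pointer,
   resets it for the next frame, and returns the number of bytes needed.
   Returns 0 (and reports a count of 0) if the mapping failed. */
static size_t consume_fragment_count(GLuint *mapped, size_t fragment_size,
                                     GLuint *count_out)
{
    if (mapped == NULL) {
        if (count_out) *count_out = 0;
        return 0;
    }
    GLuint count = mapped[0];   /* requires GL_MAP_READ_BIT on the mapping */
    mapped[0] = 0;              /* requires GL_MAP_WRITE_BIT on the mapping */
    if (count_out) *count_out = count;
    /* Widen before multiplying to avoid overflowing a 32-bit product. */
    return (size_t)count * fragment_size;
}
```

In the per-frame code it would be called with the pointer returned by glMapBufferRange and FRAGMENTSIZE, just before glUnmapBuffer.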

Btw, it makes no difference whatsoever whether I use glMapBuffer on the entire buffer or glMapBufferRange.

Really, if this were a sync problem (more fragments/hardware units/etc. modifying the same resource, the atomic buffer), shouldn’t the writing be more problematic/slow, or at the very least equally bad?

None of what you say is at all surprising.

If you map a buffer for writing (meaning that reading from it is undefined), and OpenGL is doing write operations (which atomic counter ops are), then there is no reason for OpenGL to delay your writes. You are clearly saying, “I don’t care about the results of the atomic counter operation.” If you did care, you’d be reading the values.

Therefore, the driver will realize that you’re doing writes to a buffer that OpenGL is using, so it’ll just allocate a scratch piece of RAM for you to write to. Once everything gets synchronized and it comes time to execute this command, it’ll copy what you’ve written from that scratch RAM into the actual buffer.
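
The difference between the two mappings can be illustrated with a toy model. This is only a sketch of the reasoning above, not how any real driver is implemented; every name in it (ToyBuffer, toy_map_write_only, and so on) is hypothetical:

```c
/* Toy model only: NOT a real driver. It shows why a write-only map
   need not wait for the GPU while a read map must. */
typedef struct {
    unsigned int storage;   /* the "real" buffer contents */
    unsigned int scratch;   /* staging memory handed out for write-only maps */
    int pending_gpu_ops;    /* queued GPU increments not yet executed */
    int stalls;             /* times the CPU had to wait for the GPU */
    int scratch_in_use;     /* nonzero while a write-only map is active */
} ToyBuffer;

/* Executes all queued GPU work; each op increments the counter. */
static void toy_finish_gpu_work(ToyBuffer *b)
{
    b->storage += (unsigned int)b->pending_gpu_ops;
    b->pending_gpu_ops = 0;
}

/* Write-only map: current contents are irrelevant, so return scratch
   memory immediately without touching the pending work. */
static unsigned int *toy_map_write_only(ToyBuffer *b)
{
    b->scratch_in_use = 1;
    return &b->scratch;
}

/* Read/write map: the contents must be final, so the CPU waits until
   all pending GPU work has executed. */
static unsigned int *toy_map_read_write(ToyBuffer *b)
{
    if (b->pending_gpu_ops > 0) {
        b->stalls++;
        toy_finish_gpu_work(b);
    }
    return &b->storage;
}

/* Unmap: for a scratch mapping, the copy into the real buffer is
   ordered after the earlier GPU work; the model counts no CPU stall. */
static void toy_unmap(ToyBuffer *b)
{
    if (b->scratch_in_use) {
        toy_finish_gpu_work(b);
        b->storage = b->scratch;
        b->scratch_in_use = 0;
    }
}
```

In this model the cost of the read/write map depends on how much GPU work is still outstanding, which would be consistent with a mapping time that grows with the number of fragments.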

I will assume that you’re using a proper memory barrier here (because without calling glMemoryBarrier, your read or write operations will produce undefined behavior). If you called glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT), then you’re forcing OpenGL to perform a full synchronization if you try to read from the buffer. Like, say, mapping it for reading.

Quote: “If i’m trying to also read what is in them, by creating them with GL_DYNAMIC_COPY and mapping with GL_MAP_READ | GL_MAP_WRITE i’m getting a varying mapping time.”

That’s not what COPY means. COPY means “I will neither be reading from nor writing to the buffer”. It means that only OpenGL operations will upload data to it (transform feedback, image or atomic counter writes, etc), and only OpenGL operations will read from it (buffer texture sampling, vertex array reads, etc).

You should pick either READ or DRAW if you want to perform a read/modify/write operation.
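
As a rough illustration of that rule, here is a sketch of a helper that picks a usage hint from how the application itself touches the buffer. The function name pick_dynamic_usage and its policy (prefer READ whenever the application reads) are assumptions for illustration; the constants carry the values from the OpenGL headers and are repeated only so the sketch compiles on its own:

```c
/* Values as defined in the OpenGL headers, repeated here only so the
   sketch is self-contained. */
#ifndef GL_DYNAMIC_DRAW
#define GL_DYNAMIC_DRAW 0x88E8
#define GL_DYNAMIC_READ 0x88E9
#define GL_DYNAMIC_COPY 0x88EA
#endif

/* Hypothetical policy: choose the hint from what the APPLICATION does
   with the buffer, not from what OpenGL does with it.
   - the application reads results back   -> READ
   - the application only writes data     -> DRAW
   - the application does neither         -> COPY */
static unsigned int pick_dynamic_usage(int app_reads, int app_writes)
{
    if (app_reads)  return GL_DYNAMIC_READ;
    if (app_writes) return GL_DYNAMIC_DRAW;
    return GL_DYNAMIC_COPY;
}
```

For the read/modify/write pattern in this thread the helper returns GL_DYNAMIC_READ, which would replace GL_DYNAMIC_COPY in the glBufferData call. The usage parameter is only a hint, so a wrong choice affects performance rather than correctness.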

Thanks Alfonse.
Yes, I am using a proper barrier.
I reread the spec on COPY; really just a massive blunder on my part. I need to be more sober when coding :)