How do I give an offset into a persistently mapped uniform buffer efficiently?

I have two uniform buffers for my light implementation. I want to update both buffers every frame with as little overhead as possible, based on this presentation: Beyond Porting.
The uniform storage will be mapped persistently for the lifetime of my program. To ensure I do not overwrite data while it is in use or still needed, every frame I want to write to the uniform storage at an offset advanced by the size of the previous update. When the offset plus the upload size is greater than the size of the storage, I reset the offset to 0 and start overwriting the old data from previous frames. To really guarantee that this data is no longer in use either, I would put a synchronization point there that waits until the memory is known to be available.
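
The offset bookkeeping described above could be sketched roughly as follows. This is only a minimal illustration under a few assumptions: `RingCursor` and `allocate` are hypothetical names that do not appear in the code below, all sizes are in bytes, and the actual fence wait is left to the caller.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the original code): tracks the write
// position inside a persistently mapped buffer of `capacity` bytes.
struct RingCursor {
    std::size_t capacity;  // total size of the mapped storage in bytes
    std::size_t head;      // first byte that has not been handed out yet

    explicit RingCursor(std::size_t capacityBytes)
        : capacity(capacityBytes), head(0) {}

    // Returns the byte offset at which `size` bytes may be written.
    // `wrapped` is set to true when the cursor restarted at 0, which is the
    // point where the caller would wait on a fence before overwriting.
    std::size_t allocate(std::size_t size, bool& wrapped) {
        assert(size <= capacity);
        wrapped = false;
        if (head + size > capacity) {
            head = 0;
            wrapped = true;
        }
        const std::size_t offset = head;
        head += size;
        return offset;
    }
};
```

With a 100-byte buffer, three 40-byte uploads land at offsets 0, 40 and 0 again, and only the third one reports a wrap.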

Uploading the data into the buffer works fine, but I am having trouble making the shaders aware of the offset into the uniform storage. After I upload updated data into the uniform storage, so that it holds both the old and the updated data, I want the shaders to use the offset of the newest upload to fetch the uniform data from the uniform block. I have already got it to work by using a shader uniform variable for the offset, but it seems somewhat inelegant to me.

Maybe there is a better way to give a uniform block an offset into a buffer?

First, here is my code right now.

The actual data of all the lights in my 3D world:


std::vector<glm::vec4> allLightData;
allLightData.resize(MAX_LIGHT_COUNT * 3); // every light can have a maximum of 3 vec4s of data: 0 - position(x,y,z,w), 1 - color(r,g,b,brightness), 2 (only for spotlights) - frustum(xDir, yDir, zDir, angle)

glCreateBuffers(1, &lightDataBuffer);
glBindBuffer(GL_UNIFORM_BUFFER, lightDataBuffer);

lightDataBuffer_base = OpenGL::uniformBufferBaseCount++; // the uniform binding index I use later for the shader uniform block
glBindBufferBase(GL_UNIFORM_BUFFER, lightDataBuffer_base, lightDataBuffer);

lightDataBufferSize = sizeof(glm::vec4) * 3 * MAX_LIGHT_COUNT * 3; // 3 times the size of light data i support on my CPU to avoid synchronization
glBufferStorage(GL_UNIFORM_BUFFER, lightDataBufferSize, NULL, GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);

lightDataPtr = (glm::vec4*)glMapBufferRange(GL_UNIFORM_BUFFER, 0, lightDataBufferSize, GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT); //mapping to this pointer once, updating the storage with it forever


The indices for every light:


std::vector<unsigned int> allLightIndices;

allLightIndices.resize(MAX_LIGHT_COUNT * 2); //every light is represented by 2 indices, the first being the position in the lightData array, the other being the length of data in vec4s 
glCreateBuffers(1, &lightIndexBuffer);
glBindBuffer(GL_UNIFORM_BUFFER, lightIndexBuffer);

lightIndexBuffer_base = OpenGL::uniformBufferBaseCount++;
glBindBufferBase(GL_UNIFORM_BUFFER, lightIndexBuffer_base, lightIndexBuffer);

lightIndexBufferSize = sizeof(unsigned int) * 2 * MAX_LIGHT_COUNT * 3;
glBufferStorage(GL_UNIFORM_BUFFER, lightIndexBufferSize, NULL, GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);

lightIndexPtr = (unsigned int*)glMapBufferRange(GL_UNIFORM_BUFFER, 0, lightIndexBufferSize,  GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);


After my allLights array has been updated by my program, I upload the data to the storages like this:


static unsigned int prevLightDataCount = 0; 
lightDataBufferOffset += prevLightDataCount; // increment the offset by the size of the last upload
prevLightDataCount = lightDataCount;
if (sizeof(glm::vec4) * (lightDataCount + lightDataBufferOffset) > sizeof(glm::vec4) * 3 * MAX_LIGHT_COUNT) {
	lightDataBufferOffset = 0;

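	// Block until the GPU has finished every command issued so far, so the data at the start of the buffer is no longer in use
	GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
	glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, GL_TIMEOUT_IGNORED);
	glDeleteSync(fence);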
}

std::memcpy(&lightDataPtr[lightDataBufferOffset], &allLightData[0], lightDataCount * sizeof(glm::vec4)); // copy CPU memory to GPU mapped pointer


if (lightIndexCount) {
	static unsigned int prevLightIndexCount = 0;
	lightIndexBufferOffset += prevLightIndexCount;
	unsigned int paddedlightIndexCount = lightIndexCount + (lightIndexCount % 4);
	prevLightIndexCount = paddedlightIndexCount;

	if (sizeof(unsigned int) * (paddedlightIndexCount + lightIndexBufferOffset) > sizeof(unsigned int) * 2 * MAX_LIGHT_COUNT) {
		lightIndexBufferOffset = 0;
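		// Block until the GPU has finished every command issued so far before reusing the start of the buffer
		GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
		glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, GL_TIMEOUT_IGNORED);
		glDeleteSync(fence);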
	}

	std::memcpy(&lightIndexPtr[lightIndexBufferOffset], &allLightIndices[0], paddedlightIndexCount * sizeof(unsigned int));

		
}

Then, before drawing, I update the offset uniform in the shaders, which access the uniform block array like this:


//GLSL
uniform uint lightCount;
uniform uint lightDataBufferOffset;
uniform uint lightIndexBufferOffset;

layout(std140) uniform LightDataBuffer{
	vec4 lightData[MAX_LIGHT_COUNT*3*3];
};

layout(std140)uniform LightIndexBuffer{
	vec4 lightIndices[MAX_LIGHT_COUNT/2*3];
};

void main(){
     //the ambient light is always the first vec4 of allLightData (the updated data block)

     vec3 ambientLight = lightData[lightDataBufferOffset].xyz;
     //...
}

Thank you!

I think you can control whether the changed memory is immediately “visible” to OpenGL by using GL_MAP_FLUSH_EXPLICIT_BIT:
https://www.khronos.org/opengl/wiki/Buffer_Object#Persistent_mapping
https://www.khronos.org/opengl/wiki/GLAPI/glFlushMappedBufferRange

[QUOTE=stimulate;1286732]…every frame, I want to write to the uniform storage with an offset of the size of the previous update.

…I am having trouble making the shaders aware of the offset into the uniform storage. … I have already got it to work by using a shader uniform variable for the offset, but it seems somewhat inelegant to me.

Maybe there is a better way to give a uniform block an offset into a buffer?[/QUOTE]

Are you looking for glBindBufferRange()? Then your shaders shouldn’t even need to know about this offset.

[QUOTE]I want to update both buffers every frame with as little overhead as possible…[/QUOTE]

On that note, if you’re using NVIDIA GL drivers, you might alternatively look at passing in the pointer to the current UBO offset via NVIDIA’s bindless extensions, in particular NV_shader_buffer_load.

I had already tried glBindBufferRange, and I thought it was exactly what I needed. But I could never get it to work. I couldn’t tell why, but I kept getting an “invalid value” error.
The binding index has been used before and should be valid, and the offset and size made sense.

This is my update function with it. When I take it out, I have to pass the offset as a uniform. But I want to update uniforms as rarely as possible, so this would be really convenient.



	static unsigned int prevLightDataCount = 0; 
	lightDataBufferOffset += prevLightDataCount; //apply offset 

	prevLightDataCount = lightDataCount;
	if (sizeof(glm::vec4) * (lightDataCount + lightDataBufferOffset) > sizeof(glm::vec4) * 3 * MAX_LIGHT_COUNT) {
		lightDataBufferOffset = 0;

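		// Block until the GPU has finished every command issued so far, so the data at the start of the buffer is no longer in use
		GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
		glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, GL_TIMEOUT_IGNORED);
		glDeleteSync(fence);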
	}
	
	

	std::memcpy(&lightDataPtr[lightDataBufferOffset], &allLightData[0], lightDataCount * sizeof(glm::vec4));

	

	OpenGL::checkOpenGLErrors("OpenGL::uploadLightData()1:");
	if (lightIndexCount) {
		static unsigned int prevLightIndexCount = 0;
		lightIndexBufferOffset += prevLightIndexCount;
		unsigned int paddedlightIndexCount = lightIndexCount + (lightIndexCount % 4);
		prevLightIndexCount = paddedlightIndexCount;
		if (sizeof(unsigned int) * (paddedlightIndexCount + lightIndexBufferOffset) > sizeof(unsigned int) * 2 * MAX_LIGHT_COUNT) {
			lightIndexBufferOffset = 0;
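			// Block until the GPU has finished every command issued so far before reusing the start of the buffer
			GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
			glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, GL_TIMEOUT_IGNORED);
			glDeleteSync(fence);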
		}

		

		std::memcpy(&lightIndexPtr[lightIndexBufferOffset], &allLightIndices[0], paddedlightIndexCount * sizeof(unsigned int));

		
	}
	// Here the binding indices (_base) are 5 and 6 (which should be supported), and the buffer names are 10 and 11.
	// In the first cycle the offsets are 0, lightDataCount is 3 for one light + ambientLightColor, and lightIndexCount is 2 for one light.
	glBindBufferRange(GL_UNIFORM_BUFFER, lightDataBuffer_base, lightDataBuffer, (int)(lightDataBufferOffset * sizeof(glm::vec4)), lightDataCount * sizeof(glm::vec4));
	glBindBufferRange(GL_UNIFORM_BUFFER, lightIndexBuffer_base, lightIndexBuffer, (int)(lightIndexBufferOffset * sizeof(unsigned int)), lightIndexCount * sizeof(unsigned int));
	
	OpenGL::checkOpenGLErrors("OpenGL::uploadLightData():");

And here is how the buffers are created in the first place:


        allLights.resize(MAX_LIGHT_COUNT);
	allLightIndices.resize(MAX_LIGHT_COUNT); //LightIndex {uint, uint}
	allLightData.resize(MAX_LIGHT_COUNT * 3 + 1);
	allLightData[0] = glm::vec4(ambientLight, 1.0f);
	lightDataCount = 1;

	
	glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, &uniformBufferAlignSize);
	glGetIntegerv(GL_MAX_UNIFORM_BUFFER_BINDINGS, &uniformBufferBindingCount);
	
	unsigned int mapFlags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
	unsigned int bufferFlags = mapFlags | GL_DYNAMIC_STORAGE_BIT;
	glCreateBuffers(1, &lightDataBuffer);
	glCreateBuffers(1, &lightIndexBuffer);

	lightDataBuffer_base = OpenGL::uniformBufferBaseCount++;
	glBindBuffer(GL_UNIFORM_BUFFER, lightDataBuffer);
	glBindBufferBase(GL_UNIFORM_BUFFER, lightDataBuffer_base, lightDataBuffer);
	lightDataBufferSize = sizeof(glm::vec4) * 3 * MAX_LIGHT_COUNT * 3;
	glBufferStorage(GL_UNIFORM_BUFFER, lightDataBufferSize, nullptr, bufferFlags);
	lightDataPtr = (glm::vec4*)glMapBufferRange(GL_UNIFORM_BUFFER, 0, lightDataBufferSize, mapFlags);

	OpenGL::checkOpenGLErrors("OpenGL::initLights()1:");

	glBindBuffer(GL_UNIFORM_BUFFER, lightIndexBuffer);
	lightIndexBuffer_base = OpenGL::uniformBufferBaseCount++;
	glBindBufferBase(GL_UNIFORM_BUFFER, lightIndexBuffer_base, lightIndexBuffer);
	lightIndexBufferSize = sizeof(unsigned int) * 2 * MAX_LIGHT_COUNT * 3;
	glBufferStorage(GL_UNIFORM_BUFFER, lightIndexBufferSize, nullptr, bufferFlags);
	lightIndexPtr = (unsigned int*)glMapBufferRange(GL_UNIFORM_BUFFER, 0, lightIndexBufferSize, mapFlags);

	glBindBuffer(GL_UNIFORM_BUFFER, 0);
	OpenGL::checkOpenGLErrors("OpenGL::initLights():");

I have actually never used glBindBufferRange(…) myself (yet), but I remember reading about it here in this forum:

–> the offset parameter must be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT
–> the size parameter must be at least the block’s GL_UNIFORM_BLOCK_DATA_SIZE
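
To make the two rules concrete, here is a minimal sketch of rounding a byte offset up to the alignment and checking a candidate range. It rests on assumptions: `alignUp` and `rangeIsUsable` are hypothetical helper names, and the values 256 and 4096 used in the note below are made-up examples; the real numbers would come from glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, …) and from querying GL_UNIFORM_BLOCK_DATA_SIZE for the block.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `value` up to the next multiple of `alignment` (any positive value,
// not only powers of two).
inline std::size_t alignUp(std::size_t value, std::size_t alignment) {
    assert(alignment > 0);
    return (value + alignment - 1) / alignment * alignment;
}

// Hypothetical check for the two rules listed above; all values in bytes.
// `blockDataSize` stands for the queried size of the uniform block.
inline bool rangeIsUsable(std::size_t offset, std::size_t size,
                          std::size_t alignment, std::size_t blockDataSize) {
    assert(alignment > 0);
    return offset % alignment == 0 && size >= blockDataSize;
}
```

For instance, with an assumed alignment of 256, an upload of three vec4s (48 bytes) would leave the next offset at 48, which alignUp(48, 256) moves to 256. Binding only those 48 bytes would also fail the second rule if the block is declared with a larger array, say 4096 bytes.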

Since uniform blocks have a very limited size (GL_MAX_UNIFORM_BLOCK_SIZE, at least 16 KB in OpenGL 4.5), one could allocate a multiple of that value for a single buffer, map it persistently, and bind different parts of it for ALL the uniform blocks in use.

example:
layout (std140, binding = 0) uniform MatricesBlock { mat4 someMat4s[xyz]; };
layout (std140, binding = 1) uniform LightsBlock { … some lights here … };
layout (std140, binding = 2) uniform WhateverBlock { … some else here … };

then:

glBufferData(GL_UNIFORM_BUFFER, 3 * max_uniform_block_size, NULL, GL_STREAM_DRAW);
glBindBufferRange(GL_UNIFORM_BUFFER, 0, mybuffer, 0 * max_uniform_block_size, max_uniform_block_size);
glBindBufferRange(GL_UNIFORM_BUFFER, 1, mybuffer, 1 * max_uniform_block_size, max_uniform_block_size);
glBindBufferRange(GL_UNIFORM_BUFFER, 2, mybuffer, 2 * max_uniform_block_size, max_uniform_block_size);

max_uniform_block_size should be GL_MAX_UNIFORM_BLOCK_SIZE, rounded up to a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT (if it’s not already a multiple of it).
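
As a rough sketch of that layout, the slice offsets for one shared buffer could be computed like this. `sliceOffsets` is a hypothetical helper name, and the numbers in the note below are made-up example values rather than queried limits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: byte offsets of `sliceCount` equally sized slices
// inside one buffer. Each slice is `maxBlockSize` bytes rounded up to a
// multiple of `alignment`, so every offset is valid for glBindBufferRange.
inline std::vector<std::size_t> sliceOffsets(std::size_t sliceCount,
                                             std::size_t maxBlockSize,
                                             std::size_t alignment) {
    assert(alignment > 0);
    const std::size_t stride =
        (maxBlockSize + alignment - 1) / alignment * alignment;
    std::vector<std::size_t> offsets;
    offsets.reserve(sliceCount);
    for (std::size_t i = 0; i < sliceCount; ++i) {
        offsets.push_back(i * stride);
    }
    return offsets;
}
```

With three slices, a block size of 1000 and an alignment of 256, the stride becomes 1024 and the offsets are 0, 1024 and 2048; the buffer itself would then need 3 * 1024 bytes.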

Thanks a lot! It works now. It took me a while to figure out that GL_UNIFORM_BLOCK_DATA_SIZE meant the size of my individual uniform block and not some constant.