I just added uniform block support, but the performance is terrible. I'm sharing a single UBO across multiple programs. On the plus side, a uniform that changes infrequently only needs to be updated once, and every program sees the new value. The problem is that some uniforms (e.g. the model-view matrix) need to be updated in the UBO for every draw call. Each of those per-draw updates creates an implicit sync, and the driver has to block quite a bit.

I tried doing the per-draw updates through glMapNamedBufferRangeEXT with GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT, but the performance is still in the toilet.

Is there a way to make shared uniform blocks fast for frequently changing uniforms?

Thanks for reading.