I'm porting my engine to different platforms and working on GL performance, which is far worse than that of my d3d implementation.

Digging deeper, I found that most of the time is lost in uniform buffer map/unmap; see the PerfStudio screenshot below.
The same d3d app (with a simple scene) is about 50% faster than GL, and as scene complexity (and with it the number of uniform buffer maps) grows, GL performance drops exponentially.


My uniform buffer map/unmap calls look like this (very similar to the d3d calls):

Code :
glBindBuffer(GL_UNIFORM_BUFFER, buff);
glMapBufferRange(target, 0, size, GL_MAP_INVALIDATE_BUFFER_BIT|GL_MAP_WRITE_BIT);
// memcpy and unmap ...

I have tried different mapping calls, but none of them gave better results.
Could you give me a hint about this issue? Or is this normal for current drivers?

Btw, I'm testing this on an ATI 5750 with the v4.2.12002 drivers.