I've already seen the "VBOs strangely slow?" thread. I've read through it twice, and it all makes sense. However, my problem is a bit different in that it deals with UBOs rather than VBOs. Or are the two entirely the same behind the scenes?

glMapBufferRange makes good sense to me, as it seems to mimic (for the most part) what DirectX has always had in terms of buffer object locking/unlocking.

The simple case that I currently have working is a shader program with a constant block that's updated via a UBO. The constant block is structured as follows:

Code :
uniform DF_GLOBALS
{
    mat4    WorldView;
    mat4    WorldViewProj;
 
    vec4    BackBufferInfo;
    vec3    CameraInfo;
    vec2    ViewportInfo;
};

As you can see, that's a couple hundred bytes of data at most (176 bytes under std140 layout rules).
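For reference, here is a minimal sketch of a CPU-side mirror of that block. It assumes the block is laid out with std140 rules, which the declaration above does not actually specify (the default shared layout is implementation-defined and would have to be queried instead). The struct name DfGlobals is made up for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas */
#include <stddef.h>   /* offsetof */

/* Hypothetical C mirror of DF_GLOBALS, ASSUMING layout(std140).
   Under std140, vec3 and vec4 align to 16 bytes and vec2 aligns to 8. */
typedef struct DfGlobals {
    alignas(16) float WorldView[16];     /* mat4: offset   0, 64 bytes */
    alignas(16) float WorldViewProj[16]; /* mat4: offset  64, 64 bytes */
    alignas(16) float BackBufferInfo[4]; /* vec4: offset 128, 16 bytes */
    alignas(16) float CameraInfo[3];     /* vec3: offset 144, 12 bytes */
    alignas(8)  float ViewportInfo[2];   /* vec2: offset 160,  8 bytes */
} DfGlobals;                             /* padded up to a multiple of 16 */

/* Compile-time checks that the C layout matches the assumed std140 offsets. */
static_assert(offsetof(DfGlobals, WorldViewProj)  ==  64, "WorldViewProj offset");
static_assert(offsetof(DfGlobals, BackBufferInfo) == 128, "BackBufferInfo offset");
static_assert(offsetof(DfGlobals, CameraInfo)     == 144, "CameraInfo offset");
static_assert(offsetof(DfGlobals, ViewportInfo)   == 160, "ViewportInfo offset");
static_assert(sizeof(DfGlobals)                   == 176, "total block size");
```

If the static asserts hold, the whole struct can be copied into the mapped pointer in one memcpy.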

However, the problem is largely with glMapBufferRange and, to a lesser degree, glUnmapBuffer.

I'm currently mapping the entire buffer and using the following flags to request that the driver discard the possibly in-use buffer memory and hand me back a pointer to new memory if necessary:

Code :
GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT

That call to glMapBufferRange alone is taking just under 1 millisecond. Maybe (hopefully) I'm doing something wrong?

Using either GL_DYNAMIC_DRAW or GL_STREAM_DRAW at buffer creation time makes no difference.

The old-school method of calling glBufferData( NULL ) to discard, in conjunction with glMapBuffer, does help quite a bit (relatively speaking). But even then, it's still in the 0.1 to 0.2 ms range per update.

This becomes unbearably slow very quickly if I try to draw many objects, each with its own WorldView and/or WorldViewProj matrices updated. In that case I'm making many calls to glMapBufferRange per frame. Is there a better way to do that?
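To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a 60 Hz target (my assumption, roughly 16.7 ms per frame) and pretends the map/update cost is the only cost in the frame. The function and constant names are made up for illustration.

```c
#include <stdint.h>

/* ASSUMPTION: 60 Hz target, so one frame is 1,000,000 / 60 microseconds. */
enum { FRAME_BUDGET_US_60HZ = 1000000 / 60 };

/* Upper bound on buffer updates per frame if each update costs `update_us`
   microseconds and nothing else runs. Returns 0 for a zero cost, since no
   meaningful bound exists in that case. */
uint32_t max_updates_per_frame(uint32_t frame_budget_us, uint32_t update_us)
{
    if (update_us == 0) {
        return 0;
    }
    return frame_budget_us / update_us; /* integer division rounds down */
}
```

With the timings above, that works out to about 16 updates per frame for the roughly 1 ms glMapBufferRange path, and somewhere between 83 and 166 for the 0.1 to 0.2 ms glBufferData( NULL ) path, before any actual rendering is accounted for.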

Hardware is ATI HD4850 with latest drivers.

Any ideas? Thanks.