First off, I read this thread, but since the last post there is from 2.5 years ago I thought I could add something here. Basically, I am experiencing the same problems as the author of the aforementioned thread. I have a GeForce GT 240 with fairly recent drivers.

I render 625 meshes and, obviously, need a world transform matrix for each. There is also a view-projection matrix passed to the shader (it is set only once, since it is constant for all objects, so only the world transform needs to be updated per mesh). Using the traditional uniform-variable approach I manage to render everything in less than 2ms, which is a little over 500 FPS (I call glFinish before recording the time).

Now I have switched to a uniform (constant) buffer. When I update the buffer's data with glMapBufferRange, performance suffers immensely: a frame takes around 120ms to render. On the other hand, when I update the buffer's data with glBufferSubData, the CPU time needed to execute the API calls is less than 1ms (!) *but* that is before calling glFinish. After calling glFinish the measured time is around 9ms, which works out to roughly 110 FPS.
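
For reference, here is a minimal sketch of one way the 625 world matrices could be laid out in a single uniform buffer, so that the whole set goes up in one glBufferSubData call instead of one update per mesh. It only shows the CPU-side packing. The names Mat4, alignUp and packWorldMatrices are hypothetical, and the alignment value is an assumption: the real one has to be queried with glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, ...).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper types and names, not part of OpenGL or any library.
struct Mat4 { float m[16]; };  // 4x4 float matrix
static_assert(sizeof(Mat4) == 64, "sketch assumes 4-byte floats");

// Round `size` up to the next multiple of `alignment`.
// `alignment` stands for the value the driver reports for
// GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT (assumed here, query it in real code).
inline std::size_t alignUp(std::size_t size, std::size_t alignment) {
    assert(alignment > 0);
    return (size + alignment - 1) / alignment * alignment;
}

// Copy one world matrix per object into a staging array in which every
// slot starts at a multiple of `alignment`. The padding bytes stay zero.
inline std::vector<unsigned char> packWorldMatrices(const std::vector<Mat4>& world,
                                                    std::size_t alignment) {
    const std::size_t stride = alignUp(sizeof(Mat4), alignment);
    std::vector<unsigned char> staging(world.size() * stride, 0);
    for (std::size_t i = 0; i < world.size(); ++i) {
        std::memcpy(staging.data() + i * stride, &world[i], sizeof(Mat4));
    }
    return staging;
}
```

With such a layout the staging array could be uploaded once per frame, and each draw would select its slot with glBindBufferRange(GL_UNIFORM_BUFFER, bindingIndex, buffer, i * stride, sizeof(Mat4)). Whether this is actually faster than plain uniforms on this hardware is exactly the open question, so treat it as something to measure rather than a fix.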

The thing that bothers me most is the difference between the timings taken before and after calling glFinish. If issuing the draw calls for all objects takes less than 1ms and calling glFinish is so expensive, I guess OpenGL is simply buffering all the commands. If so, I think that is quite a lot of data to buffer.
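
To make the two measurements concrete, here is a small sketch of the timing pattern described above: one timestamp after the API calls return and another after the finish call returns. The names FrameTiming, timeFrame and framesPerSecond are hypothetical. In a real program `submit` would issue the buffer updates and draw calls and `finish` would call glFinish; here both are passed in as callables so the snippet does not depend on a GL context.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical names, for illustration only.
struct FrameTiming {
    double submitMs;  // time until the submit callable has returned
    double totalMs;   // time until the finish callable has returned as well
};

// Measure how long `submit` takes on its own and how long
// `submit` followed by `finish` takes, both from the same start point.
inline FrameTiming timeFrame(const std::function<void()>& submit,
                             const std::function<void()>& finish) {
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;

    const auto start = Clock::now();
    submit();                          // e.g. buffer updates + draw calls
    const auto submitted = Clock::now();
    finish();                          // e.g. glFinish()
    const auto done = Clock::now();

    return { Ms(submitted - start).count(), Ms(done - start).count() };
}

// Convert a frame time in milliseconds to frames per second.
inline double framesPerSecond(double frameMs) {
    assert(frameMs > 0.0);
    return 1000.0 / frameMs;
}
```

A large gap between submitMs and totalMs means the calls returned before the work was done and the wait was paid inside the finish call. For the numbers above, framesPerSecond(2.0) is 500 and framesPerSecond(9.0) is about 111.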

Has anyone here actually abandoned good old plain uniform variables and switched completely to uniform buffers?