I am currently porting part of my engine to use uniform buffer objects (UBOs), and I have some performance questions.

For the current version I just placed all my uniforms into a single big uniform buffer per object and simply update the whole buffer each frame. This is just the first step to get things running.

At first I used glMapBuffer and glUnmapBuffer to copy the new uniform data into the OpenGL buffer object. I tried both GL_DYNAMIC_DRAW and GL_STREAM_DRAW, and in both cases I ran into rather severe CPU slowdowns while mapping the buffer. I read in the following thread:
http://www.opengl.org/discussion_boa...rm+buffer+slow (bottom) to use glBufferData with the actual buffer data instead of NULL, and then it runs a lot faster. I still have to properly compare my CPU timings between the old version using plain uniforms and the new version using uniform buffer objects, but I expect the uniform buffer version to be faster once I have properly split my uniform buffers into good logical sets.

But when I now compare the result of rendering about 1024 spheres into my G-buffer, the overall GPU time has gone up a bit, from around 13-14 msec to 17-18 msec. I am using GPU queries to measure the timings. Is it normal that rendering with uniform buffer objects is slower than using uniforms directly? I read somewhere that uniform buffers are stored in device global memory and then copied into device local memory when they are bound, so perhaps this could explain the slowdown on the GPU side itself. Otherwise I don't see any real reason why rendering with uniform buffers should be slower on the GPU side.

I also tried not updating the uniform buffers at all anymore, and then my overall time with uniform buffers goes down to 15-16 msec, which is still not as fast as without uniform buffers. So I guess that splitting the uniform buffers into more logical units and updating them less frequently won't help on the GPU side either.
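
To make the "logical units" idea concrete, here is a minimal CPU-only sketch of what I have in mind: one block for data that changes every frame and one for data that is set once, each with a dirty flag. Everything in it is made up for illustration (the UniformBlock struct, the 64 and 32 byte sizes, the simulate_upload_bytes helper), and flush() only counts bytes where the real code would call something like glBufferSubData. It only shows how much less data gets uploaded and says nothing about the GPU timings above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical CPU-side mirror of one uniform block (not a real GL type).
struct UniformBlock {
    std::vector<unsigned char> data;  // CPU copy of the block contents
    bool dirty = true;                // true until the first upload

    explicit UniformBlock(std::size_t size) : data(size, 0) {}

    // Copy a value into the CPU mirror; mark dirty only if the bytes changed.
    void write(std::size_t offset, const void* src, std::size_t size) {
        assert(offset + size <= data.size());
        if (std::memcmp(data.data() + offset, src, size) != 0) {
            std::memcpy(data.data() + offset, src, size);
            dirty = true;
        }
    }

    // Returns the number of bytes that would be uploaded this frame.
    // A real renderer would upload `data` to the buffer object here.
    std::size_t flush() {
        if (!dirty) return 0;
        dirty = false;
        return data.size();
    }
};

// Counts the bytes uploaded over `frames` frames, either with one combined
// block or with the block split by update frequency. Sizes are assumptions.
std::size_t simulate_upload_bytes(int frames, bool split) {
    const std::size_t kPerFrameSize = 64;  // e.g. one mat4, changes every frame
    const std::size_t kStaticSize = 32;    // e.g. material constants, set once
    const float material[8] = {1.0f, 1.0f, 1.0f, 1.0f, 0.5f, 0.5f, 0.5f, 1.0f};

    std::size_t total = 0;
    if (split) {
        UniformBlock per_frame(kPerFrameSize);
        UniformBlock material_block(kStaticSize);
        material_block.write(0, material, sizeof(material));
        for (int f = 0; f < frames; ++f) {
            const float t = static_cast<float>(f);  // stand-in for changing data
            per_frame.write(0, &t, sizeof(t));
            total += per_frame.flush();       // uploaded every frame
            total += material_block.flush();  // uploaded only on the first frame
        }
    } else {
        UniformBlock combined(kPerFrameSize + kStaticSize);
        combined.write(kPerFrameSize, material, sizeof(material));
        for (int f = 0; f < frames; ++f) {
            const float t = static_cast<float>(f);
            combined.write(0, &t, sizeof(t));
            total += combined.flush();  // whole block re-uploaded every frame
        }
    }
    return total;
}
```

With these assumed sizes, 100 frames upload 9600 bytes with the single combined block and 6432 bytes with the split blocks, so the saving is on the CPU/transfer side only.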

So the question is, are these results normal or am I doing something wrong somewhere?

Kind Regards,