Part of the Khronos Group
OpenGL.org

Thread: Uniform Buffer Objects performance issues

  1. #1
    Intern Newbie
    Join Date
    Jul 2009
    Posts
    38

    Uniform Buffer Objects performance issues

    First off, I read this thread http://www.opengl.org/discussion_boa...29-help-needed but since the last post there is from 2.5 years ago, I thought I could add something here. Basically, I am experiencing the same problems as the author of the aforementioned thread. I have a GF 240 GT with some of the latest drivers.

    I render 625 meshes and, obviously, need a world transform matrix for each. There is also a view-projection matrix passed to the shader (it is set only once, as it is constant for all objects, so only the world transform needs to be updated). Using the traditional uniform variables approach I manage to render everything in less than 2ms, which is a little over 500 FPS (I call glFinish before recording the time).

    Now I switched to a constant buffer. When I update the buffer's data with glMapBufferRange, performance suffers immensely: a frame takes around 120ms to render. On the other hand, when I update the buffer's data with glBufferSubData, the CPU time needed to execute the API calls is less than 1ms (!) *but* that is before calling glFinish. After calling glFinish the measured time is around 9ms, which gives 120 FPS or so.

    The thing that bothers me most is the difference in timing taken before and after calling glFinish. If rendering all objects takes less than 1ms and calling glFinish is so expensive I guess OGL is simply buffering all commands. If so then I think it's quite a lot of data to buffer.

    Has anyone ever decided to abandon the use of good old uniform variables and switched completely to using uniform buffers?

  2. #2
    Intern Contributor Godlike
    Join Date
    May 2004
    Location
    Greece
    Posts
    70
    What do you mean by "Now I switched to a constant buffer"? Do you have a single buffer with 625 matrices?

    I've done some benchmarks myself on UBOs and I found them slower than glUniform* for this kind of situation. I will post the results when I get home.

  3. #3
    Intern Newbie
    Join Date
    Jul 2009
    Posts
    38
    Sorry for not being specific. By "switched to a constant buffer" I mean I have a constant buffer which only holds two matrices, world and viewProj. This constant buffer is updated before each draw call. I'm doing it this way to be consistent with DX10/11.

  4. #4
    Senior Member OpenGL Pro
    Join Date
    Jan 2012
    Location
    Australia
    Posts
    1,117
    Has anyone ever decided to abandon the use of good old uniform variables and switched completely to using uniform buffers?
    No, but I will be most interested in your results. From what I have read, uniform buffers have to be copied to registers (uniforms) prior to use, so they do have an overhead. That was from older articles and may be out of date.

    I have been caught out by command buffering when benchmarking. My assumption is that OpenGL does not actually buffer that much but sends commands to the GPU, where they get stuck in queues. Certain OpenGL commands require a response from the GPU, and that is where the driver suspends, waiting for the GPU to execute that command. (Certainly that is how channel control programs worked for mainframe front-end processes when I used to write that code many eons ago.)

  5. #5
    Intern Contributor Godlike
    Join Date
    May 2004
    Location
    Greece
    Posts
    70
    Quote Originally Posted by maxest
    Sorry for not being specific. By "switched to a constant buffer" I mean I have a constant buffer which only holds two matrices, world and viewProj. This constant buffer is updated before each draw call. I'm doing it this way to be consistent with DX10/11.
    One last question that actually matters. Do you use a single UBO for all the meshes or one per mesh?

    One UBO for all meshes looks like this I guess:

    Code :
    for mesh in meshes do
        update UBO 0
        draw mesh
    endfor

  6. #6
    Intern Newbie
    Join Date
    Jul 2009
    Posts
    38
    Quote Originally Posted by Godlike
    One last question that actually matters. Do you use a single UBO for all the meshes or one per mesh?

    One UBO for all meshes looks like this I guess:

    Code :
    for mesh in meshes do
        update UBO 0
        draw mesh
    endfor
    Yes, I have only one constant buffer. Moreover, it is set only once so the code should not suffer any redundant API overhead. The only extra function I call for each iteration is glBufferSubData to update data in the constant buffer under slot 0. Basically GL Intercept logs this for each mesh:
    Code :
    glBufferSubData( ??? )
    glDrawElements( ??? ) GLSL=4  Textures[ (0,4) (7,2) ]

  7. #7
    Intern Contributor Godlike
    Join Date
    May 2004
    Location
    Greece
    Posts
    70
    Quote Originally Posted by maxest
    Yes, I have only one constant buffer. Moreover, it is set only once so the code should not suffer any redundant API overhead. The only extra function I call for each iteration is glBufferSubData to update data in the constant buffer under slot 0. Basically GL Intercept logs this for each mesh:
    Code :
    glBufferSubData( ??? )
    glDrawElements( ??? ) GLSL=4  Textures[ (0,4) (7,2) ]
    I may have an idea on what is wrong.

    The draw calls are not executed at the time you issue them. In most implementations they are queued in a command buffer and the driver decides when to submit them for execution. When you update the buffer, the previous draw call still depends on it, so the driver cannot modify it in place without affecting that draw call. What the driver can do is either wait for the dependency to be resolved (the previous draw call is done) or create a copy (copy-on-write) that the new draw call will use. Both solutions are a bit expensive.

    What you can easily do to test this theory is to use one UBO per draw call. I bet that you will see an improvement.
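
    One way to give every draw call its own storage without creating 625 buffer objects is to pack all per-mesh blocks into one large buffer and bind a different range per draw with glBindBufferRange. This is a variation on the one-UBO-per-draw test, not the same thing. Range offsets must be multiples of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, so each block has to be padded. Below is a minimal sketch of that offset arithmetic in C; the helper names are made up for illustration, and the 256-byte alignment used in the note afterwards is only an assumed example value (query the real one with glGetIntegerv).

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `alignment`.
   `alignment` stands in for the value the driver reports for
   GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT (hypothetical input here). */
size_t align_up(size_t size, size_t alignment)
{
    assert(alignment > 0);
    return (size + alignment - 1) / alignment * alignment;
}

/* Byte offset of the uniform block that belongs to draw call `index`,
   suitable as the offset argument of glBindBufferRange. */
size_t block_offset(size_t index, size_t block_size, size_t alignment)
{
    return index * align_up(block_size, alignment);
}

/* Total bytes to allocate for `count` padded blocks. */
size_t buffer_size(size_t count, size_t block_size, size_t alignment)
{
    return count * align_up(block_size, alignment);
}
```

    With a 128-byte block (two 4x4 float matrices) and an assumed alignment of 256, align_up(128, 256) gives a stride of 256, so the block for mesh 3 starts at byte 768 and 625 meshes need 160000 bytes.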

  8. #8
    Intern Newbie
    Join Date
    Jul 2009
    Posts
    38
    As I said, I'm already using one constant buffer...

  9. #9
    Intern Contributor Godlike
    Join Date
    May 2004
    Location
    Greece
    Posts
    70
    And what I am trying to say is that by using one buffer you are not using OpenGL in an optimal way. The sequence you describe has read/write dependency problems because every update to the buffer depends on the previous draw call.

    // Iteration 0
    write UBO 0
    draw mesh 0
    // Iteration 1
    write UBO 0 -> wait for "draw mesh 0" to be done reading the UBO 0
    draw mesh 1
    // Iteration 2
    write UBO 0 -> wait for "draw mesh 0" and "draw mesh 1" to be done reading the UBO 0
    draw mesh 2

    I am not trying to convince you to use something else. But the root of your problem is most likely what I described, and if you want to solve it you need a different approach.
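
    To see why the "wait" option gets expensive, here is a toy timeline model in C. It is only a sketch under simplifying assumptions, not a description of any real driver: every write blocks the CPU until the last draw that read the same buffer has finished, draws run one at a time on the GPU, and all costs are arbitrary time units. The function name and its parameters are hypothetical.

```c
#include <assert.h>

#define MAX_BUFFERS 1024

/* Returns the time at which the GPU finishes the last draw.
   Buffers are used round-robin: draw i reads buffer i % num_buffers. */
long simulate_frame(int draws, int num_buffers,
                    long write_cost, long submit_cost, long draw_cost)
{
    long last_read_done[MAX_BUFFERS] = {0}; /* when each buffer's last reader ends */
    long cpu = 0; /* CPU timeline */
    long gpu = 0; /* time the GPU becomes idle */

    assert(num_buffers >= 1 && num_buffers <= MAX_BUFFERS);

    for (int i = 0; i < draws; ++i) {
        int slot = i % num_buffers;

        /* Assumed policy: the write waits for the previous reader of this buffer. */
        if (cpu < last_read_done[slot])
            cpu = last_read_done[slot];

        cpu += write_cost + submit_cost;

        /* The draw starts once it is submitted and the GPU is free. */
        long start = cpu > gpu ? cpu : gpu;
        gpu = start + draw_cost;
        last_read_done[slot] = gpu;
    }
    return gpu;
}
```

    With write_cost 2, submit_cost 1 and draw_cost 5, four draws through a single buffer finish at time 32 (CPU and GPU work is fully serialized), while four draws with four buffers finish at time 23 because the CPU can queue ahead while the GPU is busy.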

  10. #10
    Intern Newbie
    Join Date
    Jul 2009
    Posts
    38
    Ouch, sorry, I misread your post. I thought you wanted me to use one UBO, which is what I'm already doing. Your idea makes sense; I will give it a try once I'm done with my current work.
