Thread: UBO poor performance [GL 3.1]

  1. #1
    Junior Member Newbie
    Join Date
    May 2009
    Posts
    18

    UBO poor performance [GL 3.1]

    I'm trying to use a UBO, but I get poor performance with it.

    Code w/o UBO:

    Code :
    mat4 matLocal = ...;
    mat4 matMVP = ...;
    vec2 uvBase = ...;
    vec2 perlinMovement = ...;
    vec3 localEye = ...;
    glUniformMatrix4fv(uniform_matLocal, 1, false, matLocal);
    glUniformMatrix4fv(uniform_matMVP, 1, false, matMVP);
    glUniform2fv(uniform_uvBase, 1, uvBase);
    glUniform2fv(uniform_perlinMovement, 1, perlinMovement);
    glUniform3fv(uniform_localEye, 1, localEye);

    Code w/ UBO:

    Code :
    struct BlockPerBatch
    {
    	mat4 matLocal;
    	mat4 matMVP;
    	vec2 uvBase;
    	vec2 perlinMovement;
    	vec3 localEye;
    };
     
    BlockPerBatch blockPerBatch;
     
    glBindBuffer(GL_UNIFORM_BUFFER, ubo_BlockPerBatch); // once for all batches
     
    ...
     
    blockPerBatch.matLocal = ...;
    blockPerBatch.matMVP = ...;
    blockPerBatch.uvBase = ...;
    blockPerBatch.perlinMovement = ...;
    blockPerBatch.localEye = ...;
    glBufferData(GL_UNIFORM_BUFFER, sizeof(blockPerBatch), &blockPerBatch, GL_DYNAMIC_DRAW);

    Shader:

    Code :
    #version 140
     
    ...
     
    uniform BlockPerBatch
    {
    	mat4 matLocal;
    	mat4 matMVP;
    	vec2 uvBase;
    	vec2 perlinMovement;
    	vec3 localEye;
    };
     
    ...
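
The block above is declared without a layout qualifier, so it uses the default `shared` layout, whose member offsets are implementation-dependent and would normally be queried (for example with glGetActiveUniformsiv and GL_UNIFORM_OFFSET). Uploading a CPU struct in one call is only safe if the offsets are known. A minimal sketch of a CPU-side mirror follows, assuming the shader block is changed to `layout(std140) uniform BlockPerBatch` and that `float` is 4 bytes; the struct name `BlockPerBatchStd140` and the `pad0` member are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* CPU-side mirror of BlockPerBatch under the std140 rules (assumption:
   the shader block is declared with layout(std140)).
   std140 alignments used here: mat4 -> 16 bytes (four vec4 columns),
   vec2 -> 8 bytes, vec3 -> 16 bytes. */
typedef struct BlockPerBatchStd140 {
    float matLocal[16];      /* offset   0, 64 bytes */
    float matMVP[16];        /* offset  64, 64 bytes */
    float uvBase[2];         /* offset 128,  8 bytes */
    float perlinMovement[2]; /* offset 136,  8 bytes */
    float localEye[3];       /* offset 144, 12 bytes (144 is 16-byte aligned) */
    float pad0;              /* explicit padding so the size is a multiple of 16 */
} BlockPerBatchStd140;

/* Compile-time check that the C layout matches the offsets listed above. */
_Static_assert(offsetof(BlockPerBatchStd140, localEye) == 144,
               "localEye must start at byte 144");
_Static_assert(sizeof(BlockPerBatchStd140) == 160,
               "block must occupy 160 bytes");
```

In this particular block the natural C packing happens to coincide with std140; a vec3 placed before a vec2, for instance, would need explicit padding.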

    w/o UBO - ~250 FPS
    w/ UBO - ~225 FPS

    GeForce 9600GT
    Win7 Driver 190.89
    OpenGL 3.1

    What am I doing wrong?

  2. #2
    Super Moderator Frequent Contributor Groovounet's Avatar
    Join Date
    Jul 2004
    Posts
    936

    Re: UBO poor performance [GL 3.1]

    Do you reallocate your buffer at each frame?

    UBOs go well together with MapBufferRange or MapBuffer, and apparently even glBufferSubData would be faster.

    Have a look at the MapBufferRange API, that's THE way to go!

  3. #3
    Senior Member OpenGL Pro
    Join Date
    Sep 2004
    Location
    Prombaatu
    Posts
    1,401

    Re: UBO poor performance [GL 3.1]

    ... Also be sure to group by frequency of update. E.g. Per-frame, per-sector, per-object, per-culator, per-fume, ....

  4. #4
    Junior Member Newbie
    Join Date
    May 2009
    Posts
    18

    Re: UBO poor performance [GL 3.1]

    Quote Originally Posted by Groovounet
    Do you reallocate your buffer at each frame?

    UBOs go well together with MapBufferRange or MapBuffer, and apparently even glBufferSubData would be faster.

    Have a look at the MapBufferRange API, that's THE way to go!
    The example from the spec uses glBufferData:

    Code :
        void render()
        {
            glClearColor(0.0, 0.0, 0.0, 0.0);
            glClear(GL_DEPTH_BUFFER_BIT|GL_COLOR_BUFFER_BIT);
     
            glUseProgram(prog_id);
     
            glEnable(GL_DEPTH_TEST);
            glMatrixMode(GL_MODELVIEW);
            glLoadIdentity();
            glTranslatef(0, 0, -4);
            glColor3f(1.0, 1.0, 1.0);
            glBindBuffer(GL_UNIFORM_BUFFER, buffer_id);
            //We can use BufferData to upload our data to the shader,
            //since we know it's in the std140 layout
            glBufferData(GL_UNIFORM_BUFFER, 80, colors, GL_DYNAMIC_DRAW);
            //With a non-standard layout, we'd use BufferSubData for each uniform.
            glBufferSubData(GL_UNIFORM_BUFFER_EXT, offset, singleSize, &colors[8]);
            //the teapot winds backwards
            glFrontFace(GL_CW);
            glutSolidTeapot(1.33);
            glFrontFace(GL_CCW);
            glutSwapBuffers();
        }

    There, glBufferSubData is only used to update a single uniform in the block.
    I tried glBufferSubData for the whole block: the FPS is the same as with glBufferData.

    Quote Originally Posted by Brolingstanz
    ... Also be sure to group by frequency of update. E.g. Per-frame, per-sector, per-object, per-culator, per-fume, ....
    Sure, I do that...

  5. #5
    Super Moderator Frequent Contributor Groovounet's Avatar
    Join Date
    Jul 2004
    Posts
    936

    Re: UBO poor performance [GL 3.1]

    Really, glBufferSubData and glBufferData are not good solutions.
    This sample works, but it's not something to use in real applications. Calling glBufferSubData for a single data update would be worse than using glUniform*, which you can still use for uniforms kept outside the uniform block.
    Calling glBufferData is like a "C++ new" in OpenGL; you don't want to do that just to upload your data!

    Create and allocate the buffer once with glBufferData, but update it with the MapBufferRange API. It's parallel, asynchronous, and gives fine-grained control.

    You can actually use a single buffer to pack all your "block per" kinds of data, as long as you keep each group of uniforms together.

    Example of a single uniform buffer:
    128 bytes Per-frame
    64 bytes Per-object
    16 bytes Per-batch

    Don't forget that GPUs have a memory burst size, usually with a minimum of 64 bytes. There is a balance to find to reach a good granularity, and that's why I like the single grouped uniform buffer approach.

    Then you can pick exactly the range of bytes you need and update it with MapBufferRange, even in parallel, as long as the ranges don't overlap.
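
The packing described above can be sketched on the CPU side as follows. When a sub-range of a buffer is bound with glBindBufferRange, its offset must be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, which has to be queried at run time; the alignment values used in the note below are only example assumptions. The names `align_up`, `UniformRange` and `pack_ranges` are hypothetical helpers, not part of OpenGL.

```c
#include <assert.h>
#include <stddef.h>

/* Round value up to the next multiple of alignment. */
static size_t align_up(size_t value, size_t alignment) {
    assert(alignment > 0);
    return (value + alignment - 1) / alignment * alignment;
}

/* One sub-range of the shared uniform buffer. */
typedef struct UniformRange {
    size_t offset; /* byte offset inside the buffer */
    size_t size;   /* byte size of the block stored there */
} UniformRange;

/* Place `count` blocks one after another, each starting at an aligned
   offset, and return the total number of bytes the buffer needs.
   Because each range starts after the previous one ends, no two
   ranges overlap. */
static size_t pack_ranges(const size_t *sizes, size_t count,
                          size_t alignment, UniformRange *out) {
    size_t cursor = 0;
    for (size_t i = 0; i < count; ++i) {
        cursor = align_up(cursor, alignment);
        out[i].offset = cursor;
        out[i].size = sizes[i];
        cursor += sizes[i];
    }
    return cursor;
}
```

With the sizes from the example (128, 64 and 16 bytes) and an assumed alignment of 256, the ranges start at 0, 256 and 512 and the buffer needs 528 bytes. With an assumed alignment of 64 they start at 0, 128 and 192 and the buffer needs 208 bytes.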

    Even if you have a huge number of uniforms, you could use several CPU threads to update the buffer data per block and send that data in parallel as you go.



  6. #6
    Junior Member Newbie
    Join Date
    May 2009
    Posts
    18

    Re: UBO poor performance [GL 3.1]

    I'll try MapBufferRange later, thanks...

    I have updated the drivers to 191.07 WHQL:

    w/o UBO - ~250 FPS
    w/ UBO - ~240 FPS

    Result is better...
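
FPS differences are easier to compare as time per frame. A small sketch of the conversion, using only the numbers reported in this thread; `frame_ms` and `overhead_ms` are hypothetical helper names.

```c
#include <assert.h>

/* Convert frames per second to milliseconds per frame. */
static double frame_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Extra time per frame, in milliseconds, of a slower path
   compared with a faster baseline. */
static double overhead_ms(double baseline_fps, double slower_fps) {
    return frame_ms(slower_fps) - frame_ms(baseline_fps);
}
```

At 250 FPS a frame takes 4.0 ms. The UBO path at 225 FPS costs about 0.44 ms more per frame, and at 240 FPS with the newer driver about 0.17 ms more.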

  7. #7
    Super Moderator Frequent Contributor Groovounet's Avatar
    Join Date
    Jul 2004
    Posts
    936

    Re: UBO poor performance [GL 3.1]

    Humm

    What's your result with glUniform?

  8. #8
    Advanced Member Frequent Contributor
    Join Date
    Apr 2003
    Posts
    652

    Re: UBO poor performance [GL 3.1]

    Groovounet:
    Do you have data to back up your claim that MapBufferRange is faster than glBufferData for _small_ buffers?

    I'm using two UBOs to store per-view and per-object matrices. These two buffers are no bigger than 240 bytes each. Whenever I need to change one of them, I upload the whole contents via glBufferData. This gives the driver a hint that "the old data is no longer needed", and if the old contents are still in use, it might use a double-buffer scheme internally to avoid stalling the pipeline.

  9. #9
    Member Regular Contributor
    Join Date
    Oct 2006
    Posts
    349

    Re: UBO poor performance [GL 3.1]

    You can achieve the same effect by calling glBufferData(null) and glMapBuffer.

    I've tested on a few different drivers, and there's no clear winner between glBufferData and glMapBuffer. The only significant difference occurs when streaming data, where MapBuffer pulls ahead (i.e. it allows you to write directly to the mapped region and avoid allocating a temporary client-side buffer).

  10. #10
    Super Moderator Frequent Contributor Groovounet's Avatar
    Join Date
    Jul 2004
    Posts
    936

    Re: UBO poor performance [GL 3.1]

    I never actually considered that glBufferData might not stall. How does glBufferSubData affect your performance?

    I have seen MapBufferRange used with quite large buffers; that's why I pack everything in a single buffer, to keep it large enough.
    I assume that the MapBufferRange "access" parameter gives the hints to the driver.

    As for small buffers... once you put all your uniforms in a single uniform buffer, it's not that small...

    (PS: I'm going to dig a bit more into this topic, I'll let you know with numbers!)
