Part of the Khronos Group
OpenGL.org


Thread: Does the order in which a model's vertices are loaded influence performance?

  1. #1
    Junior Member Newbie
    Join Date
    Jan 2016
    Posts
    15

    Does the order in which a model's vertices are loaded influence performance?

    Hi everyone

    I'm trying to figure out a strange behaviour in my code. Two nearly identical models, differing only in whether their vertices are specified in a spatially sorted or an unsorted order, show very different performance.


    I have two models:
    1) let's call it the 'original', made of about 640K vertices.
    2) a LOD of the original, obtained by an octree subdivision; it has almost 600K vertices.

    Most importantly, I'm performing point-based rendering using point sprites (oriented discs taken from a texture atlas). Every vertex has a normal and a colour; positions, normals and colours are loaded into three different VBOs.

    My code is quite simple:

    activate depth test
    activate alpha test (set function)

    //render loop
    clear color and depth buffer
    activate shaders
    do the rendering (point_sprite etc etc render call etc etc) -> then up to the vert and frag shaders
    disable shaders
    glutPostRedisplay();
    glutSwapBuffers();


    Benchmarking the time needed to execute each frame, I noticed that the original model performs better than its LOD... at least 2 times better...
    When loading the two models the procedure is exactly the same and nothing else changes...
    Playing around with the code I noticed that my LOD has its vertices specified in a certain spatial order (as a result of the octree subdivision), while the original does not...
    If I shuffle the vertex positions in my LOD, the two perform comparably, as one would expect...


    I made further tests and noticed that, with the depth test disabled, the LOD model appears more 'consistent', as if its front face were compact, while the original is far fuzzier...
    Is that because the first (or last) vertices drawn are contiguous?
    Many inevitable artifacts affect my rendering, such as aliasing... I was wondering if some of these issues may cause many fragments to be discarded and lead to faster rendering, while the LOD, being sorted, is less affected by them but slows down...

    If you can help I would be very grateful (it's an urgent matter).

    best regards

  2. #2
    Senior Member OpenGL Guru
    Join Date
    Jun 2013
    Posts
    2,526
    One possible factor: if primitives overlap, that imposes an ordering dependency. Primitives must either be processed in the order in which they are specified, or the implementation must at least ensure that the result is as if they had been processed in order. If primitives don't overlap, then the result is trivially independent of the processing order, which makes processing easier to parallelise.
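    A CPU-side sketch may make the overlap point concrete (this is not OpenGL code; the blend helper and all values are invented for illustration). Two half-transparent "primitives" written to the same pixel give different results depending on order, while disjoint pixels are order-independent:

    ```c
    #include <stdio.h>

    /* Standard "over" blending: result = src*a + dst*(1 - a). */
    static float blend_over(float dst, float src, float a) {
        return src * a + dst * (1.0f - a);
    }

    int main(void) {
        const float red = 1.0f, green = 0.5f, alpha = 0.5f;

        /* Both primitives touch the same pixel: order changes the result. */
        float a_then_b = blend_over(blend_over(0.0f, red, alpha), green, alpha);
        float b_then_a = blend_over(blend_over(0.0f, green, alpha), red, alpha);
        printf("overlap:  A-then-B = %.3f, B-then-A = %.3f\n", a_then_b, b_then_a);

        /* Disjoint pixels: each primitive owns its own pixel, so the final
         * image is identical whichever primitive is processed first --
         * which is what allows the implementation to parallelise. */
        float px0 = blend_over(0.0f, red, alpha);    /* only primitive A */
        float px1 = blend_over(0.0f, green, alpha);  /* only primitive B */
        printf("disjoint: pixel0 = %.3f, pixel1 = %.3f\n", px0, px1);
        return 0;
    }
    ```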

  3. #3
    Junior Member Newbie
    Join Date
    Jan 2016
    Posts
    15
    Thank you for the reply.

    Anyway, I haven't really understood the following:

    Primitives must either be processed in the order in which they are specified, or the implementation must at least ensure that the result is as if they had been processed in order
    What do you mean precisely by an ordering dependency? How can that affect parallelisation? (My data is loaded into the VBO contiguously, as 3 floats (x, y, z) per vertex.)
    At first I thought this was related to depth testing... but it seems it is not...

    I have only point sprites, without any connectivity, and their size is set in the vertex shader via gl_PointSize. Anyway, I think that most of the time points inevitably overlap, especially because that's the behaviour I pursued and implemented (points should be large enough to reconstruct the surface they belong to).
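    For clarity, the per-vertex size is written to the gl_PointSize built-in inside the vertex shader (glPointSize() is the separate client-side call, used only when the shader does not write a size). A minimal GLSL 1.20 sketch; the uniform name and the attenuation formula are just placeholders for whatever sizing logic is actually used, and GL_VERTEX_PROGRAM_POINT_SIZE must be enabled for the shader-written size to take effect:

    ```glsl
    #version 120
    uniform float spriteScale;   // placeholder uniform

    void main() {
        gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
        // Shrink the sprite with distance; this formula is only an example.
        gl_PointSize = spriteScale / gl_Position.w;
        gl_FrontColor = gl_Color;
    }
    ```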

  4. #4
    Member Regular Contributor
    Join Date
    Jul 2012
    Posts
    429
    The loading order of the vertices has no impact on rendering by itself. However, the order in which you send these vertices to GL does, especially if depth testing is enabled. Using a spatial partitioning structure should normally help you send the vertices in a more coherent order, so that rendering is (normally) improved compared to an erratic vertex order. Since you are using octrees, how the vertices are grouped inside the nodes is relevant, as GClements said.

    But since you render sprites, I would rather suspect a wrong send-order of the vertices to GL.

  5. #5
    Junior Member Newbie
    Join Date
    Jan 2016
    Posts
    15
    Thank you too, Silence.

    Why do you think that in the case of sprites a sparse order would be better?
    Anyway, why exactly should spatial coherence help performance?

  6. #6
    Member Regular Contributor
    Join Date
    Jul 2012
    Posts
    429
    You will generally want to draw in front-to-back order, to let the hardware discard z-failed fragments as early as possible. The more occlusion you have, the truer this is.

    One thing you can try to check this is to rotate your camera around the scene (and thus traverse your tree in a different order) and see if (and how) the framerate changes.
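    As a concrete sketch of the front-to-back idea (CPU-side, with invented names; a real renderer would sort by distance along the actual view direction, here simplified to the z coordinate of a camera at the origin looking down +z):

    ```c
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { float x, y, z; } Point;

    /* Ascending z = nearer points first for a camera at the origin
     * looking down +z. */
    static int cmp_depth(const void *pa, const void *pb) {
        float da = ((const Point *)pa)->z, db = ((const Point *)pb)->z;
        return (da > db) - (da < db);
    }

    /* Sort the vertex array front-to-back before uploading/drawing, so
     * occluded fragments fail the depth test as early as possible. */
    static void sort_front_to_back(Point *pts, size_t n) {
        qsort(pts, n, sizeof(Point), cmp_depth);
    }

    int main(void) {
        Point pts[] = { {0, 0, 5.0f}, {0, 0, 1.0f}, {0, 0, 3.0f} };
        sort_front_to_back(pts, 3);
        for (size_t i = 0; i < 3; i++)
            printf("z = %.1f\n", pts[i].z);   /* 1.0, 3.0, 5.0 */
        return 0;
    }
    ```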

  7. #7
    Junior Member Newbie
    Join Date
    Jan 2016
    Posts
    15
    This makes sense to me, but why do I see the same behaviour even when I disable the depth test?

  8. #8
    Senior Member OpenGL Guru
    Join Date
    Jun 2013
    Posts
    2,526
    Quote Originally Posted by Apollonio View Post
    What do you mean precisely by ordering dependency? How that may affect parallelization?
    If a draw call renders multiple primitives which modify a given pixel, the pixel's value at the end of the draw call must be that resulting from the last primitive which included that pixel.

    If depth tests are enabled, the rendering order still matters in cases where both primitives have the same depth value for the pixel (if the depth comparison is GL_LESS or GL_GREATER, the second primitive will fail the test and the value from the first primitive will be used; if the comparison is GL_LEQUAL or GL_GEQUAL, the second primitive will pass the depth test and the value from the second primitive will be used).

    Additionally, depth tests and blending involve a read-modify-write operation on the framebuffer. For each primitive, the value read must be that written by the preceding primitive.

    But if two primitives can easily be determined not to overlap, then none of this matters. The two primitives can be rendered in either order or in parallel, which may allow for higher utilisation of the GPU.
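    The equal-depth case can be mimicked on the CPU (a toy model of the depth test, not real GL; all names are invented):

    ```c
    #include <assert.h>
    #include <stdbool.h>

    typedef enum { CMP_LESS, CMP_LEQUAL } DepthFunc;
    typedef struct { float depth; int colour; } Pixel;

    /* Write the fragment only if it passes the depth comparison. */
    static void depth_write(Pixel *px, float d, int colour, DepthFunc f) {
        bool pass = (f == CMP_LESS) ? (d < px->depth) : (d <= px->depth);
        if (pass) { px->depth = d; px->colour = colour; }
    }

    int main(void) {
        /* Two primitives at identical depth 0.5; colours 1 then 2. */
        Pixel less = { 1.0f, 0 };                  /* cleared to far plane */
        depth_write(&less, 0.5f, 1, CMP_LESS);
        depth_write(&less, 0.5f, 2, CMP_LESS);     /* second one fails */

        Pixel lequal = { 1.0f, 0 };
        depth_write(&lequal, 0.5f, 1, CMP_LEQUAL);
        depth_write(&lequal, 0.5f, 2, CMP_LEQUAL); /* second one passes */

        assert(less.colour == 1);    /* GL_LESS:   first primitive wins */
        assert(lequal.colour == 2);  /* GL_LEQUAL: last primitive wins  */
        return 0;
    }
    ```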

  9. #9
    Member Regular Contributor
    Join Date
    Jul 2012
    Posts
    429
    I must have misread your first post; it made me believe that the issue disappeared when you disabled depth testing, which obviously was not the case. So definitely, what I said above was not relevant to you...

    So here are my other two cents, not sure if they help:

    How do things go if you disable alpha testing?
    Do you have the same number of draw calls with and without the octree?
    Same question for buffer bindings, shader bindings and uniform updates?
    Do you make use of transparency (blending) or only alpha testing?
    Try doing the alpha test directly in the fragment shader (and discard the fragment if appropriate).
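    That last suggestion would look roughly like this in the fragment shader (a sketch; the sampler name is a placeholder, and a real atlas lookup would remap gl_PointCoord into the selected tile). With this in place the fixed-function glAlphaFunc/GL_ALPHA_TEST state is no longer needed:

    ```glsl
    #version 120
    uniform sampler2D spriteAtlas;   // placeholder name

    void main() {
        // gl_PointCoord runs 0..1 across the point sprite; the atlas
        // tile remapping is omitted here for brevity.
        vec4 colour = texture2D(spriteAtlas, gl_PointCoord);
        // Equivalent of glAlphaFunc(GL_GREATER, 0.1): drop the fragment
        // in the shader instead of relying on the fixed-function test.
        if (colour.a <= 0.1)
            discard;
        gl_FragColor = colour * gl_Color;
    }
    ```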

  10. #10
    Junior Member Newbie
    Join Date
    Jan 2016
    Posts
    15
    for GClements
    So in both cases sparse data would perform better. So theoretically you'd expect the worst case to be a model of N points sharing the same coordinates? With and without the depth test?
    Anyway, how does this connect to my case, where shuffling the data positions (before uploading to the VBO; using a different indexing during the draw doesn't work) significantly alters performance?

    for Silence
    Let's say I'm definitely more interested in the depth test, since I'm going to use it, and I need to explain the performance difference while using it... disabling it was just a test...
    - alpha test doesn't change much
    - I debugged with a horrible cout; it prints the same number of points (at least the counts given to glDrawArrays...)
    - both approaches have identical conditions; they execute the same code
    - no, only alpha (I need it to make the sprite represent a circle and cut out the border)
    - no, it's the fixed-pipeline test; I don't perform it in the shaders... only backface culling, which is now off for these tests:
    glEnable(GL_ALPHA_TEST);
    glAlphaFunc(GL_GREATER, 0.1f);
