Thread: Regarding performance comparison (R2VB vs TF)

  1. #1
    Advanced Member Frequent Contributor
    Join Date
    Mar 2009
    Location
    Karachi, Pakistan
    Posts
    810

    Regarding performance comparison (R2VB vs TF)

    Hi all,
    I am working on a cloth simulation and have two versions:
    1) render to vertex buffer (R2VB), which runs the Verlet integration in a fragment shader, and

    2) transform feedback (TF), which runs the Verlet integration in a vertex shader.
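
For reference, the integration step that both versions run per cloth point can be sketched on the CPU as below. This is the textbook position-Verlet update without damping, not the poster's shader code (which is not shown in the thread); the function names and the flat one-float-per-coordinate layout are assumptions made for illustration.

```c
#include <stddef.h>

/* One position-Verlet step for a single coordinate:
 *   x_new = 2*x - x_old + a*dt*dt
 * Textbook form, no damping; hypothetical helper, not from the thread. */
float verlet_step(float x, float x_old, float accel, float dt)
{
    return 2.0f * x - x_old + accel * dt * dt;
}

/* Advance n coordinates by one time step. 'out' plays the role of the
 * write target: the FBO attachment in R2VB, the feedback buffer in TF. */
void verlet_integrate(const float *cur, const float *prev,
                      const float *accel, float *out,
                      size_t n, float dt)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = verlet_step(cur[i], prev[i], accel[i], dt);
}
```

After each step the roles of the buffers rotate: `cur` becomes `prev` and `out` becomes `cur`, which is the same ping-pong scheme the `readID`/`writeID` indices suggest in the R2VB code.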

    I compared the performance of the two on a 2D mesh grid ranging from 64x64 to 2048x2048. For small meshes TF is around 1.25-1.5x faster; for larger meshes, however, R2VB is 1.5-2x faster than TF. Here are my stats on an NVIDIA Quadro FX 5800. All times are in msecs per frame, measured with a timer query as detailed below.
    Code :
    +------------+-------------+---------------+
    |  Grid size |    R2VB     |       TF      |
    +------------+-------------+---------------+
    |  64 x 64   | 0.370-0.376 |   0.088-0.090 |
    +------------+-------------+---------------+
    | 128 x 128  | 0.403-0.431 |   0.238-0.240 |
    +------------+-------------+---------------+
    | 256 x 256  | 0.713-0.758 |   0.804-0.806 |
    +------------+-------------+---------------+
    | 512 x 512  | 2.100-2.308 |   3.090-3.096 |
    +------------+-------------+---------------+
    |1024 x 1024 | 7.670-9.250 | 12.205-12.209 |
    +------------+-------------+---------------+
    |2048 x 2048 |31.800-32.39 | 48.240-48.560 |
    +------------+-------------+---------------+
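
A quick way to read the table is to compute the R2VB/TF ratio per grid size. The sketch below uses the midpoint of each reported range; the struct and function names are made up for illustration, and taking the midpoint is an assumption since the thread gives only the low and high values.

```c
#include <stddef.h>

/* Per-frame timings in msecs, copied from the table as {low, high}. */
typedef struct {
    int    side;     /* grid is side x side */
    double r2vb[2];
    double tf[2];
} Timing;

const Timing timings[] = {
    {   64, {  0.370,  0.376 }, {  0.088,  0.090 } },
    {  128, {  0.403,  0.431 }, {  0.238,  0.240 } },
    {  256, {  0.713,  0.758 }, {  0.804,  0.806 } },
    {  512, {  2.100,  2.308 }, {  3.090,  3.096 } },
    { 1024, {  7.670,  9.250 }, { 12.205, 12.209 } },
    { 2048, { 31.800, 32.390 }, { 48.240, 48.560 } },
};
enum { TIMING_COUNT = sizeof timings / sizeof timings[0] };

/* Midpoint of a {low, high} range. */
double midpoint(const double range[2])
{
    return 0.5 * (range[0] + range[1]);
}

/* Ratio above 1 means TF is faster; below 1 means R2VB is faster. */
double r2vb_over_tf(const Timing *t)
{
    return midpoint(t->r2vb) / midpoint(t->tf);
}
```

With these midpoints the ratio is about 4.2 at 64x64, drops below 1 at 256x256, and is about 0.66 at 2048x2048, that is, R2VB is roughly 1.5x faster on the largest grid.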
    Is this the expected result, or am I doing something wrong in the timing calculation?
    This is how I calculate my times:
    1) For R2VB:
    Code :
    glBeginQuery(GL_TIME_ELAPSED,t_query);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fboID[writeID]);	
       //bind verlet integration fragment shader
       //draw full screen quad
    glFlush();
    //read back the results into the VBO
    glBindFramebuffer(GL_READ_FRAMEBUFFER, fboID[readID]);
    glReadBuffer(GL_COLOR_ATTACHMENT0); 			
    glBindBuffer(GL_PIXEL_PACK_BUFFER,vboID); 
    glReadPixels(0, 0, texture_size_x, texture_size_y, GL_RGBA, GL_FLOAT, 0); 
    glFlush();
    glFinish();
    glEndQuery(GL_TIME_ELAPSED);
     
    //get the elapsed time
    glGetQueryObjectui64v(t_query, GL_QUERY_RESULT, &elapsed_time);
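
A `GL_TIME_ELAPSED` query result is reported in nanoseconds, so the msecs-per-frame figures in the table need a conversion step. A minimal sketch of that step, averaged over several frames, might look like this; `mean_elapsed_ms` is a hypothetical helper, and `uint64_t` stands in for `GLuint64` so the snippet compiles without GL headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Average a set of GL_TIME_ELAPSED results (nanoseconds) and return
 * the mean in milliseconds. Returns 0.0 for an empty sample set. */
double mean_elapsed_ms(const uint64_t *samples_ns, size_t count)
{
    double sum_ns = 0.0;

    if (count == 0)
        return 0.0;
    for (size_t i = 0; i < count; ++i)
        sum_ns += (double)samples_ns[i];
    return sum_ns / (double)count / 1.0e6;  /* 1 msec = 1e6 nsec */
}
```

Each `elapsed_time` value read back in the code above would be one entry in `samples_ns`.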

    2) For TF:
    Code :
    glBeginQuery(GL_TIME_ELAPSED,t_query);
    glBeginTransformFeedback(GL_POINTS);
       glDrawArrays(GL_POINTS, 0, total_points);
    glEndTransformFeedback();
    glFlush();
    glEndQuery(GL_TIME_ELAPSED);				
     
    //get the elapsed time
    glGetQueryObjectui64v(t_query, GL_QUERY_RESULT, &elapsed_time);
    I was expecting TF to be the better option since it does not involve any readback, but the performance results show the opposite. Any ideas?
    Regards,
    Mobeen

  2. #2
    Junior Member Regular Contributor tksuoran's Avatar
    Join Date
    Mar 2008
    Location
    Cambridge, UK
    Posts
    223

    Re: Regarding performance comparison (R2VB vs TF)

    I don't think either flushes or readpixels should be involved. Why do you have those?

  3. #3
    Member Regular Contributor
    Join Date
    Dec 2009
    Posts
    251

    Re: Regarding performance comparison (R2VB vs TF)

    I guess that the R2VB shaders make better use of the cache, thus they perform better on large grids.

    Nvidia GPUs process vertex and fragment shaders in groups of 32 parallel threads per core, but the vertex shaders are probably scheduled in groups of 32x1 and the fragment shaders in groups of 8x4.

    If you access neighbouring points in your shaders, you'll get much more overlap (and thus cache hits) in the fragment shaders.
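
The overlap argument can be checked with a small counting exercise: for a group of 32 threads laid out as w x h, count how many distinct grid points the whole group reads. The sketch below assumes each thread reads its own point plus its four direct neighbours; the actual stencil in the cloth shaders is not shown in the thread, the 32x1 and 8x4 shapes are the guess stated above, and real cache behaviour depends on the hardware.

```c
#include <string.h>

enum { MAX_SIDE = 64 };

/* Count the distinct grid points read by a w x h block of threads when
 * each thread reads its own point and its four direct neighbours.
 * Returns -1 if the block does not fit the scratch array. */
int unique_fetches(int w, int h)
{
    static const int offsets[5][2] = {
        { 0, 0 }, { -1, 0 }, { 1, 0 }, { 0, -1 }, { 0, 1 }
    };
    unsigned char touched[MAX_SIDE + 2][MAX_SIDE + 2];
    int count = 0;

    if (w < 1 || h < 1 || w > MAX_SIDE || h > MAX_SIDE)
        return -1;
    memset(touched, 0, sizeof touched);

    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            for (int k = 0; k < 5; ++k) {
                /* +1 shifts the block so neighbour indices stay >= 0 */
                int tx = x + 1 + offsets[k][0];
                int ty = y + 1 + offsets[k][1];
                if (!touched[ty][tx]) {
                    touched[ty][tx] = 1;
                    ++count;
                }
            }
    return count;
}
```

Both layouts issue 32 x 5 = 160 reads, but `unique_fetches(32, 1)` gives 98 distinct points while `unique_fetches(8, 4)` gives 56, so under these assumptions the 8x4 layout reuses far more of what it fetches.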

  4. #4
    Advanced Member Frequent Contributor
    Join Date
    Mar 2009
    Location
    Karachi, Pakistan
    Posts
    810

    Re: Regarding performance comparison (R2VB vs TF)

    Quote Originally Posted by tksuoran
    I don't think either flushes or readpixels should be involved. Why do you have those?
    Hi tksuoran,
    I have omitted the VBO part here. The glReadPixels call is what copies the content to the VBO. Without it, how would the data get into the VBO? The glFlush is needed here because glReadPixels is asynchronous and has to finish before I can read the time.

    Quote Originally Posted by mbentrup
    If you access neighbouring points in your shaders, you'll get much more overlap (and thus cache hits) in the fragment shaders.
    Thanks for the insights, mbentrup. So if I reorder the way I access the neighbors in the vertex shader, I might get better performance. I think my current neighbor access stencil favours the fragment shader.
    Two more questions:
    1) Is there a way to measure the number of cache hits/misses for GLSL shaders?
    2) Where did you get the information about how threads are grouped in the VS (32x1) and FS (8x4)? Is there a document or manual that lists this?
    Regards,
    Mobeen

  5. #5
    Advanced Member Frequent Contributor
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    989

    Re: Regarding performance comparison (R2VB vs TF)

    Quote Originally Posted by mobeen
    The glFlush is needed here because glReadPixels is asynchronous and has to finish before I can read the time.
    Timer queries measure GPU time, so since you use a pixel pack buffer, the GPU time is the same whether or not you flush. In general, no glFlush or glFinish is needed when you use timer queries.
    Disclaimer: This is my personal profile. Whatever I write here is my personal opinion and none of my statements or speculations are anyhow related to my employer and as such should not be treated as accurate or valid and in no case should those be considered to represent the opinions of my employer.
    Technical Blog: http://www.rastergrid.com/blog/

  6. #6
    Advanced Member Frequent Contributor
    Join Date
    Mar 2009
    Location
    Karachi, Pakistan
    Posts
    810

    Re: Regarding performance comparison (R2VB vs TF)

    OK, I removed the glFlush/glFinish calls. The times are now reduced to around 10-20 msecs for TF, whereas for R2VB they are reduced to around half of the previous value for small grid sizes (<=512) and by around 0.5 msecs for large grid sizes (>512).
    Regards,
    Mobeen
