Part of the Khronos Group
OpenGL.org


Thread: GL_EXT_transform_feedback + lod selection + frustum culling = slow

  1. #1
    Junior Member Regular Contributor peterfilm's Avatar
    Join Date
    Sep 2009
    Location
    UK
    Posts
    124

    GL_EXT_transform_feedback + lod selection + frustum culling = slow

    Hello,
    I'm using a vertex and geometry shader to do lod selection and frustum culling into a buffer bound for transform feedback (GL_EXT_transform_feedback). I'm subsequently using that buffer as the instance source for a glDrawElementsInstanced() call, but I've currently got that bit disabled so I can specifically profile the lod selection and frustum culling stage.
    The input attributes to the vertex shader are a 4x3 matrix (4*vec4) and a bounding sphere (1*vec4).
    The input uniforms to the vertex shader are camera position (vec3) and frustum planes (6*vec4).
    The vertex shader does the cull/lod tests, and outputs the 4x3 matrix and a 'visible' flag to the geometry shader.
    The geometry shader only emits 'points' if the vertex shader 'visible' output is 1.
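    For reference, the per-instance test described above can be sketched on the CPU in C. This is not the poster's shader code. It assumes each frustum plane is stored as (nx, ny, nz, d) with a unit-length normal pointing into the frustum, that the bounding sphere is packed as xyz = center and w = radius, and that LOD is picked by comparing camera distance against a list of thresholds. The names Vec3, Vec4, sphere_visible, select_lod and lod_distances are all hypothetical.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;
typedef struct { float x, y, z, w; } Vec4;

/* Returns 1 if the sphere (xyz = center, w = radius) is at least partly
   inside all six planes, 0 if it lies fully outside any one of them.
   Assumption: planes are (nx, ny, nz, d), unit normals pointing inward. */
static int sphere_visible(Vec4 sphere, const Vec4 planes[6])
{
    for (int i = 0; i < 6; ++i) {
        /* Signed distance from the sphere center to plane i. */
        float dist = planes[i].x * sphere.x
                   + planes[i].y * sphere.y
                   + planes[i].z * sphere.z
                   + planes[i].w;
        if (dist < -sphere.w)
            return 0; /* completely behind this plane: cull */
    }
    return 1;
}

/* Returns a LOD index in [0, lod_count - 1]. lod_distances holds
   lod_count - 1 ascending switch distances (hypothetical scheme).
   Squared distances are compared so no square root is needed. */
static int select_lod(Vec4 sphere, Vec3 camera,
                      const float *lod_distances, int lod_count)
{
    assert(lod_count > 0);
    float dx = sphere.x - camera.x;
    float dy = sphere.y - camera.y;
    float dz = sphere.z - camera.z;
    float dist2 = dx * dx + dy * dy + dz * dz;

    int lod = 0;
    while (lod < lod_count - 1 &&
           dist2 > lod_distances[lod] * lod_distances[lod])
        ++lod;
    return lod;
}
```

    In the GPU version, the same two tests would run once per instance in the vertex shader, with the result written to the 'visible' output.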

    Problem is, I'm only getting approximately 57 million points processed per second on a Quadro 4000.

    Question is, is there some performance trick/caveat I should be aware of when doing this sort of thing?
    Note that the code is *not* doing the GL_QUERY_RESULT (so not a stall problem) or the glDrawElementsInstanced() (so not related to the instancing API).

    Thanks for any advice offered.


  2. #2
    Advanced Member Frequent Contributor
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    947
    Geometry shaders do usually introduce some performance penalty, especially with transform feedback on NVIDIA cards, as far as I can tell.
    Disclaimer: This is my personal profile. Whatever I write here is my personal opinion and none of my statements or speculations are anyhow related to my employer and as such should not be treated as accurate or valid and in no case should those be considered to represent the opinions of my employer.
    Technical Blog: http://www.rastergrid.com/blog/

  3. #3
    Senior Member OpenGL Pro Ilian Dinev's Avatar
    Join Date
    Jan 2008
    Location
    Watford, UK
    Posts
    1,262
    Try
    glEnable(GL_RASTERIZER_DISCARD);
    during the transform-feedback.

    Your gpu isn't old, so cannot be plagued by GF8800-type geometry-shader slowness.

  4. #4
    Advanced Member Frequent Contributor
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    947
    Quote, originally posted by Ilian Dinev:
    "Your gpu isn't old, so cannot be plagued by GF8800-type geometry-shader slowness."
    Yes, maybe I'm wrong; I don't really know what generation the Quadro 4000 is. I should look it up.

  5. #5
    Junior Member Regular Contributor peterfilm's Avatar
    Join Date
    Sep 2009
    Location
    UK
    Posts
    124
    Thanks guys. I see no other way of accelerating instancing without geometry shaders. I have discard enabled; sorry, I should have said. I thought maybe there'd be some buffer flag set incorrectly or something, but I've tried stream_draw, static_draw etc. and it makes no difference. I have to say, this whole transform feedback extension saga (the spec repo is awash with 'em!) looks like a bad joke.

  6. #6
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    2,893
    Quote, originally posted by aqnuep:
    "Yes, maybe I'm wrong, don't really know what generation is the quadro 4000. I should look it up."
    Looks like GTX480 era (GF100), so a couple generations after GeForce 8.

  7. #7
    Member Regular Contributor malexander's Avatar
    Join Date
    Aug 2009
    Location
    Ontario
    Posts
    257
    The Quadro 4000 isn't a very fast card, though: it's a GF100 with only half the shaders enabled (256) running at a lowly 475MHz (the GeForce GTX 480 has 480 shaders @ 700MHz). 57 million points/sec doesn't sound overly unreasonable to me, having used a 4000 for quite a while.

  8. #8
    Senior Member OpenGL Pro Ilian Dinev's Avatar
    Join Date
    Jan 2008
    Location
    Watford, UK
    Posts
    1,262
    Could you test aqnuep's demo, which does exactly the same thing:

    http://rastergrid.com/blog/2010/02/i...metry-shaders/

    On startup (without navigating with the mouse/keyboard), it gets 46fps while culling and drawing millions of instances on my GTX 550 Ti, which is of similar architecture and power.
    P.S. with the 4000's specs, I'd expect it to be able to process at least 500mil tri/s.

  9. #9
    Junior Member Regular Contributor peterfilm's Avatar
    Join Date
    Sep 2009
    Location
    UK
    Posts
    124
    Thanks, yes, I've tried that nature demo before. I just ran it again and did the same calculation: same results as my renderer, approx 50 million instances per second culled (10,000 tree instances + 250,000 grass instances = 260,000 total instances at 180fps = 46.8 million instances per second). Slightly slower than my results because that demo is doing the feedback count query and actually drawing the instances.
    Ah well, I guess I have to just swallow the fact that it's only slightly faster than doing the culling/lodding on the cpu, mapping a VBO using the buffer orphaning technique, and pushing the instances up every frame. Which is crazy if you think about it.

  10. #10
    Junior Member Regular Contributor peterfilm's Avatar
    Join Date
    Sep 2009
    Location
    UK
    Posts
    124
    Actually, forget what I just said. If, in the nature demo, I fly down under the grass so nothing passes the cull test, I get 590fps, which means it's culling on the GPU at a rate of about 153 million instances per second, so nearly 3 times what I'm getting. Right, now to compare the code...
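    The throughput figures quoted in this thread are simply instances per frame multiplied by frames per second. A minimal C sketch of that arithmetic follows. The function name instances_per_second is hypothetical, and the calculation assumes every instance goes through the cull pass exactly once per frame.

```c
#include <assert.h>

/* Culling throughput: instances submitted per frame times frames per
   second. Assumes each instance is tested exactly once per frame. */
static double instances_per_second(double instances_per_frame, double fps)
{
    assert(instances_per_frame >= 0.0 && fps >= 0.0);
    return instances_per_frame * fps;
}
```

    With the nature demo's 260,000 instances, instances_per_second(260000, 180) gives 46.8 million and instances_per_second(260000, 590) gives 153.4 million, about 2.7 times the 57 million reported in the first post.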
