Thread: transform feedback + glDrawElementsInstanced

  1. #1
    Junior Member Regular Contributor peterfilm
    Join Date
    Sep 2009
    Location
    UK
    Posts
    124

    transform feedback + glDrawElementsInstanced

    In order to avoid the query object stall when combining EXT_transform_feedback with glDrawElementsInstanced it seems to be recommended to use the ARB_draw_indirect extension - but for the life of me I can't find any information on how I get transform feedback to populate the GL_DRAW_INDIRECT_BUFFER needed for the new set of functions this extension introduces.
    I've seen people talk about OpenCL, but how do I get OpenGL's transform feedback mechanism to do it?
    thanks.

    (I've deliberately littered this post with the keyword breadcrumbs I've been searching with for people with the same question!)

  2. #2
    Advanced Member Frequent Contributor
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    941
    What do you mean by query object stall with transform feedback and DrawElementsInstanced exactly? What's your use case? Do you feed back vertex array data or instance data using transform feedback?

    If you feed back vertex array data then you should use DrawTransformFeedback to do a non-indexed rendering of the fed back vertex array data.

    If you feed back instance data then you would need atomic counters in the vertex shader or geometry shader, though I'm not aware of any driver supporting non-fragment shader atomic counters currently.
    However, on AMD hardware you can use the new GL_AMD_query_buffer_object extension to feed back the result of a primitive query to a draw indirect buffer in a non-blocking manner. Example #4 in the spec might be just what you are looking for.
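For reference, the piece of memory a GPU-side query write would have to hit is the instance count field of the indirect draw command. The sketch below mirrors the DrawElementsIndirectCommand layout from the ARB_draw_indirect spec in plain C and uses a CPU memcpy as a stand-in for the driver writing a query result at a byte offset. `make_command` and `write_query_result` are hypothetical helper names, not part of any OpenGL API, and no GL calls are made here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout of one indexed indirect draw command, as given in ARB_draw_indirect. */
typedef struct {
    uint32_t count;              /* number of indices per instance */
    uint32_t primCount;          /* number of instances to draw */
    uint32_t firstIndex;         /* offset into the index buffer, in indices */
    int32_t  baseVertex;         /* value added to each index */
    uint32_t reservedMustBeZero; /* reused as baseInstance from OpenGL 4.2 on */
} DrawElementsIndirectCommand;

/* Hypothetical helper: a command with a fixed index count and zero instances.
   The instance count is meant to be filled in later by the GPU. */
static DrawElementsIndirectCommand make_command(uint32_t index_count)
{
    DrawElementsIndirectCommand cmd = { index_count, 0u, 0u, 0, 0u };
    return cmd;
}

/* Hypothetical helper: CPU stand-in for a query result being written into
   buffer storage at a byte offset, with no readback by the application. */
static void write_query_result(unsigned char *indirect_buffer, size_t offset,
                               uint32_t result)
{
    memcpy(indirect_buffer + offset, &result, sizeof result);
}
```

The offset to target is `offsetof(DrawElementsIndirectCommand, primCount)`, which is 4 bytes. With one stream per LOD there would presumably be one such command per LOD, each with its own offset into the buffer.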
    Disclaimer: This is my personal profile. Whatever I write here is my personal opinion and none of my statements or speculations are anyhow related to my employer and as such should not be treated as accurate or valid and in no case should those be considered to represent the opinions of my employer.
    Technical Blog: http://www.rastergrid.com/blog/

  3. #3
    Junior Member Regular Contributor peterfilm
    Join Date
    Sep 2009
    Location
    UK
    Posts
    124
    Yes, I'd been reading the AMD_query_buffer_object extension just now! Spooky. Frustratingly, this extension is not supported on the NVIDIA Quadro 4000 even though it's exactly what I need (example #4 could have been written with me in mind).
    Yes, I'm trying to do frustum culling and LOD selection on the GPU, just as you have done in your demos and just as I talk about in my other forum thread (where the question was performance).
    Now I've got everything writing to multiple streams, one stream for each LOD, and the culling/LOD selection is very fast indeed (still approx 50 million tests per second, but with multiple streams I don't have to do multiple passes over the same instance data!), but I've now identified the GL_PRIMITIVES_GENERATED query as a pretty significant bottleneck. This is why I'm looking for ways of getting the primitives generated count to the draw command without the CPU readback.
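As a plain C reference for the single-pass idea described above (cull each instance once and route it to the output for its LOD), here is a minimal CPU sketch. It is not the poster's shader: the bounding-sphere test, the plane convention (n.p + d >= 0 means inside), the distance thresholds and every name in it are assumptions chosen for illustration.

```c
#include <stddef.h>

#define LOD_COUNT 3

typedef struct { float x, y, z, radius; } Instance; /* bounding sphere */
typedef struct { float nx, ny, nz, d; } Plane;      /* n.p + d >= 0 is inside */

/* Returns 1 if the sphere is at least partly inside every plane. */
static int sphere_visible(const Instance *s, const Plane *planes, size_t plane_count)
{
    for (size_t i = 0; i < plane_count; ++i) {
        float dist = planes[i].nx * s->x + planes[i].ny * s->y
                   + planes[i].nz * s->z + planes[i].d;
        if (dist < -s->radius)
            return 0; /* completely behind this plane */
    }
    return 1;
}

/* Picks a LOD from the distance to the eye; lod_dist holds ascending thresholds. */
static int select_lod(const Instance *s, const float eye[3],
                      const float lod_dist[LOD_COUNT - 1])
{
    float dx = s->x - eye[0], dy = s->y - eye[1], dz = s->z - eye[2];
    float d2 = dx * dx + dy * dy + dz * dz;
    int lod = 0;
    while (lod < LOD_COUNT - 1 && d2 > lod_dist[lod] * lod_dist[lod])
        ++lod;
    return lod;
}

/* One pass over the instances: each visible instance index is appended to the
   list for its LOD. out[lod] needs room for n entries; counts[lod] receives the
   number written, which is the value the per-stream query would report. */
static void cull_and_bin(const Instance *inst, size_t n,
                         const Plane *planes, size_t plane_count,
                         const float eye[3], const float lod_dist[LOD_COUNT - 1],
                         size_t *out[LOD_COUNT], size_t counts[LOD_COUNT])
{
    for (int l = 0; l < LOD_COUNT; ++l)
        counts[l] = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!sphere_visible(&inst[i], planes, plane_count))
            continue;
        int lod = select_lod(&inst[i], eye, lod_dist);
        out[lod][counts[lod]++] = i;
    }
}
```

The `counts` array is the quantity at issue in this thread: on the GPU it only exists as a query result, and reading it back to the CPU to feed glDrawElementsInstanced is where the stall comes from.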

  4. #4
    Junior Member Regular Contributor peterfilm
    Join Date
    Sep 2009
    Location
    UK
    Posts
    124
    Btw, when I say a significant bottleneck I mean it takes the overall framerate below what I get by doing the culling/LOD selection on the CPU and using glMapBufferRange() to upload the results. So unless I can sort this out, I'll be abandoning the GPU approach.

  5. #5
    Advanced Member Frequent Contributor
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    941
    Well, you have at least two options:

    1. Use AMD_query_buffer_object if you can limit your target audience to AMD hardware (however, I hope that NVIDIA will implement it soon too).
    2. Use the visibility results of the previous frame to avoid the stall (you can even have a 2 frame delay). Obviously, this might result in popping artifacts, however, if your camera is not moving super fast and if you have decent frame rates, that one or two frame delay should not have any visible effect on your rendering.
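Option 2 boils down to a small ring of result slots: each frame stores its count in one slot and draws with the count from one or two frames earlier, which should be available by then without blocking. The sketch below shows only that bookkeeping in plain C. `CountRing` and its functions are hypothetical names, and the stored integers stand in for query objects; in a real renderer each slot would presumably also need its own output buffer, since a delayed count has to be paired with the instance data from the same frame.

```c
#include <stdint.h>

#define RING_SIZE 3 /* supports a delay of up to RING_SIZE - 1 frames */

typedef struct {
    uint32_t count[RING_SIZE]; /* stand-in for one query result per slot */
    int      valid[RING_SIZE]; /* 1 once the slot holds a result */
} CountRing;

static void ring_init(CountRing *r)
{
    for (int i = 0; i < RING_SIZE; ++i) {
        r->count[i] = 0u;
        r->valid[i] = 0;
    }
}

/* Record the count produced by the culling pass of the given frame. */
static void ring_store(CountRing *r, uint64_t frame, uint32_t count)
{
    unsigned slot = (unsigned)(frame % RING_SIZE);
    r->count[slot] = count;
    r->valid[slot] = 1;
}

/* Fetch the count from `delay` frames before `frame`.
   Returns 1 and writes *out on success, 0 if no such result is held. */
static int ring_fetch(const CountRing *r, uint64_t frame, unsigned delay,
                      uint32_t *out)
{
    if (delay >= RING_SIZE || frame < delay)
        return 0;
    unsigned slot = (unsigned)((frame - delay) % RING_SIZE);
    if (!r->valid[slot])
        return 0;
    *out = r->count[slot];
    return 1;
}
```

In a frame loop the fetch for frame f - delay comes first, then the draw, then the culling pass and store for frame f. When the fetch fails (the first frames after startup), a fallback such as skipping the draw is needed.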

  6. #6
    Junior Member Regular Contributor peterfilm
    Join Date
    Sep 2009
    Location
    UK
    Posts
    124
    Well, that's where it gets complicated (option 2, I mean). You see, the instance renderer is used in a number of cull/render phases: multiple viewports, quad-buffered stereo, cascaded shadow maps... It's just not practical to have a VBO for each LOD for each cull/render phase. Apart from the memory wastage, there's also the code complexity.
    Ah well, life eh.

  7. #7
    Junior Member Regular Contributor peterfilm
    Join Date
    Sep 2009
    Location
    UK
    Posts
    124
    Yes, that's what I was afraid of. The whole atomic counter stuff scared me, possible sync issues etc. And then aquen mentioned that you can only use atomic counters at fragment level...
    But thanks for the clear explanation of how I'd use them if it came to it. I can but try I suppose, with a heavy heart.

  8. #8
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,720
    "The whole atomic counter stuff scared me, possible sync issues etc."
    So, you're frightened by atomic counters, even though the use in this case is fairly obvious and requires exactly one sync point. But you're perfectly fine with rendering something that's not rendering anything, using multiple output streams and geometry shaders that aren't shading any geometry, all to write stuff to a buffer object that you'll use to render instances of geometry.

    If you're going to yoke the GPU to do cool stuff, then yoke it. You're already forced to use GL 4.x hardware by your use of multiple streams. Best to use all of it.

    "aquen mentioned that you can only use atomic counters at fragment level"
    Then he's wrong. There is nothing in GLSL or OpenGL about where atomic counters can be used.
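The atomic counter pattern referred to in this thread is a slot reservation: each visible instance increments a shared counter and writes itself into the slot it got back, so the final counter value is the instance count. The sketch below shows that pattern with C11 atomics on the CPU as an analogy only. It is not GLSL, `AppendBuffer` and its functions are hypothetical names, and it says nothing about which shader stages a given driver supports.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    atomic_uint next;     /* plays the role of the atomic counter */
    uint32_t   *ids;      /* compacted list of visible instance ids */
    size_t      capacity; /* number of entries ids can hold */
} AppendBuffer;

static void append_init(AppendBuffer *b, uint32_t *storage, size_t capacity)
{
    atomic_init(&b->next, 0u);
    b->ids = storage;
    b->capacity = capacity;
}

/* Reserve a unique slot and store the id there.
   Returns 0 if the buffer is already full. */
static int append_visible(AppendBuffer *b, uint32_t instance_id)
{
    unsigned slot = atomic_fetch_add(&b->next, 1u);
    if (slot >= b->capacity)
        return 0;
    b->ids[slot] = instance_id;
    return 1;
}

/* Number of ids stored so far, clamped to the capacity. This is the value
   that would end up as the instance count of the draw. */
static uint32_t append_count(AppendBuffer *b)
{
    unsigned n = atomic_load(&b->next);
    return n < b->capacity ? (uint32_t)n : (uint32_t)b->capacity;
}
```

The order of entries in `ids` depends on which caller reserves a slot first, so the list is compact but unordered. The one sync point mentioned above would sit between the pass that appends and the draw that consumes the list and its count.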
