OpenGL.org

Thread: transform feedback + glDrawElementsInstanced

  1. #21
    Junior Member Regular Contributor peterfilm
    Join Date
    Sep 2009
    Location
    UK
    Posts
    124
    Intel Xeon Quad Core 2.66 GHz, 8 GB RAM, Windows 7 64-bit. Quadro 4000 2 GB RAM, driver 296.88.

    Forgive me for the fps metric, was in a hurry.

    1.2 ms is for a relatively small number of instances compared with the number I'll actually be required to render. Also, consider that this is just for a single pass, whereas I also need to render into the second eye of a stereo pair and into 4 CSM (cascaded shadow map) splits. There's also a picture-in-picture second view, albeit without shadow maps.

  2. #22
    Senior Member OpenGL Guru Alfonse Reinheart
    Join Date
    May 2009
    Posts
    4,732
    Quote Originally Posted by peterfilm
    The thing you're missing, Alfonse, is that the transform feedback pass is just drawing a long list of GL_POINTS (with rasterization disabled). Each point contains vertex attributes, and those vertex attributes are the entire object's transform and bounding volume (so in my case that's a mat4x3 for the transform and a vec4 for the sphere). The output of this transform feedback pass is a list of vertex attributes for each LOD (I just output the mat4x3; the sphere has done its job), intended to be used in a glDrawElementsInstanced call as the per-instance data, not the mesh data.

    OK, but that doesn't explain how it does LOD selection. LOD selection would have to mean changing the model being rendered, yes? Which would require writing values to an indirect buffer, which would then be used with an indirect rendering command.

    I don't see what you need query_buffer_object for in this case. Because the number of objects that pass (ie: the number of indirect rendering commands written) needs to come back to the CPU to be used with multi-draw-indirect. Or to loop over the indirect rendering commands.

    Also, I don't see how this constitutes instanced rendering, since each instance has its own indirect drawing command.

    Or, to put it simply, can you fully describe the algorithm, top to bottom? Because there seem to be some inconsistencies between the descriptions you've given thus far.

    Quote Originally Posted by peterfilm
    here's some numbers:-

    instances: 26781
    CPU culling/lod selection, with glMapBufferRange to pass results to GPU: 590fps
    GPU culling/lod selection, with vertex/geometry shader and transform feedback: 1995fps

    NOTE: this is just the culling/lod selection. I've commented out the drawing code.

    Since you're using instancing, what's the performance of not doing frustum culling at all and simply drawing all of the instances?
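
    For comparing the quoted frame rates with the millisecond budgets used elsewhere in the thread, they can be converted to per-pass times. A minimal C++ sketch; the function names are illustrative and not taken from any post:

```cpp
#include <cassert>

// Convert a frame rate (frames per second) into milliseconds per frame.
inline double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Milliseconds saved per pass when going from slowFps to fastFps.
inline double savedMs(double slowFps, double fastFps) {
    return frameTimeMs(slowFps) - frameTimeMs(fastFps);
}
```

    With the quoted numbers, 590 fps is about 1.69 ms and 1995 fps about 0.50 ms per culling pass, a difference of roughly 1.2 ms.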

  3. #23
    Junior Member Regular Contributor peterfilm
    Join Date
    Sep 2009
    Location
    UK
    Posts
    124
    No, I still issue a glDrawElementsInstanced() call for each LOD once the queries return the primCount for each LOD.
    I'm not using the indirect extension, which is what the original question in this thread was about: I see no way of writing to the indirect buffer from transform feedback.
    I gave a link to rastergrid's blog, which explains the algorithm more clearly than I have obviously managed so far.

    The problem I'm trying to solve is not specifically the frustum culling, as I said in an earlier post (keep up man!), it's the LOD selection. I'm attempting to mask the simplification of the vegetation geometry by sticking to the LOD distances carefully set by the artists; batching them together makes too sudden a pop. I'm trying to stop the pop without an explosion in triangle count.
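
    As a reference for what the culling/LOD pass computes, here is a CPU-side C++ sketch of the same binning: each instance's bounding sphere is tested against the frustum, survivors are sorted into per-LOD lists by distance, and each list's size is the primCount for that LOD's glDrawElementsInstanced call. All type and function names, the three-LOD count, the box-shaped test frustum and the use of an id in place of the mat4x3 transform are assumptions for illustration, not the poster's code.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3     { float x, y, z; };
struct Plane    { Vec3 n; float d; };          // inside when dot(n, p) + d >= 0
struct Sphere   { Vec3 center; float radius; };
struct Instance { Sphere bounds; int id; };    // id stands in for the mat4x3 transform

constexpr std::size_t kLodCount = 3;           // assumed number of LODs
using Frustum     = std::array<Plane, 6>;
using SwitchDists = std::array<float, kLodCount - 1>;  // ascending LOD switch distances
using LodBins     = std::array<std::vector<int>, kLodCount>;

inline float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Axis-aligned box used as a stand-in frustum: |x|, |y|, |z| <= h.
inline Frustum makeBoxFrustum(float h) {
    Frustum f = {{
        Plane{Vec3{ 1.0f, 0.0f, 0.0f}, h}, Plane{Vec3{-1.0f, 0.0f, 0.0f}, h},
        Plane{Vec3{ 0.0f, 1.0f, 0.0f}, h}, Plane{Vec3{ 0.0f,-1.0f, 0.0f}, h},
        Plane{Vec3{ 0.0f, 0.0f, 1.0f}, h}, Plane{Vec3{ 0.0f, 0.0f,-1.0f}, h}
    }};
    return f;
}

// A sphere is rejected only if it lies entirely behind at least one plane.
inline bool sphereVisible(const Sphere& s, const Frustum& f) {
    for (const Plane& p : f)
        if (dot(p.n, s.center) + p.d < -s.radius) return false;
    return true;
}

// LOD index = number of switch distances the instance has reached.
inline std::size_t selectLod(float dist, const SwitchDists& switchDist) {
    std::size_t lod = 0;
    for (float d : switchDist)
        if (dist >= d) ++lod;
    assert(lod < kLodCount);
    return lod;
}

// Cull, then append each surviving instance to the list for its LOD.
inline LodBins binInstances(const std::vector<Instance>& instances, const Frustum& frustum,
                            const Vec3& eye, const SwitchDists& switchDist) {
    LodBins bins;
    for (const Instance& inst : instances) {
        if (!sphereVisible(inst.bounds, frustum)) continue;
        const Vec3 c = inst.bounds.center;
        const Vec3 v{c.x - eye.x, c.y - eye.y, c.z - eye.z};
        bins[selectLod(std::sqrt(dot(v, v)), switchDist)].push_back(inst.id);
    }
    return bins;
}
```

    In the GPU version described above, the per-LOD lists correspond to the transform feedback streams, and the list sizes to the values returned by the primitive queries.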

  4. #24
    Senior Member OpenGL Guru Alfonse Reinheart
    Join Date
    May 2009
    Posts
    4,732
    Quote Originally Posted by peterfilm
    i see no way of writing to the indirect buffer from transform feedback.

    Sure you can. You just need to employ atomic increments.

    Each LOD's per-instance data is being written to a separate stream. Every time you write an instance to one of the LOD streams, you atomically increment that LOD's atomic counter.

    Now, atomic counters are backed by buffer object storage. But you can use glBindBufferRange, as well as the `offset` field of the atomic counter's layout specifier, to put them anywhere in a buffer object's storage. Like, say, the primCount value of an indirect rendering command.

    Each counter can be set to write to the `primCount` field of a different indirect rendering command, one for each LOD. Thus, when you're finished, you have three indirect rendering commands, all ready to go.

    The only thing you need to do is issue a `glMemoryBarrier(GL_ATOMIC_COUNTER_BARRIER_BIT)` after building the LOD instance data, but before trying to render them. And of course, reset these values to zero each frame before specifying the LODs.

    I have no idea if this will be faster than what you're doing. But there won't be any GPU->CPU->GPU antics.
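
    To make the buffer layout concrete, here is a C++ sketch of where each LOD's counter would have to live so that it overlaps the `primCount` field of that LOD's indirect command. The struct follows the DrawElementsIndirectCommand layout from ARB_draw_indirect; the three-LOD count and the helper names are assumptions for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Element-draw indirect command layout, as defined by ARB_draw_indirect.
struct DrawElementsIndirectCommand {
    std::uint32_t count;               // indices per instance
    std::uint32_t primCount;           // instance count: the field the counter overwrites
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    std::uint32_t reservedMustBeZero;  // named baseInstance from GL 4.2 on
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "commands must be tightly packed");

constexpr std::size_t kLodCount = 3;   // assumed: one indirect command per LOD
using CommandBuffer = std::array<DrawElementsIndirectCommand, kLodCount>;

// Byte offset of the primCount field of the command for `lod`.
inline std::size_t primCountOffset(std::size_t lod) {
    assert(lod < kLodCount);
    return lod * sizeof(DrawElementsIndirectCommand)
         + offsetof(DrawElementsIndirectCommand, primCount);
}

// CPU-side image of the buffer at the start of a frame: index counts set,
// instance counts reset to zero before the culling pass runs.
inline CommandBuffer makeFrameCommands(const std::array<std::uint32_t, kLodCount>& indexCounts) {
    CommandBuffer cmds{};
    for (std::size_t i = 0; i < kLodCount; ++i) {
        cmds[i].count = indexCounts[i];
        cmds[i].primCount = 0;
    }
    return cmds;
}
```

    Under these assumptions the offsets come out as 4, 24 and 44 bytes, so with the whole buffer bound to atomic counter binding 0, the shader-side declarations would look something like `layout(binding = 0, offset = 4) uniform atomic_uint lod0Count;` and so on for the other LODs. The same buffer object is then bound as GL_DRAW_INDIRECT_BUFFER for the draws.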

  5. #25
    Junior Member Regular Contributor peterfilm
    Join Date
    Sep 2009
    Location
    UK
    Posts
    124
    Yes, that's what I was afraid of. The whole atomic counter stuff scared me, possible sync issues etc. And then aqnuep mentioned that you can only use atomic counters at fragment level...
    But thanks for the clear explanation of how I'd use them if it came to it. I can but try I suppose, with a heavy heart.

  6. #26
    Senior Member OpenGL Guru Alfonse Reinheart
    Join Date
    May 2009
    Posts
    4,732
    Quote Originally Posted by peterfilm
    The whole atomic counter stuff scared me, possible sync issues etc.

    So, you're frightened by atomic counters, even though the use in this case is fairly obvious and requires exactly one sync point. But you're perfectly fine with rendering something that's not rendering anything, using multiple output streams and geometry shaders that aren't shading any geometry, all to write stuff to a buffer object that you'll use to render instances of geometry.

    If you're going to yoke the GPU to do cool stuff, then yoke it. You're already forced to use GL 4.x hardware by your use of multiple streams. Best to use all of it.

    Quote Originally Posted by peterfilm
    aqnuep mentioned that you can only use atomic counters at fragment level

    Then he's wrong. There is nothing in GLSL or OpenGL about where atomic counters can be used.

  7. #27
    Senior Member OpenGL Guru Dark Photon
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    2,882
    Quote Originally Posted by peterfilm
    Intel Xeon Quad Core 2.66GHZ, 8GB ram, windows 7 64 bit. Quadro 4000 2GB ram driver 296.88.
    Thanks for that!

  8. #28
    Advanced Member Frequent Contributor aqnuep
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    941
    Quote Originally Posted by Alfonse Reinheart
    Sure you can. You just need to employ atomic increments.

    Each LOD's per-instance data is being written to a separate stream. Every time you write an instance to one of the LOD streams, you atomically increment that LOD's atomic counter.

    Now, atomic counters are backed by buffer object storage. But you can use glBindBufferRange, as well as the `offset` field of the atomic counter's layout specifier, to put them anywhere in a buffer object's storage. Like, say, the primCount value of an indirect rendering command.

    Each counter can be set to write to the `primCount` field of a different indirect rendering command, one for each LOD. Thus, when you're finished, you have three indirect rendering commands, all ready to go.

    Yes, actually that should work, and if you think about it, if you use a load/store image and multi draw indirect, you can even do non-instanced object culling in the same way. If I have time to implement something like that, I'll post about it on my blog.

    Quote Originally Posted by Alfonse Reinheart
    The only thing you need to do is issue a `glMemoryBarrier(GL_ATOMIC_COUNTER_BARRIER_BIT)` after building the LOD instance data, but before trying to render them. And of course, reset these values to zero each frame before specifying the LODs.

    No, you're wrong. You need glMemoryBarrier(GL_COMMAND_BARRIER_BIT). Everybody seems to misunderstand how glMemoryBarrier works. It does not specify which source you are trying to sync, but rather which destination. In all cases glMemoryBarrier is meant to ensure that all shaders that performed image load/stores or used atomic counters have finished before the commands after the barrier start. What the barrier bits specify is how you plan to use the written data. This ensures that all the appropriate input caches get flushed before commencing the next draw command.

    Quote from spec:
    COMMAND_BARRIER_BIT: Command data sourced from buffer objects by Draw*Indirect commands after the barrier will reflect data written by shaders prior to the barrier. The buffer objects affected by this bit are derived from the DRAW_INDIRECT_BUFFER binding.

    Quote Originally Posted by Alfonse Reinheart
    Then he's wrong. There is nothing in GLSL or OpenGL about where atomic counters can be used.

    There is nothing, that's true. But if you check the extension specs (or the core spec) you can see that the extensions require a minimum of 8 load/store images and atomic counters only for fragment shaders (MAX_FRAGMENT_IMAGE_UNIFORMS and MAX_FRAGMENT_ATOMIC_COUNTERS), but the required number is 0 for all other stages. It's not a coincidence that there are some GL 4.2 capable GPUs not supporting them in all shader stages (at least currently).
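
    Since the required minimum is 0 outside the fragment stage, it seems prudent to check the limits at startup before relying on counters in the culling pass. A small C++ sketch of that check; the struct, enum and function names are hypothetical, and the limits are assumed to have been filled in by the caller with glGetIntegerv on GL_MAX_VERTEX_ATOMIC_COUNTERS, GL_MAX_GEOMETRY_ATOMIC_COUNTERS and GL_MAX_FRAGMENT_ATOMIC_COUNTERS:

```cpp
#include <cassert>

// Per-stage atomic counter limits as reported by the implementation.
struct AtomicCounterLimits {
    int vertex;
    int geometry;
    int fragment;
};

enum class CountingStage { Vertex, Geometry, Fragment };

// True if `stage` exposes at least `countersNeeded` atomic counters.
inline bool stageCanCount(const AtomicCounterLimits& limits, CountingStage stage,
                          int countersNeeded) {
    assert(countersNeeded >= 0);
    int available = 0;
    switch (stage) {
        case CountingStage::Vertex:   available = limits.vertex;   break;
        case CountingStage::Geometry: available = limits.geometry; break;
        case CountingStage::Fragment: available = limits.fragment; break;
    }
    return available >= countersNeeded;
}
```

    An implementation that only meets the spec minimums would fail this check for one counter per LOD in the geometry stage, in which case the query-based path remains the fallback.
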
    Technical Blog: http://www.rastergrid.com/blog/

  9. #29
    Senior Member OpenGL Guru Dark Photon
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    2,882
    Quote Originally Posted by peterfilm
    Yes that's what I was afraid of. The whole atomic counter stuff scared me, possible sync issues etc.
    Quote Originally Posted by Alfonse Reinheart
    So, you're frightened by atomic counters, even though the use in this case is fairly obvious
    Quote Originally Posted by aqnuep
    No, you're wrong. You need glMemoryBarrier(GL_COMMAND_BARRIER_BIT). Everybody seems to misunderstand how glMemoryBarrier works.

    Not to derail the thread, but this is a perfect example of why many folks (not just peterfilm), including me, are hesitant to wade into the GLSL "side-effect" waters. For folks that have cooked OpenCL or CUDA kernels, this opens up the same issues you have to deal with there ... definitely not a pool to dive into lightly (watch out for the sharks!).

    I need to see more complete GLSL side-effect example code before I go hacking down that road.

    (Maybe some year there'll be an Expert OpenGL Techniques class at SIGGRAPH that'll cover this in detail... (hint hint). Anyway, we now resume your current program already in progress...)
    Last edited by Dark Photon; 07-12-2012 at 08:05 PM.

  10. #30
    Advanced Member Frequent Contributor
    Join Date
    Apr 2010
    Location
    Germany
    Posts
    906
    Quote Originally Posted by Dark Photon
    I need to see more complete GLSL side-effect example code before I go hacking down that road.

    It was once suggested to me that down-sampling a texture is best done with image load/store instead of the convenient glGenerateMipmap(). I haven't tried it yet, but the suggestion came from an AMD driver developer (not aqnuep, however). Also, you can apply filters without ping-pong rendering, as when applying multiple iterations of a blur filter, since incorporating already altered pixels when determining the value of the next one is acceptable. To cope with instruction limits, one could tile the full-screen quad and have the GPU perform the filtering on the tiled regions. I'm not sure whether that's mathematically permissible, given that the kernels would be applied in a nondeterministic order across multiple tiles.
