Occlusion Culling with Transform Feedback

Hi, All!

Hope you have a good day.

I need some explanation on this topic.

I am trying to understand “Transform Feedback” in the context of occlusion culling.
I have a mipmapped depth map to test objects against.

My question is: why should I use “Transform Feedback” (I mean serializing instances into a buffer object) if I can just test primitives in the geometry shader (rejecting them there if needed) and rasterize them immediately?

Thank you!

Transform feedback happens during vertex post-processing, right after all vertex processing stages. Occlusion queries happen much later, during rasterization and depth testing.

So these two operations have nothing in common.

If you use conditional rendering to cull geometry based on a query, the culled rendering commands will never reach the transform feedback stage.

My question is: why should I use “Transform Feedback” (I mean serializing instances into a buffer object) if I can just test primitives in the geometry shader (rejecting them there if needed) and rasterize them immediately?

Because that’s not what transform feedback is for. Transform feedback is for when you need to store some intermediate data and process it later, typically more than once. A GS can’t actually do that. An instanced GS can process the same primitive in multiple ways, but only in limited ways. Different GS invocations cannot send primitives to different framebuffers; only to different layers within the same framebuffer. Also, GS instancing has limitations; there are no limitations with how many times you can render the same geometry from a feedback operation.

Also, GS’s are slow.

It seems I now understand why the solution I read uses “Transform Feedback” for culling.

That is because the solution uses a depth map (terrain only) from the current frame, so it must render that map first.

In my case I already have a reprojected depth map from the previous frame, so I can cull primitives in the GS stage.
Am I right?

There is only one thing that worries me.

Why is the GS slow?

  • Why Geometry Shaders Are Slow (Unless you’re Intel) (2015-03, Barczak)
  • A trip through the Graphics Pipeline, Part 10 (Geometry Shaders) (2011-07, Giesen)

I hope someone else follows up with other links on this topic.

Thank you!

I’m off to study it.

This answer gives technical details of how a GS works: opengl - Why does this geometry shader slow down my program so much? - Game Development Stack Exchange

So, I think I can move the culling into the vertex shader. :)

Thank you for help, guys!

I have implemented occlusion culling in the geometry shader and got some interesting results:

  1. Enabling the GS on my “R9 nano with open source AMDGPU driver” added no measurable overhead. All results are within statistical error.
  2. Enabling “Frustum Culling” together with “Occlusion Culling” in the GS also had no effect at all. (I don’t understand this :))

I studied an example of “Instance Culling”: http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/

In the article I found the following: “For loops seemed not to work when used inside the geometry shader, that’s why the culling itself is done in the vertex shader in the demo.”
To me this sounds like “culling is not effective in the GS and should be done in the VS.”
The example does the culling in the VS and passes a visibility flag to the GS. The GS works per vertex and simply drops the vertex if it was culled:

#version 330 core

layout(points) in;
layout(points, max_vertices = 1) out;

in vec4 OrigPosition[1];
flat in int objectVisible[1];

out vec4 CulledPosition;

void main() {

   /* only emit primitive if the object is visible */
   if ( objectVisible[0] == 1 )
   {
      CulledPosition = OrigPosition[0];
      EmitVertex();
      EndPrimitive();
   }
}

I think this is why such a GS adds no overhead: it does not have to assemble primitives.

Could you answer two questions:

  1. Is it OK for a GS to work on one vertex at a time?
  2. Why does culling in the GS have no effect?

That only works if you’re culling points. With lines or triangles, it’s possible for parts of the primitive to be inside the view frustum even if none of its vertices are.

If I understood the solution correctly, the author uses the object’s extent, which guarantees this:
“if vertex + extent is outside of the frustum, the whole object is outside”.

The same approach can be used for the depth test in occlusion culling.

But I’m not sure how effective it is.
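
For anyone reading along, a depth test against a mipmapped depth map could look roughly like the CPU-side C++ sketch below. It is only an illustration under assumptions: the pyramid stores the farthest depth per 2x2 block, depth grows with distance, and the names (DepthLevel, Rect, downsampleMax, buildPyramid, occluded) are made up here and do not come from any of the linked articles. On the GPU the same logic would run in a shader with texture lookups.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// One level of a depth pyramid. Depth is in [0, 1]; larger means farther away.
struct DepthLevel {
    int width;
    int height;
    std::vector<float> texels;  // row-major, width * height entries
    float at(int x, int y) const { return texels[y * width + x]; }
};

// Screen rectangle of an object's bounds in level-0 pixels, inclusive.
struct Rect { int x0, y0, x1, y1; };

// Builds the next coarser level. Each texel keeps the FARTHEST depth of its
// 2x2 source block, so a coarse texel never claims more occlusion than exists.
DepthLevel downsampleMax(const DepthLevel& src) {
    assert(src.width % 2 == 0 && src.height % 2 == 0);
    DepthLevel dst{src.width / 2, src.height / 2, {}};
    dst.texels.resize(dst.width * dst.height);
    for (int y = 0; y < dst.height; ++y) {
        for (int x = 0; x < dst.width; ++x) {
            float top = std::max(src.at(2 * x, 2 * y), src.at(2 * x + 1, 2 * y));
            float bottom = std::max(src.at(2 * x, 2 * y + 1), src.at(2 * x + 1, 2 * y + 1));
            dst.texels[y * dst.width + x] = std::max(top, bottom);
        }
    }
    return dst;
}

// Level 0 is the full-resolution depth map; halve while both sizes are even.
std::vector<DepthLevel> buildPyramid(DepthLevel base) {
    assert(base.width > 0 && base.height > 0);
    std::vector<DepthLevel> levels;
    levels.push_back(std::move(base));
    while (levels.back().width % 2 == 0 && levels.back().height % 2 == 0) {
        levels.push_back(downsampleMax(levels.back()));
    }
    return levels;
}

// True when the object's nearest depth is behind everything stored under its
// screen rectangle, i.e. no part of it can pass a "less than" depth test.
bool occluded(const std::vector<DepthLevel>& pyramid, Rect r, float nearestDepth) {
    assert(!pyramid.empty());
    assert(0 <= r.x0 && r.x0 <= r.x1 && r.x1 < pyramid[0].width);
    assert(0 <= r.y0 && r.y0 <= r.y1 && r.y1 < pyramid[0].height);
    // Go coarser until the rectangle spans at most 2 texels per axis.
    int level = 0;
    while (level + 1 < static_cast<int>(pyramid.size()) &&
           ((r.x1 >> level) - (r.x0 >> level) > 1 ||
            (r.y1 >> level) - (r.y0 >> level) > 1)) {
        ++level;
    }
    const DepthLevel& d = pyramid[level];
    float farthest = 0.0f;
    for (int y = r.y0 >> level; y <= (r.y1 >> level); ++y) {
        for (int x = r.x0 >> level; x <= (r.x1 >> level); ++x) {
            farthest = std::max(farthest, d.at(x, y));
        }
    }
    return nearestDepth > farthest;
}
```

The coarse level keeps the number of lookups small for large objects, at the cost of being more conservative: a single far texel in the block (a “hole”) makes the whole object count as visible.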

example:

vertex A: (+0 | -1.1 | +0) (middle of the screen’s lower edge)
vertex B: (+1.1 | -1.1 | +0) (behind the lower right corner)
vertex C: (+1.1 | 0 | +0) (middle of the screen’s right edge)

each vertex lies outside of screen space, though the line AC lies almost completely in screen space, as do most of the triangle’s fragments.
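
The numbers above can be checked with a few lines of C++. This is just a sketch; the Ndc type and the helper names are invented for the illustration, and only x and y are considered since z is 0 for all three vertices.

```cpp
#include <cassert>
#include <cmath>

// A 2D point in normalized device coordinates (hypothetical helper type).
struct Ndc { float x, y; };

// True when the point lies inside the visible [-1, 1] x [-1, 1] square.
bool insideScreen(Ndc p) {
    return std::fabs(p.x) <= 1.0f && std::fabs(p.y) <= 1.0f;
}

// Point at parameter t in [0, 1] on the segment from a to b.
Ndc pointOnSegment(Ndc a, Ndc b, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    return Ndc{a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t};
}
```

With A = (0, -1.1) and C = (1.1, 0), both endpoints fail insideScreen, while the midpoint (0.55, -0.55) passes it. So a per-vertex test alone would wrongly reject the line.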

you might wanna read about “instance culling”. afaik that has nothing to do with occlusion queries. precalculate the max_radius of a mesh, then check for each instance of that mesh whether its center’s screen space location is more than max_radius away from the screen. if so, omit its drawing. if the center lies within screen space, or its distance from the screen is smaller than max_radius, copy that instance into the instance buffer for drawing later (in the next step) …

the vertex shader just passes through. the transformation and the conditional transform feedback operation are done in the geometry shader (or use a compute shader instead)
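
A rough CPU-side C++ sketch of that instance culling step follows. Note the swap: instead of measuring a distance in screen space, it tests the bounding sphere against six frustum planes, which is a common equivalent formulation and an assumption on my part, not necessarily what the linked demo does. All names (Vec3, Plane, sphereOutside, collectVisible) are hypothetical, and the returned vector stands in for the instance buffer that transform feedback or a compute shader would fill on the GPU.

```cpp
#include <array>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Plane n.p + d = 0 whose unit-length normal n points INTO the frustum.
struct Plane { Vec3 n; float d; };

// Positive inside the half-space the normal points to, negative outside.
float signedDistance(const Plane& plane, const Vec3& p) {
    return plane.n.x * p.x + plane.n.y * p.y + plane.n.z * p.z + plane.d;
}

// True when the bounding sphere lies completely behind at least one plane,
// so no vertex of the mesh inside that sphere can be visible.
bool sphereOutside(const std::array<Plane, 6>& frustum, const Vec3& center,
                   float maxRadius) {
    assert(maxRadius >= 0.0f);
    for (const Plane& plane : frustum) {
        if (signedDistance(plane, center) < -maxRadius) {
            return true;
        }
    }
    return false;
}

// Copies the centers of all instances that survive culling into a compact
// list that can be drawn later, as many times as needed.
std::vector<Vec3> collectVisible(const std::array<Plane, 6>& frustum,
                                 const std::vector<Vec3>& centers,
                                 float maxRadius) {
    std::vector<Vec3> visible;
    for (const Vec3& c : centers) {
        if (!sphereOutside(frustum, c, maxRadius)) {
            visible.push_back(c);
        }
    }
    return visible;
}
```

The point of writing the survivors into a separate list is the same as with transform feedback: the culling runs once, and the compacted instances can then feed several later draws.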

[QUOTE=john_connor;1290960]example:

vertex A: (+0 | -1.1 | +0) (middle of the screen’s lower edge)
vertex B: (+1.1 | -1.1 | +0) (behind the lower right corner)
vertex C: (+1.1 | 0 | +0) (middle of the screen’s right edge)

each vertex lies outside of screen space, though the line AC lies almost completely in screen space, as do most of the triangle’s fragments.[/QUOTE]

You are right.

But, how can I implement Occlusion Culling?

[QUOTE=nimelord;1290961]You are right.

But, how can I implement Occlusion Culling?[/QUOTE]

you have a complicated mesh to draw and you want to avoid its drawing if possible. what you do first is wrap the smallest cube possible around that complicated mesh, draw that cube with an “empty” early-depth-testing fragment shader.

https://www.khronos.org/opengl/wiki/Early_Fragment_Test

layout(early_fragment_tests) in;

it doesn’t send a color to the FBO, it’s just there to do “nothing” on every fragment:

void main() {}

but with an occlusion query you can determine if any fragment hits the screen space. if so, then draw the complicated mesh. if not, omit its drawing …

https://www.khronos.org/opengl/wiki/Query_Object#Occlusion_queries

example:

what you wanna do is NOT check how many fragments have passed the depth test immediately after the drawing, because you want to avoid (implicit) synchronization. so you do the test cube drawing for all the meshes first, then do something else. later (maybe 1 frame later) check which instances are visible, and draw them if necessary …

https://www.khronos.org/opengl/wiki/Synchronization#Implicit_synchronization

[QUOTE=john_connor;1290962]you have a complicated mesh to draw and you want to avoid its drawing if possible. what you do first is wrap the smallest cube possible around that complicated mesh, draw that cube with an “empty” early-depth-testing fragment shader.

https://www.khronos.org/opengl/wiki/Early_Fragment_Test

layout(early_fragment_tests) in;

[/QUOTE]

Can such an early fragment test be used only with asynchronous calls?

Sorry, I didn’t understand how to use the early-depth-testing fragment shader. Must it be attached to the current pipeline, or how?
I couldn’t find any code example.

After some investigation I found an algorithm for “early depth test”:

  1. disable color writing, draw the scene.
  2. disable depth writing, leave the depth test enabled.
  3. enable color writing, draw the scene.

Is this correct?

If yes, could you answer:

  • How will OpenGL know, during the second draw call, which fragments were rejected by the early depth fragment shader?
  • Those two draw calls are different phases of the same pipeline, is that right?

Is this a correct formalization?

  • 1st draw call: “scene vertex shader” + empty “early depth fragment shader”
  • 2nd draw call: only “scene fragment shader” (as I understand it, it just catches all fragments not rejected after the 1st call)

Thanks!

[QUOTE=nimelord;1290975]After some investigation I found an algorithm for “early depth test”:

  1. disable color writing, draw the scene.
  2. disable depth writing, leave the depth test enabled.
  3. enable color writing, draw the scene.

Is this correct?
[/QUOTE]
That’s a “depth pre-pass”, i.e. you first render the scene into the depth buffer, then into the colour buffer. The advantage of doing it this way is that the depth buffer already contains the closest depth for each pixel before you start rendering anything to the colour buffer(s). If step 3 uses an early depth test (either because of an explicit layout qualifier in the fragment shader, or because the implementation has determined that it can use an early depth test without changing the results), the fragment shader will only be invoked for fragments which pass the depth test (i.e. those for the closest primitive at any point). This avoids invoking the fragment shader (which will perform computation, read textures, etc) for fragments which end up being overdrawn by closer primitives.

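Note that the implementation can’t apply early depth tests automatically to a fragment shader which uses discard or which writes to gl_FragDepth, as those mean that the value written to the depth buffer cannot be determined until the fragment shader has run. Forcing early tests with layout(early_fragment_tests) in such a shader changes the results: writes to gl_FragDepth have no effect, and the depth buffer may be updated even for fragments that are later discarded.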
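
To make the saving concrete, here is a toy C++ model of a single pixel. It is only a sketch with made-up function names; it assumes an early depth test, a depth buffer cleared to 1.0, and a GL_LESS-style comparison for ordinary rendering. For the colour pass after a pre-pass it assumes a comparison that accepts equal depths (GL_LEQUAL style), because with depth writes off and a strict “less than” test the closest fragment would fail against its own stored depth.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Fragment shader invocations for ONE pixel without a pre-pass.
// Fragments arrive in draw order; each one closer than the stored depth
// passes the early test, gets shaded, and overwrites the stored depth.
int shadedWithoutPrepass(const std::vector<float>& depthsInDrawOrder) {
    float stored = 1.0f;  // cleared depth buffer
    int invocations = 0;
    for (float z : depthsInDrawOrder) {
        if (z < stored) {
            ++invocations;
            stored = z;
        }
    }
    return invocations;
}

// Fragment shader invocations for the same pixel with a depth pre-pass.
int shadedWithPrepass(const std::vector<float>& depthsInDrawOrder) {
    // Pass 1: colour writes off, depth writes on. Only the closest depth
    // survives; the (empty) fragment shader does no real work here.
    float stored = 1.0f;
    for (float z : depthsInDrawOrder) {
        stored = std::min(stored, z);
    }
    // Pass 2: colour writes on, depth writes off, "less or equal" test.
    // Only fragments at the closest depth reach the heavy shader.
    int invocations = 0;
    for (float z : depthsInDrawOrder) {
        if (z <= stored) {
            ++invocations;
        }
    }
    return invocations;
}
```

For four fragments drawn back to front, the first function shades all four while the second shades one. When the scene is already drawn front to back, both shade one, which is why the pre-pass pays off mostly with expensive fragment shaders and unsorted geometry.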

An occlusion query boils down to a sequence of glBeginQuery(GL_ANY_SAMPLES_PASSED, id), render something, glEndQuery(GL_ANY_SAMPLES_PASSED). Often this is done without writing to either the depth or colour buffers, but that isn’t necessary; you can perform an occlusion query during normal rendering. However, to be of any use, all of the major occluders (anything which might realistically occlude an entire object) must have already been rendered to the depth buffer (they may or may not have also been rendered to the colour buffer(s)). In short, an occlusion query asks “is any part of what I just rendered actually visible?”

[QUOTE=GClements;1290976]That’s a “depth pre-pass”, i.e. you first render the scene into the depth buffer, then into the colour buffer. The advantage of doing it this way is that the depth buffer already contains the closest depth for each pixel before you start rendering anything to the colour buffer(s). If step 3 uses an early depth test (either because of an explicit layout qualifier in the fragment shader, or because the implementation has determined that it can use an early depth test without changing the results), the fragment shader will only be invoked for fragments which pass the depth test (i.e. those for the closest primitive at any point). This avoids invoking the fragment shader (which will perform computation, read textures, etc) for fragments which end up being overdrawn by closer primitives.

Note that early depth tests can’t be used with a fragment shader which uses discard or which writes to gl_FragDepth, as those mean that the value written to the depth buffer cannot be determined until the fragment shader has run.[/QUOTE]

So, I’m trying to put all the pieces together.

  • I need to render the scene twice: 1st to render the depth map, 2nd to render the scene with the early depth test using the prepared depth map.
  • For the 1st rendering I replace the real, heavy fragment shader with an empty one (just rasterizing with color writes disabled).
  • For the 2nd rendering I use the real, heavy fragment shader with the qualifier “layout(early_fragment_tests)”, disable depth writes and leave the depth test enabled (and I don’t discard fragments in the FS).
  • Both times I use the same vertex shader and no geometry shader (because it is redundant here).

Are the points above right?

[QUOTE=nimelord;1290982]So, I’m trying to put all the pieces together.

  • I need to render the scene twice: 1st to render the depth map, 2nd to render the scene with the early depth test using the prepared depth map.
  • For the 1st rendering I replace the real, heavy fragment shader with an empty one (just rasterizing with color writes disabled).
  • For the 2nd rendering I use the real, heavy fragment shader with the qualifier “layout(early_fragment_tests)”, disable depth writes and leave the depth test enabled (and I don’t discard fragments in the FS).
  • Both times I use the same vertex shader and no geometry shader (because it is redundant here).

Are the points above right?[/QUOTE]
That’s a correct summary of the depth pre-pass optimisation (at least in its simplest form).

However, this has nothing to do with occlusion culling. Occlusion culling involves rendering with a “complete” depth buffer (either from a depth pre-pass or from “normal” rendering) and depth tests enabled, bounded by an occlusion query. Depth and/or colour writes may or may not be enabled, depending upon whether you’re only performing an occlusion query or performing it as part of normal rendering.

Well, I say “nothing” but there is one interaction between the two concepts: if you’re performing occlusion queries with the intent of using the results for the current frame, a depth pre-pass reduces the risk and/or severity of pipeline stalls, as you can perform the rendering for the occlusion queries after the first phase (depth rendering) but won’t need the results until after the second phase (colour rendering). This makes it more likely that the results will already be available when you need them.

Thank you a lot!

I have found a couple of more complicated techniques: GitHub - nvpro-samples/gl_occlusion_culling: OpenGL sample for shader-based occlusion culling