When is depth written?

Finally knuckled down and started reading Jason McKesson’s excellent GL tutorial. In chapter 5, in the Boundaries and Clipping section, he says:

“the OpenGL spec says that the depth test must happen after the fragment program executes”

But regarding clipping, if a fragment is to be clipped, “it is discarded before any fragment processing takes place”.

Are the depth test and the clip test the same, i.e., are fragments that are behind what is already in the depth buffer discarded before fragment processing takes place? Or does the fragment shader process the fragment, after which the depth is tested and the color buffer is written or not written accordingly?

If the second, is it possible to do the cull up front, as with clipping, using GLSL?

Clipping is the process of taking triangles and splitting them into smaller ones, such that all of the remaining triangle(s) fit into the 3D window space. Clipping happens before rasterization, and until you rasterize a triangle, you don’t have fragments of any kind.

That is at odds with:

Instead of clipping, the hardware usually just lets the triangles go through if part of the triangle is within the visible region. It generates fragments from those triangles, and if a fragment is outside of the visible window, it is discarded before any fragment processing takes place.
taken from the tutorial your own answer links to, the Boundaries and Clipping section.

My question is not about clipping but depth. Again, in the Overlap and Depth Buffering section:

With the fragment depth being something that is part of a fragment’s output, you might imagine that this is something you have to compute in a fragment shader. You certainly can, but the fragment’s depth is normally just the window-space Z coordinate of the fragment. This is computed automatically when the X and Y are computed.

I understand this as saying that the depth is computed at the same time as the x, y position of the fragment, interpolated between the vertex and fragment shaders. The question is: is the depth test carried out before the fragment shader processes the fragment, with fragments failing the test being culled; or is the fragment processed first, in case its z value is altered in the fragment shader, then the depth test performed, and the fragment color written or not depending on the outcome of the depth test?

Both situations seem problematic. If the first, then how can you alter the depth of a fragment in the fragment shader if that fragment fails the depth test and is therefore discarded? If the second, then the computed depth is not written to the depth buffer immediately but held until the fragment shader finishes. And, confusingly, given the obvious efficiency gain from culling fragments early when you have no intention of altering their depth values, why is Google not full of examples of GLSL fragment shader code that performs early depth testing and discards fragments before processing fragments that will never see the light of day?
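For concreteness, this is roughly what “altering the depth in the fragment shader” looks like in GLSL; this is only a minimal sketch, and the bias constant is purely illustrative:

```glsl
#version 330 core

out vec4 fragColor;

void main()
{
    fragColor = vec4(1.0, 0.0, 0.0, 1.0);

    // gl_FragCoord.z holds the automatically interpolated window-space depth.
    // Writing gl_FragDepth replaces it, so the depth test cannot be resolved
    // until after this shader has finished running.
    gl_FragDepth = gl_FragCoord.z + 0.01;
}
```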

The question is: is the depth test carried out before the fragment shader processes the fragment, with fragments failing the test being culled; or is the fragment processed first, in case its z value is altered in the fragment shader, then the depth test performed, and the fragment color written or not depending on the outcome of the depth test?

It’s odd that you’re quoting something to ask a question that explains this literally 3 paragraphs before what you quoted. Observe:

Admittedly, I should get rid of that last sentence, since I moved the discussion of early-z elsewhere. But that’s the idea: hardware can do whatever it wants as long as it behaves as though it happened in the way specified by the OpenGL specification.

So, to answer your question: if your shader modifies the depth, then obviously the depth cannot be tested before running the shader, or else the depth test would be meaningless. If your shader does not modify the depth, then you would not be able to tell whether the depth is tested before or after the shader runs. Therefore, the implementation is free to do whatever it wants, up to and including doing the depth test before the shader.
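By contrast, a fragment shader that never touches gl_FragDepth gives the driver that freedom; a minimal sketch (the in/out variable names are made up for the example):

```glsl
#version 330 core

in vec3 interpolatedColor;
out vec4 fragColor;

void main()
{
    // No write to gl_FragDepth: the depth used for the test is just the
    // interpolated gl_FragCoord.z, which is known before this shader runs,
    // so an implementation may legally test and reject the fragment early
    // and skip running this shader for occluded fragments.
    fragColor = vec4(interpolatedColor, 1.0);
}
```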

The OpenGL specification requires that the fragment depth is tested against the depth buffer AFTER the fragment shader runs.

However, if the driver can determine that the fragment shader does not alter the depth value, and that performing the depth test prior to the fragment shader will make no difference to the result, then purely as an optimisation it will perform an early depth test so it can discard the fragment without having to run the fragment shader.

The final result written to the depth and color buffers MUST always be identical to what would have been written if the depth test actually occurred AFTER the fragment shading.

Except that in OpenGL 4.2 a layout qualifier was added that allows you to explicitly force the depth test to occur prior to the fragment shader.
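For reference, that qualifier is written in the fragment shader like this (a sketch in GLSL 4.20):

```glsl
#version 420 core

// Requests that the per-fragment tests, including the depth test, run
// before this shader executes. Any value written to gl_FragDepth is
// then ignored.
layout(early_fragment_tests) in;

out vec4 fragColor;

void main()
{
    fragColor = vec4(1.0);
}
```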

Instead of clipping, the hardware usually just lets the triangles go through if part of the triangle is within the visible region. It generates fragments from those triangles, and if a fragment is outside of the visible window, it is discarded before any fragment processing takes place.

So the average GPU has built-in clipping (which is needed for user-defined clip planes anyway), but prefers to just generate millions of fragments that are immediately discarded?

So the average GPU has built-in clipping (which is needed for user-defined clip planes anyway)

You assume that user-defined clip-planes actually do triangle clipping rather than fragment discarding.

Notice that transform feedback is specified so that it happens before clipping. There’s a reason for that.
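To make the clip-plane point concrete: in modern GL a user-defined clip plane is just a gl_ClipDistance value written in the vertex shader, and as noted above the hardware may honour it either by geometric clipping or by discarding fragments whose interpolated distance is negative. A minimal sketch (the uniform names are hypothetical):

```glsl
#version 330 core

layout(location = 0) in vec4 position;

uniform mat4 modelViewProjection;  // hypothetical uniforms for the example
uniform vec4 clipPlane;            // plane equation in the same space as 'position'

void main()
{
    gl_Position = modelViewProjection * position;

    // One user-defined clip distance; GL_CLIP_DISTANCE0 must be enabled on
    // the application side. Fragments where the interpolated value is
    // negative are clipped, however the hardware chooses to implement that.
    gl_ClipDistance[0] = dot(clipPlane, position);
}
```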

but prefers to just generate millions of fragments that are immediately discarded?

You’re assuming that the rasterizer isn’t written to know where the window coordinates are. The scan converter is simply smart enough to not generate fragments for parts of triangles that are outside of the window.

Clipping has costs. It stalls the triangle pipeline by generating vertex data. It stalls the triangle pipeline by potentially generating more than one triangle. It’s not cheap, so hardware avoids it wherever possible.

Again, it’s a pure optimization; it’s not like you can tell the difference.
