Depth Buffer Precision

Depth buffering seems to work, but polygons seem to bleed through polygons that are in front of them. What's going on?

You may have configured your zNear and zFar clipping planes in a way that severely limits your depth buffer precision. Generally, this is caused by a zNear clipping plane value that's too close to 0.0. As the zNear clipping plane is set increasingly closer to 0.0, the effective precision of the depth buffer decreases dramatically. Moving the zFar clipping plane further away from the eye always has a negative impact on depth buffer precision, but it's not one as dramatic as moving the zNear clipping plane.

The OpenGL Reference Manual description for glFrustum() relates depth precision to the zNear and zFar clipping planes by saying that roughly $\log_{2}\tfrac{zFar}{zNear}$ bits of precision are lost. Clearly, as zNear approaches zero, this expression grows without bound.

While the Reference Manual (the "blue book") description is good at pointing out the relationship, it's somewhat inaccurate. As the ratio (zFar/zNear) increases, less precision is available near the back of the depth buffer and more precision is available close to the front of the depth buffer. So primitives are more likely to interact in Z if they are farther from the viewer.

It's possible that you simply don't have enough precision in your depth buffer to render your scene. See the question about astronomically large scenes later in this section for more info.

It's also possible that you are drawing coplanar primitives. Round-off errors or differences in rasterization typically create "Z fighting" for coplanar primitives. See Drawing Lines over Polygons for some options.

Why is my depth buffer precision so poor?

The depth buffer precision in eye coordinates is strongly affected by the ratio of zFar to zNear, the zFar clipping plane, and how far an object is from the zNear clipping plane.

You need to do whatever you can to push the zNear clipping plane out and pull the zFar plane in as much as possible.

To be more specific, consider the transformation of depth from eye coordinates

$x_e, y_e, z_e, w_e$

to window coordinates

$x_w, y_w, z_w$

with a perspective projection matrix specified by

 glFrustum(l, r, b, t, n, f);


and assume the default viewport transform. The clip coordinates of $z_c$ and $w_c$ are

\begin{align} z_c & = -z_e\dfrac{f + n}{f - n} - w_e\dfrac{2 * f * n}{f - n} \\ w_c & = -z_e \\ \end{align}

Why the negations? OpenGL wants to present to the programmer a right-handed coordinate system before projection and left-handed coordinate system after projection.

The NDC depth coordinate is then:

\begin{align} z_{ndc} & = \dfrac{z_c}{w_c} \\ & = \dfrac{-z_e\dfrac{f + n}{f - n} - w_e\dfrac{2 * f * n}{f - n}}{-z_e} \\ & = \dfrac{f + n}{f - n} + \dfrac{2 * f * n * w_e}{z_e(f - n)} \\ \end{align}

The viewport transformation scales and offsets by the depth range (assume it to be [0, 1]) and then scales by $s = 2^N - 1$, where $N$ is the bit depth of the depth buffer (not to be confused with the near-plane distance $n$):

$z_w = s * (\dfrac{w_e}{z_e} * \dfrac{f * n}{f - n} + 0.5\dfrac{f + n}{f - n} + 0.5)$
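As a rough sketch, the mapping from eye-space depth to window-space depth can be written in C as follows. The helper names are hypothetical, $w_e$ is assumed to be 1, the depth range is assumed to be [0, 1], and the scale is taken as 2 to the power of the bit depth, minus one. The result is left unrounded so the quantization step stays visible.

```c
#include <assert.h>
#include <math.h>

/* Largest value an unsigned fixed-point depth buffer of `bits` bits can hold. */
double depth_scale(int bits)
{
    assert(bits > 0 && bits <= 32);
    return ldexp(1.0, bits) - 1.0;   /* 2^bits - 1 */
}

/* Window-space depth for eye-space z_e (negative in front of the eye),
 * assuming w_e = 1 and a depth range of [0, 1]. Not rounded. */
double window_depth(double z_e, double n, double f, double s)
{
    assert(n > 0.0 && f > n && z_e < 0.0);
    double z_ndc = (f + n) / (f - n) + (2.0 * f * n) / (z_e * (f - n));
    return s * (0.5 * z_ndc + 0.5);
}
```

For n = 0.01, f = 1000 and a 16-bit buffer, a point halfway to the far plane (z_e = -500) already lands above 65534 out of 65535.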

Let's rearrange this equation to express $z_e / w_e$ as a function of $z_w$:

\begin{align} \dfrac{z_e}{w_e} & = \dfrac{\dfrac{f * n}{f - n}}{\dfrac{z_w}{s} - 0.5\dfrac{f + n}{f - n} - 0.5} \\ & = \dfrac{f * n}{\dfrac{z_w}{s}(f - n) - 0.5(f + n) - 0.5(f - n)} \\ & = \dfrac{f * n}{\dfrac{z_w}{s}(f - n) - f} \\ \end{align}

Now let's look at two points, the zNear clipping plane and the zFar clipping plane:

$\dfrac{z_e}{w_e} = \begin{cases} \dfrac{f * n}{-f} = -n, & \text{when }z_w\text{ is }0 \\ \dfrac{f * n}{(f - n) - f} = -f, & \text{when }z_w\text{ is }s \end{cases}$

In a fixed-point depth buffer, $z_w$ is quantized to integers. The next representable depth values away from the clip planes are 1 and $s-1$:

$\dfrac{z_e}{w_e} = \begin{cases} \dfrac{f * n}{\tfrac{1}{s}(f - n) - f}, & \text{when }z_w\text{ is }1 \\ \dfrac{f * n}{\tfrac{s - 1}{s}(f - n) - f}, & \text{when }z_w\text{ is }s-1 \end{cases}$

Now let's plug in some numbers, for example, n = 0.01, f = 1000 and s = 65535 (i.e., a 16-bit depth buffer)

$\dfrac{z_e}{w_e} = \begin{cases} -0.01000015, & \text{when }z_w\text{ is }1 \\ -395.90054, & \text{when }z_w\text{ is }s-1 \end{cases}$

Think about this last line. Everything at eye coordinate depths from -395.9 to -1000 has to map into either 65534 or 65535 in the z buffer. About 60 percent of the distance between the zNear and zFar clipping planes will have one of two z-buffer values!
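The numbers above can be reproduced with a small C sketch of the inverse mapping $\tfrac{z_e}{w_e} = \tfrac{f * n}{\tfrac{z_w}{s}(f - n) - f}$. The function names are invented for this example.

```c
#include <assert.h>
#include <math.h>

/* Eye-space depth z_e / w_e that maps to window depth z_w,
 * from z_e/w_e = f*n / ((z_w/s)*(f - n) - f). */
double eye_depth(double z_w, double n, double f, double s)
{
    assert(n > 0.0 && f > n && z_w >= 0.0 && z_w <= s);
    return (f * n) / ((z_w / s) * (f - n) - f);
}

/* Fraction of the near-to-far distance that lands on the last two
 * depth values (s - 1 and s). */
double last_step_fraction(double n, double f, double s)
{
    double z_prev = fabs(eye_depth(s - 1.0, n, f, s));
    return (f - z_prev) / (f - n);
}
```

Calling `eye_depth(65534.0, 0.01, 1000.0, 65535.0)` gives roughly -395.9, and `last_step_fraction(0.01, 1000.0, 65535.0)` gives roughly 0.60.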

To further analyze the z-buffer resolution, let's take the derivative of $\dfrac{z_e}{w_e}$ with respect to $z_w$:

$\dfrac{\operatorname{d}\dfrac{z_e}{w_e}}{\operatorname{d}z_w} = -f * n * (f - n) * \dfrac{\tfrac{1}{s}}{(\tfrac{z_w}{s} * (f - n) - f)^2}$

Now evaluate it at $z_w = s$, where the squared denominator becomes $n^2$:

\begin{align} \dfrac{\operatorname{d}\dfrac{z_e}{w_e}}{\operatorname{d}z_w} & = -f * (f - n) * \dfrac{\tfrac{1}{s}}{n}\\ & = -f\dfrac{\tfrac{f}{n} - 1}{s} \\ \end{align}

If you want your depth buffer to be useful near the zFar clipping plane, you need to keep the magnitude of this value smaller than the size of your objects in eye space (for most practical uses, world space).
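As a quick check, the magnitude of the derivative at the zFar plane, $f(\tfrac{f}{n} - 1)/s$, can be computed directly. This is only a sketch; `far_plane_step` is a hypothetical helper name.

```c
#include <assert.h>

/* Approximate eye-space distance covered by one depth-buffer step at the
 * zFar plane: the magnitude of d(z_e/w_e)/d(z_w) at z_w = s,
 * i.e. f * (f/n - 1) / s. */
double far_plane_step(double n, double f, double s)
{
    assert(n > 0.0 && f > n && s > 0.0);
    return f * (f / n - 1.0) / s;
}
```

For n = 0.01, f = 1000 and s = 65535 this gives about 1526 eye-space units, which is larger than the whole view volume. Pushing zNear out to 1.0 brings it down to about 15 units.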

Why is there more precision at the front of the depth buffer?

After the projection matrix transforms eye coordinates into clip coordinates, the X, Y, and Z vertex values are divided by their clip coordinate W value, which results in normalized device coordinates. This step is known as the perspective divide. With a perspective projection, the clip coordinate W value represents the distance from the eye. As the distance from the eye increases, 1/W approaches 0. Therefore, X/W and Y/W also approach zero, causing the rendered primitives to occupy less screen space and appear smaller. This is how computers simulate a perspective view.

As in reality, motion toward or away from the eye has a less profound effect for objects that are already in the distance. For example, if you move six inches closer to the computer screen in front of your face, its apparent size should increase quite dramatically. On the other hand, if the computer screen were already 20 feet away from you, moving six inches closer would have little noticeable impact on its apparent size. The perspective divide takes this into account.

As part of the perspective divide, Z is also divided by W with the same results. For objects that are already close to the back of the view volume, a change in distance of one coordinate unit has less impact on Z/W than if the object is near the front of the view volume. To put it another way, one eye-coordinate Z unit occupies a larger slice of NDC-depth space close to the front of the view volume than it does near the back of the view volume.

In summary, the perspective divide, by its nature, causes more Z precision close to the front of the view volume than near the back.
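The effect is easy to see numerically. The sketch below (helper names are hypothetical) evaluates the NDC depth produced by a glFrustum-style projection with $w_e = 1$, then measures how much NDC depth a one-unit slab occupies at a given distance from the eye.

```c
#include <assert.h>

/* NDC depth for eye-space z_e (negative in front of the eye), w_e = 1. */
double ndc_depth(double z_e, double n, double f)
{
    assert(n > 0.0 && f > n && z_e < 0.0);
    return (f + n) / (f - n) + (2.0 * f * n) / (z_e * (f - n));
}

/* NDC depth spanned by the one-unit slab that starts at distance d
 * from the eye and extends away from it. */
double ndc_slab(double d, double n, double f)
{
    return ndc_depth(-(d + 1.0), n, f) - ndc_depth(-d, n, f);
}
```

With n = 1 and f = 100, the slab from 1 to 2 units spans about 1.01 of the 2.0 available NDC depth units, while the slab from 99 to 100 units spans only about 0.0002.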

A previous question in this section contains related information.

There is no way that a standard-sized depth buffer will have enough precision for my astronomically large scene. What are my options?

The typical approach is to use a multipass technique. The application might divide the geometry database into regions that don't interfere with each other in Z. The geometry in each region is then rendered, starting at the furthest region, with a clear of the depth buffer before each region is rendered. This way the precision of the entire depth buffer is made available to each region.

Assuming perspective projection, what is the optimal precision distribution for the depth buffer? What is the best depth buffer format?

First, what is the precision in the x and y directions? Consider two identical objects, the first of which is at distance d in front of the camera and the other at distance 2*d. Thanks to the perspective projection, the more distant object will be seen as half the size of the other. This means the precision in the X and Y directions with which it is drawn will be half that of the first object (half as many pixels in the X and Y directions for the same object size). So the precision in the X and Y directions is proportional to 1/z.

Now I will assume a postulate that defines what I consider to be "the general case": for any given position in camera space, the precision in the z direction should be roughly equal to the precision in the x and y directions. This means that if you attempt to calculate the position of an object from its rendered image and the values in the depth buffer, all three components of the position you calculate should be within the same error margin. This also means the maximal (camera-space) z difference that can cause z-fighting is equal to the (camera-space) size of 1 pixel at the same distance (in the z direction).

From all the above it follows that the precision distribution of the depth buffer should be such that values close to z have approximate precision of C/z for any given camera space z (within the valid range), where C is some constant.

Let's assume the depth buffer stores integer values (which means uniform precision over the entire range) and that the z coordinate values are processed by a function f(x) before being stored to the depth buffer (that is, for a given camera-space z, the value f(z) goes to the depth buffer). Let's find this function. Denote its inverse by g(x). Denote the smallest difference between two depth buffer values by s (this is constant over the entire depth buffer range because, as we assumed, it has uniform precision). Then, for a given camera-space z, the minimal increment of the camera-space depth is g(f(z) + s) - g(f(z)). The minimal increment is the inverse of the precision, so it should equal z/C. Since g(f(z)) = z, this gives g(f(z) + s) = z*(1 + 1/C), and applying f to both sides yields f(z) + s = f(z*(1 + 1/C)), where s and C are constants. This is the defining equation of the logarithm function, so f(x) = h*log2(x) for some constant h (h depends on C and s, but since C is itself an unknown constant, the exact formula is of no use).

I prefer log2(x) over ln(x) because in the context of binary computers log2 is more "natural". In particular, the floating point format stores values that are roughly close to log2 of the true values, at least when it comes to precision distribution, which is what we are concerned with here. Thus, assuming the above postulate, the floating point format is nearly perfect for a depth buffer.

There is one more little detail. With the standard projection matrix, the values that are output to the depth buffer are not the camera-space z but something that is proportional to 1/z. The log function has the property that f(1/x) = -f(x), so it is only a matter of a sign change.

So the best depth format would be floating point, but it may be necessary to apply some negating/scaling/shifting to adjust for the best precision distribution. glDepthRange() should be enough for this.