gl_FragCoord.xy, depth texture to eye coordinates

I’m having some trouble converting the depth written to a depth texture, together with the gl_FragCoord.xy values, to an eye-space position. I’m trying to derive the formula by inverting everything that happens to eye-space coordinates during rendering. However, I seem to arrive at a different formula than the ones I found elsewhere on this forum. Perhaps I made a mistake somewhere, or I just fail to see how the other formulas are equivalent to mine.

This is my approach:

When rendering the following happens:

[eye coordinates: (xe, ye, ze, we)]
-> apply projection matrix
[clip coordinates: (xc, yc, zc, wc)]
-> divide by clip coordinate w (perspective divide)
[normalized device coordinates: (xd, yd, zd)]
-> apply viewport transformation
[window coordinates: (xw, yw, zw)]


These transformations look like the following:


projection matrix (assuming -left == right, thus origin centered):
A  0  0  0
0  B  0  0
0  0  C  D
0  0 -1  0

with:
A = 2n / w
B = 2n / h
C = -(f + n) / (f - n)
D = -(2 * f * n) / (f - n)

w: width (opengl coordinates)
h: height (opengl coordinates)
n: near (opengl coordinates)
f: far (opengl coordinates)


Normalizing to Normalized Device Coordinates (NDC)
(see OpenGL 3.3 spec, page 92):
xd = xc / wc
yd = yc / wc
zd = zc / wc

Viewport transformation (see OpenGL 3.3 spec, page 92):
xw = vw / 2 * xd + hvw
yw = vh / 2 * yd + hvh
zw = zd * (f - n) / 2 + (n + f) / 2

vw: viewport width (pixels)
vh: viewport height (pixels)
hvw: viewport width / 2, assuming (0,0) is origin (pixels)
hvh: viewport height / 2, assuming (0,0) is origin (pixels)

So… to go from the depth read from a depth texture plus gl_FragCoord.xy back to eye coordinates, I figured I had to do the reverse. In that case the transformations look like this:


Inverse projection matrix (assuming -left == right):
1/A   0   0   0
 0   1/B  0   0
 0    0   0  -1
 0    0  1/D C/D

(with A, B, C, D as before)
This seems to be correct (see Appendix F of the Red Book).
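
For anyone who wants to sanity-check this inverse numerically, here is a small Python sketch. It is not taken from the Red Book; the function names and the frustum numbers are made up for illustration, and the matrices are written row by row exactly as above.

```python
# Numerical check that the hand-derived inverse undoes the projection matrix.
# The frustum numbers used at the bottom are arbitrary example values.

def projection(w, h, n, f):
    """Centered perspective projection, rows written as in the post."""
    A = 2.0 * n / w
    B = 2.0 * n / h
    C = -(f + n) / (f - n)
    D = -(2.0 * f * n) / (f - n)
    return [[A, 0.0, 0.0, 0.0],
            [0.0, B, 0.0, 0.0],
            [0.0, 0.0, C, D],
            [0.0, 0.0, -1.0, 0.0]]


def inverse_projection(w, h, n, f):
    """Inverse of the matrix above, using the same A, B, C, D."""
    A = 2.0 * n / w
    B = 2.0 * n / h
    C = -(f + n) / (f - n)
    D = -(2.0 * f * n) / (f - n)
    return [[1.0 / A, 0.0, 0.0, 0.0],
            [0.0, 1.0 / B, 0.0, 0.0],
            [0.0, 0.0, 0.0, -1.0],
            [0.0, 0.0, 1.0 / D, C / D]]


def matmul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


product = matmul(projection(2.0, 1.5, 0.5, 50.0),
                 inverse_projection(2.0, 1.5, 0.5, 50.0))
for row in product:
    print(["%.6f" % value for value in row])
```

If the inverse is right, the printed product should be the identity matrix up to rounding.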

Inverse normalization:
xc = wc * xd
yc = wc * yd
zc = wc * zd
wc = ?

Inverse viewport transformation:
xd = 2 * (xw - hvw) / vw
yd = 2 * (yw - hvh) / vh
zd = 2 * zw / (f - n) - (n + f) / (f - n)

Now I rewrite all these reverse transformations to try to obtain a (hopefully) simple formula that gives the eye coordinates from the texture depth + gl_FragCoord.xy.


eye coord = Inverse Matrix * clip coord
          = Inv.Mat * wc * NDC
          = Inv.Mat * wc * Inv.Viewport

xe = 1/A * wc * 2 * (xw - hvw) / vw
ye = 1/B * wc * 2 * (yw - hvh) / vh
ze = -wc
we = 1/D * wc * [2 * zw / (f - n) - (n + f) / (f - n)] + C/D * wc

Rewrite `we' (substitute 1/D and C/D):
1/D = -(f - n) / (2 * f * n)
C/D = (f + n) / (2 * f * n)
we = wc * [(-zw) / (f * n) + (n + f) / (2 * f * n)]
     + wc * [(f + n) / (2 * f * n)]

Because we want to end up with an eye vector with `w'
coordinate equal to 1, we can divide (xe,ye,ze) by (we);
the common factor wc cancels in that division, so it can be dropped:

xe = 1/A * 2 * (xw - hvw) / vw
ye = 1/B * 2 * (yw - hvh) / vh
ze = -1
we = [(-zw) / (f * n) + (n + f) / (2 * f * n)] + [(f + n) / (2 * f * n)]


Rewrite `we' a bit more:
we = (-zw) / (f * n) + (2 * (f + n)) / (2 * f * n)
   = (f + n - zw) / (f * n)


And now multiply (xe,ye,ze,we) by the inverse of we:
xe = (f * n) / (f + n - zw) * 1/A * 2 * (xw - hvw) / vw
ye = (f * n) / (f + n - zw) * 1/B * 2 * (yw - hvh) / vh
ze = -(f * n) / (f + n - zw)
we = 1


Let’s have a look at 1/A and 1/B:
1/A = w / (2n)
1/B = h / (2n)

xe = (f * n) / (f + n - zw) * w/n * (xw - hvw) / vw
ye = (f * n) / (f + n - zw) * h/n * (yw - hvh) / vh
ze = -(f * n) / (f + n - zw)
we = 1

Which could be written as:
xe = -ze * w/n * (xw / vw - 0.5)
ye = -ze * h/n * (yw / vh - 0.5)
ze = -(f * n) / (f + n - zw)
we = 1

However… I found some formulas on these forums that are a bit different from mine, although there are also some similarities:

http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=239700#Post239700


mat4 m = gl_ProjectionMatrix;
float Z = m[3].z / (texture2DRect(G_Depth, gl_FragCoord.xy).x
 * -2.0 + 1.0 - m[2].z);
vec3 modelviewpos = vec3(pos.xy/pos.z*Z,Z);

http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=239127#Post239127


float DepthToZPosition(in float depth) {
   return camerarange.x / (camerarange.y - depth *
   (camerarange.y - camerarange.x)) * camerarange.y;
}

float depth = texture2D(texture1,texCoord).x;

vec3 screencoord;

screencoord = vec3(
  ((gl_FragCoord.x / buffersize.x)- 0.5) * 2.0,
  ((-gl_FragCoord.y/buffersize.y)+0.5) * 2.0 / (buffersize.x/buffersize.y),
  DepthToZPosition(depth));

screencoord.x *= screencoord.z;
screencoord.y *= -screencoord.z;

I fail to see how I could derive the above formulas. And because many different formulas can be found on the internet (perhaps transforming to different coordinate systems, or written for DirectX instead of OpenGL; often it is unclear exactly which transformation is done), I thought I would derive the formula myself. However, my result does not seem to match anything I found (except the more general examples that just say: use the inverse projection matrix).

Does anybody have a clue whether my general approach is correct? Or whether I am making a mistake somewhere? I’d also like to know what exactly the other formulas do: do they transform to eye-space coordinates as well?

Perhaps I went into a bit too much detail in this post. Can somebody tell me whether my general approach (apply inverse viewport transformation, apply inverse NDC transformation, apply inverse projection matrix) is the proper way of going from a depth stored in a depth texture + gl_FragCoord.xy to a position in eye space? Or am I missing steps, or applying steps that are unnecessary?

So far the results with my approach do not look good. If anybody can tell me how the other examples (the code fragments at the bottom) are derived, I’d be happy to hear that as well.

What you say intuitively makes sense, but it’s difficult to “undo” the perspective divide (which you call “inverse NDC transformation”) when you don’t know the numerator and denominator. It’s easier to write out the forward transformation and then solve for eye coords.

If anybody can tell me how the other examples (the code fragments at the bottom) are derived, I’d be happy to hear that as well.

I can help you out with that a bit, since I verified one of those myself not long ago. This:

mat4 m = gl_ProjectionMatrix;
float Z = m[3].z / (texture2DRect(G_Depth, gl_FragCoord.xy).x
 * -2.0 + 1.0 - m[2].z);

Or renaming terms, this:

z_eye = gl_ProjectionMatrix[3].z / (z_viewport * -2.0 + 1.0 - gl_ProjectionMatrix[2].z);

comes from writing out z_viewport in terms of z_eye (the forward transform) and then solving for z_eye. That is:

z_ndc = z_clip / w_clip
z_ndc = [ z_eye*gl_ProjectionMatrix[2].z + gl_ProjectionMatrix[3].z ] / -z_eye

if we assume a perspective projection. The 2nd step presumes w_eye = 1. Solve the above for z_eye, and you get:

z_eye = gl_ProjectionMatrix[3].z/(-z_ndc - gl_ProjectionMatrix[2].z);

Typically your glDepthRange is 0…1, so z_ndc = z_viewport * 2 - 1, so plugging that in…

z_eye = gl_ProjectionMatrix[3].z/(z_viewport * -2.0 + 1.0 - gl_ProjectionMatrix[2].z);
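
If you want to convince yourself numerically, a quick Python sketch like the one below should do. The function names and the near/far values are made up for illustration, and it assumes the standard centered perspective matrix from the first post and glDepthRange(0, 1). It also compares against the DepthToZPosition snippet quoted earlier, under the assumption (not stated in that snippet) that camerarange holds (near, far).

```python
# Round trip: eye-space z -> depth-buffer value -> eye-space z.
# C and D stand for gl_ProjectionMatrix[2].z and gl_ProjectionMatrix[3].z.

def eye_z_to_depth(z_eye, n, f):
    """Forward transform, assuming w_eye = 1 and glDepthRange(0, 1)."""
    C = -(f + n) / (f - n)
    D = -(2.0 * f * n) / (f - n)
    z_clip = C * z_eye + D
    w_clip = -z_eye
    z_ndc = z_clip / w_clip
    return 0.5 * z_ndc + 0.5


def depth_to_eye_z(z_viewport, n, f):
    """The formula from this post, solved for z_eye."""
    C = -(f + n) / (f - n)
    D = -(2.0 * f * n) / (f - n)
    return D / (z_viewport * -2.0 + 1.0 - C)


def depth_to_z_position(depth, n, f):
    """The other quoted snippet, assuming camerarange = (near, far)."""
    return n / (f - depth * (f - n)) * f


NEAR, FAR = 1.0, 100.0  # arbitrary example values
for z_eye in (-1.5, -10.0, -75.0):
    depth = eye_z_to_depth(z_eye, NEAR, FAR)
    print(z_eye, depth,
          depth_to_eye_z(depth, NEAR, FAR),
          depth_to_z_position(depth, NEAR, FAR))
```

Under these assumptions the two snippets agree up to sign: the first returns the (negative) eye-space z, while DepthToZPosition returns the positive distance along the view axis, i.e. -z_eye.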

You can do a similar analysis for orthographic projection if desired. If your depth map is for a point light, perspective is what you want.

All that analytical stuff said, considering what you are doing I think you’re gonna find it a lot easier to just compute a “camera-eye-to-light-viewport” transform, pass that in, and be done with it. It’s just matrix plug-n-chug. For some intuition, look at the diagram here: http://www.paulsprojects.net/tutorials/smt/smt.html

Ah, why didn’t I think of that… just have a good look at the forward transformations and reverse them that way (instead of trying to reverse the general formula, which makes very few assumptions about the projection matrix).

Thanks for your insights, I’ll give it a go your way!

Ah, I haven’t fully recalculated everything yet, but it seems I mixed up some variables in my first post. For [window coordinates -> ndc] I used the same near and far values as in the projection matrix, whereas that transformation should use the near and far values set with glDepthRange. After some quick math, what I get looks very much like the equation you explained, even though I got there by inverting all the formulas.

So I guess I’m good now. Thanks again for triggering the right parts of my brain ;).

Sure thing!

Just my 2 cents… using the inverse of ViewportProjectionModelview is perfectly possible, and it works with both perspective and orthographic projections:

uniform mat4 u_unproject; // (VPM)^-1
uniform vec2 u_texsize;
uniform sampler2D tex_depth;

float depth = texture2D(tex_depth, gl_FragCoord.xy/u_texsize).r;
vec4 pos = u_unproject * vec4(gl_FragCoord.xy, depth, 1.0);
pos.xyz /= pos.w; //back to cartesian coordinates!
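
To illustrate the idea outside a shader, here is a rough Python sketch. It is only a sketch under assumptions: the modelview is taken as identity (so the result is in eye space), the viewport origin is (0, 0), glDepthRange is (0, 1), and the projection is the centered perspective matrix from the first post. All function names and numbers are hypothetical.

```python
# Unproject a window-space point with one combined inverse matrix.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def projection(w, h, n, f):
    C = -(f + n) / (f - n)
    D = -(2.0 * f * n) / (f - n)
    return [[2.0 * n / w, 0.0, 0.0, 0.0],
            [0.0, 2.0 * n / h, 0.0, 0.0],
            [0.0, 0.0, C, D],
            [0.0, 0.0, -1.0, 0.0]]


def inverse_projection(w, h, n, f):
    C = -(f + n) / (f - n)
    D = -(2.0 * f * n) / (f - n)
    return [[w / (2.0 * n), 0.0, 0.0, 0.0],
            [0.0, h / (2.0 * n), 0.0, 0.0],
            [0.0, 0.0, 0.0, -1.0],
            [0.0, 0.0, 1.0 / D, C / D]]


def inverse_viewport(vw, vh):
    """Window -> NDC for viewport origin (0, 0) and glDepthRange(0, 1)."""
    return [[2.0 / vw, 0.0, 0.0, -1.0],
            [0.0, 2.0 / vh, 0.0, -1.0],
            [0.0, 0.0, 2.0, -1.0],
            [0.0, 0.0, 0.0, 1.0]]


def eye_to_window(eye, w, h, n, f, vw, vh):
    """Forward pipeline: projection, perspective divide, viewport."""
    xc, yc, zc, wc = matvec(projection(w, h, n, f),
                            [eye[0], eye[1], eye[2], 1.0])
    xd, yd, zd = xc / wc, yc / wc, zc / wc
    return [vw / 2.0 * xd + vw / 2.0,
            vh / 2.0 * yd + vh / 2.0,
            0.5 * zd + 0.5]


def unproject(window, u_unproject):
    """Same steps as the shader: multiply, then divide by w."""
    pos = matvec(u_unproject, [window[0], window[1], window[2], 1.0])
    return [pos[0] / pos[3], pos[1] / pos[3], pos[2] / pos[3]]


W, H, N, F, VW, VH = 2.0, 1.5, 0.5, 50.0, 800.0, 600.0  # example values
u_unproject = matmul(inverse_projection(W, H, N, F), inverse_viewport(VW, VH))
eye = [0.3, -0.2, -5.0]
window = eye_to_window(eye, W, H, N, F, VW, VH)
recovered = unproject(window, u_unproject)
print(window, recovered)
```

Nothing here has to guess w_clip: the single divide by pos.w at the end takes care of the perspective divide, which is why the same code also works for an orthographic projection.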

As a sidenote, I consider glDepthRange() to be part of the viewport transformation.

That is exactly what I did: using the inverse matrix. But I’ve written it out so that the terms simplify a bit. If you do that you’ll get the formula Dark Photon explained (assuming you are using glDepthRange(0.0, 1.0)).

edit: unfortunately I wasn’t able to change my initial post anymore when I discovered what I did wrong. But I hope/assume that somebody looking for the derivation and understanding the steps that are involved going from eye coordinates to window coordinates and the other way around is able to understand what is going on by reading the rest of the thread.

To be clear: the steps in the initial post were correct, but some variables were mixed up. The formula `zw = zd * (f - n) / 2 + (n + f) / 2' uses the same far and near as the projection matrix, but it should use the near and far values set with glDepthRange (which default to 0 and 1). So that formula should become something like:

zw = zd * (drf - drn) / 2 + (drn + drf) / 2
with drf: the far value used in glDepthRange
with drn: the near value used in glDepthRange

Of course the inverse changes as well in that case:

zd = 2 * zw / (drf - drn) - (drn + drf) / (drf - drn)

Other than that my approach was correct. Using the correct formula one can derive the formula explained by Dark Photon (but only if glDepthRange(0.0, 1.0)).

In case other values are used for glDepthRange, or in case a more complex projection matrix is used (for example an off-center one, where -left != right), the final formula might be different (but the approach is still correct).
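
For completeness, a small Python sketch of the whole corrected reconstruction, with the glDepthRange values as explicit parameters. It assumes the centered projection matrix and the viewport at origin (0, 0) from the first post; the function names and the test numbers are made up for illustration.

```python
# Window coordinates + depth -> eye coordinates, for any glDepthRange.

def eye_to_window(xe, ye, ze, w, h, n, f, vw, vh, drn=0.0, drf=1.0):
    """Forward pipeline with the corrected depth-range formula."""
    A = 2.0 * n / w
    B = 2.0 * n / h
    C = -(f + n) / (f - n)
    D = -(2.0 * f * n) / (f - n)
    wc = -ze
    xd = A * xe / wc
    yd = B * ye / wc
    zd = (C * ze + D) / wc
    xw = vw / 2.0 * xd + vw / 2.0
    yw = vh / 2.0 * yd + vh / 2.0
    zw = zd * (drf - drn) / 2.0 + (drn + drf) / 2.0
    return xw, yw, zw


def window_to_eye(xw, yw, zw, w, h, n, f, vw, vh, drn=0.0, drf=1.0):
    """Inverse: undo depth range, solve for ze, then scale x and y."""
    C = -(f + n) / (f - n)
    D = -(2.0 * f * n) / (f - n)
    zd = 2.0 * zw / (drf - drn) - (drn + drf) / (drf - drn)
    ze = D / (-zd - C)
    xe = -ze * w / n * (xw / vw - 0.5)
    ye = -ze * h / n * (yw / vh - 0.5)
    return xe, ye, ze


params = (2.0, 1.5, 0.5, 50.0, 800.0, 600.0)  # w, h, n, f, vw, vh
point = (0.4, -0.3, -7.0)
window_default = eye_to_window(*point, *params)
window_custom = eye_to_window(*point, *params, drn=0.2, drf=0.8)
print(window_to_eye(*window_default, *params))
print(window_to_eye(*window_custom, *params, drn=0.2, drf=0.8))
```

With drn = 0 and drf = 1 the expression for ze reduces to the one Dark Photon gave; with other depth ranges only the zd line changes.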