From window coordinates to world coordinates

Hello.

I’ve read various tutorials on “3D picking” around the web, but most of them seem to be either self-contradictory, wrong, or gloss over points without explanation. I’ve searched the forum here and only found references to posts that then link to other poor-quality explanations off-site.

Essentially, I’m trying to implement ray picking. That is, I’m transforming a point from OS-specific window coordinates to GL world coordinates, and then using that to cast a ray into the world in order to determine which objects were under the cursor when the user clicked the mouse. I already have working spatial data structures to do ray casting, so the problem is solely in doing the correct transformations to get from window space to world space.

I’m using the songho.ca page (www.songho.ca/opengl/gl_transform.html) on matrix transformations as a reference.

According to every text I can find on the subject, I need to do the following:

[ol]
[li]Transform window coordinates to viewport coordinates (by flipping the Y value and removing the viewport translation, for example)
[/li][li]Transform the viewport coordinates to clip-space coordinates
[/li][li]Multiply the clip-space coordinates by the inverse of the (projection * modelview) matrix
[/li][/ol]
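For what it’s worth, my first step currently looks something like this (a Python sketch; the viewport layout (vx, vy, width, height) and the top-left origin for OS mouse coordinates are my assumptions, not anything from a spec):

```python
# Step 1 sketch: OS window coordinates -> normalized device X/Y.
# Assumes a viewport of (vx, vy, width, height) and OS mouse coordinates
# with the origin at the top-left, so Y must be flipped to match OpenGL's
# bottom-left window-space origin.

def window_to_ndc(win_x, win_y, vx, vy, width, height):
    # Flip Y from top-left origin to bottom-left origin.
    gl_y = height - win_y
    # Map [vx, vx + width] -> [-1, 1] and [vy, vy + height] -> [-1, 1].
    ndc_x = 2.0 * (win_x - vx) / width - 1.0
    ndc_y = 2.0 * (gl_y - vy) / height - 1.0
    return ndc_x, ndc_y
```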

The first and third steps are no problem.

The problem is in the second step. In order to get from clip-space to normalized device coordinates, the clip-space coordinates are divided by their own [var]w[/var] component. If all I have are 2D viewport coordinates, where do the [var]z[/var] and [var]w[/var] coordinates come from?

I think what you may be missing is you can’t back-project a 2D point and get a 3D point. There are an infinite number of points lying underneath the center of a pixel along the view ray. You need a window-space depth (Z) at that 2D coordinate (X,Y) – in other words a 3D point – to backproject into a 3D point.

However, it sounds like all you want is a 3D ray from the eyepoint to a pixel anyway. So the depth value is just the near clip plane (where the “window” conceptually is). And you don’t need all this back-projection stuff. Just take the window-space pixel coordinate (0…1, 0…1), map it to (L…R, T…B), and tack on -N for the depth value. This gives you an eye-space vector from the eyepoint (0,0,0) through the center of the specified pixel (L…R, T…B, -N). Now you’ve got your ray and are ready to trace.
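That mapping can be sketched in a few lines (Python here; the frustum parameters left, right, bottom, top, near are assumed to match whatever was passed to the projection):

```python
# Eye-space ray construction as described above: map a pixel in
# [0,1] x [0,1] window space onto the near-plane rectangle and use the
# result as a ray direction from the eye at the origin. The frustum
# bounds are assumed to be the same ones used to build the projection.

def eye_space_ray(px, py, left, right, bottom, top, near):
    x = left + px * (right - left)
    y = bottom + py * (top - bottom)
    z = -near  # eye space looks down -Z; the near plane sits at z = -near
    return (x, y, z)  # direction from the eye at (0, 0, 0)
```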

[QUOTE=Dark Photon;1248202]I think what you may be missing is you can’t back-project a 2D point and get a 3D point. There are an infinite number of points lying underneath the center of a pixel along the view ray. You need a window-space depth (Z) at that 2D coordinate (X,Y) – in other words a 3D point – to backproject into a 3D point.
[/QUOTE]

I see; that makes sense. However:

[QUOTE=Dark Photon;1248202]
However, it sounds like all you want is a 3D ray from the eyepoint to a pixel anyway. So the depth value is just the near clip plane (where the “window” conceptually is). And you don’t need all this back-projection stuff. Just take the window-space pixel coordinate (0…1, 0…1), map it to (L…R, T…B), and tack on -N for the depth value. This gives you an eye-space vector from the eyepoint (0,0,0) through the center of the specified pixel (L…R, T…B, -N). Now you’ve got your ray and are ready to trace.[/QUOTE]

In this case, no, I don’t think so. For this particular program, there are other requirements that mean I do need to do the ray casting in world space (such as allowing the user to click multiple times to select progressively more distant but overlapping objects). It seems easier to achieve that by reversing the projection with a few simple matrix operations than by involving the rest of the rendering pipeline at this stage.

Assuming that I do want the ray in world space: If I understand correctly, assuming I have normalized device coordinates, then choosing 0 for the [var]z[/var] component will give me a value on the near clipping plane, and choosing 1 for the [var]z[/var] component will give me a value on the far clipping plane? Presumably the [var]w[/var] component can just be set to [var]1[/var]?

Then take that eye-space vector and multiply by the inverse viewing transform.
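That last step might look like this (a Python sketch; the 4x4 inverse view matrix is assumed to be given as a list of rows, and the helper name is just illustrative):

```python
# Transform an eye-space ray direction into world space with the inverse
# view matrix. A direction uses w = 0, so the translation part of the
# inverse view matrix is ignored (only rotation/scale apply).

def transform_direction(inv_view, d):
    v = (d[0], d[1], d[2], 0.0)  # w = 0: directions ignore translation
    out = [sum(inv_view[i][j] * v[j] for j in range(4)) for i in range(4)]
    return out[:3]
```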

[QUOTE]Assuming that I do want the ray in world space: If I understand correctly, assuming I have normalized device coordinates, then choosing 0 for the [var]z[/var] component will give me a value on the near clipping plane[/QUOTE]

No, NDC runs from -1…1, in X, Y, and Z. So NDC near clip Z = -1.

You’re thinking window-space, which runs from 0…1 (assuming your glDepthRange is set to the default values of 0, 1).
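A quick numeric sanity check of the whole unproject step, in pure Python (the frustum values and helper names here are just illustrative, and I’m inverting only the projection; use the inverse of projection * view to land in world space instead of eye space):

```python
# Build a symmetric glFrustum-style projection, invert it numerically,
# and unproject NDC points with w = 1 followed by a perspective divide.
# NDC z = -1 should land on the near plane, NDC z = +1 on the far plane.

def frustum(r, t, n, f):
    # Symmetric perspective frustum: left = -r, bottom = -t.
    return [[n / r, 0, 0, 0],
            [0, n / t, 0, 0],
            [0, 0, -(f + n) / (f - n), -2 * f * n / (f - n)],
            [0, 0, -1, 0]]

def invert4(m):
    # Gauss-Jordan elimination with partial pivoting on [m | I].
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(4)]
         for i, row in enumerate(m)]
    for col in range(4):
        pivot = max(range(col, 4), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(4):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [row[4:] for row in a]

def unproject(ndc_x, ndc_y, ndc_z, inv_proj):
    # Multiply (x, y, z, 1) by the inverse matrix, then divide by w.
    v = (ndc_x, ndc_y, ndc_z, 1.0)
    out = [sum(inv_proj[i][j] * v[j] for j in range(4)) for i in range(4)]
    return [c / out[3] for c in out[:3]]
```

With near = 1 and far = 100, unprojecting NDC (0, 0, -1) gives an eye-space z of -1 (the near plane) and NDC (0, 0, +1) gives -100 (the far plane), which is exactly the -1…1 NDC range described above.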

THANK YOU

This was what was consistently screwing up the test implementations I tried to write.

I’d been trying for a few days to get something working, and when it failed, I decided to start from scratch and try to verify each step. This was the key piece of information I’d gotten wrong each time.

Sure thing! Glad you got it figured out.