Texture space is not tangent space: Discussion

Just thought I'd drop this in here for comment, and because it helps me think it through. Basically it is a cheaper version of tangent space useful for floors, walls, and ceilings, none of which need a tangent matrix per vertex if they are flat. Unfortunately, I cannot find an image program that creates world space normal maps; normals are defined in tangent space by the Gimp and Photoshop normal map plugins, etc., so a tangent matrix of some sort is still needed.

Let's say we have a normal N=(0.2, 0.3, 0.87) (roughly unit length), generated in Gimp from a height map and stored in a normal map. As you can see, the greater part of the normal (0.87) points out of the screen toward the viewer, who is sitting somewhere up the positive Z axis. Were this normal map applied to a south-facing wall (one that faces the viewer), we could read the normal from the map and use it directly in lighting calculations because, for all intents and purposes, it would be the same as a world/object space normal map.
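
As a sketch of that special case, a minimal fragment shader might look like the following (the uniform/varying names are mine, not from any particular codebase): for a wall that directly faces the viewer down the positive Z axis, the normal read from the map can be used as-is against a world space light.

uniform sampler2D normalMap;
uniform vec3 lightPos;     // light position in world space
varying vec3 worldPos;     // world space surface position from the vertex shader
varying vec2 uv;

void main()
{
    // unpack the stored normal from the [0,1] range back to [-1,1]
    vec3 N = normalize(texture2D(normalMap, uv).rgb * 2.0 - 1.0);
    vec3 L = normalize(lightPos - worldPos);   // direction from the pixel to the light
    float diffuse = max(dot(N, L), 0.0);
    gl_FragColor = vec4(vec3(diffuse), 1.0);
}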

As a quick aside, it is important to understand the difference between object and world space. While it is said that models are defined in object space, they are not really; they are defined in world space, only they have not been moved… object space is world space that has had no movement applied to it. As soon as the object is moved, a transformation matrix is applied to it (glTranslate/glRotate, etc., do this); the object now consists of its object space coordinates plus a transformation of those coordinates that changes the location/orientation of the model in world space. If you reversed the transformation, the model would return to its object space coordinates.

The normal map situation becomes problematic if we want to use this normal map on the floor, for there we want the greatest extent of the normals to point up the y axis. Intuitively, however, it seems that all we need do is bend the coordinate frame in which the original vertex normal and the perturbed normal (N`) sit through 90 degrees about the X axis… that is, we want to glue the perturbed normal to the Z axis and rotate the Z axis until it points up; this will drag the perturbed normal to where we want it, pointing roughly up the y axis but correctly offset from it.

We can do that by providing a base frame and multiplying the perturbed normal by the inverse of that frame. What is a base frame? It is a set of vectors that define the x, y, and z axes of a 3D coordinate frame. World space defines the standard base frame:

x=(1.0, 0.0, 0.0)
y=(0.0, 1.0, 0.0)
z=(0.0, 0.0, 1.0)

Any vector is only meaningful if it is specified in a coordinate frame. The vector vec=(0.2, 0.3, 0.87) is actually:

vec dot standard base frame, or:
vec.x=(1.0, 0.0, 0.0).(0.2, 0.3, 0.87)=0.2
vec.y=(0.0, 1.0, 0.0).(0.2, 0.3, 0.87)=0.3
vec.z=(0.0, 0.0, 1.0).(0.2, 0.3, 0.87)=0.87

which is to say, 0.2 units along the x axis, 0.3 units up the y axis, and 0.87 units out along the z axis of the base frame.

If we create a new base frame and do the same dot products with that vector, but using the inverse of the new base frame, the vector will be rotated as if it were glued to the standard base frame while the standard base frame was rotated to align with the new base frame.

The base frame we need takes the world space y axis and uses it as the zbase of the new frame, the x axis remains unchanged, and the new ybase points down the negative Z axis. The new base frame is:

xbase=(1.0, 0.0, 0.0)
ybase=(0.0, 0.0, -1.0)
zbase=(0.0, 1.0, 0.0)

We need the inverse of this base frame, which, because the frame is orthonormal (the axes are unit length and perpendicular to each other), we can get by transposition (basically, swap the rows and columns):

xbase`=(1.0, 0.0, 0.0)
ybase`=(0.0, 0.0, 1.0)
zbase`=(0.0, -1.0, 0.0) 
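
For reference, the frame and its inverse could be written as GLSL constants like this. Note that GLSL's mat3 constructor takes its values column by column, so the columns below are arranged to reproduce the rows listed above.

const mat3 frame = mat3(        // rows: xbase, ybase, zbase
    1.0,  0.0,  0.0,            // first column
    0.0,  0.0,  1.0,            // second column
    0.0, -1.0,  0.0 );          // third column

const mat3 frameInv = mat3(     // rows: xbase`, ybase`, zbase` (the transpose of frame)
    1.0,  0.0,  0.0,
    0.0,  0.0, -1.0,
    0.0,  1.0,  0.0 );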

Using dot products (which are just the matrix multiplication written out row by row, so the effect is identical) we can convert the perturbed normal so that it points up the y axis:

x`=xbase`.N`=(1.0, 0.0, 0.0).(0.2, 0.3, 0.87)=(1.0*0.2+0.0+0.0)=0.2
y`=ybase`.N`=(0.0, 0.0, 1.0).(0.2, 0.3, 0.87)=(0.0+0.0+1.0*0.87)=0.87
z`=zbase`.N`=(0.0, -1.0, 0.0).(0.2, 0.3, 0.87)=(0.0+(-1*0.3)+0.0)=-0.3

Which gives us the normal we are looking for:

N``=(0.2, 0.87, -0.3)
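
As a small sketch, the same three dot products can be wrapped in a GLSL function (the function name is mine):

vec3 rotateToWorld(vec3 n)
{
    // rows of the inverted frame, dotted with the perturbed normal
    vec3 xInv = vec3(1.0,  0.0,  0.0);
    vec3 yInv = vec3(0.0,  0.0,  1.0);
    vec3 zInv = vec3(0.0, -1.0,  0.0);
    return vec3(dot(xInv, n), dot(yInv, n), dot(zInv, n));
}
// rotateToWorld(vec3(0.2, 0.3, 0.87)) returns (0.2, 0.87, -0.3)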

For the hell of it, and because it is extremely important for what follows, let's dot the new vector with the non-inverted new base matrix:

x=xbase.N``=(1.0, 0.0, 0.0).(0.2, 0.87, -0.3)=(1.0*0.2+0.0+0.0)=0.2
y=ybase.N``=(0.0, 0.0, -1.0).(0.2, 0.87, -0.3)=(0.0+0.0+(-1.0*-0.3))=0.3
z=zbase.N``=(0.0, 1.0, 0.0).(0.2, 0.87, -0.3)=(0.0+1.0*0.87+0.0)=0.87

leaving us with the original perturbed normal (0.2, 0.3, 0.87). The inverse of the frame matrix rotates a vector along with the standard base frame as that frame is turned into the new base (which is how N` became the world space normal N``); the non-inverted matrix performs the opposite rotation, carrying a vector back out of the new base and into the standard, world space frame (which is how N`` returned to N`).
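
And the reverse trip as a companion sketch, dotting against the rows of the non-inverted frame:

vec3 rotateToFrame(vec3 v)
{
    vec3 xb = vec3(1.0, 0.0,  0.0);
    vec3 yb = vec3(0.0, 0.0, -1.0);
    vec3 zb = vec3(0.0, 1.0,  0.0);
    return vec3(dot(xb, v), dot(yb, v), dot(zb, v));
}
// rotateToFrame(vec3(0.2, 0.87, -0.3)) returns the original (0.2, 0.3, 0.87)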

Now let's consider the light vector. The light has a position in world space, e.g. lightPos=(10.0, 10.0, 5.0). Let's say the pixel the fragment shader is currently working on is at pos=(2.0, 0.0, 1.5), i.e., it is part of the floor we have been considering… we assume pos is the vertex position transformed into world space (the model transform applied to gl_Vertex, not gl_Position, which is a clip space value) and interpolated into the fragment shader. lightDir would be:

lightDir=normalize(pos-lightPos)=normalize(2.0-10.0, 0.0-10.0, 1.5-5.0)=normalize(-8.0, -10.0, -3.5)=(-0.6, -0.75, -0.26)
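
A sketch of that computation in GLSL, assuming a hypothetical varying called worldPos that carries the interpolated world space surface position:

uniform vec3 lightPos;     // world space light position, e.g. (10.0, 10.0, 5.0)
varying vec3 worldPos;     // world space surface position, e.g. (2.0, 0.0, 1.5)

vec3 computeLightDir()
{
    // points from the light toward the pixel, matching the convention above;
    // with the example numbers this is roughly (-0.6, -0.75, -0.26)
    return normalize(worldPos - lightPos);
}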

Note that lightDir points from the light toward the pixel. Note also that the perturbed normal we calculated above would work correctly with lightDir; that is, the normal points in the general direction the light is coming from. The lightDir vector can be envisaged as a vector in the same new base frame: its tail attaches to the point defined by pos (the pixel being worked on), and it heads in a direction somewhat opposite the normal vector.

But if we can return N`` to N` using the non-inverted new base matrix, and if the light vector can be said to be defined within the same new base frame as N``, then we can also turn lightDir:

l.x=xbase.lightDir=(1.0, 0.0, 0.0).(-0.6, -0.75, -0.26)=-0.6
l.y=ybase.lightDir=(0.0, 0.0, -1.0).(-0.6, -0.75, -0.26)=0.26
l.z=zbase.lightDir=(0.0, 1.0, 0.0).(-0.6, -0.75, -0.26)=-0.75

Giving us lightDir`=(-0.6, 0.26, -0.75)

This is kind of difficult to envisage, so grab a pen and hold it up to represent the y axis (which is the z axis of the new base frame). Now grab another pen and put its tail at the bottom of the y pen to represent lightDir; it will point to the left, down, and away from you. Now rotate both pens as if they were welded together so that the y axis pen points directly at you; the lightDir pen will now point to the left, up, and away. Especially note that the y component moves from negative to positive.

OK, so what does all this mean? It means that the base frame can be used to convert lightDir, and by extension eyeDir, into the correct position relative to the perturbed normal read from the normal map… ready to perform the required lighting calculations. Essentially, it acts like a texture space matrix, but we have had no need of uv coords to construct a tangent space, nor any need to attach the T vector (and possibly the B vector) as attributes to each vertex; we need only pass one frame as a uniform for the entire floor, wall, ceiling, etc. Furthermore, because z (b) values in normal maps are mapped differently from x (r) and y (g) values, we might do better to recalculate the z value from the x and y and use the z channel to store some other goodie.
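
Pulling the pieces together, a minimal fragment shader sketch might look like this (the uniform/varying names are mine; the frame is passed as a single mat3 uniform whose columns are xbase, ybase, zbase, which for the floor above is mat3(vec3(1,0,0), vec3(0,0,-1), vec3(0,1,0))):

uniform sampler2D normalMap;
uniform vec3 lightPos;        // light position in world space
uniform mat3 surfaceFrame;    // columns = xbase, ybase, zbase of this flat surface
varying vec3 worldPos;        // world space surface position from the vertex shader
varying vec2 uv;

void main()
{
    // tangent space normal from the map, unpacked from [0,1] to [-1,1]
    vec3 N = normalize(texture2D(normalMap, uv).rgb * 2.0 - 1.0);

    // world space vector from the light toward the pixel, as above
    vec3 lightDir = normalize(worldPos - lightPos);

    // rotate lightDir into the surface frame: vector * matrix in GLSL is
    // exactly the three dot products against the frame's axes
    vec3 L = lightDir * surfaceFrame;

    // diffuse term; negate L so it points from the pixel toward the light
    float diffuse = max(dot(N, -L), 0.0);
    gl_FragColor = vec4(vec3(diffuse), 1.0);
}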

As far as I can see, what I have said is right, but before I continue I just want to open what has been said to any mathemagicians who might be lurking, ready with mathemagical spells that would undo all my intuitions on this matter. I have not extensively tested this hypothesis, and quite frankly getting this far finds me several fishes short of a bicycle, so criticism about the truth of these intuitions is appreciated.

If the mathemagicians have all been purged (“Huzzah” to quote Hiro), I would just like to end by considering some of the shortfalls that strike me about texture space and expand on how base frames as I have suggested above might be implemented over complex models.

Texture space tangent mapping seems to face several problems:

  1. one often-submitted method for generating the texture space matrix is to calculate T, take N from the model, and use cross(T, N) to get B (see the sketch after this list). The problem here is that N is most often exported from the modelling program as a smoothed vertex normal, not an unsmoothed face normal, so it mostly will not be perpendicular to T, and the matrix will not be orthonormal; without a pet mathemagician I do not understand how far out a non-orthonormal frame would throw a vector during rotation

  2. producing an orthonormal frame using the face normal and calculating both T and B is also mostly problematic, because any stretch introduced to the texture during the unwrapping of the model equates to non-perpendicular T and B vectors, and many parts of complex models suffer thus

  3. another problem, which I suspect explains the difficulty getting texture spaced models to behave coherently around seams, is that different models will have different texture spaces. Thus if you have a model of a head that you want to place on the model of a shoulder, then even if the normals in the maps cohere at the pixels, I do not know if they should be expected to return the same lightDir values if the texture spaces are different.

  4. maybe the seam problem is due to a mixture of the above and this. Another problem can be seen by exposing the misnomer that is calling texture space tangent space. A tangent to a 2D circle is a line perpendicular to the normal at a point on the circle's circumference. A tangent to a 3D sphere is a plane perpendicular to the normal at a point on the sphere's surface. The normals being talked about here are smoothed vertex normals. Textures mostly do not lie on tangents to models for exactly the same reason as there is a difference between smoothed and unsmoothed normals. Now, if the lightDir is considered a vector in texture space, and if texture space differs from one model to the next, including what is taken to be the tangent and therefore the light vector relative to the pixel, then I'll leave the rest for Socrates.
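
Here is the sketch referenced in point 1 (the function name is mine): T as derived from the uv mapping, N the model's smoothed vertex normal, and B from cross(T, N). The dot(N, T) comment is where the orthonormality worry shows up.

mat3 buildTBN(vec3 T, vec3 N)
{
    T = normalize(T);
    N = normalize(N);
    // dot(N, T) is zero only when N is perpendicular to T; with smoothed
    // vertex normals it usually is not, so the matrix below is not orthonormal
    // (a common repair, not part of the text, is T = normalize(T - N * dot(N, T)))
    vec3 B = cross(T, N);
    return mat3(T, B, N);
}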

This brings me to the final section: how the above could be exploited on complex models. I haven't tried it, but intuitively I should think every vertex could have an orthonormal base frame constructed about it. Take the smoothed normal as the zbase. Cross the zbase with world y (and normalize) to get the xbase. Cross the zbase with the xbase to get the ybase. If the zbase is (nearly) parallel to world y, then create the ybase first by crossing the zbase and world x. This would give a true orthonormal frame universal to all models.
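
As a sketch of that per-vertex construction (function name mine; how to complete the frame in the world-y-parallel case is my assumption, since the text only says to build the ybase first):

mat3 buildFrame(vec3 smoothedNormal)
{
    vec3 zb = normalize(smoothedNormal);
    vec3 xb, yb;
    if (abs(dot(zb, vec3(0.0, 1.0, 0.0))) > 0.999)
    {
        // normal (nearly) parallel to world y: build the ybase first from world x
        yb = normalize(cross(zb, vec3(1.0, 0.0, 0.0)));
        xb = normalize(cross(yb, zb));
    }
    else
    {
        xb = normalize(cross(zb, vec3(0.0, 1.0, 0.0)));
        yb = normalize(cross(zb, xb));
    }
    return mat3(xb, yb, zb);   // columns hold the frame's axes
}
// for a floor normal of (0.0, 1.0, 0.0) this returns xbase=(1,0,0), ybase=(0,0,-1),
// zbase=(0,1,0), the same frame used in the worked example above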

Author Stephen Jones

Your usage of the term space is maybe a bit different from the way it is used in computer graphics. If you think of a (geometric) space as a set of points, like Euclid, then object and world (and clip and tangent and …) space are indeed the same, as every point in one of these spaces also exists in all the others.

The different “spaces” used in OpenGL are defined by their basis, and the basis defines the coordinates of every point. The same point will usually have different coordinates relative to the world basis and the object basis, and you can transform from one coordinate system to another by a matrix multiplication.

As long as you’re only using linear operations you can calculate in any coordinate system, but the issue becomes tricky when you use dot products to compute lengths or angles between vectors, because these are not independent of the chosen basis.

The rule is that dot products are identical in two coordinate systems iff the transformation matrix is orthogonal. So if your TBN matrix is not orthogonal then calculating lighting in tangent space coordinates and world space coordinates will not be the same.
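
To make that last point concrete, a small sketch with made-up numbers: a deliberately non-orthogonal matrix changes the dot product of the vectors it transforms.

const mat3 skewedTBN = mat3(
    vec3(1.0, 0.0, 0.0),    // T
    vec3(0.3, 1.0, 0.0),    // B, deliberately not perpendicular to T
    vec3(0.0, 0.0, 1.0) );  // N

float dotProductError(vec3 a, vec3 b)
{
    // zero for an orthogonal matrix; non-zero for skewedTBN,
    // e.g. a = b = vec3(0.0, 1.0, 0.0) gives 1.09 - 1.0 = 0.09
    return dot(skewedTBN * a, skewedTBN * b) - dot(a, b);
}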