Cascaded Shadow Mapping: Projection distortions

Hello there. I’ve been trying to switch over from standard shadow mapping to cascaded shadow mapping, and although I’ve made pretty good progress, I’ve hit a bit of a snag.
I’ve been scouring the internet for over a week now for resources on cascaded shadow maps, so my current implementation is somewhat jury-rigged together, but it at least produces shadows, so I’m partway there.

My issue:
My issue is, I think, a math issue. I can’t quite get the projection right for rendering into a given shadow split / cascade level.
My camera’s rotation doesn’t properly shift where the light’s cascades should be focused, and furthermore, when the camera rotates, the entire shadow map tends to squish towards the center.
See i.imgur.com/JRajdIb.gif

What does happen:
I’m fairly certain that my calculation of the split distances is correct. I’ve debugged it several times and triple-checked the math by doing it by hand, getting the same expected results.
Also, if I move an object between the split regions, the shadow quality does degrade, as expected. I can see this firsthand by colorizing the regions that lie within any given cascade of the shadow map.
My scene is simple. It contains only a few basic objects and a single directional light to cast shadows, so there is nothing else to really interfere with my results.
Lastly, I do have my shadow textures set up in a texture array, and sampling from any given one works correctly.

When I tell my directional light to create a shadow, I perform the following (worry not about inefficiencies at this point):

float lambda = 0.5;    // Lambda value for split distance calc.
float n = 1.0f;        // Near plane
float f = 100000.0f;   // Far plane
float m = 6;           // 6 split intervals
float Ci[7];           // Split distances stored here
Ci[0] = n;             // Base split = near plane

// 6 levels of shadows
for (int x = 0; x < 6; x++)
{
    // Calculate the split distance
    float cuni = n+((f-n)*((x+1)/m));
    float clog = n*powf(f/n, (x+1)/m);
    float c = lambda*cuni + (1-lambda)*clog;
    Ci[x+1] = c;

    QMatrix4x4 cameraModelMatrix = camera->getModelMatrix();

    // Frustum dimensions at the far split distance (assumes a 90-degree vertical FOV)
    float frustumHeight = 2.0f * Ci[x+1] * tanf((90.0f * 0.5f * M_PI) / 180.0f);
    float frustumWidth = frustumHeight * ((float)camerasize.width() / (float)camerasize.height());

    // Corners of the frustum slice (near plane at Ci[0], far plane at Ci[x+1])
    QVector3D corners[8];
    corners[0] = QVector3D(-(frustumWidth/2), -(frustumHeight/2), Ci[0]);
    corners[1] = QVector3D( (frustumWidth/2), -(frustumHeight/2), Ci[0]);
    corners[2] = QVector3D( (frustumWidth/2),  (frustumHeight/2), Ci[0]);
    corners[3] = QVector3D(-(frustumWidth/2),  (frustumHeight/2), Ci[0]);
    corners[4] = QVector3D(-(frustumWidth/2), -(frustumHeight/2), Ci[x+1]);
    corners[5] = QVector3D( (frustumWidth/2), -(frustumHeight/2), Ci[x+1]);
    corners[6] = QVector3D( (frustumWidth/2),  (frustumHeight/2), Ci[x+1]);
    corners[7] = QVector3D(-(frustumWidth/2),  (frustumHeight/2), Ci[x+1]);

    // Transform corners from camera space into world space using the camera's model matrix
    for (int z = 0; z < 8; z++)
        corners[z] = cameraModelMatrix*corners[z];

    // Calculate bounding box
    QVector3D min(INFINITY,INFINITY,INFINITY), max(-INFINITY,-INFINITY,-INFINITY);
    for (int z = 0; z < 8; z++)
    {
        if (min.x() > corners[z].x())
            min.setX( corners[z].x() );
        if (min.y() > corners[z].y())
            min.setY( corners[z].y() );
        if (min.z() > corners[z].z())
            min.setZ( corners[z].z() );
        if (max.x() < corners[z].x())
            max.setX( corners[z].x() );
        if (max.y() < corners[z].y())
            max.setY( corners[z].y() );
        if (max.z() < corners[z].z())
            max.setZ( corners[z].z() );
    }

    // Create Crop Matrix
    float scaleX, scaleY, scaleZ;
    float offsetX, offsetY, offsetZ;
    scaleX = 2.0f / (max.x() - min.x());
    scaleY = 2.0f / (max.y() - min.y());
    offsetX = -0.5f * (max.x() + min.x()) * scaleX;
    offsetY = -0.5f * (max.y() + min.y()) * scaleY;
    scaleZ = 1.0f / (max.z() - min.z());
    offsetZ = -min.z() * scaleZ;

    QMatrix4x4 crop( scaleX,  0.0f,    0.0f,    offsetX,
                       0.0f,    scaleY,  0.0f,    offsetY,
                       0.0f,    0.0f,    scaleZ,  offsetZ,
                       0.0f,    0.0f,    0.0f,    1.0f);

    QMatrix4x4 projection;
    projection.ortho(-1, 1, -1, 1, -1, 1);

    crop = projection * crop;

    /*...SEND VALUES TO SHADER...*/

    /*..RENDER SCENE INTO THIS SHADOW TEXTURE...*/
}
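For reference, the split computation in the loop above follows the “practical split scheme” from the Parallel-Split Shadow Maps chapter of GPU Gems 3, which blends a uniform and a logarithmic distribution (the chapter weights the logarithmic term by lambda; with lambda = 0.5 as here, the two orderings are identical):

C_i = \lambda\, C_i^{\log} + (1 - \lambda)\, C_i^{\mathrm{uni}}, \qquad
C_i^{\log} = n \left( \frac{f}{n} \right)^{i/m}, \qquad
C_i^{\mathrm{uni}} = n + (f - n)\,\frac{i}{m}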

In my fragment shader, to determine the light-space position of a fragment, I do:

vec4 LightSpacePos = LightCrop[i] * LightViewMatrix * WorldPosition;

And the rest really isn’t needed, as the shadows work, and so does displaying the regions which fall under each texture.

Although I understand the underlying basics of how this should work, the specific implementation is what has me a little stumped. I’ve seen the NVidia slides on it, the GPU Gems article, as well as over a dozen other websites.

Can anyone explain to me how I should be correctly determining the minimum and maximum values required to properly create the crop matrix?

I will gladly provide any further information if anyone wants to help me.

I think you might be making this harder than it is. You build the light’s viewing and projection transforms just like you build the eye’s viewing and projection transforms. Same routines! The only difference comes in if your light source is a directional light (infinite distance) instead of a point light source. In that case, use a glOrtho-like routine (orthographic) to build the projection transform instead of a glFrustum-like routine (perspective). Then just multiply the transforms together in the right order.
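For instance, here’s a minimal sketch of that setup with Qt’s matrix classes; the light direction, scene center, and ortho bounds below are made-up placeholders, not values to copy as-is:

// A directional light has no position, only a direction. Place a virtual
// "eye" back along that direction and build the view transform with a
// lookAt-style routine, then pair it with an orthographic projection.
QVector3D lightDir = QVector3D(-1.0f, -1.0f, -1.0f).normalized(); // placeholder direction
QVector3D sceneCenter(0.0f, 0.0f, 0.0f);                          // placeholder target

QMatrix4x4 lightView;
lightView.lookAt(sceneCenter - lightDir * 100.0f, // eye, pulled back along the light direction
                 sceneCenter,                     // center
                 QVector3D(0.0f, 1.0f, 0.0f));    // up

QMatrix4x4 lightProjection;
lightProjection.ortho(-50.0f, 50.0f, -50.0f, 50.0f, 1.0f, 200.0f); // placeholder bounds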

See the diagram here for details. I would make sure you clearly understand, if not memorize, this diagram. It’ll save you a ton of headache.

With that, your question translates into “what bounds do I choose for the light’s view frustum (aka your shadow frustum)?” How you choose these bounds depends on which Cascaded Shadow Mapping variant you’re implementing, but for starters you want to choose them so that you sample all of the shadow casters that could potentially cast shadows into that view frustum (or subfrustum).

Past that you get into interesting questions such as how do you tweak this to minimize shadow edge crawling, or more tightly bound the potential casters (or receivers), etc.

I don’t quite follow. To which parts are you referring when you say I’m overthinking my approach? I’ve been following several approaches, all pretty much the same; for instance, there is one posted within the nVidia Developer Zone’s GPU Gems 3 (specifically Parallel Split Shadow Maps), as well as a PDF available from nVidia on cascaded shadow maps (written by Rouslan Dimitrov). I’d link these, but I’m prevented from doing so as I’m a new user.

Despite that, all of these describe the method of creating a crop matrix; however, they don’t delve into much detail about how the bounding box / min-max values are calculated beforehand, hence my attempt at calculating the frustum size at the given split distance, multiplied by the camera’s model matrix.

Should I just be using the camera’s frustum width/height at the given split distance, divided by ±2, as the minimum and maximum values for the crop matrix?
All this math is just so confusing to me. However, I do know what I must do on the shader side; it’s just a matter of getting it right on the shadow-production side.

The light’s view frustum needs to be large enough to contain every shadowed point which lies within the camera’s view frustum. Ideally, it should not be significantly larger (otherwise you’re wasting pixels which could be used to provide better resolution within the visible region).

In the typical case, you can’t easily tell which parts of the scene could be shadowed, so you have to assume that any part could be. So the above constraint effectively requires the light’s view frustum to encompass the portion of the scene which lies within the camera’s view frustum (i.e. the visible portion).

If you have some kind of broad-phase visibility structure (e.g. hierarchical bounding boxes or whatever), then you can determine which regions intersect the camera’s view frustum to get a loose bound on the visible portion of the scene. Otherwise, you can just clip the camera’s view frustum to the scene’s bounding box (so that you don’t end up calculating shadows for the sky). If the far distance is small enough, you don’t even need to do that; you can just use the camera’s entire view frustum as the visible region.

Either way, you end up with some finite visible region in world space. You transform this to the light’s “eye space” then use its bounds to calculate the light’s view frustum so that the shadow map covers everything which can be seen. For a parallel light source (i.e. an orthographic projection), the minimal view frustum is just the visible region’s bounding box in the light’s eye space. For a point light source (perspective projection), the bounds are determined by the x/z and y/z slopes.

Cascaded shadow mapping just slices the camera’s view frustum into multiple distance ranges so that you can have a high-resolution shadow map covering the relatively small area close to the camera with the resolution getting coarser as you get farther from the camera (and thus need to cover a larger area). Each slice of the camera’s view frustum is processed as with the entire frustum when using a single (non-cascaded) shadow map.
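As a rough sketch of that per-slice fitting (assuming corners[] already holds the slice’s eight corners in world space and lightView is the light’s viewing transform; both names are placeholders):

// Transform the slice's corners into the light's eye space and take the
// axis-aligned bounding box THERE -- a world-space box won't be aligned
// with the light and will give a distorted projection.
QVector3D lsMin( INFINITY,  INFINITY,  INFINITY);
QVector3D lsMax(-INFINITY, -INFINITY, -INFINITY);
for (int i = 0; i < 8; i++)
{
    QVector3D p = lightView.map(corners[i]); // world space -> light eye space
    lsMin.setX(qMin(lsMin.x(), p.x()));  lsMax.setX(qMax(lsMax.x(), p.x()));
    lsMin.setY(qMin(lsMin.y(), p.y()));  lsMax.setY(qMax(lsMax.y(), p.y()));
    lsMin.setZ(qMin(lsMin.z(), p.z()));  lsMax.setZ(qMax(lsMax.z(), p.z()));
}

// For a parallel light, the minimal frustum is just this box. With the
// usual convention the light looks down -Z in its own eye space, so the
// near/far planes come from the negated Z range.
QMatrix4x4 lightProjection;
lightProjection.ortho(lsMin.x(), lsMax.x(), lsMin.y(), lsMax.y(),
                      -lsMax.z(), -lsMin.z());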

[QUOTE=Yattabyte;1279728]To which parts are you referring when you say I’m overthinking my approach? … Should I just be using the camera’s frustum width/height at the given split distance, divided by ±2, as the minimum and maximum values for the crop matrix?
All this math is just so confusing to me. [/QUOTE]

Hi Yattabyte. What I was referring to is that you are focusing on the cropMatrix, but it sounds like you haven’t figured out conceptually how the math works for the entire transform chain. It’ll be frustrating to try to figure out the cropMatrix in isolation if you treat the rest like a black box.

Not all shadow tutorials have the cropMatrix either. Some of them use a shadowMatrix, but they sometimes don’t have a lot of detail on the reasoning behind the math. The point is though, if you understand the concept, you can see how it maps to what they’re doing, and then extend that to your own purposes.

Conceptually you have two frustums in your scene: one for the camera and one for the light. You know how to compute viewing transforms for frustums: use a gluLookAt-like routine. You also know how to set up projection transforms for frustums: glFrustum- and glOrtho-like routines.

So now what do we need to do? We start with something like camera eye-space positions in the shader. We need to get that position from camera eye-space into the light’s NDC space. How? Camera eye-space -> world-space -> light eye-space -> light clip-space. Or in other words: pos * CAMERA_INVERSE_VIEWING * LIGHT_VIEWING * LIGHT_PROJECTION -> perspective divide. That’s pretty much it.

Except there’s one final piece. We don’t really want the light’s NDC space (X,Y,Z in -1…1); we want something with X,Y,Z in 0…1, because X,Y will be used as texture coordinates to look up into our shadow map, and Z will be compared against a 0…1 depth value pulled out of the depth texture. To do that, we basically just bolt on a scale-and-translate transform (scale X,Y,Z by (0.5,0.5,0.5) and then offset by (0.5,0.5,0.5)). This is sometimes called the “bias matrix” (whatever). As it turns out, we can push this in before the perspective divide, so for the total transform chain we’ve got:

pos * CAMERA_INVERSE_VIEWING * LIGHT_VIEWING * LIGHT_PROJECTION * BIAS_MATRIX -> perspective divide

Once you understand this conceptually, it pretty much boils down to: what frustums do you want to define for your camera and for your light? And conceptually that’s a much simpler problem than “how do I tweak this strange matrix in a transform chain I don’t understand?”
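As a concrete illustration, here’s roughly what that scale-and-translate bias looks like with Qt’s classes (note Qt uses column vectors, so the chain reads right-to-left, the reverse of the row-vector notation above; lightProjection and lightView are generic placeholder names):

// The "bias matrix": remaps X, Y, Z from [-1, 1] to [0, 1].
// Qt post-multiplies, so the scale is applied first, then the translate.
QMatrix4x4 bias;
bias.translate(0.5f, 0.5f, 0.5f);
bias.scale(0.5f);

// Full chain for a world-space position (column-vector convention):
// shadowCoord = bias * lightProjection * lightView * worldPos
QMatrix4x4 shadowMatrix = bias * lightProjection * lightView;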

Thank you for your help so far guys.

My issue right now is that I’m not entirely certain how my results should look. The regions containing each cascade should rotate with the camera, correct? Yet for some reason they don’t quite behave 100% correctly.
When the camera/eye looks up or down, the shadows within the texture skew dramatically, often throwing them off-screen:
webm.host/d93a6/vid.webm

I’ve simplified my code a bit:


QMatrix4x4 LightProjection;
LightProjection.ortho(0, 1, 0, 1, 0, 1);
for (int x = 0; x < 6; x++)
{
    /*...CALC. SPLIT DISTANCE...*/
  
    // Frustum dimensions at the far split distance (assumes a 90-degree vertical FOV)
    float frustumHeight = 2.0f * Ci[x+1] * tanf((90.0f * 0.5f * M_PI) / 180.0f);
    float frustumWidth = frustumHeight * ((float)camerasize.width() / (float)camerasize.height());

    QVector3D min = QVector3D(-(frustumWidth/2), -(frustumHeight/2), Ci[0]);
    QVector3D max = QVector3D( (frustumWidth/2),  (frustumHeight/2), Ci[x+1]);
    
    Crop = getCropMatrix(min,max);
    
    QMatrix4x4 final = Crop * LightProjection * LightView * CameraView;

    /*...BIND UNIFORMS, RENDER...*/    
} 

In the above code, I make my min-max values the frustum size at the given distance, and throw those into the crop matrix. Secondly, I multiply that matrix by an orthographic projection matrix, the light’s view matrix, then lastly the camera’s view matrix. The resulting matrix is titled “final”.
I perform the same order of operations for rendering into the shadow map (with the addition of multiplying it by each object’s own model matrix), and I also use it for transforming each fragment’s world position into the light’s perspective for performing the standard depth comparison.
“getCropMatrix” works identically to the code chunk in my opening post for calculating the crop.

I haven’t put in a bias matrix yet, because I just do it in the shader for now (once again, I’m trying to get this working first, then optimize later).

However, I can simply create a temporary view matrix for the camera which doesn’t use the camera’s rotation. In that case, the shadow cascades are just oriented in the direction of the light.

[QUOTE=Yattabyte;1279751]
Secondly, I multiply that matrix by an orthographic projection matrix, the light’s view matrix, then lastly the camera’s view matrix.[/QUOTE]
I can’t see how multiplying by both the light’s matrix and the camera’s matrix makes any sense. Unless you’re referring to the inverse of the camera’s matrix.

Have you got non-cascaded shadow mapping working? Do you understand how it works?

The light’s overall transformation (model-view, projection, crop, bias) needs to be such that any point in world space which ends up (after all camera-related transformations) inside the viewport must also end up (after all light-related transformations) inside the texture. Beyond that, a tight bound will give better shadow resolution than a loose bound. But if the constraint isn’t met, you’ll end up sampling outside the depth texture when determining whether a point is in shadow.
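To make that concrete, using the names from your code (just a sketch of the two valid starting points):

// If your shader positions are in WORLD space, the camera's matrix does
// not belong in the shadow chain at all:
QMatrix4x4 shadowFromWorld = Crop * LightProjection * LightView;

// If they're in the CAMERA'S EYE space, you must first return to world
// space via the INVERSE of the camera's view matrix:
QMatrix4x4 shadowFromEye = Crop * LightProjection * LightView * CameraView.inverted();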

[QUOTE=GClements;1279754]I can’t see how multiplying by both the light’s matrix and the camera’s matrix makes any sense. Unless you’re referring to the inverse of the camera’s matrix.

Have you got non-cascaded shadow mapping working? Do you understand how it works?
[/QUOTE]

Yes I have. I’m trying to switch over to cascaded shadow maps. I know that in order to perform the depth comparison one must transform the fragment from its world position into the light’s perspective. That logic all makes perfect sense to me, even if I’m describing it poorly. I’ve implemented it for directional lighting, point lighting, and spot lighting.
For cascaded shadow maps, I’m trying to implement it just for directional lighting for starters.
The whole point and process of having to provide more detailed textures closer to the camera is what is tripping me up. And I’ve tried the inverse of the camera’s view, but I haven’t noticed any difference.