This thread sort of forked into a shadow mapping discussion which got me thinking about shadow mapping in conjunction with fragment programs. Interestingly enough the spec states the following:
Interactions with ARB_shadow
The texture comparison introduced by ARB_shadow can be expressed in terms of a fragment program, and in fact use the same internal resources on some implementations. Therefore, if fragment program mode is enabled, the GL behaves as if TEXTURE_COMPARE_MODE_ARB is NONE.
Which seems really weird. This means there is no way to use dedicated filter-after-compare hw a.k.a “you get free PCF shadow maps for the cost of a texture lookup”.
It obviously makes life easier for ATI’s driver engineers (since the Radeon doesn’t have dedicated PCF hw AFAIK) but not for anyone else. You can’t get the nice cheap PCF on NVIDIA hw (which supports it), for example. The sane thing would be to remedy this in the spec, so that a tex lookup on a shadow texture simply expands to multiple instructions on hw that doesn’t have dedicated PCF functionality.
Of course, sometimes you might want the actual depth from the depth texture in addition to the PCF value, for example to blur the shadow with occluder-receiver distance like Angus pointed out in that other thread. So the best thing would be if there were two texture lookup instructions, shadow lookup and regular. This is how it works in glslang, so it seems weird that fragment programs are broken in this respect.
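To make the difference between the two kinds of lookup concrete, here is a rough CPU-side sketch in Python. It is not how any driver implements it. The function names, the 2x2 footprint layout and the LEQUAL-style compare are assumptions picked for illustration.

```python
def bilinear(values, fu, fv):
    """Blend a 2x2 footprint ((v00, v10), (v01, v11)) by the sub-texel fractions."""
    (v00, v10), (v01, v11) = values
    top = v00 * (1.0 - fu) + v10 * fu
    bottom = v01 * (1.0 - fu) + v11 * fu
    return top * (1.0 - fv) + bottom * fv


def regular_lookup(footprint, fu, fv):
    """Regular lookup: returns the filtered depth itself."""
    return bilinear(footprint, fu, fv)


def shadow_lookup(footprint, fu, fv, ref_depth):
    """Shadow lookup: compare each texel first, then filter the 0/1 results."""
    # Assumed compare: lit (1.0) when the reference depth is <= the stored depth.
    compared = tuple(
        tuple(1.0 if ref_depth <= d else 0.0 for d in row) for row in footprint
    )
    return bilinear(compared, fu, fv)


# Hypothetical footprint: three near texels and one far texel.
footprint = ((0.25, 0.25), (0.25, 0.75))
filtered_depth = regular_lookup(footprint, 0.5, 0.5)
pcf_value = shadow_lookup(footprint, 0.5, 0.5, 0.5)
print(filtered_depth, pcf_value)
```

The two results differ, which is why having both lookups available is useful: one gives you a depth to work with, the other a shadow coverage value.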
Anyway, since ATI hw doesn’t have the fancy specialised PCF functionality, I got to thinking about how to do fast PCF anyway and got a pretty good idea. If you use a four-channel depth map you can store the depth of four shadow map texels in one, so to speak. The first channel holds the depth at the current texel, the next ones the depth at u + 1 texel, u - 1 texel, v + 1 texel, etc. I don’t know if it will work at polygon edges. I think it will if you just render one depth channel and then copy it to all the other channels with the required offsets. The edge cases might be tricky though. This eats bandwidth like hell of course, primarily in the shadow map generation phase, but you also have to fetch four times the number of bits when accessing the shadow map.
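The "render one channel, then copy with offsets" step could look something like this on the CPU, written in Python just to pin down the indexing. The offset order and the clamp-to-edge handling at the border are my assumptions, and `pack_depth_map` is a made-up name.

```python
# Assumed channel layout: current texel, u + 1, u - 1, v + 1.
OFFSETS = [(0, 0), (1, 0), (-1, 0), (0, 1)]


def pack_depth_map(depth):
    """Build a four-channel map from a single-channel one (a list of rows)."""
    height = len(depth)
    width = len(depth[0])
    packed = []
    for v in range(height):
        row = []
        for u in range(width):
            texel = []
            for du, dv in OFFSETS:
                # Clamp to the edge; one assumed way to handle the map border.
                cu = min(max(u + du, 0), width - 1)
                cv = min(max(v + dv, 0), height - 1)
                texel.append(depth[cv][cu])
            row.append(tuple(texel))
        packed.append(row)
    return packed


depth = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
]
packed = pack_depth_map(depth)
print(packed[1][1])  # current texel plus its u + 1, u - 1 and v + 1 neighbours
```

On real hw this would presumably be a few render passes with shifted texture coordinates and a colour write mask per channel, which is where the extra generation bandwidth goes.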
However, to filter you just need one texture lookup, a per-component compare of the four stored depths against the fragment’s depth, and a DP4 with a weight vector to get your PCF value. Pretty neat. Has anyone done shadow maps on the Radeon? What did you do? Jason and Evan, didn’t one of you do the chimp demo? That had shadows, didn’t it?
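Roughly, the per-fragment math would be the following, again as a Python sketch rather than an actual fragment program. I am assuming a per-component compare (in the spirit of the SGE instruction) on the four stored depths before the dot product, since PCF filters the compare results rather than the depths. The equal weights and the lack of a depth bias are also just assumptions to keep it short.

```python
def sge(a, b):
    """Per-component 'set on greater or equal': 1.0 where a >= b, else 0.0."""
    return tuple(1.0 if x >= y else 0.0 for x, y in zip(a, b))


def dp4(a, b):
    """Four-component dot product."""
    return sum(x * y for x, y in zip(a, b))


def pcf_from_packed(texel, fragment_depth, weights=(0.25, 0.25, 0.25, 0.25)):
    """One packed fetch -> compare all four depths -> weighted sum."""
    # 1.0 for every stored depth that is not in front of the fragment.
    lit = sge(texel, (fragment_depth,) * 4)
    return dp4(lit, weights)


# Hypothetical packed texel: current, u + 1, u - 1, v + 1 depths.
texel = (0.5, 0.6, 0.4, 0.8)
print(pcf_from_packed(texel, 0.55))
```

To get proper bilinear-style weights you would derive the weight vector from the sub-texel position instead of using a constant, but the instruction count stays about the same.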
And I really think the fragment program spec should be updated to work in the sane way with shadow maps. The way it is now seems weird.