ARB_shadow & fragment programs

Hi.

I was wondering: it seems NVIDIA's hardware allows you to enable ARB_shadow while using fragment programs.

Is this how it is supposed to be? I mean, is it stated somewhere in the spec that they must do the z comparison whenever we do a lookup on texunits with it enabled?

Using ARB_shadow and ARB_fragment_program together would work just fine in a world without ATI.

There’s a paragraph in the fragment program spec stating that the R compare mode is ignored within fragment programs if ARB_shadow is enabled (making ARB_shadow worthless). That paragraph was accidentally left in when the spec was approved by the ARB. It’s a carry-over from ATI’s crappy GL_ATI_fragment_shader extension and was put there because ATI hardware still can’t do the correct filtering for shadow maps (in which an R-compare should be done per sample for bilinear filtering).

No one is happy about that restriction being there, especially Nvidia, which did not intend to let it slip through, and as you’ve noticed, doesn’t recognize it. To remedy the situation, the ARB recently approved the GL_ARB_fragment_program_shadow extension that defines new texture targets (e.g., SHADOW2D) to be used in fragment programs to get the correct filtering.

It’s uncertain whether ATI will support this extension on current hardware, since it would require them to add many instructions to the fragment program when it’s compiled. One TEX instruction with a SHADOW2D target would turn into four TEX instructions plus comparison instructions for each sample and texcoord offset arithmetic. Boo to ATI for this mess.

[This message has been edited by Eric Lengyel (edited 01-29-2004).]

Using ARB_shadow and ARB_fragment_program together would work just fine in a world without ATI.

I would point out that, in a world without ATi, we wouldn’t even have ARB_fragment_program; we’d be using NV_fragment_program.

It’s uncertain whether ATI will support this extension on current hardware, since it would require them to add many instructions to the fragment program when it’s compiled.

Actually, it’s a bit worse than that. The filtering is defined by the glTexParam value for texture filtering. Only instead of filtering the depth values before comparing, you have to do it afterwards (by the ARB_shadow spec). Because of that, ATi would have to do render-time compiling of the shader, since they don’t know the filtering value for the texture until then.
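To make the ordering point concrete, here is a minimal CPU-side sketch in Python. It is not driver or shader code; the sample depths, the equal weights, and the LEQUAL-style comparison are assumptions chosen for illustration, and both function names are hypothetical.

```python
def filter_then_compare(depths, weights, r):
    """Blend the stored depths first, then run a single comparison."""
    blended = sum(d * w for d, w in zip(depths, weights))
    return 1.0 if r <= blended else 0.0


def compare_then_filter(depths, weights, r):
    """Compare every sample against r, then blend the 0/1 results."""
    return sum((1.0 if r <= d else 0.0) * w for d, w in zip(depths, weights))


# Four neighbouring shadow-map texels straddling a shadow edge (made-up values).
depths = [0.25, 0.25, 0.75, 0.75]
weights = [0.25, 0.25, 0.25, 0.25]  # fragment at the centre of the 2x2 footprint
r = 0.5                             # fragment depth as seen from the light

hard = filter_then_compare(depths, weights, r)  # one binary answer
soft = compare_then_filter(depths, weights, r)  # fraction of samples that pass
```

With these numbers the first ordering gives a binary result, while the second gives a fractional value at the shadow edge, which is the behaviour described above for filtered shadow lookups.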

I wonder how ATi cards handle the equivalent functionality in glslang?

Originally posted by Korval:
[b] Actually, it’s a bit worse than that. The filtering is defined by the glTexParam value for texture filtering. Only instead of filtering the depth values before comparing, you have to do it afterwards (by the ARB_shadow spec). Because of that, ATi would have to do render-time compiling of the shader, since they don’t know the filtering value for the texture until then.

I wonder how ATi cards handle the equivalent functionality in glslang?[/b]

The full vanilla OpenGL in current programmable cards is “render time compilation” of shaders. You cache your state, generate the shader and off you go. The next time you need that shader you’ll get it from the cache.
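As a rough illustration of the "cache your state, generate the shader" idea, here is a small Python sketch. It does not reflect any real driver; `ShaderCache`, `generate_shader`, the state keys, and the pseudo-instruction lists are all hypothetical.

```python
class ShaderCache:
    """Generate a shader the first time a state combination is seen, then reuse it."""

    def __init__(self, generate):
        self._generate = generate
        self._cache = {}
        self.compiles = 0  # counts how often the generator actually ran

    def get(self, state):
        # Use the sorted state items as a hashable cache key.
        key = tuple(sorted(state.items()))
        if key not in self._cache:
            self._cache[key] = self._generate(state)
            self.compiles += 1
        return self._cache[key]


def generate_shader(state):
    """Hypothetical generator: returns pseudo-instructions for the given state."""
    if not state.get("shadow_compare"):
        return ("TEX",)
    # Assumption: linear filtering needs four compared samples, nearest needs one.
    samples = 4 if state.get("filter") == "linear" else 1
    return ("TEX",) * samples + ("SLT",) * samples


cache = ShaderCache(generate_shader)
first = cache.get({"shadow_compare": True, "filter": "linear"})
again = cache.get({"shadow_compare": True, "filter": "linear"})   # cache hit
other = cache.get({"shadow_compare": True, "filter": "nearest"})  # new state, new shader
```

Note how the generated instruction count depends on a texture parameter that is only known at draw time, so the same program can expand to different lengths.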

Okay, it was as I feared.

What about glslang, is the problem removed there?

The full vanilla OpenGL in current programmable cards is “render time compilation” of shaders. You cache your state, generate the shader and off you go.

Not true of nVidia’s cards. They still retain separate fixed-function hardware.

And, in any case, that’s not a good idea when you’re building shaders yourself. If I ask for a shader that is close to the resource cap, and a render-time parameter pushes this over the resource cap (say, adds too many instructions), what is the pipeline going to do?

What about glslang, is the problem removed there?

Yes. There is an explicit command for doing shadow compare texturing.

Originally posted by Korval:
[b]Not true of nVidia’s cards. They still retain separate fixed-function hardware.

And, in any case, that’s not a good idea when you’re building shaders yourself. If I ask for a shader that is close to the resource cap, and a render-time parameter pushes this over the resource cap (say, adds too many instructions), what is the pipeline going to do?

[/b]

I’m not sure at all what I am talking about, but evanGLizer’s statement makes a lot of sense… and about the shader cap limits, it’s easy for the driver to set a cap limit lower than the hardware’s one, so the hardware reserves space for its own generated resources… at least to me, it makes a lot more sense to have only one programmable pipeline than to have to switch between a fixed-function and a programmable one, depending on what the user did… unless it allowed using both at the same time (which would incur a performance loss, due to both fixed and programmable stages having to be processed for every fragment in the pipeline).

Once again, I’m only guessing wildly here… don’t take me too seriously

[This message has been edited by Jcl (edited 01-30-2004).]

Originally posted by Eric Lengyel:
It’s a carry-over from ATI’s crappy GL_ATI_fragment_shader extension and was put there because ATI hardware still can’t do the correct filtering for shadow maps (in which an R-compare should be done per sample for bilinear filtering).

So is there any way to emulate this kind of filtered shadow comparison in fragment programs on ATI hardware?

Sure, just do the filtering manually. Do four texture lookups in the shadow map and use SLT or whatever comparison function you want to compare with the R coord and then do a MAD with the filter weights to get a final PCF value. It’s extremely expensive compared to using the dedicated PCF hardware in nvidia chips, but it works.
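Here is a CPU-side Python sketch of that manual filtering, meant only to show the arithmetic a fragment program would carry out. The helper names (`fetch`, `slt`, `manual_pcf`), the clamp-to-edge addressing, and the half-texel offset convention are assumptions for illustration, not taken from any driver or spec text.

```python
import math


def fetch(shadow_map, x, y):
    """Nearest-texel lookup with clamp-to-edge addressing."""
    h, w = len(shadow_map), len(shadow_map[0])
    return shadow_map[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def slt(a, b):
    """Mimic the SLT instruction: 1.0 if a < b, otherwise 0.0."""
    return 1.0 if a < b else 0.0


def manual_pcf(shadow_map, s, t, r):
    """Return the lit fraction in [0, 1] for texcoord (s, t) and depth r."""
    h, w = len(shadow_map), len(shadow_map[0])
    # Texel-space position, shifted so texel centres fall on integers.
    u = s * w - 0.5
    v = t * h - 0.5
    x0, y0 = math.floor(u), math.floor(v)
    fx, fy = u - x0, v - y0  # bilinear filter weights

    # Four lookups, each compared against r before any blending.
    lit00 = slt(r, fetch(shadow_map, x0, y0))
    lit10 = slt(r, fetch(shadow_map, x0 + 1, y0))
    lit01 = slt(r, fetch(shadow_map, x0, y0 + 1))
    lit11 = slt(r, fetch(shadow_map, x0 + 1, y0 + 1))

    # Weighted blend of the comparison results (MAD-style: a + w * (b - a)).
    top = lit00 + fx * (lit10 - lit00)
    bottom = lit01 + fx * (lit11 - lit01)
    return top + fy * (bottom - top)


# 2x2 shadow map with a vertical shadow edge between the two columns.
shadow_map = [[0.25, 0.75],
              [0.25, 0.75]]
edge = manual_pcf(shadow_map, 0.5, 0.5, 0.5)
```

In an actual fragment program the same steps would also need the texcoord offset arithmetic for the three extra lookups, which is where the extra instruction cost comes from.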

There is a new extension that should be showing up in upcoming drivers for doing shadowing in ARBfp. It’s a little bit of extra code in the fragment program to specify that you’re using the option and that the target is a shadow map, but otherwise it’s straightforward.

See http://www.opengl.org/about/arb/notes/meeting_note_2003-12-09.html#arb_fps

Thanks -
Cass