multiple levels of dependent texturing

Is it possible to do more than one level of dependent texturing? One level requires two passes when using the ATI_fragment_shader extension. Is it possible to use more than two passes with this extension? Does the ARB_fragment_program extension allow for more than two passes? If so, which graphics chips support multiple (>2) passes? Lastly, does NV_texture_shader allow for multiple levels of dependent texturing?

ARB_fragment_program allows for more than two dependent reads, but it starts getting slow on the Radeon 9700 after the fourth.

> does NV_texture_shader allow for multiple levels of dependent texturing?

Yes. NV_texture_shader can handle an initial texture lookup followed by up to three more lookups, each of which can depend on the results of the previous lookups. See the GL_DEPENDENT_AR_TEXTURE_2D_NV operation and similar operations.
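For readers new to the term, here is a rough CPU-side sketch in C of what a chain of dependent lookups computes. It mirrors the idea behind GL_DEPENDENT_AR_TEXTURE_2D_NV, where the alpha and red components of the previous stage's result are used as the (s, t) coordinates of the next lookup. The types and function names (Texel, Texture2D, fetch_nearest, dependent_ar, chained_lookup) are made up for illustration, and the nearest-neighbour, clamp-to-edge sampling is a simplifying assumption; real hardware filters and wraps according to the texture state.

```c
#include <stddef.h>

/* One RGBA8 texel (hypothetical type, for illustration only). */
typedef struct { unsigned char r, g, b, a; } Texel;

/* Hypothetical minimal 2D texture: width * height texels, row-major. */
typedef struct {
    int width, height;
    const Texel *texels;
} Texture2D;

/* Nearest-neighbour fetch with clamp-to-edge; s and t are in [0, 1]. */
static Texel fetch_nearest(const Texture2D *tex, float s, float t)
{
    int x = (int)(s * (float)tex->width);
    int y = (int)(t * (float)tex->height);
    if (x < 0) x = 0;
    if (y < 0) y = 0;
    if (x > tex->width - 1) x = tex->width - 1;
    if (y > tex->height - 1) y = tex->height - 1;
    return tex->texels[y * tex->width + x];
}

/* Dependent lookup: alpha and red of the previous result become (s, t). */
static Texel dependent_ar(const Texture2D *tex, Texel prev)
{
    return fetch_nearest(tex, prev.a / 255.0f, prev.r / 255.0f);
}

/* Stage 0 is an ordinary lookup at (s, t); every later stage is a
   dependent lookup driven by the result of the stage before it. */
static Texel chained_lookup(const Texture2D *stages, size_t count,
                            float s, float t)
{
    Texel result = fetch_nearest(&stages[0], s, t);
    for (size_t i = 1; i < count; ++i)
        result = dependent_ar(&stages[i], result);
    return result;
}
```

With four stages this corresponds to the "one lookup plus three dependent lookups" case described above. Each extra stage cannot start until the previous fetch has returned, which is why long chains get expensive.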

NVIDIA’s CineFX architecture (supported by the GeForce FX and Quadro FX lines) allows essentially unlimited dependent texturing (limited only by the length of your fragment program – a limit which is many hundreds of instructions long).

You can use either the ARB_fragment_program extension or the more capable NV_fragment_program extension to get at all the CineFX functionality.

Unfortunately, ATI’s ARB_fragment_program extension has a lot of restrictions about the number of various types of instructions and the number of “indirections” allowed. You’ll find the CineFX architecture isn’t bridled by such restrictions.

Using the Cg arbfp1 and fp30 fragment profiles makes it very easy to generate code for these extensions using a high-level language.

Dependent textures are expensive and prone to aliasing if you do huge numbers of dependent lookups, so be careful.

- Mark

Originally posted by Mark Kilgard:

Unfortunately, ATI’s ARB_fragment_program extension has a lot of restrictions about the number of various types of instructions and the number of “indirections” allowed. You’ll find the CineFX architecture isn’t bridled by such restrictions.

Mark, correct me if I’m wrong, but issue #14 in the ARB_fragment_program spec states that the number of texture indirections the hardware must support is at least 4. Why is this a problem? If the hardware can support more, then I don’t see how using ARB_fragment_program would limit a program on CineFX hardware. And since when is this an “ATI” extension?

Originally posted by fenris:
since when is this an “ATI” extension?

I believe he meant ATI’s implementation of ARB_fragment_program.

The ARB extensions are, by necessity, lowest-common-denominator. Fragment programs don’t have much in the way of predication. Vertex programs don’t have much in the way of flow control and subroutine calls.

I believe Mark is referring to the fact that those higher-level capabilities are available in the NV extensions.

I believe nVIDIA has decided to make CineFX “it” for a long time to come – i.e., they may tweak it a little bit, but I don’t think it needs all that much more in the way of semantic extensions. Instead, I think nVIDIA is on the path to making future generations run more of the shaders, faster, for longer. Can’t say why I believe this, though – it’s just a feeling I get from the completeness of the nv30 feature set.

Meanwhile, I think ATI has bet on delivering the heavy punch early with the Radeon 9700, and then rev in the future at the semantic level, perhaps when performance catches up such that really long, complex shaders are useful for the majority of a scene.

Both approaches make sense, AFAIC.

Besides, would you REALLY want to do that much texture sampling anyway?

The FPS must drop to a dead crawl if you do more than 6-8 indirections I would think…

I just wanted to add my $.02 as a clarification for anyone who isn’t terribly familiar with all the different specifications.

First, ARB_fragment_program is not limited in the number of indirections, instructions, etc., except by implementation-specific limits. An implementation is free to have no limits at all.
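To make the point about implementation-specific limits concrete, here is a minimal sketch in C of querying them at run time. glGetProgramivARB and the two MAX_..._TEX_INDIRECTIONS tokens come from the ARB_fragment_program spec; the struct, the helper names, and the stub are made up for illustration. The token values are written out as I recall them from glext.h so the sketch compiles on its own, so check them against your header. The stub simply reports 4, the minimum mentioned earlier in this thread, so the code can run without a GL context.

```c
/* Stand-ins for the GL types; real code would include <GL/gl.h> and
   <GL/glext.h> instead of declaring these by hand. */
typedef unsigned int GLenum;
typedef int GLint;

/* Token values as recalled from glext.h (assumption: verify locally). */
#ifndef GL_FRAGMENT_PROGRAM_ARB
#define GL_FRAGMENT_PROGRAM_ARB                    0x8804
#endif
#ifndef GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB
#define GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB        0x880D
#endif
#ifndef GL_MAX_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB
#define GL_MAX_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB 0x8810
#endif

/* Same parameter list as glGetProgramivARB, so the real entry point can
   be passed in once a context is current. */
typedef void (*GetProgramivFn)(GLenum target, GLenum pname, GLint *params);

/* Hypothetical container for the two limits of interest. */
typedef struct {
    GLint max_indirections;        /* limit the implementation accepts  */
    GLint max_native_indirections; /* limit it can run natively         */
} IndirectionLimits;

static IndirectionLimits query_indirection_limits(GetProgramivFn get_programiv)
{
    IndirectionLimits limits = { 0, 0 };
    get_programiv(GL_FRAGMENT_PROGRAM_ARB,
                  GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB,
                  &limits.max_indirections);
    get_programiv(GL_FRAGMENT_PROGRAM_ARB,
                  GL_MAX_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB,
                  &limits.max_native_indirections);
    return limits;
}

/* Returns 1 if a program needing `needed` indirections stays within the
   native limit, 0 otherwise. */
static int fits_indirection_limit(const IndirectionLimits *limits, int needed)
{
    return needed <= limits->max_native_indirections;
}

/* Stub for running without a GL context: reports 4 for every query
   (illustrative value only, not a claim about any particular chip). */
static void stub_get_programiv(GLenum target, GLenum pname, GLint *params)
{
    (void)target;
    (void)pname;
    *params = 4;
}
```

In a real application you would pass the glGetProgramivARB pointer obtained from your extension loader instead of stub_get_programiv (on Windows the typedef would also need the APIENTRY calling convention). Note that, as I read the spec, a program with no dependent reads already counts as one indirection, so each additional dependent level adds one to the number you compare against the limit.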

Secondly, as mentioned above ARB_fp is not an ATI extension. It is presently supported on capable NVIDIA hw. ATI, NVIDIA, and Intel all participated heavily in the working group that produced the ARB_fp spec.

Finally, ARB extensions are not really lowest common denominator. In many cases, they do lack some features that vendor extensions have, but often they also gain generality or orthogonality in standardization. ARB_fp lacks things both that ATI’s latest HW can do and other things that NVIDIA’s latest HW can do.

-Evan

Evan,

It’s great to hear about added capability in the R300 family. It’s not surprising that they are more capable than ARB_{vertex,fragment}_program, because DirectX 9 shaders actually require a little bit more.

The question on everyone’s mind is when this extra functionality will be exposed in ARB extensions (or possibly ATI enhancements to the existing ARB extensions). Is this actively being collaborated on between the various hardware vendors?

jwatte…

i THINK this is all under discussion as part of the überbuffer specs. for example, rendering to multiple targets at the same time, and such things, ya know…

can’t wait for the überbuffers

Originally posted by davepermen:
can’t wait for the überbuffers

Ditto! Those would solve A LOT of problems I’ve been having.

And Render To Vertex Array… :P~~~~~ YUM!