F-buffer of ATI's card (excerpt from an interview)

http://www.3dcenter.org/artikel/2003/11-06_english.php

3DCenter: The Radeon 9800 series introduced a new feature, the F-Buffer. Is this rather a stop-gap solution, or will it be ATI’s efficient way to handle the challenge of long, dynamic shaders in the future?

Eric Demers: The F-Buffer concept is a method to generalize multi-pass and actually make it useful. And you need multi-pass solutions to deal with very long shaders and other challenges such as larger temp storage or a larger number of iterators, etc. However, it’s not a panacea that just allows for limitless shading either. The way we are planning on exposing it (through an OpenGL extension) allows the savvy developer to use it. We are working on improving its ease of use for future products. As well, there are other solutions we are looking at that make it easy for developers to write, for example, longer shaders. I’m sure that future products will have those features as well.
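The multi-pass generalization Demers describes can be sketched with a toy model. An F-buffer (following Mark & Proudfoot's original idea of a rasterization-order FIFO) stores intermediate shader values per rasterized *fragment* rather than per *pixel*, which is what keeps multi-pass correct when fragments overlap. Everything below (the two-pass split, the fragment fields) is an illustrative assumption, not ATI's actual implementation:

```python
# Toy model of why a plain per-pixel buffer breaks multi-pass shading
# when fragments overlap (e.g. transparency), and how an F-buffer
# (a per-fragment FIFO) fixes it. The shader split is invented.

def pass1(frag):
    # first half of a long shader: produce an intermediate value
    return frag["albedo"] * frag["light"]

def pass2(frag, temp):
    # second half: consume the intermediate from pass 1
    return temp + frag["specular"]

# Two fragments that land on the SAME pixel.
frags = [
    {"pixel": (3, 4), "albedo": 0.5,  "light": 0.8, "specular": 0.05},
    {"pixel": (3, 4), "albedo": 0.25, "light": 1.0, "specular": 0.05},
]

# Plain multi-pass: temporaries stored per PIXEL, so the second
# fragment overwrites the first fragment's intermediate.
pixel_temp = {}
for f in frags:
    pixel_temp[f["pixel"]] = pass1(f)
plain = [pass2(f, pixel_temp[f["pixel"]]) for f in frags]

# F-buffer: temporaries stored per FRAGMENT in rasterization order,
# so each fragment gets its own intermediate back in pass 2.
fbuffer = [pass1(f) for f in frags]                    # pass 1 appends in order
fbuf = [pass2(f, t) for f, t in zip(frags, fbuffer)]  # pass 2 reads in order

print(plain)   # fragment 0 was shaded with fragment 1's temporary
print(fbuf)    # both fragments shaded with their own temporaries
```

With a single fragment per pixel the two schemes agree; the FIFO only starts to matter once fragments overlap, which is exactly the case a simple "render intermediates to a texture" scheme cannot handle.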

The F-buffer is really needed on ATI R3x0 chips to implement GLSlang - one of the Ashli GLSlang shaders unpacks into 5 ARB_fragment_program passes.
On NV3x it’s only one fragment program.

There’s one that requires seven...

Has anyone, other than ATI, benchmarked Ashli on NVidia and ATI cards?

2Ostsol
>>Has anyone, other than ATI, benchmarked Ashli on NVidia and ATI cards?
It’s a bit complicated - NV cards don’t support this temporary buffer (the F-Buffer?), so they will only run the demos that need a single pass.

Another thing - the generated code is optimized for ATI hardware (significantly different from the ARB_fp code generated by Cg, which is optimal for NV hardware).

On NV hardware only NV_fragment_program is really fast, and Ashli doesn’t support it. (Surprise :slight_smile: )

Given the GeforceFX’s capability to run 1024-instruction fragment programs, the F-Buffer shouldn’t be necessary for it. I agree that the compiler is probably optimized for ATI, though, in that it’ll try to order instructions to take advantage of the Radeon’s ability to run certain operations in parallel in a single pass. However, shouldn’t NVidia’s universal compiler be able to offset this by reordering the resulting assembly shader code and reducing register usage (at the cost of more instructions)?
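The registers-for-instructions trade mentioned above can be shown on a toy three-address "ISA" (invented here purely for illustration; this is not NVidia's compiler): recomputing a common subexpression shortens its live range, so fewer registers are ever needed at once, at the cost of one extra instruction.

```python
# Toy illustration of trading instruction count for register pressure.
# The ISA, programs, and register file are all made up for the example.

def run(program, inputs, num_regs):
    """Interpret a tiny three-address ISA over a fixed register file."""
    regs = [0.0] * num_regs
    for op, dst, a, b in program:
        x = inputs[a] if isinstance(a, str) else regs[a]
        y = inputs[b] if isinstance(b, str) else regs[b]
        regs[dst] = x * y if op == "MUL" else x + y
    return regs[0]

inputs = {"u": 2.0, "v": 3.0, "w": 5.0}

# Version A: keep u*v live in r1 across the whole program -> 3 registers.
prog_a = [
    ("MUL", 1, "u", "v"),   # r1 = u*v   (stays live to the end)
    ("ADD", 2, 1, "w"),     # r2 = r1 + w
    ("MUL", 0, 2, 1),       # r0 = r2 * r1
]

# Version B: recompute u*v, so only 2 registers are ever live,
# at the cost of one extra instruction.
prog_b = [
    ("MUL", 0, "u", "v"),   # r0 = u*v
    ("ADD", 0, 0, "w"),     # r0 = r0 + w
    ("MUL", 1, "u", "v"),   # r1 = u*v   (recomputed)
    ("MUL", 0, 0, 1),       # r0 = r0 * r1
]

print(run(prog_a, inputs, 3), "-", len(prog_a), "instructions, 3 regs")
print(run(prog_b, inputs, 2), "-", len(prog_b), "instructions, 2 regs")
```

This is relevant to the NV3x because its fragment pipeline reportedly loses throughput as more registers are live per fragment, so an extra instruction can be the cheaper side of the trade.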

Another thing is that NV_fragment_program provides speed on the GeforceFX only by allowing the card to make use of its multiple precisions. In the case of the NV30, NV31, and NV34 that included the use of fixed-point precision, but from what I’ve heard the NV35 is a fully floating-point GPU and doesn’t receive much of a performance boost (if any at all) from using FX12 instead of FP16. As such, all that’s really left is managing register usage, which can be done with any shader language. In any case, there’s still NVidia’s universal compiler, as I mentioned above.
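For scale, the two reduced formats being discussed can be modeled in a few lines. FX12 is modeled here as signed fixed point covering [-2, 2) in 1/1024 steps (my reading of the NV30 format; exact hardware rounding may differ), and FP16 as IEEE 754 half precision via Python's `struct` 'e' format:

```python
import struct

def fx12(x):
    """Quantize to a 12-bit signed fixed-point value in [-2, 2)."""
    q = round(x * 1024.0)            # 10 fractional bits
    q = max(-2048, min(2047, q))     # clamp to the 12-bit two's-complement range
    return q / 1024.0

def fp16(x):
    """Round-trip a value through IEEE 754 half precision."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

for x in (0.1, 1.9, 3.5):
    print(f"{x:>4}: FX12={fx12(x):.6f}  FP16={fp16(x):.6f}")
```

The difference in character is visible immediately: FX12 has a uniform 1/1024 step but clamps anything outside [-2, 2), while FP16's step grows with magnitude but its range is far larger, which is why fixed point was only ever a win on hardware that executed it faster.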