Fragment Programs

I implemented a bump-mapping application on an NVIDIA GeForce FX 5200 using multitexturing and ARB_texture_env_dot3. Then I did the EXACT same thing using a fragment program, and the frame rate suddenly dropped to 40% of what it was. Are fragment programs in fact slower than texture or register combiners? Why is that? I thought that texture and register combiners are just pre-translated fragment programs.

Apparently, they are not. The GeForce FX's register combiner performance is good. In contrast to everything else, unfortunately.

Texture and register combiners only need to work at fixed point precision (9 or 10 bits, maybe). To do the same thing in a fragment program, fixed point or half float precision is enough.

Originally posted by Relic:
Texture and register combiners only need to work at fixed point precision (9 or 10 bits, maybe). To do the same thing in a fragment program, fixed point or half float precision is enough.

Apparently, you are talking about NV_fragment_program, not ARB.
Is there an option for fixed point? I thought there are only floating point registers.
Using half float precision (registers H0, H1, …) does speed things up a little, but it's still much slower than combiners.

Due to a strange design decision, fragment programs and fixed function (or NV_register_combiners) pipeline setups are handled by separate execution units on the GeForce FX series. Exact details differ for every chip in the line, but especially on the older models (5200, 5600, 5800) there is much more integer processing power than floating point power, and even for the latter there are two different precisions with different performance characteristics.

Try adding this line to your shader code:
OPTION ARB_precision_hint_fastest;

It might enable fixed point processing on GeForce FX hardware, or at least reduce the shader to 16-bit floating point precision, which is a bit faster.

This is still portable ARB_fragment_program code. E.g. ATI hardware will simply ignore this hint.
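
For reference, a minimal sketch of where the option line goes (the program itself is only an illustration, a single texture modulated by the interpolated color, not your bump-mapping setup):

!!ARBfp1.0
OPTION ARB_precision_hint_fastest;                 # let the driver pick its fastest precision
TEMP texel;
TEX texel, fragment.texcoord[0], texture[0], 2D;   # sample texture unit 0
MUL result.color, texel, fragment.color;           # modulate by the primary color
END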

Originally posted by zeckensack:
Try adding this line to your shader code:
OPTION ARB_precision_hint_fastest;
It might enable fixed point processing on GeForce FX hardware, or at least reduce the shader to 16-bit floating point precision, which is a bit faster.

16-bit precision is just a “bit” faster; combiners are a LOT faster.
After all, you can't really rely on hint options for performance; they are just suggestions, not commands. I guess I just have to use fragment programs only for state-of-the-art effects that combiners can't do. It seems that combiners are hardwired into the chip and fragment programs are not.
It really sucks. Fragment programs are the most powerful feature in 3D graphics, and we have to use old extensions for performance reasons? It seems to me that there is a lot of power inside the chip that fragment programs just can't use. Can't NVIDIA fix this with new drivers?

NVIDIA can't fix it in new drivers, because that's the way their hardware works.
However, the FX5200 is a joke and not a DX9 card.
If you write a fragment program using fixed-point precision, the performance will be the same as with RC (FX5600).
Floating point operations are generally slow on the FX line. They have a different design.

Originally posted by Zengar:
If you write a fragment program using fixed-point precision, the performance will be the same as with RC (FX5600).

How do I use fixed-point precision? In NV_fragment_program there are R0, R1, … and H0, H1, … registers, which are floating point (32-bit and 16-bit). Can you write an example program that uses fixed point?
If you are talking about instructions like MULX, I've tried that and didn't see much of a difference.

I had a similar situation:

Coded a ~10 pass bump mapping demo using Texture Environments

Coded the same bump mapping demo with only one pass using ARB_fp

The first version was about two times faster on an FX5200.

It might be easier to just treat NVIDIA's low end cards as if they are from the previous generation (i.e. DX8 level) and just use regular combiners and texture shaders. Less of a headache, and actually recommended by NVIDIA.

The newer models like the 5700 and the 5900 have better floating point performance AFAIK.

The ‘X’ suffix means that the operation is performed at 12-bit fixed-point precision. Conversion into the format of the register seems to be free.
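Since you asked for an example, something like this (only a sketch, and the program itself is illustrative: it assumes a tangent-space normal map on texture unit 0 and a normalization cube map holding the light vector on unit 1):

!!FP1.0
TEX H0, f[TEX0], TEX0, 2D;     # tangent-space normal from the normal map
TEX H1, f[TEX1], TEX1, CUBE;   # normalized light vector from the cube map
MADX H0, H0, 2.0, -1.0;        # expand [0,1] to [-1,1], done at fx12
MADX H1, H1, 2.0, -1.0;
DP3X H2, H0, H1;               # N dot L at 12-bit fixed point -- that's the X suffix
MULX o[COLR], H2, f[COL0];     # modulate by the primary color
END

Every instruction carries the X suffix, so nothing here should have to touch the floating point units.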
I don't know; on my GFFX 5600 non-Ultra, fixed point is about 1.5 to 3 times faster than floating point, and it's as fast as RC if you compare the frame rates. I guess the driver may be programming the RC automatically. The FX5200 has only 2 pipelines (not sure), and it is very, very slow. Fragment programs on the FX5200 are just there to claim DX9 support. A DX9 card for $60, huh.

Well, at least fragment programs work flawlessly on a GFFX 5200, which makes it a very suitable (and cheap) solution for private software developers not willing to spend too much money, and it’s still much faster than software rendering.
It’s not a high performance card, but most games do work, so I think that there is nothing to complain about (at least not for the price).
The ATI “cheap” line (9000 to 9200) does not support fragment programs, but still claims to be a DX9 card - I’d like that functionality very much on my notebook (ATI 9000), since I have to test fragment programs on my desktop or use MesaGL (which is slow, but at least functional).

I think NVIDIA does very well on the low end for discrete graphics. The FX 5200 is SO much better for developers than the old GeForce 4 MX card. Would you rather the 5200 didn't support ARB_fragment_program? Didn't think so. Write for 20 fps in 640x480 on an FX 5200, and people with faster cards get a higher frame rate and higher resolution (and AA). Sounds fine to me.

Meanwhile, the ATI low end still doesn’t even do hardware transform (all those IGP chips) and/or are still at DX8.1 level (Radeon 9000/9200).

I know this might sound crazy, but consider this strategy for FX5200:

You could render to a 320x240 buffer and then stretch it, linearly filtered, onto the 640x480 screen (or 512x384 -> 1024x768, or 400x300 -> 800x600). Then draw the HUD data on top of it at the full screen resolution, so that it remains readable.

If your rendering path can be handled by an FX5900 at 1024x768 at a reasonable frame rate, then it can certainly be handled by an FX5200 at one of the tiny resolutions at 30+ Hz. However, both cards would show the same richness of effects: HDR, soft shadows, high quality specular, etc.

The user would get a clear message: for better quality, buy a more expensive card. If you instead just reduced the number of effects and used a dumbed-down rendering path, he wouldn't even realize what he was missing.

Did you know that…
In Doom 1, 320x200 was the high resolution mode, ‘normal’ mode was 160x200.

Originally posted by MZ:
Did you know that…
In Doom 1, 320x200 was the high resolution mode, ‘normal’ mode was 160x200.

No, I didn't know that. I thought normal mode was 320x200, not 160x200.

Originally posted by zeckensack:

Try adding this line to your shader code:
OPTION ARB_precision_hint_fastest;

It might enable fixed point processing on GeForce FX hardware, or at least reduce the shader to 16-bit floating point precision, which is a bit faster.

This is still portable ARB_fragment_program code. E.g. ATI hardware will simply ignore this hint.

Even better, NVIDIA could introduce proprietary NV_* OPTION arguments for switching to low precision and fixed point hardware paths. Ideally they should obsolete NV_fragment_program and allow developers to have a unified code path and make better use of fragment programs on their quirky hardware…

Are there any reasons why this would not be possible? Anyone from NVIDIA care to comment?

You usually wouldn’t want to run an entire shader in fixed point or half precision mode. You should choose on a per-instruction basis which precision level is most suitable. Removing this flexibility would not make things any better for shader writers that want to get good results on NV hardware.

As an example, you probably don’t want to use fixed-point for specular lighting calculations, but it might be fine for the rest of your shader. Using a hint, you would either choose to use fixed point and have crappy-looking specular, or to use floating point and have good-looking results but poor performance.

What you can do is use the ARB_fragment_program entry points to load fragment programs with NVidia syntax. This means that the code changes required to support both ARB and NV paths are minimal, and all you need to do is write two shaders. It’s still annoying compared to having a single code path, but you’ll be able to get better results than when relying on per-shader hints.
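
As a rough idea of what the NV-syntax variant might look like with per-instruction precision choices (again just a sketch with an illustrative program: fixed point where banding doesn't matter, half floats for the specular term):

!!FP1.0
TEX H0, f[TEX0], TEX0, 2D;     # tangent-space normal map
TEX H1, f[TEX1], TEX1, CUBE;   # half-angle vector from a normalization cube map
MADX H0, H0, 2.0, -1.0;        # the expands are fine at fx12
MADX H1, H1, 2.0, -1.0;
DP3H H2.x, H0, H1;             # N dot H in half float to avoid banding
POWH H2.x, H2.x, 32.0;         # specular exponent, keep this in half float too
MULX o[COLR], H2.x, f[COL0];   # the final modulate can stay fixed point
END

The ARB version of the same shader would simply drop the suffixes; there, precision can only be steered globally with the hints discussed above.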

– Tom

And the hint could be bad… just think of new hardware: it could run the good version fast enough, but because of the “fastest” hint it is “forced” to go down in quality because you said so… urgh

The bottom line is that when you use fragment programs and RC to do the same thing, RC is faster, at least on the GFFX5200. On newer chips the fragment programs are of course faster, but the question is: does their performance match RC?
Is it really possible for the ARB to expose different precisions in its programs? I mean, precision is hardware-dependent. ARB_fp is not a vendor-specific extension, but NV_fp is.
