ARB/NV fragment program performance numbers

Hi,

In case anyone is interested, here are the performance results of my program using different fragment program paths. It draws scenes like this one: http://de.geocities.com/westphj2003/refl.html (but without the water), with bump mapping and per-pixel lighting all over the scene.

Athlon 2000+, GeForce FX 5700 (non-Ultra). One color map, one bump map, simple diffuse/specular bump mapping plus ambient lighting.

OpenGL 1.1 lighting (of course without bumps): 166 fps
NV_fragment_program with 16 bit registers: 132 fps
NV_fragment_program with 32 bit registers / ARB_fragment_program: 86 fps
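
Frame rates are easier to compare once converted to time per frame. The following is a minimal sketch of that arithmetic using only the three numbers above; the function names are made up for illustration, and it assumes the whole difference in frame time is due to the fragment path.

```python
# Frame rates reported above (frames per second).
results_fps = {
    "OpenGL 1.1 lighting": 166.0,
    "NV_fragment_program, 16 bit": 132.0,
    "NV 32 bit / ARB_fragment_program": 86.0,
}


def frame_time_ms(fps):
    """Convert a frame rate into milliseconds spent per frame."""
    return 1000.0 / fps


def extra_cost_ms(fps, baseline_fps):
    """Additional time per frame relative to a baseline frame rate."""
    return frame_time_ms(fps) - frame_time_ms(baseline_fps)


baseline = results_fps["OpenGL 1.1 lighting"]
for name, fps in results_fps.items():
    print(f"{name}: {frame_time_ms(fps):.2f} ms/frame, "
          f"+{extra_cost_ms(fps, baseline):.2f} ms vs. fixed function")
```

Seen this way, the 16 bit path costs about 1.6 ms per frame over fixed-function lighting, while the 32 bit/ARB path costs about 5.6 ms.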

So it’s quite clear where the ARB performance drawback comes from: the 32 bit register precision. I would like to know two things:

  • Do you think the visual quality with 16 bit precision is still sufficient? (I think in general it is.)
  • What is performance like on ATI boards (standard OpenGL vs. ARB_fragment_program)?

Jan

IIRC on ATi boards, the standard OpenGL functions are translated to a fragment program anyway, so it would all depend on how well optimized your custom fragment program itself is.

If you'd like to provide a demo so that others can test it on ATi hardware, then we could give some numbers.

Also, it's well known that ARB_fp is slower on nVidia hardware. Your numbers come as no surprise.

I cannot provide a demo, as the program is quite large and also copyright protected, sorry.

Of course these numbers come as no surprise, but I found it helpful to measure how large the differences actually are, and maybe others will find that useful too.

At least for my project, the decision between NVIDIA and ATI depends on it.

Jan

Originally posted by DopeFish:
IIRC on ATi boards, the standard OpenGL functions are translated to a fragment program anyway, so it would all depend on how well optimized your custom fragment program itself is.

Only on R3xx; R2xx has both a fixed-function pipeline and a programmable pipeline.

Originally posted by Ingenu:
Only on R3xx; R2xx has both a fixed-function pipeline and a programmable pipeline.

Well, seeing as the thread is about arb_fp speed compared to the fixed-function pipeline, and R2xx doesn't have arb_fp support, I would imagine it was fairly self-explanatory which one I was referring to.

OpenGL 1.1 lighting (of course without bumps): 166 fps

Why no bumps? You could use the dot3 operation of the texture environment to do dot3 bump mapping. That should use the combiners of the GeForce and so be faster than the nv_frag implementation (if you normalize in your fragment programs, don't forget to do it here too).

Lars

Originally posted by DopeFish:
Well, seeing as the thread is about arb_fp speed compared to the fixed-function pipeline, and R2xx doesn't have arb_fp support, I would imagine it was fairly self-explanatory which one I was referring to.

Indeed.
Just wanted to add my 0.02€

[This message has been edited by Ingenu (edited 02-03-2004).]

It’s really funny… On a GeForce FX 5600, using 16-bit registers showed no improvement in performance over 32-bit.

Lars: of course I could, but I wanted to know what happens to the frame rate when fragment programs are used, compared to plain, dumb lighting without them.

And what I want to do would require at least two rendering passes with NV_register_combiners and even more with ARB_texture_env_combine, so it would slow down a lot more than with NV_fragment_program in 16 bit mode.

Also, there’s still the possibility of choosing ATI cards instead of NVIDIA; at least the program would run on both. And I am hoping that NVIDIA will solve this problem at some point…

Jan

Do you see any performance gain using ARB_precision_hint_fastest with arbfp1? What driver are you using?

I will try that… I am using the current Linux driver.

Jan

OPTION ARB_precision_hint_fastest doesn’t seem to change anything; I still get about the same numbers (~80-90 fps).
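
For anyone who wants to repeat the test: the OPTION line has to come directly after the !!ARBfp1.0 header, before any declarations or instructions. Below is a minimal sketch that adds it to a program string before the string is handed to the driver (for example via glProgramStringARB). The helper add_precision_hint and the sample program are made up for illustration and are not the program that was measured here.

```python
HEADER = "!!ARBfp1.0"


def add_precision_hint(source, hint="ARB_precision_hint_fastest"):
    """Return the program text with an OPTION line placed right after the header.

    Hypothetical helper for illustration; it is not part of any OpenGL binding.
    """
    lines = source.strip().splitlines()
    if not lines or lines[0].strip() != HEADER:
        raise ValueError("not an ARB fragment program: missing !!ARBfp1.0 header")
    option_line = f"OPTION {hint};"
    if option_line in (line.strip() for line in lines):
        return "\n".join(lines)  # hint already present, leave unchanged
    return "\n".join([lines[0], option_line] + lines[1:])


# Placeholder program: modulates a texture sample by the interpolated color.
program = """!!ARBfp1.0
TEMP texel;
TEX texel, fragment.texcoord[0], texture[0], 2D;
MUL result.color, texel, fragment.color;
END
"""

print(add_precision_hint(program))
```

Passing hint="ARB_precision_hint_nicest" requests the opposite trade-off; the two hints cannot be combined in one program. Whether a driver actually lowers precision in response is up to the implementation, which fits the unchanged numbers above.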

So I think it is generally advisable to use NV_fragment_program with 16 bit registers instead of ARB_fragment_program on NVIDIA hardware.

Jan