Fragment program perfs

I think this will sound stupid … but I have to ask.

Running a simple fragment program (4 instructions: 2 TEX, 1 ADD, and 1 MAD) renders smoothly.
Running a slightly longer program (4 MADs added) makes the frame rate drop drastically.

Does this make sense? I intend to render the whole scene with the second fragment program, at 1600x1200. Of course, lowering the resolution to 1024x768 makes it all smooth again.

What are the common ways to improve this? Would rendering front to back do it?

Could be vsync, which kicks you from 120 to 60 or from 60 to 30 fps… dunno… maybe you're just at the limit…

what gpu…?

It is perfectly normal for performance to go down significantly when you add instructions. Performance scales pretty much linearly with instruction count: you doubled the number of instructions, so performance should roughly be halved. Also, on some hardware texture sampling and ALU instructions can execute in parallel, so you may see an even larger decrease, since you're going from 2 ALU instructions to 6.
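To make the scaling argument concrete, here is a back-of-envelope sketch in C (not from the thread; the 40 fps baseline is just the figure reported later in the discussion): if the fragment pipeline is the bottleneck, frame rate scales roughly with the inverse of the instruction count at a fixed resolution.

```c
/* Back-of-envelope sketch: fps roughly proportional to 1 / instruction count
 * when fragment processing is the bottleneck. Numbers are assumptions taken
 * from this thread, not measurements. */
#include <stdio.h>

int main(void)
{
    double base_fps          = 40.0;  /* with the 4-instruction program */
    double base_instructions = 4.0;   /* 2 TEX + 1 ADD + 1 MAD */
    double new_instructions  = 8.0;   /* the same program with 4 MADs added */

    /* Twice the instructions at the same resolution -> roughly half the rate. */
    double estimated_fps = base_fps * (base_instructions / new_instructions);
    printf("estimated fps: %.1f\n", estimated_fps);   /* prints ~20.0 */
    return 0;
}
```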

The video card is a GeForce FX 5200.
Vsync was active, but it was actually going from 40 fps down to 20.

What does ALU mean?

ALU is a term from the processor world and means “Arithmetic Logic Unit”.
In the context of shading hardware, ALU refers to the circuitry that carries out the computations (as opposed to the texture samplers). E.g. ADD/SUB/MUL/MAD are ALU instructions, while TEX is a texture sampling instruction.
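For illustration, here is a hypothetical C snippet (not from the thread) that loads a 4-instruction ARB_fragment_program like the one in the first post; the TEX lines go to the texture samplers, while ADD and MAD are handled by the ALU.

```c
/* Hypothetical example of the kind of 4-instruction program described in the
 * first post, loaded through ARB_fragment_program. Assumes GL_GLEXT_PROTOTYPES
 * (or manually resolved entry points) and textures bound to units 0 and 1. */
#include <string.h>
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>

static const char *fp_src =
    "!!ARBfp1.0\n"
    "TEMP t0, t1;\n"
    "TEX t0, fragment.texcoord[0], texture[0], 2D;\n"  /* texture sampling */
    "TEX t1, fragment.texcoord[1], texture[1], 2D;\n"  /* texture sampling */
    "ADD t0, t0, t1;\n"                                /* ALU instruction  */
    "MAD result.color, t0, fragment.color, t1;\n"      /* ALU instruction  */
    "END\n";

void load_fragment_program(GLuint prog)
{
    glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
    glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                       (GLsizei)strlen(fp_src), fp_src);
    glEnable(GL_FRAGMENT_PROGRAM_ARB);
}
```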

This is frightening …
I’ll have to limit my fragment programs to about 5 instructions in order to run smoothly?

Originally posted by SeskaPeel:
This is frightening …
I’ll have to limit my fragment programs to about 5 instructions in order to run smoothly?

No! You just have to buy a faster graphics card (i.e. an ATI Radeon 9700) :)

Well, the jump in resolution to 1600x1200 means many more instructions are executed per frame: 1600x1200 is roughly 2.4 times as many pixels as 1024x768.

I’ve never used an FX-class card, but I guess that if you don’t have good front-to-back ordering, you’re also incurring a pretty heavy penalty for executed-but-not-visible fragment shaders.

I’m not an authority on all of these new Z-buffer optimizations that discard fragments early. Someone else might have some tips.

You have the cheapest fragment-program-capable card in existence, and you ask why it can’t run smoothly at a higher resolution than most normal home users will ever use?

This card is not meant for such a huge resolution. Just stick to 1024x768 and be happy if your programs run smoothly there!

The 5900 is made for such situations, and the Radeon 9800 works well at that resolution too…

You’re essentially processing nearly 2 million fragments per frame at that resolution! What do you expect from such a cheap card?

I assume you are using ARB_FP, so all calculations are performed at full precision.

(250 MHz × 2 fp32 pipelines) / (1600 × 1200 pixels × 7 fp32 instructions) ≈ 37 fps.

This is a theoretical estimate of the maximum you can get, assuming ideal conditions (zero overdraw, 100% efficiency in the rest of the GPU, etc.), so in real life, and with vsync on, the 20 fps is not surprising…
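The same estimate written out as arithmetic in C, with the 250 MHz core clock and 2 fp32 pipelines assumed in the post above:

```c
/* The estimate from the post above: 250 MHz core, 2 fp32 fragment pipelines,
 * 1600x1200 pixels, 7 full-precision instructions per fragment, zero overdraw
 * and no other bottlenecks. */
#include <stdio.h>

int main(void)
{
    double clock_hz     = 250e6;
    double pipelines    = 2.0;             /* fp32 instructions retired per clock */
    double pixels       = 1600.0 * 1200.0;
    double instructions = 7.0;

    double max_fps = (clock_hz * pipelines) / (pixels * instructions);
    printf("theoretical maximum: %.1f fps\n", max_fps);   /* about 37 fps */
    return 0;
}
```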

Do you absolutely need full precision in your program? If not, try the NV_FP version and use the “fixed” (fx12) precision. I estimate (still theoretically) that in the best case you could get a 233% performance increase this way (I’m basing this on tests done by thepkrl at Beyond3D).

I’ve also heard that NVIDIA’s fragment shader performance can drop off drastically as the number of registers used increases. Might be something else to watch out for.

Like everyone said, though, you can’t expect too much from the 5200, especially if you’re using high precision fragment programs.

Thanks for the answer, your math makes it clear. Why do you say that the 5200 is the cheapest? I read that its GPU runs at 350 MHz (oh, that might be for the Ultra version, which I don’t own). Fragment processing speed depends entirely on this frequency, right?

Is that NV_FP version, with its “fixed” precision, NVIDIA proprietary? Will it work on ATI cards?

Why do you say that the 5200 is the cheapest?

Well, because it is. In the FX line, the 5200 is at the bottom in terms of performance; it’s the budget version. Heck, in terms of performance in current games, my GeForce4 Ti 4400 beats the snot out of the 5200. Personally I wouldn’t buy an FX below the 5600 Ultra (there is an Ultra version of the 5600, right?). Anything less is just too damn slow.

-SirKnight


Is that NV_FP version, with its “fixed” precision, NVIDIA proprietary? Will it work on ATI cards?

Well, since the extension has NV in it, it will of course only work on NVIDIA hardware. What you can do is use an ARB_FP program when running on an ATI card, and an NV_FP one when running on NVIDIA.
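A minimal sketch of that fallback, assuming the decision is made at startup by looking for GL_NV_fragment_program in the extension string; the load_nv_program/load_arb_program helpers are placeholders, not real code from this thread.

```c
/* Sketch of the NV_FP / ARB_FP fallback suggested above. The strstr check is
 * the usual quick test; a robust version would match whole extension tokens. */
#include <string.h>
#include <GL/gl.h>

int has_nv_fragment_program(void)
{
    const char *ext = (const char *)glGetString(GL_EXTENSIONS);
    return ext != NULL && strstr(ext, "GL_NV_fragment_program") != NULL;
}

/* Usage (load_nv_program / load_arb_program are hypothetical helpers):
 *
 *   if (has_nv_fragment_program())
 *       load_nv_program();    // can use fp16 / fx12 precision
 *   else
 *       load_arb_program();   // full-precision ARB_fragment_program path
 */
```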

-SirKnight

I’ve heard the 5200 only does fp16, so I doubt you’d get a benefit on that card from forcing lower precision unless you can go down to fx12. Even better than drawing front to back (unless your app is really good at drawing front to back) is to fill in the depth buffer first and then draw with a <= depth test to fill in the color buffer. In many cases you can get rid of all the overdraw this way on hardware with an early Z test. Unfortunately I don’t think the 5200 has an early Z test, but I believe all other NVIDIA hardware does, and I know all ATI hardware that supports fragment shaders does.
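A rough sketch of that depth-first idea in OpenGL C code (draw_scene() is a placeholder for issuing the scene geometry; this is the generic idiom, not code from the thread):

```c
/* Depth pre-pass sketch: pass 1 lays down depth only, pass 2 redraws with the
 * expensive fragment program and a GL_LEQUAL test, so occluded fragments can
 * be rejected before shading on hardware with an early Z test. */
#include <GL/gl.h>
#include <GL/glext.h>   /* for GL_FRAGMENT_PROGRAM_ARB */

void draw_scene(void);   /* placeholder: draws the scene geometry */

void render_with_depth_prepass(void)
{
    /* Pass 1: depth only, no color writes, no fragment program. */
    glDisable(GL_FRAGMENT_PROGRAM_ARB);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_TRUE);
    glDepthFunc(GL_LESS);
    draw_scene();

    /* Pass 2: shade only the fragments that ended up visible. */
    glEnable(GL_FRAGMENT_PROGRAM_ARB);
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_FALSE);
    glDepthFunc(GL_LEQUAL);
    draw_scene();
}
```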

The 5200 doesn’t have great bandwidth, so it will be slow at high resolutions. Run it at 640x480 or 800x600 and you’ll be fine. You can also play with filtering modes at low resolution to see if you benefit without using the high-res modes. I read that the GeForce FX cards have faster filtering modes than their last-generation cousins.

Well, since the extension has NV in it, it will of course only work on NVIDIA hardware

Not necessarily, SirKnight. NV_blend_square, NV_texgen_reflection, NV_occlusion_query, and NV_texture_rectangle are all supported on ATI hardware, so it’s not a dumb question.

I’ve heard the 5200 only does fp16, so I doubt you’d get a benefit on that card from forcing lower precision unless you can go down to fx12.

The 5200 fully supports 32-bit floating-point operations, just like the 5600 and 5800.

Originally posted by SirKnight:
(there is an ultra version of the 5600 right?)

Yes. It’s only about as fast as a GeForce4 Ti 4800 (except for anti-aliasing, where the FX line improved significantly on the GF4 line), but it features lots of new extensions like ARB_fp.

Originally posted by Zeno:
Not necessarily, SirKnight. NV_blend_square, NV_texgen_reflection, NV_occlusion_query, and NV_texture_rectangle are all supported on ATI hardware, so it’s not a dumb question.

It’s true that a few NV extensions are supported by ATI, but when that happens, support generally comes later. And when such an extension is really useful, it ends up in the OpenGL core spec or in an ARB extension anyway.

To reply on-topic: yes, I agree that the FX 5200 is a bit too slow to run apps at 1600x1200. Moreover, I’d say it’s pretty pointless when fragment programs can increase the quality so much that you can’t tell the difference between 1024x768 and 1600x1200.