MSI GeForce FX 5200: major slowdown using fragment programs

I've got a problem with my GeForce FX 5200: it's very slow!

I downloaded a very simple demo from http://esprit.campus.luth.se/~humus/ which demonstrates some fancy per-pixel lighting using fragment programs at 1024x768. The site says the demo should run at about 50 fps on a Radeon 9700, but on my computer it runs at about 3 fps.

To give you guys an idea: the technique used in the demo, applied to a simple quad that occupies about 1/3 of the screen at 1024x768, runs at about 20-30 fps. That's a crappy framerate!

If the problem is fill rate: my graphics board is rated at 1.3 billion texels/sec. Is that good or bad?

By the way, I've got the latest NVIDIA drivers.

Please help me understand what's wrong with my hardware. I'd really appreciate any tips to solve the problem.

That sounds about right.

That doesn’t sound right. The FX5200 is indeed slower than the 9700. However, it’s certainly not 15-20x slower.

That doesn’t sound right. The FX5200 is indeed slower than the 9700. However, it’s certainly not 15-20x slower.

In general, perhaps. But consider the circumstances. The GeForce FX's implementation of ARB_fragment_program is quite suboptimal for the card. Combine that with the fact that a 5200 is pretty slow to begin with, and you're killing the card's fillrate. Reaching only 5-10% of a 9700 under these circumstances is not unreasonable.

If you want to speed things up, you need to rewrite the shader to use NV_fragment_program, and to use the ‘fixed’ type rather than full or half floats.
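
To illustrate, here's a minimal sketch of what such a rewrite might look like. The shader body (a plain texture modulate) is made up for the example and isn't from the demo; NV_fragment_program picks precision per instruction via an R/H/X suffix, where X is the fx12 'fixed' type:

/* Hypothetical example: loading a fixed-point NV fragment program
 * that just modulates a base texture with the vertex color. */
static const char fp[] =
    "!!FP1.0\n"
    "TEX H0, f[TEX0], TEX0, 2D;\n"  /* sample the base texture */
    "MULX H0, H0, f[COL0];\n"       /* fx12 multiply: the 'fixed' type */
    "MOVH o[COLH], H0;\n"           /* write the half-precision output */
    "END\n";

GLuint prog;
glGenProgramsNV(1, &prog);
glBindProgramNV(GL_FRAGMENT_PROGRAM_NV, prog);
glLoadProgramNV(GL_FRAGMENT_PROGRAM_NV, prog,
                (GLsizei)(sizeof(fp) - 1), (const GLubyte *)fp);
glEnable(GL_FRAGMENT_PROGRAM_NV);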

I thought “half” was a fine type to use on GeForces; only “float” was bad (would use more register space and could cut fill rate in half). And I thought the problem was that ARB_fragment_program requires precision better than “half”.

Perhaps NVIDIA could add a control panel option and/or an extension that accepts ARB_fragment_program syntax but internally uses half for all floats? That'd be an interesting experiment.

According to various profiling tests, the key to performance on an FX is to use 'fixed' as much as possible. There is a performance difference between 'float' and 'half', but even with all 'half's you don't get the performance you would on ATi hardware.

Half and float ops run at the same speed, which is half the rate of fixed point, if I remember correctly and the NV30 architecture analysis on the Beyond3D forum was right.

Check the newest Tomb Raider benchmark: for the 5200 they had to disable PS 2.0 completely to get even playable performance.

The 5200 is crap, performance-wise. The hardware isn't really designed to perform optimally in floating-point calculations (it's merely a fast, extended GF4 design plus a bit of floating point in the original texture sampling unit; again, see Beyond3D), and on top of that, the card itself is very slow.

Using halfs is definitely faster than using floats on FXs, and using fixeds is usually faster still. I've observed that the best performance comes from mixing halfs and fixeds. I remember a thread on Beyond3D mentioning they use different resources (halfs use the FP units, fixeds use the integer units), which would explain this.

float and half operations both run at the same speed on the FX architecture. The key to speed when using floating point is to keep your register usage as small as possible. The FX can hold 2 halfs or 1 float in a register. This is where the speed gain from using half comes from. Using half effectively doubles the amount of registers you can use without slowing down.

You could also try
OPTION ARB_precision_hint_fastest;
in your fragment program but I don’t know if it actually does anything.
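
For what it's worth, the option line goes right after the !!ARBfp1.0 header, before any instructions. A minimal sketch (the one-instruction shader is just a placeholder, not anyone's actual program):

static const char fp[] =
    "!!ARBfp1.0\n"
    "OPTION ARB_precision_hint_fastest;\n"  /* options precede instructions */
    "TEX result.color, fragment.texcoord[0], texture[0], 2D;\n"
    "END\n";

glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                   (GLsizei)(sizeof(fp) - 1), fp);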

If you want raw fillrate numbers, take a look at this Beyond3D thread.

Thanks for all your posts, people. You were really useful.

By the way, since my board isn't a good one, could you tell me which boards have the best performance using ARB fragment programs?

I'd suggest a Radeon 9600 Pro, or something like that… I myself have a Radeon 9700 Pro and I'm very happy… If you have the money, you could get a Radeon 9800 Pro and be at the high end…

They all deliver good ARB_fp performance for the price you pay… If you find an "old" (old == no longer produced and officially sold) Radeon 9500, that's great too… if it's cheap.

I simply cannot recommend any GeForce FX cards, sorry. It's not that I'm biased, but I have yet to see one running at acceptable speed and quality for the price you pay. I thought the 5900 Ultra was good at least, but now the newest benchmarks show it not only sucks in 3DMark, but in about every DX9 game as well… hm…

I can stand behind Radeons; they work great.

For the best performance, go for either an ATi Radeon 9800 Pro or an nVidia GeForce FX 5900 Ultra. Both have similar speeds (well, you will always find people who say one is a bit faster than the other, but in fact you will barely notice the difference).

I've heard that the Radeon 9600 is slower than the 9500, so I wouldn't recommend it, except maybe for the price/performance ratio, but it's definitely not a good choice for raw performance.

Originally posted by vincoof:
For the best performance, go for either an ATi Radeon 9800 Pro or an nVidia GeForce FX 5900 Ultra. Both have similar speeds (well, you will always find people who say one is a bit faster than the other, but in fact you will barely notice the difference).
I disagree. You're right for 'fixed function' stuff. For serious shader performance (that's what we're talking about here, right?) there's no way around an ATI card atm.

For serious shader performance (that's what we're talking about here, right?) there's no way around an ATI card atm.
Wrong. If you're looking for the absolute best shader performance at any price, you want to avoid ATI and nVidia and get some $10,000 card from 3Dlabs. But I don't think that's what jcabeleira is looking for if he bought an FX 5200.

I also noticed quite a difference in speed with the FX 5200 in my notebook when the "same" shader runs with register combiners versus a fragment program; I'd say the slowdown is about 40%. Is that the case for the other GeForce FX cards too?

How about ATI cards? Is there a slowdown between programs with equal output using ATI_fragment_shader and ARB_fragment_program?

Speaking of those fx12 types in NV_fragment_program: I'm not too familiar with them, but how do I work with them? Is it OK to store the result of e.g. ADDX in H0, and should that yield better performance? (There seem to be no special temporary registers for fx12 types.)
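
In code, I mean something like this (a sketch; the operands are made up):

/* Hypothetical NV_fragment_program lines: fx12 (X-suffixed) ops
 * writing their results into an ordinary half register. */
"ADDX H0, f[TEX0], f[COL0];\n"  /* fx12 add, result stored in H0 */
"MULX H0, H0, H0;\n"            /* fx12 multiply, result stays in H0 */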

Originally posted by stefan:
How about ATI cards? Is there a slowdown between programs with equal output using ATI_fragment_shader and ARB_fragment_program?

That was the first thing I checked when I got my Radeon 9700 Pro. The only performance difference I noted was due to the slightly different bump-mapping technique I used (arithmetic normalization rather than cubemaps). Other than that, performance was the same.

ATI chips don’t seem to have dedicated ‘legacy’ shading hardware. It’s all FP24 and it’s all run on the same units, regardless of whether it’s a multitexturing setup, ATI_fragment_shader or ARB_fragment_program. At least that’s my conclusion.

Transcribing ATI_fragment_shader stuff into ARB_fragment_program versions gave me the exact same performance and precision in all of the cases tested. Testing the opposite way isn't always possible, of course, since ARB_fragment_program can express things ATI_fragment_shader can't.

That’s nice, IMO. If the extension is detected, I can always use ARB_fragment_program on ATI cards without worrying about performance drops.

The FX cards somewhat force me to include an “off” switch for ARB_fp to enable the fallback to NV_register_combiners.
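
Something like this sketch is what I mean; the setup function names and the useArbFp switch are made up for illustration:

#include <string.h>
#include <GL/gl.h>

void setupFragmentProgramPath(void);   /* hypothetical ARB_fp path */
void setupRegisterCombinersPath(void); /* NV_register_combiners fallback */

/* Crude substring test; fine for a sketch. */
static int hasExtension(const char *name)
{
    const char *exts = (const char *)glGetString(GL_EXTENSIONS);
    return exts != NULL && strstr(exts, name) != NULL;
}

/* useArbFp is the user-facing "off" switch, e.g. read from a config file. */
void pickFragmentPath(int useArbFp)
{
    if (useArbFp && hasExtension("GL_ARB_fragment_program"))
        setupFragmentProgramPath();
    else
        setupRegisterCombinersPath();
}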

Originally posted by Aaron:
Wrong. If you’re looking for absolute best shader performance at any price <…>
I don’t. I’m referring to consumer cards. You should be looking at FX5200/U, FX5600/U and Radeon 9500/Pro, 9600/Pro. That’s what people are buying. That’s where code must run well.

Technology only matters when there’s a target audience.

Originally posted by zeckensack:
I don't. I'm referring to consumer cards. You should be looking at FX5200/U, FX5600/U and Radeon 9500/Pro, 9600/Pro. That's what people are buying. That's where code must run well.

Technology only matters when there's a target audience.

True enough.

I wonder if there's any shader program that runs faster on an FX series card than on the 9500+ cards? I haven't checked out any demo or game that uses shaders since I don't have a new card yet. Although I can use emulation to get blazing speeds!