msi geforce fx5200, major slowdown using fragment programs



jcabeleira
08-26-2003, 02:29 PM
i've got a problem with my geforce fx5200: it's very slow!

i downloaded a very simple demo from http://esprit.campus.luth.se/~humus/ which demonstrates some fancy per-pixel lighting using fragment programs at 1024x768. The site says the demo should run at about 50 fps on a Radeon 9700, but on my computer it runs at about 3 fps.

To give you guys an idea: the technique used in the demo, applied to a simple quad that occupies about 1/3 of the screen at 1024x768, runs at about 20-30 fps. that's a crappy framerate!

if the problem is fill rate: my graphics board's fill rate is 1.3 billion texels/sec. is that good or bad?

By the way, i've got the latest nvidia drivers.

please guys, help me understand what's wrong with my hardware. i would really appreciate any tips to solve the problem.

NitroGL
08-26-2003, 02:35 PM
That sounds about right.

al_bob
08-26-2003, 02:43 PM
That doesn't sound right. The FX5200 is indeed slower than the 9700. However, it's certainly not 15-20x slower.

Korval
08-26-2003, 04:08 PM
That doesn't sound right. The FX5200 is indeed slower than the 9700. However, it's certainly not 15-20x slower.

In general, perhaps. But consider the circumstances. The GeForce FX's implementation of ARB_fragment_program is quite suboptimal for the card. Combine that with the fact that a 5200 is pretty slow to begin with, and you're killing the card's fillrate. Reaching only 5-10% of a 9700 under these circumstances is not unreasonable.
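Do the math, even: at 1024x768 and 3 fps he's only filling about 2.4 million pixels per second. Taking his 1.3 billion texels/sec figure at face value (which sounds like 325MHz x 4 pipes), that's on the order of 500 clocks per pixel being spent somewhere, even before overdraw and multiple passes. Raw fillrate clearly isn't what he's running out of; it's fragment program throughput.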

If you want to speed things up, you need to rewrite the shader to use NV_fragment_program, and to use the 'fixed' type rather than full or half floats.
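Off the top of my head, something like this (just a sketch, untested; the texture assignments and interpolants are made up, but the X-suffixed instructions and H registers are from the NV_fragment_program spec):

!!FP1.0
# dot3 diffuse pass kept at fx12 ('fixed') precision throughout.
# Assumes: TEX0 = range-compressed normal map, TEX1 = decal,
# f[TEX2] = tangent-space light vector squeezed into [0,1].
TEX H0, f[TEX0], TEX0, 2D;    # fetch normal
TEX H1, f[TEX1], TEX1, 2D;    # fetch decal color
MADX H0, H0, 2.0, -1.0;       # expand normal to [-1,1]
MADX H2, f[TEX2], 2.0, -1.0;  # expand light vector to [-1,1]
DP3X H3, H0, H2;              # N.L at fx12 precision
MULX o[COLH], H3, H1;         # modulate and write half-precision color
END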

jwatte
08-26-2003, 05:24 PM
I thought "half" was a fine type to use on GeForces; only "float" was bad (would use more register space and could cut fill rate in half). And I thought the problem was that ARB_fragment_program requires precision better than "half".

Perhaps NVIDIA could enable a control panel and/or extension which accepts ARB_fragment_program syntax, but internally uses half for all floats? That'd be an interesting experiment.

Korval
08-26-2003, 07:02 PM
According to various profiling tests, the key to performance on an FX is to use 'fixed' as much as possible. There is a performance difference between 'float' and 'half', but even with all 'half's you don't get the performance you would on ATi hardware.

davepermen
08-26-2003, 07:03 PM
halfs and floats run at the same speed as each other, which is half the clockspeed of fixed point, if i remember correctly and the nv30 architecture analysis on the beyond3d forum was right.

check the newest tomb raider benchmark. for the 5200, they had to disable ps2.0 completely to get even playable performance.

the 5200 is crap performance-wise. it's hardware not really designed to perform well at floating-point calculations (it's merely an extended gf4 design + a bit of floating point in the original texture sampling unit.. again, see beyond3d), and on top of that, the card itself is very slow.

MichaelNewman
08-26-2003, 08:15 PM
Using halfs is definitely faster than using floats on FXs. Using fixeds is usually faster still. I've observed that the best performance comes from mixing halfs and fixeds. I remember a thread on Beyond3D mentioning that they use different execution resources (halfs use the fp units, fixeds use the integer units), which would explain this.

Pop N Fresh
08-26-2003, 09:14 PM
float and half operations both run at the same speed on the FX architecture. The key to speed when using floating point is to keep your register usage as small as possible. The FX can hold 2 halfs or 1 float in a register; this is where the speed gain from using half comes from. Using half effectively doubles the number of registers you can use without slowing down.
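To illustrate (untested sketch, texture assignments made up): a simple cross-fade written entirely with H temporaries, which occupies half the register slots the equivalent R version would:

!!FP1.0
# cross-fade two textures by the primary color's alpha, fp16 only.
# H0..H2 pack two per register slot; the same code with R0..R2
# would eat twice the register file.
TEX H0, f[TEX0], TEX0, 2D;
TEX H1, f[TEX1], TEX1, 2D;
ADDH H2, H1, -H0;                 # b - a
MADH o[COLH], H2, f[COL0].w, H0;  # a + fade * (b - a)
END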

Pop N Fresh
08-26-2003, 09:25 PM
You could also try
OPTION ARB_precision_hint_fastest;
in your fragment program but I don't know if it actually does anything.
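For completeness, a minimal (untested) program with the option in place; the OPTION line has to come right after the header, before any other statements:

!!ARBfp1.0
OPTION ARB_precision_hint_fastest;  # hints that the driver may use lower precision
# trivial modulate, just to show where the OPTION goes
TEMP col;
TEX col, fragment.texcoord[0], texture[0], 2D;
MUL result.color, col, fragment.color;
END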

If you want raw fillrate numbers take a look at this Beyond3D thread (http://www.beyond3d.com/forum/viewtopic.php?t=6142&postdays=0&postorder=asc&start=0) .

jcabeleira
08-27-2003, 02:11 AM
Thanks for all your posts, people, they were really helpful.

By the way, since my board isn't a good one, could you tell me which boards have the best performance with ARB fragment programs?

davepermen
08-27-2003, 02:19 AM
i'd suggest a radeon9600pro, or something like that.. i myself have a radeon9700pro and i'm very happy.. if you have the money, you could get a radeon9800pro and be at the high end..

they all deliver good ARB_fp performance for the price you pay.. if you find an "old" (old == not produced and officially sold anymore) radeon9500, that's great too.. if it's cheap :D

i simply cannot support any gfFX cards, sorry. it's not that i'm biased, but i have not yet seen one running at acceptable speed and quality for the price you pay. i thought the 5900ultra was good at least, but now the newest benches show that it not only sucks in 3dmark, but in about every dx9 game as well.. hm..

i can stand behind radeons, they work great.

vincoof
08-27-2003, 11:48 PM
For the best performance, go for either an ATi Radeon 9800 Pro or an nVidia GeForce FX 5900 Ultra. Both have similar speeds (well, you will always find people who tell you one is a bit faster than the other, but in fact you will barely notice the difference).

I've heard that the radeon 9600 is slower than the 9500, so I wouldn't recommend it, except maybe for the price/performance ratio, but definitely not a good solution for performance.

zeckensack
08-28-2003, 06:17 AM
Originally posted by vincoof:
For the best performance, go for either an ATi Radeon 9800 Pro or an nVidia GeForce FX 5900 Ultra. Both have similar speeds (well, you will always find people who tell you one is a bit faster than the other, but in fact you will barely notice the difference).
I disagree. You're right for 'fixed function' stuff. For serious shader performance (that's what we're talking about here, right?) there's no way around an ATI card atm.

*Aaron*
08-28-2003, 07:56 AM
For serious shader performance (that's what we're talking about here, right?) there's no way around an ATI card atm.
Wrong. If you're looking for the absolute best shader performance at any price, you want to avoid ATI and nVidia. Get some $10000 card from 3dLabs. But I don't think that's what jcabeleira is looking for if he bought an FX 5200.

stefan
08-28-2003, 08:03 AM
I also noticed quite a difference in speed on the fx 5200 in my notebook when the "same" fragment program runs with register combiners versus a fragment program. I'd say the slowdown is about 40%. Is that the case for the other GeForce FX cards too?

How about ATI cards? Is there a slowdown between programs with equal output using ATI_fragment_shader and ARB_fragment_program?

Speaking of those fx12 types in NV_fragment_program: I'm not too familiar with them, but how do I work with them? Is it OK to store the result of e.g. ADDX in H0, and should that yield better performance? (There seem to be no special temporary registers for fx12 types.)
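Something like this is what I have in mind (untested, texture assignments made up):

!!FP1.0
# no dedicated fx12 temporaries, so i'd expect the X ops to just
# compute at fx12 and drop their results into half registers
TEX H0, f[TEX0], TEX0, 2D;
TEX H1, f[TEX1], TEX1, 2D;
ADDX H0, H0, H1;         # fx12 add, result stored in H0
MULX o[COLH], H0, 0.5;   # scale and write half-precision color
END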

Ostsol
08-28-2003, 08:15 AM
Originally posted by stefan:
How about ATI cards? Is there a slowdown between programs with equal output using ATI_fragment_shader and ARB_fragment_program?
That was the first thing I checked when I got my Radeon 9700 Pro. The only performance difference I noted was due to the slightly different bump-mapping technique I used (arithmetic normalization rather than cubemaps). Other than that performance was the same.

zeckensack
08-28-2003, 08:26 AM
ATI chips don't seem to have dedicated 'legacy' shading hardware. It's all FP24 and it's all run on the same units, regardless of whether it's a multitexturing setup, ATI_fragment_shader or ARB_fragment_program. At least that's my conclusion.

Transcribing ATI_fragment_shader stuff into ARB_fragment_program versions gave me exactly the same performance and precision in every case I tested. Going the opposite way isn't always possible, of course.

That's nice, IMO. If the extension is detected, I can always use ARB_fragment_program on ATI cards without worrying about performance drops.

The FX cards somewhat force me to include an "off" switch for ARB_fp to enable the fallback to NV_register_combiners.

zeckensack
08-28-2003, 08:33 AM
Originally posted by *Aaron*:
Wrong. If you're looking for the absolute best shader performance at any price <...>
I don't. I'm referring to consumer cards. You should be looking at the FX5200/U, FX5600/U and the Radeon 9500/Pro, 9600/Pro. That's what people are buying. That's where code must run well.

Technology only matters when there's a target audience.

Elixer
08-28-2003, 06:00 PM
Originally posted by zeckensack:

Originally posted by *Aaron*:
Wrong. If you're looking for the absolute best shader performance at any price <...>
I don't. I'm referring to consumer cards. You should be looking at the FX5200/U, FX5600/U and the Radeon 9500/Pro, 9600/Pro. That's what people are buying. That's where code must run well.

Technology only matters when there's a target audience.

True enough.

I wonder if there is any shader program that runs faster on an FX series card than on the 9500+ cards? I haven't checked out any demo/game that uses shaders since I don't have a new card yet. Although I can use emulation to get blazing speeds! ;)

Korval
08-28-2003, 06:41 PM
I wonder if there is any shader program that runs faster on an FX series card than on the 9500+ cards?

Sure. They're written using NV_fragment_program, or Cg compiled through NV_fp (which provides the basic types needed for the performance).

harsman
08-29-2003, 01:24 AM
Anything with lots of filtered shadow map lookups should be faster on the GeForce FX, since it has dedicated hardware for PCF. The Radeon has to waste cycles and fetches doing the same thing manually. Another idea is something that uses the sincos or lit instructions a lot. Those are native on the FX but emulated by expansion into several instructions on the Radeon, IIRC.
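Roughly what the Radeon ends up doing by hand (a sketch, untested; assumes the shadow coord in texcoord 0 is already divided by w with the receiver depth in z, that the depth map returns depth in x, and that program.env[0] holds one texel's offsets):

!!ARBfp1.0
# manual 2x2 percentage-closer filter
PARAM texel = program.env[0];        # (1/width, 1/height, 0, 0)
ATTRIB sc = fragment.texcoord[0];
TEMP uv, d, lit, sum;
TEX d, sc, texture[0], 2D;           # tap 0
SGE sum.x, d.x, sc.z;                # 1 if the receiver is lit
ADD uv, sc, texel.xzzz;              # tap 1: one texel right
TEX d, uv, texture[0], 2D;
SGE lit.x, d.x, sc.z;
ADD sum.x, sum.x, lit.x;
ADD uv, sc, texel.zyzz;              # tap 2: one texel up
TEX d, uv, texture[0], 2D;
SGE lit.x, d.x, sc.z;
ADD sum.x, sum.x, lit.x;
ADD uv, sc, texel.xyzz;              # tap 3: right and up
TEX d, uv, texture[0], 2D;
SGE lit.x, d.x, sc.z;
ADD sum.x, sum.x, lit.x;
MUL result.color, sum.x, 0.25;       # average the four comparisons
END

That's fifteen instructions, four of them texture fetches, for what the FX's texture unit does in a single lookup.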

Korval
08-29-2003, 10:08 AM
Anything with lots of filtered shadow map lookups should be faster on the GeForce FX, since it has dedicated hardware for PCF. The Radeon has to waste cycles and fetches doing the same thing manually. Another idea is something that uses the sincos or lit instructions a lot. Those are native on the FX but emulated by expansion into several instructions on the Radeon, IIRC.

While true, that isn't the question at hand here. The question is why the 5200 seems slower than it ought to be, i.e. why things that should speed the card up don't. My best guess is that the 5200 lacks some actual hardware that the higher-end cards have that boosts their performance.

harsman
09-02-2003, 11:15 AM
Well, he did ask about specific cases where the GeForce FX would beat the 9700. Anyway, there's a nice article on the FX's architecture here: http://www.3dcenter.de/artikel/cinefx/
It's not official word from nvidia by any means, but it does seem to be consistent with what nvidia has said and what people have measured. In short, to get the most out of the FX's performance, use few registers, lots of textures, and utilise the combiners at the end.