PDA

View Full Version : Poor Performance of NVidia Cards



mmshls
09-11-2003, 03:49 AM
http://www.tomshardware.com/business/20030911/index.html

Any Comments?

EDIT: NVidia's response thanks to Madman. Hopefully, the new drivers will fix performance issues. http://www.gamersdepot.com/hardware/video_cards/ati_vs_nvidia/dx9_desktop/HL2_benchmarks/003.htm



[This message has been edited by mmshls (edited 09-15-2003).]

Ostsol
09-11-2003, 03:54 AM
It's all in the shaders -- and we already knew that the GeforceFX had problems there, compared to ATI. IMHO, these results come as no surprise.

kehziah
09-11-2003, 04:01 AM
Except that NVIDIA PR have done whatever they could to "demonstrate" that shader performance problems were due to the benchmarks, not their hardware. Now they have their real-world, shipping DX9 game.

mattc
09-11-2003, 05:54 AM
somehow i don't think the thread name you chose is gonna help convince staunch nvidiots... http://www.opengl.org/discussion_boards/ubb/wink.gif

mmshls
09-11-2003, 06:14 AM
Originally posted by mattc:
somehow i don't think the thread name you chose is gonna help convince staunch nvidiots... http://www.opengl.org/discussion_boards/ubb/wink.gif

What's wrong with my thread name? If it doesn't help, that says something about nvidiots.

FSAAron
09-11-2003, 06:53 AM
It was always embarrassing to watch the IHVs' dirty fights, tricks, accusations, lack of respect for competitors etc. Now that an ISV is joining this politics, the embarrassment reaches unprecedented levels.

>> Valve stated that the development of that special code path took 5 times the development time of the standard DX9 code path.

what's the point of such an announcement? Is this a developer conference? Stinks like FUD.

>> Special optimizations for ATI cards were not necessary.

because they optimized for ATI from the beginning?

>> Valve was able to heavily increase the performance of the NVIDIA cards with the optimized path

Isn't it his friggin job to do so? What's the reason for inventing a special "Mixed Mode" label for this? This conference? FUD as hell.

Anyway, it will be interesting to see the shader code when game ships.

>> but Valve warns that such optimizations won't be possible in future titles, because future shaders will be more complex and will thus need full 32-bit precision.

And what's the reason for such an announcement there, apart from FUD? How on earth does this "warning" relate to Half-Life 2?

>> 32-bit precision

oops, a typo? He meant 24-bit, of course? Some people should be more careful, as this would (if true) dismiss ATI HW for future games... A careful reader will notice a table on the previous page saying that ATI does not support 32-bit precision, but the FUD effect remains for most people, I guess.

>> Newell strictly dismissed rumors that his comments about the disappointing performance of NVIDIA cards were based on a deal between Valve and ATi

Yes, I believe in your honesty fully. The whole conference was related purely to HL2. And the conference would still have been held even if there were no deal with ATI at all. In PR we trust.

Valve "warns", Valve whines about "5 times development time", it is all bent in one obvious direction. Have we *ever* experienced such an attitude towards *any* hardware vendor from John Carmack? You can take him as a reference point to judge Valve's impartiality.

I'm truly disgusted. Valve, stick your FUD in your arse.

M/\dm/\n
09-11-2003, 06:54 AM
Well, if that's true then NVIDIA sucks, but FX5900==Ti4600 is simply ridiculous (can you believe it?). Moreover 16 bit mode isn't showing any kind of real improvement, strange. Something like ATi db Valve echoes somewhere & Nvidia db ID SW. I'd like to see numbers on Matrox ....... For a quick comparison http://www.opengl.org/discussion_boards/ubb/wink.gif BTW, fixed cliplanes & so on, I guess that's aimed at NVidia, but then they are so HELLLLOOOOOWA fast to write cheats for a yet unreleased bench, I'd like to learn that too http://www.opengl.org/discussion_boards/ubb/smile.gif
Well, the fps world is in a mess, but I can't see that bad performance on the FX5200 at home http://www.opengl.org/discussion_boards/ubb/frown.gif We'll see

Crap, GPU world has gone terribly wrong.

Zengar
09-11-2003, 07:10 AM
"Valve stated that the development of that special code path took 5 times the development time of the standard DX9 code path"

How much time do you need to write a shader if you know the algorithm already? 10 minutes? And 50 minutes to add prefixes for register precision? It's crazy!

Zengar
09-11-2003, 07:12 AM
And the FX5600 in mixed mode is slower than in full prec????? I thought I liked Valve...

davepermen
09-11-2003, 07:32 AM
Originally posted by Zengar:
"Valve stated that the development of that special code path took 5 times the development time of the standard DX9 code path"

How much time do you need to write a shader if you know the algorithm already? 10 minutes? And 50 minutes to add prefixes for register precision? It's crazy!



uhm, to optimize for speed, you have to play around with all the possible ways the shader could get rewritten, and check where you can go to what precision, etc. they want the highest quality at the highest performance, and that takes some time, yes..
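To make that "playing around with precision" concrete: on NV3x-class hardware, NV_fragment_program selects fp32, fp16 or fx12 with an R/H/X suffix on each instruction, so tuning is a per-instruction decision rather than one global switch. The sketch below is illustrative only (the shader text is made up, not taken from any shipping title), wrapped in a C string as an application would embed it:

    /* A rough sketch of per-instruction precision tuning on NV3x.
     * Shader text is illustrative NV_fragment_program-style assembly;
     * it is not from Valve or NVIDIA. */
    static const char *nv_fp_mixed_precision =
        "!!FP1.0\n"
        "# keep the normalization in full 32-bit, where range matters...\n"
        "DP3R R0.x, f[TEX1], f[TEX1];\n"
        "RSQR R0.x, R0.x;\n"
        "# ...but do the colour modulation in 16-bit half, which tolerates it.\n"
        "MULH H0, f[COL0], R0.x;\n"
        "MOVH o[COLH], H0;\n"
        "END\n";

Whether a given instruction can safely drop to H or X without visible artifacts is exactly the per-shader judgement call that eats the development time being discussed.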

oh, and MADMAN.. haven't i told you?

the gpu world didn't go wrong. only nvidia, who based their future development effort on marketing campaigns and cheat-coders instead of good hw designers and driver developers..

well, mostly. of course there ARE good people at nvidia, too.. but i guess they've been on holiday for the last few years..

dorbie
09-11-2003, 07:36 AM
Zengar, you need to implement your system to handle multiple code paths, and perhaps spend a lot of time figuring out where you can afford to make your optimizations without losing quality & possibly alter game assets to support it. On the face of it, it sounds simple, but it depends on your starting point and the complexities it introduces to your rendering system + how many effects & custom shaders you have and how programmable they are at the asset level. Then there's the added complexity and what that means for cross platform testing & debugging. Not everyone has a monolithic shader codepath they can just swap out.

It is interesting watching the nvidiots trying to shoot the messenger. Yes this was an ATI event, but I think Valve has a bit more integrity than to be unduly influenced by that. Do they have to attend and give this talk? Hell no, and they would certainly be free to present their version of events rather than something slanted in ATI's favor. This is a contribution to public knowledge disseminating their experience, take it or leave it, but don't blame them. Are developers supposed to avoid discussing this kind of thing because nvidiots get upset?

Other high profile developers have corroborated some of Valve's comments and Valve presented real data, not vague opinion. You could easily make your own measurements with a couple of graphics cards.
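For anyone who wants to take that suggestion literally, a minimal measurement can be as simple as timing a fill-limited quad with the shader of interest bound. The sketch below assumes a GLUT window already exists, the fragment program is bound, and vsync is disabled; the function name is invented for illustration:

    /* Crude fill-rate measurement: draw a full-screen quad many times with
     * the shader under test bound, then report frames per second. */
    #include <stdio.h>
    #include <GL/glut.h>

    void measure_shader_fps(int frames)
    {
        int i, start, elapsed;

        start = glutGet(GLUT_ELAPSED_TIME);
        for (i = 0; i < frames; ++i) {
            glBegin(GL_QUADS);                 /* one fill-limited quad */
            glVertex2f(-1.0f, -1.0f);
            glVertex2f( 1.0f, -1.0f);
            glVertex2f( 1.0f,  1.0f);
            glVertex2f(-1.0f,  1.0f);
            glEnd();
            glutSwapBuffers();
        }
        glFinish();                            /* wait for the GPU to drain */
        elapsed = glutGet(GLUT_ELAPSED_TIME) - start;
        printf("%d frames in %d ms (%.1f fps)\n", frames, elapsed,
               elapsed ? frames * 1000.0 / elapsed : 0.0);
    }

Run once per card with the same shader and the relative numbers tell the story, which is all the comparison here requires.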

Ostsol
09-11-2003, 08:29 AM
Originally posted by FSAAron:
>> Special optimizations for ATI cards were not necessary.

because they optimized for ATI from the beginning?
If you're so confident that there was optimization for ATI, would you be so kind as to list the possible optimizations that may have been implemented?


>> but Valve warns that such optimizations won't be possible in future titles, because future shaders will be more complex and will thus need full 32-bit precision.

And what's the reason for such an announcement there, apart from FUD? How on earth does this "warning" relate to Half-Life 2?

>> 32-bit precision

oops, a typo? He meant 24-bit, of course? Some people should be more careful, as this would (if true) dismiss ATI HW for future games... A careful reader will notice a table on the previous page saying that ATI does not support 32-bit precision, but the FUD effect remains for most people, I guess.
Now this is the problem with reading Tom's Hardware. No mention of 32-bit precision was ever actually made. If you look at the presentation slides you'll see this:

". . . new DX9 functionality will be able to use fewer and fewer partial precision functions"

ATI has only one precision and apparently it is sufficient. Full precision is the maximum precision allowed by the video card, which is a minimum of 24 bit -- as is stated by the PS2.0 specifications.


>> Newell strictly dismissed rumors that his comments about the disappointing performance of NVIDIA cards were based on a deal between Valve and ATi

Yes, I believe in your honesty fully. The whole conference was related purely to HL2. And the conference would still have been held even if there were no deal with ATI at all. In PR we trust.

Valve "warns", Valve whines about "5 times development time", it is all bent in one obvious direction. Have we *ever* experienced such an attitude towards *any* hardware vendor from John Carmack? You can take him as a reference point to judge Valve's impartiality.

I'm truly disgusted. Valve, stick your FUD in your arse.
Heh. . . so how about the Doom 3 benchmark? Is that automatically valid because NVidia won? In any case, do these results not simply confirm -everything- we've seen, read, and heard regarding NVidia's pixel shader performance? 3dMark03, Tomb Raider: AoD, ShaderMark, RightMark3D, etc. . . All have pointed to the same deficiencies. Are they all wrong?

[This message has been edited by Ostsol (edited 09-11-2003).]

davepermen
09-11-2003, 09:08 AM
Originally posted by Ostsol:
Are they all wrong?

sure, or not? http://www.opengl.org/discussion_boards/ubb/biggrin.gif and the humus demo running at 50fps on a one year old card runs on a gfFX 5200 at 3-4 fps. right/wrong? definitely WRONG. and thats why it was a big topic in here, wasn't it?

some people will never learn

Elixer
09-11-2003, 09:35 AM
Nvidiots? Atidiots?

Bah.

Maybe if the original poster would insert some code, we can see who is better? heh.


Now where is the moderator hiding? :P

FSAAron
09-11-2003, 09:41 AM
>> Heh... so how about the Doom 3 benchmark? Is that automatically valid because NVidia won?

Well, if Doom III benchmark results were presented at an nVidia PR event, with Carmack participating in a speech tailored to explicitly show how ATI sucks, blaming ATI for the necessity of multiple code paths, inventing special (tm) names for ATI paths to show them on charts, "warning" about future games unrelated to Doom3, and having a bundle deal with nVidia - then you would have reasons to question Carmack's credibility. But you don't have any. Don't compare him to Valve.

FYI, I personally believe NV won because D3 uses OGL and because Carmack has the will to fully optimise for a HW architecture no matter whether he likes its design or not. The latter is something some coding fanboys keep refusing to understand.

Ostsol
09-11-2003, 10:30 AM
Originally posted by FSAAron:

>> Heh... so how about the Doom 3 benchmark? Is that automatically valid because NVidia won?

Well, if Doom III benchmark results were presented at an nVidia PR event, with Carmack participating in a speech tailored to explicitly show how ATI sucks, blaming ATI for the necessity of multiple code paths, inventing special (tm) names for ATI paths to show them on charts, "warning" about future games unrelated to Doom3, and having a bundle deal with nVidia - then you would have reasons to question Carmack's credibility. But you don't have any. Don't compare him to Valve.

FYI, I personally believe NV won because D3 uses OGL and because Carmack has the will to fully optimise for a HW architecture no matter whether he likes its design or not. The latter is something some coding fanboys keep refusing to understand.

LMAO!! You are absolutely hilarious!

Hmm. . . which video card runs absolutely fine and with great performance using Carmack's ARB2 path? Which video card -must- utilize FX12 and FP16 registers in order to achieve decent performance? Is the ARB2 path really a vendor specific path? Currently it is, but only because it is being used by default for one vendor. The GeforceFX could certainly run the game using it, but that would not produce practical framerates. The truth of the matter is that if NVidia's floating point performance were on par with ATI's, it would also be using the ARB2 path.

Also, consider that there is only one ATI-specific path in Doom3, but two for NVidia. Based on these numbers, who is more a cause for the game having so many render paths?

Back to Half-Life 2, since you did not specify any ATI-specific optimizations you thought Half-Life 2 might have. . . I can see only two possible ways in which Valve may have slanted the game towards ATI. First, that they used the Radeon 9800 Pro as the standard for what full detail should perform at (based on a predetermined feature set). Floating point pixel shaders, textures, etc. . . all on at a particular detail level (presumably maximum) and achieving 60 fps. Either that, or Valve intentionally chose to use features that the GeforceFX is known to be weak in.

NitroGL
09-11-2003, 10:50 AM
The only ATI optimization they could do would be to make use of co-issue, and that's not really an ATI specific optimization (since, AFAIK, the NV3x supports co-issue instructions too). They can't use another precision since ATI only has one, and ATI doesn't have any special hacks/extensions to D3D/OpenGL that would produce any greater speed increase...

davepermen
09-11-2003, 11:20 AM
it's a standard dx9 game. so it's an "ati optimized game", as ati simply rocks in standard dx9 tasks

same is true for standard arb gl1.4/1.5 apps with standard arb extensions.

problem is with those "ati optimized games".. they will run well even in years, while ati could be dead by then. as long as gl / dx survives.

same can not be said for all the proprietary rc/ts/nvfp/nvvp/nvtex coded paths.. they will die out with nvidia. after that, there is no support for them. proprietary ****.

i prefer to "optimize for ati", as at the same time it means "optimize for dx9 or arb gl" and like that "optimize for a safe future"

Pop N Fresh
09-11-2003, 11:22 AM
Valve's results match the results I get with my application when using ARB_fragment_program for floating point shaders. Radeon 9700 Pro is double or more the speed of an NV35. This is on an application that was originally developed on Geforce 4 4200 hardware using NV_register_combiners and then upgraded with ARB_fragment_program.

I've no reason to doubt any of Valve's statements as they match my own experience. Those who do doubt Valve, please state your experience and how it differs rather than making accusations with nothing to back them up.

Nakoruru
09-11-2003, 11:27 AM
I cannot help but feel that this forum has gone downhill significantly when people start referring to others as 'nvidiots'

Such name calling is pretty useless.

The title of this post, 'Poor Performance of NVidia Cards', is poorly chosen. It should be more like: "Valve Reports Poor Performance of NV3X Cards Compared to ATI 9XXX Cards in Half Life 2." But that would imply a much more limited failing of nVidia and not get as much attention, now would it.

I found Valve's results to be startling, with the ATI card being 100 percent better. Nothing I've seen so far ever suggested that ATI's cards were more than 20 or 25 percent better. This is extraordinary, but everyone should take it with a grain of salt, because extraordinary claims require extraordinary proof.

I cannot help but feel that Valve is whining. This is the second time they have made a big deal about something in DX9. Is it fair for me to feel this way, or is Valve just standing up as a developer and saying they aren't going to take crap from Microsoft or IHV's anymore? i.e. They want things to be easier because it really is getting too expensive and time consuming to develop game software (not because they are lazy).

pkaler
09-11-2003, 11:36 AM
First of all, why is everyone taking this so personally? We're talking about video cards here, not religion, not economic theory, not abortion. There is no need for name calling.

GPU performance has been leap frogging along the last few years and will continue to do so into the future. Deal with it. 12 months ago Nvidia had the best card out there. Today, ATI probably does. 12 months from now 3DLabs could have the best card. Who knows?

Vertex programming functionality has stabilized. Fragment programming is starting to stabilize. In the future floating point and high level programming will stabilize.

Programming at the bleeding edge is hard. But that's why it is rewarding.

Can we go back to talking about something constructive?

santyhamer
09-11-2003, 12:11 PM
Ok, here is my opinion:

NVIDIA prefers OGL. Look at the last official DX9 NVIDIA drivers (45.23)... no floating-point texture support... how is this possible when you can create it in OGL without problems???... Hey NVIDIA dx9 driver team, wake up from your holidays!

And I think ATI's current products suck cuz there's no dynamic-control-flow vs2_x/ps2_x support.




[This message has been edited by santyhammer (edited 09-11-2003).]

FSAAron
09-11-2003, 12:25 PM
>> GPU performance has been leap frogging along the last few years and will continue to do so into the future. Deal with it.

Why is everybody concentrating on these performance reports? Is the inferior floating point performance of the 5900 in DirectX really that surprising?

I'm pissed because an important, respected ISV has joined one fanboy camp. They took part in an event whose only purpose was to show how one company's products suck. They actively engaged in spewing FUD. This disgusts me. This has no precedent, not in that league.

mmshls
09-11-2003, 12:45 PM
Originally posted by Nakoruru:
I cannot help but feel that this forum has gone downhill significantly when people start referring to others as 'nvidiots'

Such name calling is pretty useless.


My mom teasingly calls me a 'Vidiot'. But, it didn't bother me. We both thought it was funny.


The title of this post, 'Poor Performance of NVidia Cards', is poorly chosen.


I found Valve's results to be startling, with the ATI card being 100 percent better.

I too found the results startling. I don't know if the results are accurate, but I agree that the title was poorly chosen. I should have made it "Piss Poor Performance of NVidia Cards".


[This message has been edited by mmshls (edited 09-11-2003).]

Korval
09-11-2003, 12:50 PM
Stinks like FUD.

What is FUD?


Well, if Doom III benchmark results were presented on nVidia PR event, with Carmack participating in speach tailored to explicitly show how ATI sucks, blaming ATI for necessity of multiple code paths, inventing special (tm) names for ATI paths to show them on charts, "warning" about future games unrelated to Doom3, and having a bundle-deal with nVidia - then you would have reasons to question Carmacks crediblity.

The difference is that, had Carmack said so, he would clearly be lying on most of the factual issues. ATi doesn't inflate the number of codepaths; that distinction belongs to nVidia.

Secondly, Carmack is one man. Valve is a company. One of the reasons I give what they say more weight is that they are a group. Carmack is an individual with his own personal opinions on various matters.

And, in this instance, Valve is, in so far as their factual claims are concerned, 100% right. nVidia's hardware has known fragment-program issues. We've had several threads dedicated to people having disappointing performance with ARB_fp under nVidia hardware. So, even if this is a PR stunt, at least it's one grounded in facts, not idle speculation or lies.


FYI, I personally believe NV won because D3 uses OGL and because Carmack has a will to fully optimise for HW architecture no matter he likes its design or not. The latter is a thing some coding fanboys keep on resisting to understand.

But nVidia didn't win. According to Carmack, if both of them use the ARB path, ATi wins. Granted, it's kind of an unfair test, since we know that nVidia's hardware is weak in this area. However, it's not a fair test to compare NV_fragment_program to ATi's hardware either, since ATi didn't optimize their hardware for fixed-point operations.

There isn't really a fair test between these two pieces of hardware. On DX9/ARB_fp, ATi wins because those shaders can't be optimized for nVidia cards. Under NV_fragment_program, nVidia wins, because nVidia's hardware isn't doing as much work as ATi's.


since, AFAIK, the NV3x supports co-issue instructions too

By "co-issue", do you mean issuing ALU and texture instructions on the same cycle? If so, you're wrong; NV3x doesn't support that.


I cannot help but feel that Valve is whining. This is the second time they have made a big deal about something in DX9. Is it fair for me to feel this way, or is Valve just standing up as a developer and saying they aren't going to take crap from Microsoft or IHV's anymore?

They probably are whining. With good reason. Better to complain about a problem than be silent; at least, if you make noise, it might get fixed.

Developing shaders for nVidia's hardware is hard. Not just because you have to limit your thinking to smaller precision, but because you have to spend time playing around with it until you strike upon the correct shader variants that get good performance. There's no publicly available FAQ for getting decent performance out of it; only some general guidelines that don't always work.

Granted, I'm sure that, if Valve had asked nVidia to take their shaders and optimize them, nVidia would have. However, there's no reason that this needs to be the case.

Besides, Valve probably figures nVidia will just put some "optimizations" into their driver specifically for HL2 shaders that will give them the performance they want.


NVIDIA prefers OGL. Look at the last official DX9 NVIDIA drivers (45.23)... no floating-point texture support... how is this possible when you can create it in OGL without problems???...

"Without problems"? Are you kidding? nVidia only allows floating-point textures with texture rectangles. I don't know what D3D says about supporting FP textures, but I wouldn't be surprised to see that it requires full support (all formats and texture types) if you're going to support it at all.

mmshls
09-11-2003, 01:02 PM
Originally posted by FSAAron:

>> GPU performance has been leap frogging along the last few years and will continue to do so into the future. Deal with it.

Why is everybody concentrating on these performance reports? Is the inferior floating point performance of the 5900 in DirectX really that surprising?

I'm pissed because an important, respected ISV has joined one fanboy camp. They took part in an event whose only purpose was to show how one company's products suck. They actively engaged in spewing FUD. This disgusts me. This has no precedent, not in that league.



There was a previous article on Tom's Hardware http://www.tomshardware.com/graphic/20030714/index.html which said that the 5900 "was the fastest card on the market."

If the Valve results are accurate, then I thank Valve for enlightening at least Tom's Hardware and a lot of end users.

I am having trouble understanding your anger toward Valve. Who would you rather have in Valve's place? Or would you prefer having no benchmarks? Maybe everyone should buy based on how pretty the box looks.

[This message has been edited by mmshls (edited 09-11-2003).]

Nutty
09-11-2003, 01:12 PM
What is FUD?

****ed Up Data.

SirKnight
09-11-2003, 01:30 PM
Maybe everyone should buy based on how pretty the box looks.


What, you don't do that!? I always determine the speed of a video card by how pretty the graphics of the box are. http://www.opengl.org/discussion_boards/ubb/biggrin.gif

Seriously...

It's too bad the FX cards are so slow in the fragment department. The benchmarks didn't surprise me all that much, but I did expect a little better. And for $500 for an FX 5900 Ultra I would expect a hell of a lot more than what you really get. If this is the kind of performance you get for $500 from nvidia then I think it's crystal clear what to buy (or what NOT to buy) for Half-Life 2. http://www.opengl.org/discussion_boards/ubb/biggrin.gif

Let's just hope nvidia has some nice improvements in their 50.xx drivers, and on the same note, hope their next card will just burn through those fragment calculations (not literally of course http://www.opengl.org/discussion_boards/ubb/wink.gif).

-SirKnight

*Aaron*
09-11-2003, 01:43 PM
quote:What is FUD?

****ed Up Data.
I asked this same question when I read this thread (I had seen it before, but I never bothered to look it up.) But when I asked the almighty Google "What is FUD?", it directed me to a page that says that it is "Fear, Uncertainty, Doubt". According to this page, it's a marketing technique that casts doubt on a competitor's superior product to keep customers from switching brands. Strangely, until I looked into it, I also thought it stood for "Effed" Up Data.

Ostsol
09-11-2003, 02:19 PM
Originally posted by Korval:
By "co-issue", do you mean issuing ALU and texture instructions on the same cycle? If so, you're wrong; NV3x doesn't support that.
I believe NitroGL was referring to the capability to perform a vec3 op and an alpha op in a single clock.
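A small illustration of what that means in practice, assuming (as the posts above suggest) that the hardware can pair one RGB instruction with one independent scalar/alpha instruction per cycle. The fragment program below is made up; it only shows the write-mask split that gives the scheduler that opportunity:

    /* Hypothetical ARB_fragment_program snippet with RGB and alpha work
     * split across write masks, which is what lets co-issue-capable
     * hardware run both halves in the same cycle (per the discussion
     * above, not vendor documentation). */
    static const char *coissue_friendly_fp =
        "!!ARBfp1.0\n"
        "TEMP r0;\n"
        "# vec3 work lands in the colour half of r0...\n"
        "DP3 r0.xyz, fragment.texcoord[1], fragment.texcoord[1];\n"
        "# ...while independent scalar work lands in the alpha half.\n"
        "MUL r0.w, fragment.color.a, fragment.color.a;\n"
        "MOV result.color, r0;\n"
        "END\n";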

rgpc
09-11-2003, 04:39 PM
As crap as this thread is I still have to pose two questions...

1. Valve and ATI wouldn't be partners would they? (Yes I know the answer)

2. What are the results like if you run HL2 on the 5900 with 16bit FP? (Edit - I presume this is the mixed mode? Valve states that this won't be possible in future titles - Who gives a toss, takes them so long to put anything out that we'll have Geforce 95000's by then anyway)

Basically that article just makes me think that I won't rush out and buy it - simply because it looks to me like they haven't spent the sort of effort id have put into optimising their product. I've found a couple of products that don't perform (well) on my 5900 Ultra (GP3 and Colin McRae Rally 3) - but I can find plenty of others that perform extremely well (especially the GL products).

[This message has been edited by rgpc (edited 09-11-2003).]

Korval
09-11-2003, 05:22 PM
Valve and ATI wouldn't be partners would they?

Which doesn't prevent Valve from speaking the truth. It only makes them more likely to embellish it.

The material statements made by Valve (as opposed to things like their presuming that future titles would need high-precision floats, which is clearly speculation) are 100% true. The FX line does have these problems with cross-platform shaders. We knew this 5 months ago. Every week, somebody asked, "Why does my FX 5600 not run ARB_fp shaders very well?" And these people are quickly referred back to the Beyond3D boards where shader benchmarks first revealed the problems with FX's and cross-platform fragment shaders.

Is it FUD when it is the truth?


Simply because it looks to me like they haven't spent the sort of effort id have put into optimising their product.

This kind of attitude makes no logical sense.

First, D3D's ability to allow the use of 16-bit precision (via modifiers, presumably what the article refers to as "mixed mode") isn't good enough for the FX. What you really need to get performance is to use fixed-point, which neither D3D nor ARB_fp supports. Hence, no matter what Valve does, they aren't getting good performance out of an FX without writing nVidia-specific code. For OpenGL, there is a potential solution. For D3D, there is none.

Secondly, even if they had a possible solution (which, as I pointed out, D3D doesn't), time spent building an entire new shader codepath is time taken away from optimizing other parts of their game. HL2 is very CPU intensive, especially with all that physics and so forth they've got going on. They have to be able to make that work on 1.6GHz AthlonXP's (they'd better http://www.opengl.org/discussion_boards/ubb/wink.gif ), lest they alienate too much of the market.

Lastly, you can, presumably, turn down the graphical spiffiness if your card can't handle it. One of the purposes of their even bringing this up is so that people aren't shocked (and ticked off) when they get HL2 home to their FX5600-Ultras, only to find that acceptable performance is not available with the spiffy graphics.

You're the one who bought the FX5900-Ultra without thinking through the possible ramifications on the fragment shader end. It is unfortunate (especially considering what you paid for it), but you had the opportunity to weigh the available evidence about its cross-platform fragment shading capacity. It's not our, or Valve's, or ATi's fault that you purchased the wrong card for what you want to do.

Ostsol
09-11-2003, 05:24 PM
I think that we can't really speculate on how much effort either company put into optimization. HL2 could be using a featureset much more advanced than that used in Doom 3. After all, there's more than just floating point precision pixel shaders in DX9. Perhaps many of the items in HL2's featureset (other than PS2.0) are those that the GeforceFX is weak in. If someone could list the DX9-level features that each are using, we could perhaps evaluate the performance of both on either IHV's video cards to determine if the performance levels shown by Valve are reasonable.

dorbie
09-11-2003, 05:37 PM
FUD = "fear uncertainty doubt", it refers to one competitor trying to spread fear uncertainty and doubt about another's product or business typically through rumour mongering, vague assertions and disparraging statements of all sorts. Generally it refers to unsubstantiated, unfair and mostly underhanded practices that lack real merit or impartiality.

This does not seem like FUD to me, it seems like an independent developer with a lot of credibility presenting their experiences, that it puts NVIDIA in a bad light doesn't mean it is FUD. Sometimes unfomfortable observations are just uncomfortable truths. Sure from ATI's perspective it is quite a coup, they paid a truck load of cash in their half-life deal and it looks like they are getting their money's worth, but that still doesn't mean the presentation was a pure FUD. There's a bit too much substance and valve has a bit too much credibility for that. I do find a couple of things in it a bit surprising so I'm reminded that sometimes people have axes to grind etc, and what you take away from this really depends on how you reguard platform specific optimizations and whether you trust a credible developer at an ATI conference who's received been a wad of cash from ATI.

FUD would be some marketing droid at ATI suggesting that NVIDIA schedules were slipping on NV40 and they were struggling to fund their engineering department as their key technologists jumped ship due to their deflated stock prices and treadmill project cycles. Of course I just made all that up, none of it is true, and it would be an example of FUD if anyone posted it in earnest.

[This message has been edited by dorbie (edited 09-11-2003).]

SirKnight
09-11-2003, 06:33 PM
More of a reason for me to stick with OpenGL. http://www.opengl.org/discussion_boards/ubb/biggrin.gif


-SirKnight

rgpc
09-11-2003, 07:35 PM
Originally posted by Korval:
Hence, no matter what Valve does, they aren't getting good performance out of an FX without writing nVidia-specific code. For OpenGL, there is a potential solution. For D3D, there is none.



You're the one who bought the FX5900-Ultra without thinking through the possible ramifications on the fragment shader end. It is unfortunate (especially considering what you paid for it), but you had the opportunity to weigh the available evidence about its cross-platform fragment shading capacity. It's not our, or Valve's, or ATi's fault that you purchased the wrong card for what you want to do.

Actually I bought it based on the performance figures from the DOOM3 benchmarks. But apparently I not only bought the wrong card, I talked Valve into ditching GL and concentrating on DX.

Nakoruru
09-11-2003, 08:14 PM
mmshls, you are not my mom, I don't know you, you have no right to call me names and expect me not to be offended. If you meant to offend me, admit it, or STFU.

Maybe you should have named the post "mmshls is a troll!!111"

I'm so glad that a call to elevate the tone of discussion is met with such an intelligent response. Maybe you should just post a picture of your ******* and save us all some time getting to whatever point you were trying to make.

M/\dm/\n
09-11-2003, 08:30 PM
http://www.gamersdepot.com/hardware/video_cards/ati_vs_nvidia/dx9_desktop/HL2_benchmarks/003.htm

Let's wait for Det 50 http://www.opengl.org/discussion_boards/ubb/mad.gif

dorbie
09-11-2003, 09:42 PM
Nakoruru, nobody called you an nvidiot, you posted AFTER those comments and assumed they were talking about you, hilarious.

Making negative comparisons about a graphics card is not a personal attack, and nvidiot is a great term for people who lack objectivity and are excessively pro NVIDIA. I just wish we had an equivalent term for their counterparts in the ATI camp; I think it's fanATIc, but it doesn't have the same ring. This thread wasn't a troll, it's a newsworthy event.

You can prefer NVIDIA without being an nvidiot; just don't go around throwing pro NVIDIA tantrums in place of reasoned debate and nobody will think you're one.

W.r.t. NVIDIA's response, it sounds very reasonable, and refreshingly frank & honest. The generic shader optimizer in Rel. 50 sounds great (I suspected related work on seeing the updated 3DMark PS2.0 results alongside NVIDIA's policy changes); if that is the outcome of the Futuremark debacle then it's a great one for everyone. Gabe may end up with egg on his face over this if he has the unreleased drivers and is pushing the old numbers, but nobody is ever going to come out of something like this untarnished. It's a snapshot in time in a dynamic situation. I'm glad to hear that NVIDIA can narrow this gap; it's important that they do, for all of us.

[This message has been edited by dorbie (edited 09-12-2003).]

davepermen
09-11-2003, 09:56 PM
what i find rather interesting actually is..

matrox parhelia. everyone knows it's a rather slow and bad card. everyone knows it does have good features anyway, especially the displacement mapping, as well as some other stuff.

but it is rather simple: it's a slow card, not worth the money. matrox made a "bad card"

why does it hurt that much to accept this happened to nvidia? the gfFX DOES indeed have good sides. it runs very well in dx8 class applications, for example. it doesn't have that good dx9 support. it doesn't have good ARB opengl support. it does have tons of its own extensions to expose its features. and it's good that gl can give such access.

but it's anyway a card that performs just badly in a standard application, be it a synthetic or game benchmark, or your own coded demo.

of course, nvidia tried to hide that, and tried it with marketing and cheating, as well as with REAL optimisation in the drivers. we have to see how good det50 really works, i wish them the best at least.

but i think we should just accept that the gfFX is not such an amazing card. then all the fanboy calling and crying and flamewars could simply stop. nobody flames around for matrox.

nvidia can do great next time. we'll see.

roffe
09-11-2003, 11:25 PM
After all the bad things said about the gfFX, I'll add some good things about this card. For my current work I had to choose the FX because the R3XX just didn't have the features. Shader speed wasn't my primary concern.

Nice FX features:
- Very long shaders
- Full 32 bit support through the pipeline for pixel shaders
- No limit on dependent texture lookups
- 128-bit floating point textures/render targets(ATI has this too,no?)

tfpsly
09-11-2003, 11:48 PM
Originally posted by davepermen:
I prefer to "optimize for ati", as at the same time it means "optimize for dx9 or arb gl" and like that "optimize for a save future"

I prefer that too. The only specific code I ever used is VAR, which got replaced by VBO.

What suxx the most is not whether NV cards are terrible or not, but the fact that many people bought NV cards - even the slow low-cost 5200. So we have to write code that will run decently on these cards too (at least for the next few years), or our programs/games won't sell much.

davepermen
09-12-2003, 01:42 AM
Originally posted by roffe:
Nice FX features:
- Very long shaders
- Full 32 bit support through the pipeline for pixel shaders
- No limit on dependent texture lookups
- 128-bit floating point textures/render targets(ATI has this too,no?)
very long shaders are cool. 9800 can have unlimited shaders, but nobody knows how to enable that http://www.opengl.org/discussion_boards/ubb/biggrin.gif or does one?

full 32bit support is essentially dead with det50, as the general shader optimizer will itself determine in the driver how precise a certain shader has to be. hopefully we can choose in the drivers to disable the lowering of quality, else THE main feature of the gfFX got essentially killed..

floatingpoint textures are much bether done on r3xx, as we can have floatingpoint 1D,2D,3D,RECT,CUBE textures. nvidia can only have RECT. i love the floatcubemaps.. HDR-envmaps that means..

but yes, never forget the gfFX does have some good features. its strengths are just way off from what any normal gl or dx app will need, and its hw design is way off from what should be fast and what not. that's bad.

M/\dm/\n
09-12-2003, 01:46 AM
daveperman: UHHHHH, AAAAAHHHHHHH NICE AND SLOW http://www.opengl.org/discussion_boards/ubb/biggrin.gif
http://www.tech-report.com/etc/2003q3/hl2bench/index.x?pg=1 whats all the security about?

I wana see 3dfx + HL2 http://www.opengl.org/discussion_boards/ubb/biggrin.gif

davepermen
09-12-2003, 01:47 AM
Originally posted by tfpsly:
I prefer that too. The only specific code I ever used is VAR, which got replace by VBO.
i had some specific paths for RC on my gf2.. i cannot watch my stunning per-pixel lighting demo anymore.. http://www.opengl.org/discussion_boards/ubb/frown.gif and i don't have the source anymore due to a hd crash.


What suxx the most is not whether NV cards are terrible or not, but the fact that many people bought NV cards - even the slow low-cost 5200. So we have to write code that will run decently on these cards too (at least for the next few years), or our programs/games won't sell much.
just as we had to code proprietary paths for all geforces to gain access to features standard in dx8, to get fast speed (VAR), etc. just as we had to work around their hw bugs, etc..

but what suxx the most is that nvidia cannot stand in front of everyone and say "okay, we agree, our current generation has some big faults. we'll try our best now, and stop doing stupid false propaganda". now they bitch about valve, as they did about futuremark. what do they want to bitch about next?

davepermen
09-12-2003, 02:52 AM
Originally posted by M/\dm/\n:
daveperman: UHHHHH, AAAAAHHHHHHH NICE AND SLOW http://www.opengl.org/discussion_boards/ubb/biggrin.gif
http://www.tech-report.com/etc/2003q3/hl2bench/index.x?pg=1 whats all the security about?

I wana see 3dfx + HL2 http://www.opengl.org/discussion_boards/ubb/biggrin.gif

what do you want to say? this is all nonsense. thanks for the link, though

M/\dm/\n
09-12-2003, 03:19 AM
Me wana zay thzat zyztemz are f**d up http://www.opengl.org/discussion_boards/ubb/biggrin.gif

BTW, preliminary tests showZ Dets 51 are about 15% faster in vp/fp. In usual benches the gap between 9800 & 5900 is around 15%

davepermen
09-12-2003, 03:35 AM
learn to speak english. why are the systems ****ed up? or what exactly?

and it looks like det50 doesn't guarantee 32bit floating point math anymore. that's rather disappointing. we'll see.

CatAtWork
09-12-2003, 03:41 AM
Ouch. Davepermen criticizing someone else's English. http://www.opengl.org/discussion_boards/ubb/smile.gif

davepermen
09-12-2003, 04:16 AM
i do at least try to speak english.. he works with english as nvidia works with standards.

zeckensack
09-12-2003, 04:40 AM
Originally posted by davepermen:
just as we had to code proprietary paths for all geforces to gain access to features standard in dx8, to get fast speed (VAR), etc. just as we had to work around their hw bugs, etc..

I see a significant difference here, and it bothers me a lot:
Using an NVIDIA proprietary feature is an optimization. You detect an extension, you use it, program runs faster, fine.

Using ARB_fragment_program on GeforceFX cards is just nuts. You must include an off switch for an otherwise completely automatic feature, or users will send bags full of disrespect.
"My FX5200 is very slow, why?"
*snickers*
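That "off switch" can be as blunt as a config option plus a vendor-string check. A minimal sketch follows; the function and its policy are hypothetical, and a real application should prefer an explicit user setting over renderer-string sniffing:

    /* Decide whether to use the ARB_fragment_program path by default.
     * Returns 0 (fall back to the cheaper DX8-level path) for GeForce FX
     * boards or when the user forced it off in the config. */
    #include <string.h>
    #include <GL/gl.h>

    int use_arb_fragment_path(int user_forced_off)
    {
        const char *vendor   = (const char *)glGetString(GL_VENDOR);
        const char *renderer = (const char *)glGetString(GL_RENDERER);

        if (user_forced_off)
            return 0;                  /* explicit off switch in the config */

        /* Heuristic only: treat GeForce FX parts as "slow ARB_fp" hardware. */
        if (vendor && strstr(vendor, "NVIDIA") &&
            renderer && strstr(renderer, "GeForce FX"))
            return 0;

        return 1;                      /* everyone else gets the float path */
    }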

Ostsol
09-12-2003, 05:07 AM
Originally posted by roffe:
After all the bad things said about the gfFX, I'll add some good things about this card. For my current work I had to choose the FX because the R3XX just didn't have the features. Shader speed wasn't my primary concern.

Nice FX features:
- Very long shaders
- Full 32 bit support through the pipeline for pixel shaders
- No limit on dependent texture lookups
- 128-bit floating point textures/render targets(ATI has this too,no?)
Just curious: is your work more experimental or meant for pre-rendered scenes? Using very long, 32 bit shaders and lots of dependent texture lookups, resulting in framerates too low to be practical in a realtime environment, doesn't seem too good. . .

Nakoruru
09-12-2003, 05:35 AM
Dorbie, I did not take the term 'nvidiot' personally. I am not a fanboy, so why would I take it personally? Especially something which was posted BEFORE I posted that did not mention my name. You must think I'm a regular idiot http://www.opengl.org/discussion_boards/ubb/smile.gif

My response about being 'offended' was just a response to the stupid analogy about being teased by one's mom.

The post may be a legitimate news item, but the way it was presented was trollish. The response to my suggestion that it was trollish WAS a personal attack.

I simply do not think that Valve's benchmark results translate into a general evaluation of NV30's performance. I think that it only reflects a single developer's experience developing a specific engine for a specific game. I could call anyone who thinks otherwise a 'fanATIc', but that fails to explain anything, so why bother.

Maybe this will all become moot once nVidia releases its new drivers.

bunny
09-12-2003, 05:55 AM
Interesting how this post has really got the nvidia zealots out of the woodwork. NVidiot really hits the nail on the head. The OP was pretty objective, as was the article, yet it's amazing to see so many people taking it personally.

Nobody wants to rewrite their shaders to work for one specific platform. Shaders are supposed to be cross-platform. If we wanted to optimise for one particular card then there would be commercial games using the register combiners.

If nvidia really expect commercial developers to write shaders specifically for their platform then they're making the same mistake
3dfx made with glide, and they may well suffer the same fate. Real developers simply have better things to do, as Valve has pointed out quite nicely. NVidia is simply smoking crack if they think commercial developers are going to jump through hoops to get our code to work on their platform.

Hopefully their next generation of cards will be better. It would be a shame to see such a great supporter of OpenGL go under because of some bad design decisions.

[This message has been edited by bunny (edited 09-12-2003).]

Zak McKrakem
09-12-2003, 07:13 AM
Originally posted by bunny:
Interesting how this post has really got the nvidia zealots out of the woodwork. NVidiot really hits the nail on the head. The OP was pretty objective, as was the article, yet it's amazing to see so many people taking it personally.

Nobody wants to rewrite their shaders to work for one specific platform. Shaders are supposed to be cross-platform. If we wanted to optimise for one particular card then there would be commercial games using the register combiners.

If nvidia really expect commercial developers to write shaders specifically for their platform then they're making the same mistake
3dfx made with glide, and they may well suffer the same fate. Real developers simply have better things to do, as Valve has pointed out quite nicely. NVidia is simply smoking crack if they think commercial developers are going to jump through hoops to get our code to work on their platform.

Hopefully their next generation of cards will be better. It would be a shame to see such a great supporter of OpenGL go under because of some bad design decisions.



Well said... When they bought 3dfx's core assets and got some of their engineers, I think they got the wrong parts/people: their hw design is not good, they have lost their 6-month cycle (they were really late with the GF FX), and they focus their driver team on including 'optimizations' for some games/benchmarks instead of adding new features (in the past, glslang would have been available the same day it was announced; there is some D3D9 functionality still missing) and optimizing the current ones. And they are defending things with incredible arguments. Read their response: http://www.gamersdepot.com/hardware/video_cards/ati_vs_nvidia/dx9_desktop/HL2_benchmarks/003.htm
I think they should think before saying this kind of thing: "Regarding the Half Life2 performance numbers that were published on the web, we believe these performance numbers are invalid because they do not use our Rel. 50 drivers. Engineering efforts on our Rel. 45 drivers stopped months ago in anticipation of Rel. 50. NVIDIA's optimizations for Half Life 2 and other new games are included in our Rel.50 drivers".
Seems that they have not taken notice of the people saying they don't want these kinds of optimizations/cheats for specific applications.


I'm sorry for them, but now that I've seen my OpenGL applications working perfectly, with all the extensions used, on the Radeon 9800 (for the first time in ATI's life), I will exchange my noisy GF FX 5800 for one of those cards.

Korval
09-12-2003, 08:24 AM
I simply do not think that Valve's benchmark results translate into a general evaluation of NV30's performance. I think that it only reflects a single developer's experience developing a specific engine for a specific game.

But we're talking about a known problem with FX hardware. Any game that attempts to use D3D 9 shaders or ARB_fp will experience slower performance on an FX than a Radeon.

The Valve benchmark is only a symptom of a well-known problem.


Seems that they have not taken notice of the people saying they don't want these kinds of optimizations/cheats for specific applications.

That. Or that they have found a "solution" to the whole fragment program precision problem. Which means that they are probably going to try to dynamically determine the necessary precision of each register and allocate it accordingly. Which is a non-trivial undertaking.

What I have suggested appears to be what nVidia has done, and they imply that in the language of their reply to the benchmarks.

davepermen
09-12-2003, 09:08 AM
Originally posted by Korval:
That. Or that they have found a "solution" to the whole fragment program precision problem. Which means that they are probably going to try to dynamically determine the necessary precision of each register and allocate it accordingly. Which is a non-trivial undertaking.

definitely hard work, especially as no compiler till now has ever optimized for similar constraints, but exactly for the opposite constraints.. on the gfFX, doing a calculation several times to save registers instead of storing intermediate values can gain speed.. urgh.

and, independent of the hard work, it's NOT what we want.
THE most powerful feature of the gfFX is the 32bit float fragment program. it's THE reason why people bought it in scientific areas. and it looks like they now dropped that and determine in the drivers if they only need partial precision in some parts of shaders. that makes math inconsistent, and less deterministic than ever. unusable for any scientific calculation. bether use the lower-precision ati then..

i'm not sure about this, though. we'll see WHAT nvidia mixed together for det50.. but they cannot get rid of the fact that they're fighting against a one year old gpu, and still don't really beat it..

Elixer
09-12-2003, 09:52 AM
Originally posted by Zak McKrakem:

Iím sorry for them, but now that Iíve seen my OpenGL applications working perfectly, with all the extensions used, in the Radeon 9800 (for the first time in ATIís life), I will change my noisy GF FX 5800 for one of those cards.

Just wondering, you using those cat 3.7's and you see no issues with openGL apps? This would be a refreshing change!


P.S. I'll take your 5800 off your hands http://www.opengl.org/discussion_boards/ubb/smile.gif

Korval
09-12-2003, 10:03 AM
and, independent of the hard work, it's NOT what we want.

Sure it is. If nVidia could correctly determine 100% of the time which registers could be fixed, half, or float, then you wouldn't mind. Apps that need float precision get it, because nVidia correctly determined that they need it. Apps that only really need fixed get it.

Now, the unfortunate fact is that there is no way to determine with 100% when you need which precision. If it is based on the incoming data from a texture, you would have to scan the texture to determine the application's needs. If it is based on a vertex program, you can never really know for certain.

Perhaps the driver will have a slider that allows you to set which side for the shader compiler to err on: performance or quality. Quality would mean that, unless the driver can absolutely determine that the computation can get away with less than 32-bit precision, it will use 32-bit precision. Performance would mean that, unless the driver finds 100% proof that a computation needs half or float precision, it uses fixed.

V-man
09-12-2003, 10:13 AM
Originally posted by Elixer:
Just wondering, you using those cat 3.7's and you see no issues with openGL apps? This would be a refreshing change!


That was gonna be my post.

Arguments are made that this card is faster than that (like all those dumb benchmarks you find on the net), and this card can compute this more precisely than that, and this card is noisier than that,

*but* somehow people always forget to throw in the bug list in the mix.

Without good drivers, any product can look like crap.

davepermen
09-12-2003, 10:21 AM
Originally posted by Korval:
Perhaps the driver will have a slider that allows you to set which side for the shader compiler to err on: performance or quality. Quality would mean that, unless the driver can absolutely determine that the computation can get away with less than 32-bit precision, it will use 32-bit precision. Performance would mean that, unless the driver finds 100% proof that a computation needs half or float precision, it uses fixed.

i really hope for such sliders.. but they weren't even able to provide correct sliders for fx cards till now (or they overwrote them depending on the app..)..

i really hope for such sliders..

only zero precision loss can result in 100% the same image. so for scientific calculations, all such optimisations have to be able to be turned OFF, no mather if u use ps2.0, ARBfp or NVfp (NVfp should turn them off anyway, shouldn't it?)

zeckensack
09-12-2003, 10:48 AM
Dave,
It's "better" and "matter" respectively.
I'm sorry for this, but you frequently make these mistakes and I just can't stand it any longer http://www.opengl.org/discussion_boards/ubb/smile.gif

roffe
09-12-2003, 11:16 AM
Originally posted by Ostsol:
Just curious: is your work more experimental or meant for pre-rendered scenes?

Very experimental research http://www.opengl.org/discussion_boards/ubb/smile.gif I'm working on a master's thesis.



Using very long, 32 bit shaders and lots of dependent texture lookups resulting in framerates too low.
I'm looking at ways to do rendering that are usually done off-line. See my web page for more details.

Korval
09-12-2003, 11:39 AM
Without good drivers, any product can look like crap.

Bad drivers can be fixed. ATi's drivers are worlds better than they were even 6 months ago, let alone a year ago. And their driver relations people are quite good at responding to test apps, too.

Bad hardware can't be fixed; it must be replaced. Either that, or complicated drivers must be written to hide the badness. And complication always leads to more bugs.

Riff
09-12-2003, 11:58 AM
Any range analysis performed will be an analysis on dynamic precision requirements. The contents of the registers aren't needed to do this analysis, so it can be done as a one-time compile.

All that's needed is knowledge of the precision of the initial inputs and rules that determine the precision and range of the output of each instruction.

The possible inputs are texture component values, constants, and intermediates. The precision of intermediates can be calculated using the precision information of the instructions performed and their operands.

Constant precision is easily computed based on the number of significant digits.

Texture precision presents a little bit of a problem due to the precision being based on the texture format. The driver could either assume full floating point precision for textures or force a recompile prior to execution if the format associated with a referenced texture unit has changed since the last compile.

Once you know the precision and ranges, the analysis is not much different than fixed point analysis other than taking branches into consideration.

Don't make the mistake of thinking that the optimizer is going to arbitrarily downgrade float to half randomly at whim. Not only is it an intractable problem but the compiler would have to be able to infer the larger-purpose of the shader and even then the outcome would be largely subjective. No, the compiler will simply determine if precision can be maintained using half registers instead of full floats. Even a simple implementation of this analysis would improve shader efficiency without affecting output precision at all.

Also, it may be that fragment programs are currently being executed in-order with no optimizations at all. I believe that the det50 performance increase probably comes mostly from reordering instructions to reduce dependencies, stalls, and possibly reduce the number of registers used. Adding precision/range analysis to the register allocation phase is simply icing on the cake.

Of course, I could be wrong and maybe Det50 really does implement some SWAG method to determine what the user really 'needs', with the optimized shader output being different from the non-optimized in-order code. If this is the case, there had better be a means for turning it off, otherwise a lot of people will get pissed off when their GPU-based, precision-sensitive, order-dependent simulation code spits out botched results due to the driver's assumption that the shader is too slow.


[This message has been edited by Riff (edited 09-12-2003).]
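A toy version of the pass described above might look like the following. The data structures are invented, the rule set is deliberately simplified (a real implementation also needs per-instruction range-growth rules and the texture-format handling mentioned above), and temporaries are assumed to be written before they are read:

    /* Single forward walk over the shader: each temporary is tagged with
     * the widest precision any of its inputs carries.  Temporaries whose
     * inputs all fit in 16-bit half can then be given half registers
     * without changing the result. */
    enum precision { PREC_FIXED12, PREC_HALF16, PREC_FLOAT32 };

    struct operand { int is_temp; int index; enum precision prec; };
    struct instr   { int dst; struct operand src[3]; int nsrc; };

    static enum precision max_prec(enum precision a, enum precision b)
    {
        return a > b ? a : b;
    }

    void analyze(const struct instr *code, int ninstr, enum precision *temp_prec)
    {
        int i, s;
        for (i = 0; i < ninstr; ++i) {
            enum precision need = PREC_FIXED12;
            for (s = 0; s < code[i].nsrc; ++s) {
                const struct operand *op = &code[i].src[s];
                /* temps inherit what was computed for them earlier;
                 * constants/textures arrive with a known precision. */
                enum precision p = op->is_temp ? temp_prec[op->index] : op->prec;
                need = max_prec(need, p);
            }
            /* simplification: a real pass also widens when an instruction
             * can overflow the input format's range. */
            temp_prec[code[i].dst] = need;
        }
    }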

davepermen
09-12-2003, 12:31 PM
Originally posted by zeckensack:
Dave,
It's "better" and "matter" respectively.
I'm sorry for this, but you frequently make these mistakes and I just can't stand it any longer http://www.opengl.org/discussion_boards/ubb/smile.gif

thanks.. i never learned english in school or so, just from web, movies, and music. so i'm sorry for all faults i make, i always try my best. but espencially the web isn't the best resource for correct grammar or spelling.. espencially with madman freaks in here.. penetrating such a nice language.

pkaler
09-12-2003, 01:29 PM
Originally posted by Korval:
Bad drivers can be fixed. ATi's drivers are worlds better than they were even 6 months ago, let alone a year ago. And their driver relations people are quite good at responding to test apps, too.


I'll ditto that. I'm having no problems at work whatsoever. Their Linux drivers seem to be improving as well so I'll consider purchasing one for home my next time around.


Gabe Newell:
Valve was able to heavily increase the performance of the NVIDIA cards with the optimized path but Valve warns that such optimizations won't be possible in future titles, because future shaders will be more complex and will thus need full 32-bit precision.

That's a knock against ATI as well, since their shaders only run with 24-bit precision.

And the future is the future. The shaders will change anyway, because you can assume that 32-bit precision will perform well for both ATI and NVIDIA, so you can do fancier stuff in your shaders.

Ostsol
09-12-2003, 02:54 PM
Originally posted by PK:
And the future is the future. The shaders will change anyway, because you can assume that 32-bit precision will perform well for both ATI and NVIDIA, so you can do fancier stuff in your shaders.
I've gotta wonder if THG was actually there or if they are merely trying to interpret the presentation slides. If they were actually there and heard Gabe say "32-bit precision", I'd concede my point. However, if they weren't actually there. . . there's really nothing to indicate that Gabe said that 32 bit precision will be needed. After all, all the slides say is that partial precisions will be less and less practical, implying a need for full precision to be used more and more often. "Full precision" is, of course, variable depending on the video card.

1234!
09-12-2003, 03:54 PM
What really surprises me are the low framerates for the FXs. I guess Valve could have used less eye candy so it would still be playable even on a pesky FX5200 (using the DX9 path).

On the other hand, the screenshots I have seen so far surely look impressive. It's even more impressive that the R3XXs can run the game at >50 fps.

Regarding the so called "application specific optimizations" where shaders get "optimized" or even "replaced".

I am not a lawyer, but I wonder if that is legal (looks at the DMCA).

Even if it is, it sure gives a whole new perspective to Nvidia's "The way it's meant to be played" campaign...

*SCNR*

rgpc
09-12-2003, 04:05 PM
Originally posted by M/\dm/\n:
Lets wait for Det 50 http://www.opengl.org/discussion_boards/ubb/mad.gif

Geez, now you've done it M/\dm/\n - you've given us both sides. How can we have a misinformed, highly speculative argument if we know more than one side? http://www.opengl.org/discussion_boards/ubb/wink.gif

Korval
09-12-2003, 08:45 PM
How can we have a misinformed, highly speculative argument

I don't know where you got this whole "misinformed, highly speculative" stuff from, but you must clearly be reading the wrong thread.

The performance problems with nVidia's fragment programs are not speculative, nor are they misinformed. They are documented facts. That nVidia may have a driver "solution" (one that is in violation of the ARB_fp spec) does not change this fundamental fact. That Valve happens to like talking up ATi technology doesn't change it either.

Independent performance tests show that nVidia hardware lags ATi's under HL2 by a significant margin. It is no longer a topic for debate; it is a verified fact.

To call this discussion "misinformed" or "speculative" (except for speculation about how Det50 will improve FX performance) is not only inaccurate, but biased as well.

santyhamer
09-13-2003, 02:35 PM
ok ok, this is ALL the truth about HL2 and doom3.... these games are IMPRESSIVE because they are interactive MPEG'd renders... muahahahahahaha.... 3dsmax and maya use almost vs/ps v9999 in render!!!! hehe

muhahaha I am crazy :P

davepermen
09-13-2003, 02:40 PM
santy, you know those games aren't everything.. but the cards have been available for real for a while now. and you can really test, with your own coded applications, how well they perform at a certain task. and they show similar behaviour in about EVERY test as in doom3 and hl2.

means if you tweak and code with proprietary stuff, and let the good stuff of an fx card drop (high precision), you get faster than on a radeon. but if you try to do calculations at comparable quality, that means at least 16bit floats everywhere, the gfFX cards cannot beat the radeons anymore.

this is fact. done by myself, seen by others, too.

in any normal opengl or dx9 app the gfFX sucks. if you optimize for them, meaning dropping any quality, then you can outperform radeons.. but at nowhere near comparable image quality.

Humus
09-13-2003, 03:12 PM
Just busting in to tell my experience.
My demos tend to run at least 2x faster on the 9700 than on the 5900, sometimes up to 6 or 7 times as fast. I do no particular ATI optimization.

Zengar
09-14-2003, 04:20 AM
I would like to speak about the optimisation issue one more time. The problem with optimising for NV30 is reducing the number of temp registers, right? fp16 is not really faster than fp32, as my personal experience shows. I downloaded some demos (one ran at 50 fps on a Radeon and 2-3 fps on an FX5200). On my FX5600 they did 8-10 fps. I rewrote the shader, eliminating 7 of 10 temporaries, and I got 20-25 fps. It took me about 10 minutes. So I am pretty sure that people who are 5 times better than me are able to do a similar kind of task. I never saw more than a 20% boost in performance in Valve's data.
I agree, the FX sucks. I bought an FX and after two days I wished I had bought a Radeon. But - the point is - these cards CAN deliver good performance if you want them to (I don't speak of the FX5200 http://www.opengl.org/discussion_boards/ubb/smile.gif) - one must just play with the shaders a bit. That's why I somewhat fail to understand Valve's behaviour.
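As a rough C analogue of the kind of rewrite described here (the shading math is invented purely to show the pattern; it is not Zengar's actual shader), folding intermediates into a single accumulator is what cuts the live-register count. A C compiler would do this on its own, but NV3x-era shader compilation apparently did not:

/* Before: every intermediate term gets its own variable, so a compiler
   that maps variables to registers naively needs many live temporaries. */
float shade_many_temps(float n_dot_l, float n_dot_h, float atten,
                       float diff_tex, float spec_tex, float gloss)
{
    float diffuse   = n_dot_l * diff_tex;
    float specular  = n_dot_h * spec_tex;
    float spec_pow  = specular * specular;
    float spec_pow4 = spec_pow * spec_pow;
    float spec_term = spec_pow4 * gloss;
    float lit       = diffuse + spec_term;
    float result    = lit * atten;
    return result;
}

/* After: the same math folded into one accumulator, keeping only a couple
   of values live at any point. Fewer live temporaries means more fragments
   in flight per pipe on register-starved hardware. */
float shade_few_temps(float n_dot_l, float n_dot_h, float atten,
                      float diff_tex, float spec_tex, float gloss)
{
    float s = n_dot_h * spec_tex;
    s *= s;             /* specular^2 */
    s *= s;             /* specular^4 */
    return (n_dot_l * diff_tex + s * gloss) * atten;
}

#include <stdio.h>
int main(void)
{
    /* both versions compute the same value */
    printf("%f %f\n",
           shade_many_temps(0.8f, 0.6f, 0.9f, 0.5f, 0.7f, 0.3f),
           shade_few_temps (0.8f, 0.6f, 0.9f, 0.5f, 0.7f, 0.3f));
    return 0;
}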

Nutty
09-14-2003, 05:24 AM
Interesting Zengar.

I wonder if it's to do with the way the two cards store temporaries. Perhaps the NV card writes them out to some local cache memory, while the ATI card has actual registers for them. Just speculating..

davepermen
09-14-2003, 05:47 AM
Originally posted by Nutty:
Interesting Zengar.

I wonder if it's to do with the way the two cards store temporaries. Perhaps the NV card writes them out to some local cache memory, while the ATI card has actual registers for them. Just speculating..



there is much information on how the cards work, all from measurements, yep, but anyway, it is enough to KNOW how they perform..

fact is, fixed point runs at about 2x the speed of floating point. and they should instead have dropped fixed point completely and done twice the floating-point support. i see NO need for fixed point. the radeon proves me right.


about the optimizing. yes, temporary registers are a main thing. but not everything (again, fixed point == 2x floating point, and others). and for tons of shaders of all different kinds it's rather boring to rewrite them in all possible ways to see in which they simply suck and in which they are acceptable...

yes, the fx sucks. it forces you to go low-level where there would be no need.

Ostsol
09-14-2003, 05:49 AM
Originally posted by Nutty:
Interesting Zengar.

I wonder if it's to do with the way the two cards store temporaries. Perhaps the NV card writes them out to some local cache memory, while the ATI card has actual registers for them. Just speculating..
Exactly what I've been thinking! http://www.opengl.org/discussion_boards/ubb/biggrin.gif http://www.opengl.org/discussion_boards/ubb/biggrin.gif

Elixer
09-14-2003, 11:27 AM
Here are some other benchmarks from Aquamark, that might just show you how nvidia can tweak/optimize their drivers.
http://www.driverheaven.net/articles/aquamark3/index2.htm

He reported a 40% jump between the Det 4x and Det 50 drivers.

Oh I forgot to say that Aquamark3 uses DirectX9 Pixel Shader 2.0, so this might just show what HL2 will gain.

On another note, remember when Valve said that Nvidia cards had problems with the textures? They said they would release an .avi showing the difference, and now they say it works fine?? Hmmm


[This message has been edited by Elixer (edited 09-14-2003).]

1234!
09-14-2003, 12:30 PM
Aquamark3 has only four PS2.0 shaders; in fact there are only 37 shaders in total, as the engine also makes heavy use of DX7 features.

This just shows you how misleading benchmarks can be if you don't know >what exactly< is being benchmarked.

I only trust benchmarks I faked myself, thank you!

And no, I'm neither a fanATIc nor an Nvidiot; I only deal with facts, not fiction.

castano
09-14-2003, 01:17 PM
I just got a gfx and wrote a special path for it. The only changes were in the normalizations and exponentiations (piece of cake). Before those changes, it was about 3 times slower than the 9700, but now both cards are on par. I still haven't enabled the shadows, but I've heard that the gfx is faster drawing unshaded triangles, and with the nv_depth_bounds extension I expect a big performance improvement. So, I wouldn't be surprised if the gfx ends up running faster than the 9700.

Korval
09-14-2003, 04:00 PM
fact is, fixedpoint runs at about 2x the speed than floatingpoint. and they should have dropped instead fixedpoint completely, and doing twice the floatingnpoint support. i see NO need in fixedpoint. radeon prooves me right.

How does the Radeon "prove" that there is no need for fixed-point? An FX will beat a Radeon if it uses all fixed-point (and with an eye to temporaries), so clearly the card is better at fixed-point than the Radeon is at floating-point (note: the high-end FX's have a higher core clock than the higher-end Radeons, so that could explain the difference. But, let's ignore this for the purposes of this discussion).

Clearly, the FX was not designed with an eye towards the PS2.0 spec (ARB_fragment_program was probably not even an issue during the FX's design phase). Instead, it was designed with an eye to optimizing what the user needs.

People may say that their "highly complex" shaders need at least 24-bit floating-point. They tend to be quite mistaken. Any operations on colors as colors rather than as arbitrary data can be done just fine with 12-bit fixed point. The exponent on floats doesn't matter since the colors are all on the range [-1, 1], so you don't need those bits. And, without an exponent that matters, all 24-bit floats function like slower 16-bit integers. So you're only losing 4 bits of precision by going to 12-bit fixed point. This works for normals just as well as colors, too, since normals are normalized on the range [-1, 1], so 12-bit normal operations are quite reasonable.

For non-color tasks (or tasks that stray outside the range [-1, 1], or tasks that really need the precision), fixed-point tends to be a problem. But, how often is a fragment shader doing non-color/normal based tasks? If you look at most fragment shaders, they're doing color operations 90+% of the time.

The only place where this becomes an issue is if you're doing HDR rendering that creates color data outside of the [-1, 1] range. However, since no hardware can display a floating-point framebuffer, actually doing HDR becomes difficult. Also, there's no operation to find the brightest and dimmest pixels on the framebuffer, so doing the HDR scaling by hand is not really an option (at least, not without massive performance penalties, outside of the use of floating-point).

That is not to say that HDR can't be done at all on modern hardware (at the very least, you can assume a particular minimum and maximum for a scene, based on some knowledge you have about that scene). However, that doesn't mean that doing HDR should be the fastest path available. Why should modern hardware optimize something that it isn't terribly ready for yet anyway? When HDR rendering becomes commonplace and expected of applications, then floating-point computations should be the fastest path in hardware. Until then, using fixed-point is hardly a significant image-quality hit.
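To put numbers on the [-1, 1] argument above, here is a small C illustration using a hypothetical s1.11 fixed-point format (the encoding, names, and rounding are made up and may not match NV3x's real fx12 type): its step of 1/2048 is much finer than the 1/255 step of an 8-bit colour channel, so a colour multiply loses nothing visible.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical sign + 11-fraction-bit format covering [-1, 1] in steps
   of 1/2048. Purely illustrative. */
#define FX12_ONE 2048

static int16_t fx12_from_float(float x)
{
    if (x >  1.0f) x =  1.0f;   /* fixed point clamps for free */
    if (x < -1.0f) x = -1.0f;
    return (int16_t)(x * FX12_ONE);
}

static float fx12_to_float(int16_t x)
{
    return (float)x / FX12_ONE;
}

static int16_t fx12_mul(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b) / FX12_ONE);
}

int main(void)
{
    float a = 0.73f, b = 0.41f;
    float exact = a * b;
    float fixed = fx12_to_float(fx12_mul(fx12_from_float(a), fx12_from_float(b)));
    /* The error stays well below 1/255, the step of an 8-bit colour channel. */
    printf("exact %.6f  fx12 %.6f  err %.6f\n", exact, fixed, exact - fixed);
    return 0;
}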

castano
09-14-2003, 07:51 PM
I think that one of the reasons why the gfx has more detractors is because it's harder to program. You have different data types with different performance, plenty of instructions, and using texture lookups is faster than some of the equivalent instructions. But anyway, when talking about performance, the gfx is excellent for my *current* needs.

M/\dm/\n
09-14-2003, 09:53 PM
The FX can show some nice things in future, when you'll be writing high-level GLSlang code & it'll be compiled by the driver (Cg is already doing a lot in this direction).
But I guess that's not the case with DX, at least 9.0.

davepermen
09-14-2003, 10:51 PM
Originally posted by Korval:
How does the Radeon "prove" that there is no need for fixed-point? An FX will beat a Radeon if it uses all fixed-point (and with an eye to temporaries), so clearly the card is better at fixed-point than the Radeon is at floating-point (note: the high-end FX's have a higher core clock than the higher-end Radeons, so that could explain the difference. But, let's ignore this for the purposes of this discussion).
if they are clocked at the same speed, i get about the same performance for 24bit floats on the radeon as for 12bit fixed on the gfFX card. tell me now one reason i should stick with an fx card. there is no gain in fixed point, except possibly performance. if doubles were not slower than floats, everyone would use them. the only reason not to use the highest quality is performance/storage. and ati shows that you can get high performance at high quality. there's no need to step back


Clearly, the FX was not designed with an eye towards the PS2.0 spec (ARB_fragment_program was probably not even an issue during the FX's design phase). Instead, it was designed with an eye to optimizing what the user needs.
uhm.. what DOES the user need then? i'd say a card which fits the dx9 and opengl specs exactly and provides rock solid performance in code that fits those specs, as this is exactly what the user expects. if i buy an f1 car and it can only be fast when you go and drive rally, then this is NOT what i expected. an f1 car has to run fast in an f1 race, not in rallies.
no fx card runs fast by default in a general dx9 or opengl app. sure, in some they are about equal to ati cards, namely those that don't require dx9. but any dx9-requiring app makes them suck.


People may say that their "highly complex" shaders need at least 24-bit floating-point. They tend to be quite mistaken. Any operations on colors as colors rather than as arbitrary data can be done just fine with 12-bit fixed point. The exponent on floats doesn't matter since the colors are all on the range [-1, 1], so you don't need those bits. And, without an exponent that matters, all 24-bit floats function like slower 16-bit integers. So you're only losing 4 bits of precision by going to 12-bit fixed point. This works for normals just as well as colors, too, since normals are normalized on the range [-1, 1], so 12-bit normal operations are quite reasonable.
yeah. first, i switched to hdr completely myself, and second, why force the user to choose and think about it when they CAN forget about it and just do it in 24bit, at high quality and fast, on the competition? it all works awesome on an ati. it's just the fx that is not able to do the work well.

i wouldn't say ANYTHING if an fx was as fast as ati in floating point, and THEN twice as fast in fixed point. because THAT is what extensions are for. to expose additional features and additional performance. but by default, the FX should be as good as the ati. it is NOT. you have to go down to fixed point and tweak around to get up to the ati. which can do all those tasks at high quality in floating point with no mess and nothing.
For non-color tasks (or tasks that stray outside the range [-1, 1], or tasks that really need the precision), fixed-point tends to be a problem. But, how often is a fragment shader doing non-color/normal based tasks? If you look at most fragment shaders, they're doing color operations 90+% of the time.
and if you think about all you can do with them, you should realize that fixed point has as much business in there as it has in vertex shaders: none. except, as i said above, as an extension. but FIRST it has to be rocking fast in fp support.

the fx has NO thought for future developments. not even for today's ps2.0 developments. it's a dx8-designed card.



The only place where this becomes an issue is if you're doing HDR rendering that creates color data outside of the [-1, 1] range. However, since no hardware can display a floating-point framebuffer, actually doing HDR becomes difficult. Also, there's no operation to find the brightest and dimmest pixels on the framebuffer, so doing the HDR scaling by hand is not really an option (at least, not without massive performance penalties, outside of the use of floating-point).
no brutal performance penalties. and the filters don't hurt much either. it's no problem to map hdr to the screen. just do as a camera does, a little exposure, a little glow, etc. the result looks great.


That is not to say that HDR can't be done at all on modern hardware (at the very least, you can assume a particular minimum and maximum for a scene, based on some knowledge you have about that scene). However, that doesn't mean that doing HDR should be the fastest path available. Why should modern hardware optimize something that it isn't terribly ready for yet anyway? When HDR rendering becomes commonplace and expected of applications, then floating-point computations should be the fastest path in hardware. Until then, using fixed-point is hardly a significant image-quality hit.
uhm, no. because hdr is something dx9 fits very well. and i bought a dx9 card, so i'd expect it to run fast in all possible dx9 situations. which today's hdr situations definitely ARE.
it says "DirectX 9" on the box of my gpu. i'd be rather disappointed if my card could not show those dx9 effects fast and smooth. and i see tons of people who expected exactly THAT when they bought a gfFX. and believe me, nobody is happy about it. and when i show them that my one-year-old radeon9700pro runs those dx9 thingies faster, and nicer, so far they have ALL sold their gfFX again and bought a radeon. i've NEVER before seen people selling what they just bought, in NO place ever.

about the fixed point:
a friend and i thought that fixed point might be faster than floating point on today's cpus. so we coded a raytracer in fixed point (but with the help of typedefs we could switch to floats quickly). we worked hard on it, and it ran nicely. 50fps. and it even looked nice. till the scene got bigger, and we realised we were hitting the limits of (32bit) fixed point. hm.. funny banding effects were the result. looked awesome http://www.opengl.org/discussion_boards/ubb/biggrin.gif but we had 50fps! for small scale scenes it looked great. some irritating precision errors in this and that intersection routine, but in the end we could minimize them so they weren't very visible anymore, and it looked great.

THEN we switched to floating point by just retypedefining our "FP" type. and with no further work, all of the bugs were gone (obviously). and we had? exactly: 50fps as well. we could drop all the funny /FP_ONE and *FP_ONE everywhere in the code, it all got much simpler, precision was no issue now, scenes could get big, and performance stayed the same.

guess what? fixed point is USELESS in such a situation. not because it has no use. but because it is no faster, and all of its uses can be done better with floats.
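A minimal C sketch of the typedef trick being described, under the assumption of a 16.16 format; the names FP, FP_ONE, FP_MUL and the build flag are invented for illustration and are not the actual raytracer code:

#include <stdio.h>

#ifdef USE_FIXED_POINT
    typedef int FP;                       /* 16.16 fixed point            */
    #define FP_ONE        65536
    #define FP_FROM(x)    ((FP)((x) * FP_ONE))
    #define FP_MUL(a, b)  ((FP)(((long long)(a) * (b)) / FP_ONE))
#else
    typedef float FP;                     /* drop-in float replacement    */
    #define FP_ONE        1.0f
    #define FP_FROM(x)    ((FP)(x))
    #define FP_MUL(a, b)  ((a) * (b))
#endif

/* Squared length of a vector -- overflows 32-bit fixed point quickly once
   scene coordinates grow, which is the banding/precision problem mentioned
   above; the float build has no such limit. */
static FP length_sq(FP x, FP y, FP z)
{
    return FP_MUL(x, x) + FP_MUL(y, y) + FP_MUL(z, z);
}

int main(void)
{
    FP v = length_sq(FP_FROM(3), FP_FROM(4), FP_FROM(0));
    printf("|v|^2 = %f\n", (double)v / FP_ONE);   /* 25.0 in either build */
    return 0;
}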

now tell me i have to code for an fx card with 12bit fixed point while i can have the same performance with 24bit floats on the radeon, and don't have to mess around that much with precision and all.

there is no reason to. less work, better quality, higher performance, and fitting every standard.

again, i'd say NOTHING against fixed point as an additional feature. but requiring fixed point to catch up in performance against 24bit floating-point hw, THAT shows a bad hw design.

Korval
09-15-2003, 02:08 AM
if they are clocked at the same speed, i get about the same performance for 24bit floats on the radeon as for 12bit fixed on the gfFX card.

The reason I disregarded this issue is because it is a non-issue. While clock-for-clock, an FX in fixed-point is only as fast as a Radeon, FX's are not clocked the same as Radeons; FX's are clocked higher. Perhaps not having all that floating-point logic allows nVidia to clock them higher, thus making fixed-point advantageous, as they get to have a higher core clock speed.


ati shows that you can get high performance at high quality.

There is one point I think you should consider: an nVidia card can run a 1024-opcode fragment program. An ATi card can only run something like a 64-instruction program. Now why do you suppose that is?

I have speculated in the past (though not publicly) that the R300 (fp wise) is nothing more than an R200 that operates on 24-bit floats and can do more passes. Indeed, even back in the R200 days, I suggested that all ATi needed for the next generation was more passes and more instructions. Maybe there are a few other modifications, but that's the basic idea. If you look at what the hardware can do compared to the R200, they do look very similar. Especially when you factor in the need for texture "dependencies", which simply disguises the underlying "pass" logic of the R200 line.

Meanwhile, the NV3x is fundamentally very different from the NV2x line, in terms of per-fragment processing (with the exception of still having register combiners around).

The R200/300 model, if you think about what the "pass" architecture means in terms of hardware design, is not very scalable in terms of number of instructions. Now, supposedly, there is some way for 9800's to implement an f-buffer that offers them infinite instructions. However, doing this must have a significant performance penalty over what there would be if the hardware allowed more opcodes.

By contrast, the nVidia architecture appears much more scalable and flexible.

What this ultimately means is that, for the R400 or the R500, ATi's going to have to completely rebuild their fragment program logic in order to keep up. The R200/300 design, for one, isn't going to be able to allow for branching or dynamic looping. nVidia has already rebuilt their per-fragment logic into something more scalable. Now, this scalable and flexible design, like anything, has costs. Some of those costs probably come in terms of volume of silicon. However, with this increase in scalability comes ease of implementing new features in the future.

Cry about having to use fixed point today all you want; I think nVidia's fragment programs are on a better path than ATi's and you will, in future NV4x hardware, likely have more features available than the R400. Unless ATi completely redesigns their architecture.


i'd say a card wich fits the dx9 and opengl specs exactly and provides rock solid performance in code that fits those specs, as this is exactly what the user expects.

First of all, user requested features should drive hardware progress, not what some 3rd party says should be the user interface (ie, API writers, either Microsoft or the ARB). Why? Because it gives API writers too much power. In the case of D3D, it gives someone who should be a neutral party in the video card industry (at least, until they joined with nVidia on the X-Box, and were later sued by nVidia, which ticked Microsoft off) too much power. In the case of OpenGL, it gives all of nVidia's competitors the opportunity to screw nVidia over; an opportunity that they have not hesitated to take advantage of in recent months/years.

Secondly, it has never been the case that if a card supported feature X in hardware, it was usable in practice. The TNT1 supported 32-bit rendering; however, the practicality of using it was simply not there. Granted, having the newest generation of fragment programs perform well below expectations is not a good thing. However, as we have pointed out, this is the API's fault, not nVidia's. If cross-hardware API's allowed for specifying fixed-point, then they would be fine.


first, i switched to hdr completely myself

Good for you. What about the vast majority of everyone else (including the "great" John Carmack) who consider the costs of HDR (when not supported by hardware) to be too great compared to not using it? It's not like it's a given that everyone should switch to HDR; it is merely one of many choices, each of which has a cost and a reward.


why force the user to choose and think about it when they CAN forget about it and just do it in 24bit, at high quality and fast, on the competition?

Because, as I pointed out, everything has a cost associated with it. ATi threw everything into floating-point. As such, they lose on number of instructions and the general flexibility of their shader architecture.


i wouldn't say ANYTHING if an fx was as fast as ati in floating point, and THEN twice as fast in fixed point.

The question is one of trade-off. Currently, FX cards are faster than Radeons when using fixed-point, and slower when using floating point.


the fx has NO thought for future developments.

I disagree with your assumption that the future resides in floating point (maybe if HDR is in hardware). But, even under that assumption, your statement is still false. The FX can do floating-point operations. If it weren't designed for the future, it wouldn't even have that capability.

Granted, the float capabilities were probably added because you can't do decent math on texture coordinates in 12-bit fixed-point. But that's beside the point.


a friend and i thought that fixed point might be faster than floating point on today's cpus.

I could have told you that it wasn't true. This hasn't been true for quite some time, because integer (ie, fixed point) math is as fast as it needs to be. Non-game applications don't use floating-point much, and games need to use it a great deal. As such, the apps that require performance (ie, games) are what gets optimized for, so floats are optimized. Indeed, with things like SSE and 3DNow, vector floating-point operations can be faster than integer.

GPU fragment programs aren't at that point yet.


but requiring fixed point to catch up in performance against 24bit floating-point hw, THAT shows a bad hw design.

It is different hardware design, not bad; there are advantages and disadvantages to both sides. It is only bad depending on how much weight you assign to having floating-point operations in your fragment programs.

Personally, all I wanted out of DX9-era fragment programs last year was more instructions and arbitrary texture accesses. That's all. Technically, the Radeon didn't give me fully arbitrary texturing because of the 4-dependency/pass architecture, but they're close enough. To me, fast floating-point operations are a bonus, not a requirement, so I have no problem with nVidia not offering it.

Now, for DX10, it's a different story. I expect fast floating-point operations, arbitrary looping, and more instructions. I could live without arbitrary looping because I know that it is very difficult to implement in the multipipelined architecture of fragment shaders. However, if that doesn't happen, I want other features like floating-point framebuffer blending and the ability to display floating-point framebuffers.

Your argument seems to be, "ATi did it, why not nVidia?", which doesn't give much appreciation to how much hardware design differences can affect how the hardware turns out in the end. Clearly, ATi's design goals were floating-point; they sacrificed to attain that (low instruction count, antiquated texture dependency/pass system, etc). nVidia's design goals were different; they wanted high instruction counts and a flexible texture access system (and a flexible fragment program unit in general). To get this, they sacrificed floating-point speed.

High instruction counts and no dependency limits are laudable goals. That nVidia chose one way over the other is only bad if you expected a different choice of goals. To me, it is no less appropriate than what ATi chose. Except for the fact that no cross-platform shader API allows you to use fixed-point; that's the part that breaks the whole thing, and that's the part nVidia probably didn't expect when they came up with this architecture. They couldn't have expected the rest of the ARB to prevent both ARB_fp and glslang from having any kind of precision hints. So, now they have to do this end-run around the problem in their drivers, which must violate these specs.

Ostsol
09-15-2003, 03:49 AM
Originally posted by Korval:
There is one point I think you should consider: an nVidia card can run a 1024-opcode fragment program. An ATi card can only run something like a 64-instruction program. Now why do you suppose that is?
I'm just going to jump in and ask: have you tried running a 1024 instruction fragment program? (I haven't, that's why I want to know. . . http://www.opengl.org/discussion_boards/ubb/smile.gif) If it is impractical in terms of performance to run a program of extreme length, the maximum instruction count available is irrelevant in a real-time application.

davepermen
09-15-2003, 03:58 AM
nice big blabla from you. i simply cannot agree with it, even though it has good thoughts and is well based on knowledge, too. still, i have the same knowledge, but i can in no way agree with you. fun somehow.

nvidia has to revamp the full fx shader unit as well, and they know it. the fx cannot survive the next generation, as it cannot survive the current generation either.


and it's definitely NOT the apis' fault. nvidia knows the apis they build gpus for, and they knew right from the start that their cards would NOT perform well in common apis. don't ask me why they still built the card that way. but their big marketing for cg right from the start of r300, their very big nv30 emulator and cg combination marketing, was just there to hide the fact that for nv30 you should use cg, so you can develop for nvidia extensions as well without much more work. they KNEW right from the start how bad their hw would look if nobody supported their proprietary stuff.

and yes, ati proves me right. what will the future be? low-precision fixed point or high-precision floating point? i can get everything gfFX users get, faster and at higher quality, and that with one-year-old hw. i didn't buy the card just to be able to play quake3 fast. even the gf2mx was enough for that. i bought the then very new 9700pro because it should be good for today, as well as for tomorrow. and seeing how well it performs in EVERY dx9 app and EVERY opengl app compared to the gfFX line proves me right.

there is no big talking. a gfFX does not serve the needs of gamers and developers. it can serve some special needs, yes. but it's a gaming gpu. it should just run games fast and well. future games. like hl2, like doom3, like the new tombraider (even if most people think it sucks anyway, it should at least get rendered well).

gfFX scalable? so far it has not been able to scale up to the radeons, so far it has not been as easy to program for as the radeons, and so far it is by no means cheaper for its performance compared to ati. if the gfFX were scalable, it could scale well over dx9 and opengl. it doesn't.

it is downscalable. down to old fixed point. which no one NEEDS. tell me one NEED for it. there's no need to step back if you can stay where you are and have it fast, and better, too.

yes, the ati is still based on those passes. but fact is, so is the gfFX. one pass contains a texture sampler OR a float instruction, and two fixed instructions. roughly, anyway. there are more issues, but that's about it.

i bet ati could get their radeons to do 1024 instructions as well, no big problem. but there is not much use for it. there is not much use for features that will not be able to run realtime on current hw. we've seen that several times already. features get adopted the moment they can be made to run in realtime. not before.

why doesn't carmack do hdr? why should he? doom3 is very old, and he hasn't changed much in it since then. but he knows about hdr, and he tried to hack it into doom3 with fixed point. it's all in his .plan ..

as i said, i do understand your statements. i know them, too. but i don't look in the same direction you do. you stay rather conservative in your position, thinking good old tnt2 rendering is still state of the art. i live for finally getting RID of all that, because it all looks the same, and rather crappy.

we're so far away from shrek in image quality, there's still much to do. and staying with old fixedpoint will NOT be the way to get high fidelity graphics anytime soon.

and i do much more math in pixel shaders than just colour calculations, so yes, i need floats there. but that's another topic..

i can not agree with you, sorry.

Korval
09-15-2003, 11:28 AM
nvidia has to revamp the full fx shader unit as well, and they know it. the fx cannot survive the next generation, as it cannot survive the current generation either.

I wouldn't say that. Given some of the descriptions of the internals of the FX, all they would need to do is add a few more float units. And, even if improving the speed of floating-point operations requires a fundamental re-write of their architecture, they still have the more flexible system. I'd bet that supporting conditional branches for them is nothing compared to what ATi will have to do.

And, fundamentally, conditional branches are more important than floating point.


and it's definitely NOT the apis' fault. nvidia knows the apis they build gpus for, and they knew right from the start that their cards would NOT perform well in common apis.

ARB_fragment_program was developed after the FX was complete. Remember, the FX was released well after the hardware was finished; the delays were due to manufacturing, not the chip's design. And glslang was even later.

As for DX9, maybe Microsoft had a sudden change of heart at some point and removed fixed-point operations from their design. Unless you have access to an early PS2.0 revision, we can't tell.

So, in short, if the API's had allowed for the possibility of fixed-point math, then there would not be a problem. They didn't, and only NV-specific paths allow for them. So yes; the low performance of FX's is traced directly to API's.


a gfFX does not serve the needs of gamers and developers.

The only reason it doesn't serve the needs of developers is because the API's were not written with an eye towards what nVidia was doing, or what would be a good idea for the future.


gfFX scalable?...if the gfFX were scalable, it could scale well over dx9 and opengl. it doesn't.

The point I was making is that the architecture is scalable. Why isn't the 9200 a DX9 part, when the FX5200 is? Because ATi couldn't produce a low-end R300 piece of equipment and nVidia could. Also, the internal architecture of the FX will serve nVidia for years to come; ATi needs to rebuild its fragment program architecture in order to stay competitive in the future.


it is downscalable. down to old fixed point. which no one NEEDS. tell me one NEED for it.

Fixed-point has some nice properties (automatic clamping, etc). And if you can get fixed-point to go faster than floats, which is certainly doable, then why not take advantage of it?


yes, the ati is still based on those passes. but fact is, so is the gfFX. one pass contains a texture sampler OR a float instruction, and two fixed instructions.

The FX does not have passes, not in the same sense as a Radeon. A pass on a Radeon incurs some cost above the cost of the instructions themselves, which is why people report significant slow-downs with a long dependency chain. On an FX, they don't have that problem (they have other problems, but not that one).


i bet ati could get their radeons to do 1024 instructions as well, no big problem.

Based on what facts do you say that? I've presented evidence that the R300 is nothing more than a suped-up configurable fixed-function pipeline like the R200 or register combiners. Such architectures scale very poorly in terms of number of opcodes. By contrast, the FX's fragment programs are more like vertex programs than a configurable fixed-function pipe.


but there is not much use for it. there is not much use for features that will not be able to run realtime on current hw.

The point I was trying to make is that 1024 instructions is clearly a ridiculous number. And yet, the FX can do it without a problem. Indeed, it is entirely possible that the 1024 number is not there because some hardware engineer said, "Let's make hardware that can do 1024 opcodes", but because it emerged out of the design process, like emergent behavior in AI. They probably didn't plan it; it just happened because their design was so CPU-like that a large number of opcodes wasn't a problem.


why doesn't carmack do hdr? why should he? doom3 is very old, and he hasn't changed much in it since then.

HL2 is very old, too. That didn't stop them from doing it. Granted, I think HL2 is aimed at a higher-end machine than Doom3.


you stay rather conservative in your position, thinking good old tnt2 rendering is still state of the art.

First of all, your statement about "good old tnt2 rendering" is not valid at all, and is a ridiculous oversimplification. Anything that isn't HDR or doesn't require 24-bit floats isn't immediately relegated to something that could only be done on a TNT2.

And the fact of the matter is that the single most important factor in visual quality today is still what it was on a Voodoo1: textures. If you have good looking, high-res textures, your world looks better than all the HDR/float/bump-mapping/etc effects you can pull off. Good textures are the foundation of all good graphics.


and staying with old fixedpoint will NOT be the way to get high fidelity graphics anytime soon.

We aren't getting "high fidelity graphics" out of graphics cards in real time anytime soon either.


and i do much more math in pixel shaders than just colour calculations, so yes, i need floats there.

Give me a significant, non-HDR shader that absolutely needs floating-point precision throughout the shader.

BTW, Dave, feel free to use capital letters (in grammatically appropriate places, of course) in your posts.

tellaman
09-15-2003, 02:29 PM
just a performance report for my 5900 ultra:
it took me some serious time but i've managed to reduce the number of registers used in my fragment program test from 10+ to 3
so here are the results @1024x768x32 (all pixels in window covered)
10+ regs: 17 fps
3 regs: 24 fps (40+% faster)
this was with dets 45.23
i've also tried dets 51.75 but that did not result in any speed increase (i've experienced some slowdowns actually), so i guess this driver is optimized for d3d only, though i've heard nvidia is trying to pull 51.75 down because it's not an official release

zeckensack
09-15-2003, 02:49 PM
Korval,
NV3x's temp register issues impede scaling.
The more instructions you have, the harder it gets to sort out the register usage.
I don't see how you can state the architecture scales better than R300.

For all intents and purposes, no one should want to run fragment shaders on NV3x that exceed R300's instruction limit. It'll be too slow to bear in interactive rendering applications (just avoiding the real-time term here).

Let's address the offline-crowd for a moment.
An NV35U's fragment unit peaks at 450M x 4 = 1.8 Gflops (one four-component vector op per clock at 450MHz). A P4 1.8GHz can do that, too, and doesn't have any temp storage limitations to speak of. And there are 3GHz P4s to be had if that doesn't suffice.

The high max instruction count on NV3x is thus completely, utterly useless. And then there are the FX5200 and FX5600 models, too. Let's not even speak of those.

Another thing regarding "R300 is a floating point R200 with more passes":
it's not entirely true. ATI ditched the fixed function fog logic and, apparently, the fixed function coverage computations for line/edge smoothing. They have gone to pure fp pipelines.

Also, just looking at paper specs, the R300 is bound to smack the FX5900U (as it does), because it has twice the functional units, which is quite enough to compensate for the clock speed disadvantage.
The only thing an FX5900U does faster is fixed point math. So maybe NVIDIA will finally support the ATI_fragment_shader extension, now that they have the necessary fixed point hardware.

That's what NV3x is: a very fast Geforce 4Ti with some dead limb on the side.

Korval
09-15-2003, 05:53 PM
I don't see how you can state the architecture scales better than R300.

Easily. If my supposition is correct, the R300's fragment programs are not run like a program at all. Instead, it is just a sequence of register settings to a highly complex fixed-function pipeline. Not unlike register combiner "programs", only with many more features.

FX fragment programs, however, are executed much more like vertex programs. That's why they are able to have such a high instruction count; they don't have physical silicon getting in the way. The FX pipeline scales up like a CPU.


For all intents and purposes, no one should want to run fragment shaders on NV3x that exceed R300's instruction limit. It'll be too slow to bear in interactive rendering applications (just avoiding the real-time term here).

Not everybody cares about interactive framerates. I'm sure the guys at Pixar would be happy just offloading some Renderman shaders to hardware; it doesn't have to be interactive for them.


A P4 1.8GHz can do that, too, and doesn't have any temp storage limitations to speak of. And there are 3GHz P4s to be had if that doesn't suffice.

Come on; you know better than that.

While on paper a 1.8GHz Pentium can match the hardware, that'll never happen in reality. Why? Because a Pentium can't spend every single cycle on fragment-program computations. It has to do other work (branches, rasterizing triangles, vertex program stuff, etc). A GPU's fragment programs, when well fed, can spend all of their time, in parallel with the CPU, simply crunching on fragment programs. Simultaneously, the GPU is rasterizing triangles and running vertex programs. Even a dual 3GHz Pentium 4 machine can't compete with that, in terms of raw performance, as long as the pipeline remains well-fed.


it's not entirely true. ATI ditched fixed function fog logic and, apparently, fixed function coverage computations for line/edge smoothing.

Those are minor changes, and the line/edge smoothing is not part of fragment programs. The basic structure of their fragment program pipeline remains unchanged.


Also just looking at paperspec, the R300 is bound to smack the FX5900U (as it does), because it has twice the functional units, which is quite enough to compensate for the clock speed disadvantage.

The FX can do 2 texture ops per clock, while the R300 can only do 1. It balances out in the end.

Besides, it has been shown, in a flat race (no significant fragment programs on either), that the FX can be faster than an R300.


That's what NV3x is: a very fast Geforce 4Ti with some dead limb on the side.


Saying it doesn't make it true.

The design of the FX is very good; the implementation may leave something to be desired on the floating-point fragment program end, but this was sacrificed due to the design. That same design, suped-up as the R300 is a suped-up R200, could easily beat the R400 next year in featureset. You may yet be complaining next time that the R400 doesn't have that fragment-program looping operation that nVidia provides.

The direction nVidia took with the FX, while not so good from the floating-point in fragment-program end, is a good, and ultimately necessary, evolution in per-fragment logic. You will see the fruits of this tree borne out in later generations, as ATi will have to do something similar (and with less practice at it, to boot) in order to keep up.

As I pointed out before, until the R300 came out, there was little reason for anyone to believe that this generation would be the birth of efficient floating-point fragment computations. That only happened because ATi decided to sacrifice everything else to make this happen (note the lack of obvious operations like SIN and COS in native ATi fragment programs). ATi took a big risk and won; it turns out people wanted floating point more than SIN/COS. However, that doesn't invalidate nVidia's full restructuring of their fragment pipeline; it simply shows that they still have a long way to go.

The point I'm making is that for nVidia to fix the problems in its card will likely be much simpler than what it will take for ATi to fix the problems in their card. Even if nVidia decides to take out fixed-point entirely, this will be a minor alteration compared to the overhaul that awaits the R300 if ATi expects to be competitive. Fixed-function can't last forever; sooner or later, it has to change into something more CPU-like, or at least something more vertex shader like.

zeckensack
09-15-2003, 07:38 PM
Korval,
rendering with long fragment programs (>=100 ops average per fragment) obviously moves the bottleneck to backend computation. To balance it back out, you'd also need very long vertex programs and extreme polygon counts. That would be a fair scenario (because GPUs could show off their vertex hardware, too). But I don't know whether it's realistic.

So in the end you have long programs running on both ends, de-emphasizing bandwidth, which is perhaps the primary reason why hardware rendering is so worthwhile. Another thing you de-emphasize is texture fetch, which is extremely well optimized on graphics hardware, and not so on general purpose processors.

I didn't mean to state that a P4 1.8 will beat an FX5900 in rendering Quake 3. I meant to state that it may do that, or at least come very close, in the compute-limited scenario of very long fragment programs.


I also now get what you meant by scaling. You meant that NV3x is a better starting point for future technology. Well, that may be, but it doesn't make NV3x more viable.

ATI's phase design is not necessarily a bad basis for flow control. Predication can kill conditional branches with small leaves, we've already discussed this at length. This can already be implemented today using SLT/MAD or SLT/LERP sequences. More heavy branching could be integrated much like dependent reads are now, as a phase transition. You'd lose at most eight cycles per branch, less on average. That sounds acceptable to me.
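For readers who haven't seen it, here is a scalar C sketch of what SLT-plus-LERP predication amounts to (the little shading example is invented; real hardware does this on whole vectors of fragments at once): both sides of the "branch" are evaluated, then a 0/1 mask selects one, so no per-fragment flow control is needed.

/* Scalar C analogue of an SLT + LRP predication sequence. */
static float slt(float a, float b)            /* set-on-less-than: 1.0 or 0.0 */
{
    return (a < b) ? 1.0f : 0.0f;
}

static float lerp(float a, float b, float t)  /* LRP: a + t*(b - a) */
{
    return a + t * (b - a);
}

/* if (n_dot_l < 0) color = ambient; else color = ambient + diffuse; */
static float shade(float ambient, float diffuse, float n_dot_l)
{
    float lit  = ambient + diffuse;           /* evaluate the "taken" side */
    float mask = slt(n_dot_l, 0.0f);          /* 1 if the surface faces away */
    return lerp(lit, ambient, mask);          /* branchless select */
}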

Whatever, there are nice solutions to free form flow control on both sides of the fence. Just because you deem the R300 architecture too uninspiring doesn't mean it won't do what it must do and be good at it. It does. ATI has had well over a year now to research possible solutions to branching, and they may yet have found something.

Korval
09-15-2003, 10:46 PM
I didn't mean to state that a P4 1.8 will beat an FX5900 in rendering Quake 3. I meant to state that it may do that, or at least come very close, in the compute-limited scenario of very long fragment programs.

I don't recall mentioning Quake 3. The point I was making is that graphics cards can do per-vertex operations, per-fragment operations, framebuffer operations, and rasterization operations all simultaneously. A CPU can only do one of those at a time. It could never possibly live up to the 1.8GFlops, or even close to it, simply because it has to have other logic associated with it. A GPU is massively parallel; a CPU just can't compete with it on a purely computational level.

Granted, for doing any seriously deep logical computations, the limitations of current GPU programs are quickly made apparent. However, for doing computations on vertex and fragment data, no CPU, or even dual-CPU setup can get the job done as well as a GPU can.


Predication can kill conditional branches with small leaves, we've already discussed this at length. This can already be implemented today using SLT/MAD or SLT/LERP sequences.

You are kidding, right? You're not actually proposing this as a real "solution" to conditional branching, are you? I'm expected to waste my precious fragment program cycles doing work that I could skip if my hardware designers were smart enough to make their hardware do branching intelligently? I could be doing something useful with those cycles, like spending them on more expensive floating-point operations http://www.opengl.org/discussion_boards/ubb/wink.gif


More heavy branching could be integrated much like dependent reads are now, as a phase transition. You'd lose at most eight cycles per branch, less on average. That sounds acceptable to me.

Hold on. You're not willing to use fixed point on computations that don't require floats, but you're willing to lose 4 cycles (on average) per branch? I'd be willing to make floating-point operations take 4 times as long as fixed point if it meant that branch operations don't have a cost associated with them that is more than the opcode itself.

Honestly, I would be willing to give up fast floats for real conditional branching in a fragment shader. One of them is a fundamental necessity of programming; the other is a "would-like-to-have".


Whatever, there are nice solutions to free form flow control on both sides of the fence.

If, by nice, you mean, "Give up lots of cycles to idiotic hardware design."

However, I'm confident that ATi isn't stupid enough to not redesign the internal structure of their fragment program architecture. They did, after all, give us fast floating-point in the fragment shader; they must know something about hardware design http://www.opengl.org/discussion_boards/ubb/wink.gif

The R200 architecture took them pretty far; it gave them the chance to one-up nVidia in this generation. The R200 has served them well; now it is time to put it out to pasture and bring in something new.

[This message has been edited by Korval (edited 09-16-2003).]

kehziah
09-15-2003, 11:07 PM
tellaman :
though i've heard nvidia is trying to pull 51.75 down because it's not an official release
If that's true, they are just nuts.
First they say: anything that uses drivers other than Det 50 to measure performance is invalid.
And now: no no no, Det 50 is not ready...

@Korval :
while you make some good points, you seem to forget that the gfFX is a mass market product. I can't agree with you that NV3X is a good design. It is not up to the task it was created for: running mainstream 3D apps (ie games) fast.

Ok, it has some good points for some special cases where you can get good perf. But it turns out it is not relevant for mainstream 3D apps. ISVs don't want to mess with vendor-specific stuff. Valve did spend some resources on this, and regrets it: it wasn't worth it.

Regarding the future, yes maybe NV40 will be vastly better and perform well using standard APIs. Maybe NVIDIA will have less work than ATI to produce next generation chips. That does not make NV3X any better.

davepermen
09-15-2003, 11:50 PM
Originally posted by Korval:
We aren't getting "high fidelity graphics" out of graphics cards in real time anytime soon either.
http://www.daionet.gr.jp/%7Emasa/rthdribl/index.html
rather high fidelity already, and realtime. it's not anytime soon? it's rather soon.


Give me a significant, non-HDR shader that absolutely needs floating-point precision throughout the shader.
hw displacement mapping. geometry extraction. raytracing. material math for the shading, depending on position. procedural textures.
nobody _needs_ floats. you can always do it manually, bit by bit. but doing such stuff without floats is just unnecessary hard work


If you have good looking, high-res textures, your world looks better than all the HDR/float/bump-mapping/etc effects you can pull off. Good textures are the foundation of all good graphics.
[personal attack]bah, you're quite old and braindump.[/personal attack] running q3 at high res with ultrahigh filtered superdetailed superhighres textures just still doesn't look good, or real.
the above demo i've linked does. it does not need any textures.
good, correct shading gives a lot of detail over undetailed geometry. the real world does not have textures.
restate your statement: good artists are the foundation of all good graphics. textures are not needed for this.


Not everybody cares about interactive framerates.
but everyone who buys a gfFX wants that. dx9 realtime. hl2 realtime. at highest res. at highest settings. smooth. that's what a gfFX is for. that's what dawn is on the package for. to say "look at me, you can have me realtime".


Saying it doesn't make it true.
looking at facts indeed DOES make it true. it's an extended NV_texture_shader hw. it sits at exactly that place in the pipeline, and all they did was wrap a loop around it.

the design of the FX is VERY BAD. it does not perform well, it does not scale well, and it does not have the useful features done the way they were meant to be.


The direction nVidia took with the FX, while not so good from the floating-point in fragment-program end, is a good, and ultimately necessary, evolution in per-fragment logic.
what is good about their direction? they forgot to support dx9, they made some creations of their own devising, their hw is slow, it has not proven scalable enough to beat a one-year-old gpu, and their hw is backward designed, meaning it adds onto a gf4 instead of replacing it.


The point I'm making is that for nVidia to fix the problems in its card will likely be much simpler than what it will take for ATi to fix the problems in their card.
that's why nvidia has been unable to do this for over a year, yes.


Fixed-function can't last forever; sooner or later, it has to change into something more CPU-like, or at least something more vertex shader like
and that's why ati dropped it completely, while the gfFX is still a gf4 with a loop around the texture shader.


i see you as one of the only people in here supporting the gfFX hw design. it is complex. this is a main reason why it's bad. perfect is not when you cannot add anything more, it's when you cannot remove anything more. the gfFX is FAR from perfect. and far from logically designed. they made some "too much beer" design choices, which made their hw lose in the end against the one-year-old competition.

r300 was revolutionary. gfFX was not. and still isn't. it is not new hw. the r300 is.

and we'll see the future. the gfFX will get replaced completely, that's what all the rumours from inside nvidia say. why? because it was a bad design, because it was an "add-on to the gf4", because it was not planned around the really useful features. that's what internal nv rumours say. and if they don't know that better than you, then i don't know who does.

kansler
09-15-2003, 11:51 PM
Originally posted by Korval:
i bet ati could get their radeons to do 1024 instructions as well, no big problem.

Based on what facts do you say that? I've presented evidence that the R300 is nothing more than a suped-up configurable fixed-function pipeline like the R200 or register combiners.

What evidence did you present Korval? You have only made assumptions. Do you have blueprints of the R300 chip design?

EDIT:

BTW, I think the FX line is a step back for nvidia in terms of hardware design. A new card which performs worse in games than a lower-clocked predecessor is pure cr@p. Here's an example of the fx5600 performing worse than a ti4200.
http://firingsquad.gamers.com/hardware/msi_geforce_fx5600-vtdr128_review/page10.asp

'Nuff said

[This message has been edited by kansler (edited 09-16-2003).]

M/\dm/\n
09-16-2003, 01:15 AM
I just can't agree with davepermen that the R3xx design is revolutionary but the FX is not. I'd say it's the total opposite.
The R has nothing but a short/straight/fast 24bit pipe; the FX has flow control/instruction counts/2 floats+fixed+ints etc.
Moreover, I'll repeat: IF SHADERS ARE COMPILED BY THE DRIVER, the way GLSLANG is going to do it, the FX has a lot more flexibility than the R3xx. Yeah, right now, while shaders still smell handwritten, the R has a huge advantage, but for serious Hi-Lev shader programming the FX is A WHOOOOOOOOOOOOOOLE LOT MORE FLEXIBLE.

Seriously, I'd say the Rads are NOTHING but one pipe that is fast in the ordinary situation; the FX's have so much $hit inside that I can only wonder, but you have to f**k around or use something like Cg to get the performance out.

mattc
09-16-2003, 02:08 AM
If my supposition is correct, the R300's fragment programs are not run like a program at all. Instead, it is...
in other words, you don't really know, you're just guessing... as you say,

Saying it doesn't make it true.

by the same token, just cos you think you can argue well doesn't mean it's true. and patronising others over their writing style strikes me as somewhat desperate - shouldn't your argument be enough?

i realise this may come across as a bit personal, but it seems like your priority here is to win arguments, and as a result you're really not adding much to the quality of the discussion. in fact it looks like you've dragged it down to another "me vs. everyone else" bickering thread.

madman: i wouldn't worry too much about any graphics card's future http://www.opengl.org/discussion_boards/ubb/wink.gif

Zengar
09-16-2003, 02:36 AM
Well, I read a serious german article on CineFX. I don't remember the link, but I'm pretty sure it was posted somewhere here in the forum not long ago.
The guy found an NVIDIA patent at the European patent office, a patent describing the CineFX hardware. His evidence shows:
1. It's a NEW hardware design, nothing like the old GF.
2. This is a VERY flexible design.
3. This design is NOT very good on floating-point computation.
4. One big disadvantage is CineFX's sensitivity to temp regs. The problem is, CineFX writes temporaries to a quad (source fragment) buffer. More temps -> fewer quads = fewer fragments in flight at once = less performance (see the sketch below). I think this is the REAL problem of the FX. I can be happy with fixed point, really. But I can't be happy without temp registers http://www.opengl.org/discussion_boards/ubb/smile.gif
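A back-of-the-envelope C illustration of point 4; the per-pipe register pool size below is a made-up number, not something from NVIDIA documentation, so only the shape of the curve matters: doubling the temporaries roughly halves the quads the pipe can keep in flight.

#include <stdio.h>

int main(void)
{
    const int pool_fp32_regs = 64;   /* hypothetical shared per-pipe pool */
    int temps;
    for (temps = 1; temps <= 10; ++temps) {
        /* each in-flight quad reserves 'temps' registers from the pool */
        int quads_in_flight = pool_fp32_regs / temps;
        printf("%2d temps -> %2d quads in flight\n", temps, quads_in_flight);
    }
    return 0;
}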

R300 is register combiner hardware - it simply has 16 of them, 8 for texture and 8 for math. I think the evidence is clear - just look how well the card performs! But if we need branching, the R300 architecture is USELESS. CineFX must only replace quads in the buffer to do the trick.
Well, I don't know if CineFX will survive until NV40. I agree with Korval that NV30's slowness is largely tied to the API. GeForceFX is a freak card - it's for people who are happy to sit and optimise the shader - for people like me, for example.
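Coming back to the temp register point: in practice it means NV3x-friendly shader code keeps as few temporaries live as possible. A rough C sketch against ARB_fragment_program (the texture binding is just a placeholder, and how aggressively the driver packs registers is my assumption, not something the patent spells out):

    /* one live temporary: modulate a base map by the vertex colour, reusing t in place */
    static const char *one_temp_modulate =
        "!!ARBfp1.0\n"
        "TEMP t;\n"                                       /* the only temporary             */
        "TEX t, fragment.texcoord[0], texture[0], 2D;\n"  /* base map                       */
        "MUL t, t, fragment.color;\n"                     /* modulate in place, no 2nd TEMP */
        "MOV result.color, t;\n"
        "END\n";

Every extra TEMP would, by the reading above, shrink the number of quads the NV3x pipe can keep in flight, so reuse them wherever the math allows.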

P.S.: It's funny, though, how silent NVidia is... No new extensions, nothing...




[This message has been edited by Zengar (edited 09-16-2003).]

Ostsol
09-16-2003, 03:17 AM
So. . . Anyone want to try and bring the discussion back to HL2? http://www.opengl.org/discussion_boards/ubb/smile.gif

rgpc
09-16-2003, 03:45 AM
Or perhaps GL?

FSAAron
09-16-2003, 06:29 AM
>> So. . . Anyone want to try and bring the discussion back to HL2?

The discussion was about bashing nVidia from the beginning, not about HL2 itself (you must have confused threads here). Bashing nVidia was actually started by Gabe himself, who has reportedly been whoring for ATI like Britney for Pepsi.

As nobody cares about that unprecedented fact, it is not surprising that the whole discussion turned into hardware performance: bashing each other over their knowledge of hardware architecture, patronising somebody over patronising somebody else, a war between nVidiots and wise ATI enthusiasts allied with a few self-proclaimed mentors holding a monopoly on objectivity.

C++ dammit, where are you when you're needed?

zeckensack
09-16-2003, 07:14 AM
Korval,
I only mentioned Quake 3 as an example of what I was not referring to.

And yes, I'm still serious about the P4 comparison. We're talking stream kernels here. Loop overhead, texture fetch and filter, bandwidth, non-parallel overhead for rasterization are the things that hold general purpose logic back. These things approach irrelevance once you employ longer, compute-heavy kernels.
Regarding vertex shaders, how much is that? NV35 needs three cycles to do four DPHs. Do the math yourself if you will, and make sure to compare prices. Is that really an attractive offline rendering solution?

I also mentioned a minimum breaking point: over R300's instruction limit. That'd be 96 full vector ops per fragment. Up to 160 ops with non-trivial co-issue and texturing involved. Bump that straight up to 1024 if that helps you see the point.


I'm not kidding. Note how I said "small leaves". You, too, must finally start to realize how prohibitively expensive 'real' branches are. As I said, we've discussed the implications of free form flow control on massively parallel hardware already. Parallel fragments may take different routes through a branch and thus require dedicated flow control logic per fragment. Predication is more elegant and more efficient (wrt transistor budgets vs throughput). I had hoped you remembered that.
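To make that concrete, here is a minimal sketch in ARB_fragment_program terms (the texture binding and the 0.5 threshold are arbitrary choices of mine): both sides of the "branch" are always evaluated, and CMP merely selects per component, which is essentially what predication amounts to - no per-fragment flow control logic required.

    static const char *predicated_select =
        "!!ARBfp1.0\n"
        "PARAM threshold = { 0.5, 0.5, 0.5, 0.5 };\n"
        "TEMP bright, dark, cond;\n"
        "TEX bright, fragment.texcoord[0], texture[0], 2D;\n"  /* 'then' side, always computed */
        "MUL dark, bright, threshold;\n"                        /* 'else' side, always computed */
        "SUB cond, fragment.color, threshold;\n"                /* negative where colour < 0.5  */
        "CMP result.color, cond, bright, dark;\n"               /* cond < 0 ? bright : dark     */
        "END\n";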


Next one, four cycles from the point of view of a single fragment. Low end stuff can do four of these in parallel. Higher end stuff does eight. Whatever, a branch being (on average) four times as expensive as an ALU instruction sounds about right.
I'm not saying ATI are going to take this route, but it would work and still be fast.


Zengar,
"But if we need brunching [sic] - R300 architecture is USELESS. CineFX must only replace quads in the buffer to do the trick."
You do realize that a phase architecture also has a loopback that passes data back to the start of the processing pipeline?
Now why is R300 basic architecture useless for branching? There's no fundamental advantage for NV3x wrt branching, in fact, the current mechanisms are very similar.


Ostsol,
this is not the Half Life 2 thread http://www.opengl.org/discussion_boards/ubb/wink.gif

harsman
09-16-2003, 07:17 AM
Zengar, I think you're talking about this article (http://www.3dcenter.de/artikel/cinefx/index_e.php) .

Anyway, I don't know if this has been said, but besides the register usage limitation of the FX there's something else which leads to lower performance than the Radeon in a lot of cases: it can do a floating-point op OR a tex op, while the Radeon can perform both simultaneously. So if the latency of the texture fetches gets hidden (something modern hw is good at), the Radeon has much more power than the FX, provided the ratio of texture ops to arithmetic ops isn't too big.

The article above also shows the FX as a longer, thinner pixel pipe compared to the Radeon, which is shorter and wider. This probably means the FX has to work harder to keep the pipe full to stay efficient, which might impair it even more. Unless, of course, all you want to do is massive multitexturing, but that doesn't sound a whole lot like the "dawn of cinematic computing" to me.

To be fair, though, we all would have thought the FX was a great card if the R3x0 hadn't turned out to be such a beast when it comes to shader performance. My only nag is the lack of native PCF for shadow maps; you'd better fix that next time, ATI, you hear?

KRONOS
09-16-2003, 07:24 AM
i see you as one of the only in here supporting the gfFX hw design


Count me in too.

After reading all of this I came to a simple conclusion: this is turning into a Futuremark-like forum. Calling the r300 revolutionary because it is faster "the way DX9 compiles it to be" is funny...



it is not new hw


Why? Because it is slow? But wait! People do know that when properly programmed, or when using a proper compiler, it is fast - as fast as the "revolutionary" r300. Even though it is offering much more.

I do have to admit that the FX was a bit of a failure for the gaming market, even though there's not much need for such a high-end board just to play games, but that is another discussion...



thats why nvidia is unable to do this since over a year, yes.


I guess ATI fans will have to wait a year when ATI moves to 0.13 micron, eh? Or will it work the first time for them?



what is good in their direction? they forgot to support dx9


Didn't they go beyond it? I'm confused here...



has not proven to be scalable to beat out a one year old gpu


That's not their fault! They hadn't heard of that GPU! They built it to be slow. "Heck, let's build something slower... Just for fun..." http://www.opengl.org/discussion_boards/ubb/wink.gif



while the gfFX is still a gf4 with a loop around the texture-shader


and them:

"R300 is a register combiner hardware - it simply has 16 of them - 8 for texture and 8 for math. I think it's clear evidence - just look how good the card performs! But if we need brunching - R300 architecture is USELESS. CineFX must only replace quads in the buffer to do the trick."

"That only happened because ATi decided to sacrifice everything else to make this happen (note the lack of obvious operations like SIN and COS in native ATi fragment programs)."

Are you sure it is a "gf4 with a loop around the texture-shader"? :p

EG
09-16-2003, 07:36 AM
> i wouldn't worry too much about any graphics card's future

Amen.

Ostsol
09-16-2003, 08:27 AM
Originally posted by FSAAron:

>> So. . . Anyone want to try and bring the discussion back to HL2?

The discussion was about bashing nVidia from the beginning, not about HL2 itself
I don't know. . . you were quite fast in bashing Valve and ATI right away. . .

vember
09-16-2003, 09:04 AM
"I guess ATI fans will have to wait a year when ATI moves to 0.13, uh? Or will it work the first time for them?"

It actually already exists. :P

Korval
09-16-2003, 09:39 AM
rather high fidelity already, and realtime. it's not anytime soon? it's rather soon.

It's nice to see that they can render a skybox with a building on it and a reflective abstract object. Now, do something that would actually be useful in some application. Like, rendering 10 people in Pixar-esque quality.

Ain't gonna happen anytime soon.


hw displacement mapping.

Bah.

The rest of the API for doing this (render-to-vertex-array) doesn't exist yet. And even so, I've never cared much for so-called "displacement mapping". If you want the effect, it should be done the right way: rasterize each triangle and displace the individual pixels, not by doing tessellation on the GPU.


geometry extraction.

Huh?


raytracing.

I'm pretty sure that not even a 9800Pro can do raytracing with a decent polygon count in real-time.


material math for the shading, depending on position.

I don't know what this means or could even be in reference to.


procedural textures.

Now there's a worthwhile task. Procedural textures can look very nice. However, they don't fall under the "need" category; merely the "would-be-nice-to-have".


running q3 at high res with ultrahigh filtered superdetailed superhighres textures just still doesn't look good, or real.

Of course not. I said good textures. You can hi-res Quake3 textures all you want; they will still look like brown mud.


its an extended NV_texture_shader hw. it is exactly that place in the pipeline, and all they made is wrapping a loop around it.

When I said that the R300 was an enhanced R200, I had some actual evidence for that supposition: the 4-dependency requirement looks very much like an R200 with 4 passes, and the lack of SIN/COS instructions matches the R200 as well.

By contrast, there is evidence against the idea that the FX is just an enhanced NV_texture_shader. NV_texture_shader always worked in floats; clearly, NV_fp doesn't always (and doesn't do it well when it does). And the NV_texture_shader operations are far less complicated than the SIN/COS/derivative operations that NV_fp provides.

I might be willing to believe that a little NV_ts hardware still lives in NV_fp, but it would be such a small amount that it has no bearing on this issue.


the design of the FX is VERY BAD. it does not perform well, it does not scale well, it does not have the features that are useful done the way they were meant to be.

Hardware design is not about, "Hey, let's make the feature that daveperman wants fast." Indeed, it isn't about making any particular feature fast. It is about building a piece of hardware that can perform certain operations at a given performance.


has not proven to be scalable to beat out a one year old gpu

This "can't be a one-year-old-GPU" line is silly, and you know it. The FX is one year old too. They are contemporaries; the idea that one of them is better at some operations than another is perfectly reasonable.


thats why nvidia is unable to do this since over a year, yes.

They haven't released a hardware revision since the 5900. And, back then, the floating-point thing wasn't such a big deal; HL2 was still pretty much under wraps.


and thats why ati dropped it completely while the gfFX is still a gf4 with a loop around the texture-shader.

I have explained my evidence that ATi did not. And I have proven that NV_fp can't possibly just be a loop around texture shaders; the idea itself is so ridiculous that proposing it shows the bias of the proposer.


i see you as one of the only in here supporting the gfFX hw design.

That doesn't make my position wrong.


the gfFX is FAR from perfect.

Given. But the R300 is even farther from perfect.


r300 was revolutionary. gfFX was not. and still isn't. it is not new hw. the r300 is.

Based on what do you say this? That the R300 gives you that floating-point performance you've been wanting? That is, after all, their only feature.


thats what internal nv-rumours say.

Do you really believe that internal rumors are released to the public by accident? Internal rumors are another word for marketing; they're saying precisely what they want you to hear. People think that the FX architecture is bad because it performs poorly in floating point. nVidia's engineers know that the entire architecture doesn't have to be rebuilt to fix this. Therefore, nVidia's engineers aren't going to rebuild it. But marketing can't change the impression that the FX architecture is bad, so they release statements as "rumors" that the architecture is being replaced. It isn't, because it doesn't need to be. ATi went to fast floating-point while still running off an R200 core.


What evidence did you present Korval? You have only made assumptions. Do you have blueprints of the R300 chip design?

Maybe you should have read the thread instead of jumping in the middle.

Quote from me:

"If you look at what the hardware can do compared to the R200, they do look very similar. Especially when you factor in the need for texture "dependencies", which simply disguises the underlying "pass" logic of the R200 line."


in other words, you don't really know, you're just guessing...

Yes, I don't know it for an absolute fact, but the available evidence certainly fits the known facts. It is possible that ATi completely rebuilt their fragment shader logic, but then explain why they would have built that texture dependency chain logic into it? That dependency requirement fits the R200's pass architecture precisely.


As nobody cares for that unprecedenced fact, it is not surprising that all discussion went to hardware performance, bashing each other about their knowledge of hardware architecture, patronising somebody over patronising somebody else, war between nVidiots and wiseATIenthusiasts allied with few self proclaimed mentors with monopoly for objectivity.

Hey, I resent that... I argued on both sides. Check the thread http://www.opengl.org/discussion_boards/ubb/wink.gif

I'm anti-bashing, whether it's nVidia-bashing or ATi/Valve-bashing.


Note how I said "small leaves".


You, too, must finally start to realize how prohibitively expensive 'real' branches are.

I'm aware of how bad branching can be in fragment programs. Especially if different fragments that are running concurrently want to take different branches. It is certainly a nightmare for hardware designers.

That doesn't mean I'm going to stop demanding it http://www.opengl.org/discussion_boards/ubb/wink.gif

BTW, if you use a pass to branch, don't you have to sacrifice a texture dependency?

That being said, I think (and, for the purposes of clarification, this is only speculation) the kind of architecture of the NV3x line is better suited to fast, or at least lower cost, branching than ATi's R300 line. It is more likely that the NV4x will have faster-performing branches than R400, assuming ATi doesn't rebuild its shader architecture.

1234!
09-16-2003, 11:09 AM
*grabs chips and beer*

KEEP IT GOING!!!

kansler
09-16-2003, 11:51 AM
Originally posted by Korval:
Maybe you should have read the thread instead of jumping in the middle.

Maybe you should stop accusing people.

SirKnight
09-16-2003, 02:03 PM
Originally posted by 1234!:
*grabs chips and beer*

KEEP IT GOING!!!

I second that! http://www.opengl.org/discussion_boards/ubb/biggrin.gif

Actually this has been a pretty interesting discussion. Now...on with the show!


-SirKnight

tamiller866
09-16-2003, 02:26 PM
Originally posted by Korval:
Clearly, the FX was not designed with an eye towards the PS2.0 spec (ARB_fragment_program was probably not even an issue during the FX's design phase). Instead, it was designed with an eye to optimizing what the user needs.


You sound eerily reminiscent of Brian Burke, prior to carrying the voodoo virus over to Nvidia, telling developers that 32 bit color depth wasn't needed.

What the user needs is for hardware venders to accelerate what programmers develop, as it's developed, not attempting to dictate what is needed.

Korval
09-16-2003, 03:23 PM
Maybe you should stop accusing people.

Considering that my evidence had been laid out quite plainly in several posts, including the one where I brought up the subject, it is obvious that you didn't read the thread. Had you done so, you would have seen said evidence. As such, I think my tone and accusation were warranted.

And if you did read the thread, you clearly didn't do so sufficiently well enough to find what I had clearly stated several times.


You sound eerily reminiscent of Brian Burke, prior to carrying the voodoo virus over to Nvidia, telling developers that 32 bit color depth wasn't needed.

What the user needs is for hardware venders to accelerate what programmers develop, as it's developed, not attempting to dictate what is needed.

There's a difference, though. In the case of the Voodoo4+ vs the TNT2/GeForce256, the TNT2/32MB was a viable platform (ie, not ridiculously slow) for doing 32-bit rendering. 16-bit was still faster, but you could get away with 32 on high-end machines. It was understandable for 3DFx to miss the boat on 32-bit for the Voodoo3, because no viable alternative existed when the Voodoo3 was in development. To not support it on the Voodoo4, when the writing was clearly on the wall during all stages of its development, was obviously wrong-headed.

By contrast, the NV30 and the R300 were developed simultaneously, much more akin to the TNT2/Voodoo3 era. So, it's not unreasonable to see that NV30 might take a separate direction from the R300.

In fact, depending on when the PS2.0 spec was finalized (with only floating-point ops rather than having both fixed and float), nVidia could have believed during the initial design of their hardware that PS2.0 would support fixed-point. By the time it was clear that it wouldn't, it could easily have been too late for them to change their hardware.

My problem with this debate is that, much like with 32-bit rendering or hardware T&L, nobody was actively clamoring for this functionality. Then, when one hardware vendor gives it to you, the one that didn't is expected to have that feature and is blamed for not providing it (or, in nVidia's case, for not being fast enough at it).

It's a new feature; not every IHV is going to have it when it first comes out, especially if nobody told them it was required. Give nVidia crap next generation if they don't get floating-point up to par. But you can't fairly give them crap now for failing to predict, possibly based on early revisions of the PS2.0 spec, that fast floating-point was going to be a bone of contention for this generation.

On a philosophical point - the question of whether graphics development should be led by the hardware makers, APIs, developers, or market forces - consider this. Was there an outcry for 32-bit rendering before the TNT2/32MB came out? What about developers asking for hardware T&L? Per-fragment configurability (a la register combiners)? Even floating-point per-fragment computation?

Not really. Oh, developers wanted them, and developers asked for them. But developers didn't start saying that they needed them until a hardware maker gave it to them.

Developers made the conscious choice to use 32-bit rendering when it was available on the TNT2. They made the "choice" to use hardware T&L on the GeForce (not really a choice, since the performance of the GeForce256 killed virtually all competition, so that's what everybody had). And Valve, at least, is making the "choice" to use floating-point computations. But is that really a choice at all?

The problem with the current API situation and floats is that developers don't really have a choice. At least, not a cross-platform choice. If they don't want to use OpenGL and NV extensions, then they're stuck with floats on FX hardware. They have to use them, even if they don't need them. The end-user didn't pick the winner, nor did the ISV's. PS2.0 and ARB_fp did.

After all, if these two supported fixed-point (even as just a hint), we wouldn't be having this discussion. Oh, sure, HL2 might still run slower on nVidia hardware because they need floats for what they're doing. But other games that use PS2.0 or ARB_fp that don't need floats don't get the option; they're going to be slow on nVidia hardware for no good reason.

Is this not a frightening thought? The fate of a hardware generation being decided, on the one hand by Microsoft and on the other by a council of competitors, rather than by our preferences or even market forces?

I don't mind ISVs making the determination of which hardware is better. I'm willing to live with market forces deciding it. But when that fate is decided by fiat (Microsoft) or by a small council in which we have little actual input (the ARB), we, as ISVs, are in trouble.

newt
09-16-2003, 03:50 PM
I find this discussion rather amusing but entirely too lengthy.

Clearly the ATI card is faster at DX9 things.
Clearly the nVidia card is a pain to code for. At the end of the day, both display games fast. When you're in the middle of a fire fight or whatever, you are certainly not saying to yourself, "I notice a bit of low precision shading there". If you are not using it for games, then pick the card which suits your needs the best.

You'll be buying another in 12 months.

On another topic:

It was mentioned earlier that a CPU (Athlon or PIII - I forget) could do the same kind of float performance as a GFFX GPU.

Well, I reckon they should just stop producing GPU's and just stick a fast CPU onto a board with a real fat/fast memory interface.

Program it with whatever you like.

Forget ATI and nVidia - Intel is the way to go.

tamiller866
09-16-2003, 05:35 PM
Originally posted by Korval:
But other games that use PS2.0 or ARB_fp that don't need floats don't get the option; they're going to be slow on nVidia hardware for no good reason.

I'd argue that they're slow because their drivers report them as DX9/OGL2 ready. Thanks to Gabe, the public now knows what the rest of us knew all along: they aren't (truly) ready.
If they didn't pretend to be ready they would be sent down the same path as the Ti 4600 (where they truly belong) and the end user might get "playable" framerates.

I agree with most of your other arguments - facts are facts - but you use them to reach a philosophical conclusion which in the end is largely irrelevant.


Originally posted by Korval:
On a philosophical point - the question of whether graphics development should be led by the hardware makers, APIs, developers, or market forces,...

Is this not a frightening thought? The fate of a hardware generation being decided, on the one hand by Microsoft and on the other by a council of competitors, rather than by our preferences or even market forces?

I don't mind ISVs making the determination of which hardware is better. I'm willing to live with market forces deciding it. But when that fate is decided by fiat (Microsoft) or by a small council in which we have little actual input (the ARB), we, as ISVs, are in trouble.

That fiat (Microsoft) is the largest software vendor right now and for the foreseeable future, even without counting the OS division, so of course they should and will have some control.

In a perfect world everything would be cross-platform, and a Power Mac would probably be sitting on top of the Aquamark list. <G> But Nvidia, it seems, would prefer to take us back to the stone age of Glide-like proprietary coding.

A DX9 spec program should run at "playable" framerates on any hardware that is branding itself DX9 ready. <-note the period

If Nvidia had a problem with the DX9 standard they should have removed the acronym from their packaging. They could have called it Cg9, OgL9, NV9 - or any other marketing term - ready, and relegated their product to Voodoo4 anonymity.

I usually find it difficult to feel sorry for their end users; ever since the Riva 128 their customer base has been built on those who wanted to be ahead of the curve rather than those who tend to actually buy and use software.

But now those who bought the FX series of cards in order to be "DX9 ready" are being told "just wait for the coming OpenGL products". Sadly though, by the time these heralded OpenGL products hit the shelves (or warez sites, grrr) these same types of customers will be the first ones shopping for the next latest and greatest hardware.

IMHO, Nvidia is following in 3Dfx's footsteps, attempting to use their market penetration rather than technological advancement as a means to control and manipulate the industry. I think they should let Microsoft handle creating the monopolies, and stick to what they are - or were - best at.

[This message has been edited by tamiller866 (edited 09-16-2003).]

[This message has been edited by tamiller866 (edited 09-16-2003).]

[This message has been edited by tamiller866 (edited 09-16-2003).]

Ostsol
09-16-2003, 07:09 PM
Since we don't seem to be getting back to HL2, in this thread, I might as well jump in with a couple comments:

I do agree with Korval on a number of points. Most DX9 games probably don't currently need floating-point precision. They may need a large mantissa, but high dynamic range does not seem necessary for the vast majority of operations. Also, if NVidia's suggestion that fixed-point precisions also be supported in PS2.0 and ARB_fragment_program was simply ignored, and this happened after the NV30 core's feature set was locked, I'd agree that they got screwed. Still, I cannot help but think that this may make implementations unnecessarily complex. Looking at the NV30, I'm quite sure that the inclusion of fixed-point precision did much to compromise the potential for floating-point performance.

Regarding the potential for eventually supporting the next generation of pixel shaders. . . I'm not so sure we can really say. Has there been any demonstration that showed that the GeforceFX's support for the functionality in PS2.0_extended really works? Do the loops actually work and do they offer good performance?

Korval
09-16-2003, 07:13 PM
That fiat (Microsoft) is the largest software vender right now and for the forseeable future, even without including the OS division, so of course they should and will have some control.

But, outside of DirectX, they don't make much software that involves 3D graphics. Why should they, basically, be able to bury one competitor in a market at will? They did a pretty good job of it with ATi in the DX8 era, thanks to their partnership with nVidia for the X-Box, and, with a new contract with ATi for X-Box2, they're doing it to nVidia.


A DX9 spec program should run at "playable" framerates on any hardware that is branding itself DX9 ready.

Why? Any number of cards, nVidia and ATi, advertised themselves as DX8 without even having pixel shaders of any kind. At least the FX can actually do the things it advertises.

It certainly doesn't hold for OpenGL either. Every card has to fully implement the GL spec, but no consumer-level card implements it all fast. Any number of paths can lead to a software fallback, or to a sudden loss of speed for an arbitrary, implementation-defined reason.

In terms of games, nobody should expect to run cutting edge games on a $100 card, even if it says DX9 on the box; if they do, they deserve what they get. After all, you wouldn't expect to run a high-end game well on anything less than a 1GHz processor, or with less than 128MB (probably 256 minimum) of RAM.

Not only that, the idea that branding something DX9 implies a particular performance profile basically means that which cards count as DX9 changes depending on the software. Software that doesn't make use of fragment programs, but requires the DX9 API, can run faster on an nVidia card than on an ATi, because ATi forces everything through their shader paths; nVidia still has very fast fixed-function stuff lying around to use. Are you going to mandate that all DX9 applications use shaders, too?


I usually find it difficult to feel sorry for their end users; ever since the Riva 128 their customer base was built on those who wanted to be ahead of the curve rather than those whom tend to actually buy and use software.

That's not true at all. The vast majority of nVidia's customer base uses GeForce4MX's. Before that, it was some low-cost variant of the TNT. They've built their customer base on selling low-price, high-quality graphics cards.

The only reason cards like the 5800, 5900, 9700 and 9800 exist is for high-end gamers, and high-end games. People wanting HL2 can still play it on high-end FX's; it is quite playable, just not as fast as the 9700/9800. Indeed, mixed-mode gives them a pretty good performance boost.

That being said, I am glad my computer is powered by a 9500Pro. Best price-per-framerate there ever was http://www.opengl.org/discussion_boards/ubb/wink.gif


But now those who bought the FX series of cards in order to be "DX9 ready" are being told "just wait for the coming OpenGL products".

Actually, they're being told to wait for the Det50 drivers that, presumably, cheat their way around the precision issue. That "fixes" both PS2.0 and ARB_fp, allowing them to be more competitive with ATi on fragment programs.


Nvidia is following in 3Dfx's footsteps, attempting to use their market penetration rather than technological advancement as a means to control and manipulate the industry.

I don't understand why people say this. They deviated from a standard that didn't even formally exist at the time. For all we know, PS2.0 once had fixed-point as an option, and nVidia took it as the reasonable alternative it is. Unlike 3DFx, this wasn't a clearly boneheaded maneuver driven solely by monopolistic practices; there are good arguments both for and against it. It just didn't pan out very well because both APIs were against it. nVidia probably expected ARB_fp, or even glslang, to support fixed-point hints; ATi and 3DLabs made sure that didn't happen because they wanted what they have now: people turning on nVidia.

It's a clearly bonehead maneuver if nVidia doesn't get better floating-point performance with NV40, but they already know that.

There is one point I think bears making that tends to get lost in the overriding zeal for floating-point performance.

NV_fragment_program, outside of the performance problems, is fundamentally much more powerful than what ATi offers. This is without question. It offers sin/cos in hardware, as opposed to emulating them and taking up precious shader instructions. It offers derivative operations, which are really useful for procedural textures if you want to apply manual anisotropic filtering to them. It has no dependency limitations, which many people have run into with real shaders. You even get more temporaries to play with (once again, ignoring speed problems). And, for those who have the need, you can even write really long shaders.
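For instance, here is a trivial NV_fragment_program sketch (the register and texcoord choices are arbitrary, and the combination is purely illustrative): SIN here maps to a native instruction instead of being expanded by the driver, and DDX has no ARB_fp counterpart at all.

    static const char *nvfp_features =
        "!!FP1.0\n"
        "SINR R0.x, f[TEX0].x;\n"   /* hardware sine of the s texture coordinate      */
        "DDX  R1, f[TEX0];\n"       /* screen-space derivative of the texcoord        */
        "MULR R0, R0.x, R1;\n"      /* combine them, just to show the results compose */
        "MOVR o[COLR], R0;\n"
        "END\n";

On the ATi path you would pay a handful of instructions for the sine approximation and do without the derivative entirely.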

Indeed, the only questionable design of nVidia's hardware is floating-point shader performance.

OK, that, and whatever bonehead limitation prevents them from having floating-point textures that aren't rectangles http://www.opengl.org/discussion_boards/ubb/rolleyes.gif

tamiller866
09-16-2003, 08:57 PM
Originally posted by Korval:
I don't understand why people say this.

Upon much deliberation and introspection I can see now that you are completely correct. This situation is nothing like the one with 3Dfx. In fact, it's exactly the opposite.

I'm reminded of the time when Unreal ran better on a 3Dfx card than it did on a clearly superior GeForce product. At that time the explanation obviously was also code path, as the game ran best in Glide.

Now, one could argue (as you have quite convincingly) that the FX technology is superior to the R300, but again the code path used handicaps the Nvidia product.

Now, if Gabe had used some proprietary ATi API, I could see where Nvidia and their customers would be upset with him. But in this case the role reversal is that it isn't the API that is proprietary - it's the "superior technology".

Whether this was the result of designed arrogance as I implied or a simple "bonehead mistake" as you put it is neither here nor there. They should stop being arrogant now in trying to pass the buck to the developers and/or API committees. They should stand up as you have here and say they have this weakness and ask, rather than demand, both their customers and the developers to be patient while working to improve the situation for everyone.

M/\dm/\n
09-16-2003, 10:59 PM
"the way DX9 compiles it to be"


LOL http://www.opengl.org/discussion_boards/ubb/biggrin.gif



It was mentioned earlier that a CPU (Athlon or PIII - I forget) could do the same kind of float performance as a GFFX GPU.


I thought the P4 had fewer transistors than the GFFX http://www.opengl.org/discussion_boards/ubb/frown.gif




But Nvidia it seems would prefer to take us back to the stone ages of Glide-like proprietary coding.


Nope, they just gave you the opportunity to choose path/precision/approach. And if GLslang shaders are compiled by the driver, you won't even notice the difference; look at DX9 HLSL above http://www.opengl.org/discussion_boards/ubb/biggrin.gif



Now, if Gabe had used some proprietary ATi API


Yes, but they are using the NV proprietary API => NV(only)_OGL3.0 + DX99.9x + CG2.0 1.1... or was it DX9?

One more time: the $hit is in the shaders and the way they are compiled/optimized.

BTW, if we imagine that fixed/ints/halfs & floats were in the DX9 & OGL 1.5 core, we would be $hitting on ATI/3DLabs!!!!!

[This message has been edited by M/\dm/\n (edited 09-17-2003).]

Zak McKrakem
09-18-2003, 12:57 AM
There are facts:
- 3DMark03 shows that the Radeon 9800 is far superior to the GeforceFX 5900 at the DX9 level
- Tomb Raider shows that the Radeon 9800 is far superior to the GeforceFX 5900 at the DX9 level
- Half Life 2 shows that the Radeon 9800 is far superior to the GeforceFX 5900 at the DX9 level (even with a GeforceFX-specific path)

Some developers (OpenGL and D3D) have commented their experiences and they are in the same way.
(for example: http://www.beyond3d.com/forum/viewtopic.php?t=7873)

In my opinion, the pathetic thing about this is the way nvidia is 'managing' it (as noted, this must be 'the way it's meant to be played'): they managed to force Futuremark to make those statements that all of us know about, and they managed to make EIDOS write this announcement: http://www.warp2search.net/modules.php?name=News&file=article&sid=14462
And for HL2, they said that the benchmarks are not valid because they are not using their Detonator 50 drivers, where they acknowledge that "NVIDIA's optimizations for Half Life 2 and other new games are included in our Rel.50 drivers". And all of us have read, in Valve's Gabe Newell's comments, some of the 'optimizations' they are referring to.

Ostsol
09-18-2003, 03:32 AM
Originally posted by M/\dm/\n:
I thought P4 had less transistors than GFFX
Technically, but the P4 can definitely use all of its general-purpose registers without any performance penalty.

Korval
09-18-2003, 08:49 AM
And for HL2, they said that the benchmarks are not valid because they are not using their Detonator 50 drivers, where they acknowledge that "NVIDIA's optimizations for Half Life 2 and other new games are included in our Rel.50 drivers".

There is some truth to that, however. People using leaked 5x.xx drivers have seen a significant improvement in the performance of fragment programs (15-30% or so). It probably won't eliminate the performance gap, but it does seem to cut out a chunk of it. As long as the final Det50s are publicly available the moment HL2 releases, their statement is valid.

MrShoe
09-18-2003, 05:17 PM
I don't understand why people keep saying that NVIDIA performs badly in 3DMark2003. I have a 5900 and I get better scores than a friend of mine with a similarly specced system and a Radeon. It seems to me that when shaders are rewritten for NVIDIA cards, the performance goes up drastically... Sure, it's not a good thing that the shaders have to be written in a certain way for NVIDIA cards to perform well, but hey, that's no excuse to say "no, I'm too lazy, I'll just write generic code and get 1/2 the speed on an NVIDIA card".

Also, although I'll admit I don't know much about this, I heard that for fp NVIDIA uses 32-bit precision, while ATI uses 24-bit precision. Is this true? If so, wouldn't that be a logical source for the performance difference? As far as I know (which may be wrong, I'll admit), there is no real way to compare shaders on NVIDIA and Radeon cards fairly, since ATI uses only 24-bit fp, while NVIDIA is either 32-bit fp or 16-bit fixed precision. Correct me if I'm wrong please.

Korval
09-18-2003, 05:56 PM
I dont understand why people seem to keep saying that NVIDIA performs badly in 3DMark2003. I have a 5900 and i get better scores than a friend of mine with a similarly specced system with a Radeon.

Are you using a version of 3DMark2003 that still allows nVidia's drivers to cheat? Are you comparing it with a Radeon 9800 (the high-end ATi competitor card; obviously, there's no point in comparing it to the 9600, or even the 9700)?


Sure, it's not a good thing that the shaders have to be written in a certain way for NVIDIA cards to perform well, but hey, that's no excuse to say "no, I'm too lazy, I'll just write generic code and get 1/2 the speed on an NVIDIA card".

The problem is that there is no cross-hardware API for optimizing code on nVidia platforms. PS2.0 under DX9 doesn't offer 12-bit fixed-point, which is what you really need with an FX. A newer version of PS2.0 offers the ability to use 16-bit floats, but that's not enough to really optimize for an nVidia card. ARB_fp doesn't offer it either; it doesn't even offer 16-bit floats.

However, at least NV_fp exists as an option under OpenGL; this is why Doom3 will run at comparable speed (Carmack once said faster) on an FX.


Correct me if im wrong please.

The only available precision on an ATi card is 24-bit. No more, no less. On an FX card, internally, operations can be performed at 12-bit fixed-point, 16-bit float or 32-bit float. However, any floating-point operation takes 2x the time of a fixed-point operation. And 32-bit floats use twice the registers that 16-bit floats cost, so you may start hitting performance changes due to register usage.

When running with 12-bit fixed-point-only code, the FX runs slightly faster than a Radeon, but this is likely due to the clock speed differences in the cores. Clock for clock, the two likely run at the same speed. Which, btw, means that the Radeon is able to do twice the work of the FX or better (12-bit fixed vs. 24-bit float) at the same speed.

When running even 16-bit floats, let alone 32-bit, the FX is significantly slower than an equivalent Radeon.
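In NV_fragment_program terms (a rough sketch; the texture unit and interpolant names are placeholders), the precision is chosen per register and per instruction:

    static const char *fp16_modulate =
        "!!FP1.0\n"
        "TEX  H0, f[TEX0], TEX0, 2D;\n"  /* sample into a 16-bit half register (H0..H63)           */
        "MULH H0, H0, f[COL0];\n"        /* H suffix = fp16 math; an X suffix selects 12-bit fixed */
        "MOVH o[COLH], H0;\n"            /* fp16 colour output                                     */
        "END\n";

The same MUL written as "MULR R0, ..." runs at fp32, and each 32-bit R register takes up two slots in the register file, which is where the register-usage penalty above comes from.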

GT5
09-18-2003, 08:23 PM
Well it seems like nVidia has cheated again
by sacrificing the quality of the image in Det 51.75 (Beta) drivers to achieve higher fps. http://www.opengl.org/discussion_boards/ubb/biggrin.gif http://www.driverheaven.net/articles/aquamark3/index3.htm

[This message has been edited by GT5 (edited 09-18-2003).]

john
09-18-2003, 09:00 PM
OK, this is just getting stupid. I acknowledge that Nvidia (and ATI) have been caught cheating before, but that article about Aquamark leaves me cold. "We take two pictures, adjust the brightness. Those cheating bastards!" That article does NOT demonstrate how the author believes nvidia is cheating. Maybe that flash animation (which I cannot/refuse to load) gives a succinct appraisal of the problem, but the webpage itself is woefully short on analysis.

Since when has OpenGL been meant to be pixel-perfect, anyway? No one would even remotely suggest that you can diff the images from nvidia and ati cards and get a pixel-identical clone. Who is to say that this should hold between driver releases either?

This is just a crazy witch hunt. Yes, they were caught cheating once. But just grabbing two images, waving them about, saying "LOOK HERE!" and then retorting "well, I can no longer recommend nvidia" is emotive, badly argued and far from comprehensive.

M/\dm/\n
09-18-2003, 09:05 PM
Hmm, I would expect sharper color transitions when decreasing precision, but that image just looks grayed out?
Anyway, I bet they have something in there (51.75) http://www.opengl.org/discussion_boards/ubb/biggrin.gif

GT5
09-18-2003, 09:29 PM
John,
maybe you should read this instead http://www.opengl.org/discussion_boards/ubb/biggrin.gif http://www.tomshardware.com/graphic/20030918/index.html

john
09-18-2003, 09:46 PM
OK, Tom's Hardware actually talks about the issues. Incidentally, I'd like to make it clear, GT5, that I wasn't bashing you or your post, merely the webpage you referenced.

I still have issues with how this kind of information is reported. Although Tom's Hardware *concludes* that the problem is a bug and not some optimisation cheat, they still describe the problems as *potential cheats* and voice a general distrust of how bugs crept into the drivers.

I'm sure you all know how easy it is for your brain to glaze over parts that you assume are right. This is especially relevant when you're working on something very specific and you're only concentrating on one aspect of your program's execution. Yes, nvidia is a company vying for consumer dollars and should get it right, but they're not the first company to inadvertently ship s/w with bugs in it.

I don't like the general mistrust. They cheated once and were caught, but reviewers should get over it and not automagically assume that any abnormality is a cheat.

GT5
09-18-2003, 10:28 PM
John,
You're missing the point here. tomshardware.com states that AquaMark has been out since last year, that nVidia has been working closely with the makers of AquaMark3, and that they are "confused" as to why the bug was there in the first place!

I have evidence that since the 43.xx drivers, nvidia's image quality has been getting worse and worse! Even with the 44.03 drivers, I can see distinct image quality degradation when I compare them to previous drivers (like 43.45, for example).

The philosophy behind this is very simple - lose some image quality (not too much, just enough to fool the eye) and your FPS goes up considerably! http://www.opengl.org/discussion_boards/ubb/biggrin.gif

john
09-19-2003, 12:43 AM
So what if AquaMark has been out for a year? And what does that have to do with confusion?

MS Windows has been out for lots of years, and yet there are still bugs in it. The new driver has not been out for all that long, and there are bugs in it.

So, I ask again: so what if AquaMark has been out for a year? Bugs happen. It's a fact of life. <shrugs>

john
09-19-2003, 12:48 AM
Also, who says that nvidia has to run around and play every game and every demo to make sure everything is fine, and that if they _don't_, or if they miss a bug, they're cheating?

No one from nvidia has rung me up to see if my Linux s/w works bug-free with their new drivers. Why should AquaMark be any different? Because it's used by supposedly competent reviewers as their only source of quantitative measurement of cards? There are lots of benchmarks out there; why is AquaMark so special?

You're missing MY point. Just because an nvidia driver has a bug in it doesn't mean they're trying to cheat. Just because AquaMark has been out for a year while the drivers are relatively new doesn't mean that nvidia intentionally left a bug in. Just because AquaMark has been out for a while doesn't necessarily mean that nvidia has to, or indeed *MUST*, check their driver against AquaMark. Just because they do THAT doesn't mean that they necessarily spotted the bug. And just because they have a bug doesn't mean they're trying to cheat.

kansler
09-19-2003, 02:39 AM
Though it's still weird that the nvidia 'bugs' only occur in benchmarks and not in games...

john
09-19-2003, 02:46 AM
And where is THAT demonstrated? How long has this driver version been out?

Ostsol
09-19-2003, 03:25 AM
Originally posted by kansler:
Though it's still weird why the nvidia 'bugs' only occur in benchmarks and not in games...
Ah. . . So the lack of trilinear filtering on all texture stages and the lack of aniso on texture stages other than 0 in UT2003 is a feature. . .

GT5
09-19-2003, 05:32 AM
Originally posted by john:
so what that aquamark has been out for a year? and what does that have to do with confusion?

The fact that they have been working together, and the fact that it was working in earlier drivers but doesn't work in the latest drivers, means something is wrong!!

It's like saying you're working with FutureMark on the upcoming benchmark and everything seems to be running fine, but as later drivers are released the image quality degrades.
Now ask yourself: is this a bug?
And stop arguing and face the facts! I have hard evidence that the image quality is getting worse with every release of the drivers! If that's the case, why haven't they fixed the image quality problem? It's been there since the 44.03 drivers!

I bet you are going to say, well, they didn't get around to fixing the problem. Well, considering almost *every* nVidia owner is aware of this problem, why can't they fix it?
Because... image quality *hurts* GPU performance. More work has to be done per clock! Get it?



MS Windows has been out for lots of years, and yet there are still bugs in it. The new driver has not been out for all that long, and there are bugs in it.

You cannot compare this with nVidia, because the bugs discovered in Windows get fixed and the *same* bug does not reappear. You're comparing apples with oranges here!

GT5
09-19-2003, 05:42 AM
Originally posted by john:
Also, who says that nvidia has to run around and play every game and every demo to make sure everything is fine, and that if they _don't_, or if they miss a bug, they're cheating?

No one from nvidia has rung me up to see if my Linux s/w works bug-free with their new drivers. Why should AquaMark be any different? Because it's used by supposedly competent reviewers as their only source of quantitative measurement of cards? There are lots of benchmarks out there; why is AquaMark so special?

You're missing MY point. Just because an nvidia driver has a bug in it doesn't mean they're trying to cheat. Just because AquaMark has been out for a year while the drivers are relatively new doesn't mean that nvidia intentionally left a bug in. Just because AquaMark has been out for a while doesn't necessarily mean that nvidia has to, or indeed *MUST*, check their driver against AquaMark. Just because they do THAT doesn't mean that they necessarily spotted the bug. And just because they have a bug doesn't mean they're trying to cheat.

I guess YOU don't understand what is going on here. Certain companies work with one another in order to achieve their goals. In nVidia's case, strategic beta partners HAVE a say in what goes into the code. My point is, if the image quality was all that great in 44.03, why has it degraded since? Just answer that question and we will all be happy!

GT5
09-19-2003, 05:47 AM
Originally posted by john:
and where is THAT demonstrated? how long has this driver version been out?
Seriously, if nVidia's latest drivers aren't mature, then why are ATI's latest Catalyst drivers mature?

[This message has been edited by GT5 (edited 09-19-2003).]

kansler
09-19-2003, 06:13 AM
OK then, what about the fact that every nvidia 'bug' speeds up rendering and not one 'bug' actually slows it down?

Korval
09-19-2003, 09:15 AM
tomshardware.com states that AquaMark has been out since last year and that nVidia has been closely working with the makers of AquaMark3 and that they are "confused" as to why the bug was there in the first place!

Are you a programmer? Have you never sworn up and down that a particular piece of code was working just fine, only (thanks to Murphy's Law) to find that it is really broken? And, typically, doesn't this happen in the most public forum possible?

Yes, bugs happen even in drivers where the writers of those drivers are "working closely" (does anyone have any idea what that really means?) with a particular game.


My point is, if the image quality was all that great in 44.03, why has it degraded since?

For all you know, it was a bug in the 44.03 drivers that said, "If anisotropic filtering is on, multiply the max aniso-factor by 2 and then pass it along to the card," thus making the image quality more than what it should have been. It is possible to have bugs that improve image quality; they're still bugs if they're not doing the right thing.


Seriously, if the nVidia's latest drivers arent mature, then why are ATI's latest Catalyst drivers mature?

The Catalyst drivers are all the 3.x series; same codebase. The Det50's have major rewrites of significant portions of code.


OK then, what about the fact that every nvidia 'bug' speeds up rendering and not one 'bug' actually slows it down?

How do you know that these "bugs" are speeding it up? How do you know that it isn't some other part of the driver that is responsible for the significant speed increases with new drivers?

Has nVidia been cutting corners on image-quality features for performance? Almost certainly. Is this a cheat? Not really, since they can implement anisotropic filtering however they want to. If an application asks for 4x, the driver has the right to override this request for whatever reason. The card may not even have the notion of 4x, and that number has to be translated into something that the card does understand. How that translation takes place is up to the driver.

This is the price we pay to have a single API interface into all graphics hardware: the uncertainty that anything we tell it to do will happen as we expect it to.

Gorg
09-19-2003, 11:09 AM
I don't want to look like a conspiracy theorist and I am simply pointing this out for amusement.

I don't know if people remember, but back in the days of the Rage 128, Rage Fury and the original Radeon, most of the reviews I read pointed out that:

1. Nvidia Cards were faster
2. Ati image quality is superior

Now, this could lead someone to believe that Nvidia always played that game of reducing image quality for speed, but since they were the fastest, nobody really bothered to check why - until ATI produced a card that was both faster and better-looking.

Korval
09-19-2003, 12:44 PM
Now, this could lead someone to believe that Nvidia always played that game of reducing image quality for speed, but since they were the fastest, nobody really bother to check why, until Ati produced a card that was both faster and looking better.

I seriously doubt that the reason for nVidia's prior performance lead was reduced image quality (though nVidia cards have never been image-quality powerhouses). The GeForce 4, for example, was a faster card at pretty much anything compared to an 8500. It could access textures faster, had faster memory, faster fragment programs (2 register combiners can do 4 dot products in the same time the 8500 did 1), faster vertex programs, and various other performance improvements.

Ostsol
09-19-2003, 02:08 PM
Actually, I remember seeing some -really- old video card reviews where image quality between several cards was compared. I think it was in the days of the Rage 2, original TNT, and such. . . Or maybe even older. . .

zeckensack
09-19-2003, 11:30 PM
Just a minor, somewhat OT-ish correction:

Originally posted by Korval:
The Catalyst drivers are all the 3.x series; same codebase. The Det50's have major rewrites of significant portions of code.Catalyst release versions only designate year of release and consecutive release in the year. Ie 3.7 is the seventh public release in 2003.
You won't be able to tell from that number whether or not ATI did a major rewrite or an incremental update.
But IMO "major rewrites" probably never happen with anything as complex as a graphics card driver. You may get a major rewrite of a particular subsystem (say, they implement an all new shader optimizer because the old design was too limited in scope). But that's about as far as it can reasonably get.

M/\dm/\n
09-21-2003, 05:53 AM
Ah. . . So the lack of trilinear filtering on all texture stages and the lack of aniso on texture stages other than 0 in UT2003 is a feature. . .


ATI never had trilinear filtering in UT, at least if you don't override this 'feature' with third-party driver config utils http://www.opengl.org/discussion_boards/ubb/biggrin.gif

Ostsol
09-21-2003, 06:58 AM
Originally posted by M/\dm/\n:
Ati never had a trilinear filtering in UT, at least if don't override this 'feature' with 3rd party driver config utils http://www.opengl.org/discussion_boards/ubb/biggrin.gif
Actually, ATI has specifically said that their "Quality" filtering is working exactly as they meant it to. They never said that it was a bug at all. Also, even if trilinear is only on the first texture stage, at least aniso is applied equally to -all-.

Anyways, you don't need a "3rd party driver config utils" in order to get trilinear filtering in UT or UT2003. All you do is set it to application preference and use the application's options. Simple. Too bad NVidia's drivers aren't allowing that to work properly.

Zak McKrakem
09-22-2003, 10:12 AM
More on this:

"Besides this, we have checked the IQ of Detonator 51.75. And we found new "optimizations": Texture stages 1-7 on all D3D games donīt get a trilinear filter, only a pseudo-trilinear (bilinear/trilinear mix). Under Unreal Tournament 2003, texture stages 1-7 are only pure bilinear filtered. Plus, texture stages 1-7 donīt show any anisotropc filtering above 2x on all D3D games. All this in the driverīs application mode!"
http://www.3dcenter.org/artikel/2003/09-20_english.php

Korval
09-22-2003, 11:30 AM
at least aniso is applied equally to -all-.

This is not necessarily a good thing. As textures see more non-image use (textures as tables of numbers), anisotropic filtering can actually start to cause problems. I would be concerned if a user could suddenly decide to force anisotropic filtering onto those textures as well.

Granted, I hope drivers are smart about it and disable aniso for textures set up for point sampling.

MrShoe
09-22-2003, 02:26 PM
At the risk of sounding like an "NVIDIOT", here goes.

When the NV35 first came out, review sites like anandtech and tomshardware, and lots of others were full of praise for it: http://www.anandtech.com/video/showdoc.html?i=1821 http://www.tomshardware.com/graphic/20030512/index.html
And a quote from that review from Toms:

"Now, the FX 5900 is able to outpace the Radeon 9800 PRO in all relevant benchmarks and can reclaim the performance throne for NVIDIA. The card offers unrivaled FSAA speed combined with very good anisotropic filtering image quality and performance, thanks to the new Detonator FX driver, giving it a comfortable lead over its rival."

Now, however, NVIDIA is getting a lot of **** for the same card, which was getting high praise just a little while back. I can partly understand that people like us (programmers) are pissed off that, to get the best performance out of it, we have to use NVIDIA-specific shaders or whatever. But it still doesn't seem to justify the cries of "NVIDIA sucks donkey balls". Sure, the whole clip-planes saga in 3DMark pissed people off, but new drivers got rid of that bug, and actually improved performance in some of the tests.
Yeah, naturally both cards have their strengths and weaknesses, but is it really justified to vilify NVIDIA? A perfect example is a quote from Gamersdepot:

"It's in our opinion that both end-users and OEM's should be highly cautioned into buying any of NVIDIA's current hardware under the assumption that it'll offer the best gaming experience for DirectX9-class games. Until NV40 rolls out, it appears for now that NVIDIA will be sitting on the side-lines."

I just totally don't understand how the same website (Tom's in this case) can say in the original review that the NV35 is the fastest card in the world, and then a few months down the track say it's ****. Now, I'm not crying about any "ATi conspiracy" - that is moronic - but I'm just asking: why has there been such a change of heart?

Ostsol
09-22-2003, 03:42 PM
Originally posted by Korval:
This is not, necessarily, a good thing. As textures begin to see more non-image use (textures as tables of numbers), anisotropic filtering can begin to actually cause problems. I would be concerned if a user can suddenly decide to force anisotropic filtering on image textures.

Granted, I hope drivers are smart about it, and disable aniso for textures setup for point-sampling.
Nope, drivers are by default not smart about it. If you force aniso, it gets applied to all textures -- I've actually even tried this with a texture I specified to be point-sampled. That's why letting the application determine texture filtering is always the best choice. However, if the app is not allowed to choose. . .
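Just to spell out what "letting the application choose" means at the GL level, here is a minimal sketch (the texture object names and the anisotropy degree are made up); this per-texture state is exactly what a control-panel override tramples on:

    #include <GL/gl.h>
    #include <GL/glext.h>   /* GL_TEXTURE_MAX_ANISOTROPY_EXT, from EXT_texture_filter_anisotropic */

    void setup_texture_filtering(GLuint lookup_tex, GLuint diffuse_tex)
    {
        /* a lookup-table texture: point sampling, and explicitly no anisotropy */
        glBindTexture(GL_TEXTURE_2D, lookup_tex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAX_ANISOTROPY_EXT, 1.0f);

        /* an ordinary image texture: trilinear plus some anisotropy */
        glBindTexture(GL_TEXTURE_2D, diffuse_tex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAX_ANISOTROPY_EXT, 8.0f);
    }

If the control panel forces aniso globally, the lookup-table texture above gets filtered whether the app asked for it or not.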

Korval
09-22-2003, 04:18 PM
I just totally dont understand how the same website (toms in this case), can say in the original review that the NV35 is the fastest card in the world, and then a few months down the track say its ****.

Stop listening to them. I only pay attention to Anandtech, because they seem to be the most fair about their benchmarks and tests (and they are very thorough in both their benchmarks and their analyses).

Note that Anandtech did not give the 5900 a gigantic thumbs-up. They said it was better and more worthy of nVidia, which is a reasonable conclusion, given that none of the software was DX9 or used ARB_fp on nVidia hardware. And you can't argue with the benches; the 5900 did beat the 9800 on several benches, so their conclusion is valid.

But they didn't say that it was reasonable to run out and buy all nVidia cards; their 5600 vs. 9600 proclaimed the ATi card the winner (technically, the 9500Pro beat them both http://www.opengl.org/discussion_boards/ubb/wink.gif, but that's a different story). They said that the differences between the 5900 and the 9800 were small, which is true as long as floating-point performance is not tested (which it wasn't, since no benches existed to do so).

Nowadays, there are benchmarks that test floating-point performance. And it is reasonable for these sites to factor these new tests into their prior evaluations of nVidia and ATi cards.

Tom78
11-08-2003, 05:06 AM
Originally posted by Korval:
I seriously doubt that the reason for nVidia's prior performance lead was reduced image quality (though nVidia cards have never been image quality powerhouses). The GeForce 4, for example, was a faster card doing, pretty much, anything compared to an 8500. It could access textures faster, had faster memory, faster fragment programs (2 register combiners can do 4 dot products in the same time the 8500 did 1), faster vertex programs, and various other performance improvements.

You are right, the GF4 Ti is faster, but you are comparing two different generations. If you want to compare the 8500 with an NV equivalent, you have to take the GF3.
And can you tell me where the PS 1.4 support is? ATI has it in its silicon.

cu
Tom

OldMan
11-09-2003, 02:41 AM
In fact there is no real equivalent.

We had the GF3... and the GF4 Ti... and in between them the 8500. That generation cannot be compared in a fair way.

And now that we have the 5700 plus new drivers, what do you all think about it?

It seems to have great improvements.

Zak McKrakem
11-09-2003, 08:05 AM
http://www.notforidiots.com/ULE.php