
View Full Version : OT - 3DMark 03



Adrian
02-11-2003, 09:14 AM
3DMark 03 has just been released.

You can download it from Fileshack among others. www.fileshack.com (http://www.fileshack.com)

There are some screenshots here www.shacknews.com (http://www.shacknews.com)

There's a review of it here http://www.pcextreme.net/3dmark03.php

[This message has been edited by Adrian (edited 02-11-2003).]

JackM
02-11-2003, 09:56 AM
White paper on specific tests (http://futuremark.allround-pc.com/)

dorbie
02-11-2003, 10:46 AM
There seems to be a lot of demand. Are there any other download sites? I can't access any of these, and I ain't paying Fileshack.

Adrian
02-11-2003, 10:50 AM
It all seems a bit broken right now. Pages linking to the mirrors are broken too. Futuremark is down. I hope Id learns from this; it's a bit of a joke.

PH
02-11-2003, 10:51 AM
I was lucky to get into the download queue on Fileshack 5 minutes after its release ( pure luck ). I'll just go install it now http://www.opengl.org/discussion_boards/ubb/biggrin.gif.

Nutty
02-11-2003, 11:25 AM
Yeah everywhere seems completely hosed..

Hey PH, fancy ftping it to me tonight? http://www.opengl.org/discussion_boards/ubb/smile.gif

zeckensack
02-11-2003, 01:06 PM
The demo has one really good scene, where the shot-down bomber scrapes over the ground. Nice grass and stuff. The rest is kinda... well, dull. They use plenty of post processing for DOF effects, which essentially means everything looks like it was upsampled from low-res (or quincunx, if you prefer). Quite demanding on the gfx card, but IMO not pretty, not at all.

The artwork itself is brilliant.

Unfortunately they seem to have been on a lets-strip-the-last-bit-of-functionality-from-the-freeware spree. Usability sucks big time. You can't even look at your detailed results without registering with them. You can't select which tests to run. You can't select resolution for the benchmark and demo runs. Nothing. All it lets you do is save your results as an .rll, which is a zip archive containing two XML files, but the dialect is not documented and looks strange.

And it dared to fiddle with my file associations; I bet they had a reason for that http://www.opengl.org/discussion_boards/ubb/redface.gif

Short summary: what was once a cool demo troupe has now gone completely nuts.

Oh yeah, scene two is pretty much like Doom 3 from a technology point of view. Runs pretty damn slow too (single digit fps on a Radeon 8500, 9700Pro in the thirties or so).
It uses normal mapping to reduce geometry complexity and of course stencil shadows. IMO they didn't make good use of the shadows (ie they do look correct, but they don't add much to the overall visuals), but it's interesting to look at. Their normal mapping seems to work very well.


[This message has been edited by zeckensack (edited 02-11-2003).]

JustHanging
02-11-2003, 01:10 PM
AND it doesn't run on my tnt http://www.opengl.org/discussion_boards/ubb/frown.gif

Elixer
02-11-2003, 01:56 PM
ftp://216.247.236.66/htdocs/mg-3dmark03.exe is a good one... the MajorGeeks FTP.

Anyway, if you have anything less than a GF3, don't bother getting it; you can only see one 'game', plus the fill rate test, a sound test, and two other ones.

I think only the 9500/9700 can do all tests + all 'game' tests, until the GF FX comes out...

pkaler
02-11-2003, 02:54 PM
Please mirror on your p2p network of choice if you have downloaded it.

dorbie
02-11-2003, 03:03 PM
Sigh, I should have it in 12 hours according to the estimate. The servers are really getting hammered.

I'm trying to repair my Kazaa installation now (probably broken by the anti-spyware programs I like to run), but the installer is also a downloader. Heck, Kazaa will probably be a wash anyway; it's almost never fast. Even when I had some ungodly high karma on it, it didn't help.

Nutty
02-11-2003, 03:05 PM
Get Kazaa Lite, dorbie. Ad-aware even recognises its dummy advert DLL and doesn't remove it. The best way to get fast downloads on it is to pick files with the most users; I can regularly max out my bandwidth on it.

Currently using GetRight to switch between a list of about 8 mirrors... should be downloaded in about 20 minutes.

j
02-11-2003, 04:51 PM
It's for situations like this that people really should start using BitTorrent.

j

Adrian
02-11-2003, 04:56 PM
Originally posted by j:
It's for situations like this that people really should start using BitTorrent.

j

Yes, that's how I got it in the end.
Download this http://bitconjurer.org/BitTorrent/
then use this link http://ssiloti-2k.uccs.edu:8080/freestore/pyrogen/3DMark03.exe.torrent

BTW, I agree with zeckensack's verdict; I wasn't overly impressed.

Ostsol
02-11-2003, 05:37 PM
I like the tests, but I'm really disappointed by the demo. Except for the first part of the demo, it's just the tests + sound. Compare this to 3dMark2000, whose demo was -very- nice to watch.

pkaler
02-11-2003, 05:46 PM
If anyone out there is using Linux, I'd recommend GTK-Gnutella. Works very well.

dorbie
02-11-2003, 06:33 PM
This BitTorrent is nice, but err... shouldn't P2P networks like Kazaa do this anyway, with the bonus that I can search? Damn. I wish there weren't so many darned bandwidth freeloaders and MP3s on the thing. There's not even a single copy on Kazaa yet.

BTW, is there a searchable directory of BitTorrent stuff, or is it all just invisibly related to the original URL on the server?

zed
02-11-2003, 07:40 PM
What's this I hear about Nvidia not being too happy with it and in fact not gonna support 3DMark2003? http://www.opengl.org/discussion_boards/ubb/smile.gif
Can't say I blame them. I downloaded 3DMark2000 (or 2001) a couple of years ago and was totally unimpressed (from a technical viewpoint, even though others raved; I remember I was pissed off at the time because it was a huge download http://www.opengl.org/discussion_boards/ubb/smile.gif ).

Aside from that, even though the scenes might look like games, they are in fact not games. Give me Quake3 (or Doom3 when it comes out) or Unreal2 benchmark figures any day; a far more accurate representation of what the hardware actually does.

dorbie
02-11-2003, 08:27 PM
2001 was pretty cool, the best up to that point, and it's still worth a download. It is ubiquitous in reviews. I'd be surprised if 2003 suddenly lost all that ground.

P.S. how about posting some results to this thread for various cards?

[This message has been edited by dorbie (edited 02-11-2003).]

Zeno
02-11-2003, 09:34 PM
Score: 4383
CPU: Athlon 2800+
GPU: ATI 9700 Pro with Catalyst 3.0
RAM: 1024 MB DDR 333
MOBO: Nforce2


It runs fairly well on my system, except that the benchmark stutters horribly in everything but the air combat game. Any hints on how to get rid of this?

-- Zeno

[This message has been edited by Zeno (edited 02-12-2003).]

NitroGL
02-11-2003, 10:46 PM
Kill any unneeded background running processes.

[This message has been edited by NitroGL (edited 02-11-2003).]

dorbie
02-11-2003, 11:22 PM
Yep, my performance was very inconsistent until I killed ICQ & Kazaa.




Score: 3928
CPU: Athlon 1900+
GPU: ATI 9700 Pro with Catalyst 3.1
RAM: 512 MB DDR 333
MOBO: (chipset) KT400 AGP 8X



Unfortunately we can't isolate all that CPU & physics stuff they've polluted it with.

[This message has been edited by dorbie (edited 02-12-2003).]

mcraighead
02-12-2003, 12:38 AM
Originally posted by zeckensack:
Oh yeah, scene two is pretty much like Doom 3 from a technology point of view.

Nah, it's actually quite a bit different, and really bizarre, in fact... the following is how someone described it to me.

Apparently they project *every* edge of *every* triangle. So non-silhouette edges generate two triangles, one front-facing and one back-facing, which cancel out in stencil. Lots of unnecessary fill rate there.

In addition, I believe they use an extremely long vertex program to do skinning and then to compute these projections.

It's a good example of how *not* to write a stencil shadow engine. If you're going to do stencil shadows, you really ought to compute facingness on the CPU and locate silhouette loops.

If Futuremark claims that these scenes are representative of Doom, well, they're out of their mind. Doom makes neither of these mistakes.

- Matt
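
For anyone trying to picture what "compute facingness on the CPU and locate silhouette loops" means in practice, here is a rough, minimal sketch in C++. It assumes an indexed mesh with precomputed edge adjacency; all type and function names are illustrative, not taken from 3DMark or any shipping engine.

#include <vector>

struct Vec3  { float x, y, z; };
struct Plane { float a, b, c, d; };           // triangle plane: ax + by + cz + d = 0
struct Edge  { int v0, v1; int tri0, tri1; }; // the two triangles sharing this edge

// Positive when the light lies on the front side of the triangle's plane.
static float FacingLight(const Plane& p, const Vec3& light)
{
    return p.a * light.x + p.b * light.y + p.c * light.z + p.d;
}

// An edge belongs to the silhouette when its two adjacent triangles disagree
// about facing the light. Only these edges need shadow-volume quads, instead
// of one quad per edge of every triangle.
std::vector<Edge> FindSilhouetteEdges(const std::vector<Plane>& triPlanes,
                                      const std::vector<Edge>& edges,
                                      const Vec3& lightPos)
{
    std::vector<bool> facing(triPlanes.size());
    for (size_t i = 0; i < triPlanes.size(); ++i)
        facing[i] = FacingLight(triPlanes[i], lightPos) > 0.0f;

    std::vector<Edge> silhouette;
    for (size_t e = 0; e < edges.size(); ++e)
        if (facing[edges[e].tri0] != facing[edges[e].tri1])
            silhouette.push_back(edges[e]);
    return silhouette;
}

The extruded side-wall quads are then built only from these edges, which is why the CPU approach ends up drawing far fewer shadow-volume triangles than the project-every-edge method described above.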

Nutty
02-12-2003, 12:39 AM
whats this i hear about nvidia not being to happy with it + in fact aint gonna support 3dmark2003

Have a read of; http://www.nvnews.net/vbulletin/showthread.php?s=&threadid=7455

I must say it's all rather disappointing. I've seen at least one website show 3DMark2003 scores for the GF-FX using different drivers, and it shows a marked performance improvement.

I guess nVidia just needs time to get their latest architecture operating efficiently. But everyone's slagging them off in the meantime.

They seem to be in extreme damage control mode at the mo..

knackered
02-12-2003, 02:06 AM
Unfortunately, Futuremark chose a flight simulation scene for this test (game 1). This genre of games is not only a small fraction of the game market (approximately 1%), but utilizes a simplistic rendering style common to this genre. Further, the specific scene chosen is a high altitude flight simulation, which is indicative of only a small fraction of that 1%

It's always annoyed me that all vendors (not just nvidia) seem to be obsessed with high-occlusion graphics (ie. Quake and upwards).
I've always suspected that this early-out ztest thing they do to avoid rendering occluded stuff has an impact on the performance of scenes with low occlusion.
What about games like Battlefield 1942? That is a mixture of high level (low occlusion) and low level (high occlusion) views of the scene. It's a better game because of it, too.

By narrowing the targets of their hardware optimisations, the vendors are actually narrowing the choice of game genres... well, maybe that's a bit strong, but you get the essence of what I'm saying http://www.opengl.org/discussion_boards/ubb/smile.gif

PH
02-12-2003, 06:29 AM
Score: 4364
CPU: Athlon 1800+
GPU: ATI 9700 Pro with Catalyst 3.1
RAM: 768 MB DDR 266
MOBO: (chipset) A7V266-E AGP 4X fastwrites on



Originally posted by mcraighead:

Apparently they project *every* edge of *every* triangle. So non-silhouette edges generate two triangles, one front-facing and one back-facing, which cancel out in stencil. Lots of unnecessary fill rate there.

In addition, I believe they use an extremely long vertex program to do skinning and then to compute these projections.

It's a good example of how *not* to write a stencil shadow engine. If you're going to do stencil shadows, you really ought to compute facingness on the CPU and locate silhouette loops.


I'm glad someone agrees that silhouette computation should be done on the CPU for this type of engine http://www.opengl.org/discussion_boards/ubb/smile.gif.

The nature demo does look very nice though ( almost real from certain angles ). Also, the pixel shader 2.0 test with procedural textures was interesting.

dorbie
02-12-2003, 06:41 AM
Carmack considered the "project every triangle" approach in programmable geometry at one point. He mentioned it in a .plan a while back. He said for geometry it was about equivalent overall to the alternative with the advantage that it freed up the CPU. Clearly for fill it wouldn't be.

One relevant issue here is what do you want to measure in a graphics benchmark. There's a lot to be said for measuring graphics in the graphics tests and not the CPU. At one point NVIDIA was advocating this kind of thing to promote vertex programmability.

It isn't equivalent to Doom3 but why does that matter? Measure Doom3 performance when it arrives. This test measures graphics performance with a different profile, with more emphasis on stencil tested fill and long vertex programs.

Testing a long vertex program is a GOOD thing. It doesn't matter greatly exactly what it does; get over it, it doesn't have to test what Doom3 does. It tests generic graphics capability with the kind of thing we'll be seeing more of: long vertex programs. More importantly, ALL graphics cards will have to do the same work w.r.t. the load the benchmark places on the graphics system, and these graphics operations are at least relevant.

Someone releases a benchmark that tests interesting graphics stuff and you shoot it down because it doesn't offload more to the CPU. Has NVIDIA gone insane?


[This message has been edited by dorbie (edited 02-12-2003).]

PH
02-12-2003, 07:03 AM
One relevant issue here is what do you want to measure in a graphics benchmark. There's a lot to be said for measuring graphics in the graphics tests and not the CPU.


Interesting point. I don't really use these benchmark programs much ( mostly for curiosity ). If they wanted to benchmark graphics, I wonder why they put those ragdoll physics and sound tests in. I was under the impression that they wanted to create a "game like" environment for their benchmarks?

tellaman
02-12-2003, 09:32 AM
I guess the main reason why Nvidia is not supporting 3DMark2003 is that it actually doesn't reflect today's games,
and I'm quite sure it won't reflect games for the next one or two years (and by that time another 3DMark will probably have been released).
I mean, I can run today's games no problem with my Ti4600; where it falls behind is on pixel shaders, and the 3DMark game tests seem to stress this technique.
In demo mode you also get lots of post-processing based on full-screen effects,
and I guess game developers won't use such stuff for some time.
Even in bench mode the Quake-like game is slower than the Doom3 alpha.
Summing it all up: this is a feature benchmark, not something that can reflect performance in games.
Just my 2 cents...

zeckensack
02-12-2003, 10:19 AM
Matt,
It was quite obvious to me that they did something wrong, judging by the performance alone. In addition I've already heard some tidbits about their technique. Thanks for confirming http://www.opengl.org/discussion_boards/ubb/wink.gif

What I should have said in the first place is that they render Doom-like visuals. The strategy is the same, but their approach is very brute force, to keep it polite.

I wouldn't want that stuff going on in an actual game engine, hell no. But if they overemphasize fill and kind of counter that with an equal 'waste' of vertex processing, that would be fine for a graphics benchmark.

What I really take issue with is that they went completely nuts on vertex shaders.

[This message has been edited by zeckensack (edited 02-12-2003).]

zeroprey
02-12-2003, 11:09 AM
Originally posted by dorbie:
Carmack considered the "project every triangle" approach in programmable geometry at one point. He mentioned it in a .plan a while back. He said for geometry it was about equivalent overall to the alternative with the advantage that it freed up the CPU.

Close, but not really. What he was talking about was the vertex program method of doing stencil shadows, where you still only project the silhouette; you just do it differently. I know it's been discussed a lot here before, and it turns out to use something like 6x the memory. So seeing it get the same performance, I'm sure there wasn't much thinking involved for Carmack to pick the CPU-based one.

This project-everything system shocks me. I've never even heard of anyone doing this before, because it sounds just absurd. If they really wanted to stress-test the fillrate, why not put in many more objects, as would be in a scene that actually needed fillrate like that. And if vertex programs were what they wanted to test, then they could have done the vertex program shadow method the way it's supposed to be done. With my GF3 I was getting 1-5fps during that test. If anyone's seen the Doom alpha, it is far faster on the same hardware doing a scene that looks IMO much better, and on top of that it had sound and other CPU things going on (AI, etc). Sounds to me like they got lazy and thought that because they were stress testing they didn't need to optimize or even do things in a fast manner.

mcraighead
02-12-2003, 11:29 AM
Originally posted by dorbie:
At one point NVIDIA was advocating this kind of thing to promote vertex programmability.

I don't think we've ever advocated using vertex programs for this particular purpose... there are a lot of things they're good for, but this is not one of them!


Originally posted by dorbie:
Someone releases a benchmark that tests interesting graphics stuff and you shoot it down because it doesn't offload more to the CPU. Has NVIDIA gone insane?

I don't know all the details of the benchmark; there may or may not be additional problems with how it works. All I know about is the way it does stencil shadows, which is just about unspeakably lame. What in particular pisses me off is that Futuremark seems to be suggesting, if not saying outright, that these scenes are comparable to Doom. They aren't. They aren't even *close* to what Doom does.

But the benchmark and what it does is really only half the story. The other half has to do with Futuremark's business model. I'll give you a hint: how does a benchmark company make money? I'll give you another hint: *not* by selling the benchmark to end users.

- Matt

knackered
02-12-2003, 12:10 PM
Were NVidia outbid then, Matt?
Fair play, then.

I think there should be a group of models/scenes arranged into a demo, say:-
Test 1) Render mesh#1 with diffuse/specular bumpmapping
Test 2) Render mesh#1 with fixed function
Test 3) Render mesh#1 with stencil shadows
Test 4) Render mesh#1 with shadow maps

Repeat for a few more meshes/scenes.

The only fixed requirements of the benchmark should be the data, the window size, the camera position, the FOV, and the visual effect.
Each vendor can code the demo however they like. Then nvidia can use VAR directly etc.
I think that would maybe sort the men from the boys.
Mad idea. Maybe CPU vendors should do the same.

zeroprey
02-12-2003, 12:19 PM
That is an interesting idea, and good for testing engines as well. Not sure it'd be a good thing for card reviews; more so for engines. Would be a cool contest though.

MZ
02-12-2003, 12:28 PM
from www.extremetech.com (http://www.extremetech.com) :
Being a beta partner at all required us to pay money to Mad Onion
I must admit I was shocked. This makes Mad Onion a parasite. If this is true, they deserved to get the spank. Besides, they are notorious DirectX-ass-kissers, so I won't be missing them http://www.opengl.org/discussion_boards/ubb/wink.gif

On the technical side, I can't understand one thing:
Does it make sense for a DX9 benchmark to run a scene on PS 1.1 only and use 80MB of textures?
Does anyone know how a 64MB R9500 scores?

Adrian
02-12-2003, 12:54 PM
Well that completely discredits 3dmark. It never crossed my mind that they wouldn't be impartial. I knew there were dirty tricks going on but this is worse than I imagined.

I now understand why hardocp have decided not to use it.

[This message has been edited by Adrian (edited 02-12-2003).]

zeckensack
02-12-2003, 12:54 PM
Originally posted by MZ:
I must admit I was shocked. This makes Mad Onion a parasite. If this is true, they deserved to get the spank. Besides, they are notorious DirectX-ass-kissers, so I won't be missing them http://www.opengl.org/discussion_boards/ubb/wink.gif

Just what I was thinking http://www.opengl.org/discussion_boards/ubb/wink.gif

Originally posted by MZ:
On the technical side, I can't understand one thing:
Does it make sense for a DX9 benchmark to run a scene on PS 1.1 only and use 80MB of textures?
Does anyone know how a 64MB R9500 scores?

I like this question. My trusty R8500LE 64Meg scores a brilliant square 1000 points.
I haven't found any 128Meg scores yet, but I'll let you know.


[This message has been edited by zeckensack (edited 02-12-2003).]

HS
02-12-2003, 02:43 PM
Funny!

Nvidia never said a bad word about the older 3DMarks (which were in fact as useless as this new benchmark), but since their cards came out on top they showcased them.

Now the tide has turned and Nvidia doesn't shine anymore, and suddenly Futuremark became a greedy company overnight that only wants to make money, and in the process became too stupid to program an efficient 3D engine.

Give me a break!

I for one expect a benchmark to stress the resources.

- So it doesn't reflect "a current 3D engine"? Yeah, and?

Oh right, you mean the drivers don't have a path for that kind of utilization yet?

- So their stencil shadow method is *very* fillrate intense?

Did it cross your mind that this might be on purpose?

Isn't that exactly an area where 3D acceleration should kick in? I agree this is not a very good method for shadows on *current* hardware, but did the last benchmark look that good on hardware that was available at its release?

Oh, I forgot, that was the GeForce3....

So *current* hardware can't deliver the fillrate/bandwidth to make the 'brute force' method faster than the 'smart' one, but how long will that stay true?

Have a nice day!

zeckensack
02-12-2003, 02:48 PM
Originally posted by HS:
So *current* hardware can't deliver the fillrate/bandwidth to make the 'brute force' method faster than the 'smart' one, but how long will that stay true?

In this instance, forever.
Drawing useless tris that cancel each other out is never going to be faster than not drawing them in the first place. And no, TBDRs aren't clever enough to detect that either.

zed
02-12-2003, 08:05 PM
>>If they wanted to benchmark graphics, I wonder why they put those ragdoll physics and sound tests in. I was under the impression that they wanted to create a "game like" environment for their benchmarks?<<

Correct; this benchmark gets hyped as 'this is a GAMES benchmark', where in fact it's a demo, where the user can't control the camera.
And as we all know, that enables MAJOR MAJOR optimisations.

>>They aren't even *close* to what Doom does<<

Correct (IMO). Even though visually it might look similar to Doom3 to the layman (they could have chosen different-style models), it ain't anywhere close under the hood. Of course this will give people a wrong impression of how Doom3 will perform on their systems, and they will probably shell out cash on a non-optimal graphics card.

rgpc
02-13-2003, 03:06 AM
Well, I just ran 3DMark03 on my system (AMD 2400+ XP, GF3 Ti200 & 512MB DDR PC3200) and it ran like absolute crap. In most demos I averaged around 3fps.

It really looks to me like an extremely poorly written demo. Mind you, it is a DX product, so like a large number of DX products we should expect it to run like crap... (ooh, that reminds me, better go install the patch for CFS 3 and see if it helps any...)

knackered
02-13-2003, 03:21 AM
Now slagging off another API isn't going to help any, is it?
You don't have to debug the source code, so why complain about it using d3d?
I have nothing to contribute to this 3dmark2003 topic, by the way http://www.opengl.org/discussion_boards/ubb/smile.gif
Carry on...

dorbie
02-13-2003, 03:45 AM
Oh, I agree it's not equivalent to Doom3; like I said, run Doom3 for Doom3 results. What it does do is stress graphics in interesting, relevant ways.

Unless there were something bad like foolish dispatch, graphics stalls or use of irrelevant slow-path stuff, there's no good reason for dissing it. I see no evidence of any of these.

The hint Matt gave was interesting, alluding to a 'give us money to optimize for your platform' business model. Another good reason to walk away I suppose, but perhaps a risky strategy; time will tell.

3DMark is not about getting a fast frame rate for the demo, it's about stressing graphics during the demo, and that is a significantly different design goal. Did it ever occur to you that they DELIBERATELY chose to project shadow volumes in a vertex program instead of in software (in the 3D test, not the CPU test)?

There has always been irrelevant stuff in the 3DMark tests; the geometry test with the dragon carousel and many light sources has nothing to do with games, but I thought it produced interesting results that are useful. But in this case the 3DMark tests are more relevant than they have ever been. There's a load of transparent atmospheric blended fill and geometry in the aircraft test; that's something you want to measure. There's a lot of stencil fill and multipass stencil tested fragment shading in the starship troopers demo. Skinning is relevant. The point is to stress graphics paths they expect to be used in future games, and they definitely DO.

As for NVIDIA advocating VP for shadow volume projection, I think they even released sample code; also, read the Carmack .plan where he mentioned hardware volume projection, I think he mentions it in there.

[This message has been edited by dorbie (edited 02-13-2003).]

evanGLizr
02-13-2003, 06:50 AM
Originally posted by mcraighead:
I don't think we've ever advocated using vertex programs for this particular purpose... there are a lot of things they're good for, but this is not one of them!

Well, last time I checked David Kirk was still working at NVIDIA http://www.opengl.org/discussion_boards/ubb/wink.gif:



Robust Way: Stencil Shadow Volume Generation with Vertex Shaders

Adulterate your polygon model
- Add degenerate quads at every edge
- Normals for new degenerate quad vertices come from real geometry
Vertex shader tests vertex normal N · L
- Front-facing vertices are unchanged
- Back-facing vertices are pushed to FAR
Has the effect of extruding silhouette edges away from light to produce shadow volumes
Always works, for all objects (imagine a single triangle)

This is from David Kirk's Advanced Programmable Shading: Beyond Per-vertex and Per-pixel Shading (http://www.microsoft.com/mscorp/corpevents/meltdown2001/ppt/Externals/NVidia_David_Kirk.ppt) Meltdown 2001 presentation.

The presentation also suggests the method of not adulterating the model and sending to infinity only vertices from (lightspace) backfaces (this only works for sufficiently tessellated models).

Anyway, I think it's in line with what 3DMark should mainly be, a graphics card test, and I think it's quite a clever approach: otherwise, what do you want a T&L card for if you have to do all the lightspace transformations on the CPU anyway because of silhouette calculation?
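
For clarity, the per-vertex test in that slide boils down to something like the following, written here as plain C++ rather than as a vertex program; a real implementation would run this logic per vertex on the GPU, one common variant pushes back-facing vertices to infinity rather than to the far plane, and all names are illustrative.

struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

static float Dot3(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Each vertex of the adulterated model carries the face normal of the real
// triangle it was copied from. Vertices whose source face points away from
// the light get extruded, which stretches the degenerate quads sitting on
// silhouette edges into shadow-volume side walls; quads on non-silhouette
// edges stay degenerate, because both of their source triangles agree and
// all four vertices move (or stay) together.
Vec4 ExtrudeVertex(const Vec3& pos, const Vec3& faceNormal, const Vec3& lightPos)
{
    Vec3 toLight = { lightPos.x - pos.x, lightPos.y - pos.y, lightPos.z - pos.z };

    if (Dot3(faceNormal, toLight) >= 0.0f)
    {
        Vec4 unchanged = { pos.x, pos.y, pos.z, 1.0f };  // front-facing: leave it
        return unchanged;
    }
    // Back-facing: a point at infinity in the direction away from the light
    // (w = 0 in homogeneous coordinates).
    Vec4 extruded = { pos.x - lightPos.x, pos.y - lightPos.y, pos.z - lightPos.z, 0.0f };
    return extruded;
}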

Eric
02-13-2003, 07:30 AM
Well, I have downloaded it as well and got a score of ... 1377!

I have a dual Athlon MP 2000+ with a GeForce4 Ti4200 (64Mb) and 1Gb of RAM.

I must say I don't like 3DMark2003 as much as I liked 3DMark2001. Somehow, it misses the "wow!" effect of the previous version. Granted, its purpose is to "benchmark", not to impress, but I am disappointed...

One thing that really bothers me is that they call this product "The Gamer's Benchmark" while I read that the 3D engine in this new flavour is more geared towards demos than games (I understand the previous engine was used for Max Payne?). I won't go too deep into this because it may be a lot of crap, but why create a "gamer" benchmark that doesn't use any technique used in current/future games?

I understand Dorbie's point of view when he says that some of the rendering techniques may have been chosen on purpose to stress specific capabilities of the card, and I agree that it may give interesting results. But if this doesn't represent what you'll get in games, then 3DMark2003 is pointless for 95% of the people who downloaded it (they don't know it yet...).

Now I don't mind that the Radeon 9700 comes first in all the tests: I really like NVIDIA products/people, but that doesn't mean I should spit at ATI when they produce a good graphics card. On the other hand, if some of this difference is due to optimizations that were "bought" from FutureMark, I find it more bothersome...

Anyway, it'll be interesting to see how end users read the results given by this new 3DMark: as someone said, it may become a problem for NVIDIA if people choose their card by these numbers!

Regards,

Eric

dorbie
02-13-2003, 08:51 AM
Whoa, nobody is accusing ATI or NVIDIA of anything. There is a general issue w.r.t. optimization, but it seems more like extortion than buying favours to me :-) (and I'm not saying that is the case). Heck, the whole thing was inferred from one comment; there could be other interpretations, like licensing for optimization. For example, you could charge an arm and a leg for your software to IHVs who need it to optimize drivers for it, charge some for web sites, and make it free to consumers. When it starts getting too big for its britches, a benchmark company might try to change terms in all sorts of ways; I mean, where else is their growth opportunity?

As for game benchmarks, you can run games for those. Game benchmarks always have varying elements of CPU performance. 3DMark SHOULD be abstract to a degree and test the card as much in isolation as possible. As long as it uses paths that games use, it's relevant. If they expand into other areas like CPU, that's a separate issue.

3DMark 2003 blows 2001 away; I think the bar has been raised, that's all. I don't think I've seen a game with lighting as good as the starship troopers demo, with the exception of Doom. And the new Nature demo is better than anything else I've seen, but I'd seen it before at the GeForce4 launch.

V-man
02-13-2003, 10:13 AM
Originally posted by Adrian:
Well that completely discredits 3dmark. It never crossed my mind that they wouldn't be impartial. I knew there were dirty tricks going on but this is worse than I imagined.

I now understand why hardocp have decided not to use it.

I don't care much for these benchmarks, since we don't know exactly what is going on in them. By that I mean they should be open sourced!

It's long been known that 3DMark favors Intel and optimized their code specifically for Intel chips. If you were to benchmark a real-world app, you would get AMD as the winner, or performance close to Intel, but with these guys, ...

I don't take benchmarks seriously ( unless they are my own of course http://www.opengl.org/discussion_boards/ubb/smile.gif )

dorbie
02-13-2003, 10:16 AM
I heard about the Intel vs AMD benchmark debacle, don't think it was 3DMark though, but PCMark, but I'm not sure. Hmm.. things seem to fall into place.

Adrian
02-13-2003, 10:24 AM
Originally posted by dorbie:
I heard about the Intel vs AMD benchmark debacle, don't think it was 3DMark though, but PCMark, but I'm not sure. Hmm.. things seem to fall into place.

I thought it was sysmark http://www.opengl.org/discussion_boards/ubb/smile.gif

knackered
02-13-2003, 12:31 PM
176mb?!!! Do I look like a bitch?!!

HS
02-13-2003, 12:34 PM
Originally posted by zeckensack:
Originally posted by HS:
So *current* hardware can't deliver the fillrate/bandwidth to make the 'brute force' method faster than the 'smart' one, but how long will that stay true?
In this instance, forever.
Drawing useless tris that cancel each other out is never going to be faster than not drawing them in the first place.


I disagree.

Since GPUs increase in speed a lot faster than CPUs, the CPU and the AGP bus will soon become the bottleneck.

As GPUs keep getting faster, there will be a "break-even" point soon enough.

[This message has been edited by HS (edited 02-13-2003).]

dorbie
02-13-2003, 12:45 PM
Adrian you are right I think. So it's a completely different company.

HS, any break-even point will depend on resolution and some other details, and it applies now too. Brute force potentially costs a vast amount of pixel fill and maybe more CPU->GPU bandwidth. The other approach costs CPU cycles.

I doubt the brute force approach will ever be the way to go, but perhaps with more sophisticated vertex programs you will be able to implement silhouette projection on the GPU instead of triangle projection. Realistically that is the more likely future IMHO.

mcraighead
02-13-2003, 02:57 PM
"Extortion" is a rather strong word, but it is interesting to read their page on their beta program.
http://www.futuremark.com/betaprogram/

I have absolutely no idea what the membership levels cost ($100? $100K? could be anything), but being a beta program member definitely costs money.


In part, I think this is being blown out of proportion. From what I saw on the HardOCP preview, with our latest drivers vs. ATI's latest drivers, we were ahead a little bit. Obviously, ATI, being a "strategic beta member" and all, has had a lot more time with the benchmark than we have, so their older drivers could have been tuned for this benchmark already -- not to mention that they undoubtedly had more influence over the benchmark design. So if, despite all that, we can *still* win, we must be doing something right...

- Matt

mcraighead
02-13-2003, 03:01 PM
Also, as for that thing about adding degenerate quads... yuck!!! We've learned a lot about how to do shadow volumes since then.

- Matt

Nutty
02-13-2003, 03:27 PM
From what I saw on the HardOCP preview, with our latest drivers vs. ATI's latest drivers, we were ahead a little bit.

Matt, it seems the latest beta drivers miss out a lot of the point sprites/billboards used in the smoke of the aircraft scene. It even happens on my GF4. Before this beta driver was used, the score was substantially lower than the 9700 score.

Care to comment?

What _is_ the deal with the GF-FX anyway? I know people are dying to ask. Your marketing department claimed "free anti-aliasing in all modes and all resolutions", but clearly this is not the case. Are the drivers just duff at the moment, or is the current core/fab screwed or broken?

I'm not trying to stick the knife in here, but you can't escape the fact that everyone is slagging you guys off at the moment, for what seems to be a bit of a let-down with the NV30. Nvidia, it seems, doesn't want to say anything, officially anyway.

I'd be really interested to hear whats going on.

Nutty

mcraighead
02-13-2003, 04:16 PM
I haven't heard anything about a bug like that.

I don't know about "free" AA, but we've definitely reduced the performance hit for AA a lot. I've often seen a 10-20% perf drop for turning on 2x AA. Unfortunately, a bunch of the web reviewers screwed up in taking screenshots and incorrectly concluded that our 2x AA wasn't doing anything; and then they didn't run any 2x AA benchmarks. (They also seem to be chronically unable to run benchmarks w/ AA and w/o aniso, or w/o AA and w/ aniso, to assess the effect of the two independently, but that's another gripe for another day. Wouldn't it be great if reviewers followed basic scientific practices like changing one variable at a time?)

However, we do very well in 2x AA modes, and in fact I'd suggest that you'll get better quality *and* faster framerates running at 19x14 with 2x AA than 16x12 with 4x AA.

- Matt

dorbie
02-13-2003, 04:25 PM
So you are saying that the performance hit for 4x AA is huge with FX then right?

mcraighead
02-13-2003, 05:02 PM
No, I didn't say that. You misinterpreted what I said to imply that.

It's a bigger hit, but still significantly reduced from other chips.

- Matt

JackM
02-13-2003, 05:08 PM
Matt, it seems the latest beta drivers seem to miss out alot of the point sprites/billboards used in the smoke of the aircraft scene. It even does it on my GF4

According to this site (http://www.tomshardware.com/column/20030213/3dmark2003-02.html) , the difference between the two driver versions results in a 6fps increase * 7 = 32 marks difference. Unless there is similar stuff missing in the shader tests (G2-4), it's most likely a driver bug.

Still, the results between drivers are very interesting, to say the least.

dorbie
02-13-2003, 05:29 PM
Sorry Matt, I couldn't resist after that opening :-)


[This message has been edited by dorbie (edited 02-13-2003).]

zed
02-13-2003, 07:13 PM
>> I've often seen a 10-20% perf drop for turning on 2x AA.<<

After seeing some pictures of the 2x AA quality, a 10-20% drop is way too large. Even a 1-2% drop is a bit rude http://www.opengl.org/discussion_boards/ubb/smile.gif

gking
02-13-2003, 07:21 PM
Matt, it seems the latest beta drivers seem to miss out alot of the point sprites/billboards used in the smoke of the aircraft scene

The images with the missing particle effects were taken with the 42.86 drivers.

The tests on Tom's Hardware and HardOCP used the 42.68 and 42.67 drivers, respectively.

M/\dm/\n
02-13-2003, 10:08 PM
I got a killing score of 10 on my GF2 GTS + PII 3,5@124 + 384 RAM. What's interesting is that there was no difference with forced 4x antialiasing http://www.opengl.org/discussion_boards/ubb/smile.gif Afterwards I managed to achieve 12 with my GF2 overclocked 200=>250 (GPU) 333=>366 (VRAM) + SB addressing enabled + 4x AA or AA off.
I like my HW http://www.opengl.org/discussion_boards/ubb/biggrin.gif http://www.opengl.org/discussion_boards/ubb/biggrin.gif

zeckensack
02-13-2003, 10:57 PM
Originally posted by zed:
>> I've often seen a 10-20% perf drop for turning on 2x AA.<<

After seeing some pictures of the 2x AA quality, a 10-20% drop is way too large. Even a 1-2% drop is a bit rude http://www.opengl.org/discussion_boards/ubb/smile.gif
Well, that's a bit rude too http://www.opengl.org/discussion_boards/ubb/smile.gif
The shots don't necessarily show the AA (I think Matt just said that), but it's on screen. Prolly the downfiltering takes place in the RAMDAC for 2x at least, so a frame buffer grab won't capture it.

mcraighead
02-15-2003, 02:10 PM
Correct, the screenshots that several websites posted were pretty much worthless. We would render with 2x AA, and their screenshot would only grab a single sample of the two. So it's no surprise that the 2x AA quality would appear to be no better than the no-AA quality; and in fact you can see that the only difference is that the image shifted by a fraction of a pixel. On the other hand, the DAC would use both samples, so if you were to, say, feed the output into a video capture board, you'd clearly see that antialiasing was going on.

What's particularly bizarre is that websites got confused about this, even though we've supported this "filter on scanout" mode in all chips GF4 and up (including GF4 MX). This is nothing new!

We will definitely make it harder for websites to screw this up in the future with new drivers, but it seems that "crap 2x AA quality" is likely to become a permanent web urban legend.

For OpenGL, at least (I don't know the full story for D3D), and in our current drivers, it's really simple. ReadPixels will give you the filtered image, while PrintScreen (which is GDI) will give you the bogus image. You can easily confirm that I'm telling you the truth about this by starting up Quake on any GF4 part with 2x AA. Hit F9 (the game's builtin screenshot, which uses glReadPixels), and hit PrintScreen (again, a GDI screen capture). Compare the resulting images. The difference is very, very obvious.

I find it remarkably shameful that Anand still hasn't updated his review with correct screenshots, even though several other sites have. It's crap journalism in action.

- Matt
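
For reference, the glReadPixels path Matt describes is just a handful of GL calls; a minimal sketch, assuming a current GL context, with writing the raw RGB data to disk left out and the function name made up:

#include <GL/gl.h>
#include <stdlib.h>

/* Per the post above: reading back through GL returns the downfiltered
   image, while a GDI PrintScreen capture of the same window does not.
   Returns bottom-up, tightly packed RGB data that the caller must free(). */
unsigned char* CaptureFrontBuffer(int width, int height)
{
    unsigned char* pixels = (unsigned char*)malloc((size_t)width * height * 3);
    if (!pixels)
        return NULL;

    glReadBuffer(GL_FRONT);               /* read what is actually displayed */
    glPixelStorei(GL_PACK_ALIGNMENT, 1);  /* no row padding */
    glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, pixels);
    return pixels;
}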

dorbie
02-15-2003, 02:15 PM
Matt, an alternative view of this is that it's NVIDIA's fault. When a screen capture doesn't capture what's on the screen (on video), it's a *BUG*. The incompetence, let me suggest, lies with the driver authors, not the webmasters.

DFrey
02-15-2003, 04:42 PM
You have to be kidding, dorbie, right? I mean, not even the overlay is captured via a simple screen capture. Is this too a failure of the driver authors? No, this is clearly a failure of the person taking the screenshot: a failure to understand how to take a proper screenshot.

dorbie
02-15-2003, 04:47 PM
You're the one who has to be kidding. I draw an AA scene and screen capture grabs a single sample and you think that's not a bug?

There's a distinction to be drawn between an internal readpixels on a multisample visual and a system level read that should be WYSIWYG. The lack of capture of an overlay is a separate issue from accurately representing what appears in a single visual. But even so two wrongs don't make a right.

[This message has been edited by dorbie (edited 02-15-2003).]

gking
02-15-2003, 04:59 PM
A bug? That's going a bit far (it's not as if the GDI knows what a multisample buffer is). Filter on scanout is almost certainly going to become more ubiquitous in the future, and since there are plenty of equally supported (if not better-supported) methods of taking a screen shot than using PrntScrn, the only obstacle is educating users/reviewers on proper screen capture techniques.

Filter on scanout has caused issues with web sites since the Voodoo 4/5 days -- I can't say I was surprised when similar issues cropped up in GeForce FX reviews.

The issue that Matt takes exception with is incorrect conclusions (reached due to flawed data) not being amended when an explanation and evidence is provided. 2x (and Quincunx) AA are definitely working as advertised, and the quality is exactly like what you would see on a GeForce 3 or 4.

Unfortunately, what has become uncomfortably commonplace on the web these days is posting reviews or articles with flawed information without fact-checking (2xAA not working, QuadroFX 2000s at 324MHz, GeForceFX 5800 Ultra cancellation), in order to "get the scoop" on other web sites. Due to the viral nature of the web, these stories propagate as truth before corrections can be made, and the correction articles (if they are ever made) never get as much publicity as the original erroneous article.

dorbie
02-15-2003, 05:40 PM
When screen capture gives you results that vary from what's presented on the video, YES it's a bug. Technical excuses w.r.t. architectural flaws and your GDI implementation don't make it any less of a bug.

The web sites do an admirable job IMHO (on the technical side). Technical competence in journalism is almost nonexistent; graphics reviews are surprisingly good for such a specialized area (although a lot of it is spoon-fed from the big two). A few mistakes here and there are not a big deal: every card is reviewed to death simultaneously on about 10 sites, and if you're still confused then you have some sort of problem. The 2xAA misreporting is especially understandable.

The 5800 cancellation story was hysterical, but had more to do with traditional journalism and a public desire to gloat. What has NVIDIA said about it? Nothing that I've seen. Is their PR department asleep, or is there some truth to it? If you read between the lines, it seems like more of the truth than they'd like is already out there. They don't want the release of their next-gen product later this year to gain any kind of attention in the public consciousness in case it deflates 5800 sales. That may change when the next ATI hits the shelves and they try to undermine it by preannouncing again. This is a launched product, and they can't find 50 working cards to give away in London.


[This message has been edited by dorbie (edited 02-15-2003).]

gking
02-15-2003, 06:12 PM
Why not call it Microsoft's bug, for making the GDI such an opaque layer in the first place?

Just because one method of capturing the frame buffer doesn't work properly is hardly a reason to handicap hardware for all current and future versions of Windows. Besides, using built-in screen capture functions in games (or using a 3rd party utility, like HyperSnap) will always dump a screen shot directly to a file, instead of requiring lots of Alt+Tab switching between the game and your image editing app of choice, making it more convenient for web reviewers while significantly reducing the likelihood that these mistakes will be made.


The 2XAA missreporting is especially understandable.

But is not correcting it? In any other form of journalism, when mistakes are made, some effort is made to correct the error. Is it particularly hard to write something like the following paragraph?

"NVIDIA has informed us that using Print Screen to capture screen shots will not capture the anti-aliasing performed by the card in 2X or Quincunx modes due to their bandwidth-savings technology, although the filtering will be visible on-screen. We regret the error, and are currently working with NVIDIA to find a method to properly capture anti-aliased screen shots."

dorbie
02-15-2003, 10:22 PM
Obviously that one site should correct the 2xAA comments, I agree, although there are better choices of wording. The notion that these sites are at fault and need to apologize is a rather distorted view of the situation IMHO.

Your comment w.r.t. blaming Microsoft is inane nonsense. If you architect something that doesn't behave correctly you've created your own problem.

[This message has been edited by dorbie (edited 02-15-2003).]

DFrey
02-15-2003, 10:59 PM
dorbie, if you can point out any Microsoft documentation that says the Print Screen key is supposed to capture the output of the DAC rather than the contents of the frame buffer, then it is a bug. Otherwise it is not a bug. It is a failure of the person taking the screen capture to understand how to take a complete screen capture.

gking
02-15-2003, 11:12 PM
I don't believe it's anyone's bug (and blaming NVIDIA is as nonsensical as blaming Microsoft). Print Screen still works (and it does capture the image in the frame buffer), it just doesn't perform the filtering that the card is performing because the filtered image is never stored in the frame buffer. The precedent for capture results not matching the DAC output had already been set with overlays (and you can argue that multisample buffers are a form of overlay), so all that happened was a simple breakdown of communication.

[This message has been edited by gking (edited 02-16-2003).]

dorbie
02-15-2003, 11:24 PM
Oh come on. The output of the DAC vs the screen contents? What about what the user is looking at! In fact it gave neither. And the lack of overlays in a capture is not a suitable precedent for screwing up when capturing the screen. What about consistency? It's OK to screw up capturing 2-sample but get it right with 4-sample?

Geeze, enough!

:-)

[This message has been edited by dorbie (edited 02-16-2003).]

knackered
02-16-2003, 07:17 AM
Downloaded it, and DX9.
It's pretty crap. I like the final scene with the tombraider/troll things - really like the pixel shading.
The flight sim bit is terrible - the landscape appears to be a single quad textured with a photograph. Has anyone actually seen IL-2 Sturmovik? If you want a flight sim benchmark, then that's the way to go.

dorbie
02-16-2003, 09:44 AM
Here's an interesting article w.r.t. all the fuss over 3DMark at Extreme Tech.
http://www.extremetech.com/article2/0,3973,888060,00.asp

The link to NVIDIA's Tamasi taking pot shots at Futuremark is interesting.

This is FM's response:
http://www.extremetech.com/article2/0,3973,891446,00.asp

[This message has been edited by dorbie (edited 02-16-2003).]

dorbie
02-16-2003, 10:08 AM
Hmm, looks like a lot of the issue is Pixel Shader 1.4 support not being there in NVIDIA drivers. They now map the shaders up to PS 2.0 or down to PS 1.1 in the newer drivers.

I find the other arguments, that the card is made to 'do too much work' at certain tasks, just asinine. The skinning in the vertex shaders is an example, where the FUD is that things get skinned too many times each frame.

DFrey
02-16-2003, 11:27 AM
The only things I don't really like about it are the bizarre stencil shadow method used, which in combination with the PS 1.1 fallback creates an unjust penalty (no actual game would ever do that; in the real world you would do whatever you could reasonably do to get rid of unnecessary tris), and the marketing tag line Futuremark is using, "The Gamers' Benchmark", which is obviously supposed to imply that it is a gaming benchmark. But it clearly isn't; it's synthetic.

dorbie
02-16-2003, 02:23 PM
Well, anything that does multipass fallback will create a penalty for hardware with lesser single-pass capability, but it's linear with the number of passes. All you can say in this case is that it's geometry heavy. That means the transition from one bottleneck to another happens at a higher resolution. It's not an unfair disadvantage. This is what real games do when they implement advanced shaders. When 1.4 is a subset of 2.0, you might ask why 1.4 was such an issue.

[This message has been edited by dorbie (edited 02-16-2003).]

dorbie
02-16-2003, 02:49 PM
Sure a fixed camera track might allow optimizations, but if you don't make them what's the difference?

The rest has been discussed to death already. They say they're a synthetic benchmark. There is a place for an unadulterated *graphics* test. This triangle projection dispute all boils down to them drawing "too much stuff" which is complete horse****.

[This message has been edited by dorbie (edited 02-16-2003).]

PH
02-16-2003, 03:11 PM
The problem has to do with skinning and shadow volumes done with vertex programs, for the multipass fallback case. You would end up doing all the skinning and shadow volume extrusion work for each pass.

On single pass hardware, VP based skinning and shadows might be a good idea though.

dorbie
02-16-2003, 03:20 PM
Yes, I know this. The 1.1 path falls back on multipass and redoes skinning etc. (although not extrusion; that's not multipass), and the extrusion pass requires skinning too. This is not a problem; it just means the geometry stage is busy. It is highly misleading for anyone to claim this is a foul ball because something gets reskinned each time it is transformed. That's the chosen implementation. At one level, skinning is merely representative of an intensive vertex program. PS 1.4 implementations have an advantage over 1.1 implementations, and they SHOULD in an advanced shader test. This is completely fair. If someone has a PS 2.0 implementation and doesn't support PS 1.4, that's their lookout. Wheeling out the marketing guns and taking aim at Futuremark is a stretch.


[This message has been edited by dorbie (edited 02-16-2003).]

PH
02-16-2003, 03:52 PM
Ah yes, only the skinning is done multiple times of course ( I was going to edit my post http://www.opengl.org/discussion_boards/ubb/smile.gif ). The shadow extrusion work is obviously separate.

I've recently read the response from 3DMark and some of the arguments seem reasonable. I did find something confusing - in the CPU tests, are they using CPU-executed vertex shaders, or did they actually write optimized code for this?

Elixer
02-16-2003, 05:57 PM
Originally posted by knackered:
Downloaded it, and DX9.
It's pretty crap. I like the final scene with the tombraider/troll things - really like the pixel shading.
The flight sim bit is terrible - the landscape appears to be a single quad textured with a photograph. Has anyone actually seen IL-2 Sturmovik? If you want a flight sim benchmark, then that's the way to go.


176mb?!!! Do I look like a bitch?!!

Let me be the first to call you on this... Bitch! :P lol

For screenshots, wasn't this the same argument from the Voodoo 2 days? The correct way of doing this is to capture what the user sees, nothing more, nothing less. http://www.opengl.org/discussion_boards/ubb/smile.gif

dorbie
02-16-2003, 06:30 PM
PH, I dunno what's going on in that test. It may just be Microsoft's emulation layer.

As for shadow extrusions, I'd expect it to skin then extrude (it'd have to) but it won't break into multiple passes irrespective of PS version. It'll always happen only once per light source.

pixelpipes
02-17-2003, 11:52 AM
Quiz: how many pixel pipelines does the GF-FX have? Hint: don't believe everything you are told...

JackM
02-17-2003, 12:12 PM
Quiz: how many pixel pipelines does the GF-FX have? Hint: don't believe everything you are told...

Quiz: should you believe "hit and run" posters pretending they know something based on rumors from reputable sites such as Inquirer?

Hint: No, you shouldn't http://www.opengl.org/discussion_boards/ubb/smile.gif

Oh, and what does this have to do with 3DMark 03?

pixelpipes
02-17-2003, 12:34 PM
Originally posted by JackM:
Quiz: should you believe "hit and run" posters pretending they know something based on rumors from reputable sites such as Inquirer?

Hint: No, you shouldn't http://www.opengl.org/discussion_boards/ubb/smile.gif


It is not pretending or based on rumors; it's based on hard facts.



Oh, and what is this have to do with 3DMark 03?

NVIDIA:
"Unfortunately, Futuremark chose a flight simulation scene for this test (game 1). This genre of games is not only a small fraction of the game market (approximately 1%), but utilizes a simplistic rendering style common to this genre"

Read: fillrate for single-textured rendering ("simplistic rendering style") is low, because it has 4 pixel pipelines with 2 TMUs each, not 8 with 1 each.

zeckensack
02-17-2003, 01:09 PM
If you (or your 'sources') base this assumption on the 3DM fillrate tests, let me tell you that they're the worst fillrate tests imaginable. Their so-called ST-fillrate is a bandwidth test, if anything.


[This message has been edited by zeckensack (edited 02-17-2003).]

dorbie
02-17-2003, 03:06 PM
You mean fill performance is bandwidth limited?!!!!!

Old Chinese proverb say, "When the finger points at the moon, the fool points at the finger.".

zeckensack
02-17-2003, 04:12 PM
They have blending and full z testing active in the single texturing test. I would have understood the z clear/z test/z writes, but blending? Hell no. That's not how you'd want to measure pixel fillrate.

3DMark2k1 shares this 'anomaly' btw.

dorbie
02-17-2003, 04:48 PM
This is what deep multi-stage pipelines are for :-). Sorted coarse z makes many other tests less interesting.

[This message has been edited by dorbie (edited 02-17-2003).]

cass
02-17-2003, 07:05 PM
Originally posted by dorbie:
Unless there were something bad like foolish dispatch, graphics stalls or use of irrelevant slowpath stuff there's no good reason for dissing it. I see no evidence of any of these.


This business of inserting tons of degenerate polygons and performing projection in the vertex program - coupled with multi-boned matrix palette skinning (of both the position and the face normal) - is extremely inefficient.

It probably jumps back and forth between being bottlenecked in transform and fill, with all units *way* below their peak throughput, and it's not the way a real game developer would implement shadow volumes.

Benchmarks drive hardware and driver development because high scores are essential. Good benchmarks benefit the consumer - usually because they measure things that games do, or because they are actually a game or application. Bad benchmarks hurt consumers because they push technology in the wrong direction.

So in the end we spend effort educating people on what's wrong with bad benchmarks or we spend hardware and engineering resources on optimizing for bad benchmarks.
This approach for stenciled shadow volume rendering makes for a Bad Benchmark.

Thanks -
Cass

dorbie
02-17-2003, 07:26 PM
Nvidia DEFINITELY used to recommend this approach to shadow projection, but that aside this does not qualify as the kind of mitigating factor I was talking about. It is a legitimate way of loading and testing the geometry transform. It is misleading to claim these are degenerate, after projection they are not, they are only degenerate to supply the verts for the projection process and side wall creation.

The most you can say about this is the vertex program is atypically long after skinning and projection, but then it's not one of the combinatorial passes. Your complaint is a huge stretch, the paths this exercises are used in games and in particular the 1.4 shaders that fall back on combinatorial 1.1 multipass are completely legitimate. Apparently this is the main issue for NVIDIA drivers rather than the single pass shadow projection. Again it is deceptive of someone there to complain that the model gets skinned so many times when this is part of the vertex program and the reason it does is a fallback to multipass because 1.4 support is not reported. It get's skinned many times on all hardware and on poorer shading hardware it get's transformed more times. The only thing it means in practical terms is the passes are geometry heavy. Any hardware that required multipass would take the appropriate hit for this.

Bottlenecks and where they happen would depend on resolution in this case as in others. I could make the same riddiculous throughput comment about any piece of software I run. Benchmarks and real games & applications are always below their peak throughput. I'll leave 'peak throughput' tests to benchmarks like 'BenMark5'. The observation about effort spent tuning games vs benchmarks is a bit too convenient and idealistic for me even though I do appreciate this in principal, this benchmark is not that distorted a test, it probably plays well to the peanut gallery though.

Each episode in this spectacle of NVIDIA trying to discredit Futuremark seems to bring a new excuse.

You're welcome.

[This message has been edited by dorbie (edited 02-17-2003).]

cass
02-17-2003, 07:45 PM
Originally posted by dorbie:
Nvidia DEFINITELY used to recommend this approach to shadow projection, but that aside, this does not qualify as the kind of mitigating factor I was talking about. It is a legitimate way of loading and testing the geometry transform. It is deceptive to claim these are degenerate: after projection they are not; they are only degenerate to supply the verts for the projection process and side-wall creation.



EVERY edge has an extra quad inserted. Edges that are not on the silhouette (most of them) *are* degenerate. Do you understand how pathologically inefficient this approach is? It's around 6x the geometry and it has to be re-skinned for each light source. It doesn't load the geometry engine consistently, because sometimes (not always) it'll get backed up with fill. What do you think it does test?

We have discussed this technique in the past, but would never have advocated its use as the sole SSV method for any game engine.
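To make the numbers concrete, here is a minimal sketch in C of the kind of preprocessing being described (not 3DMark03's actual code; the struct layout, the per-face vertex duplication and all names are assumptions for illustration):

/* Assumes the mesh has been split so that each triangle owns its three
 * vertices (corner c of triangle t lives at index t*3 + c) and each such
 * vertex carries that triangle's face normal.  A vertex program can then
 * push the copies whose face points away from the light to infinity. */
typedef struct {
    int t0, e0a, e0b;   /* the edge's two corners (0..2) inside triangle t0 */
    int t1, e1a, e1b;   /* the same edge inside the adjacent triangle t1 */
} Edge;

/* Append one quad (two zero-area triangles) per edge to the index list.
 * Until the vertex program separates the two copies of each edge vertex,
 * both triangles are degenerate; edges off the silhouette stay that way
 * and add no fill, but every copy is still skinned and transformed for
 * every light, which is where the extra geometry cost of this approach
 * comes from. */
int appendEdgeQuads(const Edge *edges, int numEdges, int *indices, int n)
{
    int i;
    for (i = 0; i < numEdges; ++i) {
        const Edge *e = &edges[i];
        int a0 = e->t0 * 3 + e->e0a;   /* copy carrying t0's face normal */
        int b0 = e->t0 * 3 + e->e0b;
        int a1 = e->t1 * 3 + e->e1a;   /* copy carrying t1's face normal */
        int b1 = e->t1 * 3 + e->e1b;

        indices[n++] = a0; indices[n++] = b0; indices[n++] = b1;
        indices[n++] = a0; indices[n++] = b1; indices[n++] = a1;
    }
    return n;   /* two extra triangles per edge, on top of the original caps */
}

With roughly 1.5 edges per triangle in a closed mesh, the edge quads alone triple the triangle count before the caps are counted, which is how you get into the neighbourhood of the figure quoted above.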

dorbie
02-17-2003, 07:53 PM
Absolute efficiency is not the issue. The point is to measure performance equally on all platforms. If this deviated dramatically from some reasonable paths I'd tend to agree, but I'm not in agreement that it does. Few things load the geometry equally; the whole business of a decent implementation is maintaining throughput with FIFOs. I'm only mentioning this because this is where you took us. One thing it might measure within the confines of your example (since you asked) is a card's ability to sustain throughput under the varying conditions you describe. I'd have expected NVIDIA hardware to do rather well at this.


[This message has been edited by dorbie (edited 02-17-2003).]

dorbie
02-17-2003, 08:03 PM
P.S. I was under the impression that all triangles might be getting projected, not just those on the silhouette. For me this additional lack of efficiency would make the test more valid. I wasn't aware that only silhouette degenerates were projected, and I'm still not sure that's the case. This would place an undue emphasis on degenerate rejection IMHO, although it would obviously be more efficient. Neither is a show stopper IMHO, but I'd prefer it if the test were using the less efficient method. Still, the selective projection is definitely closer to something you could legitimately find in a live game w.r.t. the ratios of stencil fill load for the overall algorithm.

[This message has been edited by dorbie (edited 02-17-2003).]

pixelpipes
02-17-2003, 08:09 PM
Originally posted by zeckensack:
If you (or your 'sources') base this assumption on the 3DM fillrate tests, let me tell you that they're the worst fillrate tests imaginable. Their so-called ST-fillrate is a bandwidth test, if anything.


I base this on fillrate tests I program and perform myself. I take care to make them NOT bandwidth limited by disabling Z test and Z write, and applying a tiny texture to a huge polygon. BTW, note there isn't a message here denying what I wrote.


[This message has been edited by pixelpipes (edited 02-17-2003).]
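For reference, the sort of single-texture fillrate loop being described might look roughly like this in period OpenGL (a sketch only; context/window setup, the timer and the bound 4x4 texture are assumed to exist elsewhere):

#include <GL/gl.h>

/* No blending, no depth test or writes, a tiny texture on a huge quad:
 * the intent is that nothing but the pixel pipes limits throughput. */
void drawFillrateFrame(int numLayers)
{
    int i;

    glDisable(GL_BLEND);
    glDisable(GL_DEPTH_TEST);
    glDepthMask(GL_FALSE);
    glEnable(GL_TEXTURE_2D);       /* tiny texture bound beforehand, so
                                      texture fetches stay in cache */

    glMatrixMode(GL_PROJECTION); glLoadIdentity();
    glMatrixMode(GL_MODELVIEW);  glLoadIdentity();

    /* Draw the same full-screen quad many times so the frame time is
       dominated by fill rather than per-frame overhead. */
    for (i = 0; i < numLayers; ++i) {
        glBegin(GL_QUADS);
        glTexCoord2f(0, 0); glVertex2f(-1, -1);
        glTexCoord2f(1, 0); glVertex2f( 1, -1);
        glTexCoord2f(1, 1); glVertex2f( 1,  1);
        glTexCoord2f(0, 1); glVertex2f(-1,  1);
        glEnd();
    }
    /* pixels per second ~= numLayers * width * height * fps */
}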

cass
02-17-2003, 08:16 PM
This SSV rendering engine deviates dramatically from reasonable paths.


I don't think it gives NVIDIA or ATI any particular advantage today. I just don't think it's a good target for hardware and driver optimizations for the next 2 years.

cass
02-17-2003, 08:25 PM
Originally posted by pixelpipes:
BTW, note there isn't a message here denying what I wrote.


Do you have any tests that do Z/stencil only (no color buffer writes)? If so, how many pixels per clock do you get for those tests?

Thanks -
Cass

dorbie
02-17-2003, 08:29 PM
This seems to be at the heart of the problem: the definition of a reasonable path with flexible programmable hardware. In this case it now seems to boil down to the silhouette hull projection method (the P.S. version issue seems to be moot now), and rather than it being off some favoured hardware path, the debate is over the ratio of one type of work vs another. The individual paths seem reasonable on the face of things. It's the algorithm exercised that offends more than the hardware path, and the algorithm is the same for all platforms.

zeckensack
02-17-2003, 11:45 PM
Originally posted by cass:

EVERY edge has an extra quad inserted. Edges that are not on the silhouette (most of them) *are* degenerate. Do you understand how pathologically inefficient this approach is? It's around 6x the geometry and it has to be re-skinned for each light source.

OMG. They're nuts.

dorbie, how's hardware supposed to handle this? I mean, graphics architectures should be somewhat balanced, shouldn't they?
A 6x oversize triangle setup is not something I'd like to pay money for.

I agree that the testing method is equal for all vendors. I agree that it's polygon intense, and that's something you want to benchmark.
But if that's what they wanted to do, they could just have made an HPC (high polygon count) test like they used to, and kept the 'game' tests sane. With my meager understanding of graphics hardware, you could probably do equally well in this test with a brunty 36 tris-in-flight 6x triangle setup, a 3 verts per clock transform engine and a f***ing single pixel pipeline. I don't fancy stupidity being rewarded, and that's what 3DM seems to do.

dorbie
02-18-2003, 12:42 AM
It's geometry heavy, yes (for that part of the algorithm), but it's not invalid. Practically my first post was: if you want Doom3 performance, run Doom3. This is still a valid benchmark. Different games, databases and resolutions have significantly different responses to geometry vs fill.

No they are not nuts. They've made a deliberate decision to load the graphics pipe instead of the CPU.

It's still not clear that non-silhouette edges aren't projected; not that it matters - you're talking about intense geometry vs intense stencil inc/dec. Your final statement is an indication of the hysteria that's been caused by all this bull****.

FYI: when Carmack tried this method he said in his .plan that it was a wash with the CPU approach, with the advantage that it freed the CPU. He opted for the CPU-driven approach to allow further beam tree optimizations.

[This message has been edited by dorbie (edited 02-18-2003).]

Jurjen Katsman
02-18-2003, 02:20 AM
I think the sudden bashing of this stencil volume algorithm is a little surprising. It has been suggested by various IHVs in the past, and it does offload all work to the GPU, which in actual games is often a good thing.

Games tend to be CPU limited; adding a lot of stencil volume generation onto the CPU load is only going to decrease performance.

Shadow volumes are usually fill limited; it's extremely rare for them to be geometry limited. The vertex data for such volumes is also extremely lightweight, so while it adds some bandwidth, it's not extreme, certainly not compared to the bandwidth the extruded shadow volumes will use up.

It certainly is a valid concern that you might lose parallelism between the transform and the fill, but I'd assume the hardware is parallel enough to at least be able to cull a fair amount of degenerates while drawing any actual extruded triangles (or capping triangles). (Considering how many degenerates the NVidia triangle stripper introduces, they shouldn't be all that bad at handling that.)

Of course, extruding a 60000 polygon sphere would be incredibly inefficient (on the GPU), but that's hardly a typical game mesh. I can also see that including complex skinning as part of this might be pushing things too far, but for rigid meshes I'm not so sure this is a bad idea.

I am aware of major games that are very likely going to ship using this technique, and I can't blame them. A CPU-based algorithm that is capable of saving a large amount of fillrate, sure, but using the CPU to reduce geometry load for shadows? Naaah.

Ysaneya
02-18-2003, 03:40 AM
Shadow volumes are usually fill limited, it's extremely rare for them to be geometry limited.


Well, the real thing about extruding non-silhouette edges is not the increased amount of geometry, since as you say shadow volumes tend to be fill limited anyway, but really the useless fill-rate overhead, since for non-silhouette edges, the stencil values will cancel each other.

Y.

knackered
02-18-2003, 03:46 AM
Nah, CPU speeds have shot through the roof while the demands that games put on them have hardly changed over the last couple of years - still got basic AI, still got convex hull collisions, even rigid body dynamics isn't being used all that much, mainly because it can easily get out of hand computationally and in terms of level design.
So long as the GPU doesn't have to wait on the CPU, everything will run far faster than leaving all this geometry processing up to the GPU. Admittedly, shadows done with stencil volumes are going to be used only with simple meshes... but that's why they're crap. They even look crap in Doom3 - in my opinion it's the per-pixel lighting (geometry reduction) that gives the wow factor, not the hard stencil shadows.

BTW, I can't believe they're re-skinning the mesh for every light pass - that's such a waste in anybody's book. It wouldn't be done that way in today's or tomorrow's games. It's a mad mixture of tests that couldn't possibly produce a meaningful benchmark result (a vertex shader bone skinning test, yes, but how many lights?!) - I certainly couldn't use that 3DMark score to tell me anything about the card that produced it without the source code to the benchmark itself, so it's worthless to me at least.

Jurjen Katsman
02-18-2003, 04:00 AM
Ysaneya: I'm assuming they're actually only extruding the silhouette edges. Anything else would be completely absurd. This also seems to be what the NVidia guys are complaining about, the geometry overhead and the overhead caused by the degenerates.

Knackered: I agree, CPUs have gotten very fast. (Although many games might keep the XBox in mind as well). But even with that, it's rare to see games which aren't mostly bottlenecked on the CPU, especially on lower resolutions. I don't expect that to change anytime soon.

Fundamentally not a lot has changed in how games do things (although advanced physics are used more and more), but simply the amount of stuff that's moving around and has all sorts of sensors and probes is increasing dramatically.

Another reason that the CPU ends up being the bottleneck is that it is often perceived as a limitless resource, and a lot more people on a team tend to write code that runs on it. If you have 5 programmers writing game systems and 1 programmer writing the graphics system, the game programmers will use up more CPU resources, simply because they can produce so much more code http://www.opengl.org/discussion_boards/ubb/smile.gif Sounds insane, I know, but in practice it often appears to be true.

stefan
02-18-2003, 04:06 AM
Interesting! There's an article on gamedev.net called "The Theory of Stencil Shadow Volumes"
( http://www.gamedev.net/reference/articles/article1873.asp ) that also uses the approach with the degenerate quads. Its conclusion: "... and performance is generally much better than non-shader implementation". Guess he was just kidding http://www.opengl.org/discussion_boards/ubb/wink.gif
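For what it's worth, the per-vertex decision that article (and cass above) describe boils down to something like the following, written here as plain C for readability rather than as the vertex program it would actually be (the light position and the w=0 point-at-infinity extrusion are the usual conventions; names are illustrative):

typedef struct { float x, y, z, w; } Vec4;

/* Run after skinning, once per vertex copy.  Copies whose attached face
 * normal points away from the light get pushed to infinity along the
 * light-to-vertex direction; the others stay put.  This is what opens up
 * the quads along the silhouette while the rest remain degenerate. */
Vec4 extrudeVertex(Vec4 position, const float faceNormal[3], const float lightPos[3])
{
    float lx = position.x - lightPos[0];
    float ly = position.y - lightPos[1];
    float lz = position.z - lightPos[2];
    float facing = faceNormal[0]*lx + faceNormal[1]*ly + faceNormal[2]*lz;

    if (facing > 0.0f) {                  /* face points away from the light */
        Vec4 out = { lx, ly, lz, 0.0f };  /* extrude to infinity, away from the light */
        return out;
    }
    return position;                      /* front-facing copies are left alone */
}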

PH
02-18-2003, 04:58 AM
Besides the fact that you need 6x the geometry to generate shadow volumes on the GPU, you also lose out on some important optimizations that are only possible with the CPU, such as culling of caps. These do save fill and improve overall performance for other reasons. Without two-sided stencil testing, GPU shadows would also need to be generated twice.

In any case, shadow volumes will NEVER be generated on the GPU for level geometry (I'm not sure if the 3DMark demo does this), at least if performance is of concern.

In my opinion, it's the technique that *overall* gives the best performance on a wide range of hardware that stays http://www.opengl.org/discussion_boards/ubb/smile.gif. Using a combination of GPU and CPU seems to be the most efficient ( for example, silhouettes extracted on CPU, extrusion done on GPU ).
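The CPU half of that hybrid is cheap to sketch (illustrative structures; assumes per-triangle plane equations are precomputed):

typedef struct { float nx, ny, nz, d; } Plane;            /* triangle plane equation */
typedef struct { int v0, v1; int tri0, tri1; } MeshEdge;  /* edge plus its two faces */

/* A face is lit if the light is on the positive side of its plane. */
static int faceIsLit(const Plane *p, const float lightPos[3])
{
    return p->nx*lightPos[0] + p->ny*lightPos[1] + p->nz*lightPos[2] + p->d > 0.0f;
}

/* Silhouette edges are those with exactly one lit face.  Only these get
 * handed to the GPU for extrusion, so the extra geometry and the stencil
 * fill stay a small fraction of the brute-force per-edge approach. */
int findSilhouette(const MeshEdge *edges, int numEdges,
                   const Plane *planes, const float lightPos[3],
                   int *silhouetteOut)
{
    int count = 0, i;
    for (i = 0; i < numEdges; ++i) {
        int lit0 = faceIsLit(&planes[edges[i].tri0], lightPos);
        int lit1 = faceIsLit(&planes[edges[i].tri1], lightPos);
        if (lit0 != lit1)
            silhouetteOut[count++] = i;
    }
    return count;
}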

Jurjen Katsman
02-18-2003, 05:18 AM
What would those various optimisations be? I admit I'm not exactly familiar with all the various methods available to optimize volumes on the CPU. Any links would be appreciated.

It seems that culling the caps is something you could still do when using the GPU, as long as you send the caps using a separate DrawElements call. Or would it require information created during the CPU shadow volume generation?

I indeed do believe that, given a fast enough CPU, CPU based shadow volumes can become a win, as long as they end up actually optimizing the fill requirements.

ehart
02-18-2003, 05:19 AM
First, I am not speaking for/as ATI for this post. I have no special knowledge of anything 3DMark related, nor have I even run the benchmark personally.

I had really been hoping this thread would just die, but since it won't here goes...

The idea that skinning a model multiple times in a frame to handle multipassing is inefficient and should never be done is misguided. I used to hear people get concerned about transforming vertices multiple times on HW T&L cards. The fact is that anything that could be done to avoid a multiple transform was likely to be no faster. If the crossover hasn't happened yet, I expect it will eventually, especially since skinning each time allows fewer updates of the vertex memory. I am not saying everyone should do it, but that they shouldn't be afraid of it, and that they should find the right answer for their own app. (For a benchmark that wants to stretch the graphics card, this might be the right thing.)

Next, it was mentioned previously that CPUs are very fast, and that CPU limits are not a problem. I would say that most games today at 10x7 or below are at least somewhat CPU limited. The real issue isn't always processing cycles for things like AI; it can also be memory bandwidth from pushing large amounts of data around. Increasing CPU work only makes this worse.

I will agree with Cass that the shadow volume technique as described for use on the characters is pretty expensive, especially fill-rate wise. I don't however believe that makes it any less valid as a test. I can see two advantages to doing things the way they seem to be. First, it could be used as a way to simulate more shadow fill rate, accentuating that portion of the test. Additionally, this method, while fill inefficient, is just about the only one I am familiar with that allows them to avoid CPU overhead and produce a correct shadow volume on a skinned character. Hacks to determine silhouettes on the GPU when skinning don't tend to work.

As for paying to get into a beta program: well, it does give a sense of bias. The only thing is that this is typical with other software or hardware. MS doesn't give free copies of all its OSes to every SW company on the face of the planet to allow them to optimize for the next release.

Declaring a benchmark as bad due to inefficiencies is just silly. It is completely impossible to predict what some random game might do between now and the time the next version of 3DMark is released. As a matter of fact, a lot of apps do really stupid things, like twiddle state unnecessarily, or not do a great job of view frustum culling. To simulate a real app, you almost need to screw some stuff up.

Finally, my most important point. If you as an ISV dislike how a particular synthetic benchmark works, the best thing you can do is to make your own software useful as a benchmark. Lots of games don't include all the things that make them reliable benchmarks. You should build in a frame counter, and the ability to time over a fixed period. You need to be able to play back some automated sequence, typically both in real time and also such that every system draws the same frames. You want to make the results repeatable. You want to allow the user to automate running this from a command line. You want to allow the results to be logged to a file. You may want to allow it to play a single frame in isolation with little or no overhead from collision detection or AI. These would make a good start at making your app a better benchmark. They will also likely help you in tuning the app.

-Evan

PS, I forgot one for the benchmark list. You want to sell 10 million copies of it. Of course you never know when this will happen, so it is best to put that stuff in ahead of time just in case.
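A bare-bones version of the timing/logging part of Evan's list, for illustration (clock() here is just a coarse stand-in for a proper high-resolution timer such as QueryPerformanceCounter; the log format and names are made up):

#include <stdio.h>
#include <time.h>

typedef struct { clock_t start; long frames; } BenchTimer;

void benchBegin(BenchTimer *t)      { t->start = clock(); t->frames = 0; }
void benchCountFrame(BenchTimer *t) { t->frames++; }

/* Append one line per run so automated runs can be collected and compared. */
void benchEndAndLog(const BenchTimer *t, const char *logPath)
{
    double seconds = (double)(clock() - t->start) / CLOCKS_PER_SEC;
    FILE *f = fopen(logPath, "a");
    if (f) {
        fprintf(f, "frames=%ld seconds=%.2f avg_fps=%.2f\n",
                t->frames, seconds, seconds > 0.0 ? t->frames / seconds : 0.0);
        fclose(f);
    }
}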

dorbie
02-18-2003, 08:15 AM
The method described by Cass is not excessively fill-rate limited, as he claimed that non-silhouette tris remain degenerate; this was the keystone of his comments, and it meant the opposite. If they are all projected, then all you're complaining about is the ratio of stencil inc/dec fill vs the fragment lighting fill, as I've said all along. The ratio of geometry to fill would tend to weight the shadow hull pass more in the test.

The idea of keeping geometry resident on the card and applying all transforms locally, freeing up the CPU and reducing the bandwidth across the principal bottleneck, the AGP bus, is a good one - especially in a graphics benchmark. Any per-frame CPU task defeats this; hull projection is one, but skinning is a bigger one because it affects all passes, although you could try to lock and send once.

Reskinning is inefficient, but, again, inefficiency alone is not the issue (ignoring bus traffic for now). The idea is to load graphics and see what kind of results you get. The fact that skinning is being done multiple times is not the point; that a complex and interesting vertex shader is being applied during transform IS. It's a complete red herring to say it skins multiple times. The correct observation is that it runs an interesting vertex program during all transformations.

I agree with the algorithmic efficiency observations, it's just not as relevant in a benchmark intended to load graphics as it would be when you're tuning a game.


[This message has been edited by dorbie (edited 02-18-2003).]

pixelpipes
02-18-2003, 10:58 AM
Originally posted by cass:
Do you have any tests that do Z/stencil only (no color buffer writes)? If so, how many pixels per clock do you get for those tests?
Cass

Normally I have Z test disabled, but Z write enabled, and of course also color write.
Enabling Z test will invoke the 'early out' tests, which are done per tile, thus screwing the measurement.

I tried it with Z write DISabled, and the result is the same. (equivalent to NV25 with appropriate GPU clock ratio boost)

If you are hinting at memory bandwidth limitation, I don't see the logic here.
With 1GHz memory and 128 bit bus, you have 4 Gpix/sec if you are writing either only RGBA
(32 bit) or only stencil/z (24+8). But disabling Z write didn't increase performance.

But here is the strange thing:
With color write DISabled, Z write ENabled, and a stencil test that does both read and write, the performance doubles. (glStencilFunc(GL_NOTEQUAL,0,-1); glStencilOp(GL_INCR_WRAP_EXT,GL_KEEP,GL_INCR_WRAP_EXT))
I have no explanation for this. Do you?
Is it some special optimization intended for the stencil shadow path?
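Spelled out, the state combination in that last test is roughly this (a sketch; note that GL skips depth writes entirely while the depth test is disabled, so GL_ALWAYS is used here to keep z writes live, and GL_INCR_WRAP_EXT needs EXT_stencil_wrap):

#include <GL/gl.h>
#include <GL/glext.h>

/* Colour writes off, z writes on, stencil both read and written -- the
 * combination reported above as running twice as fast, and also exactly
 * the kind of pass a stencil shadow renderer issues. */
void setStencilOnlyState(void)
{
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);   /* no colour or alpha writes */

    glEnable(GL_DEPTH_TEST);     /* depth writes only happen with the test enabled... */
    glDepthFunc(GL_ALWAYS);      /* ...so pass everything instead of disabling it */
    glDepthMask(GL_TRUE);

    glEnable(GL_STENCIL_TEST);
    glStencilFunc(GL_NOTEQUAL, 0, ~0u);                        /* stencil read */
    glStencilOp(GL_INCR_WRAP_EXT, GL_KEEP, GL_INCR_WRAP_EXT);  /* stencil write */
}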

mcraighead
02-19-2003, 06:23 PM
dorbie,

As for the screen capture, it's rather ill-specified how some of this is supposed to work. Microsoft has not ever specified, and is not likely to ever specify, how GDI interacts with AA. Worse, using the filtered image can cause serious problems, too; so "fixing" this may cause other problems. The driver behavior was not ideal, not by any means, but it would be hard to call it "wrong". Also, PrintScreen screenshots being inaccurate is nothing new. PrintScreen doesn't capture overlays, and in fact MS would probably have a fit if it did (copy protection and all)...

And I already told you that we *are* going to fix this, didn't I? So no need to get all worked up. I just wish certain web sites would, oh, post corrections or something... kind of like how newspapers often print corrections when they print news articles that have mistakes in them.


As for everything else...

If 3DMark's algorithm was part of a synthetic vertex program benchmark, that might be OK. However, it's not presented as such. It's presented as a gaming benchmark, and it's presented as even being comparable to what Doom3 does, and certainly as being a reasonably representative shadow-volume-employing game. However, it is grossly incomparable to Doom3, and it is definitely a very, very poor way for any game to implement shadow volumes.

This isn't in any way comparable to how some developers get all paranoid about "T&L doesn't let me output intermediate vertices". That was always a stupid complaint, and now it's even a false complaint -- there are ways to do this by using an appropriate vertex/fragment program plus techniques like PDR.

First of all, in this case, that previous problem is taken to an extreme. This is not some app that, say, draws a model with 2 rendering passes and has to skin twice; it uses a *very* large number of rendering passes, and the skinning program is complicated, *and* the shadow volume stuff is particularly bad in how much skinning it does.

Second, there are far better ways for the app to accomplish the same thing. (Skinning on the CPU is not the only one!)

But third, *because* the benchmark does *so much* stupid and wasteful work, the best way for us to win the benchmark down the road is to put in incredibly convoluted driver logic that -- in short -- detects everything stupid that the app is doing, and reimplements it using a smarter/faster algorithm, all behind the app's back. I'll leave thinking of the possible ways we could do this up to you; suffice it to say that implementing such code will help 3DMark03 and 3DMark03 *only*, at the expense of our time and at the cost of making our driver quite a bit bigger and buggier.

The dumber and more popular the benchmark, the worse the driver benchmarksmanship that is required. The more driver benchmarksmanship, the less valuable the benchmark.

And at least if it was a game, such benchmarksmanship would improve people's gaming experience. People who like Quake 3 engine games (and there are a bunch of those games) have undoubtedly benefited from the driver tuning we and ATI and others have done for the Quake 3 engine. I'm sure the same will be true for UT2003 and for Doom 3, and it's been true for other popular game benchmarks in the past.

But what gamer benefits from benchmarksmanship targeted at 3DMark03?

Bad but popular benchmarks hurt everyone. The primary issue is not whether 3DMark03 is unfair to NVIDIA or unfair to ATI or too geometry-heavy or whatever. (Although a benchmark that is unrealistically geometry-heavy could also be argued to be a bad gaming benchmark, and to promote bad hardware design -- if no real app wants that much geometry power, then it forces us to spend too many transistors there.) The issue, as I see it, is benchmarksmanship. (Combined, in part, with Futuremark's business model.)

- Matt

FXO
02-19-2003, 06:47 PM
Seems like a very honest and good clarification of this issue to me.

I must admit that the GFFX dropped in value to me when I first saw the AA-comparison shots on www.anantech.com; it would be interesting to see a "real" comparison of anti-aliasing.
-Nice to hear that the printscreen issue is a priority.

I have a question though:
Will OpenGL 2.0 and Cg make your optimizations more general and easier to implement?

By more general I mean that more games that you have not targeted your optimizations at will benefit from them.

cass
02-19-2003, 07:43 PM
General-purpose shader optimization will benefit the shaders in all apps.

Driver optimizations that make the inefficient shadow volume rendering in 3DMark03 faster will likely not help anybody.

JMichaelWhi
02-20-2003, 10:40 PM
Matt,


In part, I think this is being blown out of proportion. From what I saw on the HardOCP preview, with our latest drivers vs. ATI's latest drivers, we were ahead a little bit. Obviously, ATI, being a "strategic beta member" and all, has had a lot more time with the benchmark than we have, so their older drivers could have been tuned for this benchmark already -- not to mention that they undoubtedly had more influence over the benchmark design. So if, despite all that, we can *still* win, we must be doing something right...

- Matt

Nvidia were strategic beta partners for 16 of the 18 months of development time for 3DMark03 - a difference of only 2 months between the time you guys left and the release. This hardly supports your assertion that ATi somehow had more time and was allowed a bigger input to the project.

It makes me pretty uncomfortable that you would insinuate such things, knowing full well the total amount of time Nvidia had as a beta partner.

JMichaelWhi
02-20-2003, 10:46 PM
As for the screen capture, it's rather ill-specified how some of this is supposed to work. Microsoft has not ever specified, and is not likely to ever specify, how GDI interacts with AA. Worse, using the filtered image can cause serious problems, too; so "fixing" this may cause other problems. The driver behavior was not ideal, not by any means, but it would be hard to call it "wrong". Also, PrintScreen screenshots being inaccurate is nothing new. PrintScreen doesn't capture overlays, and in fact MS would probably have a fit if it did (copy protection and all)...

And I already told you that we *are* going to fix this, didn't I? So no need to get all worked up. I just wish certain web sites would, oh, post corrections or something... kind of like how newspapers often print corrections when they print news articles that have mistakes in them.

You are talking only about the filter effects being applied to 2x FSAA, which was publicly addressed on several of the reviewing websites after the initial findings. This does not cover the AA quality of your higher levels, including all your blended modes (Xs).

Suggesting that websites were misrepresenting the overall quality of the NV30's AA due to issues with screen captures is not justifiable in my view.

dorbie
02-21-2003, 12:01 AM
Matt, I understand why you see things the way you do. I disagree but I don't have much to add to what I've written. I think all points have been exhausted.

pixelpipes
02-21-2003, 02:15 AM
I hesitate to add another post to a thread that has grown so long, but I think I really didn't get my question answered.
This thread has also addressed the GF-FX FSAA quality, so I think it wouldn't be more OT to ask whether the NV30 has 8 pixel pipelines, as said all over the place, or only 4, like the NV25.
So far, the only way I got it to behave like an 8-pipe machine is to turn off color and Z writes completely.

dorbie
02-21-2003, 03:48 PM
http://www.theinquirer.net/?article=7920

It doesn't fully explain the 8 pixels per clock in some modes, though. Something more complex is going on.

[This message has been edited by dorbie (edited 02-21-2003).]

Chalnoth
02-21-2003, 10:06 PM
Originally posted by pixelpipes:
But here is the strange thing:
With color write DISabled, Z write ENabled, and a stencil test that does both read and write, the performance doubles. (glStencilFunc(GL_NOTEQUAL,0,-1); glStencilOp(GL_INCR_WRAP_EXT,GL_KEEP,GL_INCR_WRAP_EXT))
I have no explanation for this. Do you?
Is it some special optimization intended for the stencil shadow path?
As a side note, have you checked the rendering performance of an odd number of textures? If the NV30 actually does act like a normal "4x2" pipeline, then the performance of three textures will be roughly 3/4 that of the quad-texturing performance. If it runs at full speed, then it is more like an "8x1 architecture that can only output a maximum of 4 complete pixels per clock." I don't know how likely this is, but it shouldn't take very long to test.
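One way to run that experiment, sketched with ARB_multitexture (texture objects, the quad with per-unit texcoords and the timing harness are assumed to exist elsewhere; on Windows the glActiveTextureARB entry point would be fetched with wglGetProcAddress rather than relying on prototypes):

#define GL_GLEXT_PROTOTYPES 1
#include <GL/gl.h>
#include <GL/glext.h>

/* Enable the first 'count' texture units with a simple modulate combine,
 * disable the rest, then time the same full-screen pass for count = 1..4.
 * On a strict 4x2 design, 3 textures should cost the same as 4. */
void bindTextureLayers(const GLuint *textures, int count)
{
    int unit;
    for (unit = 0; unit < 4; ++unit) {
        glActiveTextureARB(GL_TEXTURE0_ARB + unit);
        if (unit < count) {
            glEnable(GL_TEXTURE_2D);
            glBindTexture(GL_TEXTURE_2D, textures[unit]);
            glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);
        } else {
            glDisable(GL_TEXTURE_2D);
        }
    }
    glActiveTextureARB(GL_TEXTURE0_ARB);  /* leave unit 0 active */
}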

mcraighead
02-22-2003, 01:23 AM
Originally posted by JMichaelWhi:
It makes me pretty Uncomfortable that you would insinuate such things, knowing full well the total amount of time Nvidia had as a Beta Partner.

Actually, funny enough, I had/have little to no idea what our level of interaction is. All I know is what the web page says...

- Matt

pixelpipes
02-22-2003, 09:03 AM
Originally posted by dorbie:
http://www.theinquirer.net/?article=7920
It doesn't fully explain the 8 pixels per clock though in some modes. Something more complex is going on.

Wow! I swear I didn't read this article until now (nor write it...).
As you say, something complex is going on, but we are not getting the full picture here. My findings match what the article says exactly. The only way to get 8-pipe behaviour is to turn off color writes completely (including alpha). Memory bandwidth is not the whole story, because turning off Z and stencil writes (which, like color, are 32 bits) doesn't make it run that fast (8 per clock).

But to be fair, this sentence:
"all games these days use Color + Z rendering. So all this Nvidia talk about the possibility of rendering 8 pixels in special cases becomes irrelevant."
is not completely correct -- stencil volume rendering is exactly such a special case.

pixelpipes
02-22-2003, 09:09 AM
Originally posted by Chalnoth:

As a side note, have you checked the rendering performance of an odd number of textures? If the NV30 actually does act like a normal "4x2" pipeline, then the performance of three textures will be roughly 3/4 that of the quad-texturing performance.

Why? I think it would be equal to quad-texturing. The 2nd texture unit is idle during the second "loop". But your principle is correct: if it has a single texture unit per pipe, any increase in the number of textures will reduce speed. If it has two texture units per pipe, only the move from 2 to 3 textures (or 4 to 5, etc.) will reduce speed.
I will check this out.

dorbie
02-22-2003, 12:34 PM
The article is only dated yesterday so it's clear you hadn't read it before now :-)

dorbie
02-22-2003, 12:40 PM
pixelpipes, could you run another test? I'm curious to see what the 2xAA hit is for z + color vs just Z. As expected, the hit for z+color should be very low, but if a pet theory I have holds, then the hit for just Z with AA may be very large (~100%), bringing fill performance back into line with z+color.

Please post results.

Chalnoth
02-22-2003, 05:49 PM
Originally posted by pixelpipes:
Why? I think it would be equal to quad-texturing. The 2nd texture unit is idle during the second "loop". But your principle is correct, if it has single texture unit per pipe, any increase in number of textures will reduce speed. If it has two texture units per pipe, only the move from 2 to 3 textures (or 4 to 5, etc.) will reduce speed.
I will check this out.
I was trying to suggest that it may not be a 4x2 pipeline, but a more flexible structure capable of outputting 4 full pixels per clock. A test with three textures per pixel would bring to light whether or not it actually behaves like a traditional 4x2 architecture.

If the only deficiency is when it is running in single-texturing mode, then this really isn't much of a problem. I don't think many people are going to be running in 16-bit with an FX.

Update:
Whether or not it behaves as a 4x2 pipeline will probably depend on how well tasks are scheduled, and, possibly, any potential caches that exist before pixel color output.

[This message has been edited by Chalnoth (edited 02-22-2003).]

pixelpipes
02-22-2003, 08:59 PM
Originally posted by dorbie:
The article is only dated yesterday so it's clear you hadn't read it before now :-)
But I did get one reply a couple of days ago telling me not to trust what I read in the Inquirer; I spent time searching for such an article, and didn't find it...

pixelpipes
02-22-2003, 09:06 PM
Originally posted by Chalnoth:

If the only deficiency is when it is running in single-texturing mode, then this really isn't much of a problem. I don't think many people are going to be running in 16-bit with an FX.


What does 16-bit have to do with it? And about how many people run single texture -- I beg to differ. What percentage of the fragments rendered out there TODAY have <= 1 texture bound to them? I would guess much more than 50%...

I will post the results that you and Dorbie asked for later.

pixelpipes
02-23-2003, 12:10 AM
Dorbie: I think I see where you are going - the sample coverage hardware being used to update two Z+stencils in one clock from the same pipe. However, the results don't indicate this: at 2 samples, rendering without color still doubles the performance (even with Z AND stencil updates). Even at 4 samples, turning off color writes doubles the performance.

Chalnoth: the test results are like this: one and two textures have exactly the same performance. Three and four textures also have equal (lower) performance. What is strange is that 3 or 4 textures have 1/3 the performance of 1 or 2 textures!

GT5
02-24-2003, 02:06 PM
According to http://www.theinquirer.org/?article=7955

the main reason nVidia was not happy with 3DMark03 was that the program used single texturing extensively in its game tests - and the fact that the GeForceFX has 4 pipelines and 2 texture units per pipeline.


[This message has been edited by GT5 (edited 02-24-2003).]

[This message has been edited by GT5 (edited 02-24-2003).]

dorbie
02-24-2003, 02:33 PM
pixelpipes, interesting. We may never know, and really, who cares, as long as you know how it performs?

GT5, I don't buy that. Many games use and will continue to use single texture - you often don't have anything to add beyond this - and 3DMark only used single texture stuff in some parts of some of the demos. Besides, it's clearly not the only factor, based on statements from NVIDIA reps here and elsewhere. The story of why 3DMark is bad changes every time NVIDIA talks about it. They've decided they don't like this test and have sought as much ammunition against it as possible. You can shoot holes in any benchmark that isn't a game as either unrepresentative or redundant. You can even shoot holes in most games for doing things badly in some way or another.

[This message has been edited by dorbie (edited 02-24-2003).]

GT5
02-24-2003, 02:44 PM
Well, the site claims that Futuremark replied to nVidia's comments in a four-page PDF, and not using multitexturing was one of the issues raised by nVidia.

dorbie
02-24-2003, 03:04 PM
GT5, I was objecting to the implication that NVIDIA cooked up all these other excuses because they knew they were hosed by single texture fill, and set out to systematically undermine Futuremark as they covered their 8x1 vs 4x2 asses. While it's clear NVIDIA is looking for any mud they can get to stick on 3DMark, that would be a pretty low reason to do it. I don't really buy that, but maybe I'm naive. More generally, I was also objecting to the claim that 3DMark uses single texture too much. Single texture is only used extensively in one game demo, and even then not exclusively; many games will continue to use this path for some stuff, so it seems reasonable to factor it into any benchmark.

I can't divine the truth from this any more than you can. Your take on this is as valid as mine, I was just stating my opinion.


[This message has been edited by dorbie (edited 02-24-2003).]

Talisman
02-24-2003, 07:27 PM
I read somewhere that the graphics code in 3DMark03 was not optimized for anyone's chipset, in order to make the benchmark as impartial as possible. That is to say, the code being run is "standard" code (think plain old C as opposed to hand-tuned vectorized assembly loops for example), and should therefore run equally well on comparable hardware.

That being said, I wasn't as impressed with this new version of 3DMark, only because I had expected to get a much higher score (comparable to the score 3DMark2001 gives) than I did. Well, that and the stuff that used to be part of the free version that's not anymore. I'm assuming that the score business has to do with the fact that I'm running one of the first-generation DX9 cards (R9500Pro), and that the score will improve with future cards.

I think if you're going to make a benchmark, then it makes sense to use unoptimized code. That way you get a sense for raw performance. Sure, the algorithms could be made faster, but I don't think that's the point here.

zed
02-24-2003, 08:09 PM
Speaking of money changing hands http://www.opengl.org/discussion_boards/ubb/smile.gif

>>-NVIDIA Corporation (Nasdaq: NVDA), the worldwide leader in visual processing solutions, today announced that the NVIDIA® GeForce™ FX graphics processing unit (GPU) received the distinguished Analysts' Choice award for the "Best Graphics Processor" of 2002, as selected by Cahners In-Stat/MDR, publishers of the Microprocessor Report<<
http://www.nvidia.com/view.asp?IO=IO_20030218_9274

Humus
02-25-2003, 02:11 PM
"Best Graphics Processor of 2002" for a card that's not available two months into 2003. http://www.opengl.org/discussion_boards/ubb/eek.gif