View Full Version : OT: GeForce3 slow?

05-11-2002, 08:39 AM

I just got a new GeForce3 Ti200. The fill rate compared to a GF2 MX is really amazing, but I'm wondering about the triangles-per-second rate. I was trying out NVidia's BenMark5 ( http://developer.nvidia.com/view.asp?IO=BenMark5 ) and I only got 17 million triangles per second. That's not much; I think my old MX even got 24 million.

Hmm, I think that must be a problem with AGP. Is there a way to see in Windows XP whether AGP 4x is enabled for the card? I remember I had a similar problem with the GF2 MX, where I had to enable AGP manually in the driver settings, but I don't know how to do that in XP with the standard NVidia drivers.
Or could someone with a GF3 Ti200 please try BenMark5 and tell me how many tris/sec he gets?

Thanks a lot in advance, and sorry for posting an off-topic question.


05-13-2002, 01:40 AM
Tested on a GF2 MX/MX 400 and I get almost exactly 16million (a smidgeon less on average). I'll test on a GF3 if I remember when I get home.

05-13-2002, 10:09 AM
Doesn't the CPU & other system specs also play a role in this? (not to mention what screen res, bitdepth, and all that good stuff is)

05-13-2002, 10:39 AM
Don't worry about it. Neither of your GeForces will likely reach that triangle throughput. In real-world conditions, your GeForce 3 will come out ahead, since it has less of a bandwidth bottleneck.

05-13-2002, 10:52 AM
Korval, the guy has a GF3; he wants a test, not an opinion. I know the feeling because I've had issues with AGP 4X in the past :-).

Elixer, check out the test.

[This message has been edited by dorbie (edited 05-13-2002).]

05-13-2002, 10:59 AM
I get almost exactly the same result with the GF3, it's not a Ti 200, just an original GF3, but both the GF2 MX400 and the GF3 seem to get almost exactly 16M tris with this test.

05-13-2002, 11:08 AM
What CPU is that on Dorbie ? I get 20M on Radeon 8500 ( P3 500, AGP2x ).

I'll try it on a standard GeForce3 ( Athlon 1400 AGP4x ) tomorrow.

I ran the test on the GeForce3 setup and got 16.4M ...

[This message has been edited by PH (edited 05-13-2002).]

05-13-2002, 11:20 AM
BTW, I had problems with AGP 2x vs 4x on 98SE; now that I'm on XP it seems to be reporting the 4x I set in my BIOS. I use the free version of the SiSoft Sandra benchmark utilities to check my system. I also use WCPUID3, which tells me the AGP command mode I'm actually in instead of what's supported. It reports 4X but says fast writes and sideband addressing are disabled. Hmmm... let me check my BIOS here.

05-13-2002, 11:24 AM
PH, I tried the GF2 MX on an 800MHz PIII and the GF3 on a 1900+ Athlon. CPU is not the issue in this benchmark. I'm going to check if I have fast writes disabled in my bios, I'll run again and get back to you. Your GF3 results are in line with mine, just a smidgeon over 16M/sec.

[This message has been edited by dorbie (edited 05-13-2002).]

05-13-2002, 11:33 AM
Well I looked and fast writes are enabled in the bios. The WCPUID chipset utility reports it as supported but disabled. More annoying chipset/driver/graphics card quirks.
I'm sure I checked this a while back and it was reported as enabled.

05-13-2002, 11:33 AM
I have fast writes enabled on the Athlon system ( at least that's what I specified in the BIOS ). My P3 unfortunately doesn't support fast writes.

What about the AGP aperture size - would that be an issue in this benchmark ? It gave quite a boost in the CodeCreatures benchmark ( I changed the settings from 64 to 256 MB ). Probably due to the large number of textures ... ?

05-13-2002, 11:38 AM
It shouldn't, I don't think; there's not a lot being drawn.

05-13-2002, 11:44 AM
PH, could you see what WCPUID3 reports in the chipset section for fast writes on your Athlon system?

05-13-2002, 11:56 AM
Hmmm... all disabled. Even my P3 has side band addressing enabled.

That's what wcpuid3 reported ( though it also said it was supported ).

I remember something about this having to be enabled in the registry for GeForce3's ( I think ).

[This message has been edited by PH (edited 05-13-2002).]

05-13-2002, 01:21 PM
I benchmarked my two systems; this is what I get:

AMD 1300, GF2 MX 400: around 19M.
AMD 800, GF3: around 27M.


05-13-2002, 01:24 PM
Phew, so my GeForce3 isn't slower than all its brothers out there....

Thanks for testing!!

05-13-2002, 02:56 PM
I think you posted too soon labas :-) You missed the last post.

I think we need fast writes enabled. I wasn't quite trusting of my reporting tool, but the results speak for themselves. Something is wrong.

Bruno, what drivers?

05-13-2002, 05:46 PM
I just read about this the other day when I noticed FW/SBA disabled even though I'd enabled them in the bios.

Supposedly, even if it's enabled in the motherboard BIOS, most (retail/OEM?) NVidia cards don't support this by default because it can reduce stability. The solution is to flash the video card BIOS, but that comes with warnings, since it's a risky process. I didn't go into much depth reading about the process and I haven't had time to think about doing it myself, but I probably will in the next few weeks.

I found a tool just now via google that may help. Haven't read about it much either but there seems to be a fair bit of information:

If anyone is game to try this, let us know how it goes.

Hope that helps.

05-13-2002, 08:10 PM
Using the default 1024x768x16, with fast writes on and sideband disabled, on the 28.32 drivers, I get 22.64 M tris/sec on a Duron 800 + GF2.

Using 640x480x16 I get 22.79 M tris/sec, and for a laugh I tried 320x200x16 and got 22.97 M tris/sec. :)

Oh, if you want to play around with fast writes and SBA, go to http://www.geforcetweak.com/ . They have a util to toggle lots of options; it works with the latest drivers.
Also, I think ffish is correct: SBA is disabled by the BIOS of the video card, since it caused more problems than speed gains.

05-13-2002, 10:46 PM
On the GF3 machine I'm using the 28.32 drivers.
Fast writes and AGP 4x are on, and the latest VIA drivers are installed.
On the GF2 machine I don't remember exactly which drivers it was, but it was from this month, 29.xx something.

Try updating your VIA drivers, maybe you'll get lucky.


05-14-2002, 12:05 AM
Yes, dorbie, I was a bit too quick to celebrate my GF's speed... ;)

I've just updated my Via drivers, but it doesn't make any difference...

05-14-2002, 05:42 AM
There's a section in the GeForce FAQ about enabling fast writes ( it needs to be enabled in the Windows registry ).

I just tried this - wcpuid3 reported fast writes as supported and enabled ( reboot was required after using the .reg file ).

Strangely enough, the performance on the Athlon system did not change.

05-14-2002, 08:31 AM
I might be wrong, but I read something ages ago saying NVidia drivers refused to use SBA/fast writes on AMD systems, regardless of how they were set up, due to stability problems.

Mind you, turning fast writes on on my machine causes a guaranteed lockup in about 10 minutes.

I'm quite confused by the BenMark results. 24 million on a GF2, was it? I only get 38 million on my GF4. I thought it would be loads more, but it wasn't. That was tri-strips with AGP 4x.


05-14-2002, 08:57 AM
A somewhat related tech tip: if you're having problems with AGP in windows 2000, be sure to install the first and second service packs (at least). I was having completely bizarre texture glitches in just a few programs, repeatedly, and neither installing new graphics drivers nor tweaking the settings nor completely reinstalling the OS did any good, but installing the service packs fixed the problem.

05-14-2002, 11:41 AM
I thought I'd have to wait longer to see the words "only" and "38 million" used together.

05-14-2002, 02:10 PM
> I only get 38million on my gf4

Try sending vertices that have only x/y/z, each as a 2-byte signed integer. Once you turn off rasterizing and load the simplest possible vertex shader (or use the fixed pipeline in vanilla mode) you may be vertex-transfer bound, at which point sizeof(vertex) starts to matter.
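To put rough numbers on that (a back-of-envelope sketch; the bus figure is the nominal AGP 4x peak and the vertex sizes are illustrative, none of it measured on real hardware):

```python
# Upper bound on vertices/sec if the AGP bus is the only bottleneck.
# All figures are nominal, not measured.

AGP4X_BYTES_PER_SEC = 1_066_000_000  # nominal AGP 4x peak (~1066 MB/s)

def max_verts_per_sec(vertex_size_bytes, bus_rate=AGP4X_BYTES_PER_SEC):
    """Best-case vertex transfer rate for a given vertex size."""
    return bus_rate // vertex_size_bytes

# A "fat" vertex: float x/y/z + normal + texcoord = 32 bytes.
fat = max_verts_per_sec(32)    # ~33 M verts/sec
# A "slim" vertex: x/y/z as 2-byte signed shorts, padded to 8 bytes.
slim = max_verts_per_sec(8)    # ~133 M verts/sec

print(fat, slim)
```

So at typical vertex sizes the bus alone can cap you in the low tens of millions, while shrinking the vertex pushes the transfer ceiling well past the setup limit.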

05-14-2002, 06:45 PM
OK, I used the GeForceTweak utility here http://www.geforcetweak.com/ to enable fast writes and a few other goodies, and I now get 27 million where I used to get 16 million on my GeForce3. The registry patch for fast writes didn't do it for me. Thanks for the advice.

I also benchmarked a GeForce4 Ti 4600 and it only got ~17 million. Obviously it needs the tweaks too.

Bruno, I'm curious, which OEM supplied your card, and did you apply any tweaks or registry edits beyond installing drivers? What's your mobo?

[This message has been edited by dorbie (edited 05-14-2002).]

05-14-2002, 07:12 PM
I have an ASUS GeForce3 but I have never used their drivers ( always NVIDIA's reference drivers ). Maybe that's one reason to start using these drivers ( I doubt it but just maybe ). I'll try the tweak utility too and see if that helps.

05-14-2002, 11:25 PM
I thought I'd have to wait longer to see the word "only" and "38 million" used together.

Yeah, but it's a far cry from the 100-odd million it can supposedly do. Given that optimal tri-strips equate to about 1 tri per vertex, it only had a vertex throughput of 38 million as well.

I can only assume that jwatte is correct and I'm transfer bound.

Anyone know if switching from SDR to DDR main ram has any impact on AGP throughput?

I seem to recall a thread where someone was trying to achieve the theoretical figure of the GF4, but they changed the method of attaining it from GF3 to GF4, which IMHO is quite dirty. For the same reason I loathe CD-ROM manufacturers! 50x speed!! Yeah, right!! But don't get me started on that. :)


Jurjen Katsman
05-15-2002, 01:02 AM
You want 100-something million? Try this:

- Use trianglestrips
- Use VAR
- Make all indices the same


This will give you the maximum number of triangles the triangle setup unit (in this case better called the 'backface/degenerate culler') can process.

Tom Nuydens
05-15-2002, 01:28 AM
Using BenMark on a GF4 I only get about 37 million. I've been able to get 54 million using my own OpenGL-based test.

-- Tom

05-15-2002, 10:21 AM
Originally posted by Nutty:
I seem to recall a thread where someone was trying to achieve the theoretical figure of the GF4, but they changed the method of attaining it from GF3 to GF4, which IMHO is quite dirty. For the same reason I loathe CD-ROM manufacturers! 50x speed!! Yeah, right!! But don't get me started on that. :)


What, you don't like having 50x speed for the last 10 secs of a CD? ;)
At least the P-CAV drives are starting to appear now.

DDR system RAM vs SDR system RAM does play a role in AGP throughput, since AGP 4x + PC133 = ~1066 MB/sec, and DDR should be close to ~1200 MB/sec (or more?).

I think fast writes should be on by default for all reference drivers, and that includes both NVidia and ATI cards. I recall NVidia saying that forcing sideband addressing and fast writes to coexist is not only redundant but may present stability issues and performance loss.
Besides, SBA is only AGP 2x, not 4x, right?
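For what it's worth, the nominal peak figures line up like this (a sketch using standard PC133/PC2100 ratings; real sustained rates are lower, and the CPU shares the same memory bus):

```python
# Nominal peak bandwidths in MB/s. The point: PC133 SDR peaks at roughly the
# same rate as AGP 4x, so with SDR main memory the AGP bus and the CPU are
# fighting over the same ~1 GB/s; DDR roughly doubles the headroom.

AGP_4X_MBPS = 266 * 4       # 266 MT/s x 4-byte bus, ~1066 MB/s
PC133_SDR_MBPS = 133 * 8    # 133 MHz x 8-byte bus, ~1066 MB/s
PC2100_DDR_MBPS = 266 * 8   # DDR doubles transfers per clock, ~2133 MB/s

print(AGP_4X_MBPS, PC133_SDR_MBPS, PC2100_DDR_MBPS)
```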

05-15-2002, 11:28 AM

> Bruno, I'm curious, which OEM supplied your card, and did you apply any tweaks or registry edits beyond installing drivers? What's your mobo?

My card is an Asus; it was one of the first to show up on the market. I got it at a Taiwan store in July 2001 or something.
Yeah, I use the tweak utility too.
At first I remember I was really worried because my 3DMark was really slow, and I wasn't even able to do the tests because somehow I couldn't get Direct3D to work correctly with it, but as drivers started to appear, I got it working.


Moshe Nissim
05-15-2002, 11:28 AM
Originally posted by Tom Nuydens:
Using BenMark on a GF4 I only get about 37 million. I've been able to get 54 million using my own OpenGL-based test.

-- Tom

Yes, I too got around 60 M polys/sec, with display lists and large triangle strips. Of course this is also 60 M vertices/sec. With VAR and a 'tight' mesh you can get more because of vertex re-use and the post-T&L cache.

05-15-2002, 11:36 AM
edit: <this is part 1>

The method of attaining peak performance might change depending on the absolute performance level. The API has to evolve to make new performance levels possible as older hardware bottlenecks are eliminated, and to address situations like dispatching many smaller primitives.

edit: <this is part 2>

I'm not very impressed by NVIDIA's silence in this thread. This seems like a dirty secret, and they've only reinforced that impression by offering no help here when they must have known what the issue was from post one. They're just washing their hands and hoping that nobody will notice the huge performance deficit while they market the peak numbers.

Expecting users to download a 3rd-party tweak to make their card reach advertised performance, or to get lucky with an OEM, is unreasonable. This situation stinks.

[This message has been edited by dorbie (edited 05-15-2002).]

05-15-2002, 01:47 PM
Huh, dorbie, what are you talking about? I didn't reply to this thread because I didn't see anything specific worth replying to. (Now, there _are_ some threads that I simply won't reply to at all, except perhaps to correct a gross falsehood. Such threads typically relate to our competitors' products, or our future unannounced products, or other such things. If you want to ask me a question or otherwise expect a reply from me, don't post it in a thread like that.)

The matter was clarified in another thread. People can measure three things, all of which are very different from one another. One is _vertices_ per second. One is _triangles_ per second. And the last is _indices_ per second.

The last is hard to measure because you will probably just hit the triangle limit. (But in theory a given chip can only process indices at some rate, even if they all get cache hits for post-T&L results.) So let's ignore it.

The triangle rate is strictly a matter of _triangle setup_. Different chips have different setup rates. The original GF3, for example, would hit a setup limit at 40 Mtris/s. (You can deduce clocks/triangle from such a number; I leave this as an exercise for the reader.) Of course, it's entirely conceivable that setup rates might depend on the attributes that need setup (since setup typically involves things like computing d/dx, d/dy for a triangle).

Another number is the vertex rate. This can depend heavily on the T&L modes in use, of course. It can also be limited by the size of the vertices, if those vertices are coming via AGP. Typically, peak vertex rates are measured in cases where AGP is not a bottleneck (in the limit, you may need to use shorts for vertices or use video memory) and the only computation needed is a transform from object space to window space.

The trick is that you often can't measure _vertex_ rates by using triangle strips, only _triangle_ rates, because strips effectively triple the setup load relative to the vertex load. So if your GF3 was running at 40 Mtris/s with long triangle strips, it would be incorrect to claim that its vertex rate was 40 Mverts/s. Instead, its vertex rate would have to be measured by drawing independent triangles, where triangle setup is not a bottleneck.
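The arithmetic behind this, as a sketch (the 40 Mtris/s setup limit is Matt's GF3 example from above; the 200 MHz core clock is an assumed figure, not stated in this thread):

```python
# Strips vs independent triangles: why long strips measure triangle setup,
# not vertex rate. The 200 MHz core clock below is an assumption.

SETUP_LIMIT_TRIS = 40_000_000  # GF3 setup limit per the post above
CORE_CLOCK_HZ = 200_000_000    # assumed GF3 core clock

def strip_verts(n_tris):
    # A single strip of N triangles uses N + 2 vertices (~1 vert/tri).
    return n_tris + 2

def independent_verts(n_tris):
    # Independent triangles need 3 vertices each.
    return 3 * n_tris

# At the setup limit, strips only push ~40M verts/s through T&L, so any
# higher vertex rate stays hidden behind the setup bottleneck.
print(strip_verts(SETUP_LIMIT_TRIS))
# Independent triangles would demand 120M verts/s at the same setup rate,
# so T&L becomes the limiter and the true vertex rate shows up instead.
print(independent_verts(SETUP_LIMIT_TRIS))

# The "exercise for the reader": clocks per triangle at the setup limit.
print(CORE_CLOCK_HZ // SETUP_LIMIT_TRIS)
```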

Once you keep all this in mind, the benchmark numbers all make sense. The short of it is that the GF4 Ti ends up with about twice the vertex rate (higher clock speed + architectural changes) and a moderately higher triangle rate (higher clock speed) than a GF3 Ti500. In many cases, vertex rates will be over twice as high.

As for fast writes and AGP sideband addressing, those are completely different issues. The decisions we have made w.r.t. these features reflect such things as our need to work around certain chipset bugs, for example. The same is true of falling back to AGP 2x on some platforms. However, if you're using a GF4 Ti on the right motherboard, I believe you should end up with both FW and SBA on by default, as well as AGP 4x.

Which motherboard is that? To be honest, I have no idea. I have enough other things to keep track of... and I'm still stuck at AGP 2x myself with my BX system, so it doesn't affect me.

Anyhow, please cut out the conspiracy theories. (If the conspiracy is no more than that marketing departments always choose the largest-looking number possible to promote a product, that's hardly a conspiracy; if our marketing _wasn't_ doing that, now then I'd have a complaint.)

- Matt

05-15-2002, 02:49 PM
I think there is something wrong with this BenMark5 benchmark (it's a Direct3D app, isn't it?).
On my prehistoric Celeron 333, AGP 2X, with a GeForce2 MX I got 3.5 Mtris/sec at 640x480.
When I use the NVIDIA SphereMark (OpenGL) program I get 13 Mtris/sec no problem, and when I use my own little OpenGL fullscreen demo that renders 130 teapots with static VAR in video memory I get up to 20 Mtris/sec; when I render 130 spheres I also get around 13 Mtris/sec, all on the same machine (models are teapot and sphere4 from the DirectX SDK).
I use fullscreen 640x480x32 with one directional light, using GL_TRIANGLES with glDrawElements.
What's the deal with this BenMark5?
How many tris do you guys get with SphereMark?
Thanks, mproso.

05-15-2002, 03:33 PM
Sigh, Matt it's you who are confusing two or three issues.

The first part of my post relates to the API issue raised in the more recent discussion, and frankly I don't really care about that. Nothing about what I wrote is incorrect and I don't need to be told these are two separate issues when I addressed them entirely separately in my post.

The second part relates to the bull**** of having to download a tweaker to get the full performance from a GeForce card. For example, a GeForce4 Ti 4600 delivering 17 million tris when it should be nearer fifty. It looks like very few, if any, have managed this without tweaks & hacks, on a range of motherboards and drivers. God help Joe Public, who NVIDIA hopes just won't notice the performance delta on his crippled hardware. In the meantime nobody from NVIDIA comments on this. This isn't a conspiracy theory; it's a fair description of the situation.

Most of the performance deltas being discussed relate to the same software on the same hardware measuring the same thing, before & after "tweaks" have been applied. I don't really care about someone complaining about the finer points of 38 million tris vs some even higher number with different data types, state or whatever, I used to deal with exactly the same customer issues at SGI and I can sympathize but that doesn't excuse the other issue.

Maybe you should be more up front with people buying these cards. The fact that you are using AGP 2x is completely irrelevant; I don't care if you have a Riva 128 on your desk. My GeForce4 languishing at 1/3 of its geometry performance without fast writes is the issue, along with a dearth of information from the manufacturer on how to remedy that.

[This message has been edited by dorbie (edited 05-15-2002).]

05-15-2002, 06:49 PM
At least for OpenGL, fast writes are unlikely to have that sort of performance impact. I don't know about D3D. But it could easily be something very simple: for example, we might disallow video memory vertex buffers in certain situations (which this app might be hitting) unless fast writes are available, and then the app might hit some AGP limit. I have no idea, to be honest.

But the simple fact is that we disable fast writes in some cases because we *have no choice*. There are a lot of chipsets out there that have bugs where data gets _corrupted_ if you use fast writes. Say what you want about customer issues, but I think customers would rather not experience system hangs due to data corruption...

I'm not about to get in the business of telling you exactly which products are broken -- but these data corruption bugs are real, and when we hit them, we have a choice between (A) disable fast writes on certain platforms or (B) don't ship the product at all to retail because some user somewhere might plug it into the wrong motherboard. I think it's obvious that we are going to choose (A).

- Matt

05-15-2002, 07:16 PM
Fast writes do appear to be the issue, at least that's the main thing I explicitly enable using the tweaker, but there may be some other arcane thing going on under the covers.

Choice A is a reasonable one only if you inform customers. Allowing them to make an informed decision about their level of acceptable risk would be even better. You deliberately overrode my explicit bios settings, and I have to go and find a 3rd party utility (WCPUID3) just to discover that you did that.

It's not as if this is a marginal issue; we're talking about a 3X performance delta under some circumstances. Just look at the confusion at the start of this thread, and most people contributing here don't get confused as easily as your average retail customer. It looks like nobody here has lucked out and gotten this to work spontaneously with their hardware. What do we have to do, buy an nForce mobo?

I'm skeptical about the criteria for tracking and testing this. There are lots of chipsets, even more mobos, and still more BIOS patches that (for example) patch AGP stability issues in AMD chipsets.

05-15-2002, 07:46 PM
Aren't the marketing figures fakes?

Maybe done like this:

Heck, why create an RC at all? Maybe this is how they get their numbers. BenMark5 actually is rendering; that's the problem.


05-15-2002, 09:18 PM
There's often some credible basis for the numbers but it depends on who is making the claim. It's a distinct disadvantage to be honest at times :-)

There are two separate issues raised w.r.t. this. The original one is 16 million vs 27 million on a GeForce3; both numbers can be seen using BenMark5, depending on your chipset features and driver support/tweaks. It appears that NVIDIA sacrifices performance for stability, depending on your mobo, by disabling certain AGP functionality.

The other issue is 38 million in benmark5 on GeForce4 Ti vs some theoretical peak as advertised. I think your comment applies to the latter, so see Matt's first comment because he mainly addressed this.

In general, claims vary from vendor to vendor, and at times it's a marketing arms race that can get totally out of hand. For example, I worked for one company which used to turn off backface culling and count 2 triangles for every one. That's really low; their justification was that SGI did this too (SGI has never done this). I tried to get them to change this practice, to no avail.

05-15-2002, 09:31 PM
From my understanding of how this works, the BIOS on the mainboard needs to support SBA & FW, the BIOS on the video card checks whether the mainboard BIOS supports SBA & FW, and finally the drivers need to support SBA & FW to make it all work.

So the drivers query the vidcard to see if it supports FW & SBA, and if so, they *also* check for known problems with certain chipsets that *MAY* have problems, and if all is OK, then they go ahead and use FW & SBA.

I say *may* only because of the numerous patches that NVidia may or may not have tried out yet. They have to deal with at least 16 chipsets minimum, from Intel, SiS, Via, AMD, and I know I am forgetting someone else.

I don't see the real reason why NVidia (or ATI or Matrox or...) don't say who is not following the AGP 2.x specs (if that is the case), since right now the user has no way of knowing if FW & SBA are fully supported (besides using the apps that tell you what your current settings are), but the MARKETING GUYS all assume that all is peachy and throw out those optimal numbers.

05-16-2002, 01:28 AM
You've summed things up nicely.

I think that when it comes to electrical signals on a wire, EMI inside a PC, timing & voltage tolerances, etc., exactly who is to spec and which tolerance is where is not as clear cut as things often seem in the software world. It may not be something as simple as a protocol 'bug', but who knows; nobody is talking, after all. I don't take it at face value that the mobo makers are to blame.

I totally agree that keeping up with the various mobos and drivers seems like an unlikely prospect; that's just one of the reasons I'd like to be able to force the issue without a 3rd-party piece of software that hacks the BIOS/registry/driver libs or whatever the heck it does. I'd at least like some sort of compatibility information; I think I'm owed that when I hand over my cash.

Lest we forget, this thread was started by someone who upgraded his NVIDIA graphics card and got a performance slow down.

05-16-2002, 11:58 AM
dorbie, we're not about to give people control panel options that say "turn on fast writes (your computer may hang!)". Unfortunately these things are often corner cases, and even though in normal usage you might not hit it, it might not be good enough to pass WHQL, or to pass OEM testing.

Turning everything off is not necessary in general; it's just what you should do when you want to hit the peak number for the first time. Then you can start turning things back on. Obviously, if you have 1 Gpixel/s and 100 Mtris/s (for example), you can't get the 100 Mtris/s unless your triangles are smaller than 10 pixels each. But if you want to talk about peak rates, the fact that some _particular_ benchmark doesn't achieve N Mtris/s doesn't mean that the peak rate isn't *actually* N Mtris/s. Just because BenMark doesn't achieve some number doesn't mean the number isn't achievable (even, perhaps, drawing the same thing BenMark draws).
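That fill-rate constraint is one division (figures are the illustrative peaks from the example above, not any particular chip's spec):

```python
# If the chip fills PIXEL_RATE pixels/s and sets up TRI_RATE tris/s, the
# triangle rate is only reachable when the average triangle is small enough
# that rasterization doesn't become the bottleneck first.

def max_avg_tri_pixels(pixel_rate, tri_rate):
    """Largest average triangle size (in pixels) at which the quoted
    triangle rate can still be sustained."""
    return pixel_rate / tri_rate

# Example from above: 1 Gpixel/s fill and 100 Mtris/s setup.
print(max_avg_tri_pixels(1_000_000_000, 100_000_000))  # 10.0 pixels/triangle
```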

Let's put this in perspective. BenMark is not some fancy benchmark app. It's just a little D3D app that someone hacked up a _long_ time ago to test this stuff.

Also, don't fall into the trap of "newer hardware MUST be faster than older hardware in all possible scenarios." Indeed, that's generally an impossible design goal!

- Matt

Jurjen Katsman
05-16-2002, 01:19 PM
Matt: While totally agreeing with most of what you said (basically just 'don't make assumptions about one thing based on another'), I do place serious doubt on the actual real-world ability of the GF4 Ti to draw the same sort of stuff as BenMark and get 100+ Mtris out of it.

I would think it would be very interesting if NVidia would provide a demo showcasing this. It's fairly normal to have a demo for an advertised feature :)