Screw NVidia!

Recently I upgraded to a GeForce2 card from an old TNT2 (not Ultra). I was hoping to get much higher frame rates in games and also in the little OpenGL project I'm working on. Well, I got a significant boost in all my 3D games, no complaints there. But the thing that totally surprised me (to say the least) was a 3x drop in frame rates in my own application! That's right, the GeForce2 runs my code 3 times slower than the old TNT2. I'm not saying my code is thoroughly optimized and perfectly efficient. In fact, it isn't. But the worst I was expecting was no increase in performance! Isn't this card supposed to be several generations ahead of the TNT2? WTF is going on?

Actually, I found a way to get almost all of my old performance back by turning off the T&L unit with a tweaker utility, but that's just nonsense. OK, maybe NVidia wants us to write code in a certain way to take advantage of T&L acceleration, but why put it there in the first place if it's 3 times slower than my Duron 700? So I basically have two questions: 1) Am I the only one getting screwed by NVidia? 2) How do I turn off the goddamn T&L unit from inside an OpenGL application? Thank you in advance.

I don't have an answer to question #1, but why don't you just write your app to take advantage of hardware T&L? That was the point of creating the GeForce2. There is a point where just raising the clock speed isn't going to help much, and nVidia knows this. You can learn about it in a digital electronics course at your local community college: electrons can only travel so fast before they start jumping the gaps in the silicon. Past that point you have to either supercool your video card (not very cost effective) or rearrange the way things work to get more speed. Maybe you should step back and figure out what your code is doing wrong?

This message isn't meant in any way to be a flame, just some info & suggestions. I apologize if any of this comes off rudely.

Mario,

Can you provide a simple test app that illustrates the problem you’re having? Are you certain it’s a T&L problem?

Email me at cass@nvidia.com and I can help try to sort this out.

Thanks -
Cass

First of all, there is no way to turn off T&L in OpenGL. Whatever utility you’re using, that’s probably not what it’s doing.

Second, all GeForce cards will easily outperform any CPU at T&L tasks. There are a bunch of folks out there on the web who will tell you that “a P3-700 will outperform a GF1”. But these people simply do not know what they are talking about. Every benchmark that has been quoted to “prove” this is pretty thoroughly flawed.

Finally, you haven’t said anything about what you are doing in the app in the first place.

I could contrive cases where newer hardware would be slower than older hardware. That should really be no surprise. But these cases should be somewhat rare. If you have hit one, there’s probably a way around it. And there are many other possible explanations for what happened.

  • Matt

So, are you saying that electrons in the GeForce are traveling 3 times slower than in the TNT2? As I said, I don't expect and don't really need an enormous speed increase for my app. All I want is for it to run at least as fast as it did on the TNT2. Besides, not everyone has a GeForce. It may be just a driver issue, although I have installed the original drivers that came with the card. At any rate, I find it very disturbing that the same code runs much slower on the latest generation of video card than on a 2-year-old “obsolete piece of junk”.

Thanks for all the replies. I'm on a different computer right now and don't have access to my program, but I'm going to post detailed information as soon as I get home. It's probably too early to jump to conclusions; I was just really pissed when writing the original post. All I can say is that the exact same code runs 3 times slower with the GeForce on an otherwise identical machine. I don't remember the name of the utility I used (GeForce Tweaker or something), but it probably just toggles a switch that's already built into the OGL drivers. Anyway, as soon as I get back from work I'll check it again and write a detailed report, including which OGL functions I use.

Oh, by the way, I know that in theory the GeForce is supposed to be much faster, and as I said all my games got a major boost in frame rates. But they can't all be tailored for hardware T&L, can they? At least not the older ones. Maybe I'm doing something that exposes a bug in the drivers. I really want to find the reason behind this “oddity.”

[This message has been edited by Bloody Mario (edited 01-05-2001).]

My only complaint with the GeForce2 is wireframe performance. My engine is irrationally slow in wireframe mode compared to point and solid modes. I have tried turning different states on and off, and used arrays, triangles, strips, and display lists, and all of them are unusable in wireframe mode. I think my app makes the drivers fall back to software mode when in wireframe mode (the only explanation I can think of). My engine is now 2 years old, has been optimized and debugged a lot, and works great on other cards (Intergraph, Mitsubishi, TNT, SGI Visual Workstations, etc…), so I think it is a peculiarity of the GeForce implementation of OpenGL.

That could be Bloody Mario’s problem, but don't assume it's a driver problem, because 99.9% of the time our own bugs are the cause of this type of weird thing.
By the way, my GeForce2 simply rules, beating all the other cards at work (including $2000+ cards like the Intergraph Intense 3D 3410 AGP with geometry acceleration).

Well, I got home and did some more tests. I still can't figure out what the problem is, so I stripped the project of all rendering and other code except the terrain and archived it. You can get it here: http://geocities.com/bloodymario_cs/project.zip Control the camera with A, S, D, W, C, V; switch between GL_LINE and GL_FILL with Z.
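In case it helps anyone trying it, the Z key just flips the polygon mode, roughly like this (a from-memory sketch, not the exact code in the archive):

#include <GL/gl.h>

// Toggle between filled and wireframe rendering for everything
// drawn after this call.
static bool wireframe = false;

void ToggleWireframe()
{
    wireframe = !wireframe;
    glPolygonMode(GL_FRONT_AND_BACK, wireframe ? GL_LINE : GL_FILL);
}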

The problem still appears. When I start the program I get 180 fps; when I switch to GL_LINE it goes up to 250. When I turn T&L off, these numbers are 270 and 240 (so wireframe actually slows down for some reason). When I move around with T&L on, the fps sometimes jumps to 250, and it goes back down when I move back.

To switch T&L on and off I use the GeForce Tweak Utility: http://www.guru3d.com/geforcetweakutility/

If you need any additional information, let me know.

[This message has been edited by Bloody Mario (edited 01-05-2001).]

Tried your application, and I got no problem whatsoever with speed: 270 fps with triangles, and about the same in wireframe, and I didn't turn off HWT&L.

You also said you use the drivers shipped with the card. Maybe that's the problem; those drivers may not be so well optimized. Try downloading the latest drivers from NVidia.

Well, I've downloaded the latest drivers from the NVidia site and I'm still having the same problem. It seems that the terrain is the only thing that slows my program down. I don't know what to think. I'll try installing a fresh copy of Windows and see if that fixes the slowdown.

If all it is is wireframe performance, I’m guessing that the results you are seeing are the expected results.

  • Matt

I would also like to say that the GeForce2 is absolutely **** hot compared to the Wildcat 4210 (GeForce2 = £200, Wildcat 4210 = £4000).
This is all down to fill rate of course, as the Wildcat's geometry engine is far faster.

Actually, I’m somewhat doubtful that the Wildcat has a faster geometry engine.

  • Matt

If you have tested your card with games and they ran faster, maybe you should try optimizing your code for the GeForce… there are several issues around using T&L and other features correctly, and it's actually quite easy to use them wrong (and get poor performance as a result).

If you have bought the GeForce2 for use in the development of your project, I think it's not a crazy idea to optimize your project for the GeForce2, instead of expecting a great performance boost when your app is not optimized for the card…

  • Royconejo.

[This message has been edited by royconejo (edited 01-16-2001).]

Well, I’ve downloaded your project… and:

void CTerrain::Draw()
{
    float color;
    float fx, fz;
    int x, z;

    /* < Modified by Royconejo > (added comment bars…) */
    //// glPushMatrix();
    //// glPushAttrib(GL_CURRENT_BIT | GL_DEPTH_BUFFER_BIT | GL_LIGHTING_BIT | GL_ENABLE_BIT);

    //// glEnable(GL_TEXTURE_2D);   // Enable Texture Mapping
    //// glEnable(GL_DEPTH_TEST);   // Enable Depth Testing

    /* _____________________________________________________________________
       Performance improvement:
       With those lines : Solid: 175 fps / Wireframe: 205 fps
       Without          : Solid: 205 fps (+15%) / Wireframe: 250 fps (+18%)

       Tested with GeForce2 MX, AGP 4x, PIII 733.
       _____________________________________________________________________ */

    glLightfv(GL_LIGHT1, GL_AMBIENT, LightAmbient);
    glLightfv(GL_LIGHT1, GL_DIFFUSE, LightDiffuse);
    // glLightfv(GL_LIGHT1, GL_SPECULAR, LightSpecular);

    [etc…]

I think it's only a design problem… (those immediate mode calls are killing the GPU, for example, and you should avoid changing render states any more than you have to…)
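For example, feeding the terrain to the card with vertex arrays treats the T&L unit much better than per-vertex immediate mode calls. This is only a rough sketch; the pointer names are an illustration, not your actual data:

#include <GL/gl.h>

// Illustration only: the pointers stand in for whatever the terrain
// class really stores. The point is that one glDrawElements call
// submits the whole mesh instead of thousands of glVertex/glColor
// calls every frame.
void DrawTerrainWithArrays(const float *vertices,      // x,y,z per vertex
                           const float *colors,        // r,g,b per vertex
                           const unsigned int *indices,
                           int indexCount)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);

    glVertexPointer(3, GL_FLOAT, 0, vertices);
    glColorPointer(3, GL_FLOAT, 0, colors);

    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, indices);

    glDisableClientState(GL_COLOR_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}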

  • Royconejo.

> Second, all GeForce cards will easily outperform any CPU at T&L tasks

nonsense

Originally posted by john:
nonsense

Sorry, but it’s true. Make it a P3, P4, Athlon, G4, anything. You’ll have a real hard time making the CPU win in a real apples-to-apples comparison.

No, 3DMark “High Polygon” scores don’t count – that benchmark is pretty flawed.

  • Matt

nVidia…the coolest company on the face of this planet! How DO you compete with a HW T&L chip?!

Originally posted by john:
> Second, all GeForce cards will easily outperform any CPU at T&L tasks

nonsense

I am totally with Matt on this one.

The 3D cards have much more specialised linear-algebra hardware than a CPU has. Just with that, they kick any CPU's butt! It probably takes 10 times fewer cycles to compute a matrix multiplication on those cards than on a CPU.
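For the record, the per-vertex work everyone is arguing about is essentially just this (a plain C-style sketch of one 4x4 transform, nothing GeForce-specific):

// out = M * in, with M a 4x4 matrix stored in column-major
// (OpenGL) order. Hardware T&L grinds through this (plus the
// lighting equations) for every vertex that goes down the pipe.
void TransformVertex(const float M[16], const float in[4], float out[4])
{
    for (int row = 0; row < 4; ++row)
    {
        out[row] = M[row]      * in[0]
                 + M[row + 4]  * in[1]
                 + M[row + 8]  * in[2]
                 + M[row + 12] * in[3];
    }
}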

Okay, I was actually in a bad mood that day, and being a TRIFLE sarky. But “ANY cpu” is an amazingly generic claim. Tera? MIPS? UltraSPARC? Alpha? &c.

sorry =)

Oh, but on a SMALL point… T&L is just matrix multiplication. I haven't seen anything on the architecture of the GeForce chip (though I can imagine it'd be pretty freaking impressive given the number of transistors it has), but it'd need to be pretty superscalar and pipelined to justify the claim that T&L (as a SOLE facet) is faster than a CPU. Don't forget that the GeForce chip also has to scanline-convert and do a myriad of other stuff.

If you're arguing that it is faster for the GeForce to do T&L along with all the other stuff, as opposed to purely comparing floating point performance, then that's another thing.

Don't get me wrong, you're probably right. But… like I say… there are a LOT of cpus out there for “any” to cover.

cheers
John

[This message has been edited by john (edited 01-18-2001).]