max tris a second

Beware, I'm not a hardware junkie,
(and keep in mind this has nothing to do with real-world examples)
I'm just curious what the bottleneck actually is to achieving maximum tris/vertices per second. E.g. my GF2 MX200 is advertised at 20 million tris/second.
I can achieve ±20.5 million a second (but no more, of course not, you idiot, that's why it's a maximum), but I'm curious what the bottleneck is.
If I enable texture/normal arrays + lighting + color material (and draw the benchmark with these, I can still achieve 20 million). If I change the indices from GLushorts to GLuints, I can still achieve 20 million.
So my question is: what is the bottleneck?

Hi Zed, I use a GeForce3 Ti200 and I can only achieve 15Mtri/s. What kind of primitives are you rendering? What kind of pixel pipeline are you using? What buffers are you drawing in?
Thanks!

I'm just doing your plain old VAR with strips (think BenMark5).
Q: So it's impossible for me to achieve 21M+ tris on my card?
There's a challenge if I ever heard one (don't think I'll do it, though).

I think there can be lots of bottlenecks when rendering triangles. The performance also depends greatly on your hardware. I have quite a slow motherboard with AGP 1x, so I'll never get maximum performance with this board…
Another bottleneck can be the size of your vertex structure.

No, this is definitely a bottleneck on the card (I'm hitting the advertised 20 million tris/sec figure).

E.g. transferring vertices alone is roughly the same speed as sending verts/norms/texcoords, even though I'm sending a lot less data.
BTW, glCullFace(GL_FRONT_AND_BACK) has bugger-all effect.
Also, there's not much difference between sending the same indices over and over (so nothing leaves the cache) and sending fresh data each time.

Even knowing it is the card doesn't always help. The various bottlenecks that are possible (on the card):
- Interrupt speed of the GPU (an AGP input issue)
- GPU tri-processing speed
- Graphics memory throughput
- Graphics memory latency
I hate to say it, but the only way I know to verify which part of the card is the bottleneck is to, err… speed up the parts. If you overclock the bottleneck, the numbers go up. With the new GeForce4s and GeForce3s, overclocking gives only a little performance gain because the bottleneck on the high-end cards is now the AGP 4x bus. Hence the extensions to support sending down less information to get the job done. In a bus I/O situation you have to send less to get more; overclocking does little.

Adding texturing and other extras lowers the maximum. If you check the spec on the card, it "sometimes" says something like "20 million Gouraud-shaded/filled tris per second". Some cards, if you are lucky, will tell you all the specs (texture fill rates, etc.). But be careful: some specs assume culling of up to 30% backfacing polygons (out here we call that a bald-faced lie)…