Geometry performance FAQ

With so many questions about geometry paths (CVA, VAR, display lists, strips), memory allocation, AGP performance, etc., shouldn’t there be either a specific FAQ on this or a detailed section in the dev FAQ on this site?
I haven’t seen an exhaustive text on this anywhere. Of course there is good information around, such as the GeForce performance FAQ (obviously NV-specific) and this forum, but it would be best to have everything (unextended GL, ARB/EXT GL and card/vendor-specific optimizations) in a single place.

So my question is:
Is there such a text? XOR Could anyone write one? (I would do it myself if I were more knowledgeable.)

I don’t know of such a thing, but it would be a great idea.

Maybe you (or someone else) could make a benchmark program that draws triangles with each of the methods you mentioned. Make sure that it varies the number too. Once that was done, you could post it here and collect results from everyone’s video cards and make a decent faq.

Unfortunately some of the methods will depend on CPU/bus speed, but this would be better than nothing and pretty safe as long as you didn’t compare too much across cards.

– Zeno

I think that a better way would be to go over these forums and collect information posted here. There’s some official information that’s pretty helpful. Check for example the read buffer performance suggestions by Matt in the “Pointer to Framebuffer” thread in the OpenGL suggestions forum.

Originally posted by Zeno:
[b]Maybe you (or someone else) could make a benchmark program that draws triangles with each of the methods you mentioned. Make sure that it varies the number too. Once that was done, you could post it here and collect results from everyone’s video cards and make a decent faq.

– Zeno [/b]

Sounds like a good idea to me. We could really do with a definitive TnL benchmark program.

It would be good to be able to look at the results and say: if I rewrite my app using method X, I should get about Y times the performance with graphics card Z (assuming a geometry bottleneck).

The only problem is that there are so many variables. The test would need to cover VAR in video memory, VAR in AGP memory, CVA, display lists, normal vertex arrays and immediate mode. For each of those it would need to test coloured unlit triangles, coloured lit triangles, textured unlit, textured lit, textured lit with fog etc…

Also, with pre-Detonator 10 drivers, only certain vertex array formats were accelerated on the GeForce. Should we use one of those formats for the benchmark program?

Not sure why you would want to be able to vary the number of triangles. Don’t we just want enough triangles that geometry is the bottleneck rather than fill rate? I would suggest about a million.

Sorry, I should have been more clear. When I said you should be able to “vary the number of triangles”, I didn’t necessarily mean per frame, but rather per vertex-array. I have heard that there are optimal sizes for vertex arrays (and display lists) so being able to vary the numbers would allow you to quickly find the sweet spot.

Being able to vary the total number drawn per frame could also be useful…some cards may be geometry limited at 1 million while some (imaginary) card may be fill-rate limited still, and you could get better tri throughput by sending more tris.

– Zeno
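The per-array sweep described above could be planned roughly like this. It is only a sketch of the bookkeeping, not a benchmark: the power-of-two range and the name `batch_sweep` are assumptions, and the one-million total is the figure suggested earlier in the thread.

```python
def batch_sweep(total_triangles, min_batch=64, max_batch=65536):
    """Yield (triangles_per_batch, draw_calls) for power-of-two batch sizes."""
    size = min_batch
    while size <= max_batch:
        # Integer ceiling division: the last batch may be partially filled.
        calls = (total_triangles + size - 1) // size
        yield size, calls
        size *= 2

plan = list(batch_sweep(1_000_000))
for size, calls in plan:
    print(f"{size:6d} tris/batch -> {calls:6d} draw calls per frame")
```

Timing each row on a given card and path would show where the sweet spot sits, if there is one. Running the same sweep with a different total would cover the "varying the total per frame" case as well.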

Even with all the mentioned combos of modes, there are still more…

Index reuse can make a big difference, and whether it makes a difference depends on the specific mode.
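As a rough illustration of how much reuse is available in the first place, the sketch below counts index references per unique vertex for a regular grid of quads. The grid mesh and the helper names are assumptions chosen for simplicity. The number is only an upper bound on the shareable work; whether a given submission path actually exploits it is exactly the mode-dependent part.

```python
def grid_indices(n):
    """Index list for an n x n grid of quads, two triangles per quad."""
    stride = n + 1  # vertices per row
    indices = []
    for row in range(n):
        for col in range(n):
            v0 = row * stride + col  # top-left corner of the quad
            v1 = v0 + 1              # top-right
            v2 = v0 + stride         # bottom-left
            v3 = v2 + 1              # bottom-right
            indices += [v0, v2, v1, v1, v2, v3]
    return indices

def reuse_ratio(indices):
    """Average number of times each unique vertex is referenced."""
    return len(indices) / len(set(indices))

for n in (2, 16, 64):
    idx = grid_indices(n)
    print(f"{n}x{n} grid: {len(idx)} indices, "
          f"{len(set(idx))} unique vertices, reuse {reuse_ratio(idx):.2f}")
```

For large grids the ratio approaches 6, so a path that re-transforms every referenced vertex does up to six times the vertex work of one that reuses results perfectly.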

There are all sorts of extensions to the T&L pipeline. Supporting them will be a pain.

You have to set a texgen mode for each individual texture coordinate for each texture unit (so 16 different modes to set on a GF3), and a texture matrix for each texture unit as well…

– Matt

Another number that’s useful to measure is the cost-per-buffer-submission versus cost-per-vertex versus cost-per-rendered-pixel.
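One way to separate those three costs, sketched under a strong assumption: that frame time is a plain sum of per-batch, per-vertex and per-pixel terms. Real hardware overlaps these stages, so a max-of-stages model may fit better. The function names are made up, and the numbers are synthetic (invented costs used to generate times and then recovered), not measurements from any card.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_costs(runs):
    """Solve time = batches*cb + vertices*cv + pixels*cp from three runs.

    Each run is (batches, vertices, pixels, seconds). Uses Cramer's rule.
    """
    a = [list(r[:3]) for r in runs]
    t = [r[3] for r in runs]
    d = det3(a)
    if d == 0:
        raise ValueError("runs are not independent; vary the factors separately")
    costs = []
    for k in range(3):
        ak = [row[:] for row in a]
        for r in range(3):
            ak[r][k] = t[r]  # replace column k with the measured times
        costs.append(det3(ak) / d)
    return tuple(costs)

# Synthetic self-check: made-up seconds per batch, per vertex, per pixel.
assumed = (20e-6, 50e-9, 1e-9)
configs = [
    (100, 300_000, 500_000),     # baseline
    (1000, 300_000, 500_000),    # same work split into more batches
    (100, 600_000, 2_000_000),   # more vertices and more pixels
]
runs = [(b, v, p, b * assumed[0] + v * assumed[1] + p * assumed[2])
        for b, v, p in configs]
fitted = fit_costs(runs)
print(fitted)
```

With real timings one would take more than three runs and do a least-squares fit, but the idea is the same: vary batches, vertices and pixels independently so the three costs can be told apart.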

Also, the number of triangles per “scene” is quite important. Just because a card can do, say, 2 million triangles per second at 10 fps does NOT mean it will get 40 fps when using 50,000 triangles per frame. All measurements should be taken with vsync off, but including a glFinish().
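The arithmetic behind that warning can be sketched as follows. The 2 million, 10 fps and 50,000 figures are from the post; the 10 ms fixed per-frame cost is purely an assumed number to show the effect, and both function names are hypothetical.

```python
def naive_fps(tris_per_second, tris_per_frame):
    """Frame rate if throughput scaled perfectly with scene size."""
    return tris_per_second / tris_per_frame

def fps_with_overhead(ref_fps, ref_tris, tris_per_frame, overhead_s):
    """Frame rate when each frame also pays a fixed cost of overhead_s seconds.

    The per-triangle cost is calibrated so the reference scene still
    runs at ref_fps.
    """
    ref_frame_time = 1.0 / ref_fps
    per_tri = (ref_frame_time - overhead_s) / ref_tris
    return 1.0 / (overhead_s + per_tri * tris_per_frame)

ref_tris = 2_000_000 // 10  # 200,000 triangles per frame at 10 fps
print(naive_fps(2_000_000, 50_000))                     # 40.0
print(fps_with_overhead(10, ref_tris, 50_000, 0.010))   # about 30.8
```

So even a modest fixed cost per frame (clears, state setup, buffer swaps) pulls the small scene well below the 40 fps that the triangles-per-second figure seems to promise.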