This answer comes from an NVIDIA document called “OpenGL Performance”.
It is an excellent document.
-
Fastest
GL_TRIANGLE_STRIP
GL_TRIANGLE_FAN
GL_QUAD_STRIP
These maximize reuse of the vertices shared within a given graphics primitive, and are all similarly fast.
GL_TRIANGLES
GL_QUADS
These aggregate (potentially multiple) disjoint triangles and quads, and amortize function overhead over multiple primitives.
Slowest
GL_POLYGON
A bit slower than the independent triangles and quads.
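To put a number on the vertex reuse mentioned above, here is a small C sketch. The function names are made up for illustration and are not from the NVIDIA document. It assumes one connected strip or fan, and it only counts the vertices submitted, not actual hardware timings.

```c
#include <assert.h>
#include <stddef.h>

/* GL_TRIANGLES: every triangle carries its own three vertices. */
size_t vertices_for_triangles(size_t n)
{
    return 3 * n;
}

/* GL_TRIANGLE_STRIP / GL_TRIANGLE_FAN: the first triangle needs three
   vertices, each further triangle reuses two and adds only one. */
size_t vertices_for_strip(size_t n)
{
    return n == 0 ? 0 : n + 2;
}

/* GL_QUADS: four vertices per independent quad. */
size_t vertices_for_quads(size_t n)
{
    return 4 * n;
}

/* GL_QUAD_STRIP: the first quad needs four vertices, each further quad
   shares an edge with the previous one and adds two. */
size_t vertices_for_quad_strip(size_t n)
{
    return n == 0 ? 0 : 2 * n + 2;
}

/* Vertices saved by drawing n >= 1 triangles as one strip instead of
   as independent triangles: 3n - (n + 2) = 2n - 2. */
size_t vertices_saved_by_strip(size_t n)
{
    assert(n >= 1);
    return vertices_for_triangles(n) - vertices_for_strip(n);
}
```

For 100 triangles this gives 300 vertices as GL_TRIANGLES versus 102 as a single strip, so the strip sends roughly a third of the data.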
===========================================
-
Fastest
DrawElements/DrawArrays Using wglAllocateMemoryNV(size,0,0,1)
Saves data in video memory, eliminating any bus bottleneck. Very poor read/write access.
DrawElements/DrawArrays Using wglAllocateMemoryNV(size,0,0,.5)
Saves data in AGP (uncached) memory, and allows hardware to pull it directly. Very poor read access; data must be written sequentially (see below).
Display Lists
Can encapsulate data in the most efficient manner for hardware, though they are immutable (i.e. once created, you can’t alter them in any way).
DrawElements using Compiled Vertex Arrays (glLockArraysEXT)
Copies locked vertices to AGP memory, so that the hardware can then pull them directly. Only one mode is supported (see q. 7 below).
DrawElements and DrawArrays using Vertex Arrays with Common Data Formats
Optimized to assemble primitives as efficiently as possible, and minimizes function call overhead. 13 formats supported (see q. 6).
Immediate Mode
Requires multiple function calls per primitive, which results in relatively poor performance compared to the other options above.
Slowest
All Other Vertex Arrays
Must be copied from application memory to AGP memory before the hardware can pull it. Since data can change between calls, data must be copied every time, which is expensive.
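The function call overhead that separates immediate mode from vertex arrays can be estimated with a rough count. This is only a sketch: the function names are hypothetical, and it assumes a single glBegin/glEnd batch and vertex arrays that are already set up. It says nothing about the memory copies described above.

```c
#include <assert.h>
#include <stddef.h>

/* Immediate mode: one glBegin, one glEnd, and for every vertex one call
   per attribute (glVertex*, plus optionally glNormal*, glTexCoord*,
   glColor*, ...). At least the glVertex* call is always needed. */
size_t immediate_mode_calls(size_t vertex_count, size_t attribs_per_vertex)
{
    assert(attribs_per_vertex >= 1);
    return 2 + vertex_count * attribs_per_vertex;
}

/* Vertex arrays: once the array pointers are set, a whole batch is
   issued with a single glDrawArrays or glDrawElements call, no matter
   how many vertices it contains. */
size_t vertex_array_calls(size_t vertex_count)
{
    (void)vertex_count; /* the call count does not depend on batch size */
    return 1;
}
```

With 300 vertices that each have a position, a normal and a texture coordinate, immediate mode makes 902 calls where a vertex array draw makes one.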
=======================================
- T&L is automatic. Vertex shaders are now the newer-generation replacement for fixed-function T&L, but the GeForce 3 and 4 still support T&L for compatibility.