Relative cost of modelview transforms?

Been meaning to ask this for a while…

Does anyone have any benchmark data on the relative cost of modelview transforms on modern consumer GPU accelerators? By “relative cost” I mean in comparison to switching textures/materials/blend modes, etc. Assume “proper” transforms (glTranslate, glRotate) rather than just an arbitrary glLoadMatrix.

In more concrete terms, when designing an engine to handle lots of independently moving objects, does it make sense to draw primitives grouped by modelview transform (at the cost of additional texture binds) or by texture (at the cost of additional transforms, as you traverse your scenegraph once for each texture)?

Any info would be very much appreciated. All the optimization tips I’ve seen - including the latest NVidia FAQ - seem to assume a Quake-like FPS setup where all scene geometry is static and each moving object uses a single texture, so the transform-sort-vs-texture-sort question gets ignored.

Although I have no evidence, everything I have read says to group by texture. This makes sense because loading a texture from system memory into video memory takes longer than a transform. Although you can request that a texture remain in video memory (through glPrioritizeTextures), I do not think you can lock it there.

If anyone actually does some tests, it would be interesting if they confirmed that all textures were resident.
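For what it’s worth, here is a minimal sketch of the texture-sorted approach - not from any of the guides, and the Object struct, the display-list usage and the function names are my own assumptions: bind each texture once, and push/pop a per-object modelview inside each group. glAreTexturesResident is the GL 1.1 call for checking afterwards whether the textures actually stayed resident.

```c
#include <GL/gl.h>

typedef struct {
    GLuint  texture;       /* texture object used by this batch                */
    GLfloat matrix[16];    /* per-object modelview contribution, column-major  */
    GLuint  displayList;   /* geometry compiled into a display list            */
} Object;

/* Assumes objs[] is pre-sorted by texture id, so each texture binds once. */
void draw_scene(const Object *objs, int count)
{
    GLuint bound = 0;
    int i;

    for (i = 0; i < count; ++i) {
        if (objs[i].texture != bound) {
            glBindTexture(GL_TEXTURE_2D, objs[i].texture);
            bound = objs[i].texture;
        }
        glPushMatrix();
        glMultMatrixf(objs[i].matrix);    /* per-object transform */
        glCallList(objs[i].displayList);
        glPopMatrix();
    }
}

/* After drawing, ask the driver whether everything actually stayed resident. */
GLboolean all_textures_resident(GLsizei n, const GLuint *textures, GLboolean *flags)
{
    return glAreTexturesResident(n, textures, flags);
}
```

glPrioritizeTextures is only a hint; the residency query above is the only feedback you actually get.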

Yeah, this is the standard recommendation. My worry is that what they actually say is something like (direct quote) “textures first, then lights, then blending modes, materials and so on”. Nowhere do they explicitly mention transforms.

What really worries me is that I remember seeing several statements (in the context of D3D-RM, IIRC) that coordinate frames (== modelview transforms) were a major performance killer. Before GPUs I think most game developers did all their own modelview transforms and just used OpenGL/D3D-IM for rasterization, but in the Age of GeForce this seems a little perverse.

Oh well. I’ll see if I can find some poor unsuspecting soul at NVidia to badger; I’ll let you know if I get anywhere.

Mike,

For NVIDIA hardware, the relative cost of pushing, popping, and otherwise modifying the modelview matrix is generally in the noise. I don’t know whether this is the case for other IHVs.

Hope this helps -
Cass

Cass, you’re a star. Thanks a million.

Mike, all the vertices you send down the pipeline get multiplied by the current modelview and projection matrices anyway. The only cost of doing transformations is one matrix-by-matrix multiply, which is pretty small compared to the transformation of all your vertices.
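To put rough numbers on that point (back-of-envelope only; mat4_mul is just an illustrative helper, not anything from GL):

```c
/* Column-major 4x4 concatenation: out = a * b.  64 multiplies, 48 adds,
 * done once per object; the 16-multiply-per-vertex transform happens for
 * every vertex no matter how the modelview was built. */
void mat4_mul(float out[16], const float a[16], const float b[16])
{
    int r, c;
    for (c = 0; c < 4; ++c)
        for (r = 0; r < 4; ++r)
            out[c * 4 + r] = a[0 * 4 + r] * b[c * 4 + 0]
                           + a[1 * 4 + r] * b[c * 4 + 1]
                           + a[2 * 4 + r] * b[c * 4 + 2]
                           + a[3 * 4 + r] * b[c * 4 + 3];
}
```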

Greg,

What I think everyone is saying is this: if you have NVIDIA hardware, load your OWN transformation matrix onto the modelview matrix and have the graphics card transform the vertices. That way the bulk of the math (multiplying the vertices by the matrix) is done in hardware. You can use glRotate() etc. if you want the matrix multiplications done in hardware as well, but you probably won’t see any huge performance increase (i.e., it isn’t worth the hassle).
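Something like this is how I read that - a rough sketch only, with the function names mine, and assuming the loaded matrix in option A already includes the camera transform:

```c
#include <GL/gl.h>

/* Option A: build the full modelview (camera * object) yourself and load it. */
void set_modelview_loaded(const GLfloat m[16])   /* m is column-major */
{
    glMatrixMode(GL_MODELVIEW);
    glLoadMatrixf(m);
}

/* Option B: let GL build it from the "proper" transform calls. */
void set_modelview_gl(GLfloat x, GLfloat y, GLfloat z, GLfloat yaw)
{
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();                    /* camera transform would go here */
    glTranslatef(x, y, z);
    glRotatef(yaw, 0.0f, 1.0f, 0.0f);    /* rotate about Y, for example */
}
```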

If you’re not optimizing for NVIDIA, create your OWN transformation matrix, using special-case optimizations/trade-offs. This is MUCH better than having OpenGL use tons of matrix multiplications to produce your final transformation matrix. When you’re done, multiply your vertices by your transformation matrix. Chances are you’re going to do it faster than OpenGL would (unless you have T&L); OpenGL isn’t a math library.
Then just load the identity matrix into the modelview matrix; that way OpenGL can skip multiplying the vertices going through the pipeline by the modelview matrix.
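A sketch of that software path, for illustration only (helper names are mine; immediate mode just to keep it short):

```c
#include <GL/gl.h>

/* Multiply a point by a column-major 4x4 affine matrix (w assumed to be 1). */
static void transform_point(GLfloat out[3], const GLfloat m[16], const GLfloat v[3])
{
    out[0] = m[0] * v[0] + m[4] * v[1] + m[8]  * v[2] + m[12];
    out[1] = m[1] * v[0] + m[5] * v[1] + m[9]  * v[2] + m[13];
    out[2] = m[2] * v[0] + m[6] * v[1] + m[10] * v[2] + m[14];
}

/* Transform on the CPU, then submit under an identity modelview. */
void draw_pretransformed(const GLfloat m[16], const GLfloat *verts, int count)
{
    int i;

    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();                 /* nothing left for GL's matrix stage */

    glBegin(GL_TRIANGLES);
    for (i = 0; i < count; ++i) {
        GLfloat p[3];
        transform_point(p, m, &verts[i * 3]);
        glVertex3fv(p);
    }
    glEnd();
}
```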

Gotcha, thanks Caesar.

Caesar, I’m not so sure about memcpy-style matrix loads being faster than glRotate, glTranslate etc. I seem to recall reading that OpenGL lighting calculations require matrix inverses, and the code to generate an inverse for a completely arbitrary 4x4 matrix is fairly hideous. With glRotate etc the inverses can be generated directly from the function parameters, much more cleanly.

Using glRotate etc also provides hints to non-T&L drivers to optimize for special cases - if you’ve only done glRotates on the current matrix they only need to multiply verts by a 3x3 submatrix; if you’ve only done glRotates and glTranslates they can use a 3x4 submatrix etc.
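To illustrate why the special cases are so much nicer - this is only a sketch of the rotation-plus-translation case, not anything a driver actually ships - the inverse is just a transpose plus a rotated, negated translation, no general 4x4 inversion needed:

```c
#include <GL/gl.h>

/* Invert a column-major modelview built only from rotations and translations:
 * M = [R | t], so M^-1 = [R^T | -R^T * t].  Compare with what it takes to
 * invert a completely arbitrary 4x4. */
void invert_rigid(GLfloat out[16], const GLfloat m[16])
{
    int r, c;

    /* inverse rotation = transpose of the upper-left 3x3 */
    for (c = 0; c < 3; ++c)
        for (r = 0; r < 3; ++r)
            out[c * 4 + r] = m[r * 4 + c];

    /* inverse translation = -R^T * t, where t = m[12..14] */
    out[12] = -(out[0] * m[12] + out[4] * m[13] + out[8]  * m[14]);
    out[13] = -(out[1] * m[12] + out[5] * m[13] + out[9]  * m[14]);
    out[14] = -(out[2] * m[12] + out[6] * m[13] + out[10] * m[14]);

    out[3] = out[7] = out[11] = 0.0f;
    out[15] = 1.0f;
}
```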

Caesar,

As MikeC says, the general rule is to favor using the OpenGL transform calls. They do allow the driver to (more easily) recognize simplifications in matrix arithmetic. This is generally for cheaper matrix-vector multiplies, but it could also be useful for simplified matrix inversion.

Don’t actively avoid using glLoadMatrix() under the assumption that it’s a performance killer, but certainly don’t favor it over the specific transform calls. They make your code more readable. :)

If you’re trying to avoid an OpenGL driver’s software T&L, then obviously you have to do everything yourself. If you can count on hardware T&L (or a well-optimized software implementation), you can ignore this headache! :)

Thanks -
Cass