When should you not use OpenGL's Matrices?

I was just reading a bit about optimizing matrix routines with SSE, MMX, and other SIMD instruction sets… but I have no idea which operations you would implement them for.

If it’s faster and better to use these instructions instead of OpenGL commands, shouldn’t we just ignore OpenGL’s matrix operations altogether?

The matrices are an integral part of OpenGL and are needed for rendering one way or another. You could in theory perform all transformations in your program and then pass the final vertices through identity GL matrices, but this would nix any hardware acceleration gains.

I think where exploiting special instruction sets comes in is where you calculate custom transformation matrices. Whether you then glLoadMatrixd or glMultMatrixd or do something else is up to you.
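As a minimal sketch of what “calculating a custom matrix” looks like in practice (the helper name here is mine, not part of GL): build the sixteen values yourself, remembering that OpenGL expects column-major order, then hand them over with glLoadMatrixd or glMultMatrixd.

    #include <GL/gl.h>

    /* Build a translation matrix by hand. OpenGL stores matrices in
       column-major order, so elements 12..14 hold the translation. */
    void load_custom_translation(GLdouble tx, GLdouble ty, GLdouble tz)
    {
        GLdouble m[16] = {
            1.0, 0.0, 0.0, 0.0,   /* column 0 */
            0.0, 1.0, 0.0, 0.0,   /* column 1 */
            0.0, 0.0, 1.0, 0.0,   /* column 2 */
            tx,  ty,  tz,  1.0    /* column 3: translation */
        };
        glMatrixMode(GL_MODELVIEW);
        glLoadMatrixd(m);   /* replace the current matrix outright...     */
        /* ...or call glMultMatrixd(m) to concatenate with what is there. */
    }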

“You could in theory perform all transformations in your program and then pass the final vertices through identity GL matrices, but this would nix any hardware acceleration gains.”

OK. I thought using those instruction sets to calculate matrices would improve performance, not reduce it. So I take it that OpenGL has already optimized its matrix routines for different processors? How is it that OpenGL’s matrix math is faster?

If you have hardware acceleration, then the transformation probably isn’t even performed by the CPU. Your OpenGL driver calls upon your video driver to call upon the built-in features of your video card to do all the work. That’s why hardware acceleration is so fast: the CPU doesn’t have to do all the video work too; the onboard controller of the video card takes that load off its back.

If you do NOT have hardware acceleration, then how the transformations are optimized CPU-wise depends on the provider of your OpenGL driver. If you’d like to skip it altogether, you can write your own per-CPU optimized code with those special instruction sets and see how that goes.

Originally posted by Omaha:
If you have hardware acceleration then the transformation probably isn’t even performed by the CPU.

As far as I know, this is only true for matrix-vector operations. The hardware is there to transform a lot of points. Building the matrices themselves is done on the CPU, with the possible support of SSE, 3DNow! and other extensions, depending on the actual implementation. In reality, I think both NVIDIA and ATI use them to speed up glMultMatrix and similar operations.

This setup does not take a lot of time, so the reason to use your own matrix functions is not speed; it’s that in some cases you have to. The most common reason has to do with “gimbal lock”.
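For what it’s worth, a common way around gimbal lock is to keep the orientation as a quaternion and only convert it to a matrix when handing it to GL. A rough sketch (quat_to_gl_matrix is a hypothetical helper of mine; it assumes the quaternion (w, x, y, z) is unit length):

    #include <GL/gl.h>

    /* Hypothetical helper: convert a unit quaternion (w, x, y, z) into a
       column-major 4x4 matrix suitable for glMultMatrixf. */
    void quat_to_gl_matrix(float w, float x, float y, float z, GLfloat m[16])
    {
        m[0] = 1 - 2*(y*y + z*z);  m[4] = 2*(x*y - w*z);      m[8]  = 2*(x*z + w*y);      m[12] = 0;
        m[1] = 2*(x*y + w*z);      m[5] = 1 - 2*(x*x + z*z);  m[9]  = 2*(y*z - w*x);      m[13] = 0;
        m[2] = 2*(x*z - w*y);      m[6] = 2*(y*z + w*x);      m[10] = 1 - 2*(x*x + y*y);  m[14] = 0;
        m[3] = 0;                  m[7] = 0;                  m[11] = 0;                  m[15] = 1;
    }

You accumulate rotations in quaternion space (no Euler angles, hence no gimbal lock), renormalize occasionally, and only at draw time do something like: GLfloat m[16]; quat_to_gl_matrix(w, x, y, z, m); glMultMatrixf(m);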

Actually, I was referring to the transformation of an input vertex by the modelview and then projection matrices and so forth to ultimately become a vertex in the window.

Now that we are seeing even vertex arrays being stored on the server, it seems counter-intuitive to send vertex data from main memory to the video card, then back to the CPU, and then back to the video card again.

Thanks everyone. So I guess I’d basically be wasting my time writing processor-specific code, unless I’m doing software rendering, which Mesa probably already does for me anyway…

With a gfx card that has a hardware T&L engine, the transformations will be done on the gfx card rather than the CPU.

But if things are done “in software”, isn’t that all on the CPU, like, say, with Mesa?

Hi!

As previous posters said, it’s better to let OpenGL do the matrix transformations for you. Most cards these days support hardware-accelerated transform and lighting (all GeForces and Radeons), so they can do the transformations much faster than you can on the CPU.

Also, most games tend to be limited more by the CPU than by the GPU, and you’ll find that NVIDIA and ATI are constantly encouraging developers to move as many computations as possible onto the GPU.

There are a few cases where you might want to perform your transformations on the CPU, and in those cases the special optimizations you mentioned are essential. For example, suppose you are doing skeletal animation with vertex weighting and you want to do high-precision triangle-vs-triangle collision detection. Or, if you need to calculate silhouettes for stenciled shadow volumes, you might want to store a transformed copy of the vertices in main memory.
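To illustrate the kind of optimization being talked about, here is a rough SSE sketch that transforms an array of xyzw vertices by a column-major 4x4 matrix, which is the hot loop in CPU skinning or shadow-volume extraction. The function name is illustrative, and the code assumes 16-byte-aligned data:

    #include <xmmintrin.h>

    /* Transform n xyzw vertices by a column-major 4x4 matrix:
       out = col0*x + col1*y + col2*z + col3*w for each vertex.
       Assumes m, in and out are all 16-byte aligned. */
    void transform_vertices_sse(const float m[16], const float *in,
                                float *out, int n)
    {
        __m128 c0 = _mm_load_ps(m + 0);
        __m128 c1 = _mm_load_ps(m + 4);
        __m128 c2 = _mm_load_ps(m + 8);
        __m128 c3 = _mm_load_ps(m + 12);
        int i;
        for (i = 0; i < n; ++i, in += 4, out += 4) {
            __m128 r = _mm_mul_ps(c0, _mm_set1_ps(in[0]));
            r = _mm_add_ps(r, _mm_mul_ps(c1, _mm_set1_ps(in[1])));
            r = _mm_add_ps(r, _mm_mul_ps(c2, _mm_set1_ps(in[2])));
            r = _mm_add_ps(r, _mm_mul_ps(c3, _mm_set1_ps(in[3])));
            _mm_store_ps(out, r);
        }
    }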

Usually you are multiplying matrices on the CPU anyway (glRotate/glTranslate/glScale/gluLookAt/…) and then passing the concatenated matrix to the vertex program as a parameter. But I think I heard that multiplying matrices (rotations/translations) is faster than loading them, so I guess those SIMD instructions get used anyway.
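For reference, the concatenation being discussed is just a 4x4 multiply like the plain-C sketch below (mat4_mul is my name, not a GL call); the independent per-element arithmetic in this loop is exactly what the SSE/3DNow! paths in the drivers vectorize:

    /* Plain-C 4x4 concatenation, column-major, r = a * b.
       r must not alias a or b. */
    void mat4_mul(const float a[16], const float b[16], float r[16])
    {
        int col, row;
        for (col = 0; col < 4; ++col)
            for (row = 0; row < 4; ++row)
                r[col*4 + row] = a[0*4 + row] * b[col*4 + 0]
                               + a[1*4 + row] * b[col*4 + 1]
                               + a[2*4 + row] * b[col*4 + 2]
                               + a[3*4 + row] * b[col*4 + 3];
    }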