Processor-specific math operations

Everybody would like to have a library which can do the most important math operations, like matrix multiplication or inverse calculation, very fast.
To achieve this we need to use 3DNow, SSE and other processor-specific instructions, so we lose platform independence.
I’ve seen that DirectX has such functions, which are highly optimized (it sets up a call table at the first call).
Why don’t we have such a thing in OpenGL? The manufacturers already have these functions in their drivers, we just cannot access them…
The only problem would be data alignment, but that can be solved with a glSet(GL_MATH_ALIGNMENT, int) function (in a process or thread context).
For example:
glSet(GL_MATH_ALIGNMENT, 16); // the program will pass all vectors and matrices with 16-byte alignment
glMathMulMatrix(float* dst, float* src1, float* src2); // dst, src1 and src2 have 16-byte alignment
On a Pentium 4 this function may use SSE, on an Athlon 3DNow, etc.
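To make the call-table idea concrete, here is a minimal sketch in C of how such a function could pick its implementation on the first call. Everything in it is an assumption for illustration: the names mat4_mul, mat4_mul_generic, mat4_mul_sse and mat4_mul_init are hypothetical, the matrices are taken to be 4x4 column-major, and the CPU check relies on the GCC/Clang builtin __builtin_cpu_supports. It is not how any actual driver or DirectX does it.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* All names below are hypothetical; nothing here is an OpenGL or driver API. */
typedef void (*mat4_mul_fn)(float *dst, const float *a, const float *b);

/* Portable fallback: dst = a * b for 4x4 column-major matrices. */
static void mat4_mul_generic(float *dst, const float *a, const float *b)
{
    float tmp[16]; /* temporary, so dst may alias a or b */
    for (int c = 0; c < 4; ++c) {
        for (int r = 0; r < 4; ++r) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += a[k * 4 + r] * b[c * 4 + k];
            tmp[c * 4 + r] = s;
        }
    }
    for (int i = 0; i < 16; ++i)
        dst[i] = tmp[i];
}

#if defined(__SSE__)
/* SSE path: all three pointers must be 16-byte aligned. */
static void mat4_mul_sse(float *dst, const float *a, const float *b)
{
    assert((uintptr_t)dst % 16 == 0);
    assert((uintptr_t)a % 16 == 0);
    assert((uintptr_t)b % 16 == 0);

    /* Load the four columns of a once. */
    const __m128 a0 = _mm_load_ps(a + 0);
    const __m128 a1 = _mm_load_ps(a + 4);
    const __m128 a2 = _mm_load_ps(a + 8);
    const __m128 a3 = _mm_load_ps(a + 12);

    /* Each result column is a linear combination of the columns of a. */
    for (int c = 0; c < 4; ++c) {
        const float *bc = b + 4 * c;
        __m128 col = _mm_mul_ps(a0, _mm_set1_ps(bc[0]));
        col = _mm_add_ps(col, _mm_mul_ps(a1, _mm_set1_ps(bc[1])));
        col = _mm_add_ps(col, _mm_mul_ps(a2, _mm_set1_ps(bc[2])));
        col = _mm_add_ps(col, _mm_mul_ps(a3, _mm_set1_ps(bc[3])));
        _mm_store_ps(dst + 4 * c, col);
    }
}
#endif

static void mat4_mul_init(float *dst, const float *a, const float *b);

/* A call table with a single entry; it starts out pointing at the initializer. */
static mat4_mul_fn mat4_mul = mat4_mul_init;

/* First call: choose an implementation, patch the table, then forward the call. */
static void mat4_mul_init(float *dst, const float *a, const float *b)
{
    mat4_mul = mat4_mul_generic;
#if defined(__SSE__) && defined(__GNUC__)
    if (__builtin_cpu_supports("sse"))
        mat4_mul = mat4_mul_sse;
#endif
    mat4_mul(dst, a, b);
}
```

A caller would just write mat4_mul(dst, a, b) with 16-byte aligned arrays; after the first call the pointer goes straight to the chosen routine. The sketch is not thread-safe during that first call, and a 3DNow or AltiVec path would be one more branch in mat4_mul_init.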
Of course these functions should be context independent.
With these functions we would be able to access ALL processor-specific instructions for vector math…
I don’t think it would be difficult for the driver writers to write such a thing.
What’s your opinion?

This topic was originally opened in the advanced forum: http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/011233.html

How about using feedback mode and letting the driver figure it all out?

Originally posted by al_bob:
How about using feedback mode and letting the driver figure it all out?

??? Sorry, I don’t understand.

Google (search for "opengl feedback mode") to the rescue!

I think you misunderstand me.
I would only like to have some math functions which run entirely on the CPU but are optimized (SSE, 3DNow, PowerPC, etc.).

As time goes on, hardware companies and driver writers are going to spend less time writing processor-specific optimized code. Hardware is getting more and more capable, so the need to even do any T&L on the CPU is gradually going away. If you want optimized math routines, there are many companies and open-source groups that make such libraries.

I think you’re assuming there’s an SSE optimized Vector and Matrix library inside the driver and it may not be so. Even if it was, it would be difficult to expose that to the user, except as additional OpenGL extensions (for now). Who wants to call glMultVector3fVector3f_EXT((float*)myVec1.data, (float*)myVec2.data) ? And even then, it would probably go through a DLL, and the compiler won’t be able to coalesce multiple SSE instructions into register-coherent blocks for the real speedup.
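To illustrate the point about the compiler: a rough sketch in C of small vector operations written as static inline functions over intrinsics, so that a chain of them can be inlined and kept in registers instead of going through one exported call per operation. The names vec4, vec4_load, vec4_store, vec4_add, vec4_scale and step_position are made up for this example, and the scalar branch is only there so the sketch also builds where SSE is not available.

```c
#if defined(__SSE__)
#include <xmmintrin.h>

/* Hypothetical 4-float vector type backed by an SSE register. */
typedef __m128 vec4;

static inline vec4 vec4_load(const float *p)     { return _mm_loadu_ps(p); }
static inline void vec4_store(float *p, vec4 v)  { _mm_storeu_ps(p, v); }
static inline vec4 vec4_add(vec4 a, vec4 b)      { return _mm_add_ps(a, b); }
static inline vec4 vec4_scale(vec4 a, float s)   { return _mm_mul_ps(a, _mm_set1_ps(s)); }

#else

/* Scalar fallback with the same hypothetical interface. */
typedef struct { float f[4]; } vec4;

static inline vec4 vec4_load(const float *p)
{
    vec4 r;
    for (int i = 0; i < 4; ++i) r.f[i] = p[i];
    return r;
}
static inline void vec4_store(float *p, vec4 v)
{
    for (int i = 0; i < 4; ++i) p[i] = v.f[i];
}
static inline vec4 vec4_add(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; ++i) r.f[i] = a.f[i] + b.f[i];
    return r;
}
static inline vec4 vec4_scale(vec4 a, float s)
{
    vec4 r;
    for (int i = 0; i < 4; ++i) r.f[i] = a.f[i] * s;
    return r;
}
#endif

/* pos += vel * dt, written as a chain of the small operations above.
   Because they are visible to the compiler, the intermediate values
   do not have to be written back to memory between steps. */
static void step_position(float *pos, const float *vel, float dt)
{
    vec4 p = vec4_load(pos);
    vec4 v = vec4_load(vel);
    vec4_store(pos, vec4_add(p, vec4_scale(v, dt)));
}
```

If each of these operations were instead an extension entry point living in a DLL, every step would be a separate call taking float pointers, and the compiler would have no chance to merge them.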

We’d have to see an OO API first, IMO, and don’t even try to bring that up with Korval around

Check out libsse (google libsse) for a cross-platform open-sourced SSE library that, last I checked, had SSE/3DNow code for vectors and matrices and was only lacking quaternions.

Avi

[This message has been edited by Cyranose (edited 01-15-2004).]