Processor-specific math operations

Everybody would like to have a library which is able to do the most important math operations, like matrix multiplication or inverse calculation, very fast.
To achieve this we need to use 3DNow, SSE and other processor-specific instructions, so we lose platform independency.
I’ve seen that DirectX has such functions, and they are heavily optimized (it sets up a call table at the first call).
Why don’t we have such a thing in OpenGL? The manufacturers already have these functions in their drivers, we just cannot access them…
The only problem would be data alignment, but that can be solved with a glSet(GL_MATH_ALIGNMENT, int) function (per process or per thread context).
For example:
glSet(GL_MATH_ALIGNMENT, 16); // the program will pass all vectors and matrices with 16-byte alignment
glMathMulMatrix(float* dst, float* src1, float* src2); // dst, src1 and src2 have 16-byte alignment
On a Pentium 4 this function might use SSE, on an Athlon 3DNow, etc.
Of course these functions should be context independent.
With these functions we would be able to access ALL processor-specific instructions for vector math…
I don’t think it would be difficult for the driver writers to write such a thing.
What’s your opinion?

[This message has been edited by Csiki (edited 01-03-2004).]
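
To make the proposal above concrete, here is a minimal C++ sketch of the alignment contract. The names `Mat4`, `is_aligned16` and `math_mul_matrix` are hypothetical and are not part of OpenGL; the body is plain scalar code, and the assumption is only that a driver could put an SSE or 3DNow path behind the same signature.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 4x4 matrix in OpenGL's column-major layout, aligned to
// 16 bytes so that aligned SIMD loads and stores on it would be legal.
struct alignas(16) Mat4 {
    float m[16];
};

// True if p satisfies the 16-byte alignment the caller promised.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// dst = a * b for column-major matrices; dst must not alias a or b.
// Hypothetical stand-in for the proposed glMathMulMatrix.
inline void math_mul_matrix(float* dst, const float* a, const float* b) {
    assert(is_aligned16(dst) && is_aligned16(a) && is_aligned16(b));
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                // element (row, k) of a times element (k, col) of b
                sum += a[k * 4 + row] * b[col * 4 + k];
            }
            dst[col * 4 + row] = sum;
        }
    }
}
```

Declaring matrices as `Mat4` is enough to satisfy the contract; a call then looks like `math_mul_matrix(out.m, a.m, b.m);`.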

OpenGL’s server-side math operations (e.g. glMultMatrixf) can (and will) be optimized in every way driver developers can think of.
For client-side operations, you can use any library you want (even DirectX routines, if you think that’s cool) or write your own (your compiler’s optimizer should do its job nicely enough).
There are open source math libraries for various programming languages (even if I prefer to write my own routines and collect them in my base code).
Maybe a standard library for “everything” would be fine, but I don’t think it should be integrated into OpenGL (maybe call it OpenMath or something like that).

I have some problems with that:

  1. If I use DirectX, I lose platform independency.
  2. If I write an optimized library, I lose platform independency.
  3. There is no optimized open source math library for 3D.
  4. The drivers already have these functions; I would just like to access them.

I agree with you.
D3D does give you some good reasons to use it instead of OpenGL. Of course OpenGL was designed to be a graphics API and nothing more, but I think it’s a shame that because of this definition some stuff simply gets ignored by the ARB (or whoever is the boss).
OpenGL might be “The Industry Standard for High Performance Graphics”, but that does not mean that it does not have to compete with D3D.

And if stuff like this does not fit into the concept of OpenGL, then why not create a new API (like OpenMath)? It should really be no work for the driver writers to simply copy and paste some code and rename the functions.

I always have the impression that someone at the top thinks “Hey, everyone who wants portability is forced to use OpenGL, so we don’t have to worry about competition.” But in fact the people who really need portability are usually CAD developers, who can mostly live without pixel shaders and other advanced stuff, so they are not the ones who push OpenGL’s improvement. (Yes, there are exceptions, I know, so please don’t flame me for this.)

Jan.

But OpenGL math need not (necessarily) be done by your CPU. It may be done on the GPU instead, and a data fetch could be rather expensive, even if the implementation itself is lightning fast.

For open source math functions you may search (as an example) for Geometry.pas by Mike Lischke (Delphi source, but it can be adapted to nearly everything, since the code is well written).
One port of it is optimized with 3DNow assembly, and I’m sure that a lot of similar libraries exist in C/C++ as well (you could even use my BaseGraph, but I strictly use Delphi and C/C++ code there and rely on the optimizers of the compilers, strongly believing that the speed loss is negligible).

If you have to transform huge amounts of vertices locally, you’ll write your own routines anyway, if you don’t want to call a lot of subroutines for batched transformations.

Nobody said that it will be done on the GPU. The functions I need will surely run on the CPU. But the drivers HAVE them…
And don’t say that there surely is such a library. NO, there is NO such library (the existing ones are either not optimized or not platform independent).
As for the call overhead: D3D does a matrix multiplication in 100 clocks, where the standard C code needs more than 140 clocks. The call overhead is negligible in such cases.
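
Numbers like these are easy to check on your own machine. The following is only a rough C++ sketch of how one might time the "standard C code" baseline. The helper names `mul4x4_plain` and `ns_per_multiply` are made up for this example, and it reports wall-clock nanoseconds per call (call overhead included) rather than clock cycles, so the result depends on compiler flags and hardware.

```cpp
#include <cassert>
#include <chrono>

// Plain C++ reference multiply: dst = a * b, column-major 4x4, no aliasing.
inline void mul4x4_plain(float* dst, const float* a, const float* b) {
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a[k * 4 + row] * b[col * 4 + k];
            }
            dst[col * 4 + row] = sum;
        }
    }
}

// Returns the average wall-clock time per multiplication in nanoseconds.
inline double ns_per_multiply(int iterations) {
    assert(iterations > 0);
    float a[16], b[16], dst[16];
    for (int i = 0; i < 16; ++i) {
        a[i] = 1.0f + static_cast<float>(i);
        b[i] = 0.5f * static_cast<float>(i);
    }
    volatile float sink = 0.0f;  // keeps the optimizer from discarding the work
    const auto start = std::chrono::steady_clock::now();
    for (int n = 0; n < iterations; ++n) {
        a[0] = static_cast<float>(n);  // vary the input between iterations
        mul4x4_plain(dst, a, b);
        sink = sink + dst[0];
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Swapping another implementation in for `mul4x4_plain` gives a like-for-like comparison, as long as both are built with the same compiler settings.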

Well, I don’t really want to argue about this. Some OpenGL drivers may do computations on the CPU, others won’t (and with programmable processors and vertex data stored on the server side this is rather a good idea), so I don’t know whether each and every OpenGL driver would really expose the functions you want.

As I said, a standardized library would be practical, but it’s not essential, since it’s not clear what to include in it (simple vector and matrix math? of course; quaternions? maybe; physics? collision detection? ray tracing/casting? a scene graph for object/world representation? inverse kinematics?).
Since you do not seem too keen on Geometry.pas (which is really a good starting point), I found the following link with Google: http://www.ddj.com/topics/cpp/libraries/
which lists some C++ libraries that may be useful to you. And of course there are OpenSceneGraph and ODE.

@csiki:

>>2. If I write optimized library, I lose
>>the platform independency
no - you are losing COMPUTER SYSTEM independency. your SSE or 3DNow powered chip is available on linux just as it is on windows.

edit:
>>But the drivers HAVE them…
obviously, you are looking for others to do your job *g* i’m pretty sure that all math functions which can be executed on the GPU (since the rise of hardware transform) are actually executed on the GPU. so, would you like to drive your math library with GPU-specific code, if they simply copy and paste the code out of the driver?
and: if you are doing this thousands of times each second, i’m again pretty sure that the calling overhead kills any advantage.

[This message has been edited by DJSnow (edited 01-03-2004).]

Originally posted by mw:
>>Well I don’t really want to argue about this. Some OpenGL drivers may do computations on the CPU, others won’t (and with programmable processors and vertex data stored on the server side this is rather a good idea), so I don’t know whether each and every OpenGL driver would really expose the functions you want.

Don’t think that glMultMatrix and similar functions use the GPU.
Surely all drivers have some optimized 3D math for the CPU.

>>As I said, a standardized library would be practical, but it’s not essential, since it’s not clear what to include in it (simple vector and matrix math? of course; quaternions? maybe; physics? collision detection? ray tracing/casting? a scene graph for object/world representation? inverse kinematics?).

Just for 3D math. There are optimized math libraries out there (BLAS), but they are not made for graphics: you get an optimized general matrix multiplication, but nothing specialized for 4x4 matrices etc.

>>Since you do not seem too keen on Geometry.pas (which is really a good starting point), I found the following link with Google: http://www.ddj.com/topics/cpp/libraries/
>>which lists some C++ libraries that may be useful to you. And of course there are OpenSceneGraph and ODE.

You misunderstand me. I have downloaded a lot of libraries which use 3DNow, SSE etc. But they can be used only on Win32 because of the inline assembly (MSVC).
I’ve even found an open source project which wanted to do what I’ve described (optimized 3D math), but it’s not optimized yet and seems to be dead already.

>>>>2. If I write optimized library, I lose
>>>>the platform independency
>>no - you are losing COMPUTER SYSTEM independency. your SSE or 3DNow powered chip is available on linux just as it is on windows.

And platform independency, because of the different assembly formats.

>>edit:
>>>>But the drivers HAVE them…
>>obviously, you are looking for others to do your job *g* i’m pretty sure that all math functions which can be executed on the GPU (since the rise of hardware transform) are actually executed on the GPU. so, would you like to drive your math library with GPU-specific code, if they simply copy and paste the code out of the driver?
>>and: if you are doing this thousands of times each second, i’m again pretty sure that the calling overhead kills any advantage.

  1. Not all math is done on the GPU. Otherwise, why would ATI and NVIDIA write that their drivers use SSE, 3DNow etc., if they don’t need it???
  2. Of course I want to do this job together with others. I would be able to do the work for win32-x86, and that’s all. If OpenGL supported it, then the driver could get the best out of the processor (Athlon, Pentium 4, PowerPC, Athlon 64…).
  3. Please don’t tell me again that the drivers don’t contain such optimized math code unless you are a driver programmer.

>>Then why would ATI and NVIDIA write that their drivers use SSE, 3DNow etc., if they don’t need it???
Because it’s good for marketing to use many good-sounding keywords?
Streaming extensions are great for moving data quickly and maybe they are useful for vertex array data, but one simple matrix multiplication won’t gain too much from them. (Besides, even though I have never written an OpenGL driver, I’m pretty sure that every T&L card implements its matrix stack in hardware.)
Something like OpenMath would be cool, but I really don’t think that it should be included in OpenGL (maybe in GLU, but at least on Win32 systems nobody seems to care about GLU development).

>>Just for 3D math

Well, all of the topics listed above (and a lot more) are connected to “3D math” somehow. But I agree, an easy-to-use library for basic vector and matrix functions would help newcomers. (By the way, Geometry.pas works in Win32 and Linux environments, and processor opcodes are the same in Windows and Linux. You don’t have to use any Visual Studio specific syntax to write assembly routines for Win32 and Linux at the same time, and you don’t have to use Visual Studio under Windows at all.)
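
One way to sidestep compiler-specific inline assembly syntax altogether is to use compiler intrinsics, which MSVC, GCC and the Intel compiler all accept through the same `<xmmintrin.h>` header. The following C++ sketch is only an illustration of that idea, not code from any of the libraries mentioned; `transform_vec4` and the `SKETCH_HAS_SSE` macro are names invented for the example, and it falls back to scalar code where SSE is not enabled at compile time.

```cpp
#include <cassert>

// Enable the SSE path only where the compiler says SSE code may be generated.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// out = M * v for a column-major 4x4 matrix M and a 4-component vector v.
// Unaligned loads are used, so no alignment contract is needed.
// out must not be the same array as v.
inline void transform_vec4(float* out, const float* m, const float* v) {
    assert(out != v);
#if SKETCH_HAS_SSE
    // Each column of M is scaled by one component of v, then the four are summed.
    __m128 r = _mm_mul_ps(_mm_loadu_ps(m + 0), _mm_set1_ps(v[0]));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(m + 4), _mm_set1_ps(v[1])));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(m + 8), _mm_set1_ps(v[2])));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(m + 12), _mm_set1_ps(v[3])));
    _mm_storeu_ps(out, r);
#else
    // Portable scalar fallback with the same result.
    for (int row = 0; row < 4; ++row) {
        out[row] = m[row] * v[0] + m[4 + row] * v[1]
                 + m[8 + row] * v[2] + m[12 + row] * v[3];
    }
#endif
}
```

The same source file builds on Windows and Linux; only the preprocessor check decides which path is compiled.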

Originally posted by mw:
>>>>Then why would ATI and NVIDIA write that their drivers use SSE, 3DNow etc., if they don’t need it???
>>Because it’s good for marketing to use many good-sounding keywords?
>>Streaming extensions are great for moving data quickly and maybe they are useful for vertex array data, but one simple matrix multiplication won’t gain too much from them. (Besides, even though I have never written an OpenGL driver, I’m pretty sure that every T&L card implements its matrix stack in hardware.)

Might be. But ATI and NVIDIA, for example, provide universal drivers, and:

  1. There are software vertex program emulators for their older cards.
  2. All cards fall back to software emulation when used in selection or feedback mode.

So the drivers HAVE what I’ve said.

>>Something like OpenMath would be cool, but I really don’t think that it should be included in OpenGL (maybe in GLU, but at least on Win32 systems nobody seems to care about GLU development).

But why?
I would just like to have a VERY low-level, but possibly optimized, API for
matrix-matrix multiplication, matrix inverse and batched matrix-vector multiplication.
That’s all, no more, no less.

>>>>Just for 3D math
>>Well, all of the topics listed above (and a lot more) are connected to “3D math” somehow. But I agree, an easy-to-use library for basic vector and matrix functions would help newcomers. (By the way, Geometry.pas works in Win32 and Linux environments, and processor opcodes are the same in Windows and Linux. You don’t have to use any Visual Studio specific syntax to write assembly routines for Win32 and Linux at the same time, and you don’t have to use Visual Studio under Windows at all.)

Newcomers need at least this; they only need to copy and paste the basic implementation and they are ready. I need it.
The .pas file is very portable, I only need a Pascal compiler, yes?
Don’t you have a .bas file? I like colorful projects.
To be serious: the biggest advantage would be that you get a simple, small 3D math library on every platform where OpenGL is accessible (even PowerPC or PS2), and I don’t think that implementing it would be a very big problem for the ARB members, who already have this code…

[This message has been edited by Csiki (edited 01-03-2004).]
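
Of the three operations asked for above, the batched matrix-vector multiplication is the one where the call overhead argument matters most, so here is a small C++ sketch of what such an entry point could look like. `Vec4` and `transform_batch` are hypothetical names, not an existing API, and the body is the plain scalar version that an optimized implementation would replace.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 4-component vector type for the example.
struct Vec4 {
    float x, y, z, w;
};

// Transforms `count` vectors by one column-major 4x4 matrix in a single call,
// so the call overhead is paid once per batch instead of once per vertex.
inline void transform_batch(Vec4* dst, const Vec4* src, std::size_t count,
                            const float* m) {
    assert(count == 0 || (dst != nullptr && src != nullptr && m != nullptr));
    for (std::size_t i = 0; i < count; ++i) {
        const Vec4 v = src[i];  // copy first so dst may be the same array as src
        dst[i].x = m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12] * v.w;
        dst[i].y = m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13] * v.w;
        dst[i].z = m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14] * v.w;
        dst[i].w = m[3] * v.x + m[7] * v.y + m[11] * v.z + m[15] * v.w;
    }
}
```

Because each element is copied before it is written, the function also works in place: `transform_batch(verts, verts, n, matrix);`.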

Geometry.pas was just an example for an Open Source library with vector and matrix operations. It’s Open Source, free, portable and there exist versions specialized on certain CPU extensions.

Sadly, many C++ programmers think that Pascal (Delphi, FreePascal, GNU Pascal, …) is somehow limited compared to C++, probably because they never really tried it and did not see that it’s nearly 100% identical, with the exception that Pascal is designed to read more like natural English (even non-programmers can understand easy algorithms written in Pascal).
I wouldn’t say that Pascal is better than C++, but it surely isn’t worse. It’s another flavour of the same thing (at least when writing OpenGL and/or GUI applications), that’s all.
BTW, Basic has its right to exist too. I’m rather surprised to find some Basic programmers using OpenGL here as well, since I would never have thought of using VB.NET for anything other than programming SQL databases, but I think it’s cool and it shows that OpenGL can really be used everywhere.

For a simple and fast, optimized 3D math library on every system: sure, great, why not. But why should it be designed by OpenGL ARB members? Because they know some vector and math routines?
Come on, you find those in any school book, and assembly programming isn’t that hard if you want to optimize (and a simple C++ (or yes, Pascal) version should compile nearly everywhere).
You could start an open source project in this direction if you don’t like the existing libraries. Surely there would be people to contribute optimizations for special cases, since it would not be that much work, but I just don’t see the need for the developers of the OpenGL API to do this.

>>Geometry.pas was just an example for an Open Source library with vector and matrix operations. It’s Open Source, free, portable and there exist versions specialized on certain CPU extensions.

Yes, I know. There is a GLScene project which seems to be very advanced.

>>Sadly, many C++ programmers think that Pascal (Delphi, FreePascal, GNU Pascal, …) is somehow limited compared to C++, probably because they never really tried it and did not see that it’s nearly 100% identical, with the exception that Pascal is designed to read more like natural English (even non-programmers can understand easy algorithms written in Pascal).

I first learned Basic (Commodore Plus/4!!!) and assembly (for the C+4), then Pascal and assembly, and only after that C and C++.
Pascal is not limited compared to C++ (there are only very, very few cases), but C and C++ are the de facto standard.
I don’t like to use more than one language in a project (even if MS says that’s good), that’s why I wrote what I wrote.

>>I wouldn’t say that Pascal is better than C++, but it surely isn’t worse. It’s another flavour of the same thing (at least when writing OpenGL and/or GUI applications), that’s all.
>>BTW, Basic has its right to exist too. I’m rather surprised to find some Basic programmers using OpenGL here as well, since I would never have thought of using VB.NET for anything other than programming SQL databases, but I think it’s cool and it shows that OpenGL can really be used everywhere.

Most imperative languages are going to have the same functionality.

>>For a simple and fast, optimized 3D math library on every system: sure, great, why not. But why should it be designed by OpenGL ARB members? Because they know some vector and math routines?
>>Come on, you find those in any school book, and assembly programming isn’t that hard if you want to optimize (and a simple C++ (or yes, Pascal) version should compile nearly everywhere).

I’m very good at math, I have no problem with it at all.
But please read my posts once from beginning to end: if OpenGL supports some basic math (just the most frequently used operations), then your code will be optimized for ALL processors (including future ones) with a simple recompile.
If your code is portable enough, it will work on any processor and can use that processor’s vector instructions through OpenGL.
For example: I don’t have an Athlon 64. Could you give me one?

>>You could start an open source project in this direction if you don’t like the existing libraries. Surely there would be people to contribute optimizations for special cases, since it would not be that much work, but I just don’t see the need for the developers of the OpenGL API to do this.

It seems that there have been quite a lot of such projects, but all of them are dead.
Do you know why?
They were unable to cover all types of CPU.
For example, they wrote for the Pentium III, but not for the Athlon’s 3DNow. Most of us have only one type of PC…
Intel comes out with a Pentium 4, 5, 6: do you buy it just to update your project? I think not.
But these mega corporations (Intel, AMD, IBM, NVIDIA, ATI) have the opportunity to do such things. And they do, but we cannot access the results.

[This message has been edited by Csiki (edited 01-03-2004).]

@programming languages
Well, you did sound a bit cynical about Pascal and Basic in the previous post; I probably misunderstood you.

@math algorithms
I didn’t want to question your math skills. I just wanted to point out that there is nothing special about a little trigonometry.

@processors
If you really wanted to, you could download documentation and even emulators for nearly every processor OpenGL is used on (you don’t need a physical Itanium or Athlon 64 to program one). You don’t even need to write assembly to get your optimized routines: use the special compilers provided by Intel and AMD (and probably by the vendors of many other processor platforms OpenGL runs on), and you get your processor-specific optimized code from plain C++ source…
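
As a small illustration of that last point, here is the kind of plain C++ loop an optimizing compiler can turn into SIMD code on its own. `scale_add` is just a name made up for this sketch. Whether vector instructions are actually emitted depends on the compiler and its flags (GCC, for instance, enables loop vectorization at `-O3`), so take it as an assumption to verify by looking at the generated assembly.

```cpp
#include <cassert>
#include <cstddef>

// out[i] = a[i] * s + b[i] for i in [0, n). The arrays must not overlap.
// Nothing here is processor specific; the loop is simple enough that a
// vectorizing compiler may process several elements per instruction.
inline void scale_add(float* out, const float* a, const float* b,
                      float s, std::size_t n) {
    assert(n == 0 || (out != nullptr && a != nullptr && b != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] * s + b[i];
    }
}
```

The source stays the same on every platform; only the compiler invocation changes.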

I’d rather think that so many simple (and discontinued) math libraries exist because it’s quite easy to write one, and once you’ve included everything you wanted, or the larger project it was part of has come to an end, you are finished. There is no need for further development (rather than there being no optimized versions for various processors).
If there were a standardized library: fine, and really useful, but I really don’t see the OpenGL ARB as responsible for something like that (even if they would surely do a nice job with it).

@math library
Yes, it’s easy to write a feature-rich math library.
It might happen that some day compilers will automatically use SSE (not just Intel’s)…

@discontinued libraries
No. Most of them stated that the aim was to have an optimized library.
And most of them also stated that they won’t continue because they don’t have the time, enough developers etc., but that the library works…
It’s very hard to keep such a library up to date.

@ARB’s responsibility:
We have OpenGL and OpenAL, and there will be OpenML.
OpenMath3D doesn’t seem to be a big project compared to the others…
There is the BLAS math library, but it’s not exactly what a graphics programmer wants (it’s for scientific use)…

[This message has been edited by Csiki (edited 01-03-2004).]

P.S. Maybe some future extension will allow rendering, e.g., transformed vertices and normal vectors into a buffer object, to be used, e.g., for a texture fetch in a fragment program.
Such a thing could really speed some things up.

However, maybe you should place your request for basic math routines in the OpenGL 2.0 forum (if OpenGL 2.0 ever comes), because I agree that the “advertisement effect” is obviously there (if Direct3D has it, and so on…) and you’re right, it wouldn’t be much work to implement (even if in reality using it would make most code just marginally faster, if at all, since on a new platform you have to recompile your source anyway, and hopefully your compiler isn’t too bad at its job).
However, an extension such as GL_ARB_CLIENT_MATH doesn’t sound too great to me.

@csiki:
computer-system == intel x86, right?
platform == win32, right?
the term “dependency” means, in my translation, “that something cannot work without something else, because it depends on it”, right?

ok, then:
you told me that you “lose platform independency if you write optimized code” - that was your statement.

if you write your optimized code, you are NOT losing platform-independency, but computersystem-independency, because your optimized SSE code will only run on Pentium processors, and your 3DNow code will only run on AMD processors. this is what the term “computersystem-(in)dependency” says!
and on such a computersystem you can run your platform.
so, tell me, what have you lost in this case?
you have actually lost your computersystem-independency, because - open your ears now - it’s your own fault if you are using MSVC as your compiler. use another compiler, DJGPP for example; it runs on other platforms and lets you access SSE/3DNow. so you are mainly losing computersystem-independency, or not? because compiling your code with another compiler “frees” the optimized code from being executable only on windows, as would be the case if you had compiled it with MSVC!

>>Please don’t tell me again that the
>>drivers don’t contain such optimized
>>math code unless you are a driver
>>programmer.
please don’t tell me anything about terms whose meaning i definitely know, unless you are a linguistics professor who surely knows better.
apart from that: you are on a discussion board, which means that each person brings in her/his opinion. if you don’t know it for sure yourself, don’t judge other people for their opinion IF you asked for it.

>>It might happen that some day compilers
>>will automatically use SSE
AHA !!! you didn’t say that before. you talked of writing your own optimized code, not of a compiler which automatically uses the specific instructions. that is a completely different matter!!!

>>We have OpenGL and OpenAL, and there will be
>>OpenML.
it’s already there. it’s called Open Media Library and has nothing to do with math. but i’m sure you knew that before me

[This message has been edited by DJSnow (edited 01-04-2004).]

Pedestrian polymorphism:

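//header:
//Vertex, VertexOpChain, cpu and config are assumed to be defined elsewhere
//in the project; the two typedefs below are assumptions as well
struct Vertex;
struct VertexOpChain;
typedef unsigned char ubyte;
typedef unsigned int uint;

//portable fallback, implemented in plain C/C++
void __stdcall plain_C_execute_vertex_op_chain(ubyte** target,
  const Vertex** sources,
  uint vertex_count,
  const VertexOpChain* op_chain);

//these are implemented in a separate assembly module
extern "C"
{
  void __stdcall x86_execute_vertex_op_chain(ubyte** target,
    const Vertex** sources,
    uint vertex_count,
    const VertexOpChain* op_chain);

  void __stdcall AMD_execute_vertex_op_chain(ubyte** target,
    const Vertex** sources,
    uint vertex_count,
    const VertexOpChain* op_chain);

  void __stdcall SSE_execute_vertex_op_chain(ubyte** target,
    const Vertex** sources,
    uint vertex_count,
    const VertexOpChain* op_chain);
}
//global:
//function pointer member of GeometryPipe; it points at the portable
//fallback until the init code below has run
void (__stdcall* GeometryPipe::convert_verts)(ubyte** target,
  const Vertex** src,
  uint count,
  const VertexOpChain*)=plain_C_execute_vertex_op_chain;

//init code (run once at startup, inside a GeometryPipe member function):
//pick the most specific implementation the CPU supports
if (cpu.got_3dnow())
{
  convert_verts=AMD_execute_vertex_op_chain;
}
else
if (cpu.got_SSE()&&(config.allow_sse))
{
  convert_verts=SSE_execute_vertex_op_chain;
}
else
{
  //generic x86 assembly version
  convert_verts=x86_execute_vertex_op_chain;
}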

The above assumes that the target is an x86. Well, you can easily extend that to other architectures, too, if you have an assembler for the platform. Otherwise just use the plain C fallback implementation (plain_C_execute_vertex_op_chain).

And most important of all, use NASM for every single piece of x86 assembly code you’re ever going to write. NASM can produce object files linkable with all compilers known to man.
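
For readers who want something they can compile as is, here is a self-contained miniature of the same function-pointer dispatch in C++. It is a sketch under a few assumptions: the names `ScaleFn`, `scale_plain`, `scale_sse` and `select_scale` are invented for the example, the SSE variant uses compiler intrinsics instead of a separate NASM module, and the CPU check relies on the GCC/Clang builtin `__builtin_cpu_supports`, so other compilers simply get the plain fallback.

```cpp
#include <cassert>
#include <cstddef>

// Signature shared by all implementations: scale n floats in place.
using ScaleFn = void (*)(float* data, std::size_t n, float factor);

// Portable fallback in plain C++.
inline void scale_plain(float* data, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <xmmintrin.h>
#define SKETCH_X86_GNU 1

// SSE variant. The target attribute lets it compile even when SSE is not
// enabled globally, which is why it may only be called after a CPU check.
__attribute__((target("sse")))
inline void scale_sse(float* data, std::size_t n, float factor) {
    const __m128 f = _mm_set1_ps(factor);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        _mm_storeu_ps(data + i, _mm_mul_ps(_mm_loadu_ps(data + i), f));
    }
    for (; i < n; ++i) {
        data[i] *= factor;  // leftover elements
    }
}
#else
#define SKETCH_X86_GNU 0
#endif

// Picks an implementation once; callers then go through the returned pointer.
inline ScaleFn select_scale() {
#if SKETCH_X86_GNU
    if (__builtin_cpu_supports("sse")) {
        return scale_sse;
    }
#endif
    return scale_plain;
}
```

Typical use is to call `select_scale()` once at startup, keep the pointer, and call through it afterwards, e.g. `ScaleFn scale = select_scale(); scale(values, count, 2.0f);`.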

Platform = x86 + Windows.
If you change the CPU or the OS, you change the platform.
I write programs for x86 now, but why should we assume that this will still be true tomorrow?
I’m not joking when I write PowerPC, Athlon 64 etc.; x86 is not the whole world.

If you use Dev-C++ under Win32 (gcc 3.2 etc.), then you are unable to use the code analyzers.
My big problem specifically: if I use gcc, then AMD CodeAnalyst is unable to load the link information…
If you solve this problem, then you are right and I won’t use MSVC.