Retrieving computed data

I’m trying to do some math calculations using OpenGL. The only approach I have managed to get working is ModelView matrix multiplication. For example, to compute matrix1 * matrix2 = result, I do:
glMatrixMode(GL_MODELVIEW)
glLoadMatrixf(matrix1)
glMultMatrixf(matrix2)
glGetFloatv(GL_MODELVIEW_MATRIX, result)

This is the only way I’ve managed to get calculated results back using the OpenGL interface functions.

Are there any other ways to get results back using OpenGL?
Can I get the results of a transformation done on some vertices I define? (meaning - if ‘v’ is one of the vertices and ‘T’ is the transformation matrix I define, can I get the result of ‘T*v’?)
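
For reference, the products asked about above (matrix1 * matrix2 and T*v) can also be written out directly on the CPU. This is only a minimal sketch in plain C, assuming OpenGL’s column-major matrix layout; the names mat4_mul and mat4_mul_vec4 are made up for illustration and are not OpenGL functions:

```c
#include <assert.h>

/* result = a * b for 4x4 matrices stored column-major (OpenGL's layout):
   the element at row r, column c lives at index c * 4 + r.
   result must not alias a or b. */
static void mat4_mul(const float a[16], const float b[16], float result[16])
{
    assert(result != a && result != b);
    for (int c = 0; c < 4; ++c) {
        for (int r = 0; r < 4; ++r) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + r] * b[c * 4 + k];
            result[c * 4 + r] = sum;
        }
    }
}

/* out = t * v for a column-major 4x4 matrix and a 4-component vector.
   out must not alias v. */
static void mat4_mul_vec4(const float t[16], const float v[4], float out[4])
{
    assert(out != v);
    for (int r = 0; r < 4; ++r)
        out[r] = t[0 * 4 + r] * v[0] + t[1 * 4 + r] * v[1]
               + t[2 * 4 + r] * v[2] + t[3 * 4 + r] * v[3];
}
```

With T set to a translation by (1, 2, 3) and v = (1, 1, 1, 1), mat4_mul_vec4 yields (2, 3, 4, 1).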

Yeah, this is great:

Math Package Critical ERROR: Can’t transform vector because your drivers are not up to date. Download the newest drivers from your card manufacturer’s site.
Error details: GL_ARB_fragment_program is not supported.

But seriously, you’d be better off using SSE or something like that. It is trivial to implement and runs very fast.
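
To give an idea of what the SSE route looks like, here is a rough sketch of a matrix*vector product using compiler intrinsics. It assumes an x86 compiler that provides xmmintrin.h and a column-major matrix as in OpenGL; the function name sse_mat4_mul_vec4 is hypothetical, and this is not the code that was benchmarked in this thread:

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* out = m * v, with m a column-major 4x4 float matrix (OpenGL layout).
   Each column is scaled by one component of v, then the four scaled
   columns are summed. Unaligned loads and stores are used, so the
   caller does not need 16-byte aligned arrays. */
static void sse_mat4_mul_vec4(const float m[16], const float v[4], float out[4])
{
    assert(m && v && out);

    /* Load the four columns of the matrix. */
    __m128 col0 = _mm_loadu_ps(m + 0);
    __m128 col1 = _mm_loadu_ps(m + 4);
    __m128 col2 = _mm_loadu_ps(m + 8);
    __m128 col3 = _mm_loadu_ps(m + 12);

    /* Broadcast each vector component across a full register. */
    __m128 x = _mm_set1_ps(v[0]);
    __m128 y = _mm_set1_ps(v[1]);
    __m128 z = _mm_set1_ps(v[2]);
    __m128 w = _mm_set1_ps(v[3]);

    /* Sum of the scaled columns gives the transformed vector. */
    __m128 r = _mm_add_ps(
        _mm_add_ps(_mm_mul_ps(col0, x), _mm_mul_ps(col1, y)),
        _mm_add_ps(_mm_mul_ps(col2, z), _mm_mul_ps(col3, w)));

    _mm_storeu_ps(out, r);
}
```

A matrix*matrix product can be built from this by treating each column of the right-hand matrix as a vector.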

Thanks for the quick reply, but I’ve compared the calculations between SSE and OpenGL and it seems that OpenGL is faster.
In addition, using OpenGL gives me the advantage of doing the calculations on the graphics card’s processor instead of the main CPU.

Anyway, what’s the error you’ve included in your reply?

I agree. You could pack your vertices into some buffers or even RGB textures, but I think you will lose precision. Additionally, I think full transformations by a matrix can only be done in ARB_fragment_program. You could then use glReadPixels to retrieve the results.

Anyway, if you tested it, could you show the benchmark results? How many SSE instructions do you use to do the transform?

I’m not familiar with the ARB_fragment_program. Can you show me a small example of how to use it?
Regarding the precision issue, I can use doubles instead of floats in order to increase it, can’t I?

I don’t have the exact benchmarking results of the SSE test at this time since it was done by another person, but if you’re still interested in it, I think I can get it in a day or two.
Anyway, from the tests I’ve done on OpenGL, the best result I got was 1 million 4x4 matrix multiplications in 220 milliseconds. This was on a machine with a standard NVidia graphics card. The weird thing was that when I ran the test on a GeForce4 graphics card, I saw no improvement in the results. Do you know of any other card that might improve my results?

I am not sure I remember correctly, but I was able to do 30M matrix muls in one second on a 1.2GHz machine.

Anyway, I will check it tonight, as I am not at home now and don’t have access to my benchmark.

Regards

Of course, I’m talking about SSE performance.

Originally posted by Slider:
Regarding the precision issue, I can use doubles instead of floats in order to increase it, can’t I?
No. You’re bound to the precision limitations inherent to the card. Eg glVertex3dv vs glVertex3fv is just a way of selecting your input format. It’ll be converted to a format natively supported by the hardware.
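
The point about input formats can be illustrated on the CPU alone. The sketch below assumes the hardware works in single precision, which the reply above does not state for any particular card, and simply shows what is lost when a double is rounded to a float; the helper names are made up for illustration:

```c
#include <assert.h>

/* Mimics handing a double-precision input to a pipeline that works in
   single precision: the value is rounded to the nearest float. */
static float to_single(double x)
{
    return (float)x;
}

/* Rounding error introduced by the conversion to single precision. */
static double single_rounding_error(double x)
{
    assert(x == x);  /* NaN has no meaningful rounding error */
    return x - (double)to_single(x);
}
```

Values such as 0.5 survive the conversion exactly, while 0.1 or 16777217.0 do not, no matter whether they were supplied as doubles.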

Anyway, from the tests I’ve done on OpenGL, the best result I got was 1 million 4x4 matrix multiplications in 220 milliseconds.
AMD has example code that performs a vector*matrix mult in 18 cycles. SSE/SSE2 offers similar per clock throughput. Do the math. You’re on the losing end.

This was on a machine with a standard NVidia graphics card. The weird thing was that when I ran the test on a GeForce4 graphics card, I saw no improvement in the results. Do you know of any other card that might improve my results?
That’s because the GL driver does the matrix multiplication on the host CPU. The result is then uploaded to the hardware. You’re gaining nothing.

If you want hw processing, you need to work with vertex and fragment shaders. The primary bottleneck will then be the readback speed, but you’ll have to solve framebuffer precision issues first.

Isn’t the “SuperBuffers” (or ÜberBuffers) extension supposed to offer functionality to render to an array? Wouldn’t this make such a thing possible?

Jan.

OK. Got the bench results. AMD Duron 1.2GHz.
matrix*matrix - 7,000,000 calls per second, 21 SSE instructions and some x86 instructions for loop and pointer loads.
matrix*vector - 30,000,000 calls per second

You say you got 5,000,000 muls per second. On what processor did you run that benchmark?

Anyway, as I pointed out at the beginning, using OGL to do that sort of stuff is pointless.

Regards

It sounds like SSE really is better…
Thanks for all the help and information.
Btw, I’ve tested OpenGL on an Intel 1.8GHz processor.
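
Putting the numbers quoted in this thread side by side: 1 million multiplications in 220 ms is roughly 4.5 million per second, or about 396 clock cycles each at 1.8GHz, while 7,000,000 matrix*matrix calls per second at 1.2GHz works out to about 171 cycles each. The two runs were on different CPUs, so this is only a rough comparison. A small sketch of the arithmetic, with made-up helper names:

```c
#include <assert.h>

/* Operations per second, given a count and the elapsed time in ms. */
static double ops_per_second(double ops, double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    return ops * 1000.0 / elapsed_ms;
}

/* Average clock cycles spent per operation at a given clock rate. */
static double cycles_per_op(double clock_hz, double ops_per_sec)
{
    assert(ops_per_sec > 0.0);
    return clock_hz / ops_per_sec;
}
```

For example, cycles_per_op(1.8e9, ops_per_second(1e6, 220.0)) gives the figure for the OpenGL run, and cycles_per_op(1.2e9, 7e6) the figure for the SSE matrix*matrix run.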

It seems I got a bit confused…
The tests the other guy ran were not using SSE but something called SPL by Intel.
So what is SSE, and how do I get it?

Originally posted by Slider:

So what is SSE, and how do I get it?

If you want to get an idea of what SSE is, take a look at http://firingsquad.gamers.com/hardware/pentium3600/page2.asp
and http://www.codeproject.com/cpp/sseintro.asp

[This message has been edited by jcabeleira (edited 07-23-2003).]

Check out this article

“Introduction to SSE Programming” http://www.codeproject.com/cpp/sseintro.asp

another site introducing SSE in detail: http://x86.ddj.com/articles/sse_pt1/

and, of course, these links:
http://www.cortstratton.org/content/tutorials/OptimizingForSSE.php
http://cedar.intel.com/media/pdf/p4/getting_started.pdf