Retrieving computed data

Slider · July 21, 2003, 1:52am

I’m trying to do some math calculations using OpenGL. The only thing I’ve managed to use for my calculations is ModelView matrix multiplication. For example, to compute matrix1 * matrix2 = result, I do:
glMatrixMode(GL_MODELVIEW);
glLoadMatrixf(matrix1);
glMultMatrixf(matrix2);
glGetFloatv(GL_MODELVIEW_MATRIX, result);

This is the only way I’ve managed to get calculated results back using the OpenGL interface functions.

Are there any other ways to get results back using OpenGL?
Can I get the results of a transformation applied to vertices I define? (That is, if v is one of the vertices and T is a transformation matrix I define, can I get the result of T*v?)

MichaelK · July 21, 2003, 2:22am

Yeah, this is great:

Math Package Critical ERROR: Can’t transform vector because your drivers are not up to date. Download the newest drivers from your card manufacturer’s site.
Error details: GL_ARB_fragment_program is not supported.

But seriously, you’d be better off using SSE or something like that. It is trivial to implement and runs damn fast.

Slider · July 21, 2003, 3:04am

Thanks for the quick reply, but I’ve compared the calculations in SSE and OpenGL, and OpenGL seems to be faster.
In addition, using OpenGL gives me the advantage of doing the calculations on the graphics card’s processor (GPU) instead of the main CPU.

Anyway, what’s the error message you included in your reply?

MichaelK · July 21, 2003, 3:17am

I agree. You could, in principle, pack your vertices into buffers or even RGB textures, but I think you will lose precision. Also, I think full matrix transformations can only be done with ARB_fragment_program. You could then use glReadPixels to retrieve the results.

Anyway, since you tested it, could you post the benchmark results? How many SSE instructions do you use per transform?

Slider · July 21, 2003, 3:51am

I’m not familiar with ARB_fragment_program. Can you show me a small example of how to use it?
Regarding the precision issue, I can use doubles instead of floats in order to increase it, can’t I?

I don’t have the exact SSE benchmark results right now since the test was run by someone else, but if you’re still interested, I think I can get them in a day or two.
Anyway, from the tests I’ve done with OpenGL, the best result I got was 1 million 4x4 matrix multiplications in 220 milliseconds. This was on a machine with a standard NVIDIA graphics card. The weird thing was that when I ran the test on a GeForce4 graphics card, I saw no improvement in the results. Do you know of any other card that might improve my results?

MichaelK · July 21, 2003, 3:59am

I’m not sure I remember correctly, but I was able to do 30M matrix multiplications in one second on a 1.2 GHz machine.

Anyway, I will check it tonight; I’m not at home now and don’t have access to my benchmark.

Regards

MichaelK · July 21, 2003, 4:02am

Of course, I’m talking about SSE performance.

zeckensack · July 21, 2003, 4:30am

Originally posted by Slider:
Regarding the precision issue, I can use doubles instead of floats in order to increase it, can’t I?
No. You’re bound by the precision limitations inherent to the card. For example, glVertex3dv vs. glVertex3fv is just a way of selecting your input format; the data will be converted to a format natively supported by the hardware.

Anyway, from the tests I’ve done with OpenGL, the best result I got was 1 million 4x4 matrix multiplications in 220 milliseconds.
AMD has example code that performs a vector*matrix multiplication in 18 cycles. SSE/SSE2 offers similar per-clock throughput. Do the math: you’re on the losing end.
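The "do the math" step can be made explicit. The rough estimate below assumes a 4x4 matrix*matrix product costs about four vector*matrix products (one per column of the result) and ignores loop and load overhead.

```latex
\begin{aligned}
\text{OpenGL rate} &= \frac{10^{6}\ \text{products}}{0.22\ \mathrm{s}} \approx 4.55\times10^{6}\ \text{products/s} \\
\text{CPU cost per product} &\approx 4 \times 18 = 72\ \text{cycles} \\
\text{clock needed to match} &\approx 4.55\times10^{6} \times 72 \approx 3.3\times10^{8}\ \mathrm{Hz} \approx 330\ \mathrm{MHz}
\end{aligned}
```

Under these assumptions, a CPU of roughly 330 MHz running such code would already keep pace with the 220 ms OpenGL result.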

This was on a machine with a standard NVIDIA graphics card. The weird thing was that when I ran the test on a GeForce4 graphics card, I saw no improvement in the results. Do you know of any other card that might improve my results?
That’s because the GL driver does the matrix multiplication on the host CPU. The result is then uploaded to the hardware. You’re gaining nothing.

If you want hardware processing, you need to work with vertex and fragment shaders. The primary bottleneck will then be readback speed, but you’ll have to solve framebuffer precision issues first.

Jan · July 21, 2003, 10:07am

Isn’t the “SuperBuffers” (or ÜberBuffers) extension supposed to offer functionality to render to an array? Wouldn’t this make such a thing possible?

Jan.

MichaelK · July 22, 2003, 2:35am

OK. Got the bench results. AMD Duron 1.2GHz.
matrix*matrix: 7,000,000 calls per second (21 SSE instructions, plus some x86 instructions for the loop and pointer loads)
matrix*vector: 30,000,000 calls per second

You say you got about 5,000,000 multiplications per second. What processor did you run that benchmark on?

Anyway, as I pointed out at the beginning, using OpenGL for this sort of thing is pointless.

Regards

Slider · July 22, 2003, 9:26pm

It sounds like SSE really is better…
Thanks for all the help and information.
By the way, I tested OpenGL on a 1.8 GHz Intel processor.
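Converting the figures posted in this thread to cycles per call gives a rough comparison. The processors differ (a 1.8 GHz Intel CPU versus a 1.2 GHz Duron), so treat these as ballpark numbers only.

```latex
\begin{aligned}
\text{OpenGL, 1.8 GHz:}\quad & \frac{1.8\times10^{9}\ \text{cycles/s} \times 0.22\ \mathrm{s}}{10^{6}\ \text{products}} \approx 396\ \text{cycles per } 4\times4 \text{ product} \\
\text{SSE, 1.2 GHz Duron:}\quad & \frac{1.2\times10^{9}\ \text{cycles/s}}{7\times10^{6}\ \text{products/s}} \approx 171\ \text{cycles per } 4\times4 \text{ product} \\
& \frac{1.2\times10^{9}\ \text{cycles/s}}{30\times10^{6}\ \text{calls/s}} = 40\ \text{cycles per matrix*vector}
\end{aligned}
```

On these numbers, the OpenGL path spends more than twice as many cycles per matrix product as the SSE code.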

Slider · July 23, 2003, 5:09am

It seems I got a bit confused…
The tests the other guy ran didn’t use SSE but something from Intel called SPL.
So what is SSE, and how do I get it?

jcabeleira · July 23, 2003, 5:49am

Originally posted by Slider:

So what is SSE, and how do I get it?

If you want to get an idea of what SSE is, take a look at http://firingsquad.gamers.com/hardware/pentium3600/page2.asp
and http://www.codeproject.com/cpp/sseintro.asp

orhunbirsoy · July 23, 2003, 5:51am

Check out this article

“Introduction to SSE Programming” http://www.codeproject.com/cpp/sseintro.asp

jcabeleira · July 23, 2003, 6:05am

Another site introducing SSE in detail: http://x86.ddj.com/articles/sse_pt1/

DJSnow · July 23, 2003, 9:54am

And, of course, these links:
http://www.cortstratton.org/content/tutorials/OptimizingForSSE.php
http://cedar.intel.com/media/pdf/p4/getting_started.pdf