Multiply 2 big matrixes using OpenGL

hadi_hadizadeh

07-02-2006, 12:32 PM

Hi, is it possible to multiply two big matrices using OpenGL as fast as possible? In other words, I want to do it on the GPU. Thanks.

ZbuffeR

07-02-2006, 12:42 PM

It is not the intended use, especially if you want precise results.

Anyway, here is a good site about "General-Purpose computation on GPUs":

http://www.gpgpu.org/

http://www.gpgpu.org/wiki/FAQ

hadi_hadizadeh

07-02-2006, 01:02 PM

Why do you think it is not the intended use? In fact, I want to multiply two double-precision matrices faster than simple, ordinary code would. Someone said that OpenGL can be used as an interface to the GPU. Is that true?

ZbuffeR

07-02-2006, 01:16 PM

Of course it is possible, and for GPGPU computations OpenGL is a good API to use, see the FAQ link I posted for more details.

What I mean is that graphics accelerators are not intended to be used as high-performance scientific clusters. Basically, you will have to trade off precision and flexibility to get cheap processing power.

EDIT: i.e. it is already hard to get float precision; for double it will be even harder.

hadi_hadizadeh

07-02-2006, 01:32 PM

OK. By using OpenGL as an interface for GPGPU programming, I think there is no need to deal with the GPU directly. I mean that we write our code in OpenGL and it is OpenGL that runs the code on the GPU. Is that true? If so, could you please give me an example showing how I can multiply two matrices in OpenGL? Thank you very much!

ZbuffeR

07-02-2006, 01:48 PM

Read The Links.

Jan

07-02-2006, 03:48 PM

1. RTFM

2. What is your exact definition of "big"? Don't tell me it's 4x4.

Mars_999

07-02-2006, 06:39 PM

Originally posted by Jan:

1. RTFM

2. What is your exact definition of "big"? Don't tell me it's 4x4.

1. OUCH

:)

Flavious

07-02-2006, 08:52 PM

The links that Zbuffer provided are quite good.

Anyway, to my knowledge there is no native support for matrix multiplication beyond 4x4. However, matrix multiplication is inherently vectorizable, meaning it is possible to compute the product of matrices as the product of sub-matrices or vectors.

For instance, a 4x4 matrix-vector product can be vectorized as 4 vector scales and 4 vector adds. A matrix-matrix multiply can likewise be coded as 4 matrix-vector multiplies. An NxM by MxQ multiply can perhaps be optimally coded as a combination of 4x4 sub-matrix products. For starters, I would try to decompose each input matrix into a blocked matrix consisting of 4x4 sub-matrix entries. For example, given two 8x8 matrices P and Q,

$$PQ = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} E & F \\ G & H \end{pmatrix} = \begin{pmatrix} AE+BG & AF+BH \\ CE+DG & CF+DH \end{pmatrix}$$

where A, B, C and D are the 4x4 sub-matrices of P, and E, F, G and H are the 4x4 sub-matrices of Q. If your matrix dimensions are not already multiples of 4, consider padding them with zeros as necessary.
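To make the block decomposition concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not GPU code: the helper names (`matmul`, `block`, `blocked_matmul`, `pad_to_multiple`) are made up for this example, and matrices are assumed to be square lists of rows whose size is a multiple of the block size.

```python
def matmul(a, b):
    """Naive product of two matrices stored as lists of rows."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matadd(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block(m, bi, bj, size=4):
    """Return the size x size sub-matrix at block row bi, block column bj."""
    return [row[bj * size:(bj + 1) * size]
            for row in m[bi * size:(bi + 1) * size]]


def blocked_matmul(p, q, size=4):
    """Multiply square matrices whose dimension is a multiple of `size`,
    one size x size block at a time (e.g. AE + BG for the top-left block)."""
    n = len(p)
    nb = n // size
    result = [[0] * n for _ in range(n)]
    for bi in range(nb):
        for bj in range(nb):
            acc = [[0] * size for _ in range(size)]
            for bk in range(nb):
                acc = matadd(acc, matmul(block(p, bi, bk, size),
                                         block(q, bk, bj, size)))
            # Copy the finished block into its place in the result.
            for i in range(size):
                result[bi * size + i][bj * size:(bj + 1) * size] = acc[i]
    return result


def pad_to_multiple(m, size=4):
    """Zero-pad a matrix so both dimensions become multiples of `size`."""
    rows = -(-len(m) // size) * size
    cols = -(-len(m[0]) // size) * size
    padded = [list(row) + [0] * (cols - len(row)) for row in m]
    return padded + [[0] * cols for _ in range(rows - len(m))]
```

For two 8x8 matrices, `blocked_matmul(p, q)` should give the same result as `matmul(p, q)`; the blocked version just groups the work into 4x4 products, which is the shape a 4-wide vector unit handles well.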

At any rate, the trick to computational speed with GPUs is vectorizing your operations. Bear in mind that this speed has to be weighed against the time needed to transfer the data to and from the GPU, assuming that you need the results in your app.

Alternatively, SSE or 3DNow! intrinsics could be leveraged in place of GPU processing, scenario permitting. You may find an introduction to these intrinsics helpful in understanding the general concepts of SIMD architectures.

I hope this helps.

hadi_hadizadeh

07-03-2006, 05:40 AM

I want to multiply a 50000x9 matrix by a 9x9 matrix. Is it possible to do that on the GPU? If so, is it possible to do it using OpenGL?

hadi_hadizadeh

07-03-2006, 05:47 AM

Also, what do RTFM and OUCH mean?!

Zengar

07-03-2006, 07:06 AM

Look here: http://www.gaarde.org/acronyms/

OUCH means just ouch ;-)

It is (potentially) possible to do it on the GPU, why not? But it would be very difficult to get good precision, because GPUs usually operate at less than 32-bit precision, so you will need some special methods (no idea which, I always skipped my numerics lectures :-o ) if you want double precision.
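To get a feel for the precision concern, the following Python sketch compares a dot product accumulated in double precision with one where every intermediate is rounded to 32-bit float. This is an assumption-laden CPU emulation using the standard `struct` module, with hypothetical helper names; it does not model how any particular GPU rounds, and the hardware of the time may well behave worse.

```python
import struct


def to_float32(x):
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_double(xs, ys):
    """Dot product accumulated in double precision."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_single(xs, ys):
    """Same dot product, but inputs and every intermediate are rounded to 32 bits."""
    total = 0.0
    for x, y in zip(xs, ys):
        product = to_float32(to_float32(x) * to_float32(y))
        total = to_float32(total + product)
    return total


if __name__ == "__main__":
    # One row-times-column of a 9-wide product, as in the 50000x9 by 9x9 case.
    xs = [0.1] * 9
    ys = [0.1] * 9
    print("double:", repr(dot_double(xs, ys)))
    print("single:", repr(dot_single(xs, ys)))
```

With only 9 terms per output element the single-precision error stays small, so whether it matters depends on how much accuracy the application really needs.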

Besides, you've got your link, haven't you? I am not sure that anyone here has ever done something similar; it's too much effort. In the end, it is up to you.

I think that it is not worth the effort. You'll spend weeks trying to get good precision, and in the end your software will probably be overly complicated, unstable and extremely hard to debug. I doubt that you will get a significant performance increase this way. Stick to SSE on the CPU; this is my advice.

BTW, why do you need such huge matrices oO? If not a secret...

UrbanLegend

07-03-2006, 09:08 AM

RTFM = Read the Flipping Manual ( the F can be replaced with other words :) )

I don't think you will get any benefit out of using the GPU here

SSE is the way to go IMO. Here at my current company we have our own SSE implementations that are significantly faster than C++ or Intel's fast math library.

hadi_hadizadeh

07-03-2006, 09:26 AM

Many thanks for your replies. I am working on a real-time image processing task, and I need a fast matrix multiplication routine because it is the bottleneck of my algorithm. My programming language is Delphi. Do you know of any way to use SSE in Delphi?

Zengar

07-03-2006, 02:08 PM

From Delphi 2005 on, SSE instructions are supported by the assembler (they are there in 2006 for sure). You can also use Free Pascal. However, all SSE code then has to be written by hand. If I am not mistaken, AMD has a free math library utilising 3DNow! and SSE instructions; look for it on their developer site. Unfortunately, I know of no tutorials on SSE, so I can't help you there. If you are new to assembly, you will have a rough time though. It is still better than using the GPU, IMHO. Just google or ask on Intel's or AMD's developer forums; you will get some good advice, I guess.

And, BTW, don't expect good performance. Even with very good optimisations, this will probably take hours to compute. Your best option would be using some specialised scientific hardware, or a Cell-like CPU :-)

Overmind

07-03-2006, 02:49 PM

Originally posted by UrbanLegend:

RTFM = Read the Flipping Manual

I prefer "Read the Fine Manual" ;)

Originally posted by Zengar:

Even with very good optimisations, this will probably take hours to compute.

Huh? 50000 * 9 * 9 == 4050000.

4M multiplications should run in under a second, and that's without even the most basic optimisations.
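The count above can be checked with a naive triple loop. The sketch below is plain Python with made-up function names, so its timing says nothing about what compiled Delphi or C++ code would achieve; it only confirms that a 50000x9 by 9x9 product needs 50000 * 9 * 9 scalar multiplications.

```python
import time


def multiplication_count(rows, inner, cols):
    """Scalar multiplications in a naive (rows x inner) by (inner x cols) product."""
    return rows * inner * cols


def matmul_counting(a, b):
    """Naive triple loop; returns (product, number of scalar multiplications)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    mults = 0
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                acc += a[i][k] * b[k][j]
                mults += 1
            out[i][j] = acc
    return out, mults


def time_naive(rows=50000, inner=9, cols=9):
    """Time one naive product of the size discussed in this thread."""
    a = [[float(i + k) for k in range(inner)] for i in range(rows)]
    b = [[float(k * j) for j in range(cols)] for k in range(inner)]
    start = time.perf_counter()
    _, mults = matmul_counting(a, b)
    return mults, time.perf_counter() - start


if __name__ == "__main__":
    mults, seconds = time_naive()
    print(mults, "multiplications in", round(seconds, 3), "s (interpreted Python)")
```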

As for gaining real-time performance, I think the best bet is to invest in parallel execution (multiple CPUs or even a cluster). The standard matrix multiplication algorithm is inherently parallel: every component of the result is computed separately.
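Because each output row depends only on one input row and the small right-hand matrix, the work can be split by rows. The Python sketch below shows that decomposition with `concurrent.futures.ThreadPoolExecutor`; the function names and the `workers` parameter are assumptions for illustration. Note that CPython threads will not actually speed up this pure-Python arithmetic, so treat it as a picture of how the work divides, not as a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(rows, b):
    """Multiply a slice of rows of A by the full matrix B."""
    inner, cols = len(b), len(b[0])
    return [[sum(r[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for r in rows]


def parallel_matmul(a, b, workers=4):
    """Split A into contiguous row chunks and multiply each chunk independently."""
    chunk = max(1, -(-len(a) // workers))  # ceiling division
    slices = [a[i:i + chunk] for i in range(0, len(a), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the partial results in the same order as the slices.
        parts = pool.map(lambda s: multiply_rows(s, b), slices)
        return [row for part in parts for row in part]
```

The same row split would carry over to multiple processes or machines, since no chunk needs results from any other chunk.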

Sure, SSE will help a lot, too, but I'm not sure if it will be enough...

dom_unido

07-03-2006, 03:05 PM

I answered on GPGPU.org in case someone is interested.

http://www.gpgpu.org/forums/viewtopic.php?p=12283#12283

The AMD/Intel math libs are really, really slow compared to ATLAS or GotoBLAS, btw.

Bob

07-03-2006, 05:54 PM

Originally posted by Overmind:

Even with very good optimisations, this will probably take hours to compute.

Huh? 50000 * 9 * 9 == 4050000.

4M multiplications should run in under a second, and that's without even the most basic optimisations.

I think someone is really underestimating the power of today's computers. Timing a matrix multiplication of that size in MATLAB tells me it's done in about 10 milliseconds. I've got an AMD64 3500+, so it's fairly modern, but still nowhere near "hours". :cool:

Zengar

07-03-2006, 10:04 PM

Wow, I thought that the computational overhead would be much higher oO

I guess it is because I have an inexplicable fear of large computations. Somehow I always think that the number of operations will grow exponentially *embarrassed*

Well, nvm

hadi_hadizadeh

07-03-2006, 11:39 PM

Zengar said that Delphi 2005 supports SSE internally. If that is true, then I think I can compile my code in Delphi 2005 and there is no need to deal with the SSE instructions myself. Do you agree? In my current code, a matrix multiplication of that size takes about 100 ms on my Pentium 4 (Celeron, 2.8 GHz, 512 MB), and I am wondering how Matlab can do it in about 10 ms! As you know, Matlab is very slow compared to native code, since it is a utility! Would you suggest AMD or Intel processors for this purpose?

Zengar

07-04-2006, 01:07 AM

No, I didn't say that :-/ I said that the assembler supports them, so you can use SSE instructions within asm ... end blocks. You still have to hand-write it...

Komat

07-04-2006, 01:41 AM

Originally posted by hadi_hadizadeh:

In my current code, a matrix multiplication of that size takes about 100 ms on my Pentium 4 (Celeron, 2.8 GHz, 512 MB), and I am wondering how Matlab can do it in about 10 ms!

Matlab's matrix multiplication is likely heavily optimized; it is probably using algorithms that make better use of CPU caches.

Bob

07-04-2006, 03:17 AM

Originally posted by hadi_hadizadeh:

As you know, Matlab is very slow compared to native code, since it is a utility!

I'm sorry, but you have no clue what you're talking about. I use MATLAB daily at work for research, and I can assure you that you will have a very hard time beating MATLAB at anything numeric involving heavily vectorized code, provided you write proper MATLAB code. I have tried to beat it in C++ and you CAN do it, but the effort just isn't worth it, as development in MATLAB is so much faster and safer.

You see, a matrix multiplication is executed in purely native code in MATLAB. It's not as if it were implemented with three nested for loops in MATLAB's own scripting language. For that reason, you can't really beat MATLAB in those areas, as MATLAB already performs about as well as the computer can.
