OpenGL for computer vision

Hi!

I'm doing a small project on augmented reality and want to use OpenGL to render an object that is to be inserted into a movie sequence.

Now the question:

How is the camera matrix defined in OpenGL?

How do I convert the parameters of a camera matrix (as defined in computer vision) to the OpenGL format?

Hi. I, too, am a vision research kinda guy.

What do you mean by converting a computer vision camera matrix? There _is_ no "standard" computer vision matrix. For example, you might see projection matrices in vision written as the identity, with the understanding that points are projected into focal-length-multiplied coordinates.

There are a number of parameters that vision knows about that aren't modeled in OpenGL, at least not explicitly. The optical centre in vision is just the alignment of the viewing axis in OpenGL (where the view axis pierces the viewport). Radial distortion isn't modeled, tho'.

if you want some more thoughts on this, email me? john@cs.adelaide.edu.au

cheers,

John

The correspondence between the transformation matrices in computer graphics and computer vision

1.Coordinate Transformation in Computer Vision

Ignoring lens distortion, the 3*4 projection matrix M of a pinhole camera model usually appears in the computer vision literature in the following form:

$$
Z_e \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= M \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
= \begin{bmatrix} \dfrac{f}{dx/dy} & 0 & u_0 & 0 \\ 0 & f & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
= \begin{bmatrix} \dfrac{f}{dx/dy} & 0 & u_0 & 0 \\ 0 & f & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X_e \\ Y_e \\ Z_e \\ 1 \end{bmatrix}
\tag{1}
$$

$[X_w, Y_w, Z_w, 1]^T$ : world coordinates

$[X_e, Y_e, Z_e, 1]^T$ : eye coordinates

$R$ : 3*3 rotation matrix

$t = [t_0, t_1, t_2]^T$ : translation vector

$(u_0, v_0)$ : optical center in the image

$f$ : effective focal length of the pinhole camera model

$dx, dy$ : the pixel size in the x and y direction

The extrinsic matrix [R t; 0 1] corresponds to the MODELVIEW matrix in OpenGL.

To keep the discussion clean, we leave it out from here on and start from

eye coordinates.

From equation 1, we have:

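$$
\begin{cases}
Z_e\,u = \dfrac{f}{dx/dy}\,X_e + u_0\,Z_e \\
Z_e\,v = f\,Y_e + v_0\,Z_e
\end{cases}
\;\Longrightarrow\;
\begin{cases}
u = \dfrac{f}{dx/dy}\,\dfrac{X_e}{Z_e} + u_0 \\
v = f\,\dfrac{Y_e}{Z_e} + v_0
\end{cases}
\tag{2}
$$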
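As a quick numeric check of equation 2, here is a minimal Python sketch. The function name `project_pinhole` and the sample values are made up for illustration; `pixel_aspect` stands for dx/dy, and f is assumed to be expressed in pixel units.

```python
def project_pinhole(xe, ye, ze, f, pixel_aspect, u0, v0):
    """Equation 2: map an eye-coordinate point to image coordinates (u, v).

    pixel_aspect is dx/dy; f is assumed to be given in pixel units.
    """
    if ze == 0:
        raise ValueError("Ze = 0: the point lies in the plane of the pinhole")
    u = (f / pixel_aspect) * (xe / ze) + u0
    v = f * (ye / ze) + v0
    return u, v


# Hypothetical camera: f = 800 pixels, square pixels, optical center (320, 240).
u, v = project_pinhole(0.5, -0.25, 2.0, f=800.0, pixel_aspect=1.0,
                       u0=320.0, v0=240.0)
print(u, v)
```

Note that in this vision convention visible points have positive Ze.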

2.Coordinate Transformation in Computer Graphics

In this section we consider how the transformation is performed

in OpenGL, the most popular graphics library. As mentioned above, we start

the transformation from eye coordinates.

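$$
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ W_c \end{bmatrix}
= P \begin{bmatrix} X_e \\ Y_e \\ Z_e \\ 1 \end{bmatrix}
\tag{3}
$$

$$
\begin{aligned}
u &= \left(\frac{X_c}{W_c} + 1\right)\frac{\mathit{width}}{2} + x_0 \\
v &= \left(\frac{Y_c}{W_c} + 1\right)\frac{\mathit{height}}{2} + y_0
\end{aligned}
\tag{4}
$$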

Equation 3 formulates the perspective projection. Without loss of

generality, we set the W component of the eye coordinate to 1.

The viewport transform is expressed in equation 4, in which

(u,v) is the screen coordinate and (x0,y0,width,height) are the parameters

passed to the glViewport function call.

If we build our projection matrix using gluPerspective, which is very

convenient and the most frequently used option, we can write out the perspective

matrix P in OpenGL like this:

$$
P = \begin{bmatrix}
\dfrac{\operatorname{ctg}(\mathit{fovy}/2)}{\mathit{aspect}} & 0 & 0 & 0 \\
0 & \operatorname{ctg}(\mathit{fovy}/2) & 0 & 0 \\
0 & 0 & \dfrac{\mathit{zFar}+\mathit{zNear}}{\mathit{zNear}-\mathit{zFar}} & \dfrac{2\,\mathit{zFar}\,\mathit{zNear}}{\mathit{zNear}-\mathit{zFar}} \\
0 & 0 & -1 & 0
\end{bmatrix}
\tag{5}
$$

fovy, aspect, zFar and zNear are the parameters of gluPerspective.

aspect is most often specified as width/height, but it need not

be.

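Now, with all the equations ready, we substitute equation 5 into equation 3 (which gives Wc = -Ze) and the result into equation 4 to rewrite

the (u,v) coordinate:

$$
\begin{cases}
u = \dfrac{\operatorname{ctg}(\mathit{fovy}/2)}{\mathit{aspect}}\left(-\dfrac{X_e}{Z_e}\right)\dfrac{\mathit{width}}{2} + \dfrac{\mathit{width}}{2} + x_0 \\
v = \operatorname{ctg}(\mathit{fovy}/2)\left(-\dfrac{Y_e}{Z_e}\right)\dfrac{\mathit{height}}{2} + \dfrac{\mathit{height}}{2} + y_0
\end{cases}
\tag{6}
$$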
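The substitution can be checked numerically. The sketch below is only an illustration in plain Python, not OpenGL code: `glu_perspective` rebuilds the matrix of equation 5 (gluPerspective takes fovy in degrees), `to_window` applies equations 3 and 4, and `equation_6` evaluates the closed form. All three function names and the sample values are made up for this example.

```python
import math


def glu_perspective(fovy_deg, aspect, z_near, z_far):
    """Matrix P of equation 5, as a list of rows."""
    c = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)  # ctg(fovy/2)
    return [
        [c / aspect, 0.0, 0.0, 0.0],
        [0.0, c, 0.0, 0.0],
        [0.0, 0.0, (z_far + z_near) / (z_near - z_far),
         2.0 * z_far * z_near / (z_near - z_far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def to_window(P, eye, viewport):
    """Equations 3 and 4: eye coordinates -> clip coordinates -> (u, v)."""
    x0, y0, width, height = viewport
    xc, yc, _, wc = [sum(P[r][k] * eye[k] for k in range(4)) for r in range(4)]
    if wc == 0:
        raise ValueError("Wc = 0: the point lies in the eye plane")
    u = (xc / wc + 1.0) * width / 2.0 + x0
    v = (yc / wc + 1.0) * height / 2.0 + y0
    return u, v


def equation_6(fovy_deg, aspect, eye, viewport):
    """Closed form of equation 6."""
    x0, y0, width, height = viewport
    xe, ye, ze, _ = eye
    c = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    u = (c / aspect) * (-xe / ze) * (width / 2.0) + width / 2.0 + x0
    v = c * (-ye / ze) * (height / 2.0) + height / 2.0 + y0
    return u, v


viewport = (0, 0, 640, 480)       # x0, y0, width, height as in glViewport
eye = (0.5, -0.25, -2.0, 1.0)     # visible points have negative Ze in OpenGL
P = glu_perspective(90.0, 640 / 480, 0.1, 100.0)
uv_pipeline = to_window(P, eye, viewport)
uv_closed = equation_6(90.0, 640 / 480, eye, viewport)
print(uv_pipeline, uv_closed)
```

Both routes should give the same (u, v) up to rounding, which is all equation 6 claims.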

3.Wrap them up

With this analysis, things become clearer. What remains to be

done is the comparison of equations 2 and 6.

You may notice that there is a negative sign in equation 6. That's

not surprising: the OpenGL eye coordinate system is right-handed and the camera looks down its negative Z axis, so visible points have Ze < 0,

while the screen coordinate system is left-handed.

Neglecting the negative sign, we obtain the following correspondence:

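$$
\begin{cases}
u_0 = \dfrac{\mathit{width}}{2} + x_0 \\
v_0 = \dfrac{\mathit{height}}{2} + y_0
\end{cases}
\tag{7}
$$

$$
\begin{cases}
\dfrac{f}{dx/dy} = \dfrac{\operatorname{ctg}(\mathit{fovy}/2)\,\mathit{width}}{2\,\mathit{aspect}} \\
f = \dfrac{\operatorname{ctg}(\mathit{fovy}/2)\,\mathit{height}}{2}
\end{cases}
\tag{8}
$$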
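Solving equation 8 for the gluPerspective arguments gives fovy = 2*arctan(height/(2*f)) and aspect = (width/height)*(dx/dy). A small sketch of that inversion follows; the function name `intrinsics_to_glu_perspective` and the sample numbers are hypothetical, and f is assumed to be in pixel units. It only applies when the optical center satisfies equation 7, because gluPerspective always builds a frustum that is symmetric about the view axis.

```python
import math


def intrinsics_to_glu_perspective(f, pixel_aspect, width, height):
    """Invert equation 8: return (fovy in degrees, aspect) for gluPerspective.

    pixel_aspect is dx/dy. Assumes the optical center sits at the viewport
    center (equation 7).
    """
    if f <= 0 or width <= 0 or height <= 0:
        raise ValueError("f, width and height must be positive")
    fovy_deg = math.degrees(2.0 * math.atan(height / (2.0 * f)))
    aspect = (width / height) * pixel_aspect
    return fovy_deg, aspect


# Hypothetical calibration result: f = 240 pixels on a 640x480 image.
fovy_deg, aspect = intrinsics_to_glu_perspective(240.0, 1.0, 640, 480)
print(fovy_deg, aspect)
```

The two returned values would then be passed as the first two arguments of gluPerspective, with zNear and zFar chosen to suit the scene.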

In some special cases the correspondence can be simplified.

a. x0=y0=0

This is also the default behavior of the OpenGL viewport transformation.

Equation 7 simplifies to:

$$
\begin{cases}
u_0 = \mathit{width}/2 \\
v_0 = \mathit{height}/2
\end{cases}
$$

You will probably find this case familiar, because in computer vision (CV) we

often assume that the optical center of the camera is at the center of the image.

b. dx/dy=1

This means the image analyzed in CV is isotropic: the pixel sizes in the

horizontal and vertical directions are the same.

Then the first line of equation 8 becomes f=(ctg(fovy/2)*width)/(2*aspect). Comparing it

with the second line, we can derive aspect=width/height. This is the most frequently used

setting with gluPerspective!


[This message has been edited by inet (edited 08-23-2000).]


I'm sorry about the formatting of my last post, but I really don't know how to make it better (though I have a decent-looking version on my own computer).

If you're interested in the content but confused by the formatting, you can just email me (ming@mpi-sb.mpg.de).

Any suggestions and comments on this essay are welcome.

Nice!

This is exactly what I was looking for!

Thank you very much!

/Per Åstrand

[This message has been edited by pas (edited 08-23-2000).]

Kilam Malik

08-23-2000, 06:27 AM

You can use UBB codes to format your source. I can't type them in here; go to edit/delete message to see them:

class Acme
{
public:
    Acme(int _a) { a = _a; }
    ~Acme();

protected:
    int a;
};

y-e-r... you just described the classic pinhole camera model. It's not the ONLY model used in computer vision, you know.

check out:

Roger Y. Tsai, "An Efficient and Accurate Camera Calibration Technique for 3D Machine Vision", Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, 1986, pages 364-374.

and

Roger Y. Tsai, "A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses", IEEE Journal of Robotics and Automation, Vol. RA-3, No. 4, August 1987, pages 323-344.

straight from tsai's calibration distribution:

1 - What is Tsai's camera model?

Tsai's camera model is based on the pin-hole model of 3D-2D perspective projection with 1st order radial lens distortion. The model has 11 parameters: five internal (also called intrinsic or interior) parameters:

    f      - effective focal length of the pin-hole camera,
    kappa1 - 1st order radial lens distortion coefficient,
    Cx, Cy - coordinates of center of radial lens distortion -and- the piercing point of the camera coordinate frame's Z axis with the camera's sensor plane,
    sx     - scale factor to account for any uncertainty in the framegrabber's resampling of the horizontal scanline.

and six external (also called extrinsic or exterior) parameters:

    Rx, Ry, Rz - rotation angles for the transform between the world and camera coordinate frames,
    Tx, Ty, Tz - translational components for the transform between the world and camera coordinate frames.

In addition to the 11 variable camera parameters Tsai's model has six fixed intrinsic camera constants:

    Ncx - number of sensor elements in camera's x direction (in sels),
    Nfx - number of pixels in frame grabber's x direction (in pixels),
    dx  - X dimension of camera's sensor element (in mm/sel),
    dy  - Y dimension of camera's sensor element (in mm/sel),
    dpx - effective X dimension of pixel in frame grabber (in mm/pixel), and
    dpy - effective Y dimension of pixel in frame grabber (in mm/pixel).
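To make the role of kappa1 concrete, here is a rough Python sketch of a 1st order radial distortion term. The thread itself does not give the formula; the form used here, Xu = Xd*(1 + kappa1*r^2) with r^2 = Xd^2 + Yd^2 in sensor-plane coordinates, is the commonly quoted one for Tsai's model and should be treated as an assumption, as should the function name `undistort_sensor_point` and the sample numbers.

```python
def undistort_sensor_point(xd, yd, kappa1):
    """Map distorted sensor-plane coordinates (Xd, Yd) to undistorted (Xu, Yu).

    Assumed form of 1st order radial distortion:
        Xu = Xd * (1 + kappa1 * r^2),  Yu = Yd * (1 + kappa1 * r^2),
    with r^2 = Xd^2 + Yd^2 measured from the distortion center.
    """
    r2 = xd * xd + yd * yd
    factor = 1.0 + kappa1 * r2
    return xd * factor, yd * factor


# Made-up values: a point 1 mm right and 2 mm up from the distortion center.
xu, yu = undistort_sensor_point(1.0, 2.0, kappa1=0.01)
print(xu, yu)
```

With kappa1 = 0 the point is unchanged and the model falls back to the plain pinhole camera, which is the case OpenGL's projection matrix can represent.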

and another thing: some computer vision research uses the affine camera model.

cheers,

John

[This message has been edited by john (edited 08-23-2000).]

Thanks for your comments.

Anyway, you "can" use a more complex camera model, but the method of derivation is the same.

Tsai's camera model in essence considers the 1st order distortion of the lens.

As for the affine camera, it only adds one skewness parameter.

Giancarlo

10-19-2009, 08:15 AM

Guys,

sorry to disturb you, but I'm working on an augmented reality project for medical applications.

In all the literature I've found, I always see the same model (pinhole), from which they derive all the intrinsic and extrinsic parameters. They derive the projection matrix and, as the object moves in the scene, they overlay the virtual object. But

what if I move the camera? I've never understood whether I can move the camera or not, since in all the works I found the camera is stationary.

thanks.

giancarlo

ZbuffeR

10-19-2009, 08:19 AM

Intrinsic camera parameters do not change when the camera moves. They belong to the "inside" of the camera.

Extrinsic parameters change when the camera moves. They belong to the "outside" of the camera, i.e. its position and rotation in the world.

So when moving the camera, you have to update the extrinsic params.
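In OpenGL terms that means the projection matrix stays fixed and only the MODELVIEW matrix is rebuilt for each new pose. Below is a minimal sketch, assuming R and t come from a vision-style calibration in which the camera looks along +Z with Y pointing down (that convention is an assumption, not something stated in this thread), so the Y and Z rows are negated to match OpenGL's eye space. The function name `extrinsics_to_modelview` is made up; the result is a flat list of 16 values in column-major order, the layout that glLoadMatrixf and glLoadMatrixd expect.

```python
def extrinsics_to_modelview(R, t):
    """Build a column-major 4x4 MODELVIEW from a 3x3 rotation R and translation t.

    Assumption: R, t map world points into a vision camera frame that looks
    along +Z with Y down. OpenGL's eye frame looks along -Z with Y up, so the
    Y and Z rows are negated.
    """
    flip = (1.0, -1.0, -1.0)
    rows = [[flip[r] * R[r][c] for c in range(3)] + [flip[r] * t[r]]
            for r in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    # Flatten column by column (OpenGL's memory layout).
    return [rows[r][c] for c in range(4) for r in range(4)]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Two hypothetical camera poses: same rotation, different translation.
modelview_a = extrinsics_to_modelview(identity, (1.0, 2.0, 3.0))
modelview_b = extrinsics_to_modelview(identity, (1.0, 2.0, 5.0))
print(modelview_a)
print(modelview_b)
```

Each frame you would re-estimate R and t (by tracking or calibration), rebuild this array, and load it as the MODELVIEW matrix, leaving the intrinsics-derived projection untouched.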
