Adding Depth Data from RGB-D Sensor to the z-buffer of OpenGL for Occlusion in AR

Hi guys!
I want to achieve occlusion effects using an RGB-D sensor in my augmented reality app, just like in this video:

The camera may be moving around a marker.

Can you please tell me what is the methodology to achieve this and what algorithm should I use?

I read that one solution is to somehow fill the depth buffer of openGL with the depth data from the sensor.
I still don’t get it though.
Right now I render a 2D textured quad with video stream from the sensor and virtual objects on top of a marker.

What exactly should I do? Are there any projects or code samples that might help? I can’t find any step-by-step tutorial on this.

I was told to upload the depth data as a texture and bind it as the depth buffer for the render target.
This would require matching the near and far planes of the projection matrix with the min and max values of the depth sensor.

I just don’t get it though. If my depth buffer in OpenGL sees a depth value of 0.2, for example, how will it know that it has to draw a pixel that belongs to the video stream and not a pixel that belongs to a virtual object?

[QUOTE=mbikos]I was told to upload the depth data as a texture and bind it as the depth buffer for the render target.
This would require matching the near and far planes of the projection matrix with the min and max values of the depth sensor.[/QUOTE]

Yes, but you’re going to have to do that anyway. That’s the only way you can have rendered elements match with, and be obscured by, existing photographed elements.

BTW, when they said to do this, I think they were also assuming that you’d be uploading the “video stream” pixel data to a texture and also using that as your color render target. So you would have the video color and depth data as your framebuffer.

[QUOTE=mbikos]I just don’t get it though. If my depth buffer in OpenGL sees a depth value of 0.2, for example, how will it know that it has to draw a pixel that belongs to the video stream and not a pixel that belongs to a virtual object?[/QUOTE]

Because that’s how a depth buffer works.

When you draw a triangle, the depth values for each fragment are written to the depth buffer. When you draw a second triangle, the depth value at each appropriate pixel is read and compared with the depth value from the fragment. If the fragment’s depth is “behind” the pixel’s depth, then it is not rendered.

The only difference is what is generating the initial depth data. In your case, it’s data pulled from photographed elements. You just need to put that data into the depth buffer. And if your depth ranges match, then the photographed depths will correspond to the scene-rendered depths.

The rest is just regular depth buffer operations.
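
For example, the per-frame order could look something like this (drawVideoQuad, uploadSensorDepth and drawVirtualObjects are just placeholders for whatever you already do in those steps):

glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

drawVideoQuad();           // 1. color: the camera image (placeholder for your textured quad)
uploadSensorDepth();       // 2. depth: the sensor data, converted to the 0..1 depth range

glEnable(GL_DEPTH_TEST);   // 3. the virtual objects are depth-tested against the sensor depth,
glDepthFunc(GL_LESS);      //    so real surfaces that are closer occlude them
drawVirtualObjects();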

I see. So right now, I render a textured quad with my video stream using glOrtho. Afterwards I write the depth data from the sensor to the depth buffer using:

glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE); // disable color writes so only depth is affected
glPixelZoom(1, -1);                                  // flip image vertically
glRasterPos3f(0, 0, -1.0);
glDrawPixels(640, 480, GL_DEPTH_COMPONENT, GL_FLOAT, normalizedMappedDepthMat.ptr());
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);     // re-enable color writes

and then I use a perspective projection with znear=0.05 and zfar=10.0 for the virtual objects that will be rendered on top of the markers.

Here is what I don’t get. The depth data from the sensor is in millimeters (mm), i.e. the distance of a pixel from the depth sensor (mapped to the color image). How should I transform the millimeters to match the data in the depth buffer in relation to zNear and zFar? Do I have to change zNear and zFar to the min and max values that I get from the depth sensor in each frame, or to the min and max values that the depth sensor can give in general? Also, the sensor often can’t see an area and returns a very large or a very small value, so which values should I use as min and max? And should zNear and zFar change dynamically depending on the min/max values of the depth sensor, or not?

It depends upon the scale you’re using for modelling.

A point at the near plane (Z=-zNear) will have a depth of 0.0. A point at the far plane (Z=-zFar) will have a depth of 1.0. A point half-way between the two (Z=-(zNear+zFar)/2) will have a depth of 0.5.

The values you upload to the depth buffer will be in the range 0…1 (if you use an integer format, they’ll be divided by 2^N-1, where N is the number of bits in the depth format).

So you need to choose the near and far planes so that the depth values from the rendered scene use the same scale as those from the sensor.

[QUOTE=GClements;1266469]It depends upon the scale you’re using for modelling.

A point at the near plane (Z=-zNear) will have a depth of 0.0. A point at the far plane (Z=-zFar) will have a depth of 1.0. A point half-way between the two (Z=-(zNear+zFar)/2) will have a depth of 0.5.

The values you upload to the depth buffer will be in the range 0…1 (if you use an integer format, they’ll be divided by 2^N-1, where N is the number of bits in the depth format).

So you need to choose the near and far planes so that the depth values from the rendered scene use the same scale as those from the sensor.[/QUOTE]

You said that the values I upload to the depth buffer will be in the range of 0-1. Right now, I get values from 0 up to some maximum number of millimeters. Should I normalize these values before passing them to the depth buffer, based on the maximum millimeters found in each frame?

So are you saying that the near and far planes of the perspective view have to be chosen at runtime, or just once at the beginning?

Right now, regarding the virtual objects that are rendered, I don’t know their coordinates w.r.t. the camera (only w.r.t. a marker). I started building the app using znear=0.05 and zfar=10.0. Should I change these values, or should I change the values of the depth data from the sensor before feeding them to the depth buffer?

[QUOTE=mbikos;1266470]You said that the values I upload to the depth buffer will be in the range of 0-1. Right now, I get values from 0 up to some maximum number of millimeters. Should I normalize these values before passing them to the depth buffer, based on the maximum millimeters found in each frame?

So are you saying that the near and far planes of the perspective view have to be chosen at runtime, or just once at the beginning?[/QUOTE]
Sorry, I didn’t read your previous post carefully enough; you said that you were using glOrtho(), but that’s only for the video.

For a perspective projection, the relationship between Z and depth is non-linear. Specifically, it’s of the form depth=A/Z+B where A=zFar*zNear/(zFar-zNear) and B=zFar/(zFar-zNear). Bear in mind that points in front of the viewpoint have negative Z coordinates.

So you’ll either need to transform the depth values fed to the depth buffer, or use a fragment shader to write linear depth values to the depth buffer (the latter is more complex but may be more efficient).
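
For example, a CPU-side sketch of that first option might look like this, assuming the scene is modelled in metres, the sensor reports millimetres, and an invalid reading (zero or less) should land on the far plane (the function name is just illustrative):

float sensorToWindowDepth(float mm, float zNear, float zFar)
{
    if (mm <= 0.0f)
        return 1.0f;                 // no valid reading: treat it as "far"

    float Z = -(mm * 0.001f);        // millimetres -> metres; eye-space Z is negative in front of the camera
    float A = zFar * zNear / (zFar - zNear);
    float B = zFar / (zFar - zNear);
    float depth = A / Z + B;         // depth = A/Z + B, as above

    if (depth < 0.0f) depth = 0.0f;  // clamp into the 0..1 depth-buffer range
    if (depth > 1.0f) depth = 1.0f;
    return depth;
}

You would run this over each sensor sample before handing the array to glDrawPixels() with GL_DEPTH_COMPONENT.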

Considering that he’s using glDrawPixels to write the depth, I don’t think efficiency is the problem :wink:

In any case, I would suggest uploading both the color and the depth data as textures and rendering them with a shader (rather than using glDrawPixels). That way, you can use the fragment shader in that pass to perform the necessary computations on the depth values, putting them in the proper range and non-linearity for your scene, so you don’t have to try to de-linearize the scene’s depth values or anything.
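
Roughly, the fragment shader for that full-screen pass could look something like this (just a sketch; videoColor, videoDepth, zNear, zFar and texCoord are assumed names, and it assumes the sensor depth is uploaded as a texture in meters):

const char *videoFragmentSrc = R"(
    #version 120
    uniform sampler2D videoColor;   // camera image
    uniform sampler2D videoDepth;   // sensor depth, stored in meters
    uniform float zNear;
    uniform float zFar;
    varying vec2 texCoord;          // passed through from the vertex shader

    void main()
    {
        float d = texture2D(videoDepth, texCoord).r;   // distance from the camera, meters
        gl_FragColor = texture2D(videoColor, texCoord);

        if (d <= 0.0) {
            gl_FragDepth = 1.0;                        // no reading: push to the far plane
        } else {
            float A = zFar * zNear / (zFar - zNear);
            float B = zFar / (zFar - zNear);
            gl_FragDepth = clamp(A / (-d) + B, 0.0, 1.0);
        }
    }
)";

Draw the full-screen quad with this shader before the virtual objects, and the regular depth test then handles the occlusion.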

[QUOTE=GClements;1266471]Sorry, I didn’t read your previous post carefully enough; you said that you were using glOrtho(), but that’s only for the video.

For a perspective projection, the relationship between Z and depth is non-linear. Specifically, it’s of the form depth=A/Z+B where A=zFar*zNear/(zFar-zNear) and B=zFar/(zFar-zNear). Bear in mind that points in front of the viewpoint have negative Z coordinates.

So you’ll either need to transform the depth values fed to the depth buffer, or use a fragment shader to write linear depth values to the depth buffer (the latter is more complex but may be more efficient).[/QUOTE]

Thank you for your answer. So let’s say, for example, that I have a value z in millimeters from the depth sensor. I should transform the value to meters, then take z’ = -z (a negative value), and then apply the transformation you mentioned, z’’ = A/z’ + B?
And then pass these values to the depth buffer via the glDrawPixels command?

Also, this has to happen before setting the perspective projection, right?
So first I write the depth buffer, then load the projection matrix, and then I’m done?

I adore you, GClements! My graduation thesis is dedicated to you!!!

In order to be able to render virtual objects within a physical world, you need to make sure that the scales of the two worlds match up. And since you can’t control your physical world’s scale, you should focus on the scale of your virtual world.

Therefore, if your physical distance data is in millimeters, then you should scale your virtual world to match (via the model-to-world transform).

Once you’ve done that, all you need to do is make sure that your physical distance data goes through the same transformations as your virtual distance data before the depth comparison.

Your physical distance values are in camera-local space. Your virtual Z coordinates, in camera-space, are transformed in a variety of ways. They go through the projection matrix, then through a variety of vertex post-processing transformation steps, to end up as values in the depth buffer.

You simply need to make sure that the physical distance values go through the same transformations. Transform them by your projection matrix, the same one you use for your virtual positions (use 0,0,X,1 for the vec4. The X is the depth value). Then transform them through the processing steps I linked to above. That’s the value that gets written to the depth buffer.

And then everything should match up perfectly.
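
For example (a sketch, not the only way to do it), with P being your projection matrix in column-major order as returned by glGetFloatv(GL_PROJECTION_MATRIX, P), and distance being the physical distance in your world units:

float distanceToDepth(const float P[16], float distance)   // distance must be > 0
{
    float Zeye  = -distance;                 // the camera looks down -Z in eye space
    float clipZ = P[10] * Zeye + P[14];      // third row of P applied to (0, 0, Zeye, 1)
    float clipW = P[11] * Zeye + P[15];      // fourth row of P (usually just -Zeye)
    float ndcZ  = clipZ / clipW;             // perspective divide
    return ndcZ * 0.5f + 0.5f;               // default glDepthRange: map -1..1 to 0..1
}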

[QUOTE=GClements;1266471]Sorry, I didn’t read your previous post carefully enough; you said that you were using glOrtho(), but that’s only for the video.

For a perspective projection, the relationship between Z and depth is non-linear. Specifically, it’s of the form depth=A/Z+B where A=zFar*zNear/(zFar-zNear) and B=zFar/(zFar-zNear). Bear in mind that points in front of the viewpoint have negative Z coordinates.

So you’ll either need to transform the depth values fed to the depth buffer, or use a fragment shader to write linear depth values to the depth buffer (the latter is more complex but may be more efficient).[/QUOTE]

GClements, can you please tell me where you found this transformation between depth and Z, so that I can include it in my references?

It’s a consequence of the projection matrix for a perspective projection (see gluPerspective or glFrustum) and conversion from clip coordinates to normalised device coordinates.

The perspective transformation sets the projected W coordinate to -Z (fourth row of the projection matrix) and the projected Z coordinate to A*Z+B (third row of the projection matrix, assuming that the unprojected W coordinate is 1, which is usually the case).

Conversion to NDC divides X, Y and Z by W, so the final Z coordinate becomes (-A)+(-B)/Z. The depth value is just (Z+1)/2, i.e. the Z value transformed from -1…1 to 0…1 (assuming that the depth range hasn’t been changed with glDepthRange).
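
If you want a quick numerical check of that, the following (using zNear=0.05 and zFar=10.0 as in this thread) prints a depth of 0 at the near plane and 1 at the far plane:

#include <cstdio>

int main()
{
    const double n = 0.05, f = 10.0;
    const double A = -(f + n) / (f - n);         // third-row Z entry of the projection matrix
    const double B = -2.0 * f * n / (f - n);     // third-row W entry

    const double dists[] = { 0.05, 1.0, 10.0 };  // distances in front of the camera
    for (double dist : dists)
    {
        double Z     = -dist;                    // eye-space Z
        double ndcZ  = (A * Z + B) / -Z;         // clip Z divided by clip W (= -Z)
        double depth = (ndcZ + 1.0) / 2.0;       // default glDepthRange mapping
        std::printf("distance %.2f -> depth %f\n", dist, depth);
    }
    return 0;
}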