
View Full Version : Deferred shading



Leadwerks
05-22-2008, 02:49 AM
I am just about to move our lighting into a deferred step. It seems that the only additional buffer we need to add is an RGB buffer for the normal. The existing color texture can remain, the attached depth texture can be used to reconstruct the pixel xyz position in the deferred pass, and we just need to add a normal buffer. I do not know how to add this "gbuffer" or how to write to it in the frag shader. All I know how to do is set up an FBO with a color texture and a depth attachment. Where can I find more information on this?

AndreasL
05-22-2008, 05:48 AM
Dominik Göddeke has a tutorial that covers (among other GPGPU things) how to create and render to a floating-point texture using FBOs.
http://www.mathematik.uni-dortmund.de/~goeddeke/gpgpu/tutorial.html#arrays5

Nvidia GDC presentation about FBOs in general
http://http.download.nvidia.com/developer/presentations/2005/GDC/OpenGL_Day/OpenGL_FrameBuffer_Object.pdf

GameDev.net has two tutorials on FBOs, and the second covers multiple render targets:
http://www.gamedev.net/reference/programming/features/fbo1/
http://www.gamedev.net/reference/programming/features/fbo2/


Not OpenGL-specific:
GPU Gems 2 has a chapter on deferred shading in the game S.T.A.L.K.E.R. which I think is very informative.
GPU Gems 3 has a similar article about the game Tabula Rasa.

Leadwerks
05-22-2008, 11:37 AM
Why do I need floating point textures? This should be all I need:

Color - RGB
Depth - 24 bit depth (I think)
Normal - RGB

It should not be necessary to create a buffer for the fragment position, because that can be figured out from the camera FOV and aspect, fragment screen coord, and depth.

zeoverlord
05-22-2008, 12:07 PM
I found two PDF files talking about deferred shading; the one describing Killzone 2 contains a pretty nice description of how all the color/depth buffers are laid out.

http://www.talula.demon.co.uk/DeferredShading.pdf
http://www.guerrilla-games.com/publications/dr_kz2_rsx_dev07.pdf

Leadwerks
05-22-2008, 03:23 PM
Here's my code for setting up a texture-based render buffer. The buffer can have an optional color and/or depth buffer, and now I am adding a normal buffer. I have no idea how to add the normal buffer.

Do I just create another RGBA8 texture and use COLOR_ATTACHMENT1? I tried this and the FBO did not produce an error. Now how can I render to the normal texture?


If colorbuffer
buffer.colorbuffer=CreateTexture(tw,th,GL_RGBA,GL_TEXTURE_RECTANGLE_ARB)
buffer.colorbuffer._width=tw
buffer.colorbuffer._height=th
buffer.colorbuffer.bind()
buffer.colorbuffer.link.remove()
buffer.colorbuffer.link=Null
buffer.colorbuffer.Clamp()
buffer.colorbuffer.setfilter TEXTUREFILTER_PIXEL
glGenFramebuffersExt 1,Varptr buffer.colorbuffer.framebuffer[0]
buffer.framebuffer=buffer.colorbuffer.framebuffer[0]
glTexImage2D buffer.colorbuffer.target(),0,GL_RGBA8,buffer.colorbuffer._width,buffer.colorbuffer._height,0,GL_RGB,GL_UNSIGNED_BYTE,Null
glBindFramebufferEXT GL_FRAMEBUFFER_EXT,buffer.colorbuffer.framebuffer[0]
glFramebufferTexture2DEXT GL_FRAMEBUFFER_EXT,GL_COLOR_ATTACHMENT0_EXT,buffer.colorbuffer.target(),buffer.colorbuffer.index(),0
EndIf

If depthbuffer
buffer.depthbuffer=CreateTexture(tw,th,GL_RGBA,GL_TEXTURE_RECTANGLE_ARB)
buffer.depthbuffer._width=tw
buffer.depthbuffer._height=th
buffer.depthbuffer.bind()
buffer.depthbuffer.link.remove()
buffer.depthbuffer.link=Null
buffer.depthbuffer.Clamp()
buffer.depthbuffer.setfilter TEXTUREFILTER_PIXEL
glTexImage2D buffer.depthbuffer.target(),0,GL_DEPTH_COMPONENT24,buffer.depthbuffer._width,buffer.depthbuffer._height,0,GL_DEPTH_COMPONENT,GL_UNSIGNED_BYTE,Null
glFramebufferTexture2DEXT GL_FRAMEBUFFER_EXT,GL_DEPTH_ATTACHMENT_EXT,buffer.depthbuffer.target(),buffer.depthbuffer.index(),0
EndIf

If normalbuffer

EndIf


----EDIT-----

Wow, that second GameDev article is really good! It is a rare thing that I come across documentation worth reading!
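For reference, a rough C-style sketch of the two missing pieces: attaching a second RGBA8 texture as GL_COLOR_ATTACHMENT1_EXT and selecting both draw buffers. normalTexture is a placeholder handle, not code from this engine.

GLenum bufs[2] = { GL_COLOR_ATTACHMENT0_EXT, GL_COLOR_ATTACHMENT1_EXT };

/* attach a second RGBA8 texture as the normal target */
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT1_EXT,
                          GL_TEXTURE_RECTANGLE_ARB, normalTexture, 0);

/* select both attachments; the fragment shader can then write
   gl_FragData[0] (color) and gl_FragData[1] (normal) */
glDrawBuffers(2, bufs);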

Leadwerks
05-22-2008, 04:29 PM
That was easy. I still don't understand why people use floating-point textures and write the frag position to a buffer.
http://www.leadwerks.com/post/mrt.jpg

skynet
05-22-2008, 05:03 PM
Because in the early days, render-to-depth-texture (i.e. using it as the z-buffer at the same time) was not possible. Also, reconstruction of the world position from a z-buffer is a bit more involved (but should not be _that_ much of a problem today).

What you really should take care of is the precision of the normals. If you use RGB8-encoded normals, they are simply not enough. RGB10_A2 is quite ok (still, noticeable artifacts), but needs EXTX_framebuffer_mixed_formats to be useful. RGBA16F for the normals is ok, I didn't notice any problems with it.

Leadwerks
05-22-2008, 05:51 PM
What if you used the RG terms to encode the X component and the BA terms to encode the Y component? The resolution would be less than a float value but greater than a byte.

The absolute value of the Z component can then be calculated from those terms.
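As a rough GLSL sketch of that idea (illustrative helper names, not code from this thread), each component could be split into a coarse and a fine byte:

vec2 PackComponent(in float v)
{
	float x = v * 0.5 + 0.5;              // remap [-1,1] to [0,1]
	float hi = floor(x * 255.0) / 255.0;  // coarse byte
	float lo = fract(x * 255.0);          // fine byte (the remainder)
	return vec2(hi, lo);
}

float UnpackComponent(in vec2 p)
{
	return (p.x + p.y / 255.0) * 2.0 - 1.0;
}

The X component would go in RG and the Y component in BA, with the absolute value of Z reconstructed as sqrt(1 - x*x - y*y).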

AndreasL
05-23-2008, 01:03 AM
The setup I used was an R32F buffer for the linear depth and an R16G16B buffer for the normals, which gave high enough precision for the stuff I was doing. I did run into trouble when trying to use the same setup on different GPUs, so I can see the benefits of not using float textures and sticking with the RGB8 format.

skynet
05-23-2008, 01:31 AM
The absolute value of the Z component can then be calculated from those terms.

A square root has two solutions; which one do you choose? The normal is not always pointing out of the screen. Think about interpolated vertex normals or normal maps, which can create normals that differ greatly from the face normal. I wonder why people always seem to forget that when they propose the two-component-per-normal approach...

Leadwerks
05-23-2008, 02:22 AM
If the normal isn't pointing towards the viewer, then it would have been dismissed unless back-face culling is disabled.

Let's say we need back-face culling disabled... well, I can make the less significant of the two Y terms odd or even without creating too much inaccuracy, and use that as a +/- flag to indicate the z direction.

How compatible is the second color attachment with NVidia and ATI SM 3.0 cards?

skynet
05-23-2008, 02:38 AM
If the normal isn't pointing towards the viewer, then it would have been dismissed unless back-face culling is disabled.

You are talking about the face normal. I was talking about the per-pixel normals, which - unless you are using flat shading - matter for lighting.

Of course, if you can somehow encode the sign of the z component into the other two, then it should be possible to leave z out.

You should be able to use multiple color attachments as soon as ARB_draw_buffers is supported.

-NiCo-
05-23-2008, 04:42 AM
Same principle. Assuming there's no transparency (which is a rather valid assumption; otherwise you'd need to store more than one normal per pixel), everything in the world with a normal that's not facing the viewer should be invisible to the viewer. So you can take the pixel's position and the two components of the normal to create a plane. Then you can select the sign of the third component so that it ends up on the same side of the plane as the viewer.
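A minimal GLSL sketch of that test, assuming view space (camera at the origin) and a reconstructed fragment position; the names are illustrative:

vec3 DecodeNormal(in vec2 nxy, in vec3 fragpos)
{
	float nz = sqrt(max(0.0, 1.0 - dot(nxy, nxy)));
	vec3 n = vec3(nxy, nz);
	if (dot(n, -fragpos) < 0.0)   // the viewer ended up on the negative side of the plane,
		n.z = -nz;                // so flip the sign of the third component
	return n;
}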

ector
05-23-2008, 10:01 AM
"everything in the world with a normal that's not facing the viewer should be invisible to the viewer"

That assumption breaks with interpolated normals, and it also breaks with normal mapping.

-NiCo-
05-23-2008, 11:09 AM
That assumption breaks with interpolated normals

Can't imagine it does. If the interpolated normals face away from the viewer, I believe those pixels should be discarded rather than shadowed.


and it also breaks with normal mapping.
You're right. In this case discarding the pixels would create holes in the geometry.

In any case, a normalized three-dimensional unit vector has only two degrees of freedom, so you can encode it with a 360-degree yaw angle and a 180-degree pitch angle.
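A quick GLSL sketch of that encoding (function names are illustrative only):

vec2 EncodeNormalAngles(in vec3 n)
{
	float yaw   = atan(n.y, n.x);                             // [-pi, pi]
	float pitch = acos(n.z);                                  // [0, pi]
	return vec2(yaw / 6.2831853 + 0.5, pitch / 3.1415927);    // remap both to [0,1]
}

vec3 DecodeNormalAngles(in vec2 e)
{
	float yaw   = (e.x - 0.5) * 6.2831853;
	float pitch = e.y * 3.1415927;
	return vec3(cos(yaw) * sin(pitch), sin(yaw) * sin(pitch), cos(pitch));
}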

knackered
05-23-2008, 11:37 AM
cos and sin per-pixel X per-lightvolume? sounds like a false economy to me.

Leadwerks
05-23-2008, 11:41 AM
Like I said, it is possible to encode the sign of the z component of the normal in a hackish but acceptable way. And if it means I can use an RGBA texture for the normal buffer, cool. That means I only use 2 RGBA images and a depth buffer for deferred lighting, and I don't suffer much from the bandwidth problems these techniques tend to experience. And the normal resolution would be about 0.00006 (1/128/128).

I ordered an ATI X1550 for low-end testing. Is this going to work on SM 3.0 hardware?

-NiCo-
05-23-2008, 11:58 AM
cos and sin per-pixel X per-lightvolume? sounds like a false economy to me.

Well, if you're doing deferred shading, performing this conversion for a pixel only once, to be reused for many lights, seems like a good option to me... Furthermore, although normalizing a normal vector using cube maps is slower than the normalization function in the shader, these kinds of lookups can still be used to perform the computationally more expensive trigonometric functions.

Seth Hoffert
05-23-2008, 04:17 PM
Semi-unrelated, but if the normal map is stored in tangent space, no sign storage is necessary, correct? (Just want to make sure my understanding is solid.)

sqrt[-1]
05-23-2008, 05:11 PM
FYI: depending on your usage, you may be able to use this:
http://code.google.com/p/lightindexed-deferredrender/

Leadwerks
05-23-2008, 05:22 PM
I got it set up using 24-bit normals. There is no problem at all with just using regular RGB encoding. Curved surfaces don't exhibit any banding artifacts as I thought they might.

knackered
05-23-2008, 05:50 PM
you're saying your specular highlights look ok with just 24 bit normals?

Leadwerks
05-23-2008, 06:31 PM
Haven't done specular yet, just diffuse lighting.

Leadwerks
05-24-2008, 12:45 AM
Wow, I got deferred point lights working and experienced about a 200% performance boost.

knackered
05-24-2008, 08:41 AM
were you previously doing light bounds to geometry bounds tests every frame or something? it all depends on how efficient your forward rendering path is, as to how much of a benefit you get from deferred shading.
Try specular and you should see some horrible banding with 24bit normals with a high specular power.

Leadwerks
05-24-2008, 12:31 PM
I think it is because I am rendering some fairly complex scenes. I also did something clever with depth testing which eliminates a lot of fragments in the light pass.

Leadwerks
05-24-2008, 03:58 PM
With 8 point lights our deferred lighting is 4 times faster than forward rendering. :D

Jan
05-24-2008, 04:02 PM
" I also did something clever with depth testing which makes eliminates a lot of fragments in the light pass. "

Interesting. Is it anything different from what is mentioned in the above papers? If so, I would like to know more about it.

Jan.

Leadwerks
05-24-2008, 09:50 PM
I don't know what the above papers do. I just do what makes sense to me.

I go like this:
1. Render scene to texture-based buffer

2. Draw buffer textures to the next buffer (either another texture buffer or the back buffer) with lighting.

So first I copied the depth buffer from part 1 to part 2. Since I had to do a full-screen pass for the ambient light, I just used this shader to set the depth.

For each point light I draw a sphere mesh. If the camera is outside the sphere I enable depth testing with glDepthFunc set to GL_LESS. If I am inside the sphere, I just switch the polygon order (GL_CW, GL_CCW) and set the depth test to GL_GREATER. This does a pretty good job of discarding a lot of fragments with an early-out depth test.
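In C-style GL calls, that state might look roughly like this (CameraInsideSphere and DrawSphereMesh are placeholder names, not functions from the engine):

if (!CameraInsideSphere(light)) {
	glFrontFace(GL_CCW);      /* normal winding: the sphere's front faces are rasterized */
	glDepthFunc(GL_LESS);     /* keep fragments where the front face lies in front of the scene depth */
} else {
	glFrontFace(GL_CW);       /* flipped winding: the back faces are rasterized instead */
	glDepthFunc(GL_GREATER);  /* keep fragments where the back face lies behind the scene depth */
}
DrawSphereMesh(light);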


I got specular reflection working and tried using RGBA32F for the normal buffer format. I could see absolutely no difference between that and the appearance of RGBA8.

Jan
05-25-2008, 03:45 AM
The paper that talks about Killzone's deferred rendering explains pretty well how they determine lit pixels (using different depth tests and stencil masking). I think it rejects even more fragments than your approach; you might want to take a look at it again.

Jan.

V-man
05-25-2008, 07:30 AM
For each point light I draw a sphere mesh. If the camera is outside the sphere I enable depth testing with glDepthFunc set to GL_LESS. If I am inside the sphere, I just switch the polygon order (GL_CW, GL_CCW) and set the depth test to GL_GREATER. This does a pretty good job of discarding a lot of fragments with an early-out depth test.

You lose performance by changing the depth test direction. I think it disables the hierarchical z-buffer.

Leadwerks
05-25-2008, 11:09 AM
I don't understand how they use stencil masking, because when I render a point light, I just draw a sphere, so it is already within the light bounds, and stencil masking would have no effect.

Maybe they draw a stencil mask with a previous pass reading the depth buffer and determining whether each pixel is within the light volume. Then the second pass would be carried out only on the relevant pixels.
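One common variant of that idea, sketched in C-style GL calls (placeholder names; not necessarily what the Killzone paper does): a first pass marks the pixels whose stored depth falls inside the light volume, and the shading pass is then restricted to them with the stencil test.

/* Pass 1: mark pixels whose scene depth lies inside the light volume
   (no color or depth writes). */
glEnable(GL_DEPTH_TEST);
glEnable(GL_STENCIL_TEST);
glClear(GL_STENCIL_BUFFER_BIT);
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDepthMask(GL_FALSE);
glDisable(GL_CULL_FACE);
glStencilFunc(GL_ALWAYS, 0, 0xFF);
glStencilOpSeparate(GL_BACK,  GL_KEEP, GL_INCR_WRAP, GL_KEEP);   /* back face behind the scene: +1 */
glStencilOpSeparate(GL_FRONT, GL_KEEP, GL_DECR_WRAP, GL_KEEP);   /* front face behind the scene: -1 */
DrawSphereMesh(light);

/* Pass 2: shade only the marked pixels. */
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDisable(GL_DEPTH_TEST);
glEnable(GL_CULL_FACE);
glCullFace(GL_FRONT);            /* draw the volume's back faces so marked pixels are covered even from inside */
glStencilFunc(GL_NOTEQUAL, 0, 0xFF);
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
DrawSphereMesh(light);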

Sunray
05-25-2008, 12:07 PM
What you really should take care of is the precision of the normals. If you use RGB8-encoded normals, they are simply not enough. RGB10_A2 is quite ok (still, noticeable artifacts), but needs EXTX_framebuffer_mixed_formats to be useful. RGBA16F for the normals is ok, I didn't notice any problems with it.

Aren't normal maps RGBA8 already? So how can fetching normals from an RGBA8 texture, rotating them to world space, and storing them in an RGBA8 texture lose precision?

Maybe I'm missing something because I have no experience in deferred shading.

Seth Hoffert
05-25-2008, 12:14 PM
Sometimes normal maps can be stored at 8 bits per component, but other times more precision is needed. In my case, I ended up switching over to 16 bits per component, only storing the X and Y components and recreating the Z component in the shader (which I was able to do without storing any sign, since my normal map was in tangent space). :D
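That reconstruction might look something like this in GLSL (normalmap and texcoord are placeholder names; the *2-1 remap assumes an unsigned texture format):

vec2 nxy = texture2D(normalmap, texcoord).xy * 2.0 - 1.0;
float nz = sqrt(max(0.0, 1.0 - dot(nxy, nxy)));   // tangent space, so z is always positive
vec3 n = vec3(nxy, nz);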

Eosie
05-25-2008, 01:16 PM
I also used to use high-precision normal maps for reflections, and later I realized that the problem was not only because of 8-bit precision but also because of the low-precision bilinear filtering performed on the texture. So I switched back to LATC-compressed normal maps, and now I do bilinear filtering manually in the shader, and it looks almost the same as if I used high-precision normal maps (except losing mipmapping and anisotropic filtering). So for me, the need for high-precision normal maps that are mapped on a model is just a myth. However, I'm not sure about deferred shading.

Seth Hoffert
05-25-2008, 01:36 PM
Well, the way I look at it...

By storing normals as a two-component short texture (16 bits per component), it takes a total of 32 bits, and you don't lose mipmapping or anisotropic filtering, and there is no need to perform bilinear filtering manually.

By storing normals as a three-component char texture (8 bits per component), it will most likely pad the 24 bits into 32 anyway, and you will lose mipmapping/anisotropic filtering (when performing your own bilinear filtering). :( Of course, one could follow the same approach as above and just store two components (so, 16 bits total per texel) and save space.

However, I am not taking into account the available compressed formats here, and of course I am talking about storing tangent space normals and not object space normals.

Leadwerks
05-25-2008, 02:59 PM
Maybe there is a format like GL_IA16 that would work for that.

Leadwerks
05-26-2008, 05:45 PM
I found that with specular reflection on a flat surface, using an RGB8 texture causes a flicker when the camera angle changes. This makes sense, because the resolution of the normal relative to the camera is limited. Using an RGB16F normal buffer corrected this. I saw no difference between the RGB16F and RGB32F formats.

Interestingly, I tried using GL_LUMINANCE_ALPHA16F_ARB and the framerate dropped to <1 FPS, presumably because my GeForce 8800 is doing some kind of software fallback. I guess that is what happens when you try to use odd formats that don't get tested very often by the driver guys.

So I will use GL_RGB16F_ARB with a fallback for GL_RGB8.

knackered
05-27-2008, 08:25 AM
Just out of curiosity Leadwerks, how are you detecting software fallbacks in your engine? i.e. how will you fallback to RGB8 if RGB16F drops you onto a slow path?

Leadwerks
05-27-2008, 01:52 PM
There is an extension to detect whether float textures are supported.
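A minimal C sketch of such a check, assuming the GL 2.x-era extension string query:

#include <string.h>
#include <GL/gl.h>

int HasFloatTextures(void)
{
	/* returns nonzero if GL_ARB_texture_float appears in the extension string */
	const char *ext = (const char *)glGetString(GL_EXTENSIONS);
	return ext && strstr(ext, "GL_ARB_texture_float") != NULL;
}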

There is no way to detect software fallbacks unless you want to try parsing the feedback from the shader log and guessing what it means. I am not willing to use such a hack. There is no way I know of to detect if a texture format causes a fallback to software mode.

It would be better to just have the driver crash than to make it run in software mode, in my opinion.

knackered
05-27-2008, 03:19 PM
I was hoping you'd give us details of your pre-run benchmarking, but you obviously don't do that. I do a frame or two of some basic timed tests at start-up myself.

Leadwerks
05-27-2008, 04:44 PM
Really? That seems horribly messy to the point where I wouldn't even want to mess with it.

NeARAZ
05-28-2008, 10:41 AM
It would be better to just have the driver crash than to make it run in software mode, in my opinion.
Well, welcome to the wonderful world of abstractions... If it does not work, we'll switch to software! Everyone's going to be happy! *sigh*

knackered
05-28-2008, 03:46 PM
Really? That seems horribly messy to the point where I wouldn't even want to mess with it.
Messy? Since when was OpenGL tidy? Answer: 1999.
How else can you be sure something is hardware accelerated in OpenGL? Answer: you can't, you cross your fingers and wait for the support calls.

Leadwerks
05-28-2008, 03:55 PM
NeARAZ, do you work for Unity? I probably met you when I was in Denmark. I am the guy from California that came by right when Unity 2.0 was about to be released.

bobGL
05-28-2008, 06:59 PM
Leadwerks:

What is the formula that you are using to reconstruct the XYZ position from the depth buffer in your GLSL shader?

Thanks in advance!

Leadwerks
05-28-2008, 07:05 PM
Change the non-linear depth value into a z value:

float DepthToZPosition(in float depth) {
return camerarange.x / (camerarange.y - depth * (camerarange.y - camerarange.x)) * camerarange.y;
}

buffersize is the screen dimensions:

float depth = texture2D(texture1,texCoord).x;
vec3 screencoord;
screencoord = vec3(
	((gl_FragCoord.x/buffersize.x)-0.5) * 2.0,
	((-gl_FragCoord.y/buffersize.y)+0.5) * 2.0 / (buffersize.x/buffersize.y),
	DepthToZPosition( depth ));
screencoord.x *= screencoord.z;
screencoord.y *= -screencoord.z;

bobGL
05-28-2008, 08:04 PM
So basically you need to pass to your shader the width & height + zFar & zNear right?

NeARAZ
05-28-2008, 10:48 PM
NeARAZ, do you work for Unity? I probably met you when I was in Denmark. I am the guy from California that came by right when Unity 2.0 was about to be released.
Yeah, I remember that :)

Leadwerks
05-29-2008, 12:55 PM
So basically you need to pass to your shader the width & height + zFar & zNear right?
Yes, that is all that is required to calculate it.

Leadwerks
06-04-2008, 01:22 AM
I always enjoy reading papers that Valve and Crytek put out, so I wrote this paper about our experience implementing deferred lighting in our engine. It includes a few useful formulas. I hope you enjoy it:
http://www.leadwerks.com/ccount/click.php?id=50

knackered
06-04-2008, 01:58 PM
very kind of you. thanks.

Lord crc
06-04-2008, 04:29 PM
Interesting read, thanks!

knackered
06-05-2008, 12:50 AM
albeit a bit on the brief side.

Leadwerks
06-05-2008, 01:56 AM
Always leave them wanting more. :D

Thanks for the feedback, I feel like I went from a GLSL noob to a pro pretty quickly, so it is nice to know people are interested in the stuff I am working on.

karx11erx
06-06-2008, 08:38 AM
I am just trying to understand deferred lighting, and I have one question: does it also give you proper shadowing? Forgive my noobishness.

Leadwerks
06-06-2008, 10:40 AM
The results of our deferred renderer are exactly the same as our forward renderer. Shadow maps are still rendered the same way.

karx11erx
06-06-2008, 11:12 AM
So you have to apply shadow maps and compute them separately (one render pass per light source)?

Leadwerks
06-06-2008, 11:53 AM
Yes. I only update point and spot shadowmaps when something moves that affects a light's shadows.

karx11erx
06-06-2008, 12:41 PM
Mind telling me how many lights per scene you have (average/max)?

Leadwerks
06-06-2008, 01:39 PM
I don't have anything I can base an average on. 9 point lights run at about 250 FPS on an 8800.

Sunray
06-06-2008, 01:44 PM
A simpler way (IMO) to calculate the view space position is "Position = gl_ProjectionMatrixInverse * Ndc" where Ndc is known from gl_Position.xy/gl_Position.w and the sampled depth value. It's simpler because you don't have to pass any uniforms to the shader.

karx11erx
06-07-2008, 02:30 AM
Well, I was asking because I can have dozens and dozens of moving light sources in the scenes of the (old) 3D game I am maintaining, and if real-time shadows had come with deferred lighting at no extra cost, it would have been something I'd certainly have implemented. Deferred lighting keeps looking interesting, but I still don't have a good idea for shadow rendering. I cannot render a scene 50 times to get everything right ...

Sunray,

mind outlining how exactly to compute "Ndc"?

Sunray
06-07-2008, 04:09 AM
In normalized device coordinates (NDC), x, y, z are in the range [-1, 1].

It should be something like this:



VertexShader:
ClipPos = gl_Position;

FragmentShader:
ClipPos.xy /= ClipPos.w;

// Sample depth
vec2 Uv = ClipPos.xy*0.5 + 0.5;
float Depth = texture2D(DepthBuffer, Uv).r*2.0 - 1.0;

// Compute view space position
vec4 Ndc = vec4(ClipPos.xy, Depth, 1.0);
vec4 Position = gl_ProjectionMatrixInverse * Ndc;

Leadwerks
06-07-2008, 11:29 AM
Why would you do a matrix multiplication when the values are already in screen space? All you have to do is change them from screen coordinates to camera space by multiplying them by how far away from the camera they are.

oc2k1
06-07-2008, 03:49 PM
One of the fastest reconstructions is this:


mat4 m = gl_ProjectionMatrix;
float Z = m[3].z/(texture2DRect(G_Depth, gl_FragCoord.xy).x * -2.0 + 1.0 - m[2].z);
vec3 modelviewpos = vec3(pos.xy/pos.z*Z,Z);

where pos.xyz is the light volume's fragment position in screen space. The only possible optimization is to replace the two values from the gl_ProjectionMatrix with two dedicated uniform float variables. It would save a single instruction (the +1.0), because the scale by two is free. More details:
http://lumina.sourceforge.net/Tutorials/Deferred_shading/Point_light.html

LangFox
06-08-2008, 08:35 AM
Well, when I read the article "Motion Blur as a Post-Processing Effect" in GPU Gems 3, the author provides a way to extract the world-space position from the depth buffer in a full-screen post-processing pass.

It looks like:


// Get the depth buffer value at this pixel.
float zOverW = tex2D(depthTexture, texCoord);

// H is the viewport position at this pixel in the range -1 to 1.
float4 H = float4(texCoord.x * 2 - 1, (1 - texCoord.y) * 2 - 1, zOverW, 1);

// Transform by the view-projection inverse.
float4 D = mul(H, g_ViewProjectionInverseMatrix);

// Divide by w to get the world position.
float4 worldPos = D / D.w;


Then I translated it to GLSL:


vec2 vec2TexCoord = gl_TexCoord[0].st;

// Get the depth buffer value at this pixel.
float fZOverW = texture2D(g_txDepth, vec2TexCoord).r;

// H is the viewport position at this pixel in the range -1 to 1.
vec4 vec4H = vec4(vec2TexCoord * 2.0 - 1.0, fZOverW, 1.0);

// Transform by the view-projection inverse.
vec4 vec4D = g_mat4InverseViewProjection * vec4H;

// Divide by w to get the world position.
vec4 vec4WorldPos = vec4D / vec4D.w;


Unfortunately, the result looks wrong. Is it because the depth buffer range is 0 <= z <= w in D3D, but -w <= z <= w in GL?

oc2k1
06-08-2008, 08:50 AM
That is slower, because the multiplication with the inverse view-projection matrix requires 16 (scalar) MADDs.

Sunray
06-08-2008, 09:26 AM
Try "zOverW = zOverW * 2.0 - 1.0". This should be the exact same thing as I wrote. However I forgot to divide pos by pos.w. :)

LangFox
06-08-2008, 08:32 PM
To oc2k1,

Yes, your way is better for light volumes. But in this case it's a full-screen quad... and it calculates the world-space position.

To Sunray,

You are right! Now I get the correct result.