Using multipass rendering for speedup

Hi guys.

I once read that Doom 3 draws its geometry this way:

Clear depth, disable color writes, disable texturing and all other effects. Enable the depth test.
Draw the whole scene (I think front to back).
This fills only the depth buffer.
Now enable all effects and render the scene again with depthfunc = GL_EQUAL and depth writes disabled.

I can see why this might give a speed increase in Doom 3: it has a lot of overdraw, and this way per-pixel lighting, bump mapping etc. are only calculated for pixels that are actually visible (those that pass the depth test).
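To make the two passes concrete, here is a minimal sketch in Python that simulates a one-dimensional depth buffer and counts how often the expensive shading step runs. It is a toy model, not OpenGL code: the `draws` tuples, the function names and the 1.0 far-plane value are assumptions made for illustration. The comments name the GL state each step would roughly correspond to.

```python
def shade_single_pass(draws, width):
    """Count shaded fragments when everything is drawn once with a LESS test."""
    depth = [1.0] * width            # glClear(GL_DEPTH_BUFFER_BIT), far plane = 1.0
    shaded = 0
    for x0, x1, z in draws:          # each draw covers pixels x0..x1-1 at depth z
        for x in range(x0, x1):
            if z < depth[x]:         # glDepthFunc(GL_LESS)
                depth[x] = z         # depth writes enabled
                shaded += 1          # full shading runs, even if overdrawn later
    return shaded


def shade_with_depth_prepass(draws, width):
    """Count shaded fragments when a depth-only pass runs first."""
    depth = [1.0] * width
    # Pass 1: color writes off (glColorMask with all GL_FALSE), depth only.
    for x0, x1, z in draws:
        for x in range(x0, x1):
            if z < depth[x]:
                depth[x] = z
    # Pass 2: color writes on, glDepthFunc(GL_EQUAL), glDepthMask(GL_FALSE).
    shaded = 0
    for x0, x1, z in draws:
        for x in range(x0, x1):
            if z == depth[x]:        # only the visible surface matches
                shaded += 1
    return shaded


# Hypothetical scene, submitted back to front (worst case for overdraw).
scene = [(0, 8, 0.9), (0, 8, 0.5), (2, 6, 0.1)]
print(shade_single_pass(scene, 8))         # 20
print(shade_with_depth_prepass(scene, 8))  # 8
```

In this toy scene the pre-pass cuts the shading work from 20 fragments to 8 (one per pixel), at the price of rasterizing the geometry twice. Whether that trade pays off on real hardware is a separate question.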

However, do you think this might give me a speed increase if I only need 2 to 4 texture units (texture + lightmap + environment map + dynamic lighting)?
I mean, if I had 4 texture units (which I don’t), I could draw everything in one go. However, if I have dynamic objects that cover a large part of the screen, a lot would be calculated that isn’t visible afterwards: my static geometry can be sorted front to back, but my dynamic objects are drawn afterwards and will cover it.

Just tell me what you think about it.
Jan.

Rendering polygons from front to back will also help to eliminate overdraw as it allows one’s video card’s hidden surface removal capabilities to work at their best.
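A small sketch of why submission order matters, assuming the hardware rejects occluded fragments before shading them. The one-dimensional depth buffer, the `draws` tuples and the function name are hypothetical; only the ordering idea comes from the post above.

```python
def count_shaded(draws, width):
    """Count fragments that pass a LESS depth test, in submission order."""
    depth = [1.0] * width              # cleared depth buffer, far plane = 1.0
    shaded = 0
    for x0, x1, z in draws:            # each draw covers pixels x0..x1-1 at depth z
        for x in range(x0, x1):
            if z < depth[x]:
                depth[x] = z
                shaded += 1
    return shaded


# Hypothetical scene: three overlapping surfaces, farthest first.
back_to_front = [(0, 10, 0.8), (3, 10, 0.4), (5, 8, 0.2)]
# Same surfaces, nearest first.
front_to_back = sorted(back_to_front, key=lambda d: d[2])

print(count_shaded(back_to_front, 10))  # 20
print(count_shaded(front_to_back, 10))  # 10
```

With the nearest surfaces drawn first, every pixel is shaded exactly once in this example; drawn farthest first, half of the shading work is later overwritten.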

Originally posted by Jan2000:

Clear depth, disable color write, disable texturing and all other effects. Enable depth test.
Draw the whole scene (i think front to back).
This will only fill the depth buffer.
Now enable all effects and render the scene with depthfunc = GL_EQUAL and depth write disabled.

It speeds up things only if your first pass is expensive. If it’s cheap, then you probably don’t need to pre-render your depth buffer.

D3 renders in one pass on some hardware. That pass is quite GPU-intensive, so pre-rendering the depth buffer makes sense there.

Julien.

I think I’ve read that the Radeon’s HyperZ performs best with an inequality test rather than an equality test, so you’d likely want to use GL_LEQUAL rather than just GL_EQUAL.

>>Clear depth, disable color write, disable texturing and all other effects. Enable depth test.
Draw the whole scene (i think front to back).
This will only fill the depth buffer.
Now enable all effects and render the scene with depthfunc = GL_EQUAL and depth write disabled<<

I do something very similar in my game (though I also do quite a bit of overdraw, up to 17 passes).
This does result in quite a speed increase on my GeForce2 MX (from memory, up to 10-20%).
I’m sure the gain will be even greater on newer hardware.

The point of this multipass is to improve quality/complexity of shading on hardware that can’t do it in a single pass NOT to increase performance. Stencil shadow testing against depth buffer is another important motivation. If you do this purely for performance when you could have used a single pass you will lose performance BIG TIME.

Coarse Z hardware optimizations mean you will see performance gains if you sort front to back because you will do less fragment shading and memory fetches.

[This message has been edited by dorbie (edited 02-08-2003).]

I believe what the poster is interested in is deferred shading rather than multipass. Rendering the entire scene into just the depth buffer first may be more efficient at times, if the shader is sufficiently expensive, the number of occluded pixels is high, and the HW can reject pixels prior to shading.

I don’t have any good rule of thumb as to when it is better, so testing is probably your only option to figure it out. Remember you will have to be careful to avoid depth variance issues between passes. You also should consider that if you want to do something like this where the rest of the rendering doesn’t rely on it, you might want to have a switch in your code to do it either way. I imagine the cross-over point of what is best will vary across cards. Also, this will never be a win if you are geometry limited, but I am sure you knew that.
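The depth variance problem can be illustrated without any graphics API. In the sketch below, the two functions are hypothetical stand-ins for two code paths that compute the same depth mathematically but in a different order of operations, as could happen if the two passes do not transform vertices identically. The numbers are arbitrary.

```python
def depth_path_a(x, y, z):
    """Hypothetical depth computation used by the depth-only pass."""
    return (x + y) + z


def depth_path_b(x, y, z):
    """Same sum in a different order, standing in for the shading pass."""
    return x + (y + z)


stored = depth_path_a(0.1, 0.2, 0.3)    # value written by the first pass
incoming = depth_path_b(0.1, 0.2, 0.3)  # value produced by the second pass

print(incoming == stored)  # False: an EQUAL test would reject this visible fragment
print(incoming <= stored)  # True here, but only because the error fell on this side
```

The point is that an exact-match test is only safe when both passes produce bit-identical depth values. A less-or-equal test tolerates a mismatch in one direction only, so it is not a general fix either.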

Finally, yes, you do want to use LEQUAL rather than EQUAL on the Radeon series when appropriate, as it allows better use of the HW occlusion culling.

-Evan

I’ve had this argument before with Evan, but I still can’t fathom this advice to use LEQUAL rather than EQUAL when you mean EQUAL.

EQUAL is at least as fast as LEQUAL on all NVIDIA hardware. After all, any pixel that LEQUAL can kill, EQUAL can also kill.
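The logical half of this argument can be checked with a short sketch. After a depth pre-pass over the same geometry, no incoming fragment can be nearer than the stored value, so both tests let the same fragments through. The fragment list and helper names are hypothetical, and the sketch assumes bit-identical depths in both passes; it says nothing about how any particular chip implements its coarse Z rejection.

```python
def depth_test(func, incoming, stored):
    """Software model of two depth functions."""
    if func == "EQUAL":
        return incoming == stored
    if func == "LEQUAL":
        return incoming <= stored
    raise ValueError("unknown depth function: " + func)


def survivors(func, fragments, depth):
    """Return the (pixel, z) fragments that pass the given test."""
    return [(x, z) for x, z in fragments if depth_test(func, z, depth[x])]


# Hypothetical fragments as (pixel index, depth); pixels 0 and 1 have overdraw.
fragments = [(0, 0.7), (0, 0.3), (1, 0.5), (1, 0.9), (2, 0.4)]

# Depth-only pre-pass: keep the nearest depth per pixel.
depth = {}
for x, z in fragments:
    depth[x] = min(z, depth.get(x, 1.0))

same = survivors("EQUAL", fragments, depth) == survivors("LEQUAL", fragments, depth)
print(same)  # True
```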

- Matt

The depth test comes at the end of the pipeline so it won’t improve performance at all. The depth test is at the end because if it was at the front a rejected fragment would cause a pipeline bubble with the same performance hit anyway.

I believe the performance increase is mostly due to disabling the Z writes.
Sure, if you can do a scene in one pass instead of multiple passes, that is most likely quicker (but not all hardware is capable of this).

Also, though the depth test might be at the end in the spec, I believe some companies don’t do it that way.

[This message has been edited by zed (edited 02-10-2003).]

As I said, the primary motivation for this is not deferred shading. In general if your reason for multipass is deferred shading then you’d better have a very complex shader. It’s highly debatable that you’ll do better than a decent coarse scene sort even if you do.

I think this is especially useful if you have a lot of dynamic objects (or a few big ones).
As far as I know, Doom 3 never renders in 1 pass. It renders one pass PER LIGHT.
If I draw front to back, an extra pass shouldn’t give me a speed increase, because the z-values have to be computed in every pass, so a separate first pass only adds work.
However, if I draw all geometry, including the dynamic objects, in the first pass, then I MIGHT get a speed increase, namely when my dynamic objects cover a large part of the screen.

What I would like to know, though, is how big the speed difference might be. That is, if I have lots of dynamic objects but only need 3 or 4 texture units to draw the geometry, should I just draw all geometry in one pass and overdraw it with the dynamic objects, or should I first fill the depth buffer and then draw the geometry in one pass?
The question is how much overdraw (or how complex a shader) is needed to justify such a first pass.
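One rough way to frame the question is a toy fill-rate model. Everything in it is an assumption: cost is counted per fragment in arbitrary units, the single pass shades every covering fragment (no help from sorting), the pre-pass shades exactly one fragment per pixel, and vertex work and the cost of rejecting fragments in the second pass are ignored. It is no substitute for measuring on the actual card.

```python
def cost_single_pass(overdraw, shade_cost):
    """Worst case: every covering fragment gets shaded."""
    return overdraw * shade_cost


def cost_with_prepass(overdraw, shade_cost, depth_cost):
    """Depth-only cost for every fragment, then shading once per pixel."""
    return overdraw * depth_cost + shade_cost


def break_even_overdraw(shade_cost, depth_cost):
    """Overdraw factor above which the pre-pass is cheaper in this model.

    From overdraw * depth_cost + shade_cost < overdraw * shade_cost
    it follows that overdraw > shade_cost / (shade_cost - depth_cost).
    """
    assert shade_cost > depth_cost > 0, "model needs shading to cost more than depth"
    return shade_cost / (shade_cost - depth_cost)


# Hypothetical numbers: shading twice as expensive as a depth-only fragment.
print(break_even_overdraw(2.0, 1.0))      # 2.0
print(cost_single_pass(3, 2.0))           # 6.0
print(cost_with_prepass(3, 2.0, 1.0))     # 5.0
```

In this model a cheap shader needs a lot of overdraw before the extra pass pays off, while an expensive shader breaks even at an overdraw factor only slightly above 1. A front-to-back sort lowers the single-pass cost and shifts the break-even point further against the pre-pass.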

I would like to test this, but at the moment I can’t: my app has enough other problems, and my gfx card has only 2 texture units and no (hardware) vertex or fragment shaders.
That is why I asked whether anyone has experience with this approach.

Jan.