Shadow volumes sooo slow...

07-11-2002, 04:41 AM

Shadow volumes eat so much fillrate. If I disable them (by simply not drawing them) I get framerates that are sometimes even 5 times higher! And I have already optimized my engine with portal culling so that a brush only casts a shadow when it is actually lit by a light source. But shadow volumes still make that enormous difference in framerates... :(

At the moment, I'm using the infinite shadow volume technique described in the NVidia paper. This probably takes even more fillrate than the conventional "Carmack's Reverse" approach. Does anyone know if there is a way to optimize the shadow volumes so that they are less fillrate-intensive? Or how are you solving this huge performance problem in your engines?

Thanks in advance

07-11-2002, 05:49 AM
Are you projecting the volume or just the silhouette ?
If you're using Carmack's reverse, you HAVE to project the volume. If you're not using Carmack's reverse, then you can save some fillrate by projecting only silhouettes.
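For readers unfamiliar with the two counting schemes, here is a toy model of stencil counting along a single view ray. It is only a sketch and involves no OpenGL; the function names and the (depth, facing) representation are made up for illustration. It shows why depth-fail counting (Carmack's reverse) needs the closed volume, and why depth-pass counting breaks when the near plane clips a front face.

```python
# Toy model of stencil counting along one view ray (no OpenGL involved).
# Each shadow-volume face crossed by the ray is a (depth, facing) pair, where
# facing is "front" (the ray enters the volume) or "back" (the ray leaves it).

def zpass_count(faces, surface_depth, near=0.0):
    """Depth-pass counting: only faces in front of the visible surface count."""
    count = 0
    for depth, facing in faces:
        # Faces at or before the near plane are clipped away and never drawn.
        if near < depth < surface_depth:
            count += 1 if facing == "front" else -1
    return count


def zfail_count(faces, surface_depth):
    """Depth-fail counting: only faces behind the visible surface count."""
    count = 0
    for depth, facing in faces:
        if depth > surface_depth:
            count += 1 if facing == "back" else -1
    return count


closed_volume = [(2.0, "front"), (8.0, "back")]   # the back face is the far cap
uncapped_volume = [(2.0, "front")]                # silhouette quads only
eye_inside_volume = [(0.5, "front"), (8.0, "back")]

# Surface at depth 5 lies inside the shadow; a non-zero count means "shadowed".
print(zpass_count(closed_volume, 5.0), zfail_count(closed_volume, 5.0))
print(zpass_count(uncapped_volume, 5.0), zfail_count(uncapped_volume, 5.0))
print(zpass_count(eye_inside_volume, 5.0, near=1.0), zfail_count(eye_inside_volume, 5.0))
```

In this model both schemes agree for the closed volume, depth-fail misses the shadow when the cap is left out, and depth-pass misses it when the front face falls behind the near plane.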

Another pretty straightforward optimization is to project volumes only if it is useful. That is, if you can predict that an object "A" won't be able to occlude light for any visible object in the viewing frustum, then don't project the silhouettes/volume for that object "A".

07-11-2002, 07:38 AM
Another thing that helps fillrate is to use the scissor function. This has been discussed a bit on another thread about that nvidia paper on shadow volumes. Carmack is even using scissor clipping for the shadow volumes. I haven't tried it myself yet, but I plan to soon.


07-11-2002, 08:04 AM
You could try using some sort of constructive solid geometry to merge overlapping shadow volumes into one volume before rendering it.

I haven't thought about it too much, so I don't know if this could actually work, but you may be able to do the CSG in 2D screen space to simplify the problem.

-- Zeno

07-11-2002, 08:24 AM
Zeno : CSG would probably use the stencil buffer, which is already used by the shadow volume algorithm, so CSG + shadows may be a bit messy to implement because they need to share the stencil buffer. I think CSG doesn't help much there. At best it would be as fast as the current technique, and at worst it could be very slow.

LaBasX2 : I assume that you are rendering into the stencil buffer with lighting off, textures off, etc. But do you send texture coordinates and normals ? I hope not : you should only call glVertex while rendering your black silhouettes/volumes into the stencil buffer. glNormal, glTexCoord, glColor, glFogCoord (and other calls like that...) should not be issued while drawing black silhouettes/volumes.

Also you should set glShadeModel(GL_FLAT).

What do you disable ? lighting ? texturing (for ALL texture units) ? fog ?

07-11-2002, 08:47 AM
You should use scissoring, with some tweaks it gave me a 3x speedup in average cases.
You could also try to use Beamtrees or something else so that you can check for polygons that are in other polygons' shadows.
(Note that beamtrees are a lot of work and you almost certainly need a bsp for it since you have to insert front to back)

07-11-2002, 09:09 AM
Thanks for your help so far.

I will try using the scissor box. I hope that this will also help a bit in my case. Beamtrees and csg sound also interesting but since I don't have a bsp-tree it will become hard to realize.

Nearly everything is disabled while drawing the volume faces (including color buffer writing). Also I'm only sending the vertex positions to the card (even using vertex arrays). I rather think that rendering "infinite" faces costs more fillrate than rendering manually stretched faces. But I have to check this out by implementing the conventional Carmack's Reverse again and looking at the performance.


07-11-2002, 09:21 AM
Are you projecting volumes or only silhouettes ?

07-11-2002, 09:33 AM
I'm projecting volumes since I'm using the NVidia version of Carmack's Reverse.

07-11-2002, 09:56 AM
Could somebody point me to somewhere (or explain firsthand) how scissoring would help shadow volume rendering?

I don't doubt it does, but I'd like to understand why (and I can't figure it out in my head).



07-11-2002, 10:05 PM
LaBasX2 : if you only projected silhouettes then you would save some fillrate. But that depends on what algorithm you use to reduce/eliminate artefacts of near/far plane intersection.

Mezz : for each shadowing object, if you know which objects will be shadowed, then you can bound the region where shadow occurs by a rectangle. Use this rectangle as the scissor, so that OpenGL will eliminate sooner (in the per-fragment operation pipeline) the pixels whose coordinates lie outside the scissor.

At worst, scissor testing will be as slow as if scissor was disabled. At best it will significantly speed up your shadow algorithm. That is for GPU performance.
About CPU performance, you have to perform a few operations to detect which objects shadow which other objects.

You can do something similar with clipping planes, but I think clipping planes eat too much compared to scissor since it's much easier for the GPU to optimize scissor testing than 3D clipping.

Julien Cayzac
07-12-2002, 12:14 AM
Another optimization could be to cache shadow volumes when both the occluder and the light are static. This is done at load time, once for all...
However, in modern 3D engines, everything's becoming dynamic.


07-12-2002, 02:58 AM
deepmind : you're right, but unfortunately that won't help the fillrate much. It may help the fillrate in one special case, though : imagine that a static object A occludes the light reaching another static object B from a static light ; then instead of projecting two shadow volumes for A and B, you can merge the volumes using CSG and draw only one shadow volume AB. IMO it's the only case where CSG could be useful (I mean, fast enough), and to be honest this case should be pretty rare.

07-12-2002, 04:00 AM
as far as i know, the idea of the beamtree is actually removing the need for a csg afterwards by merging them at generationtime.. am i wrong or right?

07-12-2002, 04:07 AM
Csg would probably really help but I think it is only suitable for static geometry where it can be precalculated. At the moment in my engine nearly everything is dynamic but I fear that I will have to give up that philosophy since performance is just too bad for the shadows.

What if I would not try to make the volumes infinite but only give them the size of the light's radius? But I think that the scissor box already has the same effect, right?

07-12-2002, 04:44 AM
limiting the light's radius obviously improves fillrate. That's what Carmack is using in his new Doom engine.
I don't think it is related to scissor testing, even though both optimizations work on influence area/volume.
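A minimal sketch of what limiting the light's radius can mean in practice, assuming every shadow caster has a bounding sphere. The function names and the `margin` parameter are hypothetical; this is not taken from any engine mentioned in the thread. The first function skips casters the light cannot reach, the second extrudes a silhouette vertex only as far as the light's sphere of influence instead of to infinity.

```python
import math


def caster_in_light_range(light_pos, light_radius, caster_center, caster_radius):
    """True if the caster's bounding sphere overlaps the light's sphere of influence."""
    return math.dist(light_pos, caster_center) <= light_radius + caster_radius


def extrude_to_light_radius(light_pos, vertex, light_radius, margin=1.0):
    """Push a silhouette vertex away from the light until it reaches the light radius.

    Flat faces spanned by extruded vertices lie slightly inside the sphere, so a
    margin a little above 1.0 may be wanted; the value is an assumption to tune.
    """
    d = math.dist(light_pos, vertex)
    if d == 0.0:
        return tuple(vertex)  # vertex sits on the light: no direction to extrude along
    scale = light_radius * margin / d
    return tuple(l + (v - l) * scale for l, v in zip(light_pos, vertex))
```

Volumes built this way are finite, so they still need to be closed (capped) when depth-fail counting is used.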

07-12-2002, 05:29 AM
If you really want to save some fill then you really need to use beamtrees. It's important not to try and make everything dynamic by default. Have a static and dynamic part for each light ( a dynamic part is required for static lights too, for things that animate/move ). Scissoring will save you quite a bit of fill as pointed out by Pentagram ( using infinite volumes for multiple lights is only practical in conjunction with scissoring and the optimizations mentioned above ).

But having lights that are static is really not that bad. Most lights in the real world don't move around and you might as well take advantage of that :). If you decide to move a "pre-compiled" light, just discard the static volumes and treat it as completely dynamic ( and possibly re-build the static part if/when the light won't move for a while - split this work up over a few frames or simply leave it out ).
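The discard-and-rebuild policy described above could be sketched roughly as follows. The class name, the `build_static_volumes` callback and the frame threshold are all hypothetical; a real engine would also spread the rebuild over several frames, which this sketch does not attempt.

```python
class LightShadowCache:
    """Per-light store of precomputed (static) shadow volumes."""

    def __init__(self, build_static_volumes, rebuild_after_frames=30):
        self._build = build_static_volumes       # callable: light_pos -> volumes
        self._rebuild_after = rebuild_after_frames
        self._static = None                      # cached volumes, or None if invalid
        self._cached_pos = None
        self._frames_still = 0

    def volumes_for_frame(self, light_pos):
        """Return the cached static volumes, or None if the light must be
        treated as completely dynamic this frame."""
        if self._static is not None and light_pos == self._cached_pos:
            return self._static
        if light_pos != self._cached_pos:
            # The light moved: discard the static volumes and start counting again.
            self._static = None
            self._cached_pos = light_pos
            self._frames_still = 0
        else:
            self._frames_still += 1
        if self._frames_still >= self._rebuild_after:
            # The light has been still long enough: rebuild the static part.
            self._static = self._build(light_pos)
        return self._static
```

Passing `rebuild_after_frames=0` builds the static part on the first call, which corresponds to lights that are known to be static at load time.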

You can use the beam trees to cull animating models ( and of course other surfaces ) and thus avoid computing a shadow for it.

Remember that if you don't use infinite volumes you'll _need_ to clip the shadow volumes for shadowing to work in all possible cases ( this has been discussed a few times on this board ).

07-12-2002, 05:50 AM
When an occluder is outside the pyramid defined by the corners of the image plane and the light source, you know its shadow volume cannot be clipped by the near plane, so you can render the shadow volume of that object in "zpass" mode (which only requires the extruded silhouette polygons -- not the end caps).

This is a simple test that can be done on a bounding volume of each occluder, and it definitely reduces fill consumption.
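One way to read the test above is as a sphere-against-convex-hull check, sketched here with plain tuples. The function name, the argument layout and the epsilon are assumptions made for illustration; the sketch is conservative, so a sphere that is outside the pyramid but close to one of its edges may still be sent down the zfail path.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def can_use_zpass(light, near_corners, center, radius, eps=1e-6):
    """True if the occluder's bounding sphere lies wholly outside the pyramid
    spanned by the light and the four near-plane corners (given in order)."""
    c = near_corners
    n0 = _cross(_sub(c[1], c[0]), _sub(c[2], c[0]))
    len0 = math.sqrt(_dot(n0, n0))
    if len0 == 0.0 or abs(_dot(n0, _sub(light, c[0]))) / len0 < eps:
        return False  # light (almost) in the near plane: pyramid is flat, stay safe
    # A point strictly inside the pyramid, used to orient the face normals.
    pts = list(c) + [light]
    inside = tuple(sum(p[i] for p in pts) / 5.0 for i in range(3))
    faces = [(c[0], c[1], c[2])]                      # the near rectangle
    for i in range(4):                                # the four side triangles
        faces.append((light, c[i], c[(i + 1) % 4]))
    for a, b, d in faces:
        n = _cross(_sub(b, a), _sub(d, a))
        length = math.sqrt(_dot(n, n))
        if length == 0.0:
            continue
        if _dot(n, _sub(inside, a)) > 0.0:
            n = (-n[0], -n[1], -n[2])                 # make the normal point outwards
        if _dot(n, _sub(center, a)) / length > radius:
            return True                               # sphere fully outside this face
    return False                                      # may touch the pyramid: use zfail
```

The test runs once per occluder and light on a bounding volume, so its CPU cost is small compared with building the volume itself.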

Scissor is another Good Thing to use when you know you can crop the region of possible illumination.

Mark Kilgard and I are seriously considering a follow-up paper that focuses totally on optimization issues for stenciled shadow volumes.

Thanks -

07-12-2002, 06:18 AM

A follow up paper sounds like a good idea :). Unfortunately, the best optimizations require a lot of work and depend on what you're doing ( and these are at the scene-graph level ). Take a deformable model, this is the absolute worst case ( I think the optimization you mentioned + scissor is the only thing that'll reduce fill consumption for that case ).

07-12-2002, 09:32 AM
I spent a short amount of time messing around with nvidia's shadow volume paper. I implemented it in my own short demo. I have 2 dynamic models that animate and everything else is completely static. The lights are static too. The 2 animating characters take as long as the entire rest of the scene. This is simply a result of having to compute the 2 characters' shadow volumes every frame, while the entire environment is only calculated once. It's a bummer. Things that are truly dynamic mean just that. Anything can change and anything goes, which means nothing can be precomputed.


07-12-2002, 12:16 PM
Using beam trees and static geometry/lights will certainly boost performance but I guess implementing that will be much work since my complete engine is designed for dynamic use at the moment. But probably there is no way round static components with the current hardware...

But before completely giving up my dynamic concept I still want to try the scissor optimization and the other optimization proposed by cass.

I don't know if I'm understanding the scissor test correctly:
You need to project the light sphere into screenspace. For doing that you could use gluProject to get the middle point (x, y) of the sphere and the radius (rad) in screenspace. After that you set the scissor function with the following parameters: glScissor(x - rad, y - rad, 2 * rad, 2 * rad). Is that correct or is there a better way? Please tell me if I'm missing something here...
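A note on the idea above: under perspective a sphere generally projects to an ellipse that is not centred on the projected middle point, so a square of centre +/- rad can come out slightly too small. A looser but safe alternative, sketched here under the assumption of a standard symmetric perspective projection with the camera looking down -z, is to project the eight corners of the sphere's eye-space bounding cube and take their bounding rectangle. The function name and parameter layout are hypothetical.

```python
import math


def light_scissor(center_eye, radius, fovy_deg, aspect, near, viewport):
    """Conservative scissor rectangle (x, y, width, height) for a light sphere.

    center_eye is the light position in eye space; viewport is (x, y, width, height).
    """
    vx, vy, vw, vh = viewport
    cx, cy, cz = center_eye
    # If the bounding cube reaches the near plane the projection is unreliable:
    # fall back to the full viewport.
    if cz + radius >= -near:
        return viewport
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    xs, ys = [], []
    for dx in (-radius, radius):
        for dy in (-radius, radius):
            for dz in (-radius, radius):
                x, y, z = cx + dx, cy + dy, cz + dz
                ndc_x = (f / aspect) * x / -z
                ndc_y = f * y / -z
                xs.append(vx + (ndc_x + 1.0) * 0.5 * vw)
                ys.append(vy + (ndc_y + 1.0) * 0.5 * vh)
    # Round outwards, then clamp to the viewport.
    x0 = max(vx, math.floor(min(xs)))
    y0 = max(vy, math.floor(min(ys)))
    x1 = min(vx + vw, math.ceil(max(xs)))
    y1 = min(vy + vh, math.ceil(max(ys)))
    if x1 <= x0 or y1 <= y0:
        return (vx, vy, 0, 0)   # light entirely off screen: nothing to draw
    return (x0, y0, x1 - x0, y1 - y0)
```

The four returned values are in the order glScissor expects. A tighter rectangle is possible by computing the sphere's tangent planes, at the cost of more math.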

Thanks in advance

P.S.: @Cass: A paper about shadow volume optimization would be really a great and helpful thing. The last paper about infinite shadow volumes was excellent in my opinion!

[This message has been edited by LaBasX2 (edited 07-12-2002).]

07-12-2002, 05:32 PM
I agree, a follow up paper would rock! :)

BTW, does anyone have some source that shows building and using a beam tree? :) I have looked around all over the place but no luck. :(


07-13-2002, 12:52 AM
yeah, a follow up would be too cool. as this is one of the best papers i've ever read. read it, implemented it. took me about an hour from zero to volumetric shadows..

07-13-2002, 01:10 AM
Beam Trees & Shadow Volume Bsp's: http://www.cs.wpi.edu/~matt/courses/cs563/talks/bsp/document.html
(at the bottom of the document) http://www.flipcode.com/harmless/issue01.htm#beamtrees
These are the most useful links I found.
You can always check out my tenebrae sources. It implements a SVBsp (& scissoring & precalculation) but I don't know if it works ;) (Essentially the vis of quake works so well that polygons are cut most of the time before they can be checked against the svbsp)
(Note that I will release a new (cleaner) version "soon".)


[This message has been edited by Pentagram (edited 07-13-2002).]

07-13-2002, 03:14 AM
Ok, I've seen that calculating the scissor box seems to be a bit more complicated than I thought...

07-16-2002, 07:00 AM

I have sample code for scissor rectangle determination up at that link. Probably has some bugs, but the theory is sound ( and it works .)

[This message has been edited by Catz (edited 07-16-2002).]


07-17-2002, 08:59 AM
Great! Thanks for the link!

07-20-2002, 01:09 AM
Just noticed this,

I don't think it will reduce fill rate consumption but it should still improve performance.

EDIT: Ok, it won't help you now, since no hardware supports it except the new Radeon 9700 ( as far as I know ).

[This message has been edited by PH (edited 07-20-2002).]

07-20-2002, 01:32 AM
one question about the two-sided stencil. what if i disable it, and leave the stencil test enabled. are the front-face or the back-face settings then the ones used for single-sided stencil? or are those values independent of the single-sided stencil test? it's not explicitly defined in the ext, and i'm not sure about it.. is it a bit like multitexturing, or is it simply a replacement that does not map onto the standard stencil settings?

07-20-2002, 02:16 AM
With two-sided testing disabled, the front state is used ( like two-sided lighting I believe ).

Julien Cayzac
07-20-2002, 04:04 AM
Originally posted by PH:
Ok, it won't help you now, since no hardware supports it except the new Radeon 9700 ( as far as I know ).

The spec comes from NVidia, so I think they will soon support the extension (plus, it's an EXT_ one). The fact that ATI already supports it is really good! I'll modify my codepaths to make use of it.


07-20-2002, 04:15 AM
I meant to say that the Radeon 9700 _hardware_ supports two-sided stencil testing, not the extension ( but it most likely will ). I noticed an ATI specific extension in glATI.h that hinted that the Radeon 9700 supported this functionality but it has been removed again ( this was prior to its launch, so it'll probably be added again - the extension was/is called GL_ATI_separate_stencil ). Another thing worth mentioning is that the Radeon 8500 apparently supports NV_point_sprite and NV_occlusion_query ( these are still in the header file ). It's nice to see these extensions implemented in other vendors' drivers.

Julien Cayzac
07-20-2002, 04:30 AM
Originally posted by PH:
Another thing worth mentioning is that the Radeon 8500 apparently supports NV_point_sprite and NV_occlusion_query ( these are still in the header file ). It's nice to see these extensions implemented in other vendors' drivers.
Could someone confirm NV_occlusion_query is really supported by ATI, please? If that's true, then it's really good news for me since the engine I'm working on *needs* occlusion queries (I was hoping it would be mainstream in 2 years and didn't expect it to arrive so early :P)


07-20-2002, 07:51 AM
I can't say with 100% certainty since these extensions are not in my drivers but have a look at this,

Two weeks ago there were a few more extensions and a "Radeon 9xxx" line so I would expect a site update soon. Oh, and the ATI_map_object_buffer extension complements VAO nicely.

07-20-2002, 07:57 AM
And notice that it contains a GL_EXT_texture_rectangle which is supported on Radeon 7xxx / 8xxx. I don't have the spec but it's probably like the NV version.

Sorry for the offtopic posts :).

07-23-2002, 08:05 AM
Originally posted by deepmind:
Could someone confirm NV_occlusion_query is really supported by ATI, please?

That's a definite yes :). I've just used it ( and NV_point_sprite ) with the 6118 Radeon 8500 drivers. The extensions are not in the extension string, so they are probably not free of bugs ( the point sprite extension has a few bugs I think ).

EDIT: ARB_vertex_program is also in this driver set but I don't think it's complete yet.

[This message has been edited by PH (edited 07-23-2002).]