aggressive scissor for shadow volumes



cass
04-11-2003, 11:49 AM
I did a bit of hand waving about this topic at GDC this year, so I figured I should do a demo to illustrate the point a little more clearly. You can find it at:
http://developer.nvidia.com/view.asp?IO=shadow_volume_intersection

Thanks -
Cass

Assassin
04-11-2003, 02:09 PM
I haven't looked through the source yet, but it seems that you're finding all the intersections with the bounding box of the "room" containing the shadow-caster. Would it not save more fillrate to clip the extruded quads to those calculated intersection points, making the stencil volume only exist inside the area it has influence in?

SirKnight
04-11-2003, 02:48 PM
Yeah, earlier today I noticed this demo on the developer page. Looks pretty cool. I haven't tried it out yet, but I will in a sec. The presentation from GDC that is linked from that demo's page is pretty cool too. Lots of good stuff in there.

-SirKnight

cass
04-11-2003, 03:30 PM
Originally posted by Assassin:
I haven't looked through the source yet, but it seems that you're finding all the intersections with the bounding box of the "room" containing the shadow-caster. Would it not save more fillrate to clip the extruded quads to those calculated intersection points, making the stencil volume only exist inside the area it has influence in?

Good point, Assassin. It would save more fill to clip and cap the shadow volume to the bounds. The trouble is, that's a pretty expensive operation to do on the CPU. This approach uses rough approximations with bounding volumes, and gets a lot of the same fill reduction by just using the scissor. The CPU and geometry load is significantly lighter.

Cass

SirKnight
04-11-2003, 04:49 PM
I think which way is faster depends on your application. Some 3D engines may run faster one way and some the other, so the best thing is to test both and see which one your application likes better. A program that does not use much CPU may do better with the method Assassin describes. On the other hand, if your program already uses quite a bit of CPU time, it would be better not to give the CPU more work than it can handle, which would hurt performance. That is where the technique in cass's demo would prevail.

-SirKnight

[This message has been edited by SirKnight (edited 04-11-2003).]

SirKnight
04-11-2003, 05:19 PM
Originally posted by cass:
The trouble is, that's a pretty expensive operation to do on the CPU.
Cass


Are you sure? In my tests, the function that computes the screen-space bounding rectangle of a light's bounds for use with glScissor was quite fast. According to the VC++ 6.0 Pro profiler, it usually took under 0.010 ms to execute, and most of the time around 0.008 ms.

-SirKnight

PH
04-11-2003, 10:18 PM
Cass,

The presentation from GDC was very interesting. I like the semi-automatic shadow volume method ( I implemented it using 8 instructions for local lights ). The formula in the presentation seems to be wrong.

pos = pos*pos.w + (pos*L.w - L*pos.w)*(1-pos.w)

For L.w = 1, and a vertex with w = 0:
pos = pos*0 + (pos*1 - L*0) * (1-0) = pos

and w = 1
pos = pos*1 + (pos*1 - L*1) * (1-1) = pos

So you get the original vertex position in both cases ( unless I misunderstood something in the presentation :) ).
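The two evaluations above can be reproduced with a few lines of C++. This is only a sketch of the formula exactly as quoted in the post, not the corrected version from the presentation; the `Vec4` type and the function name `extrude_as_quoted` are made up for illustration.

```cpp
// Minimal 4-component vector, just enough to evaluate the quoted formula.
struct Vec4 { float x, y, z, w; };

Vec4 operator*(const Vec4& v, float s) { return {v.x * s, v.y * s, v.z * s, v.w * s}; }
Vec4 operator+(const Vec4& a, const Vec4& b) { return {a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w}; }
Vec4 operator-(const Vec4& a, const Vec4& b) { return {a.x - b.x, a.y - b.y, a.z - b.z, a.w - b.w}; }

// pos = pos*pos.w + (pos*L.w - L*pos.w)*(1-pos.w), exactly as quoted above.
Vec4 extrude_as_quoted(const Vec4& pos, const Vec4& L) {
    return pos * pos.w + (pos * L.w - L * pos.w) * (1.0f - pos.w);
}
```

With L.w = 1, both a vertex with w = 0 and a vertex with w = 1 come back unchanged, which matches the hand evaluation: the light position never contributes to the result.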

Anyway, I have a question on the scissor topic: when computing the convex hull formed by the light and the view frustum, is it necessary to use the infinite view frustum or will it work using the ordinary frustum ( used when doing normal culling ) ? To me, it seems that as long as the shadows don't project into the finite frustum, they can be disregarded ( I'm looking at the comments in the "Generalized Shadow Volume Culling Rule" section ).

PH
04-12-2003, 12:10 AM
Originally posted by SirKnight:

Are you sure? In my tests, the function that computes the screen-space bounding rectangle of a light's bounds for use with glScissor was quite fast. According to the VC++ 6.0 Pro profiler, it usually took under 0.010 ms to execute, and most of the time around 0.008 ms.


I think Cass and Assassin were talking about actually clipping the shadow volumes. This is pretty expensive ( can be optimized quite a bit when using axis-aligned planes but is still non-trivial ).

Edit: For the interested, here's a shot (http://www.geocities.com/SiliconValley/Pines/8553/ClippedShadows.html) that shows clipped shadows with caps carved out of an axis-aligned box ( the expensive part ).


[This message has been edited by PH (edited 04-12-2003).]

cass
04-12-2003, 06:20 AM
That's right. The bounding box test is pretty simple, but clipping/capping the *actual* high poly shadow volume to the light bounds is a lot more expensive.

That you can get very similar fill consumption and cheaper geometry processing with a relatively coarse bounding box calculation is the important result.

Cass
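For readers who want to see what the "pretty simple" bounding box test might look like, here is a minimal sketch that projects the eight corners of an axis-aligned box and takes the screen-space extent. It is not the demo's code: the names `scissor_from_bounds`, `Vec4` and `Rect` are hypothetical, the matrix is assumed to be a combined modelview-projection in OpenGL's column-major layout, and any corner on or behind the eye plane simply falls back to the full viewport instead of being clipped.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

struct Vec4 { float x, y, z, w; };
struct Rect { int x, y, width, height; };  // same convention as glScissor

// Column-major 4x4 matrix times column vector (OpenGL's memory layout).
Vec4 transform(const std::array<float, 16>& m, const Vec4& v) {
    return { m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12] * v.w,
             m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13] * v.w,
             m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14] * v.w,
             m[3] * v.x + m[7] * v.y + m[11] * v.z + m[15] * v.w };
}

// Hypothetical helper: conservative scissor rectangle for the box [lo, hi].
Rect scissor_from_bounds(const std::array<float, 16>& mvp,
                         const std::array<float, 3>& lo,
                         const std::array<float, 3>& hi,
                         int vp_width, int vp_height) {
    float min_x = 1.0f, max_x = -1.0f, min_y = 1.0f, max_y = -1.0f;
    for (int i = 0; i < 8; ++i) {
        // Bits 0..2 of i select the low or high bound on each axis.
        Vec4 corner = { (i & 1) ? hi[0] : lo[0],
                        (i & 2) ? hi[1] : lo[1],
                        (i & 4) ? hi[2] : lo[2], 1.0f };
        Vec4 clip = transform(mvp, corner);
        // A corner on or behind the eye plane cannot be projected safely;
        // give up and use the whole viewport (conservative, never wrong).
        if (clip.w <= 0.0f) return {0, 0, vp_width, vp_height};
        min_x = std::min(min_x, clip.x / clip.w);
        max_x = std::max(max_x, clip.x / clip.w);
        min_y = std::min(min_y, clip.y / clip.w);
        max_y = std::max(max_y, clip.y / clip.w);
    }
    // Clamp to the visible NDC range; a fully off-screen box ends up empty.
    min_x = std::max(min_x, -1.0f);  max_x = std::min(max_x, 1.0f);
    min_y = std::max(min_y, -1.0f);  max_y = std::min(max_y, 1.0f);
    // Round outward to whole pixels.
    int x0 = static_cast<int>(std::floor((min_x * 0.5f + 0.5f) * vp_width));
    int x1 = static_cast<int>(std::ceil((max_x * 0.5f + 0.5f) * vp_width));
    int y0 = static_cast<int>(std::floor((min_y * 0.5f + 0.5f) * vp_height));
    int y1 = static_cast<int>(std::ceil((max_y * 0.5f + 0.5f) * vp_height));
    return {x0, y0, x1 - x0, y1 - y0};
}
```

The result can be passed straight to glScissor(r.x, r.y, r.width, r.height). The work is eight matrix-vector products per box regardless of how many polygons the shadow volume has, which is why it stays cheap compared with clipping the actual volume.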

cass
04-12-2003, 06:28 AM
Originally posted by PH:
Cass,

The presentation from GDC was very interesting. I like the semi-automatic shadow volume method ( I implemented it using 8 instructions for local lights ). The formula in the presentation seems to be wrong.

pos = pos*pos.w + (pos*L.w - L*pos.w)*(1-pos.w)

For L.w = 1, and a vertex with w = 0:
pos = pos*0 + (pos*1 - L*0) * (1-0) = pos

and w = 1
pos = pos*1 + (pos*1 - L*1) * (1-1) = pos

So you get the original vertex position in both cases ( unless I misunderstood something in the presentation :) ).

Anyway, I have a question on the scissor topic: when computing the convex hull formed by the light and the view frustum, is it necessary to use the infinite view frustum or will it work using the ordinary frustum ( used when doing normal culling ) ? To me, it seems that as long as the shadows don't project into the finite frustum, they can be disregarded ( I'm looking at the comments in the "Generalized Shadow Volume Culling Rule" section ).

Hi Paul,

Let me check the presentation on this again. Sounds like an error though. Thanks! :)

On the scissor determination, if you clip to the frustum, you need to clip based on the region of possible shadow. If you know that no shadow can fall beyond your "original" far plane, then computing your scissor based on that far plane is fine.

The way that bounds are treated can be uniform. Any information you can use to further constrain the "region of possible shadow" is fair game.

Thanks -
Cass

SirKnight
04-12-2003, 10:23 AM
OOHHHH so that's what was meant. OK, in that case I agree that the bounding box method is much faster. :) Alright, forget about my other post then. I think I was thinking about something else there. :p

-SirKnight

[This message has been edited by SirKnight (edited 04-12-2003).]

SirKnight
04-12-2003, 02:13 PM
Just wondering, do we really need the larger scissor rectangle (the blue one in the demo) if we use the smaller (green) scissor rectangle? Or are both being drawn to show the difference of the light bounds scissor rect and the aggressive constrained scissor rect? If that's the case, and I think it is, then using the aggressive constrained scissor saves a ton more fill than the light bound scissor.

-SirKnight

cass
04-12-2003, 03:45 PM
You only need the smaller scissor.

The blue one was shown to give an indication of how much better you could do with per-object (vs per-light) scissor.

Thanks -
Cass
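As a small illustration of the per-light versus per-object comparison, the sketch below intersects two scissor rectangles and measures their area. It assumes an implementation that computes the two rectangles separately from coarse bounds, in which case intersecting them is a safe way to end up with the smaller one. `Rect`, `intersect` and `area` are hypothetical names, not taken from the demo.

```cpp
#include <algorithm>

struct Rect { int x, y, width, height; };  // same convention as glScissor

// Overlap of two rectangles; width and height are 0 if they do not overlap.
Rect intersect(const Rect& a, const Rect& b) {
    int x0 = std::max(a.x, b.x);
    int y0 = std::max(a.y, b.y);
    int x1 = std::min(a.x + a.width, b.x + b.width);
    int y1 = std::min(a.y + a.height, b.y + b.height);
    return {x0, y0, std::max(0, x1 - x0), std::max(0, y1 - y0)};
}

// Pixel count covered by a rectangle, a rough proxy for stencil fill cost.
long area(const Rect& r) {
    return static_cast<long>(r.width) * r.height;
}
```

Comparing area(light_rect) with area(intersect(light_rect, object_rect)) gives a quick estimate of how much fill the per-object scissor removes for a given frame.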

jwatte
04-12-2003, 06:02 PM
Question:

I understand that the scissor will be massive savings in many cases (except for degenerate cases, where nothing will save you) if you extrude to infinity.

If you extrude to the end of the light radius in a vertex shader, rather than extruding to infinity, wouldn't you get most (if not all) of the fill rate savings anyway? I like to avoid CPU geometry computations if possible.

I realize that there is a bit of a trade-off, because you have to extrude sufficiently further out from the light radius so that the edge segments don't linearly cut into the bounding sphere, which may be hard to always do right. But CPU calculation seems like such a waste, especially if you're also doing skinning and stuff. It seems we're relegating vertex programs to applying the tangent space basis for us, and that's about it; we do a lot of transform work on the CPU that'll get re-done on the GPU...
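The trade-off described above can be put into numbers. The sketch below is written in C++ rather than as a vertex program, and every name in it is hypothetical. It pushes a vertex away from the light to a distance of radius * scale, and it computes the scale needed for one straight edge: if the two extruded endpoints subtend an angle theta at the light, the midpoint of the segment between them lies at distance radius * scale * cos(theta / 2), so scale must be at least 1 / cos(theta / 2) for that segment to stay outside the sphere. It assumes the vertex does not coincide with the light and that theta is below 180 degrees; cap polygons with more than two vertices may need a larger margin.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Margin factor for one edge whose endpoints subtend max_angle_radians
// at the light: the chord midpoint then stays at or beyond the radius.
float extrusion_scale(float max_angle_radians) {
    return 1.0f / std::cos(0.5f * max_angle_radians);
}

// Push pos away from the light until it is radius * scale from the light.
// Assumes pos != light, otherwise the direction is undefined.
Vec3 extrude_to_radius(const Vec3& pos, const Vec3& light, float radius, float scale) {
    Vec3 d = {pos.x - light.x, pos.y - light.y, pos.z - light.z};
    float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    float k = radius * scale / len;
    return {light.x + d.x * k, light.y + d.y * k, light.z + d.z * k};
}
```

For a 90 degree spread the factor is already about 1.41, which shows why the finite extrusion is hard to get right for occluders close to the light.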

SirKnight
04-12-2003, 06:08 PM
Ok that's what I thought. Thanks for the clarification.

BTW, I am using Visual Studio .NET and I can't get this demo to compile. Are you aware of this problem with .NET? I can hardly get any NVIDIA demo to compile, though some do. I think it has problems with the demos that use the vector class, well, some of them anyway. Here is the error I get:




h:\NVIDIA Corporation\SDK\DEMOS\OpenGL\src\volume_intersect\volume_intersect.cpp(489): error C2475: 'std::vector<_Ty,_Ax>::size' : forming a pointer-to-member requires explicit use of the address-of operator ('&') and a qualified name
        with
        [
            _Ty=edge,
            _Ax=std::allocator<edge>
        ]



Thanks.
-SirKnight

[This message has been edited by SirKnight (edited 04-12-2003).]

jra101
04-12-2003, 07:06 PM
You can fix that compile problem by adding "()" after the call to a.e.size on line 489 of volume_intersect.cpp.

I'll get an updated version of the zip file up in a bit that fixes this issue.
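To show the kind of change being described, here is a small stand-alone sketch. The actual contents of line 489 are not reproduced in this thread, so the `edge` and `edge_list` types and the `count_edges` function below are hypothetical stand-ins that only mirror the `a.e.size` expression from the error message.

```cpp
#include <cstddef>
#include <vector>

struct edge { int v0, v1; };                // hypothetical edge record
struct edge_list { std::vector<edge> e; };  // hypothetical container with a member named e

std::size_t count_edges(const edge_list& a) {
    // Writing a.e.size without parentheses names the member function
    // instead of calling it, which Visual C++ .NET rejects with C2475.
    // Adding "()" makes it an ordinary call.
    return a.e.size();
}
```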

Ysaneya
04-13-2003, 02:49 AM
Brilliant! At first I thought it wouldn't be very useful, because you'd need to calculate six ray/plane intersections per shadow volume edge. Then I realized you'd just use the bounding box of the object for that, instead of the real object; hence the CPU cost remains relatively low.

So far I've been capping my shadow volumes to the light radius with a vertex shader to save fill-rate, and didn't use the scissor test.

Y.

PH
04-13-2003, 04:01 AM
A few more quick questions on the presentation ( drifting slightly off topic ):

Is it really worth the trouble to build connected loops while extracting the silhouette ? Or put another way, is anyone actually doing this and can confirm whether this is a definite win ?

What about silhouette extraction of static occluders using the method from the "Silhouette Clipping" paper ?

I'll probably just keep those on my wish/todo list :).

Edit:
No problems with the code after all.

[This message has been edited by PH (edited 04-13-2003).]

V-man
04-13-2003, 06:11 AM
>>>Is it really worth the trouble to build connected loops while extracting the silhouette ? Or put another way, is anyone actually doing this and can confirm whether this is a definite win ?<<<<

I did it for finding the true silhouette but that is way too expensive.

Now I'm using Cass's trick there and it's not necessary to connect the edges in the "pseudo-silhouette".

Of course if you do it, it will cost you.

SirKnight
04-13-2003, 06:38 AM
Originally posted by jra101:
You can fix that compile problem by adding "()" after the call to a.e.size on line 489 of volume_intersect.cpp.

I'll get an updated version of the zip file up in a bit that fixes this issue.


Ok thanks. I just realized that this actually was a very easy error to find and fix. :p If only I had clicked on the error message, I would have been brought right to it and seen the error. But I assumed it was one of the problems I had before, where clicking on the error would take me inside the vector file. I guess that's what I get for assuming. :D

-SirKnight

cass
04-13-2003, 07:46 AM
Originally posted by PH:
Is it really worth the trouble to build connected loops while extracting the silhouette ? Or put another way, is anyone actually doing this and can confirm whether this is a definite win ?

Paul,

No hard evidence on this. It is mostly a function of how individual architectures make efficient use of bandwidth. Coherence of updates can have a significant impact on the overall fill efficiency.

Thanks -
Cass

SirKnight
04-16-2003, 02:44 PM
I just noticed something in the code for this demo. This function here:




vec4f compute_homogeneous_plane(vec4f a, vec4f b, vec4f c)
{
    vec4f v, t;

    if(a[3] == 0)
    { t = a; a = b; b = c; c = t; }
    if(a[3] == 0)
    { t = a; a = b; b = c; c = t; }

    // can't handle 3 infinite points
    if( a[3] == 0 )
        return v;

    vec3f vb = homogeneous_difference(a, b);
    vec3f vc = homogeneous_difference(a, c);

    vec3f n = vb.cross(vc);
    n.normalize();

    v[0] = n[0];
    v[1] = n[1];
    v[2] = n[2];

    v[3] = - n.dot(vec3f(a.v)) / a[3] ;

    return v;
}


has some typos. First, there are two exact copies of the same code, one after another. This code here:




    if(a[3] == 0)
    { t = a; a = b; b = c; c = t; }
    if(a[3] == 0)
    { t = a; a = b; b = c; c = t; }


Also right after that it does this:



    // can't handle 3 infinite points
    if( a[3] == 0 )
        return v;


Well, that check does not test whether all 3 points are infinite; it only checks two. First it checks whether a is an infinite point, then the points get moved over so that a takes the old value of b. Then the snippet above checks for an infinite point in a, which used to be b. The third point never gets checked, so this function only checks for two infinite points, not three. Here is the function, modified by me, all fixed up.




vec4f compute_homogeneous_plane(vec4f a, vec4f b, vec4f c)
{
    vec4f v;

    // can't handle 3 infinite points
    if( a[3] == 0 && b[3] == 0 && c[3] == 0 )
        return v;

    vec3f vb = homogeneous_difference(a, b);
    vec3f vc = homogeneous_difference(a, c);

    vec3f n = vb.cross(vc);
    n.normalize();

    v[0] = n[0];
    v[1] = n[1];
    v[2] = n[2];

    v[3] = - n.dot(vec3f(a.v)) / a[3] ;

    return v;
}


Alright there ya go, glad to help. :)

-SirKnight

cass
04-16-2003, 03:23 PM
SirKnight,

Hmm. I'm not seeing the bug.

What I intend(ed) to do was rotate the vertices until a.w was nonzero.

If after two rotations a.w was still zero, then I would have checked all 3 points.

Maybe I'm missing something?

Thanks -
Cass

SirKnight
04-16-2003, 03:36 PM
Oh wait, I'm sorry cass, I see what was going on now. I just saw two pieces of identical code and thought the second one was not needed. Ok, yeah, I looked at it again and see what you mean. Gosh I feel dumb. :(

-SirKnight

SirKnight
04-16-2003, 03:45 PM
Ok I just thought of a way that still allows at most 2 infinite points (unlike the code I posted) and does away with those 9 moves and 3 comparisons.




vec4f compute_homogeneous_plane(vec4f a, vec4f b, vec4f c)
{
    vec4f v;

    // can't handle 3 infinite points
    if( !( a[3] + b[3] + c[3] ) )
        return v;

    vec3f vb = homogeneous_difference(a, b);
    vec3f vc = homogeneous_difference(a, c);

    vec3f n = vb.cross(vc);
    n.normalize();

    v[0] = n[0];
    v[1] = n[1];
    v[2] = n[2];

    v[3] = - n.dot(vec3f(a.v)) / a[3] ;

    return v;
}


I hope this is better.

-SirKnight

cass
04-16-2003, 08:05 PM
Better, but homogeneous_difference() doesn't really like two infinite points, so you need to make sure that if two points are infinite that "a" is the non-infinite point.

Details, details... :)

Thanks -
Cass
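The requirement above, that "a" must end up as a finite point, is what the two rotations in the original function provide. Here is a self-contained sketch of that idea using plain structs instead of the demo's vec3f/vec4f classes. The body of `homogeneous_difference` is an assumption (the demo's version is not shown in this thread), and the bool return value with an output parameter is a choice made for this sketch, not the demo's interface.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

// Assumed stand-in for the demo's homogeneous_difference(): a direction
// from a toward b that stays usable when b is at infinity (b.w == 0).
// If both a and b are at infinity it returns the zero vector, which is
// why a must be the finite point.
Vec3 homogeneous_difference(const Vec4& a, const Vec4& b) {
    return {b.x * a.w - a.x * b.w, b.y * a.w - a.y * b.w, b.z * a.w - a.z * b.w};
}

// Writes the plane (nx, ny, nz, d) through a, b, c into 'plane'.
// Returns false if all three points are at infinity or the normal is degenerate.
bool compute_homogeneous_plane(Vec4 a, Vec4 b, Vec4 c, Vec4& plane) {
    // Rotate (a, b, c) -> (b, c, a) at most twice until a is finite.
    // A rotation keeps the winding order, so the normal does not flip.
    for (int i = 0; i < 2 && a.w == 0.0f; ++i) {
        Vec4 t = a; a = b; b = c; c = t;
    }
    if (a.w == 0.0f) return false;  // all three points were examined

    Vec3 vb = homogeneous_difference(a, b);
    Vec3 vc = homogeneous_difference(a, c);
    Vec3 n = {vb.y * vc.z - vb.z * vc.y,
              vb.z * vc.x - vb.x * vc.z,
              vb.x * vc.y - vb.y * vc.x};
    float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len == 0.0f) return false;
    n = {n.x / len, n.y / len, n.z / len};

    plane = {n.x, n.y, n.z, -(n.x * a.x + n.y * a.y + n.z * a.z) / a.w};
    return true;
}
```

With two infinite points and one finite point, the loop moves the finite point into a before any difference is taken, so both calls to homogeneous_difference have a finite first argument.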

SirKnight
04-17-2003, 05:11 AM
ARG, that dang homogeneous_difference function. Yeah, I see that would be bad if there are two infinite points and 'a' is one of them. Alright, back to the drawing board. :D

-SirKnight

Liquid
04-24-2003, 01:13 PM
1. It doesn't look like Doom3 uses scissor per object but per light.
Or what's an object in a static gameworld?
2. I don't think that the portal culling stuff for shadow volumes is as simple as it looks.
Some more information would be nice.
For example you can use a portal's scissor rect if the boundingbox of an object is outside the convex hull of the portal and the light, if you look through this portal, so the shadow volume stays in the areas behind the portal.
I think there must be more things like this.
3. The "Optimized Stencil Shadow Volumes" presentation was great!

[This message has been edited by Liquid (edited 04-24-2003).]

cass
04-24-2003, 05:57 PM
Originally posted by Liquid:
1. It doesn't look like Doom3 uses scissor per object but per light.
Or what's an object in a static gameworld?
2. I don't think that the portal culling stuff for shadow volumes is as simple as it looks.
Some more information would be nice.
For example you can use a portal's scissor rect if the boundingbox of an object is outside the convex hull of the portal and the light, if you look through this portal, so the shadow volume stays in the areas behind the portal.
I think there must be more things like this.
3. The "Optimized Stencil Shadow Volumes" presentation was great!

[This message has been edited by Liquid (edited 04-24-2003).]

1. That's true of the leaked version of Doom3 from a long time ago, but then nobody (that I'm aware of) was doing per-object scissor as a shadow volume optimization back then.

2. I agree, some examples of portal culling would have been helpful.

3. Thanks!

Cass