
Shadow mapping for point lights demo



Humus
09-21-2002, 10:43 AM
I've written a demo of shadow mapping for point lights (source is available). One of my better demos, IMO. :)
Will run on Radeon 8500/9700.

It's available on my website:
http://humus2.campus.luth.se/~humus/
(Note: this is only a temporary address; it will move back to esprit.campus.luth.se in a week or so.)

PH
09-21-2002, 11:28 AM
Very nice, Humus. :)

Runs smoothly on my 8500 (looks very good too). Definitely one of your best.

zed
09-21-2002, 01:15 PM
Looks good (I can't run it on my GF2 MX, though).
So could I/we have a few stats, please? :)
average fps -
window size -
number of lights -
number of polygons -

cheers zed

Humus
09-21-2002, 10:45 PM
On my Radeon 8500 I get 70-80 fps at 1024x768x32 fullscreen. There are two lights and, uhm, 20 quads (not particularly complex geometry :)). This can be compared to my stencil shadows demo, which runs at 65-75 fps at the same resolution and also uses two lights, but only 8 quads, of which only two cast shadow volumes, since I know beforehand that they are the only ones that will cast shadows.

Also, this method is more scalable: you can use a smaller cubemap for better performance. Going down to 128x128 (instead of the current 256x256) boosts performance by about 20 fps. Quality is worse, of course, but still acceptable except in close-ups.
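To put the cubemap-size trade-off in numbers: each shadow-map update rasterizes six faces, so that work scales with the square of the face size. A minimal sketch, assuming all six faces are re-rendered every frame (the post doesn't say whether the demo ever skips faces); `cube_texels` and `cube_fill_ratio` are illustrative names:

```c
/* Texels rasterized when all six faces of a square shadow cubemap are
 * re-rendered once (assumes no faces are skipped). */
static unsigned long cube_texels(unsigned long face_size)
{
    return 6UL * face_size * face_size;
}

/* Ratio of shadow-map fill work between two face sizes. */
static double cube_fill_ratio(unsigned long big, unsigned long small)
{
    return (double)cube_texels(big) / (double)cube_texels(small);
}
```

At 256x256 that is 393,216 texels per update versus 98,304 at 128x128, a 4x cut in shadow-map fill. The whole frame speeds up by less than that, presumably because the main-view lighting passes cost the same at either cubemap size.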

Edit: 20 quads, not 14 :)

[This message has been edited by Humus (edited 09-22-2002).]

Ozzy
09-21-2002, 10:56 PM
Really interesting! Planning to write a version using ARB_fragment_program some day? ;)

Humus
09-22-2002, 12:38 AM
Well, depends on how soon I get my hands on hardware and drivers supporting it.

Ysaneya
09-22-2002, 03:23 AM
Got around 90-100 fps. Pretty good demo, except it's limited to a simple scene with a small light radius. I'd be more interested in a moderately complex scene (5k tris) with a big radius (like 50 to 100 m). Granted, it's more work. :)

Y.

LaBasX2
09-22-2002, 03:42 AM
Your demo looks great in the screenshot, but I have problems with my Radeon 9000... There are no shadows and no per-pixel lighting visible. Only some sort of ambient term (with textures) seems to be visible.
When I click Debug in the menu it says "invalid window handle" (translated from a German error message).

LaBasX2

Humus
09-22-2002, 04:03 AM
Originally posted by Ysaneya:
Got around 90-100 fps. Pretty good demo, except it's limited to a simple scene with a small light radius. I'd be more interested in a moderately complex scene (5k tris) with a big radius (like 50 to 100 m). Granted, it's more work. :)

Y.

This is more of a prototype application. I'm working on a larger project with much more complex geometry. Now that I've found the technique works I can implement it in this larger project.

Humus
09-22-2002, 04:08 AM
Originally posted by LaBasX2:
Your demo looks great in the screenshot, but I have problems with my Radeon 9000... There are no shadows and no per-pixel lighting visible. Only some sort of ambient term (with textures) seems to be visible.
When I click Debug in the menu it says "invalid window handle" (translated from a German error message).

LaBasX2

This error occurs for everyone, including me and everyone else the demo works for. I'm not sure why, but I've tracked it down to the wglMakeCurrent(hPdc, hPrc) call. I'm pretty sure all the parameters are correct, and I've checked that I do everything correctly... and after all, it works. So I don't really know...

There seem to be problems with this demo on the Radeon 9700 for some reason, though. Problems on the 9000 were unexpected, as it's very similar to the 8500, on which it works.

LaBasX2
09-22-2002, 04:13 AM
Just updated my drivers from 7.73 to the very new 7.76 and now it's working :)

Looking amazing, great work Humus!

[This message has been edited by LaBasX2 (edited 09-22-2002).]

PH
09-22-2002, 04:20 AM
Originally posted by Humus:
There seem to be problems with this demo on the Radeon 9700 for some reason, though. Problems on the 9000 were unexpected, as it's very similar to the 8500, on which it works.


I have problems with the 9700 and ATI_fragment_shader too (with my own shaders): I don't get any bump mapping.

Humus
09-22-2002, 05:10 AM
By request, I've tried to make an implementation with register combiners. It would be nice if someone with a GF3 or higher could try it. The chances of it working are probably about 10%, though, since I don't have any hardware to test it on. :)

LaBasX2
09-22-2002, 09:03 AM
Humus, you are using a 3D texture for the radial distance, so you have at most 8 bits of precision. That usually isn't enough for normal shadow mapping, so there will probably be lots of shadow artifacts in larger scenes, right?

Thanks
LaBasX2

Humus
09-22-2002, 09:51 AM
Yup, that's right, if the light has a large radius. The size of the scene isn't very important, though; geometric detail may be.
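LaBasX2's precision concern can be quantified. If distance over [0, radius] were mapped linearly onto an n-bit channel, the smallest resolvable step would be radius / (2^n - 1). This is a sketch under that linear-mapping assumption; the thread doesn't say exactly what function of distance the demo's 3D texture stores, and `depth_step` is an illustrative name:

```c
/* Smallest distance difference an n-bit channel can resolve when distance in
 * [0, radius] is mapped linearly onto the integer levels 0 .. 2^bits - 1.
 * The linear mapping is an assumption, not the demo's documented encoding. */
static double depth_step(double radius, int bits)
{
    double levels = (double)((1UL << bits) - 1UL); /* 255 for 8 bits, 65535 for 16 */
    return radius / levels;
}
```

For a 100 m light (the upper end of Ysaneya's range), `depth_step(100.0, 8)` is about 0.39 m while `depth_step(100.0, 16)` is about 1.5 mm. Surfaces closer together than one step can't be told apart, which is why the light radius matters more than the overall scene size.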

I hope to get my Radeon 9700 soon so I can implement the distance calculation as pure math instead of a texture lookup in the pixel shader, and also have 16-bit-per-channel textures to render to.
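A CPU sketch of the comparison Humus wants to move into pure shader math: compute the fragment's distance to the light arithmetically, then test it against the value stored in a 16-bit cubemap channel. Assumptions, none taken from the demo's source: the stored quantity is squared distance divided by the squared light radius (so no square root is needed), `bias` is a small hypothetical offset against self-shadowing, and `vec3`, `norm_dist2`, `quantize16` and `is_lit` are illustrative names:

```c
#include <stdint.h>

typedef struct { float x, y, z; } vec3;

/* Squared distance to the light, normalized by the squared light radius.
 * A dot product replaces the 3D-texture lookup, and no sqrt is needed. */
static float norm_dist2(vec3 pos, vec3 light, float radius)
{
    vec3 d = { pos.x - light.x, pos.y - light.y, pos.z - light.z };
    return (d.x * d.x + d.y * d.y + d.z * d.z) / (radius * radius);
}

/* Quantize a [0,1] value as if rendered into a 16-bit-per-channel cubemap. */
static uint16_t quantize16(float v)
{
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    return (uint16_t)(v * 65535.0f + 0.5f);
}

/* 1 if the fragment is lit: its own normalized distance (minus a bias) is not
 * farther than the occluder distance stored in the shadow cubemap. */
static int is_lit(uint16_t stored, vec3 pos, vec3 light, float radius, float bias)
{
    float current = norm_dist2(pos, light, radius) - bias;
    return current <= (float)stored / 65535.0f;
}
```

Comparing squared distances gives the same answer as comparing distances, since both are non-negative, which is why the sketch can skip the square root.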

Ysaneya
09-22-2002, 09:59 AM
Wouldn't it be possible to use the distance to the near plane instead? Then all you need is a DP4 operation to calculate the distance... but then again, I've never implemented shadow maps, so I don't know.

Y.

MZ
09-22-2002, 10:30 AM
glCombinerParameteriNV(GL_NUM_GENERAL_COMBINERS_NV, 5 /*instead of 6*/);
(...)
/* replace stage 4 (GL_COMBINER4_NV) with this:
   A*B = 0*0, C*D = tex0 * (1 - 0); muxSum = GL_TRUE makes the sum output
   pick C*D or A*B depending on spare0 alpha */
glCombinerInputNV(GL_COMBINER4_NV, GL_RGB, GL_VARIABLE_A_NV, GL_ZERO, GL_SIGNED_IDENTITY_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER4_NV, GL_RGB, GL_VARIABLE_B_NV, GL_ZERO, GL_SIGNED_IDENTITY_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER4_NV, GL_RGB, GL_VARIABLE_C_NV, GL_TEXTURE0_ARB, GL_SIGNED_IDENTITY_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER4_NV, GL_RGB, GL_VARIABLE_D_NV, GL_ZERO, GL_UNSIGNED_INVERT_NV, GL_RGB);
glCombinerOutputNV(GL_COMBINER4_NV, GL_RGB, GL_DISCARD_NV, GL_DISCARD_NV, GL_SPARE0_NV,
                   GL_NONE, GL_NONE, GL_FALSE, GL_FALSE, GL_TRUE);

On the first run your app exited, reporting a lack of WGL_ARB_render_texture. In your Framework, when you query for WGL_EXT/ARB_extensions_string, you should check whether wglGetProcAddress("wglGetExtensionsStringARB") returns non-NULL rather than scan the extensions string (the spec says so, BTW).
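The check MZ describes, sketched in portable C so it compiles without windows.h. `proc_lookup_fn` is a hypothetical stand-in for wglGetProcAddress, `get_ext_string_fn` mirrors wglGetExtensionsStringARB (whose real parameter is an HDC), and only the ARB entry point is shown; a WGL_EXT_extensions_string fallback would follow the same pattern. The whole-token matching in `has_token` is a common precaution when scanning extension strings, not something MZ mentioned:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the WGL loader types. */
typedef void (*generic_fn)(void);
typedef generic_fn (*proc_lookup_fn)(const char *name);
typedef const char *(*get_ext_string_fn)(void *dc);

/* 1 if `name` occurs in `list` as a whole space-separated token. A bare
 * strstr() would also accept a longer extension name starting with `name`. */
static int has_token(const char *list, const char *name)
{
    size_t len = strlen(name);
    const char *p = list;
    if (len == 0)
        return 0;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == list) || (p[-1] == ' ');
        int ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends)
            return 1;
        p += len;
    }
    return 0;
}

/* The order MZ describes: ask the loader for the entry point first, and only
 * if it is non-NULL call it and scan the string it returns. */
static int wgl_has_extension(proc_lookup_fn lookup, void *dc, const char *name)
{
    get_ext_string_fn get_exts = (get_ext_string_fn)lookup("wglGetExtensionsStringARB");
    const char *exts;
    if (get_exts == NULL)
        return 0;
    exts = get_exts(dc);
    return exts != NULL && has_token(exts, name);
}
```

With a real loader you would pass a small wrapper around wglGetProcAddress as `lookup` and the device context as `dc`, then query for "WGL_ARB_render_texture".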

After these patches it runs nicely on my machine (20 fps, GF3 + 40.41 + AthlonXP 1700)

I don't know exactly how your program works yet; I just made NV RC do the same work as ATI FS ;)
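MZ's replacement stage reduces to a select. Below is a CPU sketch of one RGB channel of a general combiner stage, following the NV_register_combiners arithmetic as I understand it: each input goes through its mapping, then the sum output is A*B + C*D, or, with muxSum set to GL_TRUE, C*D when spare0 alpha is at least 0.5 and A*B otherwise. The enum names are hypothetical stand-ins for GL_SIGNED_IDENTITY_NV and friends, hardware fixed-point precision is ignored, and scale/bias are left out because MZ passes GL_NONE:

```c
/* Hypothetical stand-ins for the GL input-mapping enums. */
typedef enum { SIGNED_IDENTITY, UNSIGNED_IDENTITY, UNSIGNED_INVERT } input_mapping;

static float map_input(float x, input_mapping m)
{
    switch (m) {
    case UNSIGNED_IDENTITY: return x < 0.0f ? 0.0f : x;                 /* max(0, x) */
    case UNSIGNED_INVERT:   return 1.0f - (x < 0.0f ? 0.0f : x > 1.0f ? 1.0f : x);
    default:                return x;                                   /* signed identity */
    }
}

static float clamp_signed(float x)
{
    return x < -1.0f ? -1.0f : x > 1.0f ? 1.0f : x;
}

/* Sum output: A*B + C*D, or with muxSum (spare0_alpha >= 0.5) ? C*D : A*B. */
static float combiner_sum(float a, float b, float c, float d, int mux_sum, float spare0_alpha)
{
    float ab = a * b, cd = c * d;
    float out = mux_sum ? (spare0_alpha >= 0.5f ? cd : ab) : ab + cd;
    return clamp_signed(out);
}

/* MZ's GL_COMBINER4_NV: A = 0, B = 0, C = tex0, D = 1 - 0, muxSum = GL_TRUE. */
static float mz_stage4(float tex0, float spare0_alpha)
{
    return combiner_sum(map_input(0.0f, SIGNED_IDENTITY),
                        map_input(0.0f, SIGNED_IDENTITY),
                        map_input(tex0, SIGNED_IDENTITY),
                        map_input(0.0f, UNSIGNED_INVERT),
                        1, spare0_alpha);
}
```

So spare0.rgb ends up as tex0 where spare0 alpha is at least 0.5 and zero elsewhere. Which of those cases means "lit" depends on how the earlier stages fill spare0 alpha, which the thread doesn't show.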



[This message has been edited by MZ (edited 09-22-2002).]

Humus
09-22-2002, 01:13 PM
Originally posted by Ysaneya:
Wouldn't it be possible to use the distance to the near plane instead? Then all you need is a DP4 operation to calculate the distance... but then again, I've never implemented shadow maps, so I don't know.

Y.

Well, you'd then have a different distance calculation for each cubemap face, which would significantly complicate the comparison with the stored value.
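Humus's objection can be made concrete. A cubemap lookup picks its face from the major axis of the light-to-fragment vector, and a near-plane (planar) depth for that face is measured along that face's axis, so Ysaneya's single DP4 would need a different plane for each of the six faces. Radial distance uses one formula everywhere. A CPU sketch; the face numbering follows the +X, -X, +Y, -Y, +Z, -Z order of the GL cube map targets, and `face_axis_depth` is an illustrative stand-in for the per-face depth before any projection mapping:

```c
/* Illustrative vector type; not from the demo. */
typedef struct { float x, y, z; } vec3;

static float absf(float v) { return v < 0.0f ? -v : v; }

/* Face picked for direction d (light -> fragment): the component with the
 * largest magnitude wins and its sign picks + or -. Returns 0..5 in the order
 * +X, -X, +Y, -Y, +Z, -Z; ties are resolved arbitrarily here (X, then Y). */
static int cube_face(vec3 d)
{
    float ax = absf(d.x), ay = absf(d.y), az = absf(d.z);
    if (ax >= ay && ax >= az)
        return d.x >= 0.0f ? 0 : 1;
    if (ay >= az)
        return d.y >= 0.0f ? 2 : 3;
    return d.z >= 0.0f ? 4 : 5;
}

/* Planar depth for the selected face: distance from the light along that
 * face's axis. A single DP4 gives this only for one fixed face plane. */
static float face_axis_depth(vec3 d)
{
    switch (cube_face(d)) {
    case 0: case 1: return absf(d.x);
    case 2: case 3: return absf(d.y);
    default:        return absf(d.z);
    }
}

/* Squared radial distance: the same expression whichever face is hit. */
static float radial_dist2(vec3 d)
{
    return d.x * d.x + d.y * d.y + d.z * d.z;
}
```

For (3, 4, 0) the +Y face is chosen and the planar depth is 4, while the radial value is 25 (distance 5). A point in another face's frustum needs a different axis, and hence a different plane equation, for the planar comparison; the radial form needs no per-face case at all.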

Humus
09-22-2002, 01:36 PM
Originally posted by MZ:
On the first run your app exited, reporting a lack of WGL_ARB_render_texture. In your Framework, when you query for WGL_EXT/ARB_extensions_string, you should check whether wglGetProcAddress("wglGetExtensionsStringARB") returns non-NULL rather than scan the extensions string (the spec says so, BTW).

After these patches it runs nicely on my machine (20 fps, GF3 + 40.41 + AthlonXP 1700)

I don't know exactly how your program works yet; I just made NV RC do the same work as ATI FS ;)

Hmm... it seems that note in the spec passed me by completely. The first thing that springs to mind is "why?". I've yet to understand the purpose of WGL_ARB_extensions_string at all, by the way. Anyway, I've fixed that code now.

Updated with your fixed RC code; the "6" was just a silly cut'n'paste error :)

20 fps sounds a little low, though. What resolution is that?

MZ
09-23-2002, 07:18 AM
maximized window on 1162 x 864 desktop: 20 fps
fullscreen 640 x 480: 30 fps

Humus
09-23-2002, 08:22 AM
That would confirm that it's the rendering to the cubemap that's slow. It's as if the card doesn't support rendering to cubemaps natively and does a copy-to-texture operation instead. Still, that shouldn't make it this slow.
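One way to read MZ's two measurements is a simple two-point model: frame time = fixed cost + cost per pixel * pixels. This linear model is an assumption layered on the posted numbers, not a profile, and `fit_frame_model` is an illustrative name:

```c
/* Fit frame time = fixed + per_pixel * pixels to two (pixel count, fps)
 * measurements. Assumes cost is exactly linear in pixel count. */
typedef struct { double fixed_ms; double ns_per_pixel; } frame_model;

static frame_model fit_frame_model(double px1, double fps1, double px2, double fps2)
{
    double t1 = 1000.0 / fps1, t2 = 1000.0 / fps2; /* ms per frame */
    double slope_ms = (t1 - t2) / (px1 - px2);      /* ms per pixel */
    frame_model m;
    m.fixed_ms = t1 - slope_ms * px1;
    m.ns_per_pixel = slope_ms * 1.0e6;
    return m;
}
```

With MZ's figures, `fit_frame_model(1162.0 * 864.0, 20.0, 640.0 * 480.0, 30.0)` gives roughly 26 ms of resolution-independent work per frame and about 24 ns per pixel, so about half of the 50 ms frame at the higher resolution doesn't scale with pixel count. That fits Humus's suspicion about the cubemap pass, though the model alone can't say which fixed work is responsible.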

harsman
09-23-2002, 08:33 AM
I think render to texture is slower than copy to texture on nvidia hardware, especially with older drivers. It has improved with recent driver releases but it might still be slower. I haven't done or seen any recent benchmarks.

Lars
09-23-2002, 09:40 AM
It could also be because of the 5 general combiners you are using. There is an NVIDIA presentation that covered how expensive the combiners are, but I don't remember which one it was.
On GeForce3 and up, I think only two general combiners are totally free; up to four cost twice as much, and up to eight four times as much (no guarantees). So if you can reduce the number of stages, it could get a bit faster.
I've seen that you weren't using the final combiner (just passing D through). Maybe you can move something from one general combiner into the final combiner and reduce the number of general combiners to 4.

Maybe the following works:



glCombinerInputNV(GL_COMBINER2_NV, GL_ALPHA, GL_VARIABLE_A_NV, GL_TEXTURE2_ARB, GL_SIGNED_IDENTITY_NV, GL_BLUE);
glCombinerInputNV(GL_COMBINER2_NV, GL_ALPHA, GL_VARIABLE_B_NV, GL_ZERO, GL_UNSIGNED_INVERT_NV, GL_BLUE);
glCombinerInputNV(GL_COMBINER2_NV, GL_ALPHA, GL_VARIABLE_C_NV, GL_CONSTANT_COLOR0_NV, GL_SIGNED_IDENTITY_NV, GL_BLUE);
glCombinerInputNV(GL_COMBINER2_NV, GL_ALPHA, GL_VARIABLE_D_NV, GL_ZERO, GL_UNSIGNED_INVERT_NV, GL_BLUE);
glCombinerOutputNV(GL_COMBINER2_NV, GL_ALPHA, GL_DISCARD_NV, GL_DISCARD_NV, GL_SPARE0_NV, GL_NONE, GL_NONE, GL_FALSE, GL_FALSE, GL_FALSE);


glCombinerInputNV(GL_COMBINER3_NV, GL_RGB, GL_VARIABLE_A_NV, GL_ZERO, GL_SIGNED_IDENTITY_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER3_NV, GL_RGB, GL_VARIABLE_B_NV, GL_ZERO, GL_SIGNED_IDENTITY_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER3_NV, GL_RGB, GL_VARIABLE_C_NV, GL_ZERO, GL_UNSIGNED_INVERT_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER3_NV, GL_RGB, GL_VARIABLE_D_NV, GL_ZERO, GL_UNSIGNED_INVERT_NV, GL_RGB);
glCombinerOutputNV(GL_COMBINER3_NV, GL_RGB, GL_DISCARD_NV, GL_DISCARD_NV, GL_SPARE0_NV, GL_NONE, GL_NONE, GL_FALSE, GL_FALSE, GL_TRUE);

// Final combiner: out = A*B + (1-A)*C + D; E and F form the extra E*F product input
glFinalCombinerInputNV(GL_VARIABLE_E_NV, GL_TEXTURE0_ARB, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glFinalCombinerInputNV(GL_VARIABLE_F_NV, GL_PRIMARY_COLOR_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
// GL_E_TIMES_F_NV is the input enum that selects the E*F product

// A = tex0*primary, B = spare0, C = D = 0, so out = tex0 * primary * spare0
glFinalCombinerInputNV(GL_VARIABLE_A_NV, GL_E_TIMES_F_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glFinalCombinerInputNV(GL_VARIABLE_B_NV, GL_SPARE0_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glFinalCombinerInputNV(GL_VARIABLE_C_NV, GL_ZERO, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glFinalCombinerInputNV(GL_VARIABLE_D_NV, GL_ZERO, GL_UNSIGNED_IDENTITY_NV, GL_RGB);



This should do the same but with only 4 general combiners. I can't test it, because I only have a GeForce2 Go and don't like the emulation mode that much :-)
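Lars's final-combiner setup can be checked by hand. The final combiner evaluates A*B + (1-A)*C + D, with E*F available as an extra input, so with A = E*F = tex0 * primary, B = spare0 and C = D = 0 the output is tex0 * primary * spare0. A CPU sketch for one channel, assuming inputs are already mapped to [0,1] (unsigned identity); `final_combiner` and `lars_final` are illustrative names:

```c
/* One RGB channel of the final combiner: A*B + (1 - A)*C + D.
 * Inputs are non-negative here, so only the upper clamp to 1 can apply. */
static float final_combiner(float a, float b, float c, float d)
{
    float out = a * b + (1.0f - a) * c + d;
    return out > 1.0f ? 1.0f : out;
}

/* Lars's setup: E = tex0, F = primary color, A = E*F, B = spare0, C = D = 0. */
static float lars_final(float tex0, float primary, float spare0)
{
    float e_times_f = tex0 * primary;
    return final_combiner(e_times_f, spare0, 0.0f, 0.0f);
}
```

So the general stages only have to leave a shadow mask in spare0, and the final combiner applies it to the tex0 * primary color product.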

Lars

edit: some code layout changes

[This message has been edited by Lars (edited 09-23-2002).]

Lars
09-23-2002, 09:49 AM
...it's even simpler: just do the operation from combiner 2 in combiner 4 and then remove one combiner... no need for the final combiner :-(



glCombinerInputNV(GL_COMBINER4_NV, GL_RGB, GL_VARIABLE_C_NV, GL_TEXTURE0_ARB, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER4_NV, GL_RGB, GL_VARIABLE_D_NV, GL_PRIMARY_COLOR_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);


I hope it gets a bit faster...

Lars

[This message has been edited by Lars (edited 09-23-2002).]

Humus
09-23-2002, 11:03 AM
Hmm, yeah, that should work. But I don't think that's the bottleneck anyway. If it were, there would be a larger difference between the high-res and low-res frame rates. From MZ's post above:

maximized window on 1162 x 864 desktop: 20 fps
fullscreen 640 x 480: 30 fps

Going purely by fill rate, the 640x480 score should be more than three times higher. I also don't really think a GF3 should be that much slower than a Radeon 8500: I get 130 fps at 640x480 and 60 fps at 1152x864.
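Humus's fill-rate argument as arithmetic: if frame time were purely proportional to pixel count, the low-resolution frame rate would be the high-resolution one multiplied by the pixel ratio. A sketch under that purely fill-bound assumption; `fill_bound_fps` is an illustrative name:

```c
/* Expected fps at a lower resolution if frame time were purely proportional
 * to pixel count (fill-rate bound, no fixed per-frame cost). */
static double fill_bound_fps(double fps_hi, int w_hi, int h_hi, int w_lo, int h_lo)
{
    double ratio = ((double)w_hi * h_hi) / ((double)w_lo * h_lo);
    return fps_hi * ratio;
}
```

MZ's 20 fps at 1162x864 predicts about 65 fps at 640x480, against 30 measured; Humus's own 60 fps at 1152x864 predicts about 194 fps, against 130. Neither card is purely fill-bound, but the GF3 falls much further short of the prediction, which is the gap Humus is pointing at.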