View Full Version : Light Indexed Deferred Rendering - new technique



sqrt[-1]
12-30-2007, 09:14 PM
I have been experimenting with a lighting technique which I think might be new. This technique seems very obvious so I am hoping that people here can let me know of any prior papers. (before I make a fool of myself)

This approach simply assigns each light a unique index and then stores this index at each fragment the light hits, rather than storing all the light or material properties per fragment. These indices can then be used in a fragment shader to look up, in a lighting properties table, the data needed to light the fragment.

This technique can be broken down into three basic render passes:

1) Render depth only pre-pass
2) Disable depth writes (depth testing only) and render light volumes into a light index texture.
Standard deferred lighting / shadow volume techniques can be used to find which fragments
are hit by each light volume.
3) Render geometry using standard forward rendering; lighting is done by using the light index
texture to access lighting properties in each shader.
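
As a rough CPU-side illustration of pass 3, the lookup can be sketched as below. The names `Light`, `LightIndices` and `shadeFragment`, the four-slot layout, and the use of index 0 as "no light" are all assumptions made for this sketch and are not taken from the paper or the demo.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-light record; in a real renderer this table would live
// in a texture or a constant array that the fragment shader can index.
struct Light {
    float r, g, b;   // light colour
};

// Hypothetical layout: each fragment stores up to four 8-bit light indices.
// Index 0 is reserved here to mean "no light in this slot".
using LightIndices = std::array<std::uint8_t, 4>;

struct Color { float r, g, b; };

// Pass 3: forward-shade one fragment by looking up each stored index in the
// light table. Attenuation and the BRDF are left out; only the indirection
// through the index is shown.
Color shadeFragment(const LightIndices& indices, const std::vector<Light>& lightTable) {
    Color sum{0.0f, 0.0f, 0.0f};
    for (std::uint8_t index : indices) {
        if (index == 0) continue;            // empty slot
        assert(index < lightTable.size());   // index must refer to a table entry
        const Light& light = lightTable[index];
        sum.r += light.r;
        sum.g += light.g;
        sum.b += light.b;
    }
    return sum;
}
```

The geometry pass never needs to know which lights exist in the scene; it only follows the indices that pass 2 left behind for the fragment.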

What this achieves is the main advantage of deferred rendering (complex light/object scene interactions) with ways around the disadvantages (fat buffer sizes, MSAA and transparency issues).

This technique has an obvious downside: it limits the number of lights that can hit a fragment. This can be easily managed in a game editor context, though.
However, I think artists would prefer to have as many non-shadowing lights as they want and deal with overlap issues than the current situation of X lights per object and having to break up objects into small pieces.

I wrote up a paper explaining it fully here:
http://lightindexed-deferredrender.googlecode.com/files/LightIndexedDeferredLighting1.0.pdf

The demo with full source will follow soon when I have cleaned up the code.

oc2k1
12-30-2007, 09:31 PM
Nothing new..... you can accelerate the loop on a GF8 with something like:


uniform vec4 col[8]; // dummy; should be replaced by a true light calculation
uniform int mask; // the mask should be read from an integer texture
void main(void){
    gl_FragColor = vec4(0.0); // start from black before accumulating lights
    unsigned int i = mask;
    while (i != 0){
        unsigned int b = log2(i); // index of the highest set bit; skips all zero bits
        gl_FragColor += col[b]; // calculate light b
        i -= 1 << b; // clear bit b; an XOR would do it too
    }
}

A small problem in that loop are endless loops created by typing errors :P
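
The same loop can be sketched on the CPU to check that it terminates. This is only an illustration: `highestSetBit` and `lightsInMask` are hypothetical helper names, and the shader's integer log2 is replaced here by a plain shift loop.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Index of the highest set bit; stands in for the integer log2 in the shader.
// Precondition: value != 0.
unsigned highestSetBit(std::uint32_t value) {
    assert(value != 0);
    unsigned bit = 0;
    while (value >>= 1) ++bit;
    return bit;
}

// Visit every light whose bit is set in the mask, highest index first.
std::vector<unsigned> lightsInMask(std::uint32_t mask) {
    std::vector<unsigned> lights;
    while (mask != 0) {
        unsigned b = highestSetBit(mask);
        lights.push_back(b);               // "calculate light b" would go here
        mask ^= (std::uint32_t{1} << b);   // clear bit b; same result as subtracting, since the bit is set
    }
    return lights;
}
```

Each iteration clears exactly one set bit, so the loop runs once per light in the mask and cannot spin forever as long as the cleared bit is the one that was just found.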

sqrt[-1]
12-30-2007, 10:45 PM
oc2k1, if it is not new, could you tell me what games/apps/papers use this technique so I can update my paper?

Yeah, I know you can do bit-math on the GeForce 8 (I even mention it in the paper), but I did not want to add anything I could not test.

Jan
12-31-2007, 06:28 AM
Although it might be "nothing new", I don't think any modern games use such an approach. I think the idea is good. How many overlapping lights one needs in practice remains to be found out, but I assume a maximum of 4 lights per pixel should work pretty well for many games, since most games still use ambient lighting on a per-sector basis instead of many non-shadow-casting lights.

I am not sure, whether this would work, but to store the light-indices, you might be interested in this demo by Humus:
http://www.humus.ca/index.php?page=3D

I haven't yet read your paper, so i am not sure, how exactly you want to store the light-indices per fragment.

Jan.

knackered
12-31-2007, 10:57 AM
Read your paper and like the technique.
As far as its advantages over my own forward rendering code, it's just saving me some CPU work finding the lights that affect objects, and saving me shader swapping. That's not much work saved, especially considering I sort by auto-generated shader. It also saves this work at the expense of a complicated shader, which won't scale well backwards to 3 year old cards.
What other advantages over forward rendering is it offering? You seem to focus on its advantages/disadvantages over deferred shading.

sqrt[-1]
12-31-2007, 05:25 PM
@Jan - Humus actually took that idea from a discussion I had on these forums about data packing:
http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=230242#Post230242

@knackered - Well, I wrote the test demo on a 3+ year old card (6800 GT), and with 80/60 FPS at 1024x768 for one/two lights per pixel I think that is acceptable. (Scene complexity does not affect this technique, as it is all fragment bound. I also have not seriously profiled.)
You did not say what sort of forward rendering you do, but assuming the "multiple lights in one shader" approach, you will not really get any advantage if you only have a few lights on screen.
The best sort of case would be if your app features a terrain system with player point lights moving over it. You would typically not want to re-submit all the terrain for each point light (or break it up), and there may be more point lights than can easily be supported in a single shader.
So basically, if your scene is not one where you would otherwise consider deferred rendering (e.g. a nighttime city scene, or Christmas lights), this is probably not an ideal technique.
Also, I have worked with engines before where object -> light intersections were not exactly fast.

But half the beauty of this technique is that it can be easily layered on top of existing forward rendering approaches. So if you want, you can turn this on for high-end cards only and have lots of little PFX lights (e.g. lights from a plasma gun, lights from falling sparks, fire effects etc.).

Timothy Farrar
01-01-2008, 09:57 PM
Interesting paper, thanks for the reference. The single pass packing via bit-shifting is really an awesome idea.
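
To make the idea of packing by bit-shifting concrete, here is a CPU-side analogy. It is a sketch under assumptions: the 32-bit word, the 8 bits per index, and the names `pushLightIndex` and `unpackLightIndices` are made up for illustration, and the actual blend setup is the one described in the paper.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Push a new 8-bit light index into the top byte, shifting earlier ones down.
// The oldest index falls off the bottom once four lights have been written.
std::uint32_t pushLightIndex(std::uint32_t packed, std::uint8_t index) {
    return (packed >> 8) | (std::uint32_t{index} << 24);
}

// Unpack the four slots, most recently written first. Zero means "empty".
std::array<std::uint8_t, 4> unpackLightIndices(std::uint32_t packed) {
    return {
        static_cast<std::uint8_t>(packed >> 24),
        static_cast<std::uint8_t>(packed >> 16),
        static_cast<std::uint8_t>(packed >> 8),
        static_cast<std::uint8_t>(packed)
    };
}
```

Writing a fifth light silently drops the first one, which is the per-fragment light limit discussed earlier in the thread.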

It would be REALLY interesting to see the performance of this on the newer hardware with constant buffers!

BTW, have you ever thought about trying to do image-space fake global illumination by using the final framebuffer RGB data and Z buffer, then simplify this (using something like a custom mipmap generation shader) to generate a much smaller framebuffer which contains a new point light source per pixel. Then use these new point light sources to light the next frame? Basically light the next frame with the bounce light from the previous frame ... might work great in your current light indexed deferred rendering framework.

sqrt[-1]
01-02-2008, 05:51 AM
Actually, I think your blend Max packing version is more appropriate for games (when limited to two lights a fragment)

As for constant buffers, when I release the source I am sure someone with a GeForce 8 could test this out *wink*

I actually did not think of faking global illumination like that, as I would not think the results would look any good. (as opposed to the light space positioning mentioned in the paper)

But hey, I never would have thought screen space ambient occlusion would look any good either. When I run out of demo ideas I might give it a try. (going to try order independent transparency again and then mess with the Wii controller next)

Timothy Farrar
01-02-2008, 05:28 PM
Send me a message when you get your example finished..

BTW, you should be able to get 4 bins in 1 pass using the max blend method if you separate the lights into two non-overlapping sets. There are some other tricks which can be used to do this as well.
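
The thread does not spell out how the max blend method works, so the following is only one plausible reading, emulated on the CPU: one channel keeps the highest index written, and a second channel keeps the highest inverted index, which encodes the lowest one. The type `MaxBlendFragment` and the functions `blendLight` and `lowestIndex` are hypothetical names for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Two-channel fragment as it might look under MAX blending (hypothetical layout).
struct MaxBlendFragment {
    std::uint8_t highest = 0;    // max over all written indices
    std::uint8_t inverted = 0;   // max over (255 - index), i.e. encodes the lowest index
};

// Emulates one light volume being blended in with a MAX blend equation.
// Light indices are assumed to be in 1..255, with 0 meaning "no light".
void blendLight(MaxBlendFragment& frag, std::uint8_t index) {
    assert(index != 0);
    frag.highest = std::max(frag.highest, index);
    frag.inverted = std::max(frag.inverted, static_cast<std::uint8_t>(255 - index));
}

// Recover the lowest index written; returns 0 when nothing was written.
std::uint8_t lowestIndex(const MaxBlendFragment& frag) {
    return frag.highest == 0 ? std::uint8_t{0}
                             : static_cast<std::uint8_t>(255 - frag.inverted);
}
```

Under this reading the result does not depend on draw order, but only two lights survive per pair of channels, and a fragment hit by a single light reports it in both bins, so the shader would have to avoid counting it twice. Using the remaining two channels for a second set of lights would give four bins, which may be what is meant by separating the lights into two sets.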

Screen/image space methods are only now just beginning to be explored as the GPUs are fast enough to do them in real time. All sorts of stuff can be done in image space, that while fake, look awesome (great for games). Take Mario Galaxy for example, I'd bet that all the reflections/refractions (like on the crystals, etc) are done using a copy of the framebuffer. Image space "refractive" transparency is easily done this way. Image space subsurface scattering can be done very fast if you cheat! Now that image space ambient is here, it is only a matter of time before someone does image space global illumination approximations as well. You can even go all the way to parts of the lighting without ever using any of the traditional diffuse or specular methods and simply use only image space techniques. I'm doing this for my current project. Probably very tough to see in this screen shot, but

http://www.farrarfocus.com/atom/img/dfr1.jpg

sqrt[-1]
01-14-2008, 05:39 AM
OK the demo is up now. Keep in mind this is only a "tech" demo and is not really flashy at all. (based off an old Humus demo)
(It will probably only work for GeForce 6/7/8 users - I have not tested ATI at all)
Demo Link (http://lightindexed-deferredrender.googlecode.com/files/LightIndexedDeferredRendering1.0.zip)

Scene with 255 lights

http://lightindexed-deferredrender.googlecode.com/files/DemoScreenshot_small.jpg

Also there is a small revision to the main document:
Doc Ver 1.1 (http://lightindexed-deferredrender.googlecode.com/files/LightIndexedDeferredLighting1.1.pdf)

knackered
01-14-2008, 10:31 AM
I like it. I've been racking my brains trying to think of ways of improving it. Like storing a texture containing all the combinations of lights and then using a single index to reference the fragment's particular combination. Crap idea, I know, but at least it's got me thinking.

Timothy Farrar
01-14-2008, 11:06 PM
I just installed Windows a few days ago, so I can actually try your binary. On an XFX GeForce 8600 GTS XXX (overclocked), I got the following for 4 lights/pix:

640x480 -> 100 fps deferred / 30 fps non-deferred
1024x768 -> 85 fps deferred / 28 fps non-deferred
1600x1200 -> 38 fps deferred / 25 fps non-deferred

Number of lights/pix didn't make any difference in framerate.

sqrt[-1]
01-15-2008, 03:22 AM
FYI: If you "massage" the makefile you can probably compile it on Linux.

I find it strange that varying the number of "lights per pixel" does not change the frame rate. (I will see if I can run it on a 8600 to compare)

sqrt[-1]
01-15-2008, 08:10 PM
OK I managed to try it on a Nvidia 8600 GT and here are my results:

Resolution / Lights per pixel/ Framerate

640x480 4x - 175 FPS
640x480 2x - 245 FPS
640x480 1x - 295 FPS
640x480 non-deferred - 35 FPS

1024x768 4x - 95 FPS
1024x768 2x - 140 FPS
1024x768 1x - 180 FPS
1024x768 non-deferred - 29 FPS

So are you sure changing the lights per pixel makes no difference? (Also make sure deferred lighting is enabled, as it is only a deferred lighting setting.)
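
Frame rates are easier to compare as time per frame. The helpers below are a small sketch for doing that conversion on the numbers above; `frameTimeMs` and `addedCostMs` are names invented here.

```cpp
#include <cassert>
#include <cmath>

// Milliseconds spent per frame at a given frame rate.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra milliseconds per frame when going from one configuration to another.
double addedCostMs(double fpsBefore, double fpsAfter) {
    return frameTimeMs(fpsAfter) - frameTimeMs(fpsBefore);
}
```

For the 640x480 figures, 295 FPS is about 3.39 ms per frame and 175 FPS is about 5.71 ms, so going from 1 to 4 lights per pixel adds roughly 2.3 ms, while the non-deferred path at 35 FPS takes about 28.6 ms.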

sqrt[-1]
01-26-2008, 01:05 AM
OK, an update to the demo is now available:
http://lightindexed-deferredrender.googlecode.com/files/LightIndexedDeferredRendering1.1.zip

This should fix most ATI issues and by default it uses a moving light scene.

Jan
01-26-2008, 03:48 AM
Without indexed lights it runs very slow (<= 5 FPS) when looking at the center of the room, and very fast when looking into the corners.

With indexed lights there is no lighting, though it runs smooth (> 60 FPS).

ATI Radeon X1600 Mobility, Catalyst 8.1.

Jan.

sqrt[-1]
01-26-2008, 05:07 AM
That is very strange, I tested a Radeon 9550 and Humus said he tested x1800 and HD2900 XT cards.

Did you try the 1 or 2 lights per fragment option? or toggling the stencil?

Perhaps also try the "Precision test" option and see whether you only see yellow.

Jan
01-26-2008, 05:31 AM
I checked all options. Toggling number of lights and stencil doesn't change anything. Enabling "Precision Test" makes the whole screen yellow (except for the HUD, of course).

Jan.

tamlin
01-26-2008, 11:21 PM
sqrt[-1], and Humus, did you use cat 8.1 or older drivers?

Just speculating that 8.1 introduced ... unintended behaviour.

Or could it be some issue that it's "Mobility"? (UMA?)

sqrt[-1]
01-27-2008, 04:39 AM
I used Cat 8.1 on the 9550.

Jan
01-27-2008, 05:42 AM
From my experience there is no real difference between desktop and laptop GPUs, only that they are slower.

However, since you are doing stuff with the multisampling buffer, maybe that part of the GPU is a bit different? Maybe Humus knows more about that.

Jan.

sqrt[-1]
01-27-2008, 07:52 AM
I don't think I use the multisample buffer in any real way for this demo. Sure you can enable it via the F1 render menu but it should not really affect the output much. (Perhaps you are thinking of the Humus order independent transparency demo, or perhaps my paper where I discuss ways of handling MSAA - but I don't implement these in the demo)

If you are using multisampling, perhaps try toggling it off via the F1 menu, or check whether you have it forced on via the ATI control panel options. (Perhaps other options are being set via the control panel as well?)

Jan
01-27-2008, 10:47 AM
Erm, yes, i thought about those demos, i mixed that up...

Hm, i will check the options, but i just installed Catalyst 8.1 two days ago, it should have the default options set (i usually don't change them, at all).

I'll let you know if I find anything,
Jan.

Groovounet
01-27-2008, 01:28 PM
I am very impressed by this technique. So many lights, so fast, and it looks so great!

These are my results on a Core 2 Quad at 2.4GHz with an 8800 GT:
640x480 => 450 FPS deferred / 90 FPS non-deferred
800x600 => 360 FPS deferred / 85 FPS non-deferred
1024x768 => 275 FPS deferred / 80 FPS non-deferred

zed
01-30-2008, 02:38 PM
nice one sqrt[-1], one thing though: the scene is unrealistic (unless you're modelling a swarm of fireflies), typically in games lights are gonna have much larger radii

i'm assuming
A/ the larger the radius the worse this technique will perform + in fact for some scenarios this method will be slower due to the extra per-pixel shader work.
B/ the greater the chance for error (due to the per-pixel light count limit being exceeded)

is there a way to alter all lights' radii at once?

from the lightpositions.h file
i assume the 40 is the radius but changing it doesn't alter the lights' radius

LightData(vec3(0.000000f,0.000000f,1.000000f), vec3(568.066345f,-247.719604f,174.817795f), 40.000000f),

sqrt[-1]
01-30-2008, 08:40 PM
Yes, the scene is unrealistic - a real scene has more than 5 render calls and typically more than 40,000 polygons. Hence in a real scene this technique would be even better. :D (Especially if you have 100's of people with torches walking around on terrain)

A) The light radius has very little impact on per-pixel shader work - the only increase is the "invisible" light volume overdraw that is typical of shadow volume and other deferred rendering approaches.

B) While there is a greater chance of error with over draw - there are very few cases where a pixel is influenced by more than one or two lights in a game scene (excluding main ambient and directional lights) So with this technique you can say to your artists - place as many lights as you want, just don't have more than 4 overlaps. (or less if you want more speed)
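
An editor could check the overlap rule with something like the sketch below. It assumes spherical point lights and tests a single sample point; `PointLight`, `countOverlaps` and `withinLightBudget` are hypothetical names and not part of the demo.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical point light: a sphere of influence around a position.
struct PointLight {
    Vec3 position;
    float radius;
};

// Number of light volumes that contain the sample point.
std::size_t countOverlaps(const std::vector<PointLight>& lights, const Vec3& p) {
    std::size_t count = 0;
    for (const PointLight& light : lights) {
        const float dx = p.x - light.position.x;
        const float dy = p.y - light.position.y;
        const float dz = p.z - light.position.z;
        if (dx * dx + dy * dy + dz * dz <= light.radius * light.radius) ++count;
    }
    return count;
}

// True when the sample point stays within the per-fragment light budget.
bool withinLightBudget(const std::vector<PointLight>& lights, const Vec3& p,
                       std::size_t maxLights = 4) {
    return countOverlaps(lights, p) <= maxLights;
}
```

Sampling points on the surfaces near each light and flagging those that fail `withinLightBudget` would give artists a warning before the limit shows up as a rendering error.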

Also keep in mind that you can use this for PFX type lights (e.g. from sparks, a plasma gun etc.) where the overlap count is not critical.

The lightpositions.h file contains all the light positions for the static light scene (an option via the F1 menu). If you change all the third parameters in this file (as you were doing), that will update the light volume sizes. You can also mess with light placement manually by pressing "E" for editor mode. See the readme.txt for info on this.
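
For readers without the source at hand, an entry in that header might correspond to a type like the one sketched below. Only the meaning of the third argument (the light volume size) is confirmed in this thread; the field names `color` and `position`, the stand-in `vec3`, and the array name `kStaticLights` are guesses made for illustration.

```cpp
#include <cassert>

// Minimal stand-in for the demo framework's vec3 type.
struct vec3 {
    float x, y, z;
    vec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
};

// Hypothetical layout. Only the third argument (light volume size) is
// confirmed in the thread; "color" and "position" are guesses from the values.
struct LightData {
    vec3 color;
    vec3 position;
    float size;
    LightData(const vec3& c, const vec3& p, float s) : color(c), position(p), size(s) {}
};

// Entries like this are compiled into the executable, so editing the header
// only takes effect after a rebuild.
static const LightData kStaticLights[] = {
    LightData(vec3(0.000000f, 0.000000f, 1.000000f),
              vec3(568.066345f, -247.719604f, 174.817795f), 40.000000f),
};
```

Because the table is a C++ header and not a data file read at startup, changing a size means recompiling the demo.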

If you want to change the size in the moving light scene - see the file App_Utils.cpp in the PFX Spawn method.

Basically this technique came about because in a previous game I was working on, artists were creating huge buildings (10,000's of polygons) and there might be small flickering street lights that hit the building. Even though the scissor operation ensured that no extra fragment work was done, the vertex work was huge for each light.

zed
02-02-2008, 07:41 PM
"If you change all the third parameters in this file (as you were doing)"
this doesn't work for me, i.e. i change them from 38,39,40 to 338->340 (+ save) but they remain the same size

(edit) wait, i thought it was just a data file, i'm guessing i have to recompile, a few too many hoops. cheers anyways

knackered
02-04-2008, 05:41 AM
every street light was dynamic?!

sqrt[-1]
02-04-2008, 05:38 PM
No, there were only a few problem areas where these lights were causing issues.

zed
02-04-2008, 11:18 PM
ok i see where you're coming from, it's more useful for static lights. in the stuff i'm doing 90% of the lights are dynamic (explosions/lasers), it's not uncommon for pixels to be shaded by 10+ lights. i know ppl say pick the closest/brightest 4 lights, after that u won't notice the difference. but from my testing the difference is easily noticeable.
creating shadows for said lights is more of a bottleneck anyways :)

zed
02-07-2008, 04:28 PM
http://www.zedzeek.com/junk/sqrt-1.jpg
ok 'cause i was so curious i did try out the demo, much less painful than i was expecting, kudos to humus + sqrt[-1]

performance is ~7x better with the deferred technique (there's quite a few errors) but on the whole impressive. i'm not too sure i should do it though, as shadow generation per light is more of a bottleneck than the actual light shading, but still nice to be aware of