A hundred lights (Sending light data to shaders)

Basaaa · June 8, 2013, 7:11am

I’m developing a lighting system for my Voxel engine. It’s simply a list of lights containing the following information:

[ul]
[li]Light Position (Vec3)[/li]
[li]Light Color (Vec3)[/li]
[li]Light strength (float)[/li]
[/ul]

I want to be able to manage at least a hundred lights. So I have to send this data for each light to my fragment shader. There, for each light, I calculate the distance between the fragment and the light position, and also take the angle between the surface normal and the direction to the light, to get nice-looking lighting (without shadows).
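For readers following along, the per-fragment work described above (distance falloff times the cosine between the surface normal and the direction to the light, summed over every light) can be sketched on the CPU in Python. The attenuation formula `strength / (1 + d²)` and the `Light` tuple are assumptions for illustration; the post doesn't say which falloff the actual shader uses.

```python
import math
from typing import NamedTuple, Sequence, Tuple

Vec3 = Tuple[float, float, float]

class Light(NamedTuple):
    position: Vec3   # world-space position
    color: Vec3      # RGB
    strength: float  # scalar intensity

def shade_fragment(frag_pos: Vec3, normal: Vec3, lights: Sequence[Light]) -> Vec3:
    """Sum the diffuse contribution of every light at one fragment (no shadows).

    Assumes `normal` is unit length. The falloff strength / (1 + d^2) is a
    hypothetical choice, not taken from the original shader.
    """
    r = g = b = 0.0
    for light in lights:
        # Vector from the fragment to the light, and its length.
        dx = light.position[0] - frag_pos[0]
        dy = light.position[1] - frag_pos[1]
        dz = light.position[2] - frag_pos[2]
        dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        if dist == 0.0:
            continue  # fragment sits exactly on the light; skip to avoid dividing by zero
        # Lambert term: cosine of the angle between normal and light direction, clamped at 0.
        n_dot_l = max(0.0, (normal[0] * dx + normal[1] * dy + normal[2] * dz) / dist)
        atten = light.strength / (1.0 + dist * dist)
        k = n_dot_l * atten
        r += light.color[0] * k
        g += light.color[1] * k
        b += light.color[2] * k
    return (r, g, b)
```

The loop runs once per fragment and its cost grows linearly with the number of lights, which is the cost the replies below are weighing.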

What is the fastest way of sending 100 × this data to my fragment shader? I would like it to be compatible with OpenGL versions below 3.1, so UBOs aren’t an option.
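As a sketch of what "sending this data" amounts to with plain uniforms: one common packing (an assumption here, not something stated in the thread) puts position and strength in one vec4 and the color, padded, in a second. 100 lights then become 200 vec4s, i.e. 800 floats, which could be uploaded with `glUniform4fv` to something like a `uniform vec4 lightData[200];` array (a hypothetical uniform name).

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
LightRecord = Tuple[Vec3, Vec3, float]  # (position, color, strength)

def pack_lights(lights: Sequence[LightRecord]) -> List[float]:
    """Flatten lights into 2 vec4s each, ready for a glUniform4fv-style upload.

    vec4 #0: position.xyz, strength
    vec4 #1: color.rgb,    0.0 (padding)
    """
    data: List[float] = []
    for (px, py, pz), (cr, cg, cb), strength in lights:
        data.extend((px, py, pz, strength))
        data.extend((cr, cg, cb, 0.0))
    return data

lights = [((float(i), 0.0, 0.0), (1.0, 0.5, 0.25), 3.0) for i in range(100)]
packed = pack_lights(lights)
vec4_count = len(packed) // 4
print(len(packed), vec4_count)  # 800 200
```

The padding slot is wasted space, which is why the total is 800 components rather than the 700 the three fields strictly need.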

Thanks

Alfonse_Reinheart · June 8, 2013, 7:15am

Just because you don’t like the answer doesn’t make it wrong. So I’ll say it again:

The most efficient way to do that would be to not do that.

If you have a scene where you want 100 lights to affect an object, you can’t use forward rendering anymore. Also, as a practical matter, I’m fairly sure that quite a lot of GL 2.x hardware couldn’t handle a uniform array of 200 vec4s. And even if you could, your shader would probably choke attempting to do lighting over 100 lights in one pass.

The performance simply isn’t there for such a computation. Every pixel of overdraw will hurt by 100x. That’s going to kill you in the end. So you need to use some form of deferred rendering.
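The "100x per pixel of overdraw" point is simple arithmetic. Below is a rough count of light evaluations per frame, under simplifying assumptions: in the forward case every shaded fragment loops over all lights, and in the deferred case each light touches every screen pixel once (a pessimistic upper bound, since light volumes usually cover far less). The resolution and the overdraw factor of 3 are made-up example values.

```python
def forward_light_evals(width: int, height: int, overdraw: float, lights: int) -> int:
    # Every shaded fragment, including ones later overwritten, loops over all lights.
    return int(width * height * overdraw * lights)

def deferred_light_evals(width: int, height: int, lights: int) -> int:
    # Lighting runs once per visible pixel per light, independent of overdraw.
    return width * height * lights

fwd = forward_light_evals(1280, 720, 3.0, 100)
dfr = deferred_light_evals(1280, 720, 100)
print(fwd, dfr, fwd // dfr)  # 276480000 92160000 3
```

The extra G-buffer writes and reads that deferred shading adds are not counted here.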

And in deferred rendering, you wouldn’t really render by sending 100 lights to a single shader.

Basaaa · June 8, 2013, 7:37am

Actually, your answer IS wrong. I tried it here on a 2.1 card, and I can calculate the 100 lights at 27 FPS. I’m looking for more answers than just yours; that’s why I’m asking here. I’m not dumb, so again, if you don’t have anything useful to say, please go away.

Alfonse_Reinheart · June 8, 2013, 9:52am

[QUOTE]Actually, your answer IS wrong. I tried it here on a 2.1 card[/QUOTE]

Ahem: “I’m fairly sure that quite a lot of GL 2.x hardware couldn’t handle a uniform array of 200 vec4s. And even if you could, your shader would probably choke attempting to do lighting over 100 lights in one pass.” Emphasis added.

The fact that you found a specific 2.1 card that this happens to work on doesn’t change what I said. Is there 2.1 hardware that can actually store 800 uniform components? Sure; Shader Model 3 requires at least 224 vec4 registers. That doesn’t change the fact that there’s plenty of non-Shader Model 3 capable hardware that still supports OpenGL 2.1. NVIDIA supports SM3 from GeForce 6xxx onward, but AMD didn’t get SM3 support until the Radeon X1xxx line.
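Rather than assuming a particular card can hold the array, the limit can be queried at run time. In OpenGL 2.x the relevant value is `GL_MAX_FRAGMENT_UNIFORM_COMPONENTS` (counted in floats, not vec4s), read with `glGetIntegerv`. The helper below only does the arithmetic on the queried number; the `reserved` parameter, an allowance for the other uniforms the shader needs, is a hypothetical addition.

```python
def max_lights(max_fragment_uniform_components: int,
               floats_per_light: int = 8,
               reserved: int = 0) -> int:
    """How many 2-vec4 lights fit in the fragment uniform budget.

    max_fragment_uniform_components: value of GL_MAX_FRAGMENT_UNIFORM_COMPONENTS
    reserved: components kept back for other uniforms (matrices, counts, ...)
    """
    usable = max_fragment_uniform_components - reserved
    return max(0, usable // floats_per_light)

# 224 vec4 registers (the Shader Model 3 figure quoted above) = 896 components.
print(max_lights(224 * 4))                   # 112
print(max_lights(224 * 4, reserved=16 * 4))  # 104
```

Drivers may also use part of the budget internally, so the result is best treated as an upper bound.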

So it’s great that it works on your card. That doesn’t mean you should assume that any OpenGL 2.1 card can handle it.

[QUOTE]I can calculate the 100 lights at 27 FPS[/QUOTE]

What are you rendering at 27 fps? How much overdraw does your scene get? And what makes you think that a decent deferred rendering implementation won’t perform much better?

[QUOTE]I’m looking for more answers than just yours[/QUOTE]

OK, here’s my other answer.

In GL 2.1, you have 2 ways to communicate information to shaders: uniforms and textures. If you attempt to use textures, not only will you have to use a float texture, you will also have to fetch from that texture twice for every light. And if you compute all of the lights in one pass, that’s 200 texture fetches.
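To make "two fetches per light" concrete: suppose the light data is stored in a float texture (for example RGBA32F via `ARB_texture_float`) that is 2·N texels wide and 1 texel tall, so light `i` lives at texels `2i` and `2i + 1`. GLSL 1.20 has no integer `texelFetch`, so the shader has to sample with `texture2D` at normalized coordinates. The helper below computes those coordinates; the layout itself is an assumption for illustration.

```python
from typing import Tuple

def light_texcoords(light_index: int, light_count: int) -> Tuple[float, float]:
    """Normalized s-coordinates of the two texels holding one light.

    Layout (hypothetical): texel 2i   = position.xyz + strength,
                           texel 2i+1 = color.rgb + padding.
    Sampling at texel centers ((x + 0.5) / width) avoids rounding onto a
    neighbouring texel; the texture should also use GL_NEAREST filtering.
    """
    width = 2 * light_count
    s0 = (2 * light_index + 0.5) / width
    s1 = (2 * light_index + 1.5) / width
    return s0, s1

print(light_texcoords(0, 100))   # (0.0025, 0.0075)
print(light_texcoords(99, 100))  # (0.9925, 0.9975)
```

The shader loop would then issue two `texture2D` calls per light, which is where the 200 fetches for 100 lights come from.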

No AMD card before the Radeon HD line (ie: OpenGL 3.x) can handle that. I know the Radeon X1xxx line claims to be able to support SM3, but they’re lying; they can’t actually handle that. They can handle many texture accesses, but not 200. So it doesn’t matter which performs better; you cannot assume that any OpenGL 2.1 card can handle the texture form.

Therefore, unless you want to restrict yourself to NVIDIA-only hardware (in which case, you should state that up-front), uniforms are your only option. So there is no “fastest way”. There is simply the way that works.

As for NVIDIA hardware, I would be absolutely shocked if fetching from a texture 200 times was faster than accessing an array of uniform values.

Nowhere-01 · June 8, 2013, 11:47am

I don’t see much point in discussing hardware capabilities for this case, because the OP is doing things utterly wrong and inefficiently in the first place; nobody in their right mind implements multiple light sources like this.

If you (the original poster) really want to stick to forward rendering (for some non-logical reason), you can divide your voxel space into sections and find the most contributing lights for each section, based on distance and brightness. When rendering each section, take its most contributing lights and pass their parameters as uniforms. It’s still going to suck performance-wise, but it’s going to be a lot faster than your way of doing things. 27 fps generally is not acceptable, and with a somewhat complex scene your performance is most probably going to be closer to 0.1 fps.

But the only right answer to this question is deferred rendering. It’s going to be faster and way more straightforward in this case.

Basaaa · June 8, 2013, 11:50am

[QUOTE=Alfonse Reinheart;1251540]Therefore, unless you want to restrict yourself to NVIDIA-only hardware (in which case, you should state that up-front), uniforms are your only option.[/QUOTE]

Thank you for your post. I would also like to apologise for my behaviour; I do appreciate your time and help. I just had a horrible day and should not have taken it out on you.

Deferred rendering looks awesome. I’m just a little afraid that I, as a beginner, am not gonna understand this process. Is it really hard? Do you think that with a lot of research I should be able to do it?

Once again, I’m really sorry.

mhagain · June 8, 2013, 12:23pm

You need to be a little clearer on what your requirement is. When you say “100 lights” do you actually mean 100 in the entire scene (or even the entire map) and with maybe only 2, 3 or 4 affecting each surface? Or do you really want to be able to potentially pile 100 lights onto a single surface?

If the former, the classic way of doing it in a forward renderer is the approach taken by Doom 3, which (roughly) breaks down like so:

Clear the entire screen to black.
Run a depth-only prepass.
Switch the depth func to GL_EQUAL, disable depth writing, enable additive blending (glBlendFunc (GL_ONE, GL_ONE)).
For each light: figure out which surfaces it hits and draw those surfaces using a standard one-light shader.
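The four steps above map onto GL state roughly as follows. This Python sketch takes the GL module as a parameter (for example PyOpenGL's `OpenGL.GL`, which exposes these functions and constants under the same names), so the pass structure can be read or tested without a context. `surfaces_hit`, `draw_depth_only` and `draw_lit` are hypothetical callbacks standing in for an engine's own culling and draw code; this is not Doom 3's actual implementation.

```python
from typing import Any, Callable, Iterable, Sequence

def render_multipass(gl: Any,
                     surfaces: Sequence[Any],
                     lights: Iterable[Any],
                     surfaces_hit: Callable[[Any, Sequence[Any]], Iterable[Any]],
                     draw_depth_only: Callable[[Any], None],
                     draw_lit: Callable[[Any, Any], None]) -> None:
    # 1. Clear the entire screen to black (and reset depth).
    gl.glClearColor(0.0, 0.0, 0.0, 0.0)
    gl.glClear(gl.GL_COLOR_BUFFER_BIT | gl.GL_DEPTH_BUFFER_BIT)

    # 2. Depth-only prepass: write depth, but no color.
    gl.glEnable(gl.GL_DEPTH_TEST)
    gl.glDepthFunc(gl.GL_LESS)
    gl.glDepthMask(gl.GL_TRUE)
    gl.glColorMask(gl.GL_FALSE, gl.GL_FALSE, gl.GL_FALSE, gl.GL_FALSE)
    for surface in surfaces:
        draw_depth_only(surface)

    # 3. Only shade fragments whose depth matches the prepass; add contributions.
    gl.glColorMask(gl.GL_TRUE, gl.GL_TRUE, gl.GL_TRUE, gl.GL_TRUE)
    gl.glDepthMask(gl.GL_FALSE)
    gl.glDepthFunc(gl.GL_EQUAL)
    gl.glEnable(gl.GL_BLEND)
    gl.glBlendFunc(gl.GL_ONE, gl.GL_ONE)

    # 4. One pass per light, each drawing only the surfaces that light reaches.
    for light in lights:
        for surface in surfaces_hit(light, surfaces):
            draw_lit(surface, light)

    # Restore state the rest of the frame probably expects.
    gl.glDisable(gl.GL_BLEND)
    gl.glDepthMask(gl.GL_TRUE)
    gl.glDepthFunc(gl.GL_LESS)
```

One practical caveat with `GL_EQUAL`: both passes must produce identical depth values, so the same vertex transform should be used in each (GLSL 1.20's `invariant` qualifier exists for this purpose).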

This obviously precludes the use of too many overlapping lights, as you’ll then get into colossal overdraw problems (the early-z capabilities of more recent hardware can help somewhat with that, but since you’re aiming at GL2.1 you can’t rely on early-z being present, and even where it is present, switching the depth func may cause it to be disabled). Note that it doesn’t even attempt to combine multiple lights in a single pass: while that would be possible on modern hardware, the number of texture units needed (for diffuse, normal map, specular map, light projection texture and light falloff texture) goes beyond what you can rely on having at this kind of downlevel GL_VERSION.

This worked well enough in Doom 3 (which targeted a GL_VERSION even earlier than 2.1), but it’s really pushing the limits of how suitable a classic forward rendering approach is for this kind of workload.

And so I’d urge you to reconsider your insistence on not using GL3.1+. With the kind of workload you’re talking about, older hardware will not only be less capable (in terms of features/functionality), but also significantly slower. GL2.1 hardware (and that class of hardware genuinely can be considered absolutely ancient nowadays) just lacks the raw horsepower to handle the kind of scene complexity you’re aiming for (and some really heavy Doom 3 scenes can prove bothersome even for more recent hardware), so you need to be a good deal more realistic in setting your ambitions and/or target specs: either dial back on what you’re trying to do, or raise your hardware requirements, because you really can’t have both.

cireneikual · June 8, 2013, 1:21pm

If you are making a Minecraft clone, just pre-calculate the lighting. That, or if it needs to be dynamic, use deferred shading. Most of these Minecraft engines use a flood fill algorithm to bake lighting into the vertices.
Deferred shading isn’t really hard to implement, but it must be done early on, since it fundamentally changes the way your engine does things. However, baked lighting is still much simpler, and will perform better.
For baked lights, the light count pretty much doesn’t matter; it will always be fast. It doesn’t look as good as deferred, though.
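For comparison, the flood-fill idea can be sketched as a breadth-first search over the voxel grid. The Minecraft-style conventions here (integer light levels, losing one level per step, opaque blocks stopping propagation) are assumptions for illustration; the resulting per-voxel levels would then be baked into vertex colors when meshing.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Set, Tuple

Pos = Tuple[int, int, int]

def flood_fill_light(sources: Iterable[Tuple[Pos, int]],
                     opaque: Set[Pos],
                     bounds: Tuple[int, int, int]) -> Dict[Pos, int]:
    """Propagate integer light levels through air voxels (breadth-first).

    sources: (position, level) pairs, e.g. a torch at level 14.
    Each step to a face neighbour loses one level; opaque voxels block light.
    Returns the light level of every lit voxel.
    """
    sx, sy, sz = bounds
    level: Dict[Pos, int] = {}
    queue: Deque[Pos] = deque()
    for pos, lvl in sources:
        if lvl > level.get(pos, 0):
            level[pos] = lvl
            queue.append(pos)
    neighbours = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while queue:
        x, y, z = queue.popleft()
        next_level = level[(x, y, z)] - 1
        if next_level <= 0:
            continue
        for dx, dy, dz in neighbours:
            n = (x + dx, y + dy, z + dz)
            if not (0 <= n[0] < sx and 0 <= n[1] < sy and 0 <= n[2] < sz):
                continue  # outside the world
            if n in opaque or level.get(n, 0) >= next_level:
                continue  # blocked, or already at least this bright
            level[n] = next_level
            queue.append(n)
    return level
```

Because the work happens once when a chunk changes rather than per frame, the number of light sources barely affects rendering cost, which is the point made above.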

Basaaa · June 8, 2013, 5:10pm

Thank you both for your info.

I’m not really making a Minecraft clone, it’s more like a Voxel-world mmorpg. (Yes, I still have a long way to go but I’m willing to learn and spend a few years on it :))
Also, I really don’t like the flood-fill algorithm, as per-fragment just looks sooo much better.

Deferred rendering indeed looks pretty awesome. I’m gonna implement it. Any tips on good tutorials/information on the subject?

cireneikual · June 8, 2013, 6:30pm

Here is a tutorial: http://ogldev.atspace.co.uk/www/tutorial35/tutorial35.html
The tutorial uses the stencil buffer to cull fragments that are not affected by the light, but I would recommend you do not do this. Just use a simple depth test instead, since the stencil version will actually be much slower, unless you do some hacks as I posted here: http://www.opengl.org/discussion_boards/showthread.php/179049-Fast-Stencil-Light-Volumes-for-Deferred-Shading

Basaaa · June 8, 2013, 6:40pm

OK, thanks a lot.

Basaaa · June 9, 2013, 11:06am

Okay. I got to the point where the first step of deferred shading works. Screenshot:

Bottom left (world position) looks fine.
Top left is supposed to be the texture colors, but for some reason I get all black (it works fine without deferred shading, with the same code).
Top right is the normals; they don’t look right to me.
Bottom right (the tex coords) looks fine.

Also, I only get 60 FPS. That’s not right, is it???
Last thing: the depth test doesn’t seem to work. I assume this is because I replace the FBO manually and it doesn’t actually render it?

cireneikual · June 9, 2013, 12:34pm

My system uses 4 buffers: View-space position (RGB16F) (doing it in view space means you don’t need 32 bit floats), view-space normals (RGB16F), diffuse color and specularity (RGBA8) (rgb - diffuse, a - specular), and an emissivity buffer (R16F).
I am not sure what you need to render the texture coordinates for.
You might have enabled vertical sync, and that is why you are only getting 60fps.
As for why you are only getting black when trying to render color, IDK. Can you show us your G buffer rendering shader?
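A quick estimate of what the four-buffer layout above costs in memory. This assumes the RGB16F targets really are stored at 6 bytes per pixel (drivers may pad three-channel formats to four channels, which would raise the figure), and uses 1920×1080 only as an example resolution.

```python
# Bytes per pixel for each G-buffer target in the layout described above.
GBUFFER_FORMATS = {
    "view_position (RGB16F)": 3 * 2,
    "view_normal (RGB16F)": 3 * 2,
    "diffuse_specular (RGBA8)": 4 * 1,
    "emissive (R16F)": 1 * 2,
}

def gbuffer_bytes(width: int, height: int) -> int:
    """Total bytes for one full-screen set of G-buffer targets."""
    return width * height * sum(GBUFFER_FORMATS.values())

per_pixel = sum(GBUFFER_FORMATS.values())
total = gbuffer_bytes(1920, 1080)
print(per_pixel, total, round(total / 2**20, 1))  # 18 37324800 35.6
```

That buffer is written every frame and read again for every light that covers a pixel, which is the bandwidth side of the deferred trade-off.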

Basaaa · June 9, 2013, 1:23pm

I fixed the texture issue, and vsync is disabled. I am sure that I should be able to get more FPS with this.

Basaaa · June 9, 2013, 3:36pm

Okay. I did some profiling, and found out that Display.Update takes about 0.01 seconds. The rest takes almost nothing. What’s wrong???

mhagain · June 10, 2013, 2:16am

1 / 0.01 = 100fps - so that’s your theoretical maximum. You need to check out that “almost nothing” - all it needs is another 0.006666 seconds for it to limit you to 60fps. Also cross-check the accuracy of the timer you’re using for profiling.
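The arithmetic above, spelled out: frame rate is the reciprocal of total frame time, so 0.01 s in the buffer swap alone caps the rate at 100 fps, and roughly 0.0067 s of other work on top of that lands at about 60 fps.

```python
def fps(frame_time_s: float) -> float:
    """Frames per second for a given total frame time in seconds."""
    return 1.0 / frame_time_s

swap = 0.01            # time measured in Display.Update
other_work = 0.006666  # the "almost nothing" that still adds up
print(round(fps(swap)))               # 100
print(round(fps(swap + other_work)))  # 60
```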

On the other hand, deferred is a clear tradeoff. You need to accept that you’re going to have much higher bandwidth requirements in exchange for the capability to handle many more lights. There’s obviously a tipping-point beyond which either approach becomes preferable to the other.

Nowhere-01 · June 10, 2013, 2:27am

If you want to test how much time rendering a frame actually takes, you should place a glFinish() call at the end of the frame (don’t forget to remove it afterwards; you shouldn’t have it in final code) and check the timer after calling glFinish. This function waits until all rendering commands are actually finished.
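A minimal sketch of that measurement in Python, using `time.perf_counter` as a high-resolution timer. `render_frame` is a hypothetical callback for the frame's draw calls, and `finish` would be `glFinish` (PyOpenGL exposes it as `OpenGL.GL.glFinish`); both are passed in so the helper itself needs no GL context. As noted above, the `glFinish` call is for profiling only.

```python
import time
from typing import Callable

def time_frame(render_frame: Callable[[], None], finish: Callable[[], None]) -> float:
    """Return wall-clock seconds for one frame, including GPU completion.

    Without finish(), the timer would mostly measure how long it took to
    queue commands, since OpenGL calls can return before the GPU executes them.
    """
    start = time.perf_counter()
    render_frame()
    finish()  # e.g. OpenGL.GL.glFinish: blocks until all issued commands complete
    return time.perf_counter() - start
```

Averaging over many frames gives a steadier number than a single sample.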

imported_tonyo_au · June 11, 2013, 1:43am

If you want to do a large number of lights with forward rendering, check out the example on AMD’s site:
http://developer.amd.com/tools-and-sdks/graphics-development/amd-radeon-sdk/

They have another example, but I can’t find the link at the moment.