Part of the Khronos Group
OpenGL.org

Thread: A hundred lights (Sending light data to shaders)

  1. #1
    Junior Member Newbie
    Join Date
    Nov 2012
    Posts
    17

    A hundred lights (Sending light data to shaders)

    I'm developing a lighting system for my Voxel engine. It's simply a list of lights containing the following information:

    • Light Position (Vec3)
    • Light Color (Vec3)
    • Light strength (float)


    I want to be able to manage at least a hundred lights. So I have to send this data for each light to my fragment shader, where I calculate the distance between the fragment and the light position for each light, and also take the angle between the surface normal and the light direction to get nice-looking lighting without shadows.
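    For reference, the per-fragment calculation described above can be sketched on the CPU roughly like this. It is only an illustration: the falloff `strength / (1 + d^2)` and the names `Light` and `shadePoint` are assumptions, since the post does not say which attenuation formula is used.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Matches the light description in the post: position, color, strength.
struct Light {
    Vec3 position;
    Vec3 color;
    float strength;
};

// Sums the diffuse contribution of every light at one surface point.
// `normal` must be unit length. The attenuation formula is an assumption.
Vec3 shadePoint(Vec3 point, Vec3 normal, const std::vector<Light>& lights) {
    assert(std::fabs(dot(normal, normal) - 1.0f) < 1e-3f);
    Vec3 result{0.0f, 0.0f, 0.0f};
    for (const Light& light : lights) {
        Vec3 toLight = sub(light.position, point);
        float dist2 = dot(toLight, toLight);
        float dist = std::sqrt(dist2);
        if (dist <= 0.0f) continue;  // light sits on the point: no direction
        // Cosine of the angle between the normal and the light direction,
        // clamped so surfaces facing away receive nothing.
        float nDotL = std::max(0.0f, dot(normal, toLight) / dist);
        float attenuation = light.strength / (1.0f + dist2);
        float weight = nDotL * attenuation;
        result.x += light.color.x * weight;
        result.y += light.color.y * weight;
        result.z += light.color.z * weight;
    }
    return result;
}
```

    In a fragment shader the same loop would run once per fragment, which is why the cost grows with both the light count and the amount of overdraw.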

    What is the fastest way of sending 100 times this data to my fragment shader? I would like it to be compatible with OpenGL versions before 3.1, so UBOs aren't an option.

    Thanks

  2. #2
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,948
    Just because you don't like the answer doesn't make it wrong. So I'll say it again:


    The most efficient way to do that would be to not do that.

    If you have a scene where you want 100 lights to affect an object, you can't use forward rendering anymore. Also, as a practical matter, I'm fairly sure that quite a lot of GL 2.x hardware couldn't handle a uniform array of 200 vec4s. And even if you could, your shader would probably choke attempting to do lighting over 100 lights in one pass.

    The performance simply isn't there for such a computation. Every pixel of overdraw will hurt by 100x. That's going to kill you in the end. So you need to use some form of deferred rendering.

    And in deferred rendering, you wouldn't really render by sending 100 lights to a single shader.

  3. #3
    Junior Member Newbie
    Join Date
    Nov 2012
    Posts
    17
    Actually your answer IS wrong. I tried it here on a 2.1 card, and I can calculate the 100 lights at 27 FPS. I'm looking for more answers than just yours, that's why I ask it here. I'm not dumb, so again, if you don't have anything useful to say, please go away.

  4. #4
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,948
    Actually your answer IS wrong. I tried it here on a 2.1 card
    Ahem: "I'm fairly sure that quite a lot of GL 2.x hardware couldn't handle a uniform array of 200 vec4s. And even if you could, your shader would probably choke attempting to do lighting over 100 lights in one pass." Emphasis added.

    The fact that you found a specific 2.1 card that this happens to work on doesn't change what I said. Is there 2.1 hardware that can actually store 800 uniform components? Sure; Shader Model 3 requires at least 224 vec4 registers. That doesn't change the fact that there's plenty of non-Shader Model 3 capable hardware that still supports OpenGL 2.1. NVIDIA supports SM3 from GeForce 6xxx onward, but AMD didn't get SM3 support until the Radeon X1xxx line.

    So it's great that it works on your card. That doesn't mean you should assume that any OpenGL 2.1 card can handle it.
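    The arithmetic behind the "200 vec4s / 800 components" figure can be checked with a small sketch like the one below. The two-vec4 layout and the helper names are assumptions for illustration; in a real program the limit would come from `glGetIntegerv(GL_MAX_FRAGMENT_UNIFORM_COMPONENTS, ...)` and the packed array would be uploaded with `glUniform4fv`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Light {
    float position[3];
    float color[3];
    float strength;
};

// Assumed layout: two vec4s per light, so 8 float components each.
//   vec4 #0 = (position.xyz, strength)
//   vec4 #1 = (color.rgb, unused)
std::size_t uniformComponentsNeeded(std::size_t lightCount) {
    return lightCount * 2 * 4;
}

// Flattens the lights into one float array suitable for a vec4 uniform array.
std::vector<float> packLights(const std::vector<Light>& lights) {
    std::vector<float> packed;
    packed.reserve(uniformComponentsNeeded(lights.size()));
    for (const Light& light : lights) {
        packed.push_back(light.position[0]);
        packed.push_back(light.position[1]);
        packed.push_back(light.position[2]);
        packed.push_back(light.strength);
        packed.push_back(light.color[0]);
        packed.push_back(light.color[1]);
        packed.push_back(light.color[2]);
        packed.push_back(0.0f);  // padding, keeps each light at two full vec4s
    }
    assert(packed.size() == uniformComponentsNeeded(lights.size()));
    return packed;
}

// maxComponents: the value the driver reports for the fragment stage.
// reservedComponents: whatever the rest of the shader already uses
// (matrices, material parameters and so on).
bool fitsInUniforms(std::size_t lightCount, std::size_t maxComponents,
                    std::size_t reservedComponents) {
    return uniformComponentsNeeded(lightCount) + reservedComponents <= maxComponents;
}
```

    With the 224 vec4 registers mentioned above (896 components), 100 lights fit only if the rest of the shader uses fewer than 96 components, so checking the limit at startup and falling back to fewer lights seems prudent.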

    I can calculate the 100 lights at 27 FPS
    What are you rendering at 27 fps? How much overdraw does your scene get? And what makes you think that a decent deferred rendering implementation won't perform much better?

    I'm looking for more answers than just yours
    OK, here's my other answer.

    In GL 2.1, you have 2 ways to communicate information to shaders: uniforms and textures. If you attempt to use textures, you will have to not only use a float texture, you will have to fetch from that texture twice for every light. And if you compute all of the lights in one pass, that's 200 texture fetches.

    No AMD card before the Radeon HD line (i.e., OpenGL 3.x) can handle that. I know the Radeon X1xxx line claims to be able to support SM3, but they're lying; they can't actually handle that. They can do many texture accesses, but not 200. So it doesn't matter which performs better; you cannot assume that any OpenGL 2.1 card can handle the texture form.

    Therefore, unless you want to restrict yourself to NVIDIA-only hardware (in which case, you should state that up-front), uniforms are your only option. So there is no "fastest way". There is simply the way that works.

    As for NVIDIA hardware, I would be absolutely shocked if fetching from a texture 200 times was faster than accessing an array of uniform values.

  5. #5
    Member Regular Contributor Nowhere-01's Avatar
    Join Date
    Feb 2011
    Location
    Novosibirsk
    Posts
    251
    i don't see much point discussing hardware capabilities for that case, because OP is doing things utterly wrong and inefficiently in the first place. nobody in their right mind implements multiple light sources like this.

    if you (original poster) really want to stick to forward rendering (for some illogical reason), you can divide your voxel space into sections and find the most contributing lights for each section, based on distance and brightness. when rendering each section, take the most contributing lights and pass their parameters as uniforms. it's still going to suck at performance, but it's going to be a lot faster than your way of doing things. 27 fps generally is not acceptable, and with a somewhat complex scene your performance, most probably, is going to be closer to 0.1 fps.

    but the only right answer to this question is deferred rendering. it's going to be faster and way more straight-forward in that case.
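    The per-section light selection suggested above could look something like the following sketch. The scoring function `strength / (1 + d^2)`, measured from the section center, is an assumption standing in for "based on distance and brightness"; `selectLights` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

struct Vec3 { float x, y, z; };

struct Light {
    Vec3 position;
    Vec3 color;
    float strength;
};

// Rough estimate of how much a light matters to a section.
// Assumed formula: brighter and closer lights score higher.
float contribution(const Light& light, Vec3 sectionCenter) {
    float dx = light.position.x - sectionCenter.x;
    float dy = light.position.y - sectionCenter.y;
    float dz = light.position.z - sectionCenter.z;
    return light.strength / (1.0f + dx * dx + dy * dy + dz * dz);
}

// Returns the indices of the (at most) maxLights highest-scoring lights,
// ordered from most to least contributing.
std::vector<std::size_t> selectLights(const std::vector<Light>& lights,
                                      Vec3 sectionCenter,
                                      std::size_t maxLights) {
    std::vector<std::size_t> indices(lights.size());
    std::iota(indices.begin(), indices.end(), std::size_t{0});
    std::size_t keep = std::min(maxLights, indices.size());
    std::partial_sort(indices.begin(), indices.begin() + keep, indices.end(),
                      [&](std::size_t a, std::size_t b) {
                          return contribution(lights[a], sectionCenter) >
                                 contribution(lights[b], sectionCenter);
                      });
    indices.resize(keep);
    return indices;
}
```

    The selected lights would then be uploaded as uniforms before drawing that section, so the shader loops over a handful of lights instead of all of them.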

  6. #6
    Junior Member Newbie
    Join Date
    Nov 2012
    Posts
    17
    Thank you for your post. Also I would like to apologise for my behaviour, I do appreciate your time and help. I just had a horrible day and I should not have abreacted that on you. (I hope that's correct English)

    Deferred rendering looks awesome. I'm just a little afraid that I, as a beginner, am not going to understand this process. Is it really hard? Do you think that with a lot of research I should be able to make it?

    Once again, I'm really sorry.

  7. #7
    Senior Member OpenGL Pro
    Join Date
    Jan 2007
    Posts
    1,217
    You need to be a little clearer on what your requirement is. When you say "100 lights" do you actually mean 100 in the entire scene (or even the entire map) and with maybe only 2, 3 or 4 affecting each surface? Or do you really want to be able to potentially pile 100 lights onto a single surface?

    If the former, the classic way of doing it in a forward renderer is the approach taken by Doom 3, which (roughly) breaks down like so:

    - Clear the entire screen to black.
    - Run a depth-only prepass.
    - Switch the depth func to GL_EQUAL, disable depth writing, enable additive blending (glBlendFunc (GL_ONE, GL_ONE)).
    - For each light: figure which surfaces it hits and draw those surfaces using a standard one-light shader.

    This obviously precludes the use of too many overlapping lights, as you'll then get into colossal overdraw problems (although the early-z capabilities of more recent hardware can help some with that, but since you're aiming at GL2.1 you can't rely on early-z being present, and even if present switching the depth func may cause it to be disabled). Note that it doesn't even attempt to combine multiple lights in a single pass - while that would be possible on modern hardware, the number of texture units needed (for diffuse, normal map, specular map, light projection texture and light falloff texture) goes beyond that which you can rely on having at this kind of downlevel GL_VERSION.
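    The "figure which surfaces it hits" step from the list above can be sketched as a sphere-versus-bounding-box test that builds one draw list per light. This is an illustration only: giving each light a `radius` and each surface an axis-aligned box are assumptions, and the GL calls appear only as comments outlining where the lists would be used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 min, max; };            // bounding box of one surface or chunk
struct LightSphere { Vec3 center; float radius; };  // radius is an assumed cutoff

// Squared distance from p to the closest point of [lo, hi] on one axis.
static float axisDistance2(float p, float lo, float hi) {
    float d = std::max(std::max(lo - p, 0.0f), p - hi);
    return d * d;
}

bool lightTouchesBox(const LightSphere& light, const Aabb& box) {
    assert(light.radius >= 0.0f);
    float d2 = axisDistance2(light.center.x, box.min.x, box.max.x) +
               axisDistance2(light.center.y, box.min.y, box.max.y) +
               axisDistance2(light.center.z, box.min.z, box.max.z);
    return d2 <= light.radius * light.radius;
}

// drawLists[i] holds the indices of the surfaces that light i can reach.
std::vector<std::vector<std::size_t>> buildDrawLists(
        const std::vector<LightSphere>& lights,
        const std::vector<Aabb>& surfaces) {
    std::vector<std::vector<std::size_t>> drawLists(lights.size());
    for (std::size_t i = 0; i < lights.size(); ++i)
        for (std::size_t s = 0; s < surfaces.size(); ++s)
            if (lightTouchesBox(lights[i], surfaces[s]))
                drawLists[i].push_back(s);
    return drawLists;
}

// Outline of the frame, following the steps in the post:
//   glClear(...)                          clear color to black, clear depth
//   draw all surfaces, depth only         the depth prepass
//   glDepthFunc(GL_EQUAL); glDepthMask(GL_FALSE);
//   glEnable(GL_BLEND); glBlendFunc(GL_ONE, GL_ONE);
//   for each light i: draw drawLists[i] with the one-light shader
```

    Each surface is drawn once per light that reaches it, which is where the overdraw cost mentioned above comes from when many lights overlap.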

    This worked well enough in Doom 3 (which targeted a GL_VERSION even earlier than 2.1) but it's really pushing at the limits of the suitability of a classic forward rendering approach for this kind of workload.

    And so I'd urge you to reconsider your insistence on not using GL3.1+. With the kind of workload you're talking about, older hardware will not only be less capable (in terms of features/functionality), but also significantly slower. GL2.1 hardware (and that class of hardware genuinely can be considered absolutely ancient nowadays) just lacks the basic raw horsepower to handle the kind of scene complexity you're aiming for (and some really heavy Doom 3 scenes can prove bothersome for even more recent hardware), so you need to be quite a deal more realistic in setting your ambitions and/or target specs - either dial back on what you're trying to do, or bump your hardware requirements, because you really can't have both.

  8. #8
    Junior Member Regular Contributor
    Join Date
    Mar 2012
    Posts
    129
    If you are making a Minecraft clone, just pre-calculate the lighting. That, or if it needs to be dynamic, use deferred shading. Most of these Minecraft engines use a flood fill algorithm to bake lighting into the vertices.
    Deferred shading isn't really hard to implement, but it must be done early on since it changes the way your engine fundamentally does things. However, baked lighting is still much simpler, and will perform better.
    For baked lights, the light count pretty much doesn't matter, it will always be fast. It doesn't look as good as deferred though.
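    A minimal sketch of the flood-fill idea mentioned above, assuming the common voxel-game convention where light is an integer level that drops by one per voxel step and is blocked by solid voxels. The `Grid` type, the 0 to 15 range and the function name are illustrative assumptions, not taken from any particular engine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

struct Grid {
    int sx, sy, sz;
    std::vector<std::uint8_t> solid;  // 1 = opaque voxel, blocks light
    std::vector<std::uint8_t> light;  // baked light level, 0..15

    Grid(int x, int y, int z)
        : sx(x), sy(y), sz(z),
          solid(static_cast<std::size_t>(x) * y * z, 0),
          light(static_cast<std::size_t>(x) * y * z, 0) {}

    bool inside(int x, int y, int z) const {
        return x >= 0 && y >= 0 && z >= 0 && x < sx && y < sy && z < sz;
    }
    std::size_t index(int x, int y, int z) const {
        return (static_cast<std::size_t>(z) * sy + y) * sx + x;
    }
};

struct Cell { int x, y, z; };

// Breadth-first spread of one light source through the non-solid voxels.
// Calling it once per light keeps the brightest value in each voxel.
void floodFillLight(Grid& grid, int x, int y, int z, std::uint8_t level) {
    assert(level <= 15);
    if (!grid.inside(x, y, z) || grid.solid[grid.index(x, y, z)]) return;
    if (grid.light[grid.index(x, y, z)] >= level) return;
    grid.light[grid.index(x, y, z)] = level;

    static const int dirs[6][3] = {{1, 0, 0}, {-1, 0, 0}, {0, 1, 0},
                                   {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};
    std::queue<Cell> pending;
    pending.push(Cell{x, y, z});
    while (!pending.empty()) {
        Cell cell = pending.front();
        pending.pop();
        std::uint8_t current = grid.light[grid.index(cell.x, cell.y, cell.z)];
        if (current <= 1) continue;  // nothing left to pass on
        std::uint8_t next = static_cast<std::uint8_t>(current - 1);
        for (const auto& d : dirs) {
            int nx = cell.x + d[0], ny = cell.y + d[1], nz = cell.z + d[2];
            if (!grid.inside(nx, ny, nz)) continue;
            std::size_t n = grid.index(nx, ny, nz);
            if (grid.solid[n] || grid.light[n] >= next) continue;
            grid.light[n] = next;
            pending.push(Cell{nx, ny, nz});
        }
    }
}
```

    The resulting levels would be written into vertex attributes when a chunk mesh is built, so the cost is paid when the world changes rather than per frame.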

  9. #9
    Junior Member Newbie
    Join Date
    Nov 2012
    Posts
    17
    Thank you both for your info.

    I'm not really making a Minecraft clone, it's more like a voxel-world MMORPG. (Yes, I still have a long way to go but I'm willing to learn and spend a few years on it.)
    Also, I really don't like the flood-fill algorithm, as per-fragment just looks sooo much better.

    Deferred rendering indeed looks pretty awesome. I'm going to implement it. Any tips on good tutorials/information on the subject?

  10. #10
    Junior Member Regular Contributor
    Join Date
    Mar 2012
    Posts
    129
    Here is a tutorial: http://ogldev.atspace.co.uk/www/tuto...utorial35.html
    The tutorial uses the stencil buffer to cull fragments that are not affected by the light, but I would recommend you do not do this. Just use a simple depth test instead, since the stencil version will actually be much slower, unless you do some hacks as I posted here: http://www.opengl.org/discussion_boa...ferred-Shading
