OpenGL.org
Thread: Fast Stencil Light Volumes for Deferred Shading

  1. #1
    Junior Member Regular Contributor

    Fast Stencil Light Volumes for Deferred Shading

    Hello,

    A while back I posted about some deferred shading performance problems, and how using stencil light volumes actually slowed rendering down instead of speeding it up.
    This has probably been thought of before, but I thought I'd share it here anyway in case it hasn't.
    I managed to batch light stencil tests into groups of 8 (one light per stencil bit) by using glStencilMask as an OR operation: while depth-testing the front faces of the light volumes, each light writes its own bit to mark which pixels it affects.
    In a second pass over the back faces of the light volumes, I switch the depth test to GL_GREATER and set the stencil func to render each light only where its bit was set earlier, using the ANDed mask parameter.

    With this system, there is no overdraw, and the stencil test is fast. Overall, the system is faster than without the stenciling.

    This is what the light rendering looks like:

    Code :
    // ---------------------------- Render Lights ----------------------------

        // Query visible lights
        std::vector<OctreeOccupant*> result;

        m_lightSPT.Query_Frustum(result, pScene->GetFrustum());

        // GL_VERTEX_ARRAY is client state, so glEnableClientState is the correct call
        glEnableClientState(GL_VERTEX_ARRAY);

        glEnable(GL_STENCIL_TEST);

        glClearStencil(0);

        for(unsigned int i = 0, size = result.size(); i < size;)
        {
            glClear(GL_STENCIL_BUFFER_BIT);

            glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);

            glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);

            // Batch 8 lights together, one stencil bit per light
            unsigned int firstLightIndex = i;

            for(unsigned int j = 0; j < 8 && i < size; j++, i++)
            {
                glStencilFunc(GL_ALWAYS, 0xff, 0xff);
                glStencilMask(m_lightIndices[j]); // restrict writes to this light's bit

                Light* pLight = static_cast<Light*>(result[i]);

                if(!pLight->m_enabled)
                    continue;

                pLight->SetTransform(pScene);
                pLight->RenderBoundingGeom();
            }

            i = firstLightIndex;

            glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);

            // Now render with reversed depth testing and only to stenciled regions
            glCullFace(GL_FRONT);
            glDepthFunc(GL_GREATER);
            glEnable(GL_BLEND);
            glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);

            for(unsigned int j = 0; j < 8 && i < size; j++, i++)
            {
                // Pass only where this light's bit was set in the first pass
                glStencilFunc(GL_EQUAL, 0xff, m_lightIndices[j]);

                Light* pLight = static_cast<Light*>(result[i]);

                if(!pLight->m_enabled)
                    continue;

                // If the camera is inside the light volume, skip the stencil test
                // (the reversed depth test would cull the volume away improperly)
                bool cameraInside = pLight->Intersects(pScene->m_camera.m_position);

                if(cameraInside)
                    glDisable(GL_STENCIL_TEST);

                pLight->SetTransform(pScene);
                pLight->SetShader(pScene);
                pLight->RenderBoundingGeom();

                if(cameraInside)
                    glEnable(GL_STENCIL_TEST);
            }

            glCullFace(GL_BACK);
            glDepthFunc(GL_LESS);
            glDisable(GL_BLEND);

            Shader::Unbind();
        }

        // Re-enable stencil writes to all bits
        glStencilMask(0xff);

        glDisableClientState(GL_VERTEX_ARRAY);

        glDisable(GL_STENCIL_TEST);

        GL_ERROR_CHECK();

  2. #2
    Senior Member OpenGL Guru Dark Photon
    Quote Originally Posted by cireneikual View Post
    I managed to batch light stencil tests into groups of 8 (8 stencil bits) by using glStencilMask to act as an OR operation to write which lights are affecting which pixels when performing a depth test on the front faces of the light volumes. In a second pass for the back faces of the light volumes, I then switch depth testing to GL_GREATER, and set the stencil func to render the light only if its bit was set earlier, using the and'ed mask parameter.

    With this system, there is no overdraw, and the stencil test is fast. Overall, the system is faster than without the stenciling.
    That's interesting. Thanks!

    Say, curious question: did you happen to bench tile-based deferred with your system? That is, batching lights by screen tile, and then for each tile: read the G-buffer, apply the lights for that tile all-in-one-go, write the lighting buffer?

  3. #3
    Intern Contributor Godlike
    One thing you can do is use instancing for the batched lights, which may improve things a bit. One bad thing with this implementation of deferred shading is that you talk to the API a lot, and you also clear the stencil buffer again and again.

    As Dark Photon mentioned, tile-based deferred shading is probably the way to go. I've recently implemented it and the performance advantage was INSANE. Now I can render 500 point lights (limited by the UBO size at the moment) without any significant impact. Before, the bottleneck was the lighting stage; now it is by far the material stage (where you create the G-buffer). I am planning to write an article about the implementation I used, so if someone is interested I can rush things a bit.

  4. #4
    Junior Member Regular Contributor
    @Dark Photon: I tried the tiling method a while back, but it performed worse due to how I implemented it. I did it without compute shaders or OpenCL, since I wanted to support older hardware, and because I have no experience with either. All the tiling happened on the CPU, so it came out very CPU-limited.
    @Godlike: I couldn't get tiled deferred to work properly myself, so I would love to see that article! I want more lights

  5. #5
    Junior Member Regular Contributor Kopelrativ
    I had a deferred shader updated with tile-based lights. For each light, I created a quad that precisely covers the light. It is positioned correctly in z, to make use of depth culling; the position is in front of the light, not at the z of the light source.

    There are some tricks to be aware of when the camera is inside the light (where my implementation still has some problems). The same technique can be used for a lot of nice effects, like adding spherical fogs, local color-coded markers, etc.

    The performance increased a lot, especially for lamps farther away or hidden by objects, and that is the usual case, except for a few near lamps. The vertex shader is as follows.
    Code :
    uniform vec4 Upoint;         // A light: .xyz is the position, .w is the strength (reach)
    layout (location = 0) in vec2 vertex;
    out vec2 screen;             // Screen coordinate
    void main(void)
    {
        float strength = Upoint.w;
        // Relative bounding (2D) box around the point light
        vec3 box = vec3(vertex*2-1, 0)*strength;
        vec4 viewPos = UBOViewMatrix * vec4(Upoint.xyz, 1);
        vec3 d = normalize(viewPos.xyz);
        // Move the quad towards the player, just far enough to be precisely
        // outside the range of the lamp. This is needed because depth culling
        // will remove parts of the quad that are hidden.
        float l = min(strength, -viewPos.z-1); // Correction if the camera is inside the light
        // modelView is one of the corners of the quad in view space.
        vec4 modelView = -vec4(d, 0)*l + vec4(box, 0) + viewPos;
        vec4 pos = UBOProjectionMatrix * modelView;
        pos /= pos.w;
        gl_Position = pos;
        // Pass the position to the fragment shader. Only x and y are needed.
        // Scale from the interval -1 .. 1 to the interval 0 .. 1.
        screen = pos.xy/2+0.5;
    }
    The fragment shader does the usual thing (adds light intensity as a function of the distance from the lamp to the pixel).

  6. #6
    Advanced Member Frequent Contributor
    The system I use is... funky. It goes like this: I have a render target that is essentially RGBA_8/16/32UI, depending on how many lights I wish to support in a single call. For each light I render a "light volume", in the same fashion as one does stencil shadows, but rather than incrementing, I just flip the bit of that integer buffer (I get away with this because the light volume is essentially the "shadow" of a planar polygon). This way all lights get drawn to the integer buffer, and in the final pass each bit of that integer buffer indicates whether a light is active.

    In my system, part of the G-buffer is an offset into a range of a texture buffer object that "lists" the lights the mesh worries about (done via a CPU computation), and the lighting pass iterates over that range, so only those lights whose "bounding-whatever" intersects the bounding box of a mesh are added.

    The main things I get out of this: I avoid many more lookups (doing a pass per light means the G-buffer needs to be read again for each light), and I avoid blending to accumulate the lights (and the icky choice between FP16/FP32 blending and Fixed8 blending with banding). This system also lets me detect when any set of lights is active on one pixel by using a mask, opening the door to drawing weird stuff (like changing something in a funky way if light A and light B are hitting the same thing).

  7. #7
    Senior Member OpenGL Guru Dark Photon
    Quote Originally Posted by kRogue View Post
    I have a render target that is essentially RGBA_8/16/32UI, depending on how many lights I wish to support in a single call. For each light I render a "light volume", ... I just flip the bit of that integer buffer ... each bit of that integer buffer indicates if a light is active.
    Sounds like Light Indexed Deferred Rendering (also in ShaderX7)

  8. #8
    Member Regular Contributor
    @kRogue: please paragraphs

    I've seen opening posts in forums that are a page long without spaces, and it's a wonder anyone replies. Spread the word!


    I just wanted to ask, since this thread seems to have become a general discussion. I want to implement a similar shader framework and I am wondering if it counts as "deferred" or not...

    It's really more about shadows than lights. In short, the static (deterministic) elements of the scene have shadow geometry precomputed. The goal is to make accurate (soft umbra/penumbra, etc.), go-anywhere shadows that do not have jagged/saw-toothed qualities. The scene is sectioned into chunks that each have up to 4 shadow-generating lights. The shadows are then drawn to an RGBA buffer, one component per shadow, with colour masks and a depth buffer. A second (MRT) buffer probably generates a depth texture for later lookup, as I suspect copying the depth buffer into a texture would not fly.

    At this point shadows can be generated for non-deterministic elements of the scene via some real time algorithm; haven't given it much thought but it is complicated by the chunking. And shadows for blended geometry can be accumulated without writing to the depth buffer.

    Then the same depth buffer is used to draw the scene as usual, discarding pixels that are behind the shadows, and the lights are modulated by the greyscale values in the shadow buffer. If a pixel is in front of (or on top of) a shadow, its depth must be compared with the saved depth texture to determine whether it is shadowed or not. Blended geometry can skip the depth comparison.

    Is this deferred lighting? There is no G-buffer, but it's kind of flipped around. Also, the shadows can be light instead (more technically, the inverse of shadow) if a scene is more dark than light.
    Last edited by michagl; 09-17-2012 at 10:18 AM.
    God have mercy on the soul that wanted hard decimal points and pure ctor conversion in GLSL.

  9. #9
    Advanced Member Frequent Contributor
    It is sort-of-ish like it, but not quite the same. Basically the system has the following g-buffer:

    • Material ID (just a GL_R16UI)
    • diffuse and specular color packed into one GL_RGBA16UI
    • normal + depth
    • light bitfield buffer
    • some other buffers for FX and transparency


    The MaterialID is an offset into a texture buffer object that stores a header consisting of:
    • shader ID
    • lightBegin, lightEnd, which give a range of indices into another texture buffer object listing which lights to consider for the fragment
    • another pair of ranges for "custom" float data for the mesh of that fragment into another texture buffer object
    • another pair of ranges for "custom" uint data for the mesh of that fragment into another texture buffer object


    The data of the lights are all stored in one texture buffer object and each light has:

    • position of light
    • direction of light
    • color of light
    • radial attenuation coefficients (linear and quadratic)
    • angular attenuation range (cosine of angles stored instead of actual angle)
    • "light bit mask"


    The standard lighting shader then loops over the lights in the range [lightBegin, lightEnd); a light is considered active if its light bit mask, logically ANDed with the light bitfield buffer, matches the light bit mask. The use I had for this was having a light go through a portal: the face of the portal was always planar, so for the light volume it cast it was always OK to just do flipping. You can see demos of this pet project at: http://www.youtube.com/playlist?list=PL2322715E8A420CCD ... sigh, it has been a long time since I have had the time to work on that project

  10. #10
    Senior Member OpenGL Guru Dark Photon
    Quote Originally Posted by michagl View Post
    Is this deferred lighting?
    Sure doesn't sound like it.

    With Deferred Shading, you sample your materials into a screen-sized buffer and then go back and apply lighting to it.

    With Deferred Lighting, you reverse it: sample your lighting (irradiance) into a screen-sized buffer and then go back and apply materials to it.

    In both cases, you're sampling at the nearest opaque fragment within each pixel (or sample, if doing MSAA). Thus the complication with translucents.

    Neither of these necessarily requires that shadowing be handled for any/all light sources.

    ...the shadows are drawn to an RGBA buffer one component per shadow with colour masks and a depth buffer.
    What this does sound like is what I've seen called "Deferred Shadows", "Shadow Collector", or "Screen-space Shadow Mask". The idea is you sample your shadow term at the nearest opaque fragment within each pixel (or sample, if doing MSAA), and then just apply the shadowing term to each pixel (or sample) in your final pass when generating the composite radiance/luminance for each pixel (or sample).

    ...discarding pixels that are behind the shadows and the lights are modulated by the greyscale values in the shadow buffer.
    I'm guessing by this you don't mean behind the shadows, but behind the nearest opaque fragment, which is where the occlusion field (shadows) is sampled. (?)
    Last edited by Dark Photon; 09-17-2012 at 05:19 PM.
