Thread: Best solution for dealing with multiple light types

  1. #1
    Newbie Newbie
    Join Date
    Dec 2017
    Posts
    2

    Question Best solution for dealing with multiple light types

    Hi all,

    I am working on my own 3D engine and I recently ran into an issue when trying to combine different light types using a single shader. Multiple lights of a single type work fine, but when I combine a Point light (with cube map shadows), Directional light (with 2D shadows), and Spot lights (also with 2D shadows) things started to break. I found a solution to this problem, but I wonder if there is a better way of doing it. Let me first summarise my initial solution that failed and then talk about the solution I found.

    I pass an array of lights to the shader that is used to render a mesh. This array is defined as follows in my shader:

    Code :
    #version 420
     
    const int nr_lights = 5;
     
    const int DIRECTIONAL_LIGHT = 0;
    const int SPOT_LIGHT = 1;
    const int POINT_LIGHT = 2;
     
    struct light {
    	int type;
    	bool enabled;
    	vec4 position;
    	vec4 diffuse;
    	vec4 ambient;
    	vec4 specular;
     
    	mat4 shadow_matrix;
     
    	float constant_attenuation;
    	float linear_attenuation;
    	float quadratic_attenuation;
     
    	vec3 direction;
    	float light_angle;
     
    	samplerCube cube_depth_texture;
    	sampler2DShadow depth_texture;
    };
    uniform light lights[nr_lights];

    For spot lights and directional lights I use a 2D shadow sampler to project the depth values. For point lights I created a cube texture which contains the linearised depth values. The bulk of the lighting calculations is in the fragment shader and reads as follows:

    Code :
    for (int i = 0; i < lights.length(); ++i)
    {
    	if (!lights[i].enabled)
    	{
    		continue;
    	}
     
    	vec4 halfVector = normalize(H[i]);
    	vec4 lightVector = normalize(L[i]);
     
    	float dotValue = max(dot(normalVector, lightVector), 0.0);
    	if (dotValue > 0.0)
    	{
    		float distance = length(lights[i].position - worldPos);
    		float intensity = 1.0;
    		if (lights[i].type != DIRECTIONAL_LIGHT)
    			intensity = 1.0 / (lights[i].constant_attenuation + lights[i].linear_attenuation * distance + lights[i].quadratic_attenuation * distance * distance);
    		vec4 ambient = material_ambient * lights[i].ambient;
     
    		bool inLight = true;
     
    		if (lights[i].type == SPOT_LIGHT)
    		{
    			vec3 nLightToVertex = vec3(normalize(worldPos - lights[i].position));
    			float angleLightToFrag = dot(nLightToVertex, normalize(lights[i].direction));
    			float radLightAngle = lights[i].light_angle * 3.141592 / 180.0;
     
    			if (angleLightToFrag < cos(radLightAngle))
    				inLight = false;
    		}
     
    		if (inLight)
    		{
    			float shadowf = 1;
    			if (lights[i].type == SPOT_LIGHT || lights[i].type == DIRECTIONAL_LIGHT)
    			{
    				shadowf = textureProj(lights[i].depth_texture, shadow_coord[i]);
    			}
    			else if(lights[i].type == POINT_LIGHT)
    			{
    				float sampled_distance = texture(lights[i].cube_depth_texture, direction[i].xyz).r;
    				float distance = length(direction[i]);
     
    				if (distance > sampled_distance + 0.1)
    					shadowf = 0.0;
    			}
     
    			vec4 diffuse = dotValue * lights[i].diffuse * material_diffuse;
    			vec4 specular = pow(max(dot(normalVector, halfVector), 0.0), 10.0) * material_specular * lights[i].specular;
    			outColor += intensity * shadowf * (diffuse + specular * 100);
    		}
     
    		outColor += intensity * ambient;
    	}
    }
    outColor += material_emissive;

    This clearly does not work due to non-uniform control flow (a term I only learned about yesterday).

    So, what I have done is to move all the texture lookups out of the non-uniform control flow. However, this means that I need to provide depth textures for all lights (even if they are not used for rendering) and sample both the cube and 2dShadow textures. Let me show you the updated fragment shader bit:

    Code :
    for (int i = 0; i < lights.length(); ++i)
    {
    	float spot_shadowf = textureProj(lights[i].depth_texture, shadow_coord[i]);
    	float sampled_distance = texture(lights[i].cube_depth_texture, direction[i].xyz).r;
    	if (!lights[i].enabled)
    	{
    		continue;
    	}
     
    	vec4 halfVector = normalize(H[i]);
    	vec4 lightVector = normalize(L[i]);
     
    	float dotValue = max(dot(normalVector, lightVector), 0.0);
    	if (dotValue > 0.0)
    	{
    		float distance = length(lights[i].position - worldPos);
    		float intensity = 1.0;
    		if (lights[i].type != DIRECTIONAL_LIGHT)
    			intensity = 1.0 / (lights[i].constant_attenuation + lights[i].linear_attenuation * distance + lights[i].quadratic_attenuation * distance * distance);
    		vec4 ambient = material_ambient * lights[i].ambient;
     
    		bool inLight = true;
     
    		if (lights[i].type == SPOT_LIGHT)
    		{
    			vec3 nLightToVertex = vec3(normalize(worldPos - lights[i].position));
    			float angleLightToFrag = dot(nLightToVertex, normalize(lights[i].direction));
    			float radLightAngle = lights[i].light_angle * 3.141592 / 180.0;
     
    			if (angleLightToFrag < cos(radLightAngle))
    			{
    				inLight = false;
    			}
    		}
     
    		if (inLight)
    		{
    			float shadowf = 1;
    			if (lights[i].type == SPOT_LIGHT || lights[i].type == DIRECTIONAL_LIGHT)
    			{
    				shadowf = spot_shadowf;
    			}
    			else if(lights[i].type == POINT_LIGHT)
    			{
    				float distance = length(direction[i]);
     
    				if (distance > sampled_distance + 0.1)
    					shadowf = 0.0;
    			}
     
    			vec4 diffuse = dotValue * lights[i].diffuse * material_diffuse;
    			vec4 specular = pow(max(dot(normalVector, halfVector), 0.0), 10.0) * material_specular * lights[i].specular;
    			outColor += intensity * shadowf * (diffuse + specular * 100);
    		}
     
    		outColor += intensity * ambient;
    	}
    }
     
    outColor += material_emissive;

    This works! In my engine I create 2 dummy shadow textures of size 1x1: one is a GL_TEXTURE_2D stored as GL_DEPTH_COMPONENT, the other is a GL_TEXTURE_CUBE_MAP that only stores GL_RED values. When fewer than 5 lights are needed to render a mesh, I pass these textures to the cube_depth_texture and depth_texture samplers of the respective light and set its enabled flag to false.

    While this does work, it creates a lot of overhead. In the worst case, when no lights are being used, it will still sample 10 textures!

    Is there a better way around this issue? My engine currently does forward rendering, it is not clear to me whether using a G-Buffer provides a cleaner solution. If I can I would like to stick to forward rendering, so any solution and comments you have are greatly appreciated.

    Many thanks!
    Bram

    P.S. For those interested, my 3D engine Dreaded Portal Engine can be found here: http://bramridder.com/index.php/pers...-portal-engine

  2. #2
    Senior Member OpenGL Guru
    Join Date
    Jun 2013
    Posts
    2,522
    Quote Originally Posted by Bram Ridder View Post
    This clearly does not work due to non-uniform control flow (a term I only learned about yesterday).

    So, what I have done is to move all the texture lookups out of the non-uniform control flow. However, this means that I need to provide depth textures for all lights (even if they are not used for rendering) and sample both the cube and 2dShadow textures.
    An alternative is to avoid using texture lookup functions which perform implicit derivative calculations, and instead calculate derivatives or LoD explicitly outside of the conditional and pass the result to textureProjGrad() or textureProjLod().

    However, this may still perform texture lookups in cases where the condition is false (it depends upon whether the hardware has branch instructions). If you're going to perform lookups regardless, it would be better to use a 1x1 texture (or force the use of the 1x1 mipmap level of some texture) for cases where you don't need the result.
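
    A minimal sketch of the explicit-LoD variant, assuming the shadow maps have no mipmaps (so LoD 0 is always the level you want) and reusing the uniform and varying names from the first post. The light struct is cut down to the fields the lookups need, and only the shadowed N.L term is accumulated to keep the sketch short:

    Code :
    #version 420

    const int nr_lights = 5;
    const int POINT_LIGHT = 2;

    // Cut-down light struct: only the fields used by the shadow lookups.
    struct light {
    	int type;
    	bool enabled;
    	samplerCube cube_depth_texture;
    	sampler2DShadow depth_texture;
    };
    uniform light lights[nr_lights];

    in vec4 normalVector;
    in vec4 L[nr_lights];
    in vec4 shadow_coord[nr_lights];
    in vec4 direction[nr_lights];

    out vec4 outColor;

    void main()
    {
    	outColor = vec4(0.0);
    	for (int i = 0; i < lights.length(); ++i)
    	{
    		if (!lights[i].enabled)
    			continue;

    		// Per-fragment condition: this is the non-uniform branch.
    		float dotValue = max(dot(normalize(normalVector), normalize(L[i])), 0.0);
    		if (dotValue > 0.0)
    		{
    			float shadowf = 1.0;
    			if (lights[i].type == POINT_LIGHT)
    			{
    				// Explicit LoD 0 (assumption: no mipmaps), so no implicit derivatives are needed.
    				float sampled_distance = textureLod(lights[i].cube_depth_texture, direction[i].xyz, 0.0).r;
    				if (length(direction[i].xyz) > sampled_distance + 0.1)
    					shadowf = 0.0;
    			}
    			else
    			{
    				// Spot and directional lights share the 2D shadow sampler.
    				shadowf = textureProjLod(lights[i].depth_texture, shadow_coord[i], 0.0);
    			}
    			// Full lighting terms omitted; accumulate the shadowed N.L term only.
    			outColor += vec4(shadowf * dotValue);
    		}
    	}
    }

    Because textureLod() and textureProjLod() take the level of detail as an argument, they do not depend on implicit derivatives, which is what allows the lookups to stay inside the per-fragment branch.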

    If the hardware doesn't have branch instructions, then putting code inside a conditional doesn't avoid the cost of executing it, only the side-effects. So e.g. setting radLightAngle to π would avoid the need to use a conditional for the inside-cone test (cos(π)=-1, so the test will always be false).
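
    A rough sketch of that branch-free cone test for a single light. The uniform names here are hypothetical, and it assumes the application sets light_angle to 180 for non-spot lights and always supplies a non-zero direction so that normalize() stays well defined:

    Code :
    #version 420

    uniform vec4 light_position;   // hypothetical single-light uniforms
    uniform vec3 light_direction;  // assumed non-zero, also for non-spot lights
    uniform float light_angle;     // degrees; assumed to be 180.0 for non-spot lights

    in vec4 worldPos;
    out vec4 outColor;

    void main()
    {
    	vec3 nLightToVertex = normalize(vec3(worldPos - light_position));
    	float angleLightToFrag = dot(nLightToVertex, normalize(light_direction));

    	// step(edge, x) is 0.0 when x < edge and 1.0 otherwise, which mirrors
    	// "if (angleLightToFrag < cos(radLightAngle)) inLight = false".
    	float inLight = step(cos(radians(light_angle)), angleLightToFrag);

    	// Only the cone factor is written out; lighting terms are omitted.
    	outColor = vec4(inLight);
    }

    The result can be multiplied into the diffuse and specular terms instead of guarding them with an if. Rounding can push the dot product of two normalized vectors slightly below -1, so clamping it to [-1, 1] before the step() may be worth considering.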

  3. #3
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,170
    Quote Originally Posted by Bram Ridder View Post
    While this does work, it creates a lot of overhead. In the worst case, when no lights are being used, it will still sample 10 textures!

    Is there a better way around this issue? My engine currently does forward rendering, it is not clear to me whether using a G-Buffer provides a cleaner solution. If I can I would like to stick to forward rendering, so any solution and comments you have are greatly appreciated.
    I'd definitely see if you can meet your goals with small changes to your shader logic as GClements is suggesting.

    If after pursuing those, you bench your app and determine that the performance still isn't up to the level you need, profile carefully to determine exactly what the biggest bottleneck is (it helps to gather a few worst-case test cases). You can use the results as a filter to evaluate which tech approaches will reduce that inefficiency the most. Just using some intuition about how your rendering algorithms work will save time with this.

    If the main bottleneck ends up being the fact that you're using a shader supporting max(lights) and max(shadows) for all fragments on the entire screen and you can't easily avoid most of the inefficiency associated with that with small shader changes, consider a tiled or clustered shading approach. Given your desire to stick with forward and the drawbacks of deferred approaches (which aren't insurmountable, but do require nontrivial effort), I'd suggest looking most closely at tiled or clustered forward shading techniques (websearch: tiled forward, clustered forward, and forward+ for the latest papers, blog posts, and conference presentations). However, be sure to profile other aspects of your rendering too (e.g. shadow casting and culling).
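
    To give a rough idea of what the fragment-shader side of a tiled forward approach can look like, here is a sketch under several assumptions: GLSL 4.30 for shader storage blocks, a separate culling pass (not shown) that fills the per-tile light lists, point lights only, and a made-up linear falloff. All buffer, uniform and struct names are hypothetical:

    Code :
    #version 430

    const int TILE_SIZE = 16;  // hypothetical tile size in pixels

    struct PointLight {
    	vec4 position;  // xyz = world-space position, w = radius
    	vec4 diffuse;
    };

    // Hypothetical buffers, assumed to be filled by a culling pass that is not shown.
    layout(std430, binding = 0) readonly buffer Lights      { PointLight all_lights[]; };
    layout(std430, binding = 1) readonly buffer TileRanges  { uvec2 tile_ranges[]; };  // x = offset, y = count
    layout(std430, binding = 2) readonly buffer TileIndices { uint tile_light_indices[]; };

    uniform int tiles_per_row;

    in vec3 worldPos;
    in vec3 normalVector;
    out vec4 outColor;

    void main()
    {
    	// Find the screen tile this fragment belongs to and its light list.
    	ivec2 tile = ivec2(gl_FragCoord.xy) / TILE_SIZE;
    	uvec2 range = tile_ranges[tile.y * tiles_per_row + tile.x];

    	vec3 N = normalize(normalVector);
    	outColor = vec4(0.0);

    	// Only the lights assigned to this tile are evaluated.
    	for (uint k = 0u; k < range.y; ++k)
    	{
    		PointLight light = all_lights[tile_light_indices[range.x + k]];
    		vec3 toLight = light.position.xyz - worldPos;
    		float dist = max(length(toLight), 1e-4);
    		// Made-up linear falloff that reaches zero at the light radius.
    		float falloff = clamp(1.0 - dist / light.position.w, 0.0, 1.0);
    		outColor += max(dot(N, toLight / dist), 0.0) * falloff * light.diffuse;
    	}
    }
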
    Last edited by Dark Photon; Yesterday at 07:36 PM.

  4. #4
    Newbie Newbie
    Join Date
    Dec 2017
    Posts
    2
    Thanks for the very helpful feedback.

    I agree that using textureProjGrad() or textureProjLod() is one way to solve this problem. However, as Dark Photon mentioned, I first need to check whether doing texture lookups using 1x1 textures actually creates a bottleneck.

    Thank you Dark Photon for letting me know about Forward+ and clustered methods. I did not even know these existed, very exciting!

    At the moment I cannot use more than 5 lights per mesh. I guess this is because of the limit of 16 textures per shader? Or is there another limit that prohibits using an array of say 32 lights?

    In any case I have some research and then some coding to do.

  5. #5
    Senior Member OpenGL Guru
    Join Date
    Jun 2013
    Posts
    2,522
    Quote Originally Posted by Bram Ridder View Post
    At the moment I cannot use more than 5 lights per mesh. I guess this is because of the limit of 16 textures per shader? Or is there another limit that prohibits using an array of say 32 lights?
    Your "struct light" has 43 components; 6 of those would total 258 components, which may be exceeding some implementation limit. You can get around that by using textures (e.g. buffer textures), or you may be able to use uniform blocks or shader storage blocks. Note that you'd need to keep the samplers separate; you can't store samplers in uniform blocks, shader storage blocks or textures.

    If you hit the limit on the number of texture units, consider using array textures. These effectively allow you to aggregate multiple textures into a single texture, with the constraint that all layers must have the same format and dimensions, and sampling parameters (e.g. filter and wrap modes) apply to the texture as a whole.
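
    A sketch of how those two suggestions could fit together, assuming GLSL 4.20 (samplerCubeArray needs at least 4.00). The block, field and sampler names are hypothetical, the shadow matrix is assumed to already map into [0, 1] texture space, and the point-light direction is recomputed from the world position rather than passed in as a varying:

    Code :
    #version 420

    const int MAX_LIGHTS = 32;
    const int POINT_LIGHT = 2;

    // Only vec4/ivec4/mat4 members, to sidestep std140 padding surprises.
    struct LightData {
    	vec4 position;
    	vec4 diffuse;
    	vec4 ambient;
    	vec4 specular;
    	mat4 shadow_matrix;
    	vec4 attenuation;          // x = constant, y = linear, z = quadratic
    	vec4 direction_angle;      // xyz = direction, w = cone angle in degrees
    	ivec4 type_enabled_layer;  // x = type, y = enabled, z = shadow map layer
    };

    layout(std140) uniform LightBlock {
    	LightData lights[MAX_LIGHTS];
    };
    uniform int num_lights;

    // Samplers stay outside the block: one array texture per shadow map kind.
    uniform sampler2DArrayShadow shadow_maps_2d;
    uniform samplerCubeArray shadow_maps_cube;

    in vec4 worldPos;
    out vec4 outColor;

    void main()
    {
    	outColor = vec4(0.0);
    	for (int i = 0; i < num_lights; ++i)
    	{
    		if (lights[i].type_enabled_layer.y == 0)
    			continue;

    		float layer = float(lights[i].type_enabled_layer.z);
    		float shadowf = 1.0;
    		if (lights[i].type_enabled_layer.x == POINT_LIGHT)
    		{
    			vec3 dir = worldPos.xyz - lights[i].position.xyz;
    			// For cube map arrays the layer goes in the w component.
    			float sampled_distance = texture(shadow_maps_cube, vec4(dir, layer)).r;
    			if (length(dir) > sampled_distance + 0.1)
    				shadowf = 0.0;
    		}
    		else
    		{
    			// There is no textureProj for array shadow samplers, so divide by w manually.
    			vec4 sc = lights[i].shadow_matrix * worldPos;
    			vec3 p = sc.xyz / sc.w;
    			// xy = coordinates, z = layer, w = depth reference value.
    			shadowf = texture(shadow_maps_2d, vec4(p.xy, layer, p.z));
    		}
    		// Full lighting terms omitted; accumulate shadowed diffuse colour only.
    		outColor += shadowf * lights[i].diffuse;
    	}
    }

    Under std140 this struct takes 176 bytes, so 32 lights need 5632 bytes, which is below the 16 KB minimum that OpenGL guarantees for GL_MAX_UNIFORM_BLOCK_SIZE. The branches here depend only on uniform data, so every fragment takes the same path and the plain texture() calls remain in uniform control flow.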

  6. #6
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,170
    Quote Originally Posted by Bram Ridder View Post
    At the moment I cannot use more than 5 lights per mesh. I guess this is because of the limit of 16 textures per shader? Or is there another limit that prohibits using an array of say 32 lights?
    You can use bindless texture or texture arrays to get past the 16 textures/shader.

    However, even if textures weren't limiting you (e.g. no point or spot light shadows), I suspect you'll hit other problems trying to push the number of lights up to even 32. If I were you, I'd just try it. This will provide valuable profiling data on which to base your future design decisions, and you can also see if you hit any big performance drop-offs or blocks as you increment the number of lights applied simultaneously from 1 to 32.

    It's been years, but when I pushed the number of lights applied simultaneously in every fragment shader execution up toward 32, I seem to recall hitting a performance cliff or two, and then a wall, before I got there with the way I was doing it.

    At least one cliff had to do with the GLSL compiler (in NVidia's driver) dynamically determining the maximum number of iterations for which it would automatically unroll loops in the shader (at the time, I was generating a shader permutation with the number of lights baked in). When it flipped to not unrolling, I hit a big perf drop-off IIRC (NOTE: Whether and when the compiler unrolls loops can be controlled with a #pragma directive). Pushing the number of lights up even further resulted in hitting a limit on the amount of uniform space I could pass into the shader using standard uniforms. This of course can be bypassed by any number of methods (SSBOs, UBOs, TBOs, etc.), but with potential performance reductions.

    Not sure any of this is useful to you nowadays (OpenGL has moved on), but I mention it in case you do hit perf cliffs or walls in your profiling, to give you a few possible causes to check into.

    Long story short, doing this test made it blatantly obvious that I couldn't get where I wanted to go with the GPU by just simple forward shading. I ended up implementing Deferred Shading, which supported 100s-1000s of lights even without tile-based deferred, but that was before Tiled/clustered Forward and Forward+ like approaches (nowadays, knowing what I know about deferred's limitations and challenges, I'd seriously consider using Tiled/clustered Forward/Forward+ like approaches instead).
