Thread: Best solution for dealing with multiple light types

1. Best solution for dealing with multiple light types

Hi all,

I am working on my own 3D engine and I recently ran into an issue when trying to combine different light types using a single shader. Multiple lights of a single type work fine, but when I combined a point light (with cube map shadows), a directional light (with 2D shadows), and spot lights (also with 2D shadows), things started to break. I found a solution to this problem, but I wonder if there is a better way of doing it. Let me first summarise my initial solution that failed and then talk about the solution I found.

I pass an array of lights to the shader that is used to render a mesh. This array is defined as follows in my shader:

Code :
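```
#version 420

const int nr_lights = 5;

const int DIRECTIONAL_LIGHT = 0;
const int SPOT_LIGHT = 1;
const int POINT_LIGHT = 2;

struct light {
    int type;
    bool enabled;
    vec4 position;
    vec4 diffuse;
    vec4 ambient;
    vec4 specular;

    float constant_attenuation;
    float linear_attenuation;
    float quadratic_attenuation;     // used by the attenuation formula in the fragment shader

    vec3 direction;
    float light_angle;               // spot cone angle in degrees

    sampler2DShadow depth_texture;   // 2D shadow map (spot and directional lights)
    samplerCube cube_depth_texture;  // linearised depth cube map (point lights)
};
uniform light lights[nr_lights];
```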

For spot lights and directional lights I use a 2D shadow sampler to project the depth values. For point lights I created a cube texture which contains the linearised depth values. The bulk of the lighting calculations is in the fragment shader and reads as follows:

Code :
```
for (int i = 0; i < lights.length(); ++i)
{
    if (!lights[i].enabled)
    {
        continue;
    }

    vec4 halfVector = normalize(H[i]);
    vec4 lightVector = normalize(L[i]);

    float dotValue = max(dot(normalVector, lightVector), 0.0);
    if (dotValue > 0.0)
    {
        float distance = length(lights[i].position - worldPos);
        float intensity = 1.0;
        if (lights[i].type != DIRECTIONAL_LIGHT)
        {
            intensity = 1.0 / (lights[i].constant_attenuation + lights[i].linear_attenuation * distance + lights[i].quadratic_attenuation * distance * distance);
        }
        vec4 ambient = material_ambient * lights[i].ambient;

        bool inLight = true;
        float shadowf = 1.0; // assumed: 1.0 = lit, 0.0 = in shadow

        if (lights[i].type == SPOT_LIGHT)
        {
            vec3 nLightToVertex = vec3(normalize(worldPos - lights[i].position));
            float angleLightToFrag = dot(nLightToVertex, normalize(lights[i].direction));
            float radLightAngle = lights[i].light_angle * 3.141592 / 180.0;

            // The fragment lies outside the spot cone.
            if (angleLightToFrag < cos(radLightAngle))
            {
                inLight = false;
            }
        }

        if (inLight)
        {
            if (lights[i].type == SPOT_LIGHT || lights[i].type == DIRECTIONAL_LIGHT)
            {
                // 2D shadow-map lookup on lights[i].depth_texture (not shown in the post).
            }
            else if (lights[i].type == POINT_LIGHT)
            {
                // Texture lookup inside non-uniform control flow: this is what breaks.
                float sampled_distance = texture(lights[i].cube_depth_texture, direction[i].xyz).r;
                float distance = length(direction[i]);

                if (distance > sampled_distance + 0.1)
                {
                    shadowf = 0.0;
                }
            }

            vec4 diffuse = dotValue * lights[i].diffuse * material_diffuse;
            vec4 specular = pow(max(dot(normalVector, halfVector), 0.0), 10.0) * material_specular * lights[i].specular;
            outColor += intensity * shadowf * (diffuse + specular * 100.0);
        }

        outColor += intensity * ambient;
    }
}
outColor += material_emissive;
```

This clearly does not work due to non-uniform control flow (a term I only learned about yesterday).

So, what I have done is to move all the texture lookups out of the non-uniform control flow. However, this means that I need to provide depth textures for all lights (even if they are not used for rendering) and sample both the cube and 2dShadow textures. Let me show you the updated fragment shader bit:

Code :
```
for (int i = 0; i < lights.length(); ++i)
{
    // Sampled before any branching, so the lookup stays in uniform control flow.
    // The 2D shadow lookup on lights[i].depth_texture is hoisted the same way (not shown in the post).
    float sampled_distance = texture(lights[i].cube_depth_texture, direction[i].xyz).r;
    if (!lights[i].enabled)
    {
        continue;
    }

    vec4 halfVector = normalize(H[i]);
    vec4 lightVector = normalize(L[i]);

    float dotValue = max(dot(normalVector, lightVector), 0.0);
    if (dotValue > 0.0)
    {
        float distance = length(lights[i].position - worldPos);
        float intensity = 1.0;
        if (lights[i].type != DIRECTIONAL_LIGHT)
        {
            intensity = 1.0 / (lights[i].constant_attenuation + lights[i].linear_attenuation * distance + lights[i].quadratic_attenuation * distance * distance);
        }
        vec4 ambient = material_ambient * lights[i].ambient;

        bool inLight = true;
        float shadowf = 1.0; // assumed: 1.0 = lit, 0.0 = in shadow

        if (lights[i].type == SPOT_LIGHT)
        {
            vec3 nLightToVertex = vec3(normalize(worldPos - lights[i].position));
            float angleLightToFrag = dot(nLightToVertex, normalize(lights[i].direction));
            float radLightAngle = lights[i].light_angle * 3.141592 / 180.0;

            // The fragment lies outside the spot cone.
            if (angleLightToFrag < cos(radLightAngle))
            {
                inLight = false;
            }
        }

        if (inLight)
        {
            if (lights[i].type == SPOT_LIGHT)
            {
                // Uses the 2D shadow value sampled at the top of the loop (not shown in the post).
            }
            else if (lights[i].type == POINT_LIGHT)
            {
                float distance = length(direction[i]);

                if (distance > sampled_distance + 0.1)
                {
                    shadowf = 0.0;
                }
            }

            vec4 diffuse = dotValue * lights[i].diffuse * material_diffuse;
            vec4 specular = pow(max(dot(normalVector, halfVector), 0.0), 10.0) * material_specular * lights[i].specular;
            outColor += intensity * shadowf * (diffuse + specular * 100.0);
        }

        outColor += intensity * ambient;
    }
}

outColor += material_emissive;
```

This works! In my engine I create two dummy shadow textures of size 1x1: one is a GL_TEXTURE_2D stored as a GL_DEPTH_COMPONENT, the other is a GL_TEXTURE_CUBE_MAP that only stores GL_RED values. When fewer than 5 lights are needed to render a mesh, I pass these dummy textures to the cube_depth_texture and depth_texture samplers of each unused light and set its isEnabled flag to false.

While this does work, it creates a lot of overhead. In the worst case, when no lights are being used, it will still sample 10 textures!

Is there a better way around this issue? My engine currently does forward rendering, and it is not clear to me whether using a G-Buffer provides a cleaner solution. If I can, I would like to stick to forward rendering, so any solutions and comments you have are greatly appreciated.

Many thanks!
Bram

P.S. For those interested, my 3D engine Dreaded Portal Engine can be found here: http://bramridder.com/index.php/pers...-portal-engine

2. Originally Posted by Bram Ridder
This clearly does not work due to non-uniform control flow (a term I only learned about yesterday).

So, what I have done is to move all the texture lookups out of the non-uniform control flow. However, this means that I need to provide depth textures for all lights (even if they are not used for rendering) and sample both the cube and 2dShadow textures.
An alternative is to avoid using texture lookup functions which perform implicit derivative calculations, and instead calculate derivatives or LoD explicitly outside of the conditional and pass the result to textureProjGrad() or textureProjLod().

However, this may still perform texture lookups in cases where the condition is false (it depends upon whether the hardware has branch instructions). If you're going to perform lookups regardless, it would be better to use a 1x1 texture (or force the use of the 1x1 mipmap level of some texture) for cases where you don't need the result.
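A minimal sketch of the explicit-derivative idea for the cube map case, assuming a single point light. The uniform names `light_position` and the single-light setup are hypothetical and only serve the illustration; `textureGrad()` is used here because the `textureProj*` variants do not apply to cube maps.

```glsl
#version 420

// Hypothetical single-light setup for illustration; not the original shader.
uniform samplerCube cube_depth_texture; // linearised depth, as described in the first post
uniform vec4 light_position;            // world space (assumed)

in vec4 worldPos;
in vec4 N; // normal, assumed to be in the same space as worldPos for this sketch

out vec4 outColor;

void main(void)
{
    vec3 direction = (worldPos - light_position).xyz;

    // Take the derivatives in uniform control flow, before any per-fragment branch.
    vec3 dDirdx = dFdx(direction);
    vec3 dDirdy = dFdy(direction);

    float shadowf = 1.0;
    float dotValue = max(dot(normalize(N.xyz), normalize(-direction)), 0.0);
    if (dotValue > 0.0) // per-fragment condition, i.e. non-uniform control flow
    {
        // textureGrad() uses the supplied gradients instead of implicit derivatives.
        float sampled_distance = textureGrad(cube_depth_texture, direction, dDirdx, dDirdy).r;
        if (length(direction) > sampled_distance + 0.1)
        {
            shadowf = 0.0;
        }
    }

    outColor = vec4(vec3(dotValue * shadowf), 1.0);
}
```

If the depth textures have no mipmaps, `textureLod(cube_depth_texture, direction, 0.0)` is probably the simpler choice, since no gradients need to be computed at all.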

If the hardware doesn't have branch instructions, then putting code inside a conditional doesn't avoid the cost of executing it, only the side-effects. So e.g. setting radLightAngle to π would avoid the need to use a conditional for the inside-cone test (cos(π)=-1, so the test will always be false).
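As a rough sketch of that last point, the cone test can be written without a conditional. The uniforms below are hypothetical single-light stand-ins, and the sketch assumes non-spot lights are uploaded with `light_angle` set to 180 degrees and a non-zero `light_direction`.

```glsl
#version 420

// Hypothetical single-light uniforms for illustration.
uniform vec4 light_position;  // world space
uniform vec3 light_direction; // spot axis, world space; assumed non-zero for every light
uniform float light_angle;    // cone angle in degrees; 180.0 for non-spot lights (assumption)

in vec4 worldPos;
out vec4 outColor;

void main(void)
{
    vec3 nLightToVertex = normalize((worldPos - light_position).xyz);
    float angleLightToFrag = dot(nLightToVertex, normalize(light_direction));

    // step(edge, x) is 0.0 when x < edge and 1.0 otherwise.
    // cos(180 degrees) = -1 and a dot product of unit vectors is >= -1 (up to rounding),
    // so a light with light_angle = 180 passes the test for every fragment.
    float inLight = step(cos(radians(light_angle)), angleLightToFrag);

    outColor = vec4(vec3(inLight), 1.0);
}
```

The result can then be multiplied into the diffuse and specular terms instead of guarding them with `if (inLight)`.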

3. Originally Posted by Bram Ridder
While this does work, it creates a lot of overhead. In the worst case, when no lights are being used, it will still sample 10 textures!

Is there a better way around this issue? My engine currently does forward rendering, and it is not clear to me whether using a G-Buffer provides a cleaner solution. If I can, I would like to stick to forward rendering, so any solutions and comments you have are greatly appreciated.
I'd definitely see if you can meet your goals with small changes to your shader logic as GClements is suggesting.

If after pursuing those, you bench your app and determine that the performance still isn't up to the level you need, profile carefully to determine exactly what the biggest bottleneck is (it helps to gather a few worst-case test cases). You can use the results as a filter to evaluate which tech approaches will reduce that inefficiency the most. Just using some intuition about how your rendering algorithms work will save time with this.

If the main bottleneck ends up being the fact that you're using a shader supporting max(lights) and max(shadows) for all fragments on the entire screen, and you can't easily avoid most of the inefficiency associated with that through small shader changes, consider a tiled or clustered shading approach. Given your desire to stick with forward and the drawbacks of deferred approaches (which aren't insurmountable, but do require nontrivial effort), I'd suggest looking most closely at tiled or clustered forward shading techniques (websearch: tiled forward, clustered forward, and forward+ for the latest papers, blog posts, and conference presentations). However, be sure to profile other aspects of your rendering too (e.g. shadow casting and culling).
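To give a feel for what the shading side of a tiled forward renderer might look like, here is a very rough fragment shader sketch. Everything in it is an assumption made for illustration: the tile size, the buffer layout, the names, and the existence of an earlier culling pass that fills `tile_data` with per-tile light indices. It needs GLSL 4.30 for shader storage blocks and ignores shadows, attenuation and specular.

```glsl
#version 430

const int TILE_SIZE = 16;           // assumed tile edge length in pixels
const int MAX_LIGHTS_PER_TILE = 32; // assumed cap on lights per tile

struct PointLight {
    vec4 position; // view space, w unused
    vec4 diffuse;
};

layout(std430, binding = 0) readonly buffer LightBuffer {
    PointLight all_lights[];
};

layout(std430, binding = 1) readonly buffer TileBuffer {
    // Per tile: [count, index0, index1, ...], written by a separate culling pass (not shown).
    uint tile_data[];
};

uniform int tiles_per_row; // hypothetical: number of tiles per screen row

in vec4 pos; // view-space position
in vec4 N;   // view-space normal
out vec4 outColor;

void main(void)
{
    // Find the tile this fragment falls into and the start of its light list.
    ivec2 tile = ivec2(gl_FragCoord.xy) / TILE_SIZE;
    uint base = uint(tile.y * tiles_per_row + tile.x) * uint(MAX_LIGHTS_PER_TILE + 1);
    uint count = tile_data[base];

    vec3 n = normalize(N.xyz);
    vec3 color = vec3(0.0);

    // Only the lights that overlap this tile are evaluated.
    for (uint k = 0u; k < count; ++k)
    {
        PointLight light = all_lights[tile_data[base + 1u + k]];
        vec3 l = normalize(light.position.xyz - pos.xyz);
        color += max(dot(n, l), 0.0) * light.diffuse.rgb;
    }

    outColor = vec4(color, 1.0);
}
```

The point of the design is that the loop length depends on how many lights touch the tile rather than on the total number of lights in the scene.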

4. Thanks for the very helpful feedback.

I agree that using textureProjGrad() or textureProjLod() is one way to solve this problem. Although, as Dark Photon mentioned, I need to check whether doing texture lookups using 1x1 textures actually creates a bottleneck.

Thank you Dark Photon for letting me know about Forward+ and clustered methods. I did not even know these existed, very exciting!

At the moment I cannot use more than 5 lights per mesh. I guess this is because of the limit of 16 textures per shader? Or is there another limit that prohibits using an array of say 32 lights?

In any case I have some research and then some coding to do.

5. Originally Posted by Bram Ridder
At the moment I cannot use more than 5 lights per mesh. I guess this is because of the limit of 16 textures per shader? Or is there another limit that prohibits using an array of say 32 lights?
Your "struct light" has 43 components; 6 of those would total 258 components, which may be exceeding some implementation limit. You can get around that by using textures (e.g. buffer textures), or you may be able to use uniform blocks or shader storage blocks. Note that you'd need to keep the samplers separate; you can't store samplers in uniform blocks, shader storage blocks or textures.

If you hit the limit on the number of texture units, consider using array textures. These effectively allow you to aggregate multiple textures into a single texture, with the constraint that all layers must have the same format and dimensions, and sampling parameters (e.g. filter and wrap modes) apply to the texture as a whole.
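A minimal sketch of the array-texture idea for the point-light cube maps, assuming all cube maps share one size and format. The `LightPositions` block, `cube_depth_array` and `active_lights` are hypothetical names for this illustration, and the host side would need to create a `GL_TEXTURE_CUBE_MAP_ARRAY` texture with one layer per light.

```glsl
#version 420

const int nr_lights = 60;

// One cube map array instead of nr_lights separate samplerCube uniforms.
// Layer i holds the depth cube map of light i.
uniform samplerCubeArray cube_depth_array;

layout (std140) uniform LightPositions // hypothetical block for this sketch
{
    vec4 position[nr_lights]; // world space
};
uniform int active_lights; // hypothetical: number of lights actually in use

in vec4 worldPos;
out vec4 outColor;

void main(void)
{
    int count = min(active_lights, nr_lights);
    float lit = 0.0;

    for (int i = 0; i < count; ++i)
    {
        vec3 direction = (worldPos - position[i]).xyz;

        // The fourth coordinate selects the array layer.
        float sampled_distance = texture(cube_depth_array, vec4(direction, float(i))).r;
        if (length(direction) <= sampled_distance + 0.1)
        {
            lit += 1.0;
        }
    }

    // Visualise the fraction of lights that reach this fragment.
    outColor = vec4(vec3(lit / float(max(count, 1))), 1.0);
}
```

This occupies a single texture unit regardless of the number of lights. The 2D shadow maps could be aggregated the same way with a `sampler2DArrayShadow`.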

6. Originally Posted by Bram Ridder
At the moment I cannot use more than 5 lights per mesh. I guess this is because of the limit of 16 textures per shader? Or is there another limit that prohibits using an array of say 32 lights?
You can use bindless texture or texture arrays to get past the 16 textures/shader.

However, even if textures weren't limiting you (e.g. no point or spot light shadows), I suspect you'll hit other problems trying to push the number of lights up to even 32. If I were you, I'd just try it. This will provide valuable profiling data on which to base your future design decisions, and you can also see if you hit any big performance drop-offs or blocks as you increment the number of lights applied simultaneously from 1 to 32.

7. Brilliant! Thanks for the very insightful replies.

Quick update. I implemented some of your recommendations and I have successfully rendered a scene with 56 lights (including shadow maps)! I now use two UBOs, one for the view and projection matrices and the other for all the lighting information. I ran into an issue with having too many outs in my vertex shader, so I moved all the calculations to the fragment shader (doing so somehow doubled my FPS). While I am happy it works, it really shouldn't...

As far as I understand I should have exceeded the sampler limit in the fragment shader (GL_MAX_TEXTURE_IMAGE_UNITS), but it just seems to work. Maybe you can help me figure out what is going on. Let me present the shaders I use at the moment:

Code :
```
#version 420
uniform vec4 material_ambient;
uniform vec4 material_diffuse;
uniform vec4 material_specular;
uniform vec4 material_emissive;

const int nr_lights = 60;

const int DIRECTIONAL_LIGHT = 0;
const int SPOT_LIGHT = 1;
const int POINT_LIGHT = 2;

layout (std140) uniform Lights
{
    int type[nr_lights];
    bool enabled[nr_lights];
    vec4 position[nr_lights];
    vec4 diffuse[nr_lights];
    vec4 ambient[nr_lights];
    vec4 specular[nr_lights];

    float constant_attenuation[nr_lights];
    float linear_attenuation[nr_lights];
    float quadratic_attenuation[nr_lights]; // used by the attenuation formula in the fragment shader

    vec3 direction[nr_lights];
    float light_angle[nr_lights];
} lights;

uniform samplerCube cube_depth_texture[nr_lights];

layout (std140) uniform Matrices
{
    mat4 projection_matrix;
    mat4 view_matrix;
};

uniform mat4 model_matrix;

in vec3 a_Vertex;
in vec2 a_TexCoord0;
in vec3 a_Normal;

out vec2 texCoord0;
out vec4 worldPos;
out vec4 pos;
out vec4 N;

void main(void)
{
    texCoord0 = a_TexCoord0;
    pos = view_matrix * model_matrix * vec4(a_Vertex, 1.0);
    worldPos = model_matrix * vec4(a_Vertex, 1.0);
    N = view_matrix * model_matrix * vec4(a_Normal, 0.0);
    gl_Position = projection_matrix * pos;
}
```

Code :
```
#version 420

uniform vec4 material_ambient;
uniform vec4 material_diffuse;
uniform vec4 material_specular;
uniform vec4 material_emissive;

const int nr_lights = 60;

const int DIRECTIONAL_LIGHT = 0;
const int SPOT_LIGHT = 1;
const int POINT_LIGHT = 2;

layout (std140) uniform Lights
{
    int type[nr_lights];
    bool enabled[nr_lights];
    vec4 position[nr_lights];
    vec4 diffuse[nr_lights];
    vec4 ambient[nr_lights];
    vec4 specular[nr_lights];

    float constant_attenuation[nr_lights];
    float linear_attenuation[nr_lights];
    float quadratic_attenuation[nr_lights]; // used by the attenuation formula below

    vec3 direction[nr_lights];
    float light_angle[nr_lights];
} lights;

uniform sampler2DShadow depth_texture[nr_lights]; // 2D shadow maps (spot and directional lights)
uniform samplerCube cube_depth_texture[nr_lights];

layout (std140) uniform Matrices
{
    mat4 projection_matrix;
    mat4 view_matrix;
};

uniform sampler2D texture0;
uniform float transparency;

in vec2 texCoord0;
in vec4 worldPos;
in vec4 pos;
in vec4 N;

out vec4 outColor;

void main(void) {

    if (texture(texture0, texCoord0.st).a == 0.0)
    {
        // Body not shown in the post (presumably fully transparent texels are skipped).
    }

    outColor = vec4(0, 0, 0, 1);

    vec4 normalVector = N;
    if (N != vec4(0, 0, 0, 0))
    {
        normalVector = normalize(N);
    }

    for (int i = 0; i < nr_lights; ++i)
    {
        if (!lights.enabled[i])
        {
            break;
        }

        vec3 lightPos = (view_matrix * lights.position[i]).xyz;
        vec4 L;
        vec4 H;
        vec4 direction;

        if (lights.type[i] == DIRECTIONAL_LIGHT)
        {
            L = vec4(-lights.direction[i], 0.0);
            H = vec4((-lights.direction[i]).xyz, 1.0) - pos;
            direction = L;
        }
        else
        {
            L = vec4(lightPos - pos.xyz, 0.0);
            H = vec4((lightPos - pos.xyz).xyz, 1.0) - pos;
            direction = worldPos - lights.position[i];
        }

        // Sampled before any per-fragment branching.
        float sampled_distance = texture(cube_depth_texture[i], direction.xyz).r;

        vec4 halfVector = normalize(H);
        vec4 lightVector = normalize(L);

        float dotValue = max(dot(normalVector, lightVector), 0.0);
        if (dotValue > 0.0)
        {
            float distance = length(lights.position[i] - worldPos);
            float intensity = 1.0;

            if (lights.type[i] != DIRECTIONAL_LIGHT)
            {
                intensity = 1.0 / (lights.constant_attenuation[i] + lights.linear_attenuation[i] * distance + lights.quadratic_attenuation[i] * distance * distance);
            }
            vec4 ambient = material_ambient * lights.ambient[i];

            bool inLight = true;
            float shadowf = 1.0; // assumed: 1.0 = lit, 0.0 = in shadow

            if (lights.type[i] == SPOT_LIGHT)
            {
                vec3 nLightToVertex = vec3(normalize(worldPos - lights.position[i]));
                float angleLightToFrag = dot(nLightToVertex, normalize(lights.direction[i]));
                float radLightAngle = lights.light_angle[i] * 3.141592 / 180.0;

                // The fragment lies outside the spot cone.
                if (angleLightToFrag < cos(radLightAngle))
                {
                    inLight = false;
                }
            }

            if (inLight)
            {
                if (lights.type[i] == SPOT_LIGHT)
                {
                    // 2D shadow-map test using depth_texture[i] (not shown in the post).
                }
                else if (lights.type[i] == POINT_LIGHT)
                {
                    float distance = length(direction);

                    if (distance > sampled_distance + 0.1)
                    {
                        shadowf = 0.0;
                    }
                }
                vec4 diffuse = dotValue * lights.diffuse[i] * material_diffuse;
                vec4 specular = pow(max(dot(normalVector, halfVector), 0.0), 10.0) * material_specular * lights.specular[i];
                outColor += intensity * shadowf * (diffuse + specular * 100.0);
            }

            outColor += intensity * ambient;
        }
    }

    outColor += material_emissive;
    outColor *= texture(texture0, texCoord0.st);
}
```

My understanding is that 121 textures are currently used by the fragment shader: texture0 + 60 * (depth_texture + cube_depth_texture). When I check the value of GL_MAX_TEXTURE_IMAGE_UNITS on my GPU it returns 32. GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS returns 160, which would be enough, but that limit should not apply solely to the fragment shader. What am I not understanding, and what magic is being used?

Thanks again for your help improving my 3D engine.

9. Just for curiosity:

I suppose it is few compared to what the fragment shader can support.

10. Originally Posted by Silence
Just for curiosity: