Niruz91

03-02-2014, 06:18 AM

I'm having trouble implementing frustum culling for tiled shading. When I visualise which tiles have a point light count greater than 1 while rendering 100 lights on a small scene, only a few scattered tiles appear to contain any lights at all. I believe the problem lies in how I do the frustum culling. I haven't found any thorough sources on how the per-tile frusta are supposed to be built in a compute shader, but everyone seems to agree on roughly the same approach, which I've followed.

Here's the relevant compute shader code, which I've been experimenting with based on some Stack Overflow answers and DICE's presentation on tiled shading.

The frustum creation:

vec4 frustumPlanes[6];

// Half the tile count per axis: screen size in pixels / (2 * tile size).
vec2 tileScale = vec2(SCREEN_WIDTH, SCREEN_HEIGHT) * (1.0f / float(2 * MAX_WORK_GROUP_SIZE));
vec2 tileBias  = tileScale - vec2(gl_WorkGroupID.xy);

// Columns of a per-tile projection matrix (GLSL matrices are column-major).
vec4 col1 = vec4(-projectionMatrix[0][0] * tileScale.x, projectionMatrix[0][1], tileBias.x, projectionMatrix[0][3]);
vec4 col2 = vec4(projectionMatrix[1][0], -projectionMatrix[1][1] * tileScale.y, tileBias.y, projectionMatrix[1][3]);
vec4 col4 = vec4(projectionMatrix[3][0], projectionMatrix[3][1], -1.0f, projectionMatrix[3][3]);

frustumPlanes[0] = col4 + col1;                          // left
frustumPlanes[1] = col4 - col1;                          // right
frustumPlanes[2] = col4 - col2;                          // top
frustumPlanes[3] = col4 + col2;                          // bottom
frustumPlanes[4] = vec4(0.0f, 0.0f, -1.0f, -minDepthZ);  // near
frustumPlanes[5] = vec4(0.0f, 0.0f, -1.0f,  maxDepthZ);  // far

// Normalize the four side planes; near/far already have unit-length normals.
for (int i = 0; i < 4; i++)
{
    frustumPlanes[i] *= 1.0f / length(frustumPlanes[i].xyz);
}

minDepthZ and maxDepthZ are the per-tile min/max depths extracted from the depth buffer.

Here's the actual culling code:

uint numActiveLights = NUM_OF_LIGHTS;
uint threadCount = MAX_WORK_GROUP_SIZE * MAX_WORK_GROUP_SIZE;
uint passCount = (numActiveLights + threadCount - 1) / threadCount;

for (uint passIt = 0; passIt < passCount; ++passIt)
{
    uint lightIndex = passIt * threadCount + gl_LocalInvocationIndex;
    // Clamp to the last valid index (clamping to numActiveLights itself
    // would read one past the end of the array).
    lightIndex = min(lightIndex, numActiveLights - 1);

    PointLight p = pointLights[lightIndex];
    vec4 pos = viewProjectionMatrix * vec4(p.posX, p.posY, p.posZ, 1.0f);
    float rad = p.radius / pos.w;

    if (pointLightCount < MAX_LIGHTS_PER_TILE)
    {
        bool inFrustum = true;
        // Iterate with an int counting up; a uint condition of i >= 0 is
        // always true and wraps around past zero.
        for (int i = 0; i < 4 && inFrustum; i++)
        {
            float dist = dot(frustumPlanes[i], pos);
            inFrustum = (-rad <= dist);
        }

        if (inFrustum)
        {
            uint id = atomicAdd(pointLightCount, 1);
            pointLightIndex[id] = lightIndex;
        }
    }
}

And the global variables used in there:

#define MAX_WORK_GROUP_SIZE 16
#define SCREEN_WIDTH 1280.0f
#define SCREEN_HEIGHT 720.0f
#define MAX_LIGHTS_PER_TILE 40
#define NUM_OF_LIGHTS 100

shared uint pointLightIndex[NUM_OF_LIGHTS];
// Shared variables can't have initializers in GLSL; one invocation zeroes
// this at the top of main(), followed by a barrier():
//   if (gl_LocalInvocationIndex == 0) pointLightCount = 0;
//   barrier();
shared uint pointLightCount;

The code I use to compute the radius that the light's bounding sphere would normally be scaled by in regular deferred shading is:

float GBuffer::CalcPointLightBSphere(const TDPointLight& Light)
{
    float MaxChannel = max(max(Light.color.x, Light.color.y), Light.color.z);

    // Larger root of the attenuation quadratic; note the parentheses around
    // the divisor (a plain "/ 2 * Light.Exp" divides by 2 and then
    // multiplies by Exp, which is wrong).
    float ret = (-Light.Linear + sqrtf(Light.Linear * Light.Linear
                - 4 * Light.Exp * (Light.Exp - 256 * MaxChannel * Light.diffuseIntensity)))
                / (2 * Light.Exp);
    return ret;
}

Basically, the lights' positions are random values between -80 and 80 on the x and z axes, so they are already in world space, which is why I multiply them by the view-projection matrix before testing them against the frusta. And I believe that if I were to scale the light by the radius returned by the function above, then for the radius to be correct in clip space it would have to be divided by w?

I might be doing something that's off somewhere, but as far as I can tell it should be right.
