Summed Area Tables in GLSL



OniDaito
04-11-2011, 04:05 PM
Hi guys. I've been playing a lot with shadows these days, trying to get them right for a client. I'm looking at Summed Area Variance Shadow Maps and following GPU Gems 3. I've found this paper:

http://www.shaderwrangler.com/publications/sat/SAT_EG2005.pdf

It describes how to create summed area tables on the GPU for use in various effects. The algorithm seems straightforward, but I can't seem to get it right:


-(void) generateSummedTables:(QCOpenGLContext *)context withTex:(GLuint) tex {

    CGLContextObj cgl_ctx = [context CGLContextObj];

    // Horizontal Scan
    int nm = ceil(log2(mFBOSize));
    glUseProgramObjectARB([mSummedTableShader programObject]);
    glUniform1iARB([mSummedTableShader getUniformLocation:"texWidth"],mFBOSize);

    GLuint atex = tex;
    [mSummedFBO bindNoDraw];

    glMatrixMode(GL_PROJECTION);
    glPushMatrix();
    glLoadIdentity();
    glOrtho(-1, 1, -1, 1, 0.0, 10.0);
    glMatrixMode(GL_MODELVIEW);
    glPushMatrix();
    glLoadIdentity();
    glColor3f(1.0,1.0,1.0f);

    unsigned int ni = 1;
    BOOL usingA = TRUE;
    for(int i=0; i < nm; i++){

        //glClear(GL_COLOR_BUFFER_BIT);

        if (usingA) {
            glDrawBuffer(GL_COLOR_ATTACHMENT0_EXT);
        } else {
            glDrawBuffer(GL_COLOR_ATTACHMENT1_EXT);
        }

        // Start off with an input texture A
        glUniform1iARB([mSummedTableShader getUniformLocation:"Ni"],ni);
        // NOTE: this passes the texture ID; a sampler uniform expects the
        // texture unit index (the error identified in the follow-up post).
        glUniform1iARB([mSummedTableShader getUniformLocation:"texture"],atex);

        glBindTexture(GL_TEXTURE_2D, atex);

        glBegin(GL_QUADS);
        glTexCoord2f(0.0, 0.0); glVertex3f(-1.0, -1.0, 0.0);
        glTexCoord2f(1.0, 0.0); glVertex3f(1.0, -1.0, 0.0);
        glTexCoord2f(1.0, 1.0); glVertex3f(1.0, 1.0, 0.0);
        glTexCoord2f(0.0, 1.0); glVertex3f(-1.0, 1.0, 0.0);
        glEnd();

        ni = ni << 1; // Move up

        if (usingA) {
            atex = [mSummedFBO getTextureAtTarget:0];
        } else {
            atex = [mSummedFBO getTextureAtTarget:1];
        }

        usingA = !usingA;
    }

    // Vertical Scan

    usingA = FALSE; // Sure about that?
    ni = 1;
    atex = [mSummedFBO getTextureAtTarget:1];
    glUseProgramObjectARB([mSummedTableVShader programObject]);
    glUniform1iARB([mSummedTableVShader getUniformLocation:"texWidth"],mFBOSize);

    for(int i=0; i < nm; i++){

        //glClear(GL_COLOR_BUFFER_BIT);

        if (usingA) {
            glDrawBuffer(GL_COLOR_ATTACHMENT0_EXT);
        } else {
            glDrawBuffer(GL_COLOR_ATTACHMENT1_EXT);
        }

        // Start off with an input texture A
        glUniform1iARB([mSummedTableVShader getUniformLocation:"Ni"],ni);
        // NOTE: same issue as above, texture ID passed instead of texture unit.
        glUniform1iARB([mSummedTableVShader getUniformLocation:"texture"],atex);

        glBindTexture(GL_TEXTURE_2D, atex);

        glBegin(GL_QUADS);
        glTexCoord2f(0.0, 0.0); glVertex3f(-1.0, -1.0, 0.0);
        glTexCoord2f(1.0, 0.0); glVertex3f(1.0, -1.0, 0.0);
        glTexCoord2f(1.0, 1.0); glVertex3f(1.0, 1.0, 0.0);
        glTexCoord2f(0.0, 1.0); glVertex3f(-1.0, 1.0, 0.0);
        glEnd();

        ni = ni << 1; // Move up

        if (usingA) {
            atex = [mSummedFBO getTextureAtTarget:0];
        } else {
            atex = [mSummedFBO getTextureAtTarget:1];
        }

        usingA = !usingA;
    }

    glPopMatrix();
    glMatrixMode(GL_PROJECTION);
    glPopMatrix();

    [mSummedFBO unbindFBO];

    glUseProgramObjectARB(NULL);
}


And the fragment shaders for generating the tables (vertical first, then horizontal):

uniform int texWidth; // Should have size I imagine
uniform int Ni; // texels along 2 ^ i (so we pre power)

// TODO - this is practically identical to horiz therefore we should just have a swap variable or something

uniform sampler2D texture;

void main (void) {
    // Vertical Pass
    vec2 s = gl_TexCoord[0].st;
    vec2 sd = s;
    sd.y = sd.y + ( 1.0/ float(texWidth) * float(Ni) );
    vec4 c = texture2D(texture, s) + texture2D(texture, sd);

    gl_FragColor = c;

    // Now we SWAP textures
}



uniform int texWidth;
uniform int Ni; // texels along 2 ^ i (so we pre power)

uniform sampler2D texture;

void main (void) {
    // Horizontal Pass
    vec2 s = gl_TexCoord[0].st;
    vec2 sd = s;
    sd.x = sd.x + (1.0/ float(texWidth) * float(Ni));
    vec4 c = texture2D(texture, s) + texture2D(texture, sd);

    gl_FragColor = c;

    // Now we SWAP textures
}


So what I have is a ping-pong FBO with two textures, both 512 x 512 (square, power of two). They are GL_RGB32F_ARB textures with GL_FLOAT data.

I sum horizontally first and then vertically. Each texture is set to clamp to border with a border colour of 0, so that any reads past the edge don't affect the sums.

How would I check that this is correct? I suppose the easiest way is to write a shader that converts the table back to the original image?
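
One way to check the passes without involving the GPU at all is a small CPU reference. The following is a minimal sketch in C, not the code from the post: it mimics the shader (each pass adds the texel at offset +Ni, with reads outside the image returning 0) and compares the result against a brute-force sum. `SAT_N`, `fetch`, `scan_pass`, `sat_recursive_doubling` and `sat_brute` are names made up for this illustration, and the size is kept at 8 rather than 512 so the brute-force check stays cheap.

```c
#include <assert.h>
#include <string.h>

#define SAT_N 8  /* hypothetical small size; the post uses 512 */

/* Read a texel with "clamp to border, border colour 0" behaviour. */
static float fetch(const float *img, int x, int y)
{
    if (x < 0 || x >= SAT_N || y < 0 || y >= SAT_N) return 0.0f;
    return img[y * SAT_N + x];
}

/* One pass: dst(x,y) = src(x,y) + src(x+dx, y+dy), like the shader with offset Ni. */
static void scan_pass(const float *src, float *dst, int dx, int dy)
{
    for (int y = 0; y < SAT_N; ++y)
        for (int x = 0; x < SAT_N; ++x)
            dst[y * SAT_N + x] = fetch(src, x, y) + fetch(src, x + dx, y + dy);
}

/* log2(SAT_N) horizontal passes, then log2(SAT_N) vertical passes,
   swapping two buffers after every pass (ping-pong). */
static void sat_recursive_doubling(const float *in, float *out)
{
    float a[SAT_N * SAT_N], b[SAT_N * SAT_N];
    float *src = a, *dst = b;
    assert(in != NULL && out != NULL);
    memcpy(src, in, sizeof a);
    for (int axis = 0; axis < 2; ++axis) {
        for (int ni = 1; ni < SAT_N; ni <<= 1) {
            scan_pass(src, dst, axis == 0 ? ni : 0, axis == 1 ? ni : 0);
            float *t = src; src = dst; dst = t;  /* swap read and write buffers */
        }
    }
    memcpy(out, src, sizeof a);  /* after the last swap, src holds the result */
}

/* Brute force: sum of every texel at x' >= x and y' >= y.
   The +Ni offset makes the sums accumulate towards larger coordinates. */
static float sat_brute(const float *in, int x, int y)
{
    float s = 0.0f;
    for (int yy = y; yy < SAT_N; ++yy)
        for (int xx = x; xx < SAT_N; ++xx)
            s += in[yy * SAT_N + xx];
    return s;
}
```

Note that with a positive offset each texel ends up holding the sum of itself and everything to its right and above, which is a mirrored version of the usual "sum of everything up to this texel" table. Reading the GPU result back and comparing it against such a reference should show quickly whether a pass is missing.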

OniDaito
04-12-2011, 05:06 AM
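OK, so I found one error. I was passing the texture ID to the shader and NOT the texture unit. The sampler uniform has to be set to the index of the texture unit the texture is bound to (0 for GL_TEXTURE0), not to the texture object's ID. Classic schoolboy error.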

I implemented a shader that would reverse the summed tables to see if what I was getting was correct. The results are rather odd! :S

http://farm6.static.flickr.com/5027/5612431507_0ee4d8eb4f.jpg

I'm guessing there has been some precision loss or something going on here. My algorithm simply takes the sum at that point and the texels immediately to the left, above and above-left, and performs the standard summed-area lookup but omits the divide. This should return the original pixel, and it certainly seems to in most cases.
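
The four-lookup reversal described here can be sketched on the CPU as follows. This is an illustration in C under stated assumptions, not the shader from the post: it assumes a conventional table where each entry is the sum of all texels at or before it in x and y, and `SAT_W`, `SAT_H`, `sat_at`, `build_sat` and `recover_texel` are hypothetical names.

```c
#include <assert.h>

#define SAT_W 4  /* hypothetical small size for illustration */
#define SAT_H 4

/* Read the table, treating everything outside it as 0. */
static float sat_at(const float *sat, int x, int y)
{
    if (x < 0 || y < 0 || x >= SAT_W || y >= SAT_H) return 0.0f;
    return sat[y * SAT_W + x];
}

/* Build sat(x,y) = sum of img(x',y') over all x' <= x and y' <= y. */
static void build_sat(const float *img, float *sat)
{
    for (int y = 0; y < SAT_H; ++y)
        for (int x = 0; x < SAT_W; ++x)
            sat[y * SAT_W + x] = img[y * SAT_W + x]
                               + sat_at(sat, x - 1, y)
                               + sat_at(sat, x, y - 1)
                               - sat_at(sat, x - 1, y - 1);
}

/* Standard lookup over a 1x1 box, without the divide:
   original = S(x,y) - S(x-1,y) - S(x,y-1) + S(x-1,y-1). */
static float recover_texel(const float *sat, int x, int y)
{
    assert(x >= 0 && x < SAT_W && y >= 0 && y < SAT_H);
    return sat_at(sat, x, y)
         - sat_at(sat, x - 1, y)
         - sat_at(sat, x, y - 1)
         + sat_at(sat, x - 1, y - 1);
}
```

If the table was built by adding texels at a positive offset, as the shaders in the first post do, the sums run in the opposite direction and the three neighbours would presumably have to be taken at x+1 and y+1 instead.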

OniDaito
04-12-2011, 07:32 AM
Sorry guys, fixed it now. There was a step missing from my loop, and I also needed to swap the A and B textures one last time.
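
The post does not say exactly which step was missing, so the following is only a sketch of the ping-pong bookkeeping, in C, with made-up names (`PingPong`, `INPUT_TEX`, `pingpong_swap`, `pingpong_result_after`). The idea is to track which attachment was written last, so the next pass always reads from it and writes to the other one.

```c
#include <assert.h>

#define INPUT_TEX (-1)  /* hypothetical marker: the original input texture */

typedef struct {
    int read_from;  /* INPUT_TEX before the first pass, then 0 or 1 */
    int write_to;   /* attachment the next pass renders into: 0 or 1 */
} PingPong;

/* The first pass reads the input texture and writes attachment 0. */
static PingPong pingpong_begin(void)
{
    PingPong p = { INPUT_TEX, 0 };
    return p;
}

/* Call once after every draw: what was just written becomes the next input. */
static void pingpong_swap(PingPong *p)
{
    assert(p->write_to == 0 || p->write_to == 1);
    p->read_from = p->write_to;
    p->write_to = 1 - p->write_to;
}

/* Which attachment holds the result after `passes` draws. */
static int pingpong_result_after(int passes)
{
    PingPong p = pingpong_begin();
    for (int i = 0; i < passes; ++i)
        pingpong_swap(&p);
    return p.read_from;
}
```

For a 512-wide texture there are 9 horizontal passes, so under this scheme the horizontal result sits in attachment 0 and the vertical scan has to start by reading attachment 0 and writing attachment 1. Read that way, the first post's vertical scan appears to start by both reading and writing attachment 1, which would be consistent with the extra swap mentioned here.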

bjoern
01-06-2012, 07:56 AM
Hej OniDaito,
I tried to implement this algorithm on iOS in OpenGL ES 2.0 and I'm stuck on a problem of precision and data transfer. As I understand it, the fragment shader output is clamped to the range [0.0 .. 1.0] and stored with 8 bits per component (I think this is a limitation set by the hardware manufacturer). To fit the larger integral values into the framebuffer/texture, I might try to compute the summed area for only one color channel. Is it possible to write to multiple framebuffers at the same time, with different results, so that every channel would use a separate buffer?
Has anybody had similar problems, or an idea how to solve this?
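
To put a number on the precision problem: the sketch below, in C, estimates how many bits the largest table entry needs for 8-bit input texels, and shows one sum being spread over the four 8-bit channels of a single RGBA texel. This is a storage illustration only, under the assumption that a single colour channel is summed; `bits_needed`, `sat_max_value`, `pack_rgba8` and `unpack_rgba8` are hypothetical helpers, and the sketch does not show how a shader would carry between channels when adding packed values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bits needed to represent max_value as an unsigned integer. */
static int bits_needed(uint64_t max_value)
{
    int bits = 0;
    while (max_value != 0) {
        ++bits;
        max_value >>= 1;
    }
    return bits;
}

/* Largest possible table entry for a w x h image of 8-bit texels (0..255). */
static uint64_t sat_max_value(uint32_t w, uint32_t h)
{
    return (uint64_t)w * h * 255u;
}

/* Spread one 32-bit sum over four 8-bit channels, least significant byte in R. */
static void pack_rgba8(uint32_t value, uint8_t rgba[4])
{
    assert(rgba != NULL);
    for (int i = 0; i < 4; ++i)
        rgba[i] = (uint8_t)((value >> (8 * i)) & 0xFFu);
}

/* Inverse of pack_rgba8. */
static uint32_t unpack_rgba8(const uint8_t rgba[4])
{
    uint32_t value = 0;
    assert(rgba != NULL);
    for (int i = 0; i < 4; ++i)
        value |= (uint32_t)rgba[i] << (8 * i);
    return value;
}
```

For a 512 x 512 image the worst case is 512 * 512 * 255 = 66846720, which needs 26 bits. That is far more than one 8-bit component can hold, but less than the 32 bits available across the four channels of an RGBA8 texel.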

BionicBytes
01-06-2012, 08:54 AM
"Is it possible to write to multiple framebuffers at the same time"
It is on desktop GL - it's called multiple render targets (MRT).
Deferred Rendering often uses this technique to output to two or more buffers at the same time from a single fragment shader.
There should be no reason (other than performance) why OpenGL ES 2.0 can't support this, but I'm not an ES expert, I'm afraid.