Blooming effect

Hello, me again :)

This time I’m wondering how I can achieve a blooming result with OpenGL… So far here’s what I did:

  • Render the whole scene once to a texture (using an FBO), which I’ll call basetexture
  • Display that rendered texture using a bright-pass shader and render that into a new texture (let’s call it brightpasstexture)
  • Display the brightpass texture to the screen and filter it using a gaussian blur shader, and render it to finalpasstexture
  • Render the whole scene using multitexturing, with basetexture into texunit 0 and finalpasstexture into texunit 1, with GL_ADD

Here are the questions:

  • Is there a better way to render each step than rendering to texture, then displaying using the next shader and rendering to texture again?
  • I know I have to generate several textures from the brightpass one (or whatever texture is going to be used as “bright spots” one) to create the blooming effect. I know that using several gaussian blur shaders with bigger and bigger kernels is too slow, so I read I have to use the same kernel (usually 3x3) BUT each time with a downsampled texture (i.e. first time 256x256 then 128x128 then 64x64…). The question is: how do I generate these downsampled textures? Is there a fast way, or should I resize it at each pass? (I bet this is SLOW)

Here’s the code I’m currently using (note I only use one blur pass at this stage):

	// ************************************************************
	// *** Render scene *******************************************
	// ************************************************************

	m_fboBasePass->UseFBO();

	glViewport(0,0,width, height);
	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
	glLoadIdentity();

	gluLookAt(pos.x, pos.y, pos.z,
		  view.x, view.y, view.z,
		  up.x, up.y, up.z);

	m_skybox->Render(pos.x, pos.y, pos.z, engineconfig->farPlane-10.0f, engineconfig->farPlane-10.0f, engineconfig->farPlane-10.0f);
	m_mapI->Render();
	
	m_fboBasePass->ReleaseFBO();

	////////////////////////////////////////////////////////////////////////
	// 2D Drawings /////////////////////////////////////////////////////////
	////////////////////////////////////////////////////////////////////////

	glViewport(0,0,width, height);
	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
	glMatrixMode(GL_PROJECTION);
	glLoadIdentity();
	gluOrtho2D(0.0f, width-1.0f, 0.0f, height-1.0f);
	glMatrixMode(GL_MODELVIEW);
	glLoadIdentity();

	//// render post processing /////////////////////////////////////////////////////////
	glDisable(GL_LIGHTING);
	glActiveTextureARB(GL_TEXTURE0_ARB);
	glColor4f(1.0f, 1.0f, 1.0f, 1.0f);
	glBlendFunc(GL_ONE, GL_ZERO);

	// bright pass filter
	m_fboBrightPass->UseFBO();

	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
	HT_Graphics::SetActiveShaderProgram(HT_Constants::SH_POST_BRIGHTPASS);
	glBindTexture(GL_TEXTURE_2D, m_fboBasePass->GetTexture());
	int my_sampler_uniform_location = GetVariable("tex0");
	glUniform1iARB(my_sampler_uniform_location, 0);
	DrawFullScreenQuad(width, height);

	m_fboBrightPass->ReleaseFBO();

	// blur pass filter
	m_fboFinalPass->UseFBO();

	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
	HT_Graphics::SetActiveShaderProgram(HT_Constants::SH_POST_GAUSSIANBLUR);
	glBindTexture(GL_TEXTURE_2D, m_fboBrightPass->GetTexture());
	my_sampler_uniform_location = GetVariable("tex0");
	glUniform1iARB(my_sampler_uniform_location, 0);
	my_sampler_uniform_location = GetVariable("width");
	glUniform1iARB(my_sampler_uniform_location, width);
	my_sampler_uniform_location = GetVariable("height");
	glUniform1iARB(my_sampler_uniform_location, height);
	
	DrawFullScreenQuad(width, height);

	m_fboFinalPass->ReleaseFBO();

	// final pass
	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
	HT_Graphics::SetActiveShaderProgram(HT_Constants::SH_NONE);
	
	glBlendFunc(GL_ONE, GL_ZERO);

	glActiveTexture(GL_TEXTURE0);
	glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);
	glEnable(GL_TEXTURE_2D);
	glBindTexture(GL_TEXTURE_2D, m_fboBasePass->GetTexture());

	glActiveTexture(GL_TEXTURE1);
	glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_ADD);
	glEnable(GL_TEXTURE_2D);
	glBindTexture(GL_TEXTURE_2D, m_fboFinalPass->GetTexture()); // blurred result (finalpasstexture), not the unblurred bright pass

	DrawFullScreenQuad(width, height);

Thanks a lot for your feedback!

Cheers ;)

HT

Search the FBO extension spec for ‘GenerateMipmapEXT’ to generate the downsampled mipmaps.
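
For reference, each mipmap level halves both dimensions of the previous one (rounding down, never below 1), and a hand-rolled downsample chain would typically follow the same progression. A minimal sketch of that arithmetic, handy for allocating one render target per level up front; `downsampleChain` is a hypothetical helper name, not something from the FBO extension:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper: returns the (width, height) of each downsampled level,
// starting with the first half-size level below the base size.
std::vector<std::pair<int, int>> downsampleChain(int baseWidth, int baseHeight, int levels)
{
    std::vector<std::pair<int, int>> sizes;
    int w = baseWidth;
    int h = baseHeight;
    for (int i = 0; i < levels; ++i)
    {
        w = std::max(1, w / 2); // integer division rounds down, clamp at 1
        h = std::max(1, h / 2);
        sizes.emplace_back(w, h);
    }
    return sizes;
}
```

Calling `downsampleChain(512, 512, 3)` gives the 256x256, 128x128, 64x64 progression mentioned in the question.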

Thanks ZbuffeR. As for the way I render each step, is what I do correct? Should I really render each stage to a separate FBO and then blend each into a multitextured final stage?

HT

Hi!
My first HDR + bloom implementation was based on mipmaps, but I eventually gave that up when GenerateMipmapEXT started throwing an exception on ATI (with all parameters and the texture valid).
Here is what I do now:

  1. Render scene to texture

  2. Render this texture into one 4 times smaller in each dimension: use a pixel shader that computes the average of 4 samples, and place each sample at the center between 4 pixels, so that linear filtering makes each sample the average of a 2x2 block. This way I downsize 1280x1024 to 320x256 in a single pass.

  3. Render this quarter-size texture into the next texture, performing a horizontal blur, again using a combination of pixel shader and native texture filtering to blur more samples as fast as possible.

  4. Render to yet another texture, this time performing a vertical blur.

  5. Render to screen using that final texture and the base texture.
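
To make step 2 concrete, here is a CPU-side sketch of the same arithmetic, assuming a single-channel image, clamp-to-edge addressing, and OpenGL's convention that texel centers sit at half-integer positions. `Image`, `sampleBilinear` and `downsample4x` are hypothetical names used only for illustration; in the real pipeline the bilinear fetch is done by the texture unit and the averaging by the fragment shader.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Image
{
    int width;
    int height;
    std::vector<float> texels; // row-major, one channel

    // Clamp-to-edge texel fetch.
    float at(int x, int y) const
    {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// Bilinear fetch at a position in texel units; texel (i, j) has its
// center at (i + 0.5, j + 0.5).
float sampleBilinear(const Image& img, float x, float y)
{
    const float fx = x - 0.5f;
    const float fy = y - 0.5f;
    const int x0 = static_cast<int>(std::floor(fx));
    const int y0 = static_cast<int>(std::floor(fy));
    const float tx = fx - static_cast<float>(x0);
    const float ty = fy - static_cast<float>(y0);
    const float top    = img.at(x0, y0)     * (1.0f - tx) + img.at(x0 + 1, y0)     * tx;
    const float bottom = img.at(x0, y0 + 1) * (1.0f - tx) + img.at(x0 + 1, y0 + 1) * tx;
    return top * (1.0f - ty) + bottom * ty;
}

// Each output texel averages 4 bilinear taps. Every tap sits on the corner
// shared by a 2x2 group of texels, so the 4 taps together cover a 4x4 block.
Image downsample4x(const Image& src)
{
    Image dst{src.width / 4, src.height / 4, {}};
    dst.texels.resize(static_cast<std::size_t>(dst.width) * dst.height);
    for (int y = 0; y < dst.height; ++y)
    {
        for (int x = 0; x < dst.width; ++x)
        {
            const float bx = 4.0f * static_cast<float>(x); // left edge of the source block
            const float by = 4.0f * static_cast<float>(y); // top edge of the source block
            const float sum = sampleBilinear(src, bx + 1.0f, by + 1.0f)
                            + sampleBilinear(src, bx + 3.0f, by + 1.0f)
                            + sampleBilinear(src, bx + 1.0f, by + 3.0f)
                            + sampleBilinear(src, bx + 3.0f, by + 3.0f);
            dst.texels[static_cast<std::size_t>(y) * dst.width + x] = 0.25f * sum;
        }
    }
    return dst;
}
```

Four fetches therefore produce the plain average of 16 source texels, which is why the 4x reduction fits in one pass.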

By the way:
glBlendFunc(GL_ONE, GL_ZERO);
Why not just:
glDisable(GL_BLEND);

If you do not want textures to blend, then disable blending. The driver may optimize the GL_ONE/GL_ZERO case, but if it does not, disabling blending when it is not used will give you better performance.

I haven’t done the mipmapping path. Does it give acceptable results, as good as doing the blur yourself, i.e. ping-ponging horizontal + vertical blurs?

the mipmap is not for the blur itself, only to downsample original image so blur is faster.

the mipmap is not for the blur itself
I’ve actually used mipmaps for a simple blur ;)
Final color:
0.95*level0 + 0.05*level1 + 0.02*level2 …

You can imagine that it didn’t look too good.

only to downsample original image so blur is faster
To access the downsampled image you have to specify a texture LOD bias in the texture2D call in the shader. That works fine on a GeForce 7800, but on my GeForce 6600GT using:
texture2D( tex, coord, 0.0);
is noticeably slower than:
texture2D( tex, coord );

So I actually got a penalty for using mipmaps. Downsampling the image yourself and then blurring it actually works faster if you do it right (4x4 -> 1x1 downsample in a single pass).

Yet another reason for me to stop using mipmaps for fake blurring.

Originally posted by ZbuffeR:
the mipmap is not for the blur itself, only to downsample original image so blur is faster.
Ta. In that case it’s worthless, as IMO bloom should not use the onscreen framebuffer, since that produces worse results than a selected bloom.

Just to know… here’s what I do to render my post-processing effect: (I won’t use mips, but downsample shader instead)

> render scene to FBOBase
> Use FBOBase + brightpass shader to FBOPing
> Use FBOPing + downsample shader to FBOPong
> Use FBOPong + blur shader to FBOPing
> Use FBOBase GL_ADD FBOPing to screen

Do you think it’s correct? How many “downsample + blur” ping-pong passes do you think are needed to get a nice blooming effect? Is one enough, or should I render several? (I’m actually concerned about these FBO and shader changes.)

Moreover, should I add some overhead by splitting the 4x4 blur into separate vertical and horizontal blurs? I guess it’s cheaper in terms of texture samples (3+3 instead of 3*3), but I’m also concerned about the overhead caused by the FBO ping-pong and the shader switch… What’s best/“least worse”? ^^

Thanks

HT

I meant “3x3” blur instead of “4x4”; I’m using a 3x3 kernel gaussian blur.

Here’s a shot depicting an issue I’m having with mips:

See the edges around the clouds? They’re kind of “jaggy”. Here’s what I did:

const static byte BLUR_STAGE_0	= 4;
const static byte BLUR_STAGE_1	= 8;

...

	//** bright pass filter *******************************
	HT_Graphics::SetActiveShaderProgram(HT_Constants::SH_POST_BRIGHTPASS);
	my_sampler_uniform_location = GetVariable("tex0");
	glUniform1iARB(my_sampler_uniform_location, 0);	

	RenderFullScreenTextureToFBO(m_fboBrightPass, m_fboBasePass->GetTexture(), width, height);

	//** blur passes filter *******************************
	HT_Graphics::SetActiveShaderProgram(HT_Constants::SH_POST_GAUSSIANBLUR);
	my_sampler_uniform_location = GetVariable("tex0");
	glUniform1iARB(my_sampler_uniform_location, 0);
	my_sampler_uniform_location = GetVariable("width");
	glUniform1iARB(my_sampler_uniform_location, width/HT_Constants::BLUR_STAGE_0);
	my_sampler_uniform_location = GetVariable("height");
	glUniform1iARB(my_sampler_uniform_location, height/HT_Constants::BLUR_STAGE_0);
	RenderFullScreenTextureToFBO(m_fboBlur0Pass, m_fboBrightPass->GetTexture(), width/HT_Constants::BLUR_STAGE_0, height/HT_Constants::BLUR_STAGE_0);

	my_sampler_uniform_location = GetVariable("width");
	glUniform1iARB(my_sampler_uniform_location, width/HT_Constants::BLUR_STAGE_1);
	my_sampler_uniform_location = GetVariable("height");
	glUniform1iARB(my_sampler_uniform_location, height/HT_Constants::BLUR_STAGE_1);
	RenderFullScreenTextureToFBO(m_fboBlur1Pass, m_fboBlur0Pass->GetTexture(), width/HT_Constants::BLUR_STAGE_1, height/HT_Constants::BLUR_STAGE_1);

	//** render to screen *******************************
	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

	HT_Graphics::SetActiveShaderProgram(HT_Constants::SH_NONE);
	
	glActiveTexture(GL_TEXTURE0);
	glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);
	glEnable(GL_TEXTURE_2D);
	glBindTexture(GL_TEXTURE_2D, m_fboBasePass->GetTexture());

	glActiveTexture(GL_TEXTURE1);
	glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_ADD);
	glEnable(GL_TEXTURE_2D);
	glBindTexture(GL_TEXTURE_2D, m_fboBlur0Pass->GetTexture());

	glActiveTexture(GL_TEXTURE2);
	glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_ADD);
	glEnable(GL_TEXTURE_2D);
	glBindTexture(GL_TEXTURE_2D, m_fboBlur1Pass->GetTexture());

	DrawFullScreenQuad(width, height);

...

with RenderFullScreenTextureToFBO being:

void HT_Graphics::RenderFullScreenTextureToFBO(HT_FBO* _fbo, GLuint _texture, int _width, int _height)
{
	_fbo->UseFBO();

	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
	glBindTexture(GL_TEXTURE_2D, _texture);
	DrawFullScreenQuad(_width, _height);

	_fbo->ReleaseFBO();
}

and the FBOs:

	m_fboBasePass = new HT_FBO(width, height);
	m_fboBrightPass = new HT_FBO(width, height);
	m_fboBlur0Pass = new HT_FBO(width/HT_Constants::BLUR_STAGE_0, height/HT_Constants::BLUR_STAGE_0);
	m_fboBlur1Pass = new HT_FBO(width/HT_Constants::BLUR_STAGE_1, height/HT_Constants::BLUR_STAGE_1);

So I have downsampled FBOs for the blur passes, each with linear-filtered textures. I guess my edges are jaggy not because of the downscale, but because the blur is probably not strong enough. I’m using this gaussian blur shader:

  
uniform sampler2D tex0;
uniform float width;
uniform float height;

#define KERNEL_SIZE 9

// Gaussian kernel
// 1 2 1
// 2 4 2
// 1 2 1	
float kernel[KERNEL_SIZE];
vec2 offset[KERNEL_SIZE];

void main(void)
{
float width = 800.0;
float height = 600.0;
	
   float step_w = 1.0/width;
   float step_h = 1.0/height;

   offset[0] = vec2(-step_w, -step_h);
   offset[1] = vec2(0.0, -step_h);
   offset[2] = vec2(step_w, -step_h);
   
   offset[3] = vec2(-step_w, 0.0);
   offset[4] = vec2(0.0, 0.0);
   offset[5] = vec2(step_w, 0.0);
   
   offset[6] = vec2(-step_w, step_h);
   offset[7] = vec2(0.0, step_h);
   offset[8] = vec2(step_w, step_h);
   
   kernel[0] = 1.0/16.0; 	kernel[1] = 2.0/16.0;	kernel[2] = 1.0/16.0;
   kernel[3] = 2.0/16.0;	kernel[4] = 4.0/16.0;	kernel[5] = 2.0/16.0;
   kernel[6] = 1.0/16.0;   	kernel[7] = 2.0/16.0;	kernel[8] = 1.0/16.0;
   
   int i = 0;
   vec4 sum = vec4(0.0);
   
   for( i=0; i<KERNEL_SIZE; i++ )
   {
		vec4 tmp = texture2D(tex0, gl_TexCoord[0].st + offset[i]);
		sum += tmp * kernel[i];
   }
   
   gl_FragColor = sum;
}

Something obviously went wrong :)

HT

Is your input image 800*600?

You have
uniform float width;
uniform float height;
and
float width = 800.0;
float height = 600.0;
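in the code. The latter wins: the locals declared in main() shadow the uniforms, so the values you set from the application are never used.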

Is the banding in the sky intended?
I suspect 16-bit textures or 16-bit framebuffers.

You could eliminate the multiplications by the 1.0 weights by leaving the kernel unscaled (1, 2, 4), unrolling the loop, and multiplying the final result by 1/16. That could also increase the precision.

yeah sorry, 800x600 is for debug purposes.

about the banding in the sky, I don’t know, I think it’s the texture depth.

as for the final point, why not, but I don’t think that would prevent me from having that blocky effect I see on my bloom

any other ideas?

A/ You’re mixing fixed-function stuff, e.g. glEnable(GL_TEXTURE_2D), with GLSL shaders.
B/ A better blur is to first blur vertically, then use the resulting texture as the input to a horizontal blur; repeat this a few times.

A/ I disable shaders first before enabling states/rendering:

HT_Graphics::SetActiveShaderProgram(HT_Constants::SH_NONE);

B/ I noticed it’s not only nicer, it’s also kind of faster: with a 7x7 kernel size gaussian blur performed vertically then horizontally, I only sample 7+7 = 14 times, and I get a nicer result than when I sample 3x3 = 9 times in one pass, with no noticeable performance loss.
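
For a split blur like this, the per-pass weights can be generated on the CPU and uploaded as a uniform array. A minimal sketch, assuming an odd tap count and a freely chosen standard deviation `sigma` (the thread does not say which value was used); `gaussianWeights` is a hypothetical helper name:

```cpp
#include <cmath>
#include <vector>

// Normalized weights of a 1D Gaussian sampled at integer texel offsets.
// taps is assumed odd (e.g. 7 or 15); sigma is an assumed tuning parameter.
std::vector<float> gaussianWeights(int taps, float sigma)
{
    std::vector<float> weights(taps);
    const int radius = taps / 2;
    float sum = 0.0f;
    for (int i = 0; i < taps; ++i)
    {
        const float d = static_cast<float>(i - radius);
        weights[i] = std::exp(-(d * d) / (2.0f * sigma * sigma));
        sum += weights[i];
    }
    // Normalize so the blur preserves overall brightness.
    for (float& w : weights)
        w /= sum;
    return weights;
}
```

The same 7 weights serve both the vertical and the horizontal pass, which is where the 7+7 fetches come from.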

Check out www.hardtopnet.net to see the results!

HT

The more times you downsample your base texture, the more ‘blocky’ the effect you’ll get.
So if you use a 3x3 kernel on a 4x downsampled image, you can expect such an effect.
I’m also using a 4x downsampled image, but with a 15x15 kernel. Here is the result:

Four hints:

#1
I see you are performing the brightpass first. Why do it on the full-size image?
Combine brightpass + downsample into one pass; you will save memory bandwidth.

#2
Split the kernel into vertical and horizontal passes (I see you already did that while I was writing this post).

#3
My kernel is 15x15, but I’m using only 18 texture lookups for it: 2 passes of 9 lookups each. How do I implement a 15x1 kernel in just 9 texture lookups?
I combine texture lookups into pairs (except for the middle sample). If you use linear filtering and coord1 and coord2 are the centers of adjacent texels, then:

color  = texture2D(tex, coord1) * 0.2;
color += texture2D(tex, coord2) * 0.5;

Is equivalent to:

color  = texture2D(tex, (coord1 * 0.2 + coord2 * 0.5) / (0.2 + 0.5)) * (0.2 + 0.5);

#4
Instead of computing texture coordinates in the fragment shader, put everything in the vertex shader, so your fragment shader will look like this:

  gl_FragColor =
    texture2D(tex, gl_TexCoord[0].xy) * kernel[0]
  + texture2D(tex, gl_TexCoord[0].zw) * kernel[1]
  + texture2D(tex, gl_TexCoord[1].xy) * kernel[2]
  + texture2D(tex, gl_TexCoord[1].zw) * kernel[3]
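
Hint #3 can be checked on the CPU. The sketch below merges neighbouring taps of a symmetric 1D kernel into single linearly filtered taps, using the same weighted-average coordinate as in the formula above. It assumes positive weights and one row of single-channel texels; `Tap`, `mergeTaps` and `sampleLinear` are hypothetical names for illustration only.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Tap
{
    float offset; // in texels, relative to the center texel
    float weight;
};

// weights[0] is the center weight, weights[i] the weight at offsets +i and -i.
// Offsets (1,2), (3,4), ... are merged into one tap each; a leftover outermost
// offset stays a plain single tap.
std::vector<Tap> mergeTaps(const std::vector<float>& weights)
{
    std::vector<Tap> taps;
    taps.push_back({0.0f, weights[0]});
    std::size_t i = 1;
    for (; i + 1 < weights.size(); i += 2)
    {
        const float w = weights[i] + weights[i + 1];
        const float o = (static_cast<float>(i) * weights[i]
                         + static_cast<float>(i + 1) * weights[i + 1]) / w;
        taps.push_back({o, w});
        taps.push_back({-o, w});
    }
    if (i < weights.size())
    {
        taps.push_back({static_cast<float>(i), weights[i]});
        taps.push_back({-static_cast<float>(i), weights[i]});
    }
    return taps;
}

// 1D linear fetch; integer positions are texel centers, clamp-to-edge.
float sampleLinear(const std::vector<float>& row, float pos)
{
    const int last = static_cast<int>(row.size()) - 1;
    const int x0 = static_cast<int>(std::floor(pos));
    const float t = pos - static_cast<float>(x0);
    const int a = std::max(0, std::min(x0, last));
    const int b = std::max(0, std::min(x0 + 1, last));
    return row[a] * (1.0f - t) + row[b] * t;
}
```

For a 15-tap kernel (center plus 7 weights per side) this yields the center tap, 3 merged pairs per side and 1 single tap per side, i.e. the 9 lookups per pass mentioned above.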

Hello k_szczech, thanks for your feedback again :)

Really nice rendering effect on your shot!

#1: I’m doing it already :) I thought about that yesterday. The brightpass I’m now using is texel = texel³. This gives a nice contrast effect.

#2: also done that yesterday

#3: nice move, I’ll give that a try, it’ll save even more samples

#4: I’ve seen many people doing so, but I never thought of doing that.

I think I’ll stick to a smaller kernel than 15x15; with a V/H split 7x7 kernel I get 14 sample lookups… Oh well, 18 for a 15x15 is maybe interesting all the same ;)

I’ve got a question for you: I’m using two blur passes consisting of 2x downsample and 4x downsample with 7x7 gaussian blur.

I’ve seen many implementations (such as Humus) using 4 passes (1x, 2x, 4x and 8x).
Do you think I should use only one blur pass (4x probably) with a bigger blur kernel, or stick to 2 passes with a lighter kernel? I mean performance-wise and quality-wise.

Thanks a lot for your interest

HT

I’m confused - do you ask for downsampling or blurring?
I’m using single pass 4x downsample (linear) and then 2-pass blur (15x1) on that image - that’s it.

Sorry-- I meant:

  • 4 downsampling passes (whole, half, quarter, eighth) with a 2-pass 7x1 blur each

VS.

  • 1 downsampling pass (quarter) with a 2-pass 15x1 (or even more) blur

I think you answered my question: the second choice is what you chose, and that’s what I’m going to test.

Thanks, I’ll post feedback :)

HT