Performance Trouble FBO

Hi,

I am experiencing some weird performance trouble:

I create an FBO with both color and depth as textures, since I need them later in my shader. However, clearing the depth buffer with

glClear(GL_DEPTH_BUFFER_BIT);

really kills the performance: it drops from 250 fps when clearing only the color texture to about 55 fps when clearing the depth buffer as well.

The FBO is created like this:

glGenFramebuffersEXT(1, &m_frameBuffer);
glGenTextures(1, &m_depthBuffer);
glGenTextures(1, &m_colorBuffer);

glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, m_frameBuffer);

glBindTexture(GL_TEXTURE_RECTANGLE_EXT, m_colorBuffer);
glTexImage2D(GL_TEXTURE_RECTANGLE_EXT, 0, GL_RGBA8, _width, _height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
//glTexImage2D(GL_TEXTURE_RECTANGLE_EXT, 0, GL_FLOAT_RGBA16_NV, _width, _height, 0, GL_RGBA, GL_FLOAT, NULL);
glTexParameteri(GL_TEXTURE_RECTANGLE_EXT, GL_TEXTURE_MIN_FILTER,GL_LINEAR);
glTexParameteri(GL_TEXTURE_RECTANGLE_EXT, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_RECTANGLE_EXT, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_RECTANGLE_EXT, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT, GL_TEXTURE_RECTANGLE_EXT, m_colorBuffer, 0);

glBindTexture(GL_TEXTURE_RECTANGLE_EXT, m_depthBuffer);
glTexImage2D(GL_TEXTURE_RECTANGLE_EXT, 0, GL_DEPTH_COMPONENT32, _width, _height, 0, GL_DEPTH_COMPONENT, GL_UNSIGNED_BYTE, NULL);
glTexParameteri(GL_TEXTURE_RECTANGLE_EXT, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_RECTANGLE_EXT, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_RECTANGLE_EXT, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_RECTANGLE_EXT, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT, GL_TEXTURE_RECTANGLE_EXT, m_depthBuffer, 0);

Also, when changing the DEPTH_COMPONENT to DEPTH_COMPONENT16, the FBO can’t be created; only DEPTH_COMPONENT24 and DEPTH_COMPONENT32 work, but they show no difference in performance.
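As an aside, the exact reason for the DEPTH_COMPONENT16 failure should be visible via glCheckFramebufferStatusEXT(); an unsupported format combination reports GL_FRAMEBUFFER_UNSUPPORTED_EXT. Here is a minimal decoder for the common status codes (a sketch of mine, not code from this thread; the token values are the ones from the EXT_framebuffer_object header, repeated here so the snippet stands alone):

```c
/* Status tokens from the EXT_framebuffer_object extension (normally
 * supplied by glext.h; listed here so the sketch is self-contained). */
enum {
    FRAMEBUFFER_COMPLETE_EXT                      = 0x8CD5,
    FRAMEBUFFER_INCOMPLETE_ATTACHMENT_EXT         = 0x8CD6,
    FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT_EXT = 0x8CD7,
    FRAMEBUFFER_UNSUPPORTED_EXT                   = 0x8CDD
};

/* Map a glCheckFramebufferStatusEXT() result to a readable message. */
const char *fboStatusString(unsigned int status)
{
    switch (status) {
    case FRAMEBUFFER_COMPLETE_EXT:                      return "complete";
    case FRAMEBUFFER_INCOMPLETE_ATTACHMENT_EXT:         return "incomplete attachment";
    case FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT_EXT: return "missing attachment";
    case FRAMEBUFFER_UNSUPPORTED_EXT:                   return "unsupported format combination";
    default:                                            return "other error";
    }
}

/* Usage, with a GL context current:
 *   GLenum status = glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT);
 *   printf("FBO: %s\n", fboStatusString(status));
 */
```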

Does anybody have any ideas?

Afaik, no cards support a 32-bit depth buffer. If you ask for 32, they give you 24. That’s why there’s no performance difference. 24 bits is considered the ‘optimal’ path… stray from it at your own risk.

For the rest, I probably can’t help, but what drivers and card are you using? As a workaround, you could try clearing the depth buffer using a fragment program plus a full-screen quad.

I’d advise against clearing depth with a fragment program. It will most likely disable the early z test.

You load a depth-writing program to clear it, then set your normal render shader afterwards. I don’t think early z culling sets any persistent state in the drawable; rather, it is disabled/enabled based on the state of the pipeline (i.e. whether the fragment program writes z or not).

Unless you’re thinking something along the lines that once the drawable has had a z-writing fragment program enabled on it, it no longer uses the hierarchical z-buffer but switches to something else… ??

Can you explain the last two posts?
“eg You load a depth writing program to clear it, then set your normal render shader after.”
I’m not too sure what that means.

I don’t know how fast it would be, but what I meant by clearing the depth buffer is: could you set the depth test to always pass and draw a full-screen quad at the depth of the far clip plane? Or just use a shader that sets z to the far-clip-plane depth…
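A minimal fixed-function sketch of that idea (my own illustration, not code from anyone in this thread, assuming identity modelview and projection matrices so the vertex coordinates pass straight through as NDC):

```c
/* Full-screen quad placed exactly on the far plane (NDC z = 1). */
static const float kFarQuad[4][3] = {
    { -1.f, -1.f, 1.f },
    {  1.f, -1.f, 1.f },
    {  1.f,  1.f, 1.f },
    { -1.f,  1.f, 1.f },
};

/* With a GL context current, "clearing" depth would then be roughly:
 *
 *   glDepthFunc(GL_ALWAYS);        // every fragment passes the depth test
 *   glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);  // depth only
 *   glBegin(GL_QUADS);
 *   for (int i = 0; i < 4; ++i)
 *       glVertex3fv(kFarQuad[i]);
 *   glEnd();
 *   glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
 *   glDepthFunc(GL_LESS);          // restore the default test
 */
```

Whether this actually beats a plain glClear() is doubtful, though; drivers can fast-path glClear() in ways a full-screen quad cannot match.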

Originally posted by zed:
Can you explain the last two posts?
“eg You load a depth writing program to clear it, then set your normal render shader after.”
I’m not too sure what that means.

He’s saying that the guy should bind a shader that exclusively overwrites the depth value in the framebuffer, then disable it and move on with his rendering…
It’s either that or I should give up on the English language at once :smiley:
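For concreteness, the “depth writing program” being described could be as small as this GLSL fragment shader, embedded here as a C string. This is my own illustration of the idea, not code from the thread:

```c
/* A minimal GLSL fragment shader that writes far-plane depth for every
 * fragment. Bound while drawing a full-screen quad, it overwrites the
 * depth buffer; afterwards you unbind it and continue with your normal
 * render shader. */
static const char *kDepthClearFS =
    "void main()\n"
    "{\n"
    "    gl_FragDepth = 1.0; /* far plane */\n"
    "}\n";
```

Keep in mind the warning above: a shader that writes gl_FragDepth will most likely disable early z optimizations while it is bound.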

Hello! I’m having similar trouble with shadow maps, using an FBO configured as a 24-bit depth texture.

This is what happens:

  • Initially my app is running at 3000 FPS without showing anything on the screen. (FPS shown using Fraps)
  • I create a depth-render-target texture (only once)
  • Now each frame I switch to the rendertarget, clear it and switch back to the default FBO.

This last step makes the application drop to 400 FPS. Note that I do not render anything to the RT; I only clear it.

SwitchFBO(depthTex)
ClearDepthBuffer();
Switch(defaultWin)

This may be normal, because switching and clearing the FBO must have a cost. However, if I do not clear the depth buffer and only do the switches like this:

SwitchFBO(depthTex)
Switch(defaultWin)

… then there is almost no performance loss. So the depth clearing on a depth RT is killing my performance.

The strangest thing is that after the first depth-RT clear, my performance is killed forever, even if I do not switch to the render target anymore!

So, my questions:

1- Does anyone have any idea about this?
2- What is the most efficient way to create/configure a depth render target for shadow mapping?
3- Is it better not to clear the depth buffer (and not use the depth test) when calculating the shadow map?

PS: I’m running a GeForce 6800 Ultra.

I will appreciate any idea.
Thanks in advance.

To comment on the 32-bit depth: Nvidia hardware, if I remember right, defaults to 24-bit (last time I checked), so that setting isn’t going to give you a 32-bit buffer.

[b]However, if I do not clear the depth buffer and only do the switches like this:

SwitchFBO(depthTex)
Switch(defaultWin)

… then there is almost no performance loss. So the depth clearing on a depth RT is killing my performance.

The strangest thing is that after the first depth-RT clear, my performance is killed forever, even if I do not switch to the render target anymore!
[/b]
If you do nothing between the switches, the driver has no reason to touch the HW (many setups are deferred by the driver until a draw command is issued), so there is likely only some validation going on and therefore no performance loss. You only see the “performance loss” (it is more likely the difference between a “do nothing” and a “do something” situation) when the driver has to do some real work, like clearing the buffer.


3- Is it better not to clear the depth buffer (and not use the depth test) when calculating the shadow map?

The HW can use the fact that the depth buffer was cleared with glClear() to do some clever tricks with depth-buffer compression and to reactivate early z tests if they were disabled (especially on nVidia cards). This is also highly important on SLI configurations, where without glClear() the depth buffer may need to be copied to the second card.