
View Full Version : Early-Z and stencil test



olmeca
02-20-2007, 01:38 AM
Hi,

I use a computationally expensive fragment shader in a multipass algorithm. I want to process only fragments that were not marked in earlier iterations. My questions are:

1. When do tests such as the depth test, stencil test, and alpha test happen: before or after the fragment processor?

2. Is it right that early-Z happens before the fragment processor as long as gl_FragDepth is not written in the fragment shader?

Thanks in advance

Zengar
02-20-2007, 02:36 AM
1. All tests are logically done after the fragment shader, in the order scissor -> alpha -> stencil -> depth.

2. Yes, this is correct, but there may be additional conditions. On NVIDIA hardware, you cannot get early-Z if rejecting a fragment would still lead to changes in the depth/stencil/color buffers. So basically, if you modify stencil on depth fail, you won't get early-Z.

cass
02-20-2007, 09:03 AM
The logical location of the depth/stencil test is after shading. In practice we do it early when we can.

Early z testing has lots of possible implementations, so it's difficult to make claims that are universal. For example on GeForce 6 series and beyond, you can get early z rejection even when the depth buffer is being updated.

Thanks -
Cass

olmeca
02-20-2007, 10:36 AM
So what can I do to "mask out" fragments from fragment processing (prevent computation)? Is there any other way?

Korval
02-20-2007, 12:40 PM
Originally posted by olmeca:
So what can I do to "mask out" fragments from fragment processing (prevent computation)?
Guaranteed? Nothing.

However, in general, if early-Z is available at all, it would likely happen under the following circumstances:

1: No alpha test.
2: No stencil test (or no stencil write?).
3: No depth write.

dimensionX
02-20-2007, 03:02 PM
http://www.gpgpu.org/forums/viewtopic.php?t=361
http://www.gpgpu.org/forums/viewtopic.php?t=256
http://www.gpgpu.org/forums/viewtopic.php?t=367

Humus
02-21-2007, 09:23 PM
Originally posted by Korval:
2: No stencil test (or no stencil write?).
Stencil test is fine. Stencil write is only a problem together with alpha test. Stencil op other than KEEP for fail and zFail disables Hierarchical-Z.
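Pulling Korval's list and this refinement together, the rules of thumb can be written down as a checklist. The sketch below only encodes the observations from this thread; it is hardware/driver-dependent folklore, not a specification, and the struct and function names are made up for illustration.

```c
#include <stdbool.h>

/* State that influences whether early-Z is likely to engage,
 * per the rules of thumb discussed in this thread. */
typedef struct {
    bool alpha_test;          /* GL_ALPHA_TEST enabled */
    bool stencil_write;       /* stencil writes enabled */
    bool stencil_op_keep;     /* fail/zfail stencil ops are GL_KEEP */
    bool shader_writes_depth; /* shader writes gl_FragDepth */
} PipelineState;

static bool early_z_likely(const PipelineState *s) {
    if (s->shader_writes_depth)
        return false; /* depth is unknown before shading */
    if (s->alpha_test && s->stencil_write)
        return false; /* stencil write is a problem only with alpha test */
    if (!s->stencil_op_keep)
        return false; /* non-KEEP fail/zfail ops disable hierarchical-Z */
    return true;
}
```

For example, plain depth testing with stencil testing (but KEEP ops and no gl_FragDepth) would still pass this checklist.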

olmeca
03-13-2007, 02:03 AM
Hi,
Thanks for your answers. I still have problems with early-Z. I have a multipass algorithm that just filters the input texture (> 1024x1024) with ping-pong rendering, so at the end I receive a set of numIterations filtered textures.
I want to exclude certain areas from the smoothing. I do this in an additional pass before the smoothing:


glDisable( GL_TEXTURE_2D );

glClearDepth( 1.0 );
glClear( GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT );

glColorMask( GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE );
glDepthMask( GL_TRUE );
glDepthFunc( GL_LESS );

// draws mask-quad in the middle of the screen
glLoadIdentity();
glTranslatef( 0.25, 0.25, 0 );
glScalef( 0.5, 0.5, 1 );
glColor3f( 0, 1, 0 );
glCallList( displayList );

glDepthMask( GL_FALSE );
glColorMask( GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE );

The shader is now "simple": a two-pass Gaussian blur with 5 texture reads per pass. Early-Z changes nothing in terms of performance.
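For reference, one pass of such a separable filter boils down to a 5-tap weighted sum per pixel. Here is a CPU sketch of a single horizontal pass, assuming the common binomial 1-4-6-4-1 weights and clamp-to-edge addressing (the original shader is not shown, so both are assumptions):

```c
/* One horizontal 5-tap Gaussian blur pass over a row of n pixels,
 * mirroring a fragment shader that does 5 texture reads per fragment.
 * Weights 1-4-6-4-1 (normalized) are an assumed binomial kernel. */
static void blur5_row(const float *src, float *dst, int n) {
    const float w[5] = { 1/16.0f, 4/16.0f, 6/16.0f, 4/16.0f, 1/16.0f };
    for (int i = 0; i < n; i++) {
        float sum = 0.0f;
        for (int k = -2; k <= 2; k++) {
            int j = i + k;
            if (j < 0)  j = 0;      /* clamp, like GL_CLAMP_TO_EDGE */
            if (j >= n) j = n - 1;
            sum += w[k + 2] * src[j];
        }
        dst[i] = sum;
    }
}
```

Since the weights sum to 1, a constant input row passes through unchanged, which is a quick sanity check for such a kernel.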

Is the shader "too simple"?

thanks

Humus
03-13-2007, 04:40 AM
You should see a performance increase even for trivial shaders. Are you sure you have depth test enabled in your depth pass?

olmeca
03-13-2007, 05:18 AM
Hi Humus and thanks for taking the time. The complete procedure:


glEnable( GL_DEPTH_TEST );
glDisable( GL_TEXTURE_2D );

glClearDepth( 1.0 );
glClear( GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT );

glColorMask( GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE );
glDepthMask( GL_TRUE );
glDepthFunc( GL_LESS );

// draws mask-quad in the middle of the screen
glLoadIdentity();
glTranslatef( 0.25, 0.25, 0 );
glScalef( 0.5, 0.5, 1 );
glColor3f( 0, 1, 0 );
glCallList( displayList );

glDepthMask( GL_FALSE );
glColorMask( GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE );





glLoadIdentity();
shader->enable(); {

for ( int k=0; k<numIterations; k++ ){

// horizontal blur
shader->setUniform( "offset", offset_h );
fbo->enable();
glCallList( displayList );
fbo->disable();
fbo->swap();
fbo->bindAsTexture( GL_TEXTURE0 );

// vertical blur
shader->setUniform( "offset", offset_v );
fbo->enable();
glCallList( displayList );
fbo->disable();
fbo->swap();
fbo->bindAsTexture( GL_TEXTURE0 );

}

} shader->disable();

glDisable( GL_DEPTH_TEST );

Afterwards, the FBO is bound as a texture and drawn to the screen. I can see that the rectangle in the middle is masked out (so the depth test works), but there is no increase in performance.

The projection is gluOrtho2D( 0, 1, 0, 1 ).

benjamin

olmeca
03-13-2007, 07:47 AM
The first block of code is enclosed by fbo->enable() and fbo->disable().

olmeca
03-14-2007, 05:57 AM
Hi again,

What I have found out so far:

1. If I do the following:


glEnable(GL_DEPTH_TEST);
fbo->enable();
glColorMask( GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE );
glDepthMask( GL_TRUE );
glClearDepth( 0.5 );
glClear( GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT );

glDepthFunc( GL_GREATER );

// I draw nothing into the depthbuffer

glDepthMask( GL_FALSE );
glColorMask( GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE );
fbo->disable();

everything works fine. The whole screen is blocked, and the framerate is 385 fps.

When I instead write the depth in the shader (from my post above), the framerate drops to 39 fps. This is what I expected, because early-Z is disabled.


2. Now I want to "mark" the regions in the depth buffer. I replace the line

// I draw nothing into the depthbuffer

with the lines:

glBegin( GL_QUADS ); {

glVertex3f( 0, 0, -1.0 );
glVertex3f( 1, 0, -1.0 );
glVertex3f( 1, 1, -1.0 );
glVertex3f( 0, 1, -1.0 );

} glEnd();

which mimics the behavior of the first example. I don't change any depth values afterwards, but the framerate stays at 39 fps. Shouldn't it go back up to 385?
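As a side note, it may help to check what depth the marking quad actually writes. gluOrtho2D is glOrtho with near = -1 and far = 1, so a vertex at z = -1 lands on window depth 1.0 (with the default glDepthRange of [0, 1]), while the later full-screen draws at z = 0 land on 0.5 and fail GL_GREATER everywhere, logically the same rejection as in the first example. A sketch of the mapping:

```c
/* Window-space depth of an eye-space z under a glOrtho projection with
 * the given near/far and the default glDepthRange(0, 1).
 * gluOrtho2D corresponds to near = -1, far = 1. */
static float window_depth(float z_eye, float n, float f) {
    /* z row of the glOrtho matrix: z_ndc = -2*z/(f-n) - (f+n)/(f-n) */
    float z_ndc = -2.0f * z_eye / (f - n) - (f + n) / (f - n);
    return 0.5f * z_ndc + 0.5f;   /* map NDC [-1,1] to depth range [0,1] */
}
```

So window_depth(-1, -1, 1) gives 1.0 and window_depth(0, -1, 1) gives 0.5.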

thanks again

olmeca
03-14-2007, 07:35 AM
Is there a problem with using an FBO + depth attachment + "normal" depth buffer on NVIDIA?