Early-Z and stencil test

Hi,

I use a computationally expensive fragment shader in a multipass algorithm. I want to process only those fragments that were not marked in earlier iterations. My questions are:

  1. When do tests such as the depth test, stencil test and alpha test happen - before or after the fragment processor?

  2. Is it right that early-Z happens before the fragment processor if gl_FragDepth is not touched in the fragment shader?

Thanks in advance

  1. All tests are done after the fragment shader, in the order scissor->alpha->stencil->depth.

  2. Yes, this is correct, but there may be additional conditions. On NVIDIA hardware, you cannot use early-Z if rejecting a fragment would still lead to changes in the depth/stencil/color buffers. So basically, if you modify stencil on depth fail, you won't get early-Z.

The logical location of the depth/stencil test is after shading. In practice we do it early when we can.

Early z testing has lots of possible implementations, so it’s difficult to make claims that are universal. For example on GeForce 6 series and beyond, you can get early z rejection even when the depth buffer is being updated.

Thanks -
Cass

So what can I do to “mask out” fragments from fragment processing (prevent computation)? Is there any other way?

So what can I do to “mask out” fragments from fragment processing (prevent computation)?
Guaranteed? Nothing.

However, in general, if early-Z is available at all, it would likely happen under the following circumstances (a minimal state sketch follows the list):

1: No alpha test.
2: No stencil test (or no stencil write?).
3: No depth write.
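
To make that concrete, here is a minimal state sketch that satisfies all three conditions during the expensive pass. This is my own illustration, not code from the thread, and it assumes the depth buffer was laid down in an earlier pass:

// Earlier pass (assumed): write the mask into the depth buffer only.
glEnable( GL_DEPTH_TEST );
glDepthMask( GL_TRUE );
glColorMask( GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE );
// ... draw the geometry that marks the fragments to be skipped ...

// Expensive pass: state that keeps early-Z eligible.
glDisable( GL_ALPHA_TEST );                        // 1: no alpha test
glDisable( GL_STENCIL_TEST );                      // 2: no stencil test
glDepthMask( GL_FALSE );                           // 3: no depth writes
glDepthFunc( GL_LESS );                            // depth test only
glColorMask( GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE );
// ... draw the full-screen quad with the expensive fragment shader ...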

http://www.gpgpu.org/forums/viewtopic.php?t=361
http://www.gpgpu.org/forums/viewtopic.php?t=256
http://www.gpgpu.org/forums/viewtopic.php?t=367

Originally posted by Korval:
2: No stencil test (or no stencil write?).

Stencil test is fine. Stencil writes are only a problem together with alpha test. A stencil op other than KEEP for fail and zfail disables Hierarchical-Z.
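
To illustrate the distinction (my own sketch, based only on the statement above, not on vendor documentation):

// Stencil test with KEEP on fail and zfail, writing only on depth pass,
// should leave Hierarchical-Z intact according to the rule above.
glEnable( GL_STENCIL_TEST );
glStencilFunc( GL_EQUAL, 1, 0xFF );
glStencilOp( GL_KEEP,       // stencil fail
             GL_KEEP,       // depth fail
             GL_REPLACE );  // depth pass

// Whereas writing stencil on depth fail would disable Hierarchical-Z:
// glStencilOp( GL_KEEP, GL_INCR, GL_KEEP );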

Hi,
Thanks for your answers. I still have problems with early-Z. I have a multipass algorithm which just filters the input texture (> 1024x1024) with ping-ponging, so at the end I get a set of numIterations filtered textures.
I want to disable smoothing in certain areas - I do this in an additional pass before the smoothing:

glDisable( GL_TEXTURE_2D );

glClearDepth( 1.0 );
glClear( GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT );

glColorMask( GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE );
glDepthMask( GL_TRUE );
glDepthFunc( GL_LESS );

// draws mask-quad in the middle of the screen
glLoadIdentity();
glTranslatef( 0.25, 0.25, 0 );
glScalef( 0.5, 0.5, 1 );
glColor3f( 0, 1, 0 );
glCallList( displayList );

glDepthMask( GL_FALSE );
glColorMask( GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE );

The shader is now “simple” - a two-pass Gaussian blur with 5 texture reads per pass. Early-Z changes nothing in terms of performance.

Is the shader “too simple”?

thanks

You should see a performance increase even for trivial shaders. Are you sure you have depth test enabled in your depth pass?

Hi Humus and thanks for taking the time. The complete procedure:

glEnable( GL_DEPTH_TEST );
glDisable( GL_TEXTURE_2D );

glClearDepth( 1.0 );
glClear( GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT );

glColorMask( GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE );
glDepthMask( GL_TRUE );
glDepthFunc( GL_LESS );

// draws mask-quad in the middle of the screen
glLoadIdentity();
glTranslatef( 0.25, 0.25, 0 );
glScalef( 0.5, 0.5, 1 );
glColor3f( 0, 1, 0 );
glCallList( displayList );

glDepthMask( GL_FALSE );
glColorMask( GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE );





glLoadIdentity();
shader->enable(); {

	for ( int k = 0; k < numIterations; k++ ) {

		// horizontal blur
		shader->setUniform( "offset", offset_h );
		fbo->enable();
		glCallList( displayList );
		fbo->disable();
		fbo->swap();
		fbo->bindAsTexture( GL_TEXTURE0 );

		// vertical blur
		shader->setUniform( "offset", offset_v );
		fbo->enable();
		glCallList( displayList );
		fbo->disable();
		fbo->swap();
		fbo->bindAsTexture( GL_TEXTURE0 );

	}

} shader->disable();

glDisable( GL_DEPTH_TEST ); 
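
As an aside, offset_h and offset_v are not defined in the posted code; for a separable blur over a 1024x1024 texture they would typically be one-texel steps along each axis (an assumption on my part, not from the post):

const float texel = 1.0f / 1024.0f;       // assumed texture resolution
float offset_h[2] = { texel, 0.0f };      // step between taps, horizontal pass
float offset_v[2] = { 0.0f,  texel };     // step between taps, vertical pass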

Afterwards the fbo is bound as a texture and drawn to the screen. I can see that the rectangle in the middle is masked out (so the depth test works), but there is no increase in performance.

the projection is gluOrtho2D(0,1,0,1)

benjamin

The first block of code is enclosed by fbo->enable() and fbo->disable().
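
For readers following along, here is a minimal sketch of the kind of FBO this implies: a colour texture plus a depth renderbuffer attached to the same EXT_framebuffer_object. The fbo wrapper class from the posts is not shown in the thread, so all names below are my own:

GLuint fb, colorTex, depthRb;

// Colour attachment: the texture that is later bound for the ping-pong reads.
glGenTextures( 1, &colorTex );
glBindTexture( GL_TEXTURE_2D, colorTex );
glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR );
glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA8, 1024, 1024, 0,
              GL_RGBA, GL_UNSIGNED_BYTE, NULL );

// Depth attachment: the buffer the masking pass writes into.
glGenRenderbuffersEXT( 1, &depthRb );
glBindRenderbufferEXT( GL_RENDERBUFFER_EXT, depthRb );
glRenderbufferStorageEXT( GL_RENDERBUFFER_EXT, GL_DEPTH_COMPONENT24, 1024, 1024 );

glGenFramebuffersEXT( 1, &fb );
glBindFramebufferEXT( GL_FRAMEBUFFER_EXT, fb );
glFramebufferTexture2DEXT( GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                           GL_TEXTURE_2D, colorTex, 0 );
glFramebufferRenderbufferEXT( GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT,
                              GL_RENDERBUFFER_EXT, depthRb );
// Depth testing inside this FBO uses depthRb, not the window's depth buffer.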

Hi again,

What I have found out so far:

  1. If I do the following:
glEnable( GL_DEPTH_TEST );
fbo->enable();
glColorMask( GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE );
glDepthMask( GL_TRUE );
glClearDepth( 0.5 );
glClear( GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT );

glDepthFunc( GL_GREATER );

// I draw nothing into the depthbuffer

glDepthMask( GL_FALSE );
glColorMask( GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE );
fbo->disable();

everything works fine. The whole screen is blocked, and fps = 385.

When I now write the depth in the shader (from my post above), the framerate drops to 39. This is what I expected, because early-Z is disabled.

  2. Now I want to “mark” the regions in the depth buffer. I replace the line
// I draw nothing into the depthbuffer

with the lines


glBegin( GL_QUADS ); {

	glVertex3f( 0, 0, -1.0 );
	glVertex3f( 1, 0, -1.0 );
	glVertex3f( 1, 1, -1.0 );
	glVertex3f( 0, 1, -1.0 );

} glEnd();
which mimic the behavior of the first example. I don't change the depth values afterwards, but the framerate stays at 39 fps. Shouldn't it go up to 385?
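
For reference, a quick check of where those z values land in the depth buffer, derived only from the gluOrtho2D(0,1,0,1) projection mentioned earlier and OpenGL's default depth range (the z of the displayList quad is an assumption on my part):

// gluOrtho2D(0,1,0,1) is glOrtho(0,1,0,1,-1,1), whose third row gives
// z_ndc = -z_eye; the default glDepthRange(0,1) then gives
// z_win = (z_ndc + 1) / 2.
float windowDepth( float zEye )
{
    float zNdc = -zEye;
    return ( zNdc + 1.0f ) * 0.5f;
}

// windowDepth( -1.0f ) == 1.0  -> the full-screen quad above writes depth 1.0
//                                 (it passes GL_GREATER against the 0.5 clear).
// windowDepth(  0.0f ) == 0.5  -> quads drawn at z = 0 (assumed for the
//                                 displayList quad) fail GL_GREATER against
//                                 either 0.5 or 1.0, so they are rejected.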

thanks again

Is there a problem with using an FBO + depth attachment + the “normal” depth buffer on NVIDIA?