Early-Z and stencil test

Hi,

I use a computationally expensive fragment shader in a multipass algorithm. I want to process only those fragments that were not marked in earlier iterations. My questions are:

  1. When do tests such as the depth test, stencil test and alpha test happen - before or after the fragment processor?

  2. Is it right that early-Z happens before the fragment processor if gl_FragDepth is not touched in the fragment shader?

Thanks in advance

  1. All tests are done after the fragment shader, in the order scissor->alpha->stencil->depth.

  2. Yes, this is correct, but there may be additional conditions. On NVIDIA hardware, you cannot use early-Z if rejecting a fragment would still lead to changes in the depth/stencil/color buffers. So basically, if you modify stencil on depth fail, you won't get early-Z.

The logical location of the depth/stencil test is after shading. In practice we do it early when we can.

Early z testing has lots of possible implementations, so it’s difficult to make claims that are universal. For example on GeForce 6 series and beyond, you can get early z rejection even when the depth buffer is being updated.

Thanks -
Cass

So what can I do to “mask out” fragments from fragment processing (prevent computation)? Is there any other way?

So what can I do to “mask out” fragments from fragment processing (prevent computation)?
Guaranteed? Nothing.

However, in general, if early-Z is available at all, it would likely happen under the following circumstances (a minimal state sketch follows the list):

1: No alpha test.
2: No stencil test (or no stencil write?).
3: No depth write.
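
To make that concrete, here is a minimal state sketch that satisfies all three conditions during the expensive pass. This is my own illustration, not code from the thread, and it assumes the depth buffer was laid down in an earlier pass:

// Earlier pass (assumed): write the mask into the depth buffer only.
glEnable( GL_DEPTH_TEST );
glDepthMask( GL_TRUE );
glColorMask( GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE );
// ... draw the geometry that marks the fragments to be skipped ...

// Expensive pass: state that keeps early-Z eligible.
glDisable( GL_ALPHA_TEST );                        // 1: no alpha test
glDisable( GL_STENCIL_TEST );                      // 2: no stencil test
glDepthMask( GL_FALSE );                           // 3: no depth writes
glDepthFunc( GL_LESS );                            // depth test only
glColorMask( GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE );
// ... draw the full-screen quad with the expensive fragment shader ...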

http://www.gpgpu.org/forums/viewtopic.php?t=361
http://www.gpgpu.org/forums/viewtopic.php?t=256
http://www.gpgpu.org/forums/viewtopic.php?t=367

Originally posted by Korval:
2: No stencil test (or no stencil write?).

Stencil test is fine. Stencil writes are only a problem together with alpha test. A stencil op other than KEEP for fail and zfail disables Hierarchical-Z.
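
To illustrate the distinction (my own sketch, based only on the statement above, not on vendor documentation):

// Stencil test with KEEP on fail and zfail, writing only on depth pass,
// should leave Hierarchical-Z intact according to the rule above.
glEnable( GL_STENCIL_TEST );
glStencilFunc( GL_EQUAL, 1, 0xFF );
glStencilOp( GL_KEEP,       // stencil fail
             GL_KEEP,       // depth fail
             GL_REPLACE );  // depth pass

// Whereas writing stencil on depth fail would disable Hierarchical-Z:
// glStencilOp( GL_KEEP, GL_INCR, GL_KEEP );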

Hi,
Thanks for your answers. I still have problems with early-Z. I have a multipass algorithm which just filters the input texture (> 1024x1024) with ping-ponging, so at the end I get a set of numIterations filtered textures.
I want to disable smoothing in certain areas - I do this in an additional pass before the smoothing:

glDisable( GL_TEXTURE_2D );

glClearDepth( 1.0 );
glClear( GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT );

glColorMask( GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE );
glDepthMask( GL_TRUE );
glDepthFunc( GL_LESS );

// draws mask-quad in the middle of the screen
glLoadIdentity();
glTranslatef( 0.25, 0.25, 0 );
glScalef( 0.5, 0.5, 1 );
glColor3f( 0, 1, 0 );
glCallList( displayList );

glDepthMask( GL_FALSE );
glColorMask( GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE );

The shader is now “simple” - a two-pass Gaussian blur with 5 texture reads per pass. Early-Z changes nothing in terms of performance.

Is the shader “too simple”?

thanks

You should see a performance increase even for trivial shaders. Are you sure you have depth test enabled in your depth pass?

Hi Humus and thanks for taking the time. The complete procedure:

glEnable( GL_DEPTH_TEST );
glDisable( GL_TEXTURE_2D );

glClearDepth( 1.0 );
glClear( GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT );

glColorMask( GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE );
glDepthMask( GL_TRUE );
glDepthFunc( GL_LESS );

// draws mask-quad in the middle of the screen
glLoadIdentity();
glTranslatef( 0.25, 0.25, 0 );
glScalef( 0.5, 0.5, 1 );
glColor3f( 0, 1, 0 );
glCallList( displayList );

glDepthMask( GL_FALSE );
glColorMask( GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE );





glLoadIdentity();
shader->enable(); {

	for ( int k = 0; k < numIterations; k++ ) {

		// horizontal blur
		shader->setUniform( "offset", offset_h );
		fbo->enable();
		glCallList( displayList );
		fbo->disable();
		fbo->swap();
		fbo->bindAsTexture( GL_TEXTURE0 );

		// vertical blur
		shader->setUniform( "offset", offset_v );
		fbo->enable();
		glCallList( displayList );
		fbo->disable();
		fbo->swap();
		fbo->bindAsTexture( GL_TEXTURE0 );

	}

} shader->disable();

glDisable( GL_DEPTH_TEST ); 
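
As an aside, offset_h and offset_v are not defined in the posted code; for a separable blur over a 1024x1024 texture they would typically be one-texel steps along each axis (an assumption on my part, not from the post):

const float texel = 1.0f / 1024.0f;       // assumed texture resolution
float offset_h[2] = { texel, 0.0f };      // step between taps, horizontal pass
float offset_v[2] = { 0.0f,  texel };     // step between taps, vertical pass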

Afterwards the fbo is bound as a texture and drawn to the screen. I can see that the rectangle in the middle is masked out (so the depth test works), but there is no increase in performance.

the projection is gluOrtho2D(0,1,0,1)

benjamin

The first block of code is enclosed by fbo->enable() and fbo->disable().
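
For readers following along, here is a minimal sketch of the kind of FBO this implies: a colour texture plus a depth renderbuffer attached to the same EXT_framebuffer_object. The fbo wrapper class from the posts is not shown in the thread, so all names below are my own:

GLuint fb, colorTex, depthRb;

// Colour attachment: the texture that is later bound for the ping-pong reads.
glGenTextures( 1, &colorTex );
glBindTexture( GL_TEXTURE_2D, colorTex );
glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR );
glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA8, 1024, 1024, 0,
              GL_RGBA, GL_UNSIGNED_BYTE, NULL );

// Depth attachment: the buffer the masking pass writes into.
glGenRenderbuffersEXT( 1, &depthRb );
glBindRenderbufferEXT( GL_RENDERBUFFER_EXT, depthRb );
glRenderbufferStorageEXT( GL_RENDERBUFFER_EXT, GL_DEPTH_COMPONENT24, 1024, 1024 );

glGenFramebuffersEXT( 1, &fb );
glBindFramebufferEXT( GL_FRAMEBUFFER_EXT, fb );
glFramebufferTexture2DEXT( GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                           GL_TEXTURE_2D, colorTex, 0 );
glFramebufferRenderbufferEXT( GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT,
                              GL_RENDERBUFFER_EXT, depthRb );
// Depth testing inside this FBO uses depthRb, not the window's depth buffer.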

Hi again,

What I have found out so far:

  1. If I do the following:
glEnable( GL_DEPTH_TEST );
fbo->enable();
glColorMask( GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE );
glDepthMask( GL_TRUE );
glClearDepth( 0.5 );
glClear( GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT );

glDepthFunc( GL_GREATER );

// I draw nothing into the depthbuffer

glDepthMask( GL_FALSE );
glColorMask( GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE );
fbo->disable();

everything works fine. The whole screen is blocked, and fps = 385.

When I now write the depth in the shader (from my post above), the framerate drops to 39. This is what I expected, because early-Z is disabled.

  2. Now I want to “mark” the regions in the depth buffer. I replace the line
// I draw nothing into the depthbuffer

with the lines


glBegin( GL_QUADS ); {

	glVertex3f( 0, 0, -1.0 );
	glVertex3f( 1, 0, -1.0 );
	glVertex3f( 1, 1, -1.0 );
	glVertex3f( 0, 1, -1.0 );

} glEnd();
which mimic the behavior of the first example. I don't change the depth values afterwards, but the framerate stays at 39 fps. Shouldn't it go up to 385?
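
For reference, a quick check of where those z values land in the depth buffer, derived only from the gluOrtho2D(0,1,0,1) projection mentioned earlier and OpenGL's default depth range (the z of the displayList quad is an assumption on my part):

// gluOrtho2D(0,1,0,1) is glOrtho(0,1,0,1,-1,1), whose third row gives
// z_ndc = -z_eye; the default glDepthRange(0,1) then gives
// z_win = (z_ndc + 1) / 2.
float windowDepth( float zEye )
{
    float zNdc = -zEye;
    return ( zNdc + 1.0f ) * 0.5f;
}

// windowDepth( -1.0f ) == 1.0  -> the full-screen quad above writes depth 1.0
//                                 (it passes GL_GREATER against the 0.5 clear).
// windowDepth(  0.0f ) == 0.5  -> quads drawn at z = 0 (assumed for the
//                                 displayList quad) fail GL_GREATER against
//                                 either 0.5 or 1.0, so they are rejected.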

thanks again

Is there a problem with using an FBO + depth attachment + the “normal” depth buffer on NVIDIA?