Alpha test and early z test

I was wondering: if I enable alpha testing, does it disable early z-testing on modern boards? Looking at the OpenGL spec, it appears that the alpha test should happen before the depth test, so theoretically the GL should calculate the alpha value of the pixel and perform the alpha test before the depth test. Does this really happen in practice, or is the alpha test simply delayed until the alpha value is known, after the stencil/depth tests?

Alpha test disables early-Z on ATI and NVIDIA boards. It is re-enabled on the Radeon after you switch alpha test off, but remains disabled on the FX.

There is no early alpha test; thus, since the alpha test is performed before the z-test according to the spec, early-z must be switched off.

BTW: ATI boards also perform an early-stencil test.

  • Klaus

Without overdoing the whole semantic debate again between early z vs. coarse z vs. the marketing catch-all “HyperZ”…

It doesn’t matter what the spec says w.r.t. the order of OpenGL fragment-processing stages in this context. The point of early z is to do the z test first, to save the earlier pipelined operations should you fail z; similarly with coarse z. So there’s really no sensible excuse I’ve heard for early z not working with alpha test, other than the intricacies of efficient hardware design. OK, that’s a BIG and reasonable excuse.

Alpha test cannot turn a fragment back on, so there is no risk in an early z test; the only issue is the accompanying early z write.

Additionally, if you consider something like coarse z-buffer region reject, it shouldn’t be a problem, but it is very hardware-design dependent.

So the real issue, as far as I can tell, with early z and other operations that might reject fragments is the early z write on a depth pass, whether to the classic z-buffer or whatever else is there.

Once again, when software developers talk about early z they mean the whole thing. It’s a black box where performance is the only thing that can be measured, so early z, HyperZ, and coarse z effectively look like one operation. When hidden fragments render at normal speed, developers can only say that all hidden-fragment optimizations are defeated. When performance improves, they cannot tell why, only that something worked.

[This message has been edited by dorbie (edited 02-20-2004).]

Originally posted by dorbie:
Once again, when software developers talk about early z they mean the whole thing. It’s a black box where performance is the only thing that can be measured, so early z, HyperZ, and coarse z effectively look like one operation. When hidden fragments render at normal speed, developers can only say that all hidden-fragment optimizations are defeated. When performance improves, they cannot tell why, only that something worked.

There has to be some magic here, dorbie, else IHVs would go out of business

Anyway, I’d just read the IHV SDK docs and follow the guidelines there as those should help find the path with the best performance.

Well, I’m thinking in many cases it would make more sense to perform a double z-buffer operation than to simply disable early “top of shader” z. In this scenario you do an early z reject, if possible, without writing, and repeat the test later if the shader hasn’t killed the fragment and the alpha test hasn’t failed it. Ideally you’d keep the depth value around in a register before writing it, but if that is difficult or too close to memory, fine, just do two tests. Success will depend on shader complexity and the ratio of occluded fragments, but I’m thinking it would be a win a lot of the time, especially if the initial read is cached.

[This message has been edited by dorbie (edited 02-20-2004).]

On GeForce4 and GeForce FX, alpha test prevents the fragments from being used as occluders. These fragments can still be z-culled; they just can’t z-cull subsequent fragments. After alpha test is disabled, everything is peachy keen and z-cull works normally. The use of alpha test does not invalidate z-cull for the duration of the frame.

The use of alpha test does not invalidate z-cull for the duration of the frame.

This is definitely not true, at least on the FX.

Klaus

Originally posted by Klaus:
This is definitely not true, at least on the FX.

Klaus

Klaus,

Jeremy’s statement is true; however, I wouldn’t go so far as to call it peachy keen. Depth buffer updates with alpha test enabled reduce early-z reject effectiveness. Some of that effectiveness can be regained, some cannot.

As you might expect, it’s very scene and render-order dependent.

Thanks -
Cass

Cass,

I have a simple test program: it renders one single opaque quad in front and then a large number of semi-transparent quads behind it. As soon as alpha test is enabled (only for the opaque quad), early-z does not work anymore.

No extensions used, frustum set conservatively, strict front-to-back rendering, works fine on the Radeon.

  • Klaus

Ok, thanks for your replies. That was what I feared… So it is not wise to couple alpha test with big shaders, because all fragments have to be shaded in order to do the alpha test, even if most of them would be rejected by the depth test. I’ll be careful with alpha test from now on.

This is a paper which told me to use the alpha-test as often as possible:
http://developer.nvidia.com/object/Alpha_Test_Tricks.html

Ok, it is from 2001 and hardware has changed, but leaving this paper online creates a lot of confusion, I think.

The demo doesn’t use shaders, not even register combiners. I think they should update it and add a warning not to do this together with shaders, or on modern hardware, whatever.

Certainly that’s the reason why I never saw any speed improvement with my old engine when I used a z-only pass.

Jan.

Originally posted by Jan2000:
Ok, it is from 2001 and hardware has changed, but leaving this paper online creates a lot of confusion, I think.

I’ve read that doc, and what it says still holds true only in a very limited number of situations. If you are sure that most of your polygons are visible, the advantages they mention might still be interesting, but this depends on shader complexity; they could be so limited that you’d barely notice them. But since alpha testing effectively disables the early-z test, if your polygons are covered you end up with the hardware shading a lot of fragments which will never get into the framebuffer, and which would have been rejected by the depth test before being shaded if alpha test were disabled. It would be much better if they mentioned this in their doc.

Originally posted by Klaus:
Cass,

I have a simple test program: it renders one single opaque quad in front and then a large number of semi-transparent quads behind it. As soon as alpha test is enabled (only for the opaque quad), early-z does not work anymore.

No extensions used, frustum set conservatively, strict front-to-back rendering, works fine on the Radeon.

  • Klaus

Klaus,

Are the transparent polygons rendered with depth writes enabled?

Are they rendered back-to-front?

If so, I would expect those results.

It is an unfortunate consequence of our early z test implementations in those chips. Rest assured we’re aware of it, and have been working hard to improve the effectiveness in this kind of scenario.

Thanks,
Cass

Cass,

The transparent polygons are rendered with depth writes disabled, i.e. glDepthMask(GL_FALSE).

The transparent polygons are also rendered front-to-back with blending enabled, glBlendFunc(GL_ONE_MINUS_DST_ALPHA, GL_ONE); again, strict front-to-back rendering.

Depth test is enabled with the depth function set to GL_LESS; depth buffer size: 24 bits.

The graphics card is a Quadro FX 3000, driver version 56.54, with early-z enabled in the OpenGL driver settings dialog.

I can send you the program if you like …

Klaus
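For readers following along, the state Klaus describes would look roughly like this. This is a guess at the relevant calls, not his actual program; it assumes a valid GL context, and the two draw helpers are hypothetical placeholders:

```c
/* Sketch of the repro state described above (assumes a current GL context). */
glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_LESS);                 /* 24-bit depth buffer */

/* Opaque quad in front: alpha test enabled only here, depth writes on. */
glEnable(GL_ALPHA_TEST);
glAlphaFunc(GL_GREATER, 0.5f);        /* reference value is a guess */
glDepthMask(GL_TRUE);
drawOpaqueQuad();                     /* hypothetical helper */
glDisable(GL_ALPHA_TEST);

/* Semi-transparent quads behind it, strict front-to-back, no depth writes. */
glDepthMask(GL_FALSE);
glEnable(GL_BLEND);
glBlendFunc(GL_ONE_MINUS_DST_ALPHA, GL_ONE);
drawTransparentQuadsFrontToBack();    /* hypothetical helper */
glDepthMask(GL_TRUE);
```

With strict front-to-back order, every transparent fragment behind the opaque quad should be a candidate for early-z rejection; the report is that enabling alpha test on the first quad defeats that.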

A bit OT:
ATI said that using GL_EQUAL as the depth function is “not so good”.
Does this mean it is a bit slower because of a (!GL_LESS && !GL_GREATER) test, or does it disable early-z and give you a big performance hit?

Thanks,
Jan.

Originally posted by Klaus:
Cass,

The transparent polygons are rendered with depth writes disabled, i.e. glDepthMask(GL_FALSE).

The transparent polygons are also rendered front-to-back with blending enabled, glBlendFunc(GL_ONE_MINUS_DST_ALPHA, GL_ONE); again, strict front-to-back rendering.

Depth test is enabled with the depth function set to GL_LESS; depth buffer size: 24 bits.

The graphics card is a Quadro FX 3000, driver version 56.54, with early-z enabled in the OpenGL driver settings dialog.

I can send you the program if you like …

Klaus

Hi Klaus, in that case, I would definitely not expect any improvement. I would, however, like to get the program.

Thanks!
Cass