Z-test before and/or after fragment program?

Does anyone know whether it is possible to control when the depth test happens (before or after the shader)?

I have a fragment program that updates the fragment depth value, but I need the Z-test to happen before the shader. If a fragment passes the Z-test, it should then run the shader and update the depth value without any further Z-test.

But what happens is that the Z-test is done after I update the depth, which kills everything. To work around it I’m using a 4th target to store depth (I could use the depth buffer if such a Z-test option were available).

For now I’m converting a float in the range [0,1) to a 24-bit RGB texture with the following code:

float color_to_float(float3 color)
{
	const float3 byte_to_float =
		float3(1, 1.0/256.0, 1.0/(256.0*256.0));
	return dot(color,byte_to_float);
}

float3 float_to_color(in float f)
{
	float3 color;
	f *= 256;
	color.x = floor(f);
	f -= color.x;
	f *= 256;
	color.y=floor(f);
	f -= color.y;
	color.z = floor(f*256);
	return color*0.00390625; // color/256
}

Is there a faster way to encode a float into colors? The above code works fine and color_to_float is a single dot product.
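For anyone who wants to check the encoding outside a shader, here is a minimal CPU-side sketch in C that mirrors the two functions above and measures the round-trip error. The helper max_round_trip_error is a hypothetical name added for illustration, and the sketch assumes the three channels are kept as exact floats; it does not model the rounding an 8-bit render target applies when the values are stored.

```c
#include <assert.h>

/* Mirror of float_to_color: split f in [0,1) into three base-256 digits,
   each returned as digit/256. The (int) cast acts as floor because f >= 0. */
void float_to_color(float f, float color[3])
{
    f *= 256.0f;
    color[0] = (float)(int)f;
    f -= color[0];
    f *= 256.0f;
    color[1] = (float)(int)f;
    f -= color[1];
    color[2] = (float)(int)(f * 256.0f);
    for (int i = 0; i < 3; ++i)
        color[i] *= 0.00390625f; /* 1/256 */
}

/* Mirror of color_to_float: the dot product written out term by term. */
float color_to_float(const float color[3])
{
    return color[0] + color[1] / 256.0f + color[2] / 65536.0f;
}

/* Hypothetical helper: largest round-trip error over n evenly spaced
   inputs in [0,1). Channel values are NOT quantized to 8 bits here. */
float max_round_trip_error(int n)
{
    assert(n > 0);
    float worst = 0.0f;
    for (int i = 0; i < n; ++i) {
        float f = (float)i / (float)n;
        float c[3];
        float_to_color(f, c);
        float err = color_to_float(c) - f;
        if (err < 0.0f)
            err = -err;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

Under those assumptions the decoded value is the input truncated to 24 fractional bits, so the error should stay below 1/2^24.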

Originally posted by fpo:
Does anyone know whether it is possible to control when the depth test happens (before or after the shader)?
It’s not. Depth test always happens after the shader (semantically anyway).

AFAIK, you can get depth test before shader (early-z) if you use nvidia cards, high end FX’es and 6xxx.
But so far I’ve found you need to disable blending and this doesn’t work for render to fp16 texture.
I use a depth-first-pass approach and it works for me…

Yeah, I’d like to see this info(*) as well from nvidia/ati.
True in theory, depth comes after, but often the drivers do it before (because of better performance).
Like an nvidia guy said (on these forums) the other day, enabling alpha test hurts performance for the depth test (or something like that, hmm, I’ve forgotten it already, and here’s me with it turned on in 50% of the materials I use).

(*) In something official, e.g. a PDF.
the graphics companies must have this info already in some database/spreadsheet

Originally posted by M/\dm/:
AFAIK, you can get depth test before shader (early-z) if you use nvidia cards, high end FX’es and 6xxx.
Yes, and even more so on ATI cards ever since the original Radeon. This is not what fpo is asking about though. He’s basically asking whether it’s possible to do the depth test against the regular interpolated depth, but still write depth from the shader. That’s not possible, since semantically depth testing happens after the shader. Regardless of how the hardware implements it in practice, it needs to keep the semantics that the GL spec dictates. This means that when you write the depth in the shader, that’s what is going to be used in the depth test. It also means that all pre-shader depth optimizations are disabled.

I won’t rule out the possibility that something like what fpo is asking for could be done, since I don’t know the hardware well enough, but in that case it would have to be provided as an OpenGL extension.
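To make the difference concrete, here is a small one-pixel software model in C, assuming a GL_LESS depth function. It is only an illustration, not real OpenGL calls: the Pixel struct and both function names are hypothetical. The first function follows the semantics described above, the second the behaviour fpo is asking for.

```c
#include <assert.h>

/* One-pixel software model of depth testing with a GL_LESS comparison. */
typedef struct {
    float depth; /* value currently stored in the depth buffer */
} Pixel;

/* Semantics described above: the test uses the depth written by the
   shader, so the interpolated depth plays no part. */
int spec_depth_test(Pixel *px, float interpolated_z, float shader_z)
{
    assert(px != 0);
    (void)interpolated_z;
    if (!(shader_z < px->depth))
        return 0; /* fragment rejected */
    px->depth = shader_z;
    return 1;
}

/* Hypothetical behaviour asked for in the thread: test against the
   interpolated depth, then store the shader's depth with no further test. */
int wished_depth_test(Pixel *px, float interpolated_z, float shader_z)
{
    assert(px != 0);
    if (!(interpolated_z < px->depth))
        return 0; /* fragment rejected before the shader would run */
    px->depth = shader_z;
    return 1;
}
```

With 0.5 stored in the buffer, an interpolated depth of 0.4 and a shader depth of 0.6, the first function rejects the fragment while the second accepts it and stores 0.6.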

Originally posted by zed:
Yeah, I’d like to see this info(*) as well from nvidia/ati.
There’s some info here:
http://www.atitech.com/developer/dx9/ATI-DX9_Optimization.pdf

It’s in DX9 terminology, but it applies equally well to OpenGL.

Thanks Humus… that is fine, as I can always write the corrected depth to a separate buffer instead.

In that case I do not write depth in the shader, so the Z-test should happen before the shader executes, speeding things up with early-Z culling (a culled pixel should not execute the shader, am I correct?). I just write the corrected depth to a color buffer instead, using the functions from my first post.

One more question Humus: I was hoping my relief mapping shader would run on the newer ATI hardware to be released soon. Can you tell me if there is any limit on the number of dependent texture reads, like in the 9800/x800 cards? I’m only using arb_pbuffer, arb_draw_buffers and arb_fragment_program with 16 dependent tex2d…

Originally posted by fpo:

I have a fragment program that updates the fragment depth value, but I need the Z-test to happen before the shader. If a fragment passes the Z-test, it should then run the shader and update the depth value without any further Z-test.

The same idea had occurred to me.

For some reason, they decided z test happens with the depth value you write and not the original depth the hardware has generated.

The same thing happens with the alpha test. The new alpha of the color will be used.

And for stencil… that’s not yet available but it would have followed the same ideology.


Like an nvidia guy said (on these forums) the other day, enabling alpha test hurts performance for the depth test (or something like that, hmm, I’ve forgotten it already, and here’s me with it turned on in 50% of the materials I use).

That’s right. It’s a “coarse z test” issue.
Old hw did not have this “coarse z thing” and alpha test used to be good in the old days.
ATI calls their version hyper-something-or-other

Originally posted by fpo:
One more question Humus: I was hoping my relief mapping shader would run on the newer ATI hardware to be released soon. Can you tell me if there is any limit on the number of dependent texture reads, like in the 9800/x800 cards? I’m only using arb_pbuffer, arb_draw_buffers and arb_fragment_program with 16 dependent tex2d…
I can obviously not comment on any unannounced products.
An alternative would be Parallax Occlusion Mapping:
http://www.ati.com/developer/gdc/Tatarchuk-ParallaxOcclusionMapping-FINAL_Print.pdf

A RenderMonkey workspace of this technique will be included in the next SDK release.

Originally posted by Humus:
An alternative would be Parallax Occlusion Mapping:
http://www.ati.com/developer/gdc/Tatarchuk-ParallaxOcclusionMapping-FINAL_Print.pdf

A RenderMonkey workspace of this technique will be included in the next SDK release.
Great, thanks for the link. I found this was also published in the ShaderX 3 book… is there a demo for it on the book CD?

As I understand it, it is just like my linear search, but when a point inside the object is found we intersect the view ray with the line through the last two points (the inside and outside points).

It works fine in my demo and looks good when looking from above. But at steep angles it has even worse sampling problems than my relief mapping (it also cannot handle high-frequency depth maps without using too many steps in the search).

I tried using my binary search with just a few steps after the POM and it gives much better results at steep angles. But then it loses the need for the view/segment intersection, as the binary search will converge to the same point with or without it (and we end up with the linear search I had before).
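As a rough illustration of a linear search followed by a few binary steps, here is a 1D sketch in C. It is not the shader from my demo: depth_fn is a hypothetical stand-in for the depth texture lookup, flat_depth is a made-up test surface, and the ray depth is assumed to equal the parameter t, running from 0 to 1.

```c
#include <assert.h>

/* Hypothetical stand-in for the depth lookup tex2D(relieftex, uv).w. */
typedef float (*depth_fn)(float x);

/* Made-up test surface: a flat floor at depth 0.37. */
float flat_depth(float x)
{
    (void)x;
    return 0.37f;
}

/* The ray starts at x0 and moves by dx as t goes from 0 to 1; its depth
   is assumed to be t. Returns the approximate hit parameter. */
float linear_then_binary(depth_fn depth, float x0, float dx,
                         int linear_steps, int binary_steps)
{
    assert(linear_steps > 0);
    float step = 1.0f / (float)linear_steps;
    float t = 0.0f;

    /* Linear search: advance until the first sample inside the surface. */
    for (int i = 0; i < linear_steps; ++i) {
        if (depth(x0 + t * dx) <= t)
            break;
        t += step;
    }

    /* Binary search: halve the step, move back when inside and forward
       when outside. */
    for (int i = 0; i < binary_steps; ++i) {
        step *= 0.5f;
        if (depth(x0 + t * dx) <= t)
            t -= step;
        else
            t += step;
    }
    return t;
}
```

For example, linear_then_binary(flat_depth, 0.0f, 1.0f, 10, 8) brackets the floor between t = 0.3 and t = 0.4 in the linear pass and then narrows that interval in the binary pass.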

Here is the POM shader function I made:

void ray_intersect_pom(
      in sampler2D relieftex,
      inout float4 p, // point (texcoord,0,1)
      inout float3 v) // view/view.z
{
	const int search_steps=10;

	v/=search_steps;

	float4 pp=p; // previous point

	for( int i=0;i<search_steps-1;i++ )
	{
		p.w=tex2D(relieftex,p.xy).w; // get depth

		if (p.w>p.z) // if not inside
		{
			pp=p;		// store previous point
			p.xyz+=v;	// increment to next sample
		}
	}

	// intersect view with line from p and prev p
	float f=(pp.w-pp.z)/(p.z-pp.z-p.w+pp.w);
	p=lerp(pp,p,f);
}
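The last two lines of the function can be checked on the CPU. Below is a small C sketch of just that intersection step, with the ray depth (the .z values) and the sampled depth (the .w values) passed as separate floats. The function name secant_fraction is hypothetical. Both quantities are treated as linear between the previous and current sample, and the fraction is where the two lines meet.

```c
#include <assert.h>

/* Fraction f in the line "p=lerp(pp,p,f)": where the ray depth and the
   linearly interpolated sampled depth become equal between the previous
   sample (outside the surface) and the current sample (inside). */
float secant_fraction(float prev_ray, float prev_height,
                      float cur_ray, float cur_height)
{
    float denom = cur_ray - prev_ray - cur_height + prev_height;
    assert(denom != 0.0f); /* the two lines must not be parallel */
    return (prev_height - prev_ray) / denom;
}
```

With a previous sample of ray depth 0.25 and sampled depth 0.5, and a current sample of ray depth 0.5 and sampled depth 0.25, the fraction is 0.5, and both interpolated values equal 0.375 at that point.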

I have this new demo doing relief mapping (and now also POM) in a full scene with multiple lights using deferred shading. I think it could work now on ATI when using the POM implementation. Can you give it a try and tell me if it works for you? I could PM you a link.

I had a 9800 card I got from developer relations a long time ago, but when we split the company in two this year it went with the other side. So I cannot test anything with ATI hardware anymore :( . Waiting for the new one now… will we be able to see the new hardware at SIGGRAPH 2005? I will be there!

Sure, I can give it a test run.

The driver detects whether the pixel shader writes to depth and will enable/disable early-z as needed. Unless you absolutely positively have to do this, I would avoid it at all cost. Early-z is a very good optimization. On modern cards the hardware can reject entire pixel blocks (4x4,8x8, etc) with early-z/hi-z testing.