ARB_FP execution: before or after depth test?



mikeman
04-19-2004, 02:04 AM
The specs are not clear about this:
Is depth (or stencil) testing performed before or after fragment processing through the FP? Since fragment programs can replace the z component of the fragment, depth testing should be done after the execution of the FP. Does this mean that per-pixel lighting calculations, for example, are performed for every fragment, whether it's visible or not? Or is there a distinction between depth-replacing programs and regular fragment programs?

harsman
04-19-2004, 02:10 AM
Conceptually, the depth test always happens after the fragment shader. However, most modern hardware has some kind of early depth-test that can discard lots of hidden fragments early in the pipeline. This early depth-test is disabled if the shader writes out depth.

mikeman
04-19-2004, 02:35 AM
Originally posted by harsman:
Conceptually, the depth test always happens after the fragment shader. However, most modern hardware has some kind of early depth-test that can discard lots of hidden fragments early in the pipeline. This early depth-test is disabled if the shader writes out depth.
What do you mean by lots of hidden fragments? If this early depth-test uses the z value provided by the fixed pipeline, it should discard ALL hidden fragments (provided, of course, that the z-buffer has been filled with the correct values by a quick z-buffer-only pass).
So, if I use a depth-replacing program, there is no way to use the z-buffer to reduce overdraw, since the fragments are discarded after the fragment shader?

LarsMiddendorf
04-19-2004, 02:51 AM
There is a huge slowdown when replacing the depth value. Would it be possible to give a range to the hw and to guarantee that the difference between the default and the new depth value lies within this range? When using z-correct bumpmapping the depth is shifted only by a small amount. If the hw knows that the new depth, calculated by the shader, is in the range DefaultDepth-MinRange...DefaultDepth+MaxRange, most of these fragments could also be culled before shader execution.

mikeman
04-19-2004, 03:11 AM
Originally posted by LarsMiddendorf:
There is a huge slowdown when replacing the depth value. Would it be possible to give a range to the hw and to guarantee that the difference between the default and the new depth value lies within this range? When using z-correct bumpmapping the depth is shifted only by a small amount. If the hw knows that the new depth, calculated by the shader, is in the range DefaultDepth-MinRange...DefaultDepth+MaxRange, most of these fragments could also be culled before shader execution.
Funny... I started this thread after reading your suggestion in the OpenGL 2.x forum! It's a good suggestion, although I think it would make depth testing a little more complex. How should GL_EQUAL, for example, work in this case?

LarsMiddendorf
04-19-2004, 03:51 AM
I meant it just as a hint for the hw occlusion culling. The depth test should continue to work as expected. Pseudo Code:


DepthBuffer = ReadDepthBuffer();
Default = gl_FragCoord.z;
// Early reject: discard only if no depth in the range
// [Default+DepthBiasMin, Default+DepthBiasMax] could pass the test.
switch (gl_DepthFunc)
{
    case GL_EQUAL:
        if (DepthBuffer < Default + DepthBiasMin) discard;
        if (DepthBuffer > Default + DepthBiasMax) discard;
        break;
    case GL_LEQUAL:
        if (DepthBuffer < Default + DepthBiasMin) discard;
        break;
    case GL_LESS:
        if (DepthBuffer <= Default + DepthBiasMin) discard;
        break;
    ...
}

// Survivors run the shader, then take the normal (exact) depth test.
ExecuteFragmentShader();
StandardDepthTest(DepthBuffer, gl_FragDepth, gl_DepthFunc);

I hope there is a way to configure the early z test this way to make z-correct bumpmapping fast.
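The pseudo code above can be modeled on the CPU to check that the early reject is conservative, i.e. that it never discards a fragment that the exact depth test would have passed. The following is a minimal Python sketch, not real driver or hardware behaviour: the names `DepthFunc`, `can_reject_early` and `process_fragment` are hypothetical, and `bias_min` / `bias_max` stand for DepthBiasMin / DepthBiasMax, the assumed guaranteed bounds on how far the shader may move the depth.

```python
from enum import Enum


class DepthFunc(Enum):
    LESS = "GL_LESS"
    LEQUAL = "GL_LEQUAL"
    EQUAL = "GL_EQUAL"


def depth_test(func, frag_depth, buffer_depth):
    """Standard depth test: True if the fragment passes."""
    if func is DepthFunc.LESS:
        return frag_depth < buffer_depth
    if func is DepthFunc.LEQUAL:
        return frag_depth <= buffer_depth
    if func is DepthFunc.EQUAL:
        return frag_depth == buffer_depth
    raise ValueError(func)


def can_reject_early(func, default_depth, buffer_depth, bias_min, bias_max):
    """True if no depth in [default+bias_min, default+bias_max] can pass."""
    nearest = default_depth + bias_min
    farthest = default_depth + bias_max
    if func is DepthFunc.LESS:
        return buffer_depth <= nearest
    if func is DepthFunc.LEQUAL:
        return buffer_depth < nearest
    if func is DepthFunc.EQUAL:
        return buffer_depth < nearest or buffer_depth > farthest
    raise ValueError(func)


def process_fragment(func, default_depth, buffer_depth, bias_min, bias_max, shader):
    """Return (shader_ran, passed) for one fragment.

    `shader` maps the interpolated depth to the replaced depth and is
    assumed to stay within the promised bias range.
    """
    if can_reject_early(func, default_depth, buffer_depth, bias_min, bias_max):
        return (False, False)  # culled before the expensive shader runs
    new_depth = shader(default_depth)
    return (True, depth_test(func, new_depth, buffer_depth))
```

Under these assumptions the shader only runs for fragments whose depth range overlaps the set of values that could pass, and the final decision is still made by the ordinary depth test on the replaced depth.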

Nutty
04-19-2004, 04:26 AM
A very good suggestion, Lars. I guess whether it's viable in terms of transistor space depends on how many popular games are going to change the depth value in fragment programs.

harsman
04-19-2004, 04:40 AM
Originally posted by mikeman:
What do you mean by lots of hidden fragments? If this early depth-test uses the z value provided by the fixed pipeline, it should discard ALL hidden fragments (provided, of course, that the z-buffer has been filled with the correct values by a quick z-buffer-only pass).
So, if I use a depth-replacing program, there is no way to use the z-buffer to reduce overdraw, since the fragments are discarded after the fragment shader?
Sorry, that was kind of unclear. All hidden fragments are of course discarded; there are no rendering artifacts. What I meant was that not all fragments that fail to contribute to a pixel will necessarily be culled by an early depth-test. Think about what happens when you render all your objects in near to far order, for example.

yooyo
04-19-2004, 09:38 AM
I suppose that the driver knows whether a fragment program changes FragDepth (from compiling the FP code). In that case the driver should disable the early z-test. But if the FP doesn't change FragDepth, the early z-test should be enabled. If it works as I expect, we can get a huge speed-up if we first render only to the Z-buffer and afterwards render the image with shaders. Something like deferred shading. It can save a lot of GPU clocks. :)

Someone should make a benchmark app for that: one big quad with depth = 0.0, and a lot of quads with depth > 0.0, drawn once with a regular FP and once with an FP that changes FragDepth. I could do that, but my card doesn't support FP (GF Ti-4800SE) :(

yooyo

mikeman
04-19-2004, 10:01 AM
Originally posted by yooyo:
If it works as I expect, we can get a huge speed-up if we first render only to the Z-buffer and afterwards render the image with shaders. Something like deferred shading. It can save a lot of GPU clocks. :)
yooyo
I think this is common ground for developers of applications using fragment shading. Doom3 does exactly the same thing:
one quick pass that fills the z-buffer with correct values, then turn z-writing (not z-testing) off and shade only the visible fragments. If the rendering is done otherwise, any amount of overdraw would literally kill the performance, since fragment processing is too expensive to waste on invisible fragments. In my applications I see a 30-40% gain in FPS, and I don't even use geometry as complex as real games do. That's why it is absolutely necessary to come up with a way to discard fragments early when replacing the depth; otherwise techniques like z-correct bumpmapping are useless for real-time rendering.
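The effect of the depth-only pass can be illustrated with a toy model that counts how many fragments reach the expensive shader. This is a minimal Python sketch under strong assumptions: an idealized per-fragment early z-test with GL_LEQUAL, full-screen quads with one constant depth each, and no cost assigned to the depth-only pass itself. The function names `render`, `single_pass` and `with_depth_prepass` are made up for the illustration.

```python
def render(quads, depth_buffer, write_depth, shade):
    """Draw full-screen quads (one depth each) with an early GL_LEQUAL test.

    Returns the number of fragments that reached the (expensive) shader.
    """
    shaded = 0
    for quad_depth in quads:
        for pixel in range(len(depth_buffer)):
            if quad_depth > depth_buffer[pixel]:
                continue  # early z reject: the shader never runs
            if shade:
                shaded += 1
            if write_depth:
                depth_buffer[pixel] = quad_depth
    return shaded


def single_pass(quads, num_pixels):
    """Shade and write depth in the same pass, in submission order."""
    depth = [1.0] * num_pixels  # cleared to the far plane
    return render(quads, depth, write_depth=True, shade=True)


def with_depth_prepass(quads, num_pixels):
    """Pass 1 lays down depth only; pass 2 shades with z-writes off."""
    depth = [1.0] * num_pixels
    render(quads, depth, write_depth=True, shade=False)
    return render(quads, depth, write_depth=False, shade=True)


far_to_near = [0.9, 0.5, 0.1]
print(single_pass(far_to_near, 4))         # every layer is shaded
print(with_depth_prepass(far_to_near, 4))  # only the nearest layer is shaded
```

In this model a single pass drawn far to near shades every layer of overdraw, while the two-pass version shades each pixel once regardless of submission order. Real hardware usually culls at a coarser granularity, so actual gains will differ.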

LarsMiddendorf
04-19-2004, 10:41 AM
I've got a Radeon 9800 Pro and, surprisingly, it makes nearly no difference whether there are one or two screen-filling quads behind each other with a complex shader when the z-buffer has been filled beforehand. Areas in shadow that are stenciled out are also drawn faster, so it seems that there is also some kind of early stencil test. Unfortunately the whole speedup is gone when replacing the depth value.

jwatte
04-19-2004, 05:07 PM
Yes, that is exactly how early Z is documented to work. If you first lay down the Z buffer, and then render with LEQUAL Z testing, you will get the best performance. When laying down the Z buffer, you should draw near-to-far, as well.

And, yes, modifying the Z value means that early Z cannot be used (because the hardware can't know what you're going to change it to).

There are some idiosyncrasies; for example, I believe that enabling alpha test, or turning off depth writes, will disable the early Z test on some cards, so try not doing that if you want the performance boost.

mikeman
04-20-2004, 12:53 AM
Originally posted by jwatte:
There are some idiosyncrasies; for example, I believe that enabling alpha test, or turning off depth writes, will disable the early Z test on some cards, so try not doing that if you want the performance boost.
That's not true. As long as the hw knows the z value of the fragment before fragment processing, the early z-test should be enabled. It doesn't matter whether the z value will be written to the buffer in the end; the z-test works as always.
In fact, after the first quick z-buffer-only pass, you MUST turn off z-writes, otherwise you'll end up updating the z-buffer with the same values over and over again.
Alpha test happens, of course, after fragment processing, but that was always the case, even with the fixed pipeline, since alpha is part of the color output.

yooyo
04-20-2004, 04:23 AM
One thing not to forget... When mixing the fixed-function pipeline in the first pass with vertex + fragment programs in the other passes, the early z-test can work only if the position-invariant option (OPTION ARB_position_invariant) is enabled in the vertex program.

yooyo

mikeman
04-20-2004, 04:51 AM
After some thought, I believe that the simplest solution to our problem (the slowdown when replacing depth) is to add a special fragment shader variable that allows us to read the current value of the z-buffer. Such a variable is mentioned in the GLSL spec but is not implemented yet.
Using that, we could simply disable the fixed z-test and perform our own depth test at the beginning of the fragment shader, killing the fragments that fail our test and preventing further processing. Any form of depth test, including Lars' suggestion, could be implemented this way.
Also, I was thinking: we could simulate this with current hardware by saving the z-buffer into a texture (a la shadow mapping), reading the value in the fragment shader and performing our depth test as I mentioned above. It could speed things up for depth-replacing programs, but I'm not sure; I haven't tried it yet.

Chuck0
04-20-2004, 07:03 AM
Originally posted by mikeman:

Also, I was thinking: we could simulate this with current hardware by saving the z-buffer into a texture (a la shadow mapping), reading the value in the fragment shader and performing our depth test as I mentioned above. It could speed things up for depth-replacing programs, but I'm not sure; I haven't tried it yet.
Terminating a fragment program, for example with the kill instruction, won't speed up the fragment processing on current hardware... the kill instruction does work, and prevents the fragment from producing a pixel, but it doesn't really stop the fp execution and thus doesn't save time...

Btw, there really are some cases where the early z-test doesn't work even though it theoretically should (witnessed on nv3x hardware... I'm afraid I didn't experiment enough on fx cards, but search for some posts by klaus, who did quite some research on this issue).

mikeman
04-20-2004, 07:37 AM
Originally posted by Chuck0:
Terminating a fragment program, for example with the kill instruction, won't speed up the fragment processing on current hardware... the kill instruction does work, and prevents the fragment from producing a pixel, but it doesn't really stop the fp execution and thus doesn't save time...
Right. I did a little test about this and you're right: the fragment shader keeps executing, but the results are discarded. It's just a waste of GPU cycles and I think it should be fixed.

Chuck0
04-20-2004, 07:44 AM
If by fixed you mean that it should really terminate the fragment processing, I'm quite sure present hardware just isn't able to do that, since the fragment processing units are highly pipelined and still not that flexible... as far as I have heard, only the next generation of hardware (like the nv40) will be able to really gain speed from the kill instruction.
This will certainly improve today's hardware raytracers and raycasters greatly, since they won't have to rely on hacks that use multiple passes and the z-test to prevent fp execution :)

harsman
04-20-2004, 08:09 AM
Originally posted by mikeman:

Originally posted by jwatte:
There are some idiosyncrasies; for example, I believe that enabling alpha test, or turning off depth writes, will disable the early Z test on some cards, so try not doing that if you want the performance boost.
That's not true. As long as the hw knows the z value of the fragment before fragment processing, the early z-test should be enabled. It doesn't matter whether the z value will be written to the buffer in the end; the z-test works as always.
Whatever you think about it, it happens to be a fact on ATI hardware. What is logical for software isn't always logical in hardware. If you need to juggle many fragments in parallel and avoid synchronisation issues, one solution is to put the z-test and write logic very close to the memory controller. In that case, doing early z-testing would require you to keep the z value of each fragment around for a long time during its travel down the pipeline, which might not be desirable.

I'm not saying that it's the best possible solution, but from a hardware point of view it probably has some benefits. I have no idea how Nvidia or other IHVs handle early z though.

mikeman
04-20-2004, 08:23 AM
Originally posted by harsman:
I'm not saying that it's the best possible solution, but from a hardware point of view it probably has some benefits. I have no idea how Nvidia or other IHVs handle early z though.
I have a GFX5200. The only things that disable the early z-test are running depth-replacing fragment programs and, of course, glDisable(GL_DEPTH_TEST). Alpha test and z-writes have no impact.

LarsMiddendorf
04-20-2004, 08:36 AM
If the custom depth is always less than the default depth, it should be possible to use the standard depth buffer and a depth texture. The depth buffer contains the depth computed by the fixed pipeline and the depth texture contains the displaced depth. The fragment has to pass the normal fast depth test and a custom depth test against the depth texture in the shader. Most of the fragments will be rejected by the default depth test, and the expensive pixel shader with the kill instruction is only executed for pixels that were visible without z replacement. I haven't tried it yet, but perhaps this works. Stencil shadows could be a problem, though.

mikeman
04-20-2004, 08:53 AM
Originally posted by LarsMiddendorf:
The fragment has to pass the normal fast depth test and a custom depth test against the depth texture in the shader.
Yes, but if the fragment shader replaces depth, the early depth test is disabled. That's the whole problem.

crystall
04-21-2004, 12:04 AM
Originally posted by mikeman:
I have a GFX5200. The only things that disable the early z-test are running depth-replacing fragment programs and, of course, glDisable(GL_DEPTH_TEST). Alpha test and z-writes have no impact.
Weird, I should try on mine. If the drivers follow the spec, alpha testing should disable early-z testing, since the alpha value generated by the fragment program must be tested before the depth component. There is a quirk, though: this only makes a difference if z-writes are enabled, since fragments rejected by the alpha test must not write to the depth buffer; with z-writes disabled you can rearrange the tests without altering the results, so early-z testing should be fine.

yooyo
04-21-2004, 02:11 AM
I think that the following tests are performed in this order:

1. Scissor test.
2. Early Z-Test
3. Alpha test.
4. Stencil test.
5. Depth test.
6. Blending.
7. Dithering.

yooyo

mikeman
04-21-2004, 05:16 AM
Originally posted by crystall:
Weird, I should try on mine. If the drivers follow the spec, alpha testing should disable early-z testing, since the alpha value generated by the fragment program must be tested before the depth component. There is a quirk, though: this only makes a difference if z-writes are enabled, since fragments rejected by the alpha test must not write to the depth buffer; with z-writes disabled you can rearrange the tests without altering the results, so early-z testing should be fine.
OK, I found a .ppt on the web that describes the rasterization pipeline.
It goes like this:

1)Rasterization(yielding a fragment)
2)a)Fixed:Texture Mapping Engine,Color Sum,Fog
b)Programmable:Fragment program
3)Pixel ownership test
4)Scissor test
5)Alpha test
6)Stencil test
7)Depth buffer test
8)Blending
9)Dithering
10)Logical Operations
11)Write to FrameBuffer

The early z-test should take place after rasterization and before fragment programs. There is no reason why alpha test should disable it. At least on NVidia, it doesn't.
You can find the document at "http://www.plunk.org/Performance.OpenGL/".
It contains a lot of material on how to boost performance in OpenGL.

crystall
04-21-2004, 03:28 PM
Originally posted by mikeman:
OK, I found a .ppt on the web that describes the rasterization pipeline.
It goes like this:

1)Rasterization(yielding a fragment)
2)a)Fixed:Texture Mapping Engine,Color Sum,Fog
b)Programmable:Fragment program
3)Pixel ownership test
4)Scissor test
5)Alpha test
6)Stencil test
7)Depth buffer test
8)Blending
9)Dithering
10)Logical Operations
11)Write to FrameBuffer

The early z-test should take place after rasterization and before fragment programs. There is no reason why alpha test should disable it. At least on NVidia, it doesn't.
You can find the document at "http://www.plunk.org/Performance.OpenGL/".
It contains a lot of material on how to boost performance in OpenGL.
I'll check that document. As I said, there is a reason for disabling the early-z test when alpha testing is used with depth writes enabled, unless no z values are actually written out during the early-z test (they are kept in a buffer and written out after the depth test). If a fragment fails the alpha test, its depth value must not be written out; early-z usually requires that the depth test (and thus the write to the depth buffer) is done before the alpha test (that is, before running the fragment program). So fragments discarded by the alpha test could end up writing to the depth buffer, which is contrary to the spec. If the hardware is smart enough to postpone the depth write until after the alpha test, or if depth writes are disabled, then it's fine to execute the early-z test before the alpha test.
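The argument above can be checked with a small single-pixel model. This is a minimal Python sketch, not a description of any real GPU: `spec_order` applies the alpha test before the depth test and write, as the spec orders them, while `naive_early_z` is a hypothetical implementation that tests and writes depth before the fragment program runs. `ALPHA_REF` and both function names are made up for the illustration, and the depth function is assumed to be GL_LESS.

```python
ALPHA_REF = 0.5  # hypothetical reference value, alpha func assumed GL_GREATER


def spec_order(fragments, depth, depth_writes=True):
    """Alpha test first, then depth test and write, for a single pixel.

    `fragments` is a list of (z, alpha) pairs in submission order.
    Returns (indices of fragments that reach the framebuffer, final depth).
    """
    drawn = []
    for i, (z, alpha) in enumerate(fragments):
        if alpha <= ALPHA_REF:
            continue  # fails the alpha test: must not touch the depth buffer
        if z < depth:  # GL_LESS
            drawn.append(i)
            if depth_writes:
                depth = z
    return drawn, depth


def naive_early_z(fragments, depth, depth_writes=True):
    """Depth test and write before the shader, alpha test afterwards."""
    drawn = []
    for i, (z, alpha) in enumerate(fragments):
        if not z < depth:
            continue  # culled before the fragment program runs
        if depth_writes:
            depth = z  # written too early: the alpha test has not run yet
        if alpha > ALPHA_REF:
            drawn.append(i)
    return drawn, depth


# A transparent fragment in front of an opaque one.
frags = [(0.25, 0.0), (0.5, 1.0)]
print(spec_order(frags, 1.0))     # the opaque fragment is drawn
print(naive_early_z(frags, 1.0))  # the opaque fragment is wrongly culled
print(spec_order(frags, 0.75, depth_writes=False) ==
      naive_early_z(frags, 0.75, depth_writes=False))  # orders agree
```

With depth writes enabled, the alpha-rejected fragment pollutes the depth buffer in the naive version and hides the opaque fragment behind it. With depth writes disabled, both orderings draw the same fragments, which matches the claim that the tests can then be rearranged safely.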

mikeman
04-22-2004, 12:19 AM
Originally posted by crystall:
If a fragment fails the alpha test, its depth value must not be written out; early-z usually requires that the depth test (and thus the write to the depth buffer) is done before the alpha test (that is, before running the fragment program).
The writes to the buffers (color, depth, anything) happen in the final stage of the rasterization pipeline. I don't understand where you got the idea that the z-write must be done immediately after the z-test.
I don't think you understood what the early z-test is:
the raster pipeline works as I described. The early z-test is just a trick the hardware uses to discard fragments at an early stage. It doesn't mean that, if a fragment passes the z-test, the depth value is immediately written to the buffer. That would mean the z-test would disable every other test (scissor, stencil, alpha).
As I said, I already did a test on my NVidia. I wrote a program that does specular bumpmapping and parallax mapping, running at 25 FPS. Then I clear the depth buffer to 0.0 so nothing passes, and I get 500 FPS (the early z boost). I enable ALPHA_TEST and I still get 500 FPS. I disable DEPTH_TEST and I'm back to 25 FPS.