Cost of Enabling/Disabling Depth Buffer

I’m trying to determine the fastest method to handle a particular procedure and wanted to find out how expensive it is to turn the depth buffer off/on frequently.

I basically need to display the data twice – once with the depth buffer on and once with it off. I can either make two complete passes over the database, which carries some traversal overhead, or I can turn the depth buffer on/off at the time of drawing each entity or group of entities.
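To make the two options concrete, here is a rough sketch of what I mean (drawEntity(), entities[] and numEntities are just stand-ins for my real traversal code):

[code]
/* Option A: two complete passes over the database */
glEnable(GL_DEPTH_TEST);
for (i = 0; i < numEntities; ++i)
    drawEntity(&entities[i]);          /* depth-tested pass */

glDisable(GL_DEPTH_TEST);
for (i = 0; i < numEntities; ++i)
    drawEntity(&entities[i]);          /* non-depth-tested pass */

/* Option B: one pass, toggling the depth test at each entity */
for (i = 0; i < numEntities; ++i) {
    glEnable(GL_DEPTH_TEST);
    drawEntity(&entities[i]);          /* depth-tested version */
    glDisable(GL_DEPTH_TEST);
    drawEntity(&entities[i]);          /* non-depth-tested version */
}
[/code]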

Obviously the best way to find out which method is fastest is to code both, but I was trying to get a feel for which way to go before doing so.

Thanks in advance.

Originally posted by karbuckle:
[b]…I wanted to find out how expensive it is to turn the depth buffer off/on frequently. … I can either make two complete passes over the database … or I can turn the depth buffer on/off at the time of drawing each entity or group of entities.[/b]

It’s cheap:

Enable/Disable time (ms); clock cycles (at 1.53 GHz)
GL_ALPHA_TEST 2.2488892e-005; 34
GL_BLEND 2.891429e-005; 44
GL_COLOR_MATERIAL 5.782858e-005; 88
GL_CULL_FACE 2.3047622e-005; 35
GL_CLIP_PLANE0 3.1288891e-005; 47
GL_DEPTH_TEST 2.891429e-005; 44
GL_DITHER 2.6679369e-005; 40
GL_FOG 2.3746035e-005; 36
GL_LIGHTING 0.00013367621; 204
GL_LINE_STIPPLE 1.8298415e-005; 28
GL_LINE_SMOOTH 4.1904768e-005; 64
GL_LOGIC_OP 2.6819051e-005; 41
GL_NORMALIZE 2.5003177e-005; 38
GL_POINT_SMOOTH 1.8019049e-005; 27
GL_POLYGON_SMOOTH 2.2628574e-005; 34
GL_POLYGON_STIPPLE 2.234921e-005; 34
GL_SCISSOR_TEST 0.00042449529; 650
GL_STENCIL_TEST 2.3187304e-005; 35
GL_TEXTURE_1D 3.1987307e-005; 49
GL_TEXTURE_2D 3.1987307e-005; 49

GeforceTi4200, 52.16 driver
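The kind of call-timing loop behind numbers like these looks roughly like this – a sketch, not the exact code; it assumes a current GL context and uses QueryPerformanceCounter on Windows:

[code]
#include <windows.h>
#include <GL/gl.h>
#include <stdio.h>

#define ITERATIONS 100000

/* Times only the glEnable/glDisable calls themselves and prints the
   average cost of one enable+disable pair in milliseconds. */
void timeCap(GLenum cap, const char *name)
{
    LARGE_INTEGER freq, start, end;
    int i;

    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);
    for (i = 0; i < ITERATIONS; ++i) {
        glEnable(cap);
        glDisable(cap);
    }
    QueryPerformanceCounter(&end);

    printf("%s %g ms\n", name,
           (double)(end.QuadPart - start.QuadPart) * 1000.0
           / (double)freq.QuadPart / (double)ITERATIONS);
}

/* e.g. timeCap(GL_DEPTH_TEST, "GL_DEPTH_TEST"); called with a context current */
[/code]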

Originally posted by Csiki:
[b] It’s cheap:

Enable/Disable time (ms); clock cycles (at 1.53 GHz)
GL_ALPHA_TEST 2.2488892e-005; 34
GL_SCISSOR_TEST 0.00042449529; 650

GeforceTi4200, 52.16 driver[/b]

It looks like you just timed the calls. If so, I’m not sure that’s the best way to test.

I don’t know how NVidia or ATI handle this now, but I’ve used hardware where the HW folks told me that a significant part of the cost came from the SIMD nature of the hardware: on certain state changes (e.g., lighting on/off), all units would have to flush, swap or jump in microcode, and then continue with the new instructions and new data. So a better test would note the change in rendering times between various ways of sorting the state changes, with full loads of identical data and identical results – it’s more complicated, but it gives more useful data, IMO.

FWIW, my guess is that depth write on/off will be cheaper than texture and vertex buffer binds, but not as cheap as changing matrices. So a reasonable approach would be to sort by texture first [or first after transparency, lighting, or other bigger state changes], bind a combined VB per texture, and do your two full passes – once with depth on and once with it off – rather than toggling on/off for each object.
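Something like this, roughly (the binding/drawing helpers and the per-texture arrays here are just placeholders for whatever path you actually use):

[code]
#include <GL/gl.h>

extern int numTextures;                       /* hypothetical scene data */
extern GLuint textureIds[];
extern void bindCombinedVertexBuffer(int t);  /* hypothetical helpers */
extern void drawBatchesForTexture(int t);

/* One full pass over the scene, sorted by texture, one combined VB per texture */
void drawPass(void)
{
    int t;
    for (t = 0; t < numTextures; ++t) {
        glBindTexture(GL_TEXTURE_2D, textureIds[t]);
        bindCombinedVertexBuffer(t);
        drawBatchesForTexture(t);
    }
}

void drawFrame(void)
{
    glEnable(GL_DEPTH_TEST);
    drawPass();                               /* pass 1: depth on */
    glDisable(GL_DEPTH_TEST);
    drawPass();                               /* pass 2: depth off */
}
[/code]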

Avi

[This message has been edited by Cyranose (edited 12-12-2003).]

Those measurements mean very little, I think, especially since you’re not measuring the effects across multiple SwapBuffers() calls with a full pipeline.

My guess is this:

  1. If you want maximum graphics throughput, and you are not CPU limited, then spending the CPU on traversing your database twice is the way to go.

  2. If you are CPU limited (what does the profiler say?) and traversing the database is the bottleneck (this would be VERY surprising! Rewrite your database!), then switching back and forth is better.

For the record, I believe you have to flush (bubble, stall, whatever) the pipeline when you enable/disable, because of early Z testing and whatnot. Pipeline stalls/flushes cost graphics throughput if the chip is running, but if you’re CPU limited, then the chip stalls out regularly anyway so there’s no big loss.

There’s also the issue of correctness. I’m assuming you know what you’re doing and this is not the traditional “transparency last” situation; in that case you clearly couldn’t enable/disable as you visit each object – you’d have to draw the transparent things last.

My little rule of thumb:
If you have >=2500 vertices per batch, you’re doing pretty damn well already.

I.e., if your objects are roughly 2500 verts each (or more), changing state once per object shouldn’t impact performance much at all.

With depth buffering this becomes tricky, however, because what you rendered earlier will somewhat influence the fill rate efficiency of what you render later.

What exactly do you mean by “once with the depth buffer on and once with it off”?
a) disabling depth writes, or
b) disabling the whole depth test, or
c) setting the depth function to GL_ALWAYS and leaving the depth test on?
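In GL terms, just to spell the three options out:

[code]
/* a) depth test still runs, but nothing is written to the depth buffer */
glDepthMask(GL_FALSE);

/* b) the whole depth test is off (and with it, depth buffer updates) */
glDisable(GL_DEPTH_TEST);

/* c) depth test stays enabled but always passes; depth writes still happen */
glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_ALWAYS);
[/code]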

>Those measurements mean very little, I think,
>especially since you’re not measuring the effects across multiple SwapBuffers() calls with a full pipeline.
To tell the truth, I use SwapBuffers in sync mode, so I didn’t count it.

  1. But even if I measure the effects across SwapBuffers, I don’t understand why the pipeline should have to be flushed after changing the Z-buffer state.
  2. Why can’t Z-buffer switching be pipelined like any other simple operation (sending a vertex, etc.)?
  3. You don’t call SwapBuffers more than 1000 times/sec (it would have to be a very simple scene), so it can’t be a crucial performance factor…

Csiki,
that’s all right but it still doesn’t make your benchmark numbers representative.

State machines are generally designed so that the functions which change state only check for argument errors and record the new state.

The state change is only fully applied the next time the machine actually does something, i.e., glBegin and its equivalents in OpenGL.

You didn’t think the GL driver can touch a hardware register in 40 cycles (26 nanoseconds), did you?
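A toy illustration of that record-now/apply-later pattern (purely illustrative – emitDepthTestCommand() stands in for whatever the driver really does when it validates state):

[code]
#include <GL/gl.h>

extern void emitDepthTestCommand(GLboolean enable);  /* hypothetical back end */

typedef struct {
    GLboolean depthTest;
    GLboolean depthTestDirty;
} StateShadow;

static StateShadow shadow = { GL_FALSE, GL_FALSE };

/* the glEnable/glDisable path: just record the request */
void shadowSetDepthTest(GLboolean enable)
{
    if (shadow.depthTest != enable) {
        shadow.depthTest = enable;
        shadow.depthTestDirty = GL_TRUE;
    }
}

/* the glBegin path: only now push dirty state toward the hardware */
void shadowValidate(void)
{
    if (shadow.depthTestDirty) {
        emitDepthTestCommand(shadow.depthTest);
        shadow.depthTestDirty = GL_FALSE;
    }
}
[/code]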

Originally posted by zeckensack:
[b]The state change is only fully applied the next time the machine actually does something, i.e., glBegin and its equivalents in OpenGL.

You didn’t think the GL driver can touch a hardware register in 40 cycles (26 nanoseconds), did you?[/b]

You didn’t think the GL driver needs to touch any hardware register to set the depth mode, did you? What actually happens is that the command to set the depth mode gets queued in the DMA buffer, so there’s no direct register tampering involved (other than the DMA registers, but those are modified on a per-batch basis, not per command).

Nah, just nitpicking. You are right that simply measuring function call rates is not very significant, especially if the driver does dirty state management (sending only the commands for state that really changed between glBegin batches), but it is actually what the original poster asked for.

To answer the original question, the cost of setting the depth buffer enable/disable should be negligible.

Originally posted by zeckensack:
[b]…The state change is only fully applied the next time the machine actually does something, i.e., glBegin and its equivalents in OpenGL.

You didn’t think the GL driver can touch a hardware register in 40 cycles (26 nanoseconds), did you?[/b]

Yes.
The test shows that it’s true for some cases and not for others…
For example, setting the viewport or the scissor test is quite expensive, so don’t change it unless you are sure you will save fill rate.
And as I’ve written already, it’s likely that such things (it’s a simple request to the GPU to do or not do something!) are pipelined already (or at least will be).

The test is not mine; I’ve only changed it a bit (the output-writing code). I got it from one of these OpenGL forums (I don’t remember which).
I haven’t found any proper test for things like this. I think it’s better than nothing…

“the cost of setting the depth buffer enable/disable should be negligible”

Please define “cost”.

If the definition of “cost” is “time it takes to issue the call to OpenGL” then the measurements are OK.

If the definition of “cost” is “reduction in frame rate on a full-load scene” then the timing measurements are (in my experience) not representative.
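For the second kind of measurement, something along these lines is what I have in mind (a sketch only; getTimeMs(), drawScene() and hDC are placeholders for your own timer, scene, and device context):

[code]
#include <windows.h>
#include <GL/gl.h>

extern HDC hDC;                                   /* hypothetical device context */
extern double getTimeMs(void);                    /* hypothetical timer */
extern void drawScene(int toggleDepthPerObject);  /* hypothetical full-load scene */

/* Average frame time over many frames, measured across SwapBuffers,
   with or without per-object depth toggling. Compare the two results. */
double measureFrameTime(int toggleDepthPerObject, int frames)
{
    double start, end;
    int f;

    glFinish();                     /* drain any pending work first */
    start = getTimeMs();
    for (f = 0; f < frames; ++f) {
        drawScene(toggleDepthPerObject);
        SwapBuffers(hDC);
    }
    glFinish();                     /* drain the pipeline before stopping the clock */
    end = getTimeMs();

    return (end - start) / frames;  /* average ms per frame */
}
[/code]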

Originally posted by jwatte:
[b]If the definition of “cost” is “time it takes to issue the call to OpenGL” then the measurements are OK.

If the definition of “cost” is “reduction in frame rate on a full-load scene” then the timing measurements are (in my experience) not representative.[/b]

You are right in that changing the depth buffer settings may introduce a stall in the pipeline, but in most cases you are transformation bound or fill-rate bound; there are very few cases where you are message-rate bound (plain data upload to the card comes to mind), so the stall will be hidden or too small to be noticeable.

That’s why I think that the cost of setting the depth test enable or depth test mode should be negligible, no matter what cost you refer to.

Regarding whether his timing measurements are representative or not, it depends on what he wants to measure. Looking at his problem (traversing the database twice vs. changing the depth enable on each primitive batch), the best experiment is to measure both alternatives (my guess is that, if anything, changing the depth enable will be faster). Any other measurement is going to be biased enough not to represent his real case (different batch sizes, different triangle sizes, different CPU load…).

[This message has been edited by evanGLizr (edited 12-14-2003).]

A big thanks to all who have replied!!!

Sorry I haven’t been able to respond sooner, but I’ve been away from the computer until this morning.

Regarding the “cost” that I mentioned… it is not the frame rate per se that I am concerned with, but the overall time to complete the display of my entities, some of which take a fair bit of calculation to generate the displayable data; that’s why I didn’t want to do that work twice unless it was necessary.