Avoid glClear( GL_COLOR_BUFFER_BIT )?

What’s the conventional wisdom nowadays on whether it’s worth avoiding the color buffer clear if you know that the entire screen will be overdrawn?

I know with depth and stencil you definitely want to clear (and may get some clear speed-up due to Hi-Z/ZCULL and its stencil equivalent). But I’ve not heard of anything analogous that might accelerate color clears.

Thanks.

I never clear color under that circumstance and it’s always faster for me (testing on a variety of ATI/NVIDIA and Intel). Like you I’ve never heard of anything that explicitly says do or don’t clear color.

I think the conventional wisdom applies more to old-school tricks to avoid clearing depth, like flipping the range on alternate frames (which I guess everyone would agree is not something you’d want to do nowadays).
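For anyone who hasn't seen the depth-range flip mentioned above, here is a minimal sketch of the idea in C. It assumes every pixel is drawn every frame and accepts that each frame only gets half of the depth range. The `DepthSetup` struct and `depth_setup_for_frame` helper are hypothetical names for illustration, and the enum values are local stand-ins that match GL_LEQUAL and GL_GEQUAL so the snippet compiles without GL headers.

```c
/* Local stand-ins for the GL depth-function enums so the sketch compiles
   without GL headers; the values match GL_LEQUAL and GL_GEQUAL. */
enum { SKETCH_LEQUAL = 0x0203, SKETCH_GEQUAL = 0x0206 };

typedef struct {
    double range_near;  /* first argument to glDepthRange  */
    double range_far;   /* second argument to glDepthRange */
    int    depth_func;  /* argument to glDepthFunc         */
} DepthSetup;

/* Hypothetical helper: even frames map depth into [0.0, 0.5]; odd frames
   map it into [0.5, 1.0] with the direction and the comparison reversed.
   The first fragment written to a pixel in a frame therefore always beats
   the stale value left by the previous frame, so no depth clear is issued. */
DepthSetup depth_setup_for_frame(unsigned frame)
{
    DepthSetup s;
    if (frame % 2 == 0) {
        s.range_near = 0.0;
        s.range_far  = 0.5;
        s.depth_func = SKETCH_LEQUAL;
    } else {
        s.range_near = 1.0;
        s.range_far  = 0.5;
        s.depth_func = SKETCH_GEQUAL;
    }
    return s;
}
```

In a real renderer the result would be passed to `glDepthRange(s.range_near, s.range_far)` and `glDepthFunc(s.depth_func)` at the start of each frame instead of clearing depth. Since it bypasses the fast clear path that Hi-Z/ZCULL hardware provides, it is shown only to explain the trick, not to recommend it.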

That’s interesting. Thanks. Tends to be a really tiny perf difference for me, but that’s probably because I tend to run the higher-bandwidth boards of each GPU gen.

Drawing over a white pixel is exactly the same speed as drawing over a black pixel, so I’d say don’t bother clearing it. Just make sure you do draw to every pixel. Otherwise, because double buffering alternates between two buffers with slightly different but out-of-date contents, you’ll get a nasty graphical effect, which is much more offensive to the eye than simply seeing a background color.

http://developer.amd.com/media/gpu_assets/Depth_in-depth.pdf does mention that the color buffer uses a similar tile-based compression scheme to that used in HyperZ to save bandwidth, so if you’re drawing a background with a solid colour, it may be better to clear than to draw a full screen quad.

If you’re considering not clearing the color buffer, have you also switched to rendering the sky after all other opaque objects, sorted front to back, and if you have expensive fragment shaders, then with perhaps a pre-depth pass too? This could potentially save you time when the sky is entirely occluded by other objects.

AFAIK one reason the stencil buffer should also be cleared at the same time as the depth buffer is that they are often interleaved, so if you only clear the depth buffer by itself, or the stencil buffer by itself, then the data in the other buffer has to be kept, which slows things down.

That’s interesting. Thanks. Had skimmed that paper before, but hadn’t noticed that sentence.

If you’re considering not clearing the color buffer, have you also switched to rendering the sky after all other opaque objects, sorted front to back, and if you have expensive fragment shaders, then with perhaps a pre-depth pass too? This could potentially save you time when the sky is entirely occluded by other objects.

Typically not fill bound (again, high-end boards, plus lots of verts), and sky box last gives no real savings 99% of the time in our domain, but something I’ve got in the back of my mind in case we ever are.

As far as depth pre-pass goes, that’s never been a win when I’ve tried it. Given high-end cards and shading that isn’t super-expensive, submitting batches twice costs far more than shading a few extra pixels, so it ends up a net performance loss. However, on a fast CPU with a slow GPU and expensive per-fragment costs, I could imagine it being a win.

AFAIK one reason the stencil buffer should also be cleared at the same time as the depth buffer is that they are often interleaved, so if you only clear the depth buffer by itself, or the stencil buffer by itself, then the data in the other buffer has to be kept, which slows things down.

I think you’re right. Not only that, from what I gather stencil has a hierarchical acceleration structure like depth (SCULL/Hi-Stencil) to avoid hitting the stencil buffer when you don’t have to. So in practice, it can be “cleared” quickly too. Humus mentions hierarchical stencil in that ATI Depth in-depth paper you referenced.

Thanks.

AFAIK one reason the stencil buffer should also be cleared at the same time as the depth buffer is that they are often interleaved, so if you only clear the depth buffer by itself, or the stencil buffer by itself, then the data in the other buffer has to be kept, which slows things down.

Except “The A to Z of DX10 Performance” from GDC 2009 says:

I find it hard to believe that stencil clears are additional cost over depth, unless some newer funky hardware formats have emerged over the last short while.

Traditionally depth and stencil are interleaved in what D3D calls a D24S8 format (there’s also D24S4X4 but I guess it doesn’t get used so much). That on its own should tell you that clearing both at the same time is a simple 32-bit wipe, and even faster if the hardware can just set it back to the “compressed” state. Clearing only one will break this whole setup.

Theory aside, real-world benchmarks rule, and the real-world benchmarks I’ve run tell me that clearing both at the same time is still the way to go.

I guess whether you should clear together or not will require some internal knowledge of whether the depth+stencil are actually interleaved by the OpenGL implementation or not.

A format such as DEPTH32F_STENCIL8 probably isn’t interleaved, but DEPTH24_STENCIL8 probably is. Interleaved formats will need a Read/Modify/Write operation if you only clear one buffer.

It should be possible to clear non-interleaved formats + plain depth (DEPTH_COMPONENT16/24/32/32F) or plain stencil (STENCIL_INDEX1/4/8/16) renderbuffers attached to an FBO separately without a performance hit.

I guess if you were trying for the slowest clear possible, you could attach different DEPTH24_STENCIL8 renderbuffers to the depth + stencil attachments, and then clear! Assuming they are interleaved, then this would require Read/Modify/Write operations on both attachments to preserve the stencil data on the depth attachment and preserve the depth data on the stencil attachment.
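To make the read/modify/write point concrete, here is a rough CPU-side model in C. It assumes a packed 32-bit word with depth in the upper 24 bits and stencil in the lower 8 (the GL_UNSIGNED_INT_24_8 layout). Real hardware works on compressed tiles rather than plain loops, and the function names are made up for illustration, so treat this as a picture of the data dependency only.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: depth in the upper 24 bits, stencil in the lower 8 bits. */
static uint32_t pack_d24s8(uint32_t depth24, uint8_t stencil)
{
    return ((depth24 & 0xFFFFFFu) << 8) | stencil;
}

/* Clearing both together: every word is overwritten with one constant.
   Nothing has to be read back first. */
void clear_depth_stencil(uint32_t *buf, size_t n, uint32_t depth24, uint8_t stencil)
{
    uint32_t value = pack_d24s8(depth24, stencil);
    for (size_t i = 0; i < n; ++i)
        buf[i] = value;
}

/* Clearing depth only: each word must be read so that its stencil byte
   can be preserved, then written back (read/modify/write). */
void clear_depth_only(uint32_t *buf, size_t n, uint32_t depth24)
{
    for (size_t i = 0; i < n; ++i)
        buf[i] = ((depth24 & 0xFFFFFFu) << 8) | (buf[i] & 0xFFu);
}
```

The first function only writes, while the second has to touch the existing contents of every word, which is the extra cost being discussed for interleaved formats.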

On the subject of renderbuffers, I remember that the only way to successfully attach a stencil buffer to an FBO [on desktop GL] was to use a texture with one of the combined depth-stencil formats, though maybe this has changed. The spec said you could attach stencil separately, but I never got a working stencil buffer that way, so I just settled on the combined formats. Admittedly, this was over 3 years ago. Also, I think all the combined depth-stencil formats on desktop GL implementations are interleaved (with padding bytes added for those that are not 32-bit aligned, i.e. all but DEPTH24_STENCIL8), but I have no proof of this. On a side note, a great many embedded GPUs do not interleave stencil and depth, and attaching separate depth16 (or depth24 or depth32) and stencil8 renderbuffers always works. It is one of the few places where the embedded world follows a spec better than the desktop world.

On a side note, for any tile-based renderer, you are guaranteed a performance gain if you clear all buffers (color, stencil and depth) between frames [such GPUs include ARM Mali, PowerVR SGX and I think Qualcomm GPUs]. On these GPUs, clearing a buffer does not actually trigger a memset; rather, it signals the GPU not to copy the buffer contents back into SRAM (one tile at a time) when a frame is done (usually at eglSwapBuffers or glFinish, though FBO switches almost always trigger a tile walk).
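Pulling the advice so far into one place, a clear policy might look roughly like the C sketch below. The `frame_clear_mask` helper and its two flags are hypothetical. As far as I know GL has no standard query for "is this a tile-based GPU", so that flag would have to come from the application's knowledge of its target platform. The constants are local copies of the standard GL clear-mask bit values so the snippet compiles without GL headers.

```c
/* Local copies of the GL clear-mask bits (same values as
   GL_DEPTH_BUFFER_BIT, GL_STENCIL_BUFFER_BIT and GL_COLOR_BUFFER_BIT). */
#define SKETCH_DEPTH_BUFFER_BIT   0x00000100u
#define SKETCH_STENCIL_BUFFER_BIT 0x00000400u
#define SKETCH_COLOR_BUFFER_BIT   0x00004000u

/* Hypothetical policy helper.
   - Depth and stencil are always cleared together.
   - Color is cleared on tile-based GPUs, or whenever the scene is not
     guaranteed to cover every pixel.
   - Otherwise the color clear is skipped. */
unsigned frame_clear_mask(int tile_based, int scene_covers_screen)
{
    unsigned mask = SKETCH_DEPTH_BUFFER_BIT | SKETCH_STENCIL_BUFFER_BIT;
    if (tile_based || !scene_covers_screen)
        mask |= SKETCH_COLOR_BUFFER_BIT;
    return mask;
}
```

At the start of a frame this would be used as `glClear(frame_clear_mask(tile_based, scene_covers_screen))`. Whether skipping the color clear actually helps on a given desktop GPU is something to benchmark rather than assume.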

Getting back to clearing the colour buffer (GL_COLOR_BUFFER_BIT): on a Linux system with a Radeon HD6xxx and Catalyst 11.08 drivers, I’ve noticed that when I clear the colour buffer I get 97.8 fps, and when I don’t clear it I get 93.7 fps.

In both cases, I draw a fullscreen background (no blending), so I’m confused as to why it’s slower when I DON’T clear the colour buffer.

It’s probably a simple cache optimization, much like clearing depth. Rather than reading a cache-line from a cleared color buffer, it just knows that the value will be the clear color. So it doesn’t need to read a cache-line to write data to that area, which is a minor performance savings.
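To put the two frame rates quoted above in perspective, it can help to convert them to frame times, since fps differences are not linear in cost. A small C sketch (the `frame_time_ms` helper is just an illustrative name):

```c
/* Convert a frame rate in frames per second to a frame time in milliseconds. */
double frame_time_ms(double fps)
{
    return 1000.0 / fps;
}
```

For roughly 97.8 fps this gives about 10.22 ms per frame, and for roughly 93.7 fps about 10.67 ms, so in that test not clearing the colour buffer cost around 0.45 ms per frame.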