Originally posted by Korval:
One, it’s a hint. If you’re trying to base an algorithm on a hint, you’re doomed. You can’t even test to see if the implementation is following it; all you can do is hope it is. What good is that when an implementation is ignoring you silently? Particularly when the algorithm you choose is based on it and will fail if the hardware doesn’t follow the hint.
Oh no, the algorithm will always work, because something akin to glDepthBoundsEXT (or a simpler depth test) will still cull the fragment properly. That much works fine even now.
It’s simply a question of whether it runs 10x slower than it otherwise would, because early-z isn’t killing fragments that will eventually be culled anyway.
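(For concreteness, the fallback I mean is just the ordinary EXT_depth_bounds_test setup, roughly sketched below; the bounds values and the draw call are placeholders, and you’d check for the extension string first.)

    glEnable(GL_DEPTH_BOUNDS_TEST_EXT);    // EXT_depth_bounds_test
    glDepthBoundsEXT(0.0, 0.5);            // placeholder bounds: fragments are culled
                                           // wherever the stored depth falls outside [0.0, 0.5]
    drawHeavyPass();                       // placeholder for the real draw call
    glDisable(GL_DEPTH_BOUNDS_TEST_EXT);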
It’s like this: some passes do lots of work and should be optimized with early-z; other passes do very little work and exist only to set up the depth buffer that early-z will use on one of the heavy passes. Sometimes we alternate back and forth between the two types.
I simply want to avoid an extra texture read on the depth-only passes by doing that depth write while I already have the relevant information in a register during the previous heavy-duty pass, especially since each subsequent heavy-duty pass never needs to operate on pixels that weren’t visible in the last one.
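(In rough host-side GL, the alternation looks something like this; the program handles and the draw call are placeholders, not real code from my app.)

    // Heavy pass: the shader already has the relevant value in a register,
    // so it also writes gl_FragDepth to mark pixels for the later passes.
    glUseProgram(heavyPassProgram);
    glEnable(GL_DEPTH_TEST);
    glDepthFunc(GL_LESS);
    glDepthMask(GL_TRUE);
    drawFullScreenQuad();

    // Next heavy pass: depth writes off, depth test on. Ideally early-z
    // would reject the marked pixels before the expensive shader ever runs;
    // today, having written gl_FragDepth above tends to switch it off.
    glUseProgram(nextHeavyPassProgram);
    glDepthMask(GL_FALSE);
    drawFullScreenQuad();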
Originally posted by Korval:
Two, it can cause breakage of other algorithms. Like using the depth buffer as a depth buffer. After all, when early-z is turned off, it is either for hardware reasons (the early-z logic is sometimes coupled with the alpha test logic) or because leaving it on would obviously break the intent of the depth buffer (fragment programs that change the depth). The particular case you cite with “discard” turning off early-z but depth_bounds undoing that seems to be a specific driver or hardware thing.
No, nothing would be broken that didn’t explicitly ask for early-z to re-enable itself after something turned it off. Default behavior would remain the same, obviously.
Originally posted by Korval:
In your initial post, you mentioned, “explicitly writing depth still disables early-z even with depth_bounds”. To paraphrase Babbage, I am not able to rightly apprehend the kind of confusion of ideas that could provoke such a comment. If a driver were to have early-z with a z-writing fragment program, that’s clearly an error. It violates the GL specification, along with the semantics of what it means to have a depth buffer and an active depth test.
It’s breathtakingly simple: you define a test that is executed only at the start of a pass, and you allow fragment programs to write depth, understanding that what they write is only relevant to what happens in the next pass.
I don’t know what the GL spec says about this, but it’s a completely obvious usage of early-z functionality, so the fact that it isn’t possible right now is, I say again, silly.
You don’t have to call it a depth test if you’d prefer not to; it’s doing effectively the same thing with the same physical buffer, but you can logically consider it something else easily enough.
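(Purely to illustrate how little API this would need; the enum below is made up and does not exist in any GL version or extension.)

    // HYPOTHETICAL: GL_EARLY_DEPTH_TEST_AT_PASS_START is an invented name.
    // Semantics: the early test runs against the depth buffer as it stood
    // when the pass began; whatever the fragment program writes to
    // gl_FragDepth only affects passes after this one.
    glEnable(GL_EARLY_DEPTH_TEST_AT_PASS_START);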
Originally posted by Korval:
And of course three, GL 3.0 almost certainly won’t have hints anymore. The design of GL 3.0 is such that either the feature exists and is real and you can rely on it, or it fails. No more of these “kinda working” things where you have to guess if it will work out OK. So even if you got your wish, it’d be short-lived.
Early-z has always been a kinda-working thing, as you’ve said yourself. Does this mean we can’t expect it to work at all anymore?
Originally posted by Korval:
BTW, for your case of wanting to save the knowledge of whether some pixels are “good” and some are “bad”, you should use a second buffer and use multiple render targets. That second target is where you write your good/bad flag, not the depth buffer. Yes, you don’t get “early arbitrary buffer” testing as a performance enhancement, but your algorithm will be able to function.
Which would be great if we could attach a special one-bit-per-pixel render target that didn’t strain write or read bandwidth at all. FBOs don’t allow that, unfortunately; in fact, I don’t think any such internalFormat exists.
Still, if one did exist, it would be almost as good. Using it for the depth-only passes would effectively give you the desired behavior for free, since huge stretches of such a buffer could sit in the texture cache at once. Testing even a million fragments would be a matter of microseconds; right now, the same depth-only pass using 32-bit-float RGBA textures takes about 2 milliseconds.
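(If one ever did show up, the plumbing would just be the usual MRT arrangement; the sketch below assumes a GL_R8-style single-channel format as the next-best thing, and fbo, width and height already exist.)

    // Attach a small single-channel texture as a second color target to hold
    // the per-pixel good/bad flag. GL_R8 is assumed as the smallest
    // renderable format; substitute whatever your driver actually accepts.
    GLuint flagTex;
    glGenTextures(1, &flagTex);
    glBindTexture(GL_TEXTURE_2D, flagTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_R8, width, height, 0,
                 GL_RED, GL_UNSIGNED_BYTE, NULL);

    glBindFramebuffer(GL_FRAMEBUFFER, fbo);      // fbo already holds the main target
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1,
                           GL_TEXTURE_2D, flagTex, 0);

    const GLenum bufs[2] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
    glDrawBuffers(2, bufs);                      // shader writes the flag to output 1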
Originally posted by Korval:
Even further, it may be a bug. For example, let’s say that the hardware implements “discard” by writing a failing depth value out of the shader, and preventing later shader ops from overwriting it (which is a valid though slightly silly way of implementing it). If that’s how it works, then this depth_bounds thing is literally a driver bug, one that may get patched later. So not only might it change, the change may be for the better overall.
The technique has been documented in several published papers as a useful exploit, so I think it’ll stick around. If not, everyone will just use CUDA or some such for the more complex behavior anyway; this is all interim thinking.
Besides, implemented that way, discard can only ever make a depth value greater (assuming glDepthFunc is GL_LESS). So long as that’s all you do, there’s no reason early-z shouldn’t work: some fragments may push their depth further away, but none that would have been occluded can become visible.
Even a simple scheme that let you write depth without disabling early-z, so long as the value you write is greater than the interpolated depth, would be useful. And that isn’t even breaking the depth-buffer concept too badly.
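(On the shader side that scheme is a one-liner; sketched below as GLSL in a C string, with the computed depth replaced by a placeholder constant.)

    // The max() guarantees the written depth is never nearer than the
    // interpolated depth, so with GL_LESS no occluded fragment can become
    // visible; early-z rejection based on the interpolated depth stays valid.
    const char* fragSrc =
        "#version 120\n"
        "void main() {\n"
        "    float myDepth = 0.75;   // placeholder for the pass's computed depth\n"
        "    gl_FragDepth = max(gl_FragCoord.z, myDepth);\n"
        "    gl_FragColor = vec4(0.0);\n"
        "}\n";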