Freaky gl_FragDepth

This, I admit, is a peculiar-sounding suggestion, but I have use cases for it. The basic idea is this:
[ul]
[li]depth test is performed with gl_FragCoord.z, i.e. the depth value determined from rasterization[/li]
[li]BUT the depth value written to the depth buffer is gl_FragDepth[/li][/ul]

Using this, one can have the depth test set to GL_LESS and yet, after a draw, some values in the depth buffer get larger. I admit that, again, this is odd, but I do have use cases.
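
To make this concrete, here is a minimal sketch of what a shader using such a feature might look like. The depth_test_unchanged qualifier is invented for illustration and exists in no GLSL specification; the rest is plain GLSL 4.20:

[code]
#version 420 core

// HYPOTHETICAL: the proposal would need something like an invented
//     layout(depth_test_unchanged) out float gl_FragDepth;
// qualifier, meaning: test against gl_FragCoord.z, but write the
// shader's value. No such qualifier exists; the shader below is
// otherwise legal GLSL 4.20.

out vec4 color;

void main()
{
    color = vec4(1.0);

    // Under the proposal, the depth test would have used
    // gl_FragCoord.z, while the depth buffer receives this value,
    // which can be larger even with GL_LESS.
    gl_FragDepth = min(gl_FragCoord.z + 0.125, 1.0);
}
[/code]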

Emulating this behavior with GL_ARB_shader_image_load_store and liberal use of memoryBarrier() is a bad idea, just as it is a bad idea when trying to use it to emulate GL_EXT_shader_framebuffer_fetch from GLES land.
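
For reference, this is roughly what the GLES feature being emulated looks like; gl_LastFragData is the built-in that GL_EXT_shader_framebuffer_fetch actually provides:

[code]
#version 100
#extension GL_EXT_shader_framebuffer_fetch : require
precision mediump float;

// GLES-land framebuffer fetch: the shader reads the color already in
// the framebuffer for its own pixel, enabling programmable blending
// without any image load/store gymnastics.
void main()
{
    vec4 dst = gl_LastFragData[0];        // current color of this pixel
    gl_FragColor = dst * 0.5 + vec4(0.5, 0.0, 0.0, 0.5); // custom blend
}
[/code]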

This has been asked for before: the ability to lie to the depth test by forcing it to test against one value, then write a different value if it passes.

It’s important to note that the explicit early depth test feature that the ARB added with shader_image_load_store [i]explicitly forbids this[/i]. They could have allowed it very easily, since they had to add specific language to say that it doesn’t work, that the depth value that gets tested will be the depth value that gets written. But they explicitly put the language in there to stop this exact thing from working.
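
For concreteness, the declaration in question is real GLSL 4.20 / ARB_shader_image_load_store syntax; the gl_FragDepth assignment below is legal but, per that explicit spec language, ignored:

[code]
#version 420 core

// Real GLSL 4.20 syntax: force the depth test to run before the
// fragment shader executes.
layout(early_fragment_tests) in;

out vec4 color;

void main()
{
    // Legal to write, but ignored: with early_fragment_tests on, the
    // depth buffer receives gl_FragCoord.z, the same value that was
    // tested.
    gl_FragDepth = gl_FragCoord.z * 0.5;
    color = vec4(1.0);
}
[/code]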

There’s probably a good reason for that.

The blatantly obvious thing is the following:

  1. if a shader has early z-test on, then all writes to gl_FragDepth are ignored (as is currently the case)
  2. if “freaky depth” is on, then the fragment write happens if and only if the depth value produced by the rasterizer passes the depth test

In particular, if a shader has early z-test on, then regardless of whether freaky depth is on, the value written to gl_FragDepth is ignored. That is consistent with the current situation anyway.
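
Here is a model of how the two rules combine, written as GLSL-style pseudocode; the function and its parameters are illustrative only, not any real hook, and a GL_LESS depth function is assumed:

[code]
// Illustrative model only; "earlyZ" and "freaky" stand for the
// hypothetical per-shader state.
bool depthStep(bool earlyZ, bool freaky,
               float rasterDepth,     // gl_FragCoord.z
               float shaderDepth,     // gl_FragDepth
               inout float depthBuffer)
{
    // Early z-test and freaky depth both test the rasterized value;
    // otherwise the shader's depth output is tested as usual.
    float tested = (earlyZ || freaky) ? rasterDepth : shaderDepth;

    // With early-z on, gl_FragDepth is ignored regardless of freaky
    // depth (consistent with GL today); only freaky depth lets a
    // value other than the tested one reach the depth buffer.
    float written = earlyZ ? rasterDepth : shaderDepth;

    if (tested < depthBuffer) {
        depthBuffer = written;
        return true;    // fragment writes proceed
    }
    return false;
}
[/code]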

You’re missing the point. Your “freaky depth is on” is nothing more than “early-z + write whatever gl_FragDepth says”. That would provide 100% of the functionality you’re asking for.

The ARB had the opportunity to provide exactly this functionality, and yet they explicitly forbade it. That’s a pretty good indication that what you want just isn’t possible with current hardware, or at the very least is something that all the IHVs agreed was a Bad Thing.

Like I said, there’s probably a good reason why they didn’t allow this.

I am not claiming that this is for current GL4 hardware, but for the “next version of GL”, which could mean GL5.

This is not just early-z + write whatever to gl_FragDepth. I really want the depth test to run against the value from the rasterizer, but the depth buffer to be updated with a different value that may or may not itself pass the depth test.

I suspect the reason it was not done is simple: current GL implementations have it hardwired that a shader changing Z forces the full late fragment test. As to why, I think the reason is barbarically simple: no version of D3D has this feature or anything really like it, so no hardware has the feature. Almost all features in GL are either found in D3D first or are easy-to-tack-on extensions to what D3D requires. The GL features that are neither core, ARB, nor EXT are then particulars of specific hardware (how I love thee, GL_NV_shader_buffer_load/store).

But now the grapevine says there will be no D3D12, so… it makes me wonder…

[quote]I am not claiming that this is for current GL4 hardware, but for the “next version of GL”, which could mean GL5.[/quote]

There’s been no suggestion that I’m aware of from the IHVs that there’s going to be a new generation of hardware coming out soon. At least, not a new generation that offers any significant functionality differences.

[quote]This is not just early-z + write whatever to gl_FragDepth. I really want the depth test to run against the value from the rasterizer, but the depth buffer to be updated with a different value that may or may not itself pass the depth test.[/quote]

That’s exactly what that would be. The early depth test tests against gl_FragCoord.z. If it passes, the fragment shader executes. And if gl_FragDepth is honored after the test, then the value written to the Z buffer will necessarily be the changed value.

How is that not what you’re asking for?

[quote]As to why, I think the reason is barbarically simple: no version of D3D has this feature or anything really like it, so no hardware has the feature.[/quote]

I think you have that backwards. Microsoft doesn’t dictate from on high what goes into D3D without consultation with IHVs. They get together, and Microsoft probably pushes for things. But stuff doesn’t go into D3D unless the IHVs agree to it. Just look at the D3D10 debacle as evidence of that.

Originally, a form of tessellation was going to be in D3D10, which is why AMD_vertex_shader_tessellator exists. Note that this is a pre-vertex shader tessellation stage. But NVIDIA didn’t want to do it. Maybe for good reason, maybe not. But because of that, Microsoft couldn’t put it in. This also is what led to D3D10.1, which was just D3D10 with minor bits of stuff. Minor bits of stuff that notably NVIDIA did not implement until their D3D11 hardware, while virtually all AMD hardware was D3D10.1 capable.

So I don’t think it’s that D3D doesn’t have the feature. It’s more likely that the IHVs don’t want the feature, and that’s why D3D doesn’t have it.

[quote]But now the grapevine says there will be no D3D12, so… it makes me wonder…[/quote]

What grapevine is that exactly?

Also, that’s not terribly surprising. With shader_image_load_store, there’s really just not very much left to add. Oh sure, you might want blending in shaders, but unless there’s some specialized hardware for it, you can cover that with load/store.

The most you might get is some form of streaming textures or whatever.

The current generation of hardware from both NVIDIA and AMD has features that are not exposed in D3D11 at all, for example, NVIDIA’s bindless texture.

From a simplistic point of view, what I am suggesting likely means that the fragment shader will always get executed, even for those fragments that are discarded by the depth test against gl_FragCoord.z. The reason is that if two primitives overlap, then for the fragments that overlap, the depth values to test the next primitive against are not complete until the overlapping fragments of the first primitive are done.

That is my point, to some extent: an organized body pushes for elements to place in the hardware. That organized body also has developer clout; for Microsoft, that clout is that it defines the APIs used on Windows. Similarly, console creators can push for features on the chips of their hardware, or in fact insist on them. The PS4 is quite exciting to me with AMD’s heterogeneous unified memory access.

In contrast, OpenGL does not have much of a driver. The drivers for OpenGL features are:

[ul]
[li]ISVs that talk to IHVs[/li]
[li]Suggestion forums[/li]
[li]IHVs looking to expose abilities their hardware has[/li][/ul]

But none of that is organized.

You cannot do custom blending with ARB_shader_image_load_store unless you place a memory barrier within your fragment shader source code. There is no guarantee that the fragments of primitive A will be processed before the fragments of primitive B, even if A and B overlap and A was specified first. Placing a memory barrier means that the shader stalls, waiting for the earlier fragments to finish. This is BAD: all latency hiding just went out the window and the system will crawl.

Edit: actually, even with a memory barrier in the fragment shader, you still do not get it, because the rasterizer likely generates the fragments for B while the fragments of A are still in flight, and it is possible that some of the fragments of B will get processed before the fragments of A that they share pixels with. (For example, imagine that A is a big giant triangle and B is a small portion at the center of that triangle; the rasterizer will likely start queuing up the fragments of B long before the fragments of A are processed, and those of B may find themselves starting to be processed, stalled, waiting for the in-flight fragments of A to finish.)
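
For illustration, the doomed emulation looks roughly like the sketch below. uDepth is an assumed r32f image bound by the application in place of a real depth buffer; note also that the load/test/store sequence is not even atomic, so this is broken before the ordering problems above ever come into play:

[code]
#version 420 core

// Sketch of the naive emulation warned against above: a hand-rolled
// depth test via image load/store. "uDepth" is an assumed r32f image
// the application binds in place of a real depth buffer.
layout(r32f) coherent uniform image2D uDepth;

out vec4 color;

void main()
{
    ivec2 p      = ivec2(gl_FragCoord.xy);
    float stored = imageLoad(uDepth, p).r;

    // Emulate GL_LESS against the rasterized depth...
    if (gl_FragCoord.z >= stored)
        discard;

    // ...but store a different, possibly larger, value. The
    // load/test/store sequence is not atomic, and memoryBarrier()
    // only orders THIS invocation's accesses for visibility; it does
    // not make other fragments wait, so overlapping fragments still
    // race.
    imageStore(uDepth, p, vec4(gl_FragCoord.z + 0.125));
    memoryBarrier();

    color = vec4(1.0);
}
[/code]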