Depth Clamped Decals

Hello,

I am trying to clamp decals to walls such that they do not overhang edges by testing if decal fragment depths are within a certain tolerance of the existing depth buffer value.

Is there a way to do this without a shader?

I tried using a shader, but it doesn’t seem to read the depth properly. The decals just disappear if they are a certain distance away.

Here is the pertinent line in the shader:


if(gl_FragCoord.z < texture2D(gDepthSampler, gl_FragCoord.xy / vec2(windowWidth, windowHeight)).r - 0.01)
        discard;

Thank you for any help you can offer.

Precision will always be an issue. Since you are dealing with floating point values, you need to do something like this


float depth = texture2D(gDepthSampler, gl_FragCoord.xy / vec2(windowWidth, windowHeight)).r;
// Discard fragments whose depth is too far from the stored depth (i.e. hanging off an edge).
if(abs(gl_FragCoord.z - depth) > 0.01)
    discard;

but perhaps it isn’t a good idea to use a fixed value like 0.01 since depth precision changes from back to front (when you are using a perspective projection).
There is less precision at the back than there is at the front.
Perhaps instead of a fixed value, use a 1D floating-point texture holding a tolerance value per depth range:


float depth = texture2D(gDepthSampler, gl_FragCoord.xy / vec2(windowWidth, windowHeight)).r;
// "precision" is a reserved word in GLSL, so name the variable something else.
float tolerance = texture1D(Sampler, gl_FragCoord.z).r;
if(abs(gl_FragCoord.z - depth) > tolerance)
    discard;
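
Something like this on the CPU side could fill that texture in. The size and the tolerance curve are just made-up numbers to illustrate the idea, and GL_R32F assumes GL 3.0+ hardware:

GLuint tolTex;
const int size = 256;
float tolerances[256];

for(int i = 0; i < size; i++)
{
    float d = (float)i / (float)(size - 1);
    // Small tolerance near the front, larger toward the back where
    // depth precision is worse. Tune the curve to your projection.
    tolerances[i] = 0.0001f + 0.01f * d * d;
}

glGenTextures(1, &tolTex);
glBindTexture(GL_TEXTURE_1D, tolTex);
glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexImage1D(GL_TEXTURE_1D, 0, GL_R32F, size, 0, GL_RED, GL_FLOAT, tolerances);

Bind it to whatever unit Sampler uses and the lookup in the shader above stays the same.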

Thank you for your reply!

Precision will probably become an issue later on, but right now it seems like it isn’t reading the depth texture properly.

I set GL_TEXTURE_COMPARE_MODE to GL_NONE, and disabled depth buffer writes (glDepthMask). However, I seem to always get the same value from the texture, since the decals disappear all at once at a certain distance.
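
For reference, the relevant state looks roughly like this (paraphrasing my code; depthTexture is the depth attachment of my FBO):

// Make sure texture2D() returns raw depth values rather than a
// shadow comparison result.
glBindTexture(GL_TEXTURE_2D, depthTexture);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_COMPARE_MODE, GL_NONE);

// No depth writes while the decals are rendered.
glDepthMask(GL_FALSE);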

What are the proper settings for reading the depth texture in a shader?

Bumping…

Bumping…

Bumping…

Values in a depth texture will be in the range [0,1], and non-linear at that. To convert them back to an eye-space distance, you need to run them through:

Wz = a / (depth + b)

where:

a = (far * near) / (near - far);
b = -0.5 * (far+near) / (far-near) - 0.5;

and near and far are the distances to the near and far clip planes. You can precompute a and b in your CPU code and send them as a vec2 uniform to the shader. These values are taken from the entries of the inverse projection matrix that affect Z.

Doing this on both gl_FragCoord.z and the depth texture value will allow you to use a fixed Z offset based on your scene.
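
In GLSL that would look something like this (untested sketch; depthParams holds the precomputed a and b, and 0.05 is just an example tolerance):

uniform sampler2D gDepthSampler;
uniform vec2 depthParams;    // x = a, y = b, computed on the CPU as above
uniform float windowWidth;
uniform float windowHeight;

// Convert a [0,1] depth-buffer value back to an eye-space distance.
float linearizeDepth(float d)
{
    return depthParams.x / (d + depthParams.y);
}

void main()
{
    float stored = texture2D(gDepthSampler,
                             gl_FragCoord.xy / vec2(windowWidth, windowHeight)).r;

    float sceneDist = linearizeDepth(stored);
    float fragDist  = linearizeDepth(gl_FragCoord.z);

    // Fixed tolerance in scene units.
    if(abs(fragDist - sceneDist) > 0.05)
        discard;

    // ... normal decal shading goes here ...
}

depthParams would be uploaded once per frame with glUniform2f.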

Also, if you’re using an AMD card, it doesn’t appear to like reading from a depth texture that’s attached to the current draw framebuffer, even if depth writes are disabled.
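
One way around that (untested) is to copy the depth attachment into a second depth texture before the decal pass and sample the copy instead. Here depthCopyTexture stands for a separate GL_DEPTH_COMPONENT texture of the same size:

// With the FBO bound as the read framebuffer, copy its depth buffer
// into a texture that is not attached to the draw framebuffer.
glBindTexture(GL_TEXTURE_2D, depthCopyTexture);
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, windowWidth, windowHeight);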

Is GlDepthFunc still around? I don’t suppose the tolerance for GL_EQUAL is acceptable if so. I always thought depth tests should have a tolerance state.

Is GlDepthFunc still around?

A better question would be, “How could it be gone?” If you don’t have glDepthFunc, you don’t have depth testing, since the function is what defines the test. It’s not like the only possibility other than GL_LESS was GL_EQUAL; even if they ditched GL_EQUAL (and there’s absolutely no reason to do so), they still would have to handle GL_GREATER and so forth.

I don’t suppose the tolerance for GL_EQUAL is acceptable if so.

The tolerance for GL_EQUAL is and has always been acceptable: equals. Not “kinda equals”. Or “nearly equals.” It’s “equals.” That is the tolerance for it, and sometimes that is exactly what you need.

^It was an expression; but then again you never know. I was just curious because that seemed like what the OP is looking for. But I realize z-fighting and all. I am way behind the times OpenGL wise. ES is just like I remember it. But I don’t know how the main branch has evolved in the last 5 to 10 years. I intend to find out over the course of the next few years.

I was just offering maybe another way to think about the problem, because a depth buffer seems pretty drastic unless it's possible to sample the bound depth buffer, or it is already handy, or the application is solely about decals! GL_EQUAL seems pretty useless because glPolygonOffset can't really address it, but I may be wrong. Don't take my word for anything.

I was not able to get the shader to read the texture properly (I tried detaching the depth texture from the FBO, no success), but I found it to be much more convenient to just use the stencil buffer instead of a shader to do the depth clipping.
I use glPolygonOffset, so the depth offset is automatically linearized.
Anyways, thanks for the help!

If you are still around? Can you post how you are using the stencil buffer?

If two decals overlapped, would the buffer be corrupted, or is that not a problem? Or are draw calls always guaranteed to be synchronized, so that it is safe to draw to the stencil buffer, then draw the decal over that, and then clear that region of the buffer? I always assumed draw calls were asynchronous to allow for performance optimizations, but I've never really thought about it. OpenGL still has a flush API, right?

It's probably not the best for performance (a shader would probably be faster), but it is good enough for what I am doing. Right now I am also binding/unbinding a shader for every decal, which probably hurts performance, but this should be easy to work around (especially in a deferred shading system, since you can just keep the deferred rendering shader bound for both passes of the decal).

Overlapping decals are not a problem; it runs synchronously.

  
    glEnable(GL_STENCIL_TEST);
    glClearStencil(0);
    glClear(GL_STENCIL_BUFFER_BIT);

    glDepthFunc(GL_LEQUAL);
    glDepthMask(false);

    glEnable(GL_POLYGON_OFFSET_FILL);

    for(unsigned int i = 0, numDecals = m_pDecals.size(); i < numDecals; i++)
    {
        // Pass 1: no color writes; push the decal back and mark the stencil
        // wherever the depth test fails (i.e. where a surface sits just in
        // front of the offset decal).
        glColorMask(false, false, false, false);
        Shader::Unbind();

        glPolygonOffset(2.0f, 2.0f);

        glStencilFunc(GL_ALWAYS, 1, 0xff);
        glStencilOp(GL_KEEP, GL_REPLACE, GL_KEEP);

        m_pDecals[i]->Render_Batch_NoTexture();

        // Pass 2: draw the textured decal only where the stencil was set,
        // pulled forward so it passes the depth test over that surface.
        glColorMask(true, true, true, true);

        glStencilFunc(GL_EQUAL, 1, 0xff);
        glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);

        glPolygonOffset(-2.0f, -2.0f);

        m_pDecals[i]->Render_Batch_Textured();

        glClear(GL_STENCIL_BUFFER_BIT);
    }

    glDisable(GL_POLYGON_OFFSET_FILL);
    glDepthMask(true);
    glDepthFunc(GL_LESS);
    glDisable(GL_STENCIL_TEST);

I tried to work around the per-decal glClear by changing the stencil operation to GL_ZERO, GL_ZERO, GL_ZERO, but it only zeros the fragments that failed the test, so it does not work.

So it doesn’t look like performance is a concern here. I think if you know the bounds of your decals in screen space you could use glScissor to minimize fill.
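
Something like this inside the decal loop, assuming you can get a screen-space bounding rectangle for each decal (ScreenRect and getDecalScreenBounds are made up here):

glEnable(GL_SCISSOR_TEST);

// Restrict the stencil clear and the decal fill to the decal's screen rectangle.
ScreenRect r = getDecalScreenBounds(m_pDecals[i]);
glScissor(r.x, r.y, r.width, r.height);

// ... stencil-mark pass, textured pass, glClear(GL_STENCIL_BUFFER_BIT) ...

glDisable(GL_SCISSOR_TEST);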

PS: For the record, I said before that something like GL_EQUAL is useless, and I think it probably is in this scenario. But I assume it is used primarily for multi-pass rendering such as deferred shading, where you can assume the rasterizer produces exactly the same depth values for the same geometry in each pass, at least to the precision of the depth buffer.

If your decals are precisely lined up with the planes that they are being applied to it seems like glDepthFunc(GL_EQUAL) without glPolygonOffset would do the trick. It may be a naive approach, but have you even tried it?
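
That is, roughly this for the decal draw, after the wall has been rendered normally (drawDecal stands in for whatever draw call you use):

// Only fragments whose depth exactly matches what the wall wrote will pass.
glDepthFunc(GL_EQUAL);
glDepthMask(GL_FALSE);
drawDecal();
glDepthMask(GL_TRUE);
glDepthFunc(GL_LESS);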

glFlush might be appropriate too. I dunno. http://stackoverflow.com/questions/3799839/how-can-glflush-affect-rendering-correctness probably has the answer…

^http://www.opengl.org/wiki/Common_Mistakes#glFinish_and_glFlush

The short of it is that glFlush can help on the CPU end if you are not going straight to SwapBuffers.

I guess if there are any optimizations for asynchronous rendering the driver would need to be able to analyze the state of the device and do so conservatively. Otherwise each draw call will have to finish before the next can begin. Seems short sighted to me.

Maybe there are newer APIs for asynchronous rendering (you'd think there would be some devices that are able to do it).

Otherwise each draw call will have to finish before the next can begin.

Each triangle’s rendering, from the top of the pipeline to the bottom, must act as though every triangle rendered beforehand had completed. Otherwise, there’s no way depth test, stencil test, or even blending could ever be specified to work in any consistent, reasonable way.

However, note the key phrase: “act as though”. The rendering system only needs to make sure that everything works out as if this were the case. It can reorder things however it wants, so long as all of the testing, blending, etc operations proceed as if everything rendered in a specific order.

All of the pre-rasterizer stages (vertex shaders, tessellation, geometry shaders, etc.) can be parallelized as much as the implementation wants, so long as the triangles on the other end come out in the expected order. It can process groups of 16 vertices at a time from the input attribute stream, for example, as long as the resulting triangles still come out in order.

Fragment shaders can operate independently of the blending units (which is one reason why they can’t do framebuffer pixel read-back). Early-depth-test has to proceed in-order, but it can be massively parallel. Hi-Z can cull large blocks of fragments from a triangle. And the ROP units (late depth-test, stencil tests, blending, etc) operate in-order, but are fixed-function and working over very small datasets.

That's one of the reasons image load/store is so complicated; it doesn't provide these kinds of ordering guarantees, so you have to handle the necessary synchronization yourself. While that's fine for those cases when you're arbitrarily reading/writing stuff, I wouldn't want to have to do that kind of synchronization all the time when the driver can do it for me.

Maybe there are newer APIs

The OpenGL specification is not hidden, and the most recent version isn’t that difficult a read. You should familiarize yourself with it before wondering about whether something exists.

Each triangle's rendering, from the top of the pipeline to the bottom, must act as though every triangle rendered beforehand had completed. Otherwise, there's no way depth test, stencil test, or even blending could ever be specified to work in any consistent, reasonable way.

However, note the key phrase: "act as though". The rendering system only needs to make sure that everything works out as if this were the case. It can reorder things however it wants, so long as all of the testing, blending, etc operations proceed as if everything rendered in a specific order.

That's right. But the client could make many optimizations, since it knows more than the driver can know, if it were able to flush the calls manually. Just for the record, the glFlush explanation in the wiki on this site does give an opt-in for APIs that explicitly call themselves asynchronous.

The OpenGL specification is not hidden, and the most recent version isn’t that difficult a read. You should familiarize yourself with it before wondering about whether something exists.

You gotta choose your battles. That is a pass so that people do not feel obligated to chime in on the subject :)

But the client could make many optimizations since it knows more than the driver can know.

Like what?

Like if two draw calls overlap in screen space. Or if two operations can even interfere assuming pixel writes are atomic (that the driver may or may not optimize around)

Like if two draw calls overlap in screen space. Or if two operations can even interfere assuming pixel writes are atomic (that the driver may or may not optimize around)

I’ll assume that you meant, “Like if two draw calls don’t overlap in screen space.” So what if they don’t? Do you think that the hardware is incapable of figuring out how to order the fragments spit out by the rasterizer? Do you think that the driver and hardware can’t work this stuff out on its own?

My point is that hardware already does this. It is not rasterizing one triangle, writing its values, then rasterizing the next. It is a pipelined, parallel process. The end products appear to have been rendered in-order, but that doesn’t mean that’s how the hardware did it.