Part of the Khronos Group
OpenGL.org

Thread: Simple blending shader using image load/store produces false results

  1. #1
    Senior Member OpenGL Pro (Join Date: Apr 2010, Location: Germany, Posts: 1,128)

    Hey everyone.

    While reasoning about the problem in this thread, I started fiddling around with image load/store. As a first exercise, I went for simple compositing of two images inside a shader, storing the result in a third image (setting aside the fact that image load/store doesn't actually require a third image). I came across the following behavior: the first image below is the correct result after the first frame, and the second is what I get when rendering the second, third, up to the n-th frame:

    [Image: right.jpg, the correct result after the first frame] <-> [Image: wrong.jpg, the result for the second and later frames]

    It's standard stuff - rendering a fullscreen quad with a pass-through vertex shader (plus tex coord) and the following fragment shader:

    Code :
    #version 420
     
    layout(binding = 0, rgba8) uniform image2D ImageA;
    layout(binding = 1, rgba8) uniform image2D ImageB;
    layout(binding = 2, rgba8) uniform image2D ImageResult;
    layout(binding = 2) uniform sampler2D ImageResultTex;
    layout(location = 0) out vec4 FragColor;
     
    in vec2 InterpTexCoord;
    const int Size  = 512;
    const float Alpha = 0.5;
     
     
    void main()
    {
        ivec2 ImageTexCoord = ivec2(InterpTexCoord * Size);    
     
        // Image A and B are declared read-only
        vec4  ColorA        = imageLoad(ImageA, ImageTexCoord);    
        vec4  ColorB        = imageLoad(ImageB, ImageTexCoord);
        vec4  ColorResult   = ColorA * (1.0 - Alpha) + ColorB * Alpha;        
     
        // ImageResult is declared read-write
        imageStore(ImageResult, ImageTexCoord, ColorResult);
     
        // false results after the first frame        
        FragColor = imageLoad(ImageResult, ImageTexCoord);
     
        // regular texture lookup is always correct
        //FragColor = texture(ImageResultTex, InterpTexCoord);
    }

    The image unit setup is done as follows:

    Code :
    glBindImageTexture(0, tbo_a_, 0, GL_FALSE, 0, GL_READ_ONLY, GL_RGBA8);    
    glBindImageTexture(1, tbo_b_, 0, GL_FALSE, 0, GL_READ_ONLY, GL_RGBA8);        
    glBindImageTexture(2, tbo_result_, 0, GL_FALSE, 0, GL_READ_WRITE, GL_RGBA8);

    The texture objects are set up correctly as well.

    As can be seen in the above code, only image loads return the wrong value after the first frame. Doing a normal lookup with the corresponding sampler always succeeds.
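A straightforward way to pin down which path returns wrong data, frame by frame, is to compute the expected composite on the CPU and compare it with data read back after each frame, for example the result texture via glGetTexImage after a glMemoryBarrier(GL_TEXTURE_UPDATE_BARRIER_BIT) call, so that the readback reflects the shader's image stores. The C sketch below is not from the thread: the function names are hypothetical, and it assumes tightly packed RGBA8 data and the same 0.5 blend factor as the shader. The comparison tolerates a difference of one per channel, since the GL allows either of the two nearest integers when converting a float to an unsigned normalized value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: convert a float to an 8-bit unorm value by clamping
 * to [0, 1], scaling to [0, 255] and rounding to nearest (halves round up). */
static unsigned char unorm8_from_float(float f)
{
    if (f < 0.0f) f = 0.0f;
    if (f > 1.0f) f = 1.0f;
    return (unsigned char)(f * 255.0f + 0.5f);
}

/* Hypothetical CPU reference for the shader's blend
 * ColorA * (1.0 - Alpha) + ColorB * Alpha on tightly packed RGBA8 data. */
void blend_rgba8_reference(const unsigned char *a, const unsigned char *b,
                           unsigned char *out, size_t texel_count, float alpha)
{
    assert(alpha >= 0.0f && alpha <= 1.0f);
    for (size_t i = 0; i < texel_count * 4; ++i) {
        /* imageLoad() on an rgba8 image yields normalized floats in [0, 1]. */
        float fa = a[i] / 255.0f;
        float fb = b[i] / 255.0f;
        out[i] = unorm8_from_float(fa * (1.0f - alpha) + fb * alpha);
    }
}

/* Counts channels where `actual` differs from `expected` by more than
 * `tolerance`; a tolerance of 1 allows for the GL choosing either of the
 * two nearest integers when imageStore() converts float to unorm. */
size_t count_mismatches_rgba8(const unsigned char *expected,
                              const unsigned char *actual,
                              size_t texel_count, int tolerance)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < texel_count * 4; ++i) {
        if (abs((int)expected[i] - (int)actual[i]) > tolerance)
            ++mismatches;
    }
    return mismatches;
}
```

Comparing both the read-back result texture and the framebuffer (e.g. via glReadPixels) against the same reference would help separate a faulty imageStore() from a faulty imageLoad().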

    I assumed the above should succeed because, as I read the specs (GL and GLSL), memory transactions within a single fragment shader invocation are well-defined and need not be synchronized using a coherent qualifier or a memoryBarrier(). Correct?

    I get no errors at any time. The GPU is a Radeon HD 6350 with Catalyst 12.5, reporting 8 image units.

    Does anyone spot what's going wrong?

  2. #2
    malexander, Member Regular Contributor (Join Date: Aug 2009, Location: Ontario, Posts: 315)
    Have you tried putting a memoryBarrier() call between the imageStore() and imageLoad() to ImageResult? I'm pretty sure the imageLoad() won't wait for the store to complete otherwise.

  3. #3
    Senior Member OpenGL Pro (Join Date: Apr 2010, Location: Germany, Posts: 1,128)
    Yes, I did, with no effect. As I stated, I don't think the barrier is necessary there, because I'm altering one texel per invocation, so no other invocation depends on the result. If I'm not mistaken, memoryBarrier() is only needed when other invocations need to see the results of the current invocation's memory transactions.
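The one-texel-per-invocation argument rests on each fragment touching exactly one texel. Assuming the fullscreen quad covers a 512x512 viewport and InterpTexCoord runs from 0 to 1 across it (the viewport size is not stated in the thread), the centre of pixel x receives a coordinate of (x + 0.5) / 512, which ivec2(InterpTexCoord * Size) truncates back to x. The C sketch below, with hypothetical helper names, checks that this mapping is one-to-one along one axis; the other axis behaves the same way independently. Since each centre lies half a texel away from a truncation boundary, small interpolation differences on the GPU would not change the outcome.

```c
#include <assert.h>
#include <stdbool.h>

/* Matches `const int Size = 512;` in the shader; assumes a 512x512 viewport. */
#define SIZE 512

/* Mirrors ivec2(InterpTexCoord * Size) along one axis, assuming the
 * interpolated coordinate at the centre of pixel x is (x + 0.5) / SIZE. */
static int texel_for_pixel(int x)
{
    assert(x >= 0 && x < SIZE);
    float tex_coord = (x + 0.5f) / (float)SIZE;
    return (int)(tex_coord * SIZE); /* truncates toward zero, like ivec2() */
}

/* Returns true if every pixel maps to a distinct texel, i.e. no two
 * fragment shader invocations would access the same texel on this axis. */
bool mapping_is_one_to_one(void)
{
    bool seen[SIZE] = { false };
    for (int x = 0; x < SIZE; ++x) {
        int t = texel_for_pixel(x);
        if (t < 0 || t >= SIZE || seen[t])
            return false;
        seen[t] = true;
    }
    return true;
}
```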

  4. #4
    malexander, Member Regular Contributor (Join Date: Aug 2009, Location: Ontario, Posts: 315)
    With all the layers of memory and cache on a GPU, I would expect that you'd need some sort of read/write synchronization, even on the same texel. But if the barrier doesn't work and coherent doesn't work, perhaps it's a driver bug.

  5. #5
    Senior Member OpenGL Guru (Join Date: May 2009, Posts: 4,948)
    You didn't declare your image values as `coherent`. You also didn't put a proper memory barrier between the write and the read. You must do both in order for a write followed by a read to work.

  6. #6
    Senior Member OpenGL Pro (Join Date: Apr 2010, Location: Germany, Posts: 1,128)
    Hmm, I just checked on my home machine with the exact same code, but with an HD 6780 and Catalyst 12.6 installed, and it works. I'm confused.

  7. #7
    Senior Member OpenGL Guru (Join Date: May 2009, Posts: 4,948)
    That's because it's undefined behavior. It may work and it may not.

    If you make your variables coherent and use a proper memory barrier, then it will work everywhere.

  8. #8
    Senior Member OpenGL Pro (Join Date: Apr 2010, Location: Germany, Posts: 1,128)
    Quote Originally Posted by Alfonse
    You also didn't put a proper memory barrier between the write and the read. You must do both in order for a write followed by a read to work. [..] If you make your variables coherent and use a proper memory barrier, then it will work everywhere.
    But how does that correlate with what the spec says?

    Quote Originally Posted by The GLSL Spec
    When writing a variable declared as coherent, the values written will be reflected in subsequent coherent reads performed by other shader invocations.
    and

    Quote Originally Posted by The GLSL Spec
    When this function (they mean memoryBarrier()) returns, the results of any memory stores performed using coherent variables performed prior to the call will be visible to any future coherent memory access to the same addresses from other shader invocations.
    This perplexes me. Say the fragment shader is executed exactly once per texel and each invocation operates on exactly that texel. Why would I need to declare the variables coherent in this case? I swap buffers immediately after rendering the quad and perform no other transactions in between. I inserted a memoryBarrier() after the store and it had no effect, but, to be fair, I didn't declare the variables as coherent.

    Quote Originally Posted by The GLSL Spec
    While the order of reads and writes within a single shader invocation is well-defined, the relative order of reads and writes to a single shared memory address from multiple separate shader invocations is largely undefined. The order of memory accesses performed by one shader invocation, as observed by other shader invocations, is also largely undefined but can be controlled through memory control functions.
    Again, if there is no interaction between invocations, why is it a problem for a single invocation when "the order of reads and writes within a single shader invocation is well-defined"? Can subsequent operations within a single invocation still be executed out of order? I have no trouble grasping that you need to synchronize when one invocation depends on data written by another, but within one and the same invocation? How can it be undefined if it's well-defined?

    Maybe it's just me.

  9. #9
    Senior Member OpenGL Pro (Join Date: Apr 2010, Location: Germany, Posts: 1,128)
    Having just tested the same thing on the Radeon HD 6350 with Catalyst 12.6, I can state that it's a bug in Catalyst 12.5.

    Anyway, I'd still like to hear from you guys on the above matter.

  10. #10
    Junior Member Regular Contributor (Join Date: Mar 2009, Posts: 153)
    In my opinion, you need neither the memoryBarrier() call nor the 'coherent' qualifier, and your shader code is correct. This is because each shader invocation accesses a unique memory location and no interaction between invocations occurs.
    Last edited by randall; 07-27-2012 at 05:42 AM.
