I am trying to implement a multi-pass rendering algorithm, and I am getting artifacts. While debugging, I have cut my shaders down to just a few lines that still reproduce the problem. So I must be staring at the bug, but after a whole day of staring at the code and the OpenGL ES 3.1 spec I still don't understand what is going on…
Both passes share the same vertex shader:
```glsl
#version 310 es
in vec2 a_Position;
out vec2 v_TexCoordinate;
void main()
{
    v_TexCoordinate = a_Position + 0.5; // a_Position is in [-0.5, 0.5], so this maps it to [0.0, 1.0]
    gl_Position = vec4(2.0 * a_Position, 1.0, 1.0);
}
```
Pass1 fragment shader:
```glsl
#version 310 es
out vec4 fragColor;
in vec2 v_TexCoordinate;
uniform vec2 u_Size; // size of the screen (width x height)
layout(binding = 0, offset = 0) uniform atomic_uint u_Counter;
layout(std430, binding = 1) buffer linkedlist
{
    uint u_Records[]; // number of entries: twice the number of pixels on the screen
};
void main()
{
    // we are rendering a full-screen quad, so this should be a unique index per pixel, AFAIK?
    uint index = uint((v_TexCoordinate.x + v_TexCoordinate.y * u_Size.y) * u_Size.x);
    // unique location where the 'real per-pixel value' is stored (second half of the table)
    uint ptr = atomicCounterIncrement(u_Counter) + uint(u_Size.x * u_Size.y);
    u_Records[index] = ptr; // store a 'pointer' to the real value (a kind of per-pixel head pointer)
    u_Records[ptr] = (v_TexCoordinate.x > 0.5 ? 2u : 1u); // store the 'real value': 1 for all pixels on the left of the screen, 2 otherwise
    discard;
}
```
Pass2 fragment shader:
```glsl
#version 310 es
out vec4 fragColor;
in vec2 v_TexCoordinate;
uniform vec2 u_Size;
layout(std430, binding = 1) buffer linkedlist
{
    uint u_Records[];
};
void main()
{
    uint index = uint((v_TexCoordinate.x + v_TexCoordinate.y * u_Size.y) * u_Size.x);
    uint ptr = u_Records[index]; // retrieve the per-pixel unique pointer to the real value
    if (u_Records[ptr] == 2u)      fragColor = vec4(1.0, 0.0, 0.0, 1.0); // real value is 2: draw a red pixel
    else if (u_Records[ptr] == 1u) fragColor = vec4(0.0, 1.0, 0.0, 1.0); // real value is 1: draw a green pixel
    else                           fragColor = vec4(0.0, 0.0, 1.0, 1.0); // this should never happen
}
```
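To check the "unique index per pixel, AFAIK?" assumption from the Pass1 comment independently of the GPU, the shaders' index expression can be evaluated on the CPU for every pixel center and compared with a plain row-major index i + j*W. This C sketch is only a model: it assumes v_TexCoordinate arrives exactly at pixel centers, ((i + 0.5)/W, (j + 0.5)/H), which real interpolation only approximates, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* The shaders' index expression, evaluated in 32-bit float.
   Assumption: v_TexCoordinate is exactly the pixel center. */
static unsigned shader_index(unsigned i, unsigned j, unsigned w, unsigned h)
{
    float size_x = (float)w, size_y = (float)h;   /* u_Size */
    float x = ((float)i + 0.5f) / size_x;         /* v_TexCoordinate.x */
    float y = ((float)j + 0.5f) / size_y;         /* v_TexCoordinate.y */
    return (unsigned)((x + y * size_y) * size_x);
}

/* Plain row-major index: column + row * width. */
static unsigned row_major_index(unsigned i, unsigned j, unsigned w)
{
    return i + j * w;
}

/* Returns 1 if every pixel maps to a distinct slot inside the head
   region [0, w*h) of the table, 0 otherwise. */
static int maps_into_head_region(unsigned w, unsigned h, int use_shader_formula)
{
    assert(w > 0 && h > 0);
    unsigned n = w * h;
    unsigned char *seen = calloc(n, 1);
    int ok = (seen != NULL);
    for (unsigned j = 0; ok && j < h; ++j) {
        for (unsigned i = 0; ok && i < w; ++i) {
            unsigned idx = use_shader_formula ? shader_index(i, j, w, h)
                                              : row_major_index(i, j, w);
            if (idx >= n || seen[idx])
                ok = 0;            /* out of range or collision */
            else
                seen[idx] = 1;
        }
    }
    free(seen);
    return ok;
}
```

For a 4x2 screen the shader expression yields indices 2 to 9 instead of 0 to 7. At exact pixel centers, (x + y*H)*W = (i + 0.5) + (j + 0.5)*W = i + j*W + W/2 + 0.5, so for even W the truncated index is shifted by W/2. The check only models exact pixel centers; it says nothing about interpolation error or about the barrier.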
Application:
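```c
zeroOutAtomicCounter();               // reset u_Counter to 0
Pass1();                              // fill the table, draw nothing (every fragment is discarded)
glMemoryBarrier(GL_ALL_BARRIER_BITS); // make Pass1's writes visible to Pass2
Pass2();                              // read the table and draw the colors
```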
As you can see, I am using a flat table of uints in an SSBO to communicate between the passes. In the first pass I fill the table with values; in the second I read it back and display the corresponding colors.
The table has exactly twice as many entries as there are pixels on the screen. Each entry in the first half holds an index to another location in the table (in the second half), where the real per-pixel value is stored.
Pass2 follows this ‘two-element per-pixel linked list’ and draws each pixel in the corresponding color.
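The table layout described above can be modeled sequentially on the CPU. This C sketch is an illustration under assumptions: pixels are addressed by an integer row-major index p = i + j*W rather than the shaders' float expression, invocations run one after another in a chosen order, and the helper names are hypothetical. It shows that, without any concurrency, the invocation order changes which value slot a pixel receives but not the color it ends up with.

```c
#include <assert.h>
#include <stdlib.h>

/* Sequential CPU model of the two-level table: first half = per-pixel
   head pointers, second half = real values. GPU concurrency is not
   modeled. Pixels use a row-major index p = i + j*w (assumption). */

/* Pass 1: 'order' is the order in which fragment invocations happen to
   run, which decides the counter value (and thus the slot) each gets. */
static void model_pass1(unsigned *records, unsigned w, unsigned h,
                        const unsigned *order)
{
    unsigned n = w * h;
    unsigned counter = 0;                          /* zeroOutAtomicCounter() */
    for (unsigned k = 0; k < n; ++k) {
        unsigned p = order[k];                     /* pixel index */
        unsigned i = p % w;                        /* column */
        unsigned ptr = counter++ + n;              /* atomicCounterIncrement() + W*H */
        records[p] = ptr;                          /* head pointer */
        records[ptr] = (2 * i + 1 > w) ? 2u : 1u;  /* x > 0.5 ? 2 : 1 */
    }
}

/* Pass 2: follow the head pointer and return the stored value
   (2 = red, 1 = green, anything else = blue). */
static unsigned model_pass2(const unsigned *records, unsigned p)
{
    return records[records[p]];
}

/* Runs both passes with invocations in reverse pixel order and
   returns how many pixels end up with the wrong value. */
static unsigned count_wrong_pixels(unsigned w, unsigned h)
{
    unsigned n = w * h, wrong = 0;
    unsigned *records = calloc(2 * (size_t)n, sizeof *records);
    unsigned *order = malloc(n * sizeof *order);
    assert(records != NULL && order != NULL);
    for (unsigned k = 0; k < n; ++k)
        order[k] = n - 1 - k;                      /* arbitrary invocation order */
    model_pass1(records, w, h, order);
    for (unsigned p = 0; p < n; ++p) {
        unsigned expected = (2 * (p % w) + 1 > w) ? 2u : 1u;
        if (model_pass2(records, p) != expected)
            ++wrong;
    }
    free(order);
    free(records);
    return wrong;
}
```

In this sequential model count_wrong_pixels returns 0 for any screen size, since every head pointer is written together with its value before Pass2 starts.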
If things were working, I should see the left half of the screen solid GREEN and the right half solid RED. That is what I get for about 95% of the pixels, but about 5% have the wrong color (some pixels on the left are RED and some on the right are GREEN). These ‘wrong’ pixels keep dancing around the screen.
I have spent the whole day thinking about this, and the only explanation I can see is that some invocations of Pass2 run while some invocations of Pass1 have not yet completed: they have done the first write to the table (the ‘head pointer’) but not yet the second one.
But how can that be? Between the passes the application calls glMemoryBarrier(GL_ALL_BARRIER_BITS), which, as far as I understand the spec, is supposed to guarantee that all memory writes of every kind from Pass1 are complete when Pass2 begins running.
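The suspected race can be made concrete with a small model of two consecutive frames. This is a sketch under assumptions the question does not state: the table is not cleared between frames, the counter is reset each frame, and invocations may receive counter values in a different order in each frame; the names are hypothetical. Under those assumptions, a Pass2 invocation that reads a fresh head pointer before the matching value is written picks up whatever another pixel stored in that slot during the previous frame, which is a 1 or a 2, so the result would be a swapped GREEN/RED pixel rather than the blue "should never happen" color.

```c
#include <assert.h>
#include <stdlib.h>

/* Models the suspected race for one pixel in the second frame:
   Pass1 has written every head pointer but none of the values yet, so
   Pass2 reads whatever the previous frame left in the value slot.
   Assumption: the table is not cleared between frames. */

static unsigned value_for_column(unsigned i, unsigned w)
{
    return (2 * i + 1 > w) ? 2u : 1u;          /* x > 0.5 ? 2 : 1 */
}

/* Returns the value Pass2 would read for pixel p (row-major index)
   if it ran between the head write and the value write of frame 2. */
static unsigned stale_read(unsigned w, unsigned h, const unsigned *order_prev,
                           const unsigned *order_now, unsigned p)
{
    unsigned n = w * h;
    unsigned *records = calloc(2 * (size_t)n, sizeof *records);
    assert(records != NULL);
    /* Frame 1: Pass1 completes (counter starts at 0). */
    for (unsigned k = 0; k < n; ++k) {
        unsigned q = order_prev[k];
        records[q] = k + n;
        records[k + n] = value_for_column(q % w, w);
    }
    /* Frame 2: counter reset, only the head pointers have landed. */
    for (unsigned k = 0; k < n; ++k)
        records[order_now[k]] = k + n;
    unsigned result = records[records[p]];
    free(records);
    return result;
}
```

With a 2x1 screen and the invocation order reversed in the second frame, stale_read returns 2 (RED) for the left pixel and 1 (GREEN) for the right one; with the same order in both frames, the stale value happens to be correct.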
I have tested this on three phones with Adreno 418, Adreno 530 and Mali-T760 GPUs, and all of them show very similar artifacts, so it looks like this is my problem and not a driver bug.