SSBO / image load-store woes
I'm implementing OIT with linked lists and I'm seeing some really crazy artifacts. On the majority of fragments I get exactly what is expected of this algorithm. On a few of them I get some kind of flickering, which I suppose comes from bad list formation.
I am sure I build/bind/use the SSBO, atomic counter and image buffer correctly from CPU code. I double-checked them.
I'm drawing the scene like this (I'm using a single VBO for drawscene to keep this as contained as possible; general self-occluding geometry):
Pass1 creates the fragments and the list heads for each pixel.
glMemoryBarrier(GL_ALL_BARRIER_BITS); // overkill, just GL_SHADER_STORAGE_BARRIER_BIT | GL_SHADER_IMAGE_ACCESS_BARRIER_BIT should be sufficient
Pass2 reads the heads of each pixel and gets all the fragments in the list
Pretty much basic stuff, classic OIT.
Pass 1 (relevant) code:
layout(binding = 2) uniform atomic_uint atomicbuffer;
layout(binding = 0, r32ui) coherent uniform uimage2D imageBuffer;
layout(std430, binding = 1) buffer FragmentBuffer
float alpha =0.3;
vec3 color = someinnocentbrdf();
// get list head for this pixel
uint head = imageLoad(imageBuffer, ivec2(gl_FragCoord.xy)).x; // image is initialized to 1200*700*4+1, a magic value that marks end of list and is guaranteed not to be reached by the counter
//get fragment number
uint counter = atomicCounterIncrement(atomicbuffer); // starts from 0 each frame
//write some data for this fragment
fragments[counter].color = color;
fragments[counter].alpha = alpha;
fragments[counter].depth = gl_FragCoord.z;
uint oneovermax = 1200*700*4+1; // magic end-of-list value; ignore the obviously redundant if below, see the paragraph after the code section
// option 1: seems to work
if(oneovermax==head) fragments[counter].nextFragment = oneovermax;
else fragments[counter].nextFragment = head;
// option 2: always crashes the video driver!
//fragments[counter].nextFragment = head;
//store new head for list
imageAtomicExchange(imageBuffer, ivec2(gl_FragCoord.xy), counter);
memoryBarrier(); // is this really necessary? Shouldn't the atomic exchange suffice? Just added it for good measure, will be cleaned up in production code.
Now with option 1 (the if/else) it seems to work. With option 2 (the plain assignment) I get a guaranteed driver crash (on an NVIDIA GTX 460M, latest driver). I'm totally puzzled. They should be doing absolutely the same thing.
I manually checked for a possible memory overflow and I am nowhere near the edge of the buffer (using about 20% of the alpha buffer memory).
In pass 2 I just walk the list, nothing to write home about.
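For reference, a minimal sketch of what such a pass 2 walk might look like. This is not my exact shader: the Fragment struct layout is inferred from the fields written in pass 1, and MAX_FRAGMENTS, END_OF_LIST, outColor and the black background are assumptions made up for the illustration.

```glsl
#version 430

// Hypothetical node layout, inferred from the fields written in pass 1.
struct Fragment {
    vec3  color;
    float alpha;
    float depth;
    uint  nextFragment;
};

layout(binding = 0, r32ui) readonly uniform uimage2D imageBuffer;
layout(std430, binding = 1) readonly buffer FragmentBuffer {
    Fragment fragments[];
};

const uint END_OF_LIST   = 1200u * 700u * 4u + 1u; // same magic value as in pass 1
const int  MAX_FRAGMENTS = 32;                     // assumed per-pixel cap, longer lists are truncated

out vec4 outColor; // assumed output name

void main() {
    uint sorted[MAX_FRAGMENTS];
    int  count = 0;

    // Walk the per-pixel list, starting from the head stored by pass 1.
    uint node = imageLoad(imageBuffer, ivec2(gl_FragCoord.xy)).x;
    while (node != END_OF_LIST && count < MAX_FRAGMENTS) {
        // Insertion sort on depth so that the farthest fragment comes first.
        float d = fragments[node].depth;
        int i = count;
        while (i > 0 && fragments[sorted[i - 1]].depth < d) {
            sorted[i] = sorted[i - 1];
            --i;
        }
        sorted[i] = node;
        ++count;
        node = fragments[node].nextFragment;
    }

    // Blend back to front over an assumed black background.
    vec3 result = vec3(0.0);
    for (int i = 0; i < count; ++i) {
        Fragment f = fragments[sorted[i]];
        result = mix(result, f.color, f.alpha);
    }
    outColor = vec4(result, 1.0);
}
```

The count < MAX_FRAGMENTS guard also keeps the loop finite if a list is malformed, which matters here since a broken nextFragment chain could otherwise spin forever.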
So my problems are:
- flickering on some pixels
- crash when writing like option 2
I suppose I'm missing something on the topic of synchronization (the only thing that can be causing bad ordering AND apparently a deadlock). Any ideas? No code needed, just a "this is the way it should be done" or "rtfm @lineX paragraphY" will suffice.