Using spinlock to control image read/writes

I’m working on a fragment shader that reads from and writes to image arrays. To ensure that only one shader invocation can access a given image element at a time, I’ve implemented a spinlock image to control access to the images. The problem is that the spinlock keeps running into deadlock conditions. Here’s my fragment shader code:


#version 420 core

coherent uniform layout(rgba8) image2DArray ColorImages;
// 0 = element available for read/writes; 1 = element is locked by another invocation
coherent uniform layout(r32ui) uimage2D SpinlockImage;

void main(void)
{
  const ivec2 coords = ivec2(gl_FragCoord.xy);
  // obtain the lock
  while (imageAtomicCompSwap(SpinlockImage, coords, 0, 1) != 0);
  // do something with ColorImages. As a simple example, load value from layer 0 and write to layer 1
  vec4 val = imageLoad(ColorImages, ivec3(coords, 0));
  imageStore(ColorImages, ivec3(coords, 1), val);
  // ensure that image writes are visible to other shader invocations
  memoryBarrier();
  // unlock the spinlock
  imageAtomicExchange(SpinlockImage, coords, 0);
}

My understanding of imageAtomicCompSwap() in the above context is that only one shader invocation will read a value of zero from the image; in the same atomic operation, the image value will be set to 1. This in effect locks out all other invocations, causing them to spin in the while loop until the invocation holding the lock sets the image value back to 0.

Why would this code cause a deadlock?

When the deadlock happens on my Windows 7 system, I get this message in a window:

“The NVIDIA OpenGL driver lost connection with the display driver due to exceeding the Windows Time-Out limit and is unable to continue.
The application must close.”

I did a little research and the Windows Time-Out limit is two seconds.

I was able to find a fix for this problem. Apparently the Nvidia GLSL compiler (and perhaps all GLSL compilers) optimizes away certain parts of the code that it deems unnecessary, especially looping structures. I’m not sure exactly what the compiler was doing, but when I changed my code to the following, it worked (notice the keepWaiting flag and that the entire function body is inside the while loop):


#version 420 core
 
coherent uniform layout(rgba8) image2DArray ColorImages;
// 0 = element available for read/writes; 1 = element is locked by another invocation
coherent uniform layout(r32ui) uimage2D SpinlockImage;
 
void main(void)
{
  const ivec2 coords = ivec2(gl_FragCoord.xy);
  bool keepWaiting = true;
  while (keepWaiting)
  {
    // acquire lock
    if (imageAtomicCompSwap(SpinlockImage, coords, 0, 1) == 0)
    {
      // do something with ColorImages. As a simple example, load value from layer 0 and write to layer 1
      vec4 val = imageLoad(ColorImages, ivec3(coords, 0));
      imageStore(ColorImages, ivec3(coords, 1), val);
      // ensure that image writes are visible to other shader invocations
      memoryBarrier();
      // release lock
      keepWaiting = false;
      imageAtomicExchange(SpinlockImage, coords, 0);
    }
  }
}

Hopefully this helps with other programmers trying to implement a semaphore to manage resource access.

Thanks for that - I have had problems like this that seemed to be resolved only by fooling the optimizer. (Optimizers on CPU compilers used to be bad many years ago - I guess it just takes a while for them to sort out these sorts of errors.)
