I have a fragment shader in which I need to guarantee one thread exclusive access to some memory locations. Since this involves several operations rather than a single one that I could do with an atomic, I need to lock a section of my code. I am currently using the following pattern (simplified):

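```glsl
bool keepWaiting = true;
const int MAX_TRY = 30;
int try = 0;
while (keepWaiting && try < MAX_TRY) {
    if (imageAtomicExchange(mutex, coord, 1u) == 0u) {
        // lock acquired: run the critical section, then release the lock
        doWork();
        keepWaiting = false;
        imageAtomicExchange(mutex, coord, 0u);
    } else {
        try++; // lock is held by another invocation, try again
    }
}
```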

My problem with this is the following: because doWork() takes some time (basically a couple of image store operations), the waiting threads running on another warp/wavefront can burn through their tries very quickly and simply give up. If I increase MAX_TRY to counter this, performance drops drastically, because each try requires an expensive atomic memory access. If all threads contending for the same lock ran on the same warp I wouldn't have this problem; sadly, this is not always the case.
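One commonly used variant that targets the cost of each try is test-and-test-and-set: spin on a plain read of the lock and only issue the atomic exchange when the lock looks free. The sketch below is a complete fragment shader built on assumptions that are not in the original: the mutex is an r32ui image at binding 0, the protected data is a hypothetical rgba32f image `dataImg` at binding 1, `coord` is the fragment's own pixel, and `doWork()` is a stand-in doing a single load/store. Whether a coherent image load is actually cheaper than an atomic depends on the hardware, so this is something to measure rather than a guaranteed win.

```glsl
#version 450

// Assumed bindings (not from the original post):
// binding 0: r32ui lock image, 0 = free, 1 = held
// binding 1: rgba32f image holding the data the critical section updates
layout(binding = 0, r32ui) uniform coherent volatile uimage2D mutex;
layout(binding = 1, rgba32f) uniform coherent image2D dataImg;

// Assumed retry budget; plain loads make a larger budget cheaper than the
// same number of atomics, but the right value has to be measured.
const int MAX_TRY = 1000;

// Hypothetical stand-in for doWork(): one read-modify-write on dataImg.
void doWork(ivec2 coord) {
    vec4 v = imageLoad(dataImg, coord);
    imageStore(dataImg, coord, v + vec4(1.0));
}

void main() {
    ivec2 coord = ivec2(gl_FragCoord.xy);
    bool done = false;
    for (int i = 0; i < MAX_TRY && !done; ++i) {
        // Test with a plain read first; && short-circuits in GLSL, so the
        // atomic exchange (test-and-set) only runs when the lock looks free.
        if (imageLoad(mutex, coord).x == 0u &&
            imageAtomicExchange(mutex, coord, 1u) == 0u) {
            doWork(coord);
            // Make doWork()'s image stores visible before releasing the lock.
            memoryBarrierImage();
            imageAtomicExchange(mutex, coord, 0u);
            done = true;
        }
    }
}
```

The critical section stays inside the loop body, as in the original, which avoids the classic SIMT deadlock where the lock holder waits on a divergent branch behind its own spinning siblings. The `volatile` qualifier forces the spinning `imageLoad` to be re-executed on every iteration instead of being hoisted out of the loop.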

Now my question is: are there better-suited patterns for this?
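Where the driver exposes GL_ARB_fragment_shader_interlock, and the memory being protected belongs to the fragment's own pixel, that extension can take the place of a hand-written mutex: the hardware serializes overlapping fragments between `beginInvocationInterlockARB()` and `endInvocationInterlockARB()`, so no invocation spins or gives up. This is a minimal sketch under assumptions; `dataImg`, its binding, and the single load/store standing in for `doWork()` are hypothetical. It does not apply if `coord` is not the fragment's own pixel, because the interlock only orders fragments covering the same pixel (or the same sample, with the `sample_interlock_*` layouts).

```glsl
#version 450
#extension GL_ARB_fragment_shader_interlock : require

// Ordered pixel interlock: fragments that cover the same pixel execute the
// interlocked region one at a time, in primitive order.
layout(pixel_interlock_ordered) in;

// Hypothetical data image (binding assumed); no mutex image is needed.
layout(binding = 1, rgba32f) uniform coherent image2D dataImg;

void main() {
    ivec2 coord = ivec2(gl_FragCoord.xy);

    // Each call must appear once, directly in main(), outside any flow control.
    beginInvocationInterlockARB();
    // Critical section: stand-in for doWork(), a read-modify-write.
    vec4 v = imageLoad(dataImg, coord);
    imageStore(dataImg, coord, v + vec4(1.0));
    endInvocationInterlockARB();
}
```

If ordering between overlapping fragments does not matter, `pixel_interlock_unordered` only guarantees mutual exclusion and may be cheaper.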