I’ve stumbled upon some strange behaviour while playing with compute shaders. I have a basic OIT implementation using per-pixel linked lists (although my guess is that any application using a fairly large local array in a shader will exhibit the same effect). In the resolve pass, a fragment shader rendering a full-screen quad fills a local vec4 array with that pixel’s fragment data from video memory, sorts it, then alpha blends. Using and sorting the local array is the bottleneck of the whole app.

If I then run a compute shader that declares a large local array, just once, OIT speeds up by ~40% for the remainder of the application’s life. Restarting the app brings back the normal speed. Sorting, or some similarly expensive operation, is necessary to notice the speedup. The compute shader I use is below. Note the “random” uniform, which is always zero but is required to stop the array from being compiled away. myBigArray must be sufficiently large or the speedup is not observed.
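For context, the resolve pass looks roughly like this (a simplified sketch, not my exact code; the node layout, bindings, and a fixed per-pixel fragment cap of 16 are illustrative assumptions):

```glsl
#version 430
layout(early_fragment_tests) in;

struct Node { vec4 color; float depth; uint next; };

// per-pixel list heads and the shared node pool, filled by an earlier pass
layout(binding = 0, r32ui) uniform uimage2D headPointers;
layout(binding = 0, std430) buffer NodeBuffer { Node nodes[]; };

out vec4 fragColor;

void main()
{
    // gather this pixel's fragments into local arrays -- the expensive part
    vec4  frags[16];
    float depths[16];
    int count = 0;
    uint idx = imageLoad(headPointers, ivec2(gl_FragCoord.xy)).r;
    while (idx != 0xFFFFFFFFu && count < 16) {
        frags[count]  = nodes[idx].color;
        depths[count] = nodes[idx].depth;
        idx = nodes[idx].next;
        count++;
    }

    // insertion sort back-to-front (largest depth first)
    for (int i = 1; i < count; i++) {
        vec4  c = frags[i];
        float d = depths[i];
        int j = i - 1;
        while (j >= 0 && depths[j] < d) {
            frags[j + 1]  = frags[j];
            depths[j + 1] = depths[j];
            j--;
        }
        frags[j + 1]  = c;
        depths[j + 1] = d;
    }

    // composite front over back
    vec4 color = vec4(0.0);
    for (int i = 0; i < count; i++)
        color = mix(color, frags[i], frags[i].a);
    fragColor = color;
}
```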
I’m on a GTX 670 with 313.18 drivers. The same thing happens on a 660, and I’ve tried a few other driver versions too.
#version 430
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;

uniform int random; // always zero; stops the array being compiled away
layout(rgba8) uniform image2D someTexture;

void main()
{
    // must be sufficiently large or the speedup is not observed
    vec4 myBigArray[256];
    myBigArray[0] = vec4(1, 0, 0, 1);
    imageStore(someTexture, ivec2(0), myBigArray[random]);
}
Again, if I run the above compute shader with a single glDispatchCompute(1, 1, 1) call (even with zero bound to the image unit), other shaders that use large local arrays speed up for the remainder of the application’s life. It must be a compute shader; vertex and fragment shaders do not trigger it.
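For completeness, the one-time “warm-up” I do is just the following (a sketch; compilation/linking boilerplate omitted, and computeProgram is my name for the program containing the shader above):

```c
/* One-off warm-up dispatch after context creation. */
glUseProgram(computeProgram);
glUniform1i(glGetUniformLocation(computeProgram, "random"), 0);
/* binding texture 0 to the image unit still triggers the effect */
glBindImageTexture(0, 0, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA8);
glDispatchCompute(1, 1, 1);
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
```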
I can only guess that dispatching a compute shader with a big local array triggers some driver state change, enabling an optimization that then gets applied to other shaders too.
Has anyone else noticed this? Can you speculate on a cause?