OpenGL Compute Shaders vs OpenCL
I've been working on computing an image histogram using OpenGL compute shaders, but it's very slow. I divide the image's rows between threads, and each thread computes the histogram of its own rows. I use the imageLoad() function to read pixels from an image texture.
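For reference, the row-per-thread approach described above can be sketched roughly as follows. This is a simplified illustration rather than the exact shader: the binding points, the rgba8ui format, the rowsPerThread uniform, the 256-bin buffer, and the work group size of 64 are all assumptions made for the sketch.

```glsl
#version 430

// Assumed work group size; each invocation processes one band of rows.
layout(local_size_x = 64) in;

// Assumed binding and format for the input image texture.
layout(binding = 0, rgba8ui) readonly uniform uimage2D inputImage;

// Assumed output: a single 256-bin histogram of the red channel.
layout(std430, binding = 1) buffer Histogram {
    uint bins[256];
};

// Hypothetical uniform: number of rows handled by each invocation.
uniform uint rowsPerThread;

void main() {
    ivec2 size  = imageSize(inputImage);
    uint  start = gl_GlobalInvocationID.x * rowsPerThread;
    uint  end   = min(start + rowsPerThread, uint(size.y));

    // Walk this invocation's rows and count every pixel's red value.
    for (uint y = start; y < end; ++y) {
        for (uint x = 0u; x < uint(size.x); ++x) {
            uvec4 color = imageLoad(inputImage, ivec2(x, y));
            atomicAdd(bins[color.r], 1u);
        }
    }
}
```

In this sketch every invocation updates the same global bins with atomicAdd(), which keeps it short but may itself be a source of contention.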
To isolate the problem, I measured the performance of a compute shader that only sums a constant value, but it's still very slow:
    // start and end are the first and one-past-last rows assigned to this thread
    for (uint i = start; i < end; ++i) {
        for (uint j = 0; j < 480; ++j) {
            uint mask = 1;
            uvec4 color = uvec4(1);
            sum += color.r + mask;
        }
    }
I want to know whether OpenGL compute shaders run inside the OpenGL rendering pipeline or on the CUDA multiprocessors. Right now the code above seems to run as slowly as equivalent fragment shader code.
On my GTX 460 I have 7 CUDA multiprocessors (OpenCL compute units) running at 1526 MHz, with 336 shader units in total. It should be possible to execute the above loop extremely fast on multiprocessors clocked at 1526 MHz, shouldn't it?
Please clarify for me the difference between OpenGL compute shaders and OpenCL. Where do they run? What's the cost of switching between OpenCL and OpenGL?