You practically hijacked the thread, as this isn’t really related to the original question, but I’ll still try to answer because you brought up an interesting topic.
So here are my answers:
It is possible that there is a completely different compiler behind OpenGL compute shaders and OpenCL kernels.
Also, not all compute capabilities present in OpenCL are available in OpenGL.
Not to mention that the way synchronization is handled in OpenCL is wildly different from the way it is handled in OpenGL.
Finally, it may even vary from hardware to hardware.
All of these are options. Personally, for read-only data I’d prefer using texture fetches, simply because some hardware might have a different path for storage buffers or load/store images, as those are R/W data sources.
Also, there could be a difference between storage buffer and load/store image implementations as well: the latter has a fixed element size, while the former doesn’t really have the notion of an element at all, so dynamic indexing in particular could result in different performance in the two cases.
Another thing is that storage buffers, image buffers, and texture buffers access linear memory, while other images and textures usually access tiled memory, which on its own can make a huge difference in performance.
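To make the three access paths concrete, here is a minimal sketch of a compute shader that reads the same texel through all of them (the GLSL is embedded as a C string; the binding points and the 256-wide buffer layout are just assumptions for illustration, not a recommendation):

    /* GLSL 4.30 compute shader source embedded in a C string; the results
       are unused on purpose, this only demonstrates the three read paths. */
    static const char *read_paths_cs =
        "#version 430\n"
        "layout(local_size_x = 8, local_size_y = 8) in;\n"
        /* Read-only path: goes through the texture units and texture cache,
           and may read tiled memory. */
        "layout(binding = 0) uniform sampler2D srcTex;\n"
        /* R/W path 1: storage buffer, linear memory, no fixed element size. */
        "layout(std430, binding = 0) buffer SrcBuf { float srcData[]; };\n"
        /* R/W path 2: load/store image, fixed element size per format. */
        "layout(rgba8, binding = 0) readonly uniform image2D srcImg;\n"
        "void main() {\n"
        "    ivec2 p = ivec2(gl_GlobalInvocationID.xy);\n"
        "    vec4  a = texelFetch(srcTex, p, 0);   // texture fetch\n"
        "    float b = srcData[p.y * 256 + p.x];   // storage buffer load\n"
        "    vec4  c = imageLoad(srcImg, p);       // image load\n"
        "}\n";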
No, why would you? Unless you plan to use data written by the compute shader through image stores, storage buffer writes, or atomic counter writes, you don’t have to. The memory barrier rules are the same as before.
Also note that while calling glMemoryBarrier is not free and people are afraid of its performance, don’t think that other write-to-read hazards, like those caused by framebuffer writes or transform feedback writes, are free either. They are simply implicit: no additional API call is made, but the synchronization still happens behind the scenes. In a way that is even worse than the new mechanism, because here at least the app developer has explicit control over whether he needs the sync or not.
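As an example of that explicit control, a minimal sketch (assuming a GL 4.3 context set up with a loader such as GLEW, and placeholder program/buffer names) could look like this:

    #include <GL/glew.h>

    /* Dispatch a compute shader that writes an SSBO, then draw geometry
       that reads the same buffer. The barrier is only needed because of
       the storage buffer writes, and the bit describes how the data will
       be READ afterwards. */
    void dispatch_then_draw(GLuint compute_prog, GLuint draw_prog, GLuint ssbo)
    {
        glUseProgram(compute_prog);
        glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo);
        glDispatchCompute(64, 64, 1);

        /* Without this, the draw below might read stale data. */
        glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

        glUseProgram(draw_prog);
        glDrawArrays(GL_TRIANGLES, 0, 3);
    }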
Maybe on some hardware, maybe not on others. Fragment shaders are kind of different from compute shaders. They are instantiated by the rasterizer, which means that the granularity (work group size in compute shader terminology) might be different. Compute shaders provide more explicit behavior: if you specify a work group size of 16x16, you are guaranteed that those invocations will run on the same compute unit, as they may share memory, while the number of fragment shader instances that are issued to a single compute unit, and which fragments they process, is determined by the rasterizer and can vary wildly between different GPUs.
Also, the individual shader instances might be submitted to the actual ALUs in a different pattern for compute shaders than for fragment shaders, so accesses to the various types of resources (linear or tiled) also end up in different patterns, and one can be worse than the other. But all this depends on the GPU design, the type of the resource you access, and the access pattern of your shader.
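For reference, the explicit work group guarantee looks like this in GLSL (again embedded as a C string; the 16x16 size and the 1024-wide buffer stride are arbitrary assumptions for illustration):

    static const char *tile_cs =
        "#version 430\n"
        /* All 16x16 = 256 invocations of one group are guaranteed to run on
           the same compute unit and can communicate through shared memory. */
        "layout(local_size_x = 16, local_size_y = 16) in;\n"
        "shared float tile[16][16];\n"
        "layout(std430, binding = 0) buffer Data { float values[]; };\n"
        "void main() {\n"
        "    uvec2 l = gl_LocalInvocationID.xy;\n"
        "    tile[l.y][l.x] = values[gl_GlobalInvocationID.y * 1024u\n"
        "                            + gl_GlobalInvocationID.x];\n"
        "    barrier(); /* the whole group syncs here */\n"
        "    /* every invocation can now safely read its neighbours' data */\n"
        "}\n";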
A benefit of using fragment shaders is that you can use framebuffer writes to output data, which is almost guaranteed to be faster than writing storage buffers or performing image writes. You can even perform limited atomic read-modify-write operations when doing framebuffer writes, thanks to blending, color logic op, and stencil operations.
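For example, additive blending already gives you a hardware-resolved dst += src on every framebuffer write, without any explicit atomics (a minimal sketch, assuming the context and render target are already set up):

    /* Every fragment written while this state is active performs
       dst = dst + src, resolved by the blending hardware. */
    void enable_additive_accumulation(void)
    {
        glEnable(GL_BLEND);
        glBlendEquation(GL_FUNC_ADD);
        glBlendFunc(GL_ONE, GL_ONE);
    }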
Finally, note that GPUs don’t rasterize quads, so if you do compute with a fragment shader you are actually rendering two triangles, which means that across the diagonal edge where the two triangles meet, on some hardware, you might end up having half-full groups of shader instances executed on a compute unit, which on its own already results in a slight drop in overall performance.
To sum it up, there is no general answer to whether OpenCL is better than OpenGL compute shaders, or whether OpenGL compute shaders are better than fragment shaders. It all depends on the hardware, the driver, your shader code, and the problem you want to solve.
What I can suggest, based on what I’ve heard from developers, is that if you want to do some compute stuff in a graphics application that already uses OpenGL, you are better off not using OpenCL/OpenGL interop, as it seems that the interop performance is usually pretty bad, independent of GPU generation or vendor.