I've been looking into offloading some of the work in my renderer to CL kernels, but so far it has proven futile.
I've come across two problems. The first is that GL_ARB_sync and cl_khr_gl_event don't appear to be implemented on any hardware, forcing me to synchronize via glFinish and clFinish. Causing a pipeline stall here is just about the worst thing you can do for performance, and this alone kills any practical integration between OpenCL and OpenGL. Is there some secret handshake needed to get this working properly, or are people just conveniently forgetting this when they talk about CL<->GL interop?
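For reference, here is a minimal sketch of the kind of check that would confirm whether an extension is actually advertised. It assumes the extension list is a space-separated string, which is what clGetDeviceInfo with CL_DEVICE_EXTENSIONS returns (and glGetString(GL_EXTENSIONS) in a compatibility context). The helper name has_extension is hypothetical, not part of either API. It matches whole tokens only, so that a longer name with the same prefix does not produce a false positive.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a whole
 * space-separated token in `list`, and 0 otherwise.
 * `list` is assumed to be an extension string such as the one
 * returned by clGetDeviceInfo(..., CL_DEVICE_EXTENSIONS, ...). */
int has_extension(const char *list, const char *name)
{
    size_t len;
    const char *p;

    if (list == NULL || name == NULL || *name == '\0')
        return 0;

    len = strlen(name);
    p = list;
    while ((p = strstr(p, name)) != NULL) {
        /* A real match starts at the beginning of the list or after a space... */
        int starts_token = (p == list) || (p[-1] == ' ');
        /* ...and ends at the end of the list or before a space. */
        int ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return 1;
        p += len; /* substring hit inside a longer name; keep searching */
    }
    return 0;
}
```

Calling has_extension(device_extensions, "cl_khr_gl_event") on the device's extension string, and has_extension(gl_extensions, "GL_ARB_sync") on the GL one, should say whether the driver claims support at all, as opposed to the entry points merely being present or missing in the headers.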
The second, stranger problem is that creating an OpenCL context that is shared with the GL context (and then doing nothing with it) causes my frame rate to drop from 115 fps to about 30 fps. This doesn't happen if the CL context is created without GL sharing.