OpenCL interop performance

I’ve been looking into offloading some work in my renderer to CL kernels, but so far this has proven to be futile.

I’ve come across two problems. The first is that GL_ARB_sync and cl_khr_gl_event don’t appear to be implemented on any hardware, forcing me to synchronize via glFinish and clFinish. Causing a pipeline stall here is just about the worst thing you can possibly do for performance, and this alone kills any practical integration between OpenCL and OpenGL. Is there some secret handshake needed to get this working properly, or are people just conveniently forgetting this when they talk about CL<->GL interop?

The second, stranger, problem is that creating an OpenCL context (and then doing nothing with it) and sharing it with the GL context causes my frame rate to drop from 115 to about 30. This doesn’t happen if the GL context is not shared.

After further investigation, it seems that creating a texture and attaching it to an FBO while the GL context is shared with CL causes apparently all operations on both the CPU and GPU to slow down until that texture is deleted.

Edit: Just updated to the latest AMD beta drivers (Catalyst 12.6 Beta), and the problem appears to have gone away. So it was a driver bug.

I hit this a few years back when I was stuck with OpenCL 1.0. Fortunately I didn’t need to swap between GL and CL much so it wasn’t completely prohibitive, just bothersome.

OpenCL 1.1 drivers are out now, and I would have thought they’d support it. However, checking NVIDIA’s latest public beta drivers (302.07b) by running oclDeviceQuery, I see:


  CL_PLATFORM_VERSION:   OpenCL 1.1 CUDA 4.2.1
  OpenCL SDK Revision:   7027912
  ...
  CL_DEVICE_NAME:        GeForce GTX 560 Ti
  CL_DEVICE_VENDOR:      NVIDIA Corporation
  CL_DRIVER_VERSION:     302.07
  ...
  CL_DEVICE_EXTENSIONS:  cl_khr_byte_addressable_store
                         cl_khr_icd
                         cl_khr_gl_sharing
                         cl_nv_compiler_options
                         cl_nv_device_attribute_query
                         cl_nv_pragma_unroll
                         cl_khr_global_int32_base_atomics
                         cl_khr_global_int32_extended_atomics
                         cl_khr_local_int32_base_atomics
                         cl_khr_local_int32_extended_atomics
                         cl_khr_fp64

Nope, still no cl_khr_gl_event on the CL side (and no ARB_cl_event on the GL side). Wonder what the hold-up on this is? The lack of these extensions discourages heavy use of OpenGL and OpenCL on the same GPU.
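
For anyone who would rather test for this at runtime than eyeball the oclDeviceQuery dump, here is a minimal sketch in C. It assumes you have already fetched the space-separated extension string yourself (for example with clGetDeviceInfo and CL_DEVICE_EXTENSIONS) and only does the string matching. The helper name has_extension is made up for illustration and is not part of any OpenCL API. It matches whole tokens so that a longer extension name sharing the same prefix does not count as a hit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of OpenCL): returns true if `name` occurs
 * as a complete, space-delimited token inside `extensions`.
 * `extensions` is assumed to be the string reported for
 * CL_DEVICE_EXTENSIONS, i.e. extension names separated by single spaces. */
bool has_extension(const char *extensions, const char *name)
{
    if (extensions == NULL || name == NULL || *name == '\0')
        return false;

    size_t len = strlen(name);
    const char *p = extensions;

    /* Walk over every substring match and accept only those that are
     * bounded by the string start/end or by a space on both sides. */
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == extensions) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
        p += 1; /* keep searching after this partial match */
    }
    return false;
}
```

With the extension list above, has_extension(ext, "cl_khr_gl_sharing") would be true and has_extension(ext, "cl_khr_gl_event") would be false, in which case the glFinish/clFinish fallback is still the only option.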

As some consolation, several nice features added to OpenGL in recent years have reduced the cases where you’d otherwise have needed to resort to OpenCL (that’d actually make an excellent SIGGRAPH/GDC course – showing how a few classically GPGPU graphics-related techniques can be mapped to OpenCL and GLSL 4.2 and the pros/cons of each).

Just tracked down why I thought cl_khr_gl_event / ARB_cl_event was an OpenCL 1.1 feature:

See pg. 36-37. Sure sounds like that’s paired with OpenCL 1.1. And if you check the OpenCL 1.1 Specification, you find (see pg. 332-336) that this functionality is in there, but it’s described as an optional extension.

Double-checking, if I dump the symbols exported by NVIDIA’s OpenCL 1.1 library, I find no clCreateEventFromGLsyncKHR. Alas, no help there…