OpenGL and OpenCL running simultaneously

Hello after a long absence :slight_smile:

I can’t find the answer to this anywhere, so I’m posting here. Maybe I should register on the OpenCL forums for that, but it seems pointless to create another account for one question.

Let’s assume I have an application that renders some stuff “as fast as possible” and also uses OpenCL to do physics “in realtime”, at some constant rate (let’s say 500 FPS). I wonder how GPU time / compute units would be allocated in that case and whether physics would really work “in realtime” this way.

So, a particular case:
The CPU issues glDrawElements with a lot of polygons to render, and some more commands after that. The GPU has already started executing that glDrawElements command and of course has more pending commands in the queue.

Now another thread on the CPU tries to execute a short OpenCL program.

What will happen?

  1. OpenCL will have to wait until glDrawElements is finished?
  2. OpenCL command will be scheduled for execution after all commands currently in OpenGL queue have been executed?
  3. OpenCL will kick in immediately, putting OpenGL on hold?

and

A. OpenCL will take some compute units, letting OpenGL run in parallel
B. OpenCL will take all compute units

Of course OpenCL is independent from OpenGL, so I don’t see why scenario 2 should be true. I assume scenario 1A or 1B is most likely, but perhaps the driver can split the GPU job at a lower level, effectively allowing OpenCL to kick in in the middle of glDrawElements execution?

I am not experienced, but there is an async mode in GPUs that can allow this. It also depends upon the load on the GPU. For fully loaded multiprocessors, scheduling would take place and scenario 2 would happen.

I’m no OpenCL guru, but I may be able to give you some key search terms to read up on further.

This discussion assumes you want to use the same GPU for both GL and CL rendering (or in CL terms, your CL device is the one associated with your GL context). If it’s a different GPU, this doesn’t apply.

IIRC, in the absence of ARB_cl_event / cl_khr_gl_event, you have to glFinish() when swapping the device from GL to CL usage, and clFinish() when swapping from CL to GL usage. This flushes the command queue so the device is only doing one or the other at a time. The event extensions provide a more efficient way of sharing the GPU without a full CPU-coordinated pipeline flush, and may or may not result in CL and GL running exclusively on the GPU (that appears to be up to the implementation).
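Roughly, the explicit-finish handoff looks like this each frame (a minimal sketch; queue/kernel setup is omitted, and shared_buf is a placeholder assumed to have been created with clCreateFromGLBuffer under cl_khr_gl_sharing):

```c
#include <CL/cl_gl.h>   /* clEnqueueAcquireGLObjects / clEnqueueReleaseGLObjects */
#include <GL/gl.h>

/* Hypothetical per-frame handoff between GL and CL on one device. */
void frame_update(cl_command_queue queue, cl_kernel kernel, cl_mem shared_buf)
{
    size_t global_size = 4096;          /* placeholder work size */

    glFinish();                         /* GL -> CL: drain the GL pipeline first */
    clEnqueueAcquireGLObjects(queue, 1, &shared_buf, 0, NULL, NULL);
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size,
                           NULL, 0, NULL, NULL);
    clEnqueueReleaseGLObjects(queue, 1, &shared_buf, 0, NULL, NULL);
    clFinish(queue);                    /* CL -> GL: drain the CL queue before drawing */

    /* ... now issue the frame's OpenGL rendering ... */
}
```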

(This is from memory/experience months ago, adding some OpenCL code for GPU crunching to our renderer, so that each frame both OpenGL and OpenCL are used for GPU crunching. At that time, ARB_cl_event / cl_khr_gl_event weren’t available, so I did explicit finishes, but it was still fast because I only had 1 GL->CL and 1 CL->GL switch per frame.)
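For completeness, with those event extensions the same handoff could in principle be done without full CPU-side finishes. A sketch, assuming cl_khr_gl_event / ARB_cl_event are actually exposed (in practice clCreateEventFromGLsyncKHR and glCreateSyncFromCLeventARB have to be fetched as extension function pointers):

```c
#include <CL/cl_gl.h>
#include <GL/gl.h>

/* Hypothetical flush-free handoff; context/queue/kernel/shared_buf as before. */
void frame_update_with_events(cl_context context, cl_command_queue queue,
                              cl_kernel kernel, cl_mem shared_buf)
{
    size_t global_size = 4096;          /* placeholder work size */
    cl_int err;

    /* GL -> CL: fence the GL commands and make CL wait on them GPU-side. */
    GLsync gl_fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    cl_event gl_done = clCreateEventFromGLsyncKHR(context, gl_fence, &err);
    clEnqueueAcquireGLObjects(queue, 1, &shared_buf, 1, &gl_done, NULL);
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size,
                           NULL, 0, NULL, NULL);

    /* CL -> GL: turn the release event into a GL sync and wait GPU-side. */
    cl_event cl_done;
    clEnqueueReleaseGLObjects(queue, 1, &shared_buf, 0, NULL, &cl_done);
    clFlush(queue);                     /* ensure the CL commands are submitted */
    GLsync cl_fence = glCreateSyncFromCLeventARB(context, cl_done, 0);
    glWaitSync(cl_fence, 0, GL_TIMEOUT_IGNORED);

    /* ... now issue the frame's OpenGL rendering ... */
}
```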

This discussion assumes you want to use the same GPU for both GL and CL rendering

Precisely.

I only had 1 GL->CL and 1 CL->GL switch per frame

Yeah, that’s understandable, but in the case I described, OpenCL is running at a different “framerate” than OpenGL.

Assume both threads are unrelated and share no buffers. Graphics is rendered at some framerate using OpenGL, handled by one CPU core, and physics runs at a constant, high framerate using OpenCL, controlled by a second CPU core.
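Something like this, schematically (a sketch with hypothetical helpers render_frame, step_physics, and wait_for_next_tick; each thread owns its own API context and they never touch the same buffers):

```c
#include <pthread.h>
#include <CL/cl.h>

/* Hypothetical helpers, defined elsewhere. */
void render_frame(void);                /* issues the OpenGL draw calls    */
void step_physics(cl_command_queue q);  /* enqueues the physics CL kernel  */
void wait_for_next_tick(void);          /* paces the loop at ~500 steps/s  */

static void *render_thread(void *arg)
{
    (void)arg;
    for (;;)
        render_frame();                 /* “as fast as possible”           */
    return NULL;
}

static void *physics_thread(void *arg)
{
    cl_command_queue queue = (cl_command_queue)arg;
    for (;;) {
        step_physics(queue);
        clFinish(queue);                /* block until this step completes */
        wait_for_next_tick();           /* hold the constant physics rate  */
    }
    return NULL;
}
```

The question is then whether the clFinish() in the physics thread ever stalls behind a long glDrawElements issued by the render thread.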

My question is all about the influence of rendering on the ability to maintain a constant framerate in OpenCL.

My case study is car simulation. The tire is a deformable object made of many small fragments. This seems like a good thing to simulate on the GPU (just like cloth simulation). The problem is that the tires must produce force feedback effects on the steering wheel.
Just imagine a car driving 200 km/h over a rumble strip - you need a decent framerate in your physics to produce a realistic FFB feeling. You also need some assurance that physics will not be put on hold for long when the GPU is busy rendering.

So it’s not about synchronizing OpenGL with OpenCL - they do independent tasks anyway and don’t need to exchange any data, so I actually don’t want them to be synchronized.

We could even assume that rendering is done using Direct3D and physics is done using OpenCL; it doesn’t matter. What I’m interested in is how the GPU will react to being “bombarded” with two different tasks (rendering and computing) from two different application threads that don’t really care about each other. Will my NVIDIA/ATI drivers give me the illusion of parallelism, or will it feel like rendering puts computing on hold and vice versa?

Wouldn’t it work like a normal CPU? Everything gets scheduled unless there are multiple cores.

Everything gets scheduled unless there are multiple cores.

There are multiple cores; that’s where GPUs get their performance from these days. They’re hugely multicore. Good GPUs can have 16+ cores, each of which is capable of 16-20 floating-point operations per cycle (not necessarily different operations, mind you).

The question is how drivers go about distributing the load for compute vs. rendering. The answer is… unknown. It’s going to depend on the driver. There is no way to know without just trying it, and even then, it can change on you.

Yeah, that is the essence of the question.
I haven’t found any information on this subject on the net, so that’s what I assumed - “unknown / driver-dependent”. That was just my assumption, though, so I asked here, hoping that someone actually knows it’s this way.

Thanks everyone for all the answers. Any additional info is of course welcome, but that more or less confirms what I’ve been thinking :slight_smile:

Current GPUs are not preemptive, so you can’t switch from one task to another, and you can lock up screen refresh with long-running OpenCL kernels. IMHO the driver just schedules commands for the GPU, so OpenCL computation can kick in any time during rendering. Also, OpenCL currently takes the whole GPU. There is cl_ext_device_fission, which can divide OpenCL devices (http://www.khronos.org/registry/cl/extensions/ext/cl_ext_device_fission.txt), but it is currently supported only on CPU devices.
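If it helps, the fission call itself looks roughly like this (a sketch; under OpenCL 1.1 clCreateSubDevicesEXT has to be fetched via clGetExtensionFunctionAddress, and again this only works where the extension is supported, i.e. CPU devices today):

```c
#include <CL/cl_ext.h>   /* cl_ext_device_fission declarations */

/* Hypothetical: partition a device into sub-devices of 4 compute units each,
 * so a compute workload could be confined to a slice of the hardware. */
void split_device(cl_device_id device)
{
    cl_device_partition_property_ext props[] = {
        CL_DEVICE_PARTITION_EQUALLY_EXT, 4, CL_PROPERTIES_LIST_END_EXT
    };
    cl_device_id sub_devices[8];
    cl_uint num_sub = 0;

    clCreateSubDevicesEXT(device, props, 8, sub_devices, &num_sub);
    /* each sub_devices[i] can then get its own context and command queue */
}
```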

Did you mean “can’t”? :slight_smile:

One glDraw* call is most likely uninterruptible, as it runs like any other kernel. But I don’t think that OpenCL must wait for glFinish or SwapBuffers; an OpenCL kernel can be scheduled by the driver in the middle of frame rendering.

To my knowledge, from some discussions with NVIDIA engineers, switching from GL to CL/CUDA and vice versa will flush and stall the respective pipeline. There currently is no overlapping of GL and compute :p. One could use Nsight or GPUView in individual cases to verify that…

Yes, but it depends on whether the “cores” are able to operate independently (collisions aside) or not. Just clarifying my point. It’s an interesting topic.

Seems like hardware is tailored to the consumer application, unless it is exotic, in which case it was probably commissioned for some industrial application.

There currently is no overlapping of GL and compute.

On NVIDIA.

Seems like hardware is tailored to the consumer application

You mean consumer GPUs are tailored… for consumers? That’s crazy!

^Well, as opposed to optimized for consumer applications (i.e. GPU architectures are not (yet) as general-purpose as a hacker/experimenter might like; applications are narrowly defined).

I ended up on this (http://en.wikipedia.org/wiki/Raster_Operations) page somehow today. It might be relevant to this inquiry.

It seems to suggest (in my reading) that graphics hardware tends to be tightly coupled, but is moving in the other direction over time, opening up possibilities for more fluid scheduling.