OpenGL and OpenCL running simultaneously



k_szczech
01-18-2012, 03:36 PM
Hello after a long absence :)

I can't find the answer to this anywhere, so I'm posting here. Maybe I should register on the OpenCL forums for that, but it seems pointless to create another account for one question.


Let's assume I have an application that renders some stuff "as fast as possible" and is also using OpenCL to do physics "in realtime", at some constant rate (let's say 500 FPS). I wonder how GPU time / compute units would be allocated in that case and if physics would really work "in realtime" this way.

So, a particular case:
The CPU issues glDrawElements with a lot of polygons to render, and some more commands after that. The GPU has already started executing that glDrawElements command and of course has more pending commands in the queue.

Now another thread on the CPU tries to execute a short OpenCL program.

What will happen?

1. OpenCL will have to wait until glDrawElements is finished?
2. OpenCL command will be scheduled for execution after all commands currently in OpenGL queue have been executed?
3. OpenCL will kick-in immediately putting OpenGL on hold?

and

A. OpenCL will take some compute units, letting OpenGL run in parallel
B. OpenCL will take all compute units


Of course OpenCL is independent of OpenGL, so I don't see why scenario 2 should be true. I assume scenario 1A or 1B is most likely, but perhaps the driver can split the GPU job at a lower level, effectively allowing OpenCL to kick in in the middle of glDrawElements execution?
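To make scenarios 1 to 3 concrete, here is a rough timing model in C. It is only a sketch and does not describe how any real driver behaves. It assumes the GL queue is a list of commands with known durations in microseconds, with the first one already partly executed. The names `policy_t` and `cl_wait_us` are hypothetical, introduced just for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policies matching scenarios 1-3 from the post. */
typedef enum {
    WAIT_FOR_CURRENT_DRAW = 1, /* 1: CL starts when the running draw call ends */
    WAIT_FOR_WHOLE_QUEUE  = 2, /* 2: CL starts after everything already queued */
    PREEMPT_IMMEDIATELY   = 3  /* 3: CL starts at once, GL is put on hold      */
} policy_t;

/*
 * Returns how long (in microseconds) the CL kernel waits before it starts.
 * gl_us[0] is the command currently executing, gl_us[1..n-1] are pending.
 * elapsed_us is how much of gl_us[0] has already run.
 */
long cl_wait_us(const long *gl_us, size_t n, long elapsed_us, policy_t policy)
{
    assert(n > 0 && elapsed_us >= 0 && elapsed_us <= gl_us[0]);

    long remaining_current = gl_us[0] - elapsed_us;
    long wait = 0;

    switch (policy) {
    case PREEMPT_IMMEDIATELY:
        wait = 0;
        break;
    case WAIT_FOR_CURRENT_DRAW:
        wait = remaining_current;
        break;
    case WAIT_FOR_WHOLE_QUEUE:
        wait = remaining_current;
        for (size_t i = 1; i < n; ++i)
            wait += gl_us[i];
        break;
    }
    return wait;
}
```

With a queue of {4000, 1500, 500} microseconds and the first command 1000 microseconds in, the model gives a wait of 3000 for scenario 1, 5000 for scenario 2 and 0 for scenario 3. A 500 Hz physics step has a 2000 microsecond period, so in this made-up case even scenario 1 would miss a step.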

awhig
01-18-2012, 04:02 PM
I am not experienced, but there is an async mode on the GPU that can allow this. It also depends on the load on the GPU. When the multiprocessors are fully loaded, the work gets scheduled and scenario 2 will happen.

Dark Photon
01-19-2012, 05:36 AM
So, a particular case:
The CPU issues glDrawElements with a lot of polygons to render, and some more commands after that. The GPU has already started executing that glDrawElements command and of course has more pending commands in the queue.

Now another thread on the CPU tries to execute a short OpenCL program.

What will happen?
I'm no OpenCL guru, but may be able to give you some key search terms to read on further.

This discussion assumes you want to use the same GPU for both GL and CL rendering (or in CL terms, your CL device is the one associated with your GL context). If they run on different GPUs, this doesn't apply.

IIRC, in the absence of ARB_cl_event (http://www.opengl.org/registry/specs/ARB/cl_event.txt) / cl_khr_gl_event (http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/gl_event.html), you have to call glFinish() when switching the device from GL to CL usage, and clFinish() when switching from CL to GL usage. This flushes the command queue so the device is only doing one or the other at a time. The event extensions provide a more efficient way of sharing the GPU without a full CPU-coordinated pipeline flush, and may or may not result in CL and GL running exclusively on the GPU (it appears that is up to the implementation).

(This is from memory of work done months ago, adding some OpenCL code for GPU crunching to our renderer, so that each frame both OpenGL and OpenCL are used for GPU crunching. At that time, ARB_cl_event (http://www.opengl.org/registry/specs/ARB/cl_event.txt) / cl_khr_gl_event (http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/gl_event.html) weren't available, so I did explicit finishes, but it was still fast because I only had 1 GL->CL and 1 CL->GL switch per frame.)

Some follow-up reading:
* ARB_cl_event (http://www.opengl.org/registry/specs/ARB/cl_event.txt)
* cl_khr_gl_event (http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/gl_event.html)
* OpenCL Programming Guide (http://books.google.com/books?id=M-Sve_KItQwC&pg=PA349&lpg=PA349&dq=ARB_cl_event+cl_khr_gl_event&source=bl&ots=cIPoUtYii6&sig=7AOeJK-cT3ZQuBHv6_1BVzVBGuE&hl=en&sa=X&ei=Nh4YT8yHDoLn0QH4odnOCw&ved=0CD4Q6AEwBA) (see the section on "Synchronization between OpenGL and OpenCL")
* OpenCL Forums (http://www.khronos.org/message_boards/viewforum.php?f=27)

k_szczech
01-19-2012, 06:33 AM
This discussion assumes you want to use the same GPU for both GL and CL rendering
Precisely.


I only had 1 GL->CL and 1 CL->GL switch per frame
Yeah, that's understandable, but in the case I described OpenCL is running at a different "framerate" than OpenGL.

Assume both threads are unrelated and share no buffers. Graphics is rendered at some framerate using OpenGL, handled by one CPU core, and physics runs at a constant, high framerate using OpenCL, controlled by a second CPU core.

My question is all about the influence of rendering on the ability to maintain a constant framerate in OpenCL.


My case study is a car simulation. A tire is a deformable object made of many small fragments. This seems like a good candidate for simulation on the GPU (just like cloth simulation). The problem is that the tires must produce force feedback effects on the steering wheel.
Just imagine a car driving at 200 km/h over a rumble strip - you need a decent framerate in your physics to produce a realistic FFB feeling. You also need some assurance that physics will not be put on hold for long when the GPU is busy rendering.
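The rumble-strip case can be turned into quick arithmetic. The C sketch below computes how far the car travels per physics step, and how far it travels while physics is stalled behind rendering. The helper names `mm_per_step` and `mm_during_stall` are hypothetical, and the 10 ms stall used in the note afterwards is an assumed figure, not a measurement.

```c
#include <assert.h>

/* Distance in millimetres the car covers during one physics step.
 * 1 km/h = 1,000,000 mm per 3600 s. Integer division rounds down. */
long long mm_per_step(long long speed_kmh, long long rate_hz)
{
    assert(speed_kmh >= 0 && rate_hz > 0);
    return (speed_kmh * 1000000LL) / (3600LL * rate_hz);
}

/* Distance in millimetres covered while physics is stalled for stall_us
 * microseconds (for example, waiting behind GPU rendering work). */
long long mm_during_stall(long long speed_kmh, long long stall_us)
{
    assert(speed_kmh >= 0 && stall_us >= 0);
    return (speed_kmh * stall_us) / 3600LL;
}
```

At 200 km/h and 500 steps per second the car moves about 111 mm per step. If physics were held up for an assumed 10 ms, the car would cover roughly 555 mm with no force-feedback update, which is why a bound on stall time matters more here than average throughput.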

So it's not about synchronizing OpenGL with OpenCL - they do independent tasks anyway and don't need to exchange any data, so I actually don't want them to be synchronized.

We could even assume that rendering is done using Direct3D and physics is done using OpenCL. It doesn't matter. What I'm interested in is how the GPU will react to being "bombarded" with two different tasks (rendering and computing) from two different application threads that don't really care about each other. Will my NVIDIA/ATI drivers give me the illusion of parallelism, or will it feel like rendering puts computing on hold and vice versa?

michagl
01-19-2012, 06:42 PM
Wouldn't it work like a normal CPU? Everything gets scheduled unless there are multiple cores.

Alfonse Reinheart
01-19-2012, 07:40 PM
Everything gets scheduled unless there are multiple cores.

There are multiple cores; that's where GPUs get their performance from these days. They're hugely multicore. Good GPUs can have 16+ cores, each of which is capable of 16-20 floating-point operations per cycle (not necessarily different operations, mind you).

The question is how drivers go about distributing the load for compute vs. rendering. The answer is... unknown. It's going to depend on the driver. There is no way to know without just trying it, and even then, it can change on you.
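Since the behaviour has to be found out by trying it, one way to summarise an experiment is to log when each physics step finished and look at the gaps. The C sketch below assumes you already have such completion times in microseconds, for example host timestamps taken after each clFinish(). The `jitter_t` type and `measure_jitter` function are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    long long worst_gap_us; /* longest interval between two consecutive steps */
    size_t    late_steps;   /* number of intervals longer than the budget     */
} jitter_t;

/* done_us[i] is the time at which physics step i completed.
 * budget_us is the intended step period, e.g. 2000 for 500 steps/second. */
jitter_t measure_jitter(const long long *done_us, size_t n, long long budget_us)
{
    assert(n >= 2);

    jitter_t j = {0, 0};
    for (size_t i = 1; i < n; ++i) {
        long long gap = done_us[i] - done_us[i - 1];
        if (gap > j.worst_gap_us)
            j.worst_gap_us = gap;
        if (gap > budget_us)
            ++j.late_steps;
    }
    return j;
}
```

Running the same log collection with rendering on and off, and comparing `worst_gap_us`, would show how much a particular driver lets rendering delay compute. As noted above, the result may change with driver versions.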

k_szczech
01-19-2012, 11:00 PM
The question is how drivers go about distributing the load for compute vs. rendering. The answer is... unknown. It's going to depend on the driver.
Yeah, that is the essence of the question.
I haven't found any information on this subject on the net so that's what I assumed - "unknown / driver dependent". Although this was just my assumption, so I asked here, hoping that someone actually knows it's this way.

Thanks everyone for all the answers. Any additional info is of course welcome, but that more or less confirms what I've been thinking :)

bugmenot
01-20-2012, 12:55 AM
Current GPUs are not preemptive, so you can't switch from one task to another. You can lock up screen refresh with long-running OpenCL kernels. IMHO the driver just schedules commands for the GPU, so an OpenCL computation can kick in at any time during rendering. Also, OpenCL currently takes the whole GPU. There is cl_ext_device_fission, which can divide OpenCL devices (http://www.khronos.org/registry/cl/extensions/ext/cl_ext_device_fission.txt), but it is currently supported only on CPU devices.

k_szczech
01-20-2012, 01:06 AM
IMHO the driver just schedules commands for the GPU, so an OpenCL computation can kick in at any time during rendering.
Did you mean "can't"? :)

bugmenot
01-20-2012, 11:07 AM
One glDraw* call is most likely uninterruptible, as it runs like any other kernel. But I don't think that OpenCL must wait for glFinish or SwapBuffers; an OpenCL kernel can be scheduled by the driver in the middle of frame rendering.
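If draw calls are uninterruptible but compute can slip in between them, the worst delay for a periodic physics tick is set by the longest draw call it can land inside. The C sketch below models that idea. It assumes one frame's draw calls run back to back starting at time 0, that a tick can only start at a draw-call boundary or once the GPU is idle, and that the compute kernel itself takes no time. The function name `worst_tick_wait_us` is hypothetical, and none of this reflects a documented driver policy.

```c
#include <assert.h>
#include <stddef.h>

/*
 * draw_us[i]  : duration of draw call i in microseconds.
 * period_us   : interval between physics ticks (2000 for 500 ticks/second).
 * horizon_us  : ticks are issued at 0, period_us, ... below this time.
 * Returns the longest time any tick waits for a draw-call boundary.
 */
long long worst_tick_wait_us(const long long *draw_us, size_t n_draws,
                             long long period_us, long long horizon_us)
{
    assert(period_us > 0);

    long long worst = 0;
    for (long long tick = 0; tick < horizon_us; tick += period_us) {
        /* Walk the draw calls until a boundary at or after the tick. */
        long long boundary = 0;
        size_t i = 0;
        while (i < n_draws && boundary < tick)
            boundary += draw_us[i++];

        /* If all draws ended before the tick, the GPU is idle: no wait. */
        long long wait = boundary >= tick ? boundary - tick : 0;
        if (wait > worst)
            worst = wait;
    }
    return worst;
}
```

In this model a frame of {500, 6000, 1000} microseconds makes the tick at 2000 wait 4500 microseconds, while a frame of four 1000 microsecond draws causes no wait at all. Under these assumptions, splitting large draw calls into smaller ones would reduce the worst-case delay for compute.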

Chris Lux
01-22-2012, 11:30 PM
To my knowledge, from some discussions with Nvidia engineers, switching from GL to CL/CUDA and vice versa will flush and stall the respective pipeline. There currently is no overlapping of GL and compute :p. One could use Nsight or GPUView to verify that in individual cases.

michagl
01-23-2012, 04:25 AM
Everything gets scheduled unless there are multiple cores.

There are multiple cores; that's where GPUs get their performance from these days. They're hugely multicore. Good GPUs can have 16+ cores, each of which is capable of 16-20 floating-point operations per cycle (not necessarily different operations, mind you).

The question is how drivers go about distributing the load for compute vs. rendering. The answer is... unknown. It's going to depend on the driver. There is no way to know without just trying it, and even then, it can change on you.

Yes, but it depends on whether the "cores" are able to operate independently (collisions aside) or not. Just clarifying my point. It's an interesting topic.

Seems like hardware is tailored to the consumer application, unless it is exotic, in which case it was probably commissioned for some industrial application.

Alfonse Reinheart
01-23-2012, 06:59 AM
There currently is no overlapping of GL and compute.

On NVIDIA.


Seems like hardware is tailored to the consumer application

You mean consumer GPUs are tailored... for consumers? That's crazy!

michagl
01-25-2012, 10:29 PM
^Well, as opposed to optimized for consumer applications (i.e. GPU architectures are not (yet) as general-purpose as a hacker/experimenter might like; applications are narrowly defined)

michagl
01-27-2012, 11:03 AM
I ended up on this (http://en.wikipedia.org/wiki/Raster_Operations) page somehow today. It might be relevant to this inquiry.

It seems to suggest (my reading) that graphics hardware tends to be tightly coupled but is moving in the other direction over time, opening up possibilities for more fluid scheduling.