Finally, OpenGL 4.3 is out, and it has Compute Shaders.
Now the only thing missing is the rasterizer stage exposed in OpenCL!
I don't want to disappoint you guys, but Compute Shaders are far away from both CUDA and OpenCL.
In case you've missed the overview section of the spec, please read the following:
"Another difference is that OpenCL is more full featured and includes features such as multiple devices, asynchronous queues and strict IEEE semantics for floating point operations. This extension follows the semantics of OpenGL - implicitly synchronous, in-order operation with single-device, single queue logical architecture and somewhat more relaxed numerical precision requirements."
The last remark is the crucial one for me. It is why I stopped delving further into the CS spec: GLSL precision is too "relaxed" for serious computation.
From what I've read of Nvidia's compute and graphics contexts, this is the price you pay for avoiding the context switch between compute and graphics. A compute shader seems to be adequate for effects like depth of field, which the extension appears to be targeting. In these cases, it's quick and convenient, and offers good performance.
For shaders where precision is an issue, you're pretty much stuck with the context switch -- at least until hardware reaches the point where compute and graphics can be run simultaneously on different processors on the same GPU (if ever). Then it might hide some of the latency cost of context switching.
May I ask for the reference? The precision depends on the way GLSL is implemented, not on the contexts.
Once again, the precision problem is in the GLSL implementation. It uses hardware-accelerated functions whose finite precision is much worse than that of their CPU counterparts. But the speed is tremendous. For example, trigonometric functions execute in a single clock. In fact, you can get both sine and cosine in a single clock. By the way, NVIDIA has much better precision than AMD. But this is not the place or time for a discussion about the implementation.
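To make the speed-versus-precision trade-off concrete, here is a minimal CPU-side sketch in Python. It is not how any GPU implements its trig units; the degree-5 polynomial, the `fast_sin` and `to_f32` names, and the sampling grid are all assumptions chosen for illustration. It only shows how a cheap, fixed-cost approximation drifts away from a reference sine.

```python
import math
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def fast_sin(x):
    """Hypothetical cheap sine: degree-5 Taylor polynomial in float32 steps.

    Evaluates x * (1 - x^2/6 + x^4/120) in Horner form, rounding each
    intermediate result to single precision.
    """
    x = to_f32(x)
    x2 = to_f32(x * x)
    p = to_f32(1.0 / 120.0)
    p = to_f32(p * x2 - 1.0 / 6.0)
    p = to_f32(p * x2 + 1.0)
    return to_f32(p * x)


def max_abs_error(fn, lo, hi, steps=1000):
    """Largest |fn(x) - math.sin(x)| over an evenly spaced grid on [lo, hi]."""
    worst = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        worst = max(worst, abs(fn(x) - math.sin(x)))
    return worst


if __name__ == "__main__":
    half_pi = math.pi / 2
    print("cheap polynomial:", max_abs_error(fast_sin, -half_pi, half_pi))
    print("float32-rounded reference:",
          max_abs_error(lambda x: to_f32(math.sin(x)), -half_pi, half_pi))
```

On [-pi/2, pi/2] the cheap polynomial is off by something on the order of 1e-3, while simply rounding an accurate sine to float32 stays below 1e-7. The gap between those two numbers is the kind of difference a loose precision requirement permits.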
Do you mean the use of Hyper-Q? I'm not sure whether it can be used to mix graphics and computation. Do you have any reference that claims this is possible?
"Once again, the precision problem is in the GLSL implementation."
The point he's making is that the OpenCL specification requires a certain level of precision that the GLSL specification does not. Therefore, you cannot rely on getting OpenCL-level precision from GLSL code.
It is a matter of specification, not merely implementation. Because the specification defines what the implementation can do. The reason AMD gets away with lower precision sin/cos is precisely because the specification lets them. Currently, the spec says that the precision on trig functions is undefined. If it had specific limits on precision... well, odds are good that AMD would veto any such proposal, but failing that they would have to improve the precision on their sin/cos functions (in theory, of course. In practice, there's no conformance test, so there's no way to know for certain whether they're implementing the precision guarantees).
A loose specification leads to a lot of variation and lower precision. A tight specification does not. OpenCL is tighter than GLSL with regard to precision.
Yes, as I understand it, you can run multiple contexts from different applications concurrently. I would expect mixing graphics and GPGPU to be possible as well; however, I have no hard proof of it.
I don't see why running two graphics contexts from different programs should be doable but a graphics and a GPGPU context of the same program shouldn't.
"May I ask for the reference? The precision depends on the way GLSL is implemented, not on the contexts."
Certainly:
http://techreport.com/articles.x/17670/2
"Better scheduling, faster switching", near the end of that section
http://www.nvidia.com/content/PDF/fe...d_Graphics.pdf (pdf warning)
Figure 5 caption (page 9)
New Concurrency for Global Kernels (page 14,15)
These are for Fermi; I'm not sure what Kepler brings to the table. Fermi must flush caches when switching between compute and graphics tasks, and cannot run them at the same time. It's stated that it can do a switch in as little as 25 microseconds, but that doesn't take into account the fact that you'll get tons of cache misses right after the switch, and probably some idling processors near the end of a compute/graphics task, right before the switch.
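For a rough sense of scale, here is a back-of-the-envelope sketch in Python. Only the 25 microsecond figure comes from the material above; the 60 FPS target, the number of switches per frame, and the function name are assumptions. It deliberately ignores the cache misses and idle processors mentioned above, so treat the result as a lower bound on the real cost.

```python
SWITCH_COST_US = 25.0  # quoted best-case switch time; real cost is likely higher


def switch_overhead_fraction(switches_per_frame, fps, switch_cost_us=SWITCH_COST_US):
    """Fraction of one frame's time budget spent purely on context switches."""
    frame_budget_us = 1_000_000.0 / fps
    return switches_per_frame * switch_cost_us / frame_budget_us


if __name__ == "__main__":
    # Assumed workload: graphics -> compute -> graphics once per frame at 60 FPS.
    fraction = switch_overhead_fraction(switches_per_frame=2, fps=60)
    print(f"{fraction:.2%} of the frame budget")
```

Under these assumptions the nominal switch time alone is a small slice of the frame, which suggests that the secondary effects (cold caches, idle units around the switch) are what would actually decide whether the switch hurts.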
And yes, the precision issue itself has to do with OpenGL vs. OpenCL, so it's possible we could see a GL_ARB_shader_high_precision extension at some point. Then you wouldn't have to trade off performance (in the form of a context switch penalty) for precision like you do now, so the compute shader could do more heavy lifting without needing to jump to OpenCL/CUDA as often. Alternatively, future hardware could make this penalty so minimal that it becomes a non-issue, in which case a compute shader just becomes a programming convenience.
My point is that there are applications where precision isn't an issue that would benefit from the lack of a context switch, and this is what compute shaders currently appear to be designed for.
Thanks for the articles, malexander!
I didn't say CS are not useful; they could certainly be faster because they execute in the same context as the graphics API. I just said that CS cannot be a substitute for CUDA/CL in general.
I've been using VS/TF for calculation for a long time. I've solved the precision problems by implementing my own functions for critical operations and by passing more parameters to shaders.
When I heard about CS in GLSL I thought it could save some gymnastics in my code, but... Who knows, maybe it is better to stay as it is...
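As an illustration of what "implementing my own functions" can look like, here is a minimal sketch in Python of a sine built from range reduction plus a longer polynomial. This is not the poster's actual code, which isn't shown in the thread; the names `my_sin` and `reduce_to_half_pi` and the choice of a degree-11 Taylor polynomial are assumptions. A shader version would follow the same structure in GLSL.

```python
import math

# Taylor coefficients of sin(r)/r in powers of r^2, up to the r^11 term.
COEFFS = [(-1.0) ** n / math.factorial(2 * n + 1) for n in range(6)]


def reduce_to_half_pi(x):
    """Fold x into [-pi/2, pi/2] using sin(r + k*pi) = (-1)^k * sin(r)."""
    k = round(x / math.pi)
    r = x - k * math.pi
    sign = -1.0 if k % 2 else 1.0
    return r, sign


def my_sin(x):
    """Hypothetical replacement sine: range reduction, then Horner evaluation."""
    r, sign = reduce_to_half_pi(x)
    r2 = r * r
    acc = 0.0
    for c in reversed(COEFFS):
        acc = acc * r2 + c
    return sign * acc * r


if __name__ == "__main__":
    worst = max(abs(my_sin(x / 100.0) - math.sin(x / 100.0))
                for x in range(-1000, 1001))
    print("max abs error on [-10, 10]:", worst)
```

The design choice is the usual one: shrink the input range first so that a fixed-length polynomial stays accurate, then pay for a few extra multiply-adds instead of relying on whatever precision the built-in happens to provide.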