Part of the Khronos Group
OpenGL.org

Thread: Compute Shaders in OpenGL

  1. #11
    Junior Member Newbie
    Join Date
    Jun 2011
    Location
    Erlangen, Germany
    Posts
    13
    Finally, OpenGL 4.3 is out, and it has Compute Shaders.

  2. #12
    Member Regular Contributor
    Join Date
    Apr 2009
    Posts
    258
    Now the only thing missing is the rasterizer stage exposed in OpenCL

  3. #13
    Advanced Member Frequent Contributor Aleksandar's Avatar
    Join Date
    Jul 2009
    Posts
    949
    I don't want to disappoint you guys, but Compute Shaders are far away from both CUDA and OpenCL.
    In case you missed the overview section of the spec, please read the following:
    Another difference is that OpenCL is more full featured and includes features such as multiple devices, asynchronous queues and strict IEEE semantics for floating point operations. This extension follows the semantics of OpenGL - implicitly synchronous, in-order operation with single-device, single queue logical architecture and somewhat more relaxed numerical precision requirements.
    The last remark is what made me stop delving further into the CS spec, since GLSL precision is too "relaxed" for serious computation.

  4. #14
    Junior Member Regular Contributor malexander's Avatar
    Join Date
    Aug 2009
    Location
    Ontario
    Posts
    249
    From what I've read of Nvidia's compute and graphics contexts, this is the price you pay for avoiding the context switch between compute and graphics. A compute shader seems to be adequate for effects like depth of field, which the extension appears to be targeting. In these cases, it's quick and convenient, and offers good performance.

    For shaders where precision is an issue, you're pretty much stuck with the context switch -- at least until hardware reaches the point where compute and graphics can be run simultaneously on different processors on the same GPU (if ever). Then it might hide some of the latency cost of context switching.

  5. #15
    Member Regular Contributor
    Join Date
    Jan 2012
    Location
    Germany
    Posts
    302
    Quote Originally Posted by malexander View Post
    at least until hardware reaches the point where compute and graphics can be run simultaneously on different processors on the same GPU (if ever).
    AFAIK Kepler can do that.

  6. #16
    Advanced Member Frequent Contributor Aleksandar's Avatar
    Join Date
    Jul 2009
    Posts
    949
    Quote Originally Posted by malexander View Post
    From what I've read of Nvidia's compute and graphics contexts, this is the price you pay for avoiding the context switch between compute and graphics.
    May I ask for the reference? The precision depends on the way GLSL is implemented, not on the contexts.

    Quote Originally Posted by malexander View Post
    For shaders where precision is an issue, you're pretty much stuck with the context switch -- at least until hardware reaches the point where compute and graphics can be run simultaneously on different processors on the same GPU (if ever). Then it might hide some of the latency cost of context switching.
    Once again, the precision problem is in the GLSL implementation. It uses hardware-accelerated functions whose finite precision is much worse than that of their CPU counterparts. But the speed is tremendous. For example, trigonometric functions execute in a single clock. In fact, you can get both sine and cosine in a single clock. By the way, NVIDIA has much better precision than AMD. But this is not the place and time for a discussion about the implementation.

    Quote Originally Posted by menzel
    AFAIK Kepler can do that.
    Do you mean using Hyper-Q? I'm not quite sure whether it can be used to mix graphics and computation. Do you have any reference that claims this is possible?

  7. #17
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,728
    Once again, the precision problem is in the GLSL implementation.
    The point he's making is that the OpenCL specification requires a certain level of precision that the GLSL specification does not. Therefore, you cannot rely on getting OpenCL-level precision from GLSL code.

    It is a matter of specification, not merely implementation. Because the specification defines what the implementation can do. The reason AMD gets away with lower precision sin/cos is precisely because the specification lets them. Currently, the spec says that the precision on trig functions is undefined. If it had specific limits on precision... well, odds are good that AMD would veto any such proposal, but failing that they would have to improve the precision on their sin/cos functions (in theory, of course. In practice, there's no conformance test, so there's no way to know for certain whether they're implementing the precision guarantees).

    A loose specification leads to a lot of variation and lower precision. A tight specification does not. OpenCL is tighter than GLSL with regard to precision.
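
    To make "tighter with regard to precision" concrete: OpenCL states its accuracy requirements for math functions in ULPs (units in the last place). The sketch below shows one way to measure that error for a single-precision result against a double-precision reference. It runs on the CPU and emulates binary32 rounding with `struct`; `crude_sin` is a hypothetical stand-in for a fast, low-accuracy sine, not any vendor's actual implementation, and the helper names are made up for illustration.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ulp_f32(x: float) -> float:
    """Spacing of binary32 values at the magnitude of x (subnormals clamped)."""
    if x == 0.0:
        return 2.0 ** -149
    exponent = math.frexp(abs(x))[1]  # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return 2.0 ** max(exponent - 24, -149)


def ulp_error(approx: float, reference: float) -> float:
    """Error of approx, in binary32 ULPs measured at the reference value."""
    return abs(approx - reference) / ulp_f32(reference)


def crude_sin(x: float) -> float:
    """Hypothetical low-accuracy sine: degree-5 Taylor polynomial in binary32."""
    x = to_f32(x)
    x2 = to_f32(x * x)
    p = to_f32(1.0 / 120.0)
    p = to_f32(to_f32(p * x2) - to_f32(1.0 / 6.0))
    p = to_f32(to_f32(p * x2) + 1.0)
    return to_f32(p * x)


def max_ulp_error(func, lo: float, hi: float, steps: int = 1000) -> float:
    """Worst ULP error of func against math.sin on evenly spaced samples."""
    worst = 0.0
    for i in range(steps + 1):
        x = to_f32(lo + (hi - lo) * i / steps)
        worst = max(worst, ulp_error(func(x), math.sin(x)))
    return worst


if __name__ == "__main__":
    worst = max_ulp_error(crude_sin, -math.pi / 2, math.pi / 2)
    print(f"worst error of crude_sin: {worst:.1f} ULP")
```

    A specification that bounds this number gives you something to test against; one that leaves it undefined does not, whatever a particular driver happens to deliver.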

  8. #18
    Member Regular Contributor
    Join Date
    Jan 2012
    Location
    Germany
    Posts
    302
    Quote Originally Posted by Aleksandar View Post
    Do you mean using Hyper-Q? I'm not quite sure whether it can be used to mix graphics and computation. Do you have any reference that claims this is possible?
    Yes. As I understand it, you can run multiple contexts from different applications concurrently. I would expect mixing graphics and GPGPU to be possible as well, but I have no hard proof of that.
    I don't see why running two graphics contexts from different programs should be doable but a graphics and a GPGPU context of the same program shouldn't.

  9. #19
    Junior Member Regular Contributor malexander's Avatar
    Join Date
    Aug 2009
    Location
    Ontario
    Posts
    249
    May I ask for the reference? The precision depends on the way GLSL is implemented, not on the contexts.
    Certainly:

    http://techreport.com/articles.x/17670/2
    "Better scheduling, faster switching", near the end of that section

    http://www.nvidia.com/content/PDF/fe...d_Graphics.pdf (pdf warning)
    Figure 5 caption (page 9)
    New Concurrency for Global Kernels (page 14,15)

    These are for Fermi; I'm not sure what Kepler brings to the table. Fermi must flush caches when switching between compute and graphics tasks, and cannot run them at the same time. It's stated that it can do a switch in as little as 25 microseconds, but that doesn't take into account the fact that you'll get tons of cache misses right after the switch, and probably some idling processors near the end of a compute/graphics task, right before the switch.

    And yes, the precision issue itself has to do with OpenGL vs. OpenCL, so it's possible we could see a GL_ARB_shader_high_precision extension at some point. Then you wouldn't have to trade off performance (in the form of a context switch penalty) for precision like you do now, so the compute shader could do more heavy lifting without needing to jump to OpenCL/CUDA as often. Alternatively, future hardware could make this penalty so minimal that it becomes a non-issue, in which case a compute shader just becomes a programming convenience.

    My point is that there are applications where precision isn't an issue that would benefit from the lack of a context switch, and this is what compute shaders currently appear to be designed for.

  10. #20
    Advanced Member Frequent Contributor Aleksandar's Avatar
    Join Date
    Jul 2009
    Posts
    949
    Thanks for the articles, malexander!

    I didn't say CS are not useful; they certainly could be faster because they execute in the same context as the graphics API. I just said that CS cannot be a substitute for CUDA/CL in general.
    I've been using VS/TF for calculation for a long time. I solved the precision problems by implementing my own functions for critical operations and by passing more parameters to the shaders.
    When I heard about CS in GLSL I thought it could save some gymnastics in my code, but... Who knows, maybe it is better that things stay as they are...
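
    The post does not say which functions were reimplemented, so the following is only an assumed illustration of the general idea: a sine built from a two-constant range reduction (pi split into a high and a low part, which is one reading of "passing more parameters") followed by an odd polynomial evaluated with Horner's scheme. It is written in Python with binary32 rounding emulated through `struct`, not in GLSL, and the names `sin_f32`, `PI_HI`, `PI_LO` and `INV_PI` are hypothetical.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# pi split into two binary32 constants so the reduction loses fewer bits.
PI_HI = f32(math.pi)
PI_LO = f32(math.pi - PI_HI)
INV_PI = f32(1.0 / math.pi)

# Taylor coefficients for x^11, x^9, x^7, x^5, x^3 (highest degree first).
COEFFS = [f32(c) for c in (-1 / 39916800, 1 / 362880, -1 / 5040, 1 / 120, -1 / 6)]


def sin_f32(x: float) -> float:
    """Sine using only binary32-rounded operations (moderate |x| assumed)."""
    x = f32(x)
    # Range reduction: x = k*pi + r, so sin(x) = (-1)^k * sin(r).
    k = round(f32(x * INV_PI))
    r = f32(x - f32(k * PI_HI))
    r = f32(r - f32(k * PI_LO))
    # sin(r) ~ r + r^3 * P(r^2), with P evaluated by Horner's scheme.
    r2 = f32(r * r)
    p = COEFFS[0]
    for c in COEFFS[1:]:
        p = f32(f32(p * r2) + c)
    result = f32(r + f32(f32(r * r2) * p))
    return -result if k % 2 else result


if __name__ == "__main__":
    for x in (0.5, 2.0, 10.0):
        print(x, sin_f32(x), math.sin(x))
```

    This is a sketch, not a tuned implementation: a production version would normally use minimax coefficients and a more careful reduction for large arguments.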
