Part of the Khronos Group
OpenGL.org

Thread: OpenGL 3 Updates

  1. #1011
    Member Regular Contributor
    Join Date
    Apr 2004
    Location
    UK
    Posts
    420

    Re: OpenGL 3 Updates

    Right, so DX11 gets compute shaders as an addition to the regular shader stages; as far as I can tell, they aren't talking about "doing it all via compute shaders".

    As an addition to a normal API it's no skin off my nose, tbh (although don't expect it on anything older than DX10-class hardware), but the idea of 'hey! let's forget this is a graphics API and go GPGPU mad!' is just dumb right now.

    The reason the user can't write 'generic C++ code with a few intrinsics' is that, while GPUs are probably now more complex than CPUs, they still don't have the same range of functionality. Also, C++ would be a dumb choice simply because it's a pain-in-the-arse language at the best of times. If it were going to be done, it would really need to be a new language, although it'll probably be C-syntax based since, for some reason, the industry loves its C syntax o.O
    (*hugs Lua*)

  2. #1012
    Junior Member Regular Contributor
    Join Date
    Aug 2007
    Location
    USA
    Posts
    243

    Re: OpenGL 3 Updates

    I wonder how a compute shader differs from stream out (transform feedback in GL).

  3. #1013
    Zengar, Senior Member OpenGL Pro
    Join Date
    Sep 2001
    Location
    Germany
    Posts
    1,932

    Re: OpenGL 3 Updates

    It probably has a scatter ability.
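    To make the stream out vs. scatter distinction concrete, here is a minimal CPU-side sketch in which plain Python loops stand in for shader invocations (the function names are hypothetical, not any API). With stream out / transform feedback, invocation i writes its result to slot i of the output buffer, in order; with scatter, each invocation computes its own destination.

```python
def stream_out(inputs, f):
    """Transform-feedback style: invocation i may only write output slot i."""
    out = [None] * len(inputs)
    for i, x in enumerate(inputs):  # each iteration stands in for one invocation
        out[i] = f(x)
    return out


def scatter(inputs, f, dest_index, out_size):
    """Scatter style: each invocation chooses where its result lands."""
    out = [0] * out_size
    for x in inputs:
        # Destination is computed per invocation; if two invocations pick the
        # same slot, the last write wins in this sequential model.
        out[dest_index(x)] = f(x)
    return out


data = [3, 0, 2, 1]
print(stream_out(data, lambda x: x * 10))               # [30, 0, 20, 10]
print(scatter(data, lambda x: x * 10, lambda x: x, 4))  # [0, 10, 20, 30]
```

    The second call places each value at an index derived from the value itself, which is something stream out cannot express directly.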

  4. #1014
    Junior Member Regular Contributor
    Join Date
    Aug 2007
    Location
    USA
    Posts
    243

    Re: OpenGL 3 Updates

    That would be quite awesome, Zengar. There are definitely some exciting times ahead of us.

  5. #1015
    Member Regular Contributor
    Join Date
    Apr 2004
    Location
    UK
    Posts
    420

    Re: OpenGL 3 Updates

    Hmmm, I wonder if scatter is still the performance killer it used to be; GPUs and GDDR are designed for a certain write pattern, which scatter totally fails at.

  6. #1016
    knackered, Senior Member OpenGL Guru
    Join Date
    Aug 2001
    Location
    UK
    Posts
    2,833

    Re: OpenGL 3 Updates

    Isn't a GPU with 'scatter ability' basically a CPU?
    Surely it's this inherent limitation that makes them so fast...

  7. #1017
    Ilian Dinev, Senior Member OpenGL Pro
    Join Date
    Jan 2008
    Location
    Watford, UK
    Posts
    1,290

    Re: OpenGL 3 Updates

    Well, if you have a nice big cache, then it's not much harder on the GDDR bus than alpha blending with 2-3 source textures at 16x anisotropic filtering, imho. The problem would be when 100 shader invocations scatter data to the same pixel at once.
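    The same-pixel worst case can be put in numbers with a toy model. This is an assumption for illustration, not how any particular GPU resolves conflicts: if writes to the same address have to be serialized (or funnelled through an atomic/blend unit), the cost is driven by the most heavily targeted address, not by the total number of writes.

```python
from collections import Counter


def collision_depth(dest_addresses):
    """Worst-case number of writes that hit a single address.

    Toy model (assumption): writes to one address are serialized, so this is
    a rough lower bound on the steps the hottest address needs.
    """
    counts = Counter(dest_addresses)
    return max(counts.values()) if counts else 0


spread = list(range(100))  # 100 invocations, 100 distinct pixels
hotspot = [42] * 100       # 100 invocations all targeting pixel 42
print(collision_depth(spread))   # 1
print(collision_depth(hotspot))  # 100
```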

  8. #1018
    Member Regular Contributor
    Join Date
    Apr 2004
    Location
    UK
    Posts
    420

    Re: OpenGL 3 Updates

    The thing is, alpha blending a source texel into the framebuffer and reading 2 or 3 textures with 16x AF are two different things.

    Alpha blending relies on the ROPs and some readback; however, because GPUs render coherently, you are still reading in blocks.

    The source textures also tend to be read coherently in blocks and then held in a nice large cache for other SPs (stream processors) to read from as required.

    But the key point is that both operations can be done in blocks (although they cause their own contention issues, which is part of why the Xbox 360's GPU has that 10 MB eDRAM daughter die, so that you aren't reading from and writing to main memory at once).

    Now, while there is some caching on output, it is pretty much there to allow block writes; once you start scattering, your block-writing scheme goes out the window, because the GPU can't just cache data all over the place when it doesn't know when it will be able to write it back.

    ATI's R500 could do scatter writes, but the CTM docs indicated you would take a speed hit because the writes were (iirc) uncached.
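    The block-write argument can be illustrated by counting how many memory segments a batch of writes touches. A minimal sketch, assuming a 32-byte write granularity and 32 writes of 4 bytes each (both values are illustrative, not taken from any GPU spec): coherent neighbouring writes collapse into a few full blocks, while scattered writes each need their own partially filled block.

```python
def segments_touched(byte_addresses, segment_bytes=32):
    """Number of distinct memory segments a batch of writes touches.

    segment_bytes=32 is an assumed write granularity, not a real GPU spec.
    """
    return len({addr // segment_bytes for addr in byte_addresses})


ELEM = 4  # bytes per written element, e.g. one RGBA8 pixel
coherent = [i * ELEM for i in range(32)]   # neighbours write neighbouring addresses
scattered = [i * 4096 for i in range(32)]  # every write lands in a different region
print(segments_touched(coherent))   # 4  -> 128 bytes go out as 4 full block writes
print(segments_touched(scattered))  # 32 -> one mostly empty segment per write
```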

  9. #1019
    Junior Member Regular Contributor
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    163

    Re: OpenGL 3 Updates

    It will certainly be interesting to see what functionality is found in the DX11 compute shader. My hope is that SM5 adds something like memexport-type functionality that is accessible from all shader stages (somewhat like texture fetch is universal in SM4, excluding the dx/dy derivative-based variants).

    As for the fixed-function graphics hardware, a very GPGPU way to look at rasterization / triangle setup is as a very fast way to group threads into SIMD vectors for computation. It effectively serves a very important job-setup function which would otherwise be very slow in "software". I still prefer GL/SM4 over CUDA and the like for this very reason. I think it is safe to say that the core fixed-function hardware will stay in hardware for a long time (excluding Larrabee, of course).
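    To show what "rasterization as thread grouping" means, here is a deliberately simplified software sketch: it tests pixel centres against a counter-clockwise triangle with edge functions, then packs the covered pixels into batches of an assumed SIMD width of 32. Real hardware walks tiles and 2x2 quads rather than scanning row by row; the function names and the SIMD width are hypothetical.

```python
def edge(ax, ay, bx, by, px, py):
    """Edge function: >= 0 when (px, py) lies on the left of edge a->b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize_to_simd_groups(v0, v1, v2, width, height, simd_width=32):
    """Find pixels whose centres lie inside a CCW triangle; pack them into batches."""
    covered = []
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel centre
            if (edge(*v0, *v1, px, py) >= 0
                    and edge(*v1, *v2, px, py) >= 0
                    and edge(*v2, *v0, px, py) >= 0):
                covered.append((x, y))
    # Each batch stands in for one SIMD group that would run the pixel shader.
    return [covered[i:i + simd_width] for i in range(0, len(covered), simd_width)]


groups = rasterize_to_simd_groups((0, 0), (8, 0), (0, 8), 8, 8)
print([len(g) for g in groups])  # [32, 4]
```

    The triangle covers 36 pixel centres, so the "hardware" launches one full group and one partially filled group; that packing step is the job-setup work that would be slow to do in software.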

    Somewhat in the way that instancing has enabled GPUs to draw a huge number of objects, this GPGPU functionality is going to enable some of us to do even more scene-traversal work GPU-side, eventually to the point where we can manage level of detail and occlusion on the GPU and solve the problem of huge view distances.

    Now if only GL would catch up to DX in supporting functionality that has been in GPUs for some time now.

  10. #1020
    Junior Member Regular Contributor
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    163

    Re: OpenGL 3 Updates

    Quote, originally posted by bobvodka:
    > Now, while there is some caching on output this is pretty much to allow block writes; when you start scattering your block writing scheme goes out the window as the GPU can't just cache stuff all over the place as it doesn't know when it will be able to write back.
    >
    > The R500 from ATI could do scatter write, but the CTM docs indicated you would take a speed hit because the writes would be (iirc) uncached.

    Scattered reads/writes can be OK as long as the granularity is similar to the bus width. Scattering FP32 vec4s (16 bytes) is much more bandwidth-efficient (4x) than scattering INT8 vec4s (4 bytes). If your device memory granularity is 32 bytes, a scatter of FP32 vec4s eats up double the bandwidth of the ideal non-scattered case, which isn't as bad as the 8x bandwidth taken by a scattered INT8 vec4 write. The same goes for the performance of random texture reads: texture cache hit rates drop off quite considerably when reading randomly from compressed textures...
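    The bandwidth arithmetic above can be checked directly. A small sketch, assuming (as the post does) a 32-byte memory transaction size and that every scattered element occupies its own aligned transaction:

```python
import math


def scatter_bandwidth_factor(element_bytes, granularity_bytes=32):
    """Bytes moved per useful byte when each element goes to a random location.

    Assumes no two scattered elements share a transaction and each element
    is aligned within its transaction.
    """
    transactions = math.ceil(element_bytes / granularity_bytes)
    return transactions * granularity_bytes / element_bytes


fp32_vec4 = 4 * 4  # 16 bytes
int8_vec4 = 4 * 1  # 4 bytes
print(scatter_bandwidth_factor(fp32_vec4))  # 2.0
print(scatter_bandwidth_factor(int8_vec4))  # 8.0
print(scatter_bandwidth_factor(int8_vec4) / scatter_bandwidth_factor(fp32_vec4))  # 4.0
```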

    Of course, you still have to be able to hide the latency of scattered reads, which is exactly what the GPU is designed for, and why it has such an advantage in solving problems that cannot fit into a cache.

    So for general scatter/gather to be efficient on GPUs (and CPUs, for that matter), you need to think in terms of scatter/gather on full (bus-width-sized) "objects" instead of individual values, then "transpose" between AoS (array of structures, i.e. objects) and SoA (structure of arrays, i.e. vectorized objects) to do efficient computation. FYI, this is exactly the point of CUDA's shared memory: fetch into shared memory in bus-width-sized chunks, then swizzle into a vector-friendly format for computation.
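    The AoS/SoA transpose can be sketched on the CPU. In this minimal Python stand-in (function names are hypothetical), whole objects are gathered first, staged, swizzled into per-field arrays for the vector-friendly computation, and then swizzled back. In CUDA the staging buffer would be a `__shared__` array; here a plain dict plays that role.

```python
def aos_to_soa(records, fields):
    """Transpose per-object records (AoS) into per-field arrays (SoA)."""
    return {f: [r[i] for r in records] for i, f in enumerate(fields)}


def soa_to_aos(soa, fields):
    """Transpose per-field arrays (SoA) back into per-object records (AoS)."""
    return [tuple(vals) for vals in zip(*(soa[f] for f in fields))]


FIELDS = ("x", "y", "z", "mass")
# Each record is one FP32 vec4-sized "object" (16 bytes), fetched whole.
particles = [(1.0, 2.0, 3.0, 0.5), (4.0, 5.0, 6.0, 1.5)]
staged = aos_to_soa(particles, FIELDS)       # stands in for the shared-memory tile
staged["x"] = [x + 1.0 for x in staged["x"]]  # one operation across all x lanes
print(staged["x"])                  # [2.0, 5.0]
print(soa_to_aos(staged, FIELDS))   # [(2.0, 2.0, 3.0, 0.5), (5.0, 5.0, 6.0, 1.5)]
```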

    It's sure going to be interesting to see how this "compute shader" makes this work well on all platforms...
