The new opportunity for the parallelization of OpenGL

It is high time to reconsider the parallelization and improvement of OpenGL. Several remarkable features in NVIDIA's new GK110 Kepler architecture pave the way for this. The three most important for the purpose are:

Hyper-Q – enables multiple CPU cores to launch work on a single GPU simultaneously (GK110 allows 32 simultaneous hardware-managed connections);

Dynamic Parallelism – allows the GPU to generate new work for itself, synchronize on results, and control the scheduling of that work via dedicated, accelerated hardware paths, all without involving the CPU (a minimal sketch of this follows the list);

NVIDIA GPUDirect™ – enables GPUs within a single computer, or GPUs in different servers across a network, to exchange data directly without going through CPU/system memory. The RDMA feature in GPUDirect allows third-party devices such as SSDs, NICs, and IB adapters to directly access memory on multiple GPUs within the same system (the peer-to-peer side is also sketched after the list).
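Of the three, Dynamic Parallelism is the easiest to illustrate in code. Here is a minimal, hypothetical CUDA sketch (kernel names and sizes are made up) of a parent kernel launching a child grid entirely on the device; it requires compute capability 3.5 and relocatable device code (nvcc -arch=sm_35 -rdc=true -lcudadevrt):

```
#include <cuda_runtime.h>

__global__ void childKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2;   // follow-up work decided on the GPU itself
}

__global__ void parentKernel(int *data, int n)
{
    // One thread inspects the situation and launches more work
    // directly from the GPU, with no round trip to the CPU.
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        childKernel<<<(n + 255) / 256, 256>>>(data, n);
        cudaDeviceSynchronize();   // device-side wait on the child grid
    }
}

int main()
{
    const int n = 1024;
    int *d_data;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemset(d_data, 0, n * sizeof(int));
    parentKernel<<<1, 32>>>(d_data, n);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}
```

The single-machine side of GPUDirect can likewise be sketched with the public CUDA API (the device numbering is an assumption; the network/RDMA side requires vendor drivers and is not shown):

```
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    if (count < 2) { printf("need two GPUs\n"); return 0; }

    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (!canAccess) { printf("no peer access between GPUs 0 and 1\n"); return 0; }

    const size_t bytes = 1 << 20;
    float *d0 = 0, *d1 = 0;
    cudaSetDevice(0);
    cudaMalloc(&d0, bytes);
    cudaDeviceEnablePeerAccess(1, 0);   // let device 0 reach device 1's memory
    cudaSetDevice(1);
    cudaMalloc(&d1, bytes);

    // Direct device-to-device copy, no staging through system memory.
    cudaMemcpyPeer(d1, 1, d0, 0, bytes);

    cudaFree(d1);
    cudaSetDevice(0);
    cudaFree(d0);
    return 0;
}
```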

All of those features currently target CUDA, but it would be a shame not to use them to significantly improve the graphics APIs as well.

Does anyone here have more information about this Hyper-Q? My understanding so far is that multiple contexts can feed the GPU simultaneously (which gives you an advantage if those are fed by different threads/CPUs). This makes sense for massive parallelization in virtualization (also a new feature of GK110), but it would also be interesting for multi-context rendering apps. However, I have not yet found documents from NVIDIA verifying or falsifying my understanding of what is meant by “multiple CPU cores to launch work on a single GPU”. Does anyone have a better understanding?
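To make my understanding concrete: in CUDA terms I would expect “multiple CPU cores launching work on a single GPU” to look roughly like the sketch below, with several host threads each submitting kernels into its own stream (the kernel and buffer names are made up). On GK110, Hyper-Q should let these submissions flow through the 32 hardware queues instead of serializing behind a single connection as on Fermi.

```
#include <cuda_runtime.h>
#include <thread>
#include <vector>

__global__ void busyKernel(float *buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        buf[i] = buf[i] * 0.5f + 1.0f;
}

// Each CPU thread gets its own stream and feeds the GPU independently.
void worker(float *buf, int n)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    busyKernel<<<(n + 255) / 256, 256, 0, stream>>>(buf, n);
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
}

int main()
{
    const int n = 1 << 20;
    const int kThreads = 8;   // e.g. one per CPU core
    std::vector<float *> bufs(kThreads);
    for (int t = 0; t < kThreads; ++t)
        cudaMalloc(&bufs[t], n * sizeof(float));

    std::vector<std::thread> threads;
    for (int t = 0; t < kThreads; ++t)
        threads.emplace_back(worker, bufs[t], n);
    for (auto &th : threads)
        th.join();

    for (int t = 0; t < kThreads; ++t)
        cudaFree(bufs[t]);
    return 0;
}
```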

Something can be found here: the NVIDIA GK110 whitepaper.

You seem to make a good case… for a series of NVIDIA proprietary extensions. But unless there’s good reason to expect AMD and Intel to follow suit with similar features, I don’t see this being anything more than proprietary extensions.

And there’s nothing wrong with that.

Also, some of those are very un-cross-platform, like direct contact with “third party devices”.

Hello mfort, I know that whitepaper, and while it's very interesting, I'm not quite sure whether “each CUDA stream” (in the Hyper-Q section) means a stream of commands within one context. I'm more into OpenGL than CUDA, so maybe it is clear to people more familiar with CUDA terminology.
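For reference, this is what I would mean by “a stream of commands within one context”: independent command queues created in the same process, whose work may overlap (a trivial made-up no-op kernel, just to illustrate the terminology):

```
#include <cuda_runtime.h>

__global__ void noop() {}

int main()
{
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    noop<<<1, 1, 0, s1>>>();   // these two launches are unordered
    noop<<<1, 1, 0, s2>>>();   // with respect to each other
    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    return 0;
}
```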

If, however, Hyper-Q works this way, I don't see a reason why a graphics driver shouldn't use that hardware feature to parallelize the execution of OpenGL or Direct3D commands coming from different contexts in order to better utilize the GPU. This wouldn't need any extension at all.

@menzel, I agree. I have the same hopes about having this technology in the OpenGL world. I am afraid it will be one of those Quadro-only features. (Just my feeling.)
It also looks like this technology is not finished yet. The first Kepler board with Hyper-Q should be available in Q4 2012 (see the YouTube video at 9:26).