The next hardware generation is not going to provide this kind of feature … I was really expecting it, for a few reasons:
- I’m annoyed by the fixed blending setup.
- I saw in it a possibility for single-pass deferred shading.
- I expect some post-processing effects, like blur, to be done at the blend stage or in a few passes.
- It could involve massive memory bandwidth savings.
However, we have a new player, “OpenCL”, which could bring a lot to these topics.
First, some definitions. I call a “sample” a fragment that has passed the tests; a fragment is a sample candidate, if you like. Then I call “sampling” all the depth, stencil, etc. tests that discard fragments.
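To make the terminology concrete, here is a small CPU-side sketch of the “sampling” idea: fragments go through a test (just a depth test here) and the survivors are the samples. All names and values are invented for illustration; this is not a real API.

```python
# Toy model: fragments are (x, y, depth, color) tuples; "sampling" runs
# the depth test and only the fragments that pass become samples.

def depth_test(fragments, depth_buffer):
    """Keep only the fragments that pass the depth test (LESS)."""
    samples = []
    for frag in fragments:
        x, y, depth, color = frag
        if depth < depth_buffer[(x, y)]:
            depth_buffer[(x, y)] = depth
            samples.append(frag)  # this fragment became a sample
    return samples

depth_buffer = {(0, 0): 1.0, (1, 0): 1.0}
fragments = [
    (0, 0, 0.5, "red"),    # passes: 0.5 < 1.0
    (0, 0, 0.8, "green"),  # discarded: 0.8 is behind the red fragment
    (1, 0, 0.2, "blue"),   # passes
]
samples = depth_test(fragments, depth_buffer)
print([s[3] for s in samples])  # -> ['red', 'blue']
```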
Here is a UML drawing to show how I see things. I simplified it as much as possible to keep the focus on the blend stage. If you feel it’s a personal interpretation of the hardware, that’s exactly what it is: there is no real truth on this topic, and not all hardware is blend-programmable capable (I believe some is, however ;)):
The question: is that insane? I don’t really know where the multisampling resolve should go in this model … maybe before the “sample shader”, to match the current way of doing deferred shading. However, following the OpenGL specification, it is supposed to come after, if it is actually a “sample shader”. Anyway, this is done in a very specific manner on every processor, and multisampling remains a tricky topic for deferred engines.
For the “post-processing effects like blur” topic, the feature would require texture access to the bound render target, possibly with a limited offset range. The main issue with this is that we would have to be sure that all the fragments are actually processed before the blending stage. This is really not obviously the case, and on quite some hardware it could be a limitation. I don’t really know, but for the other stages it never really matters whether all data has been processed yet. This seems to be a limitation especially for immediate-mode rendering devices, but maybe not for tiled rendering devices.
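A sketch of what such a blend-stage blur could look like, simulated on the CPU. Everything here is hypothetical (no real GPU exposes this today): the blend shader averages the incoming value with framebuffer texels within a limited offset range, which is exactly where the ordering hazard above shows up, since neighboring texels may not have been written yet.

```python
# Hypothetical programmable blend stage doing a box blur by reading the
# bound render target within a limited offset range (+/-1 texel here).

W, H = 4, 4
# Framebuffer pre-filled with toy values (value at (x, y) is x + y).
framebuffer = [[float(x + y) for x in range(W)] for y in range(H)]

def blend_blur(x, y, incoming, fb, radius=1):
    """Blend shader: average the incoming value with nearby framebuffer
    texels. If some of those texels were not written yet, the result
    would depend on rasterization order -- the hazard described above."""
    total, count = incoming, 1
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            nx, ny = x + dx, y + dy
            if 0 <= nx < W and 0 <= ny < H:
                total += fb[ny][nx]
                count += 1
    return total / count

framebuffer[1][1] = blend_blur(1, 1, 10.0, framebuffer)
print(framebuffer[1][1])  # -> 2.8
```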
For deferred shading, we could expect to use no render-target textures at all, just varying variables carrying the values we would usually write to the render targets. However, we would probably need to wait until every triangle has been processed before generating a sample. On tiled rendering GPUs this wouldn’t be an issue; on immediate-mode GPUs, the amount of memory required would mean saving all that data in graphics card memory, just like it’s done with render targets, so we would lose all the bandwidth savings of this method.
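The tiled case can be sketched like this (all names and the lighting model are made up for illustration): the per-sample “varyings” of the front-most fragment are kept in a small on-chip tile while triangles are processed, and the sample shader runs only once per pixel at the end, instead of round-tripping full-screen render targets through memory.

```python
# Toy tiled deferred pipeline: rasterize keeps the closest fragment's
# varyings per pixel; the sample shader runs once per final sample.

tile = {}  # (x, y) -> (depth, normal, albedo): the per-sample varyings

def rasterize(x, y, depth, normal, albedo):
    """Keep only the closest fragment's varyings per pixel (depth test)."""
    if (x, y) not in tile or depth < tile[(x, y)][0]:
        tile[(x, y)] = (depth, normal, albedo)

def sample_shader(depth, normal, albedo):
    """Deferred lighting run once per final sample (toy: N dot L * albedo)."""
    light = (0.0, 0.0, 1.0)
    n_dot_l = max(0.0, sum(n * l for n, l in zip(normal, light)))
    return albedo * n_dot_l

# Two overlapping triangles' fragments land on pixel (0, 0):
rasterize(0, 0, 0.8, (0.0, 0.0, 1.0), 0.5)
rasterize(0, 0, 0.3, (0.0, 1.0, 0.0), 0.9)  # closer, replaces the first

# Only after all triangles are in does the sample shader run:
shaded = {p: sample_shader(*v) for p, v in tile.items()}
print(shaded[(0, 0)])  # -> 0.0 (the surviving normal faces away from the light)
```

Note that only one `sample_shader` invocation happens per pixel, however many fragments were rasterized; that is the whole point of deferring the shading.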
Furthermore: CUDA and OpenCL. With CUDA there is no direct way to use an OpenGL texture; you have to use a PBO to render to a buffer and then access that buffer … It’s really not convenient, and I don’t believe in any gain over the current two-pass fragment approach, especially because we lose the GPU’s 2D image cache. Fortunately, OpenCL is a lot better than I expected, and we can directly access 2D images, so why not compute the lighting pass with OpenCL? For some reason, it feels like the way OpenCL is specified makes this “sample shader” even more interesting; some kind of convergence.
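As a rough sketch of that idea, here is the lighting pass simulated on the CPU: the G-buffer is read as 2D images, which is what an OpenCL kernel would do with `image2d_t` and `read_imagef`, with no PBO round-trip as in the CUDA case. The data and the lighting model are toy values invented for illustration.

```python
# CPU stand-in for an OpenCL lighting kernel: each "work-item" reads the
# G-buffer "images" at its (x, y) coordinate and computes the lit value.

W, H = 2, 2
normal_image = [[(0.0, 0.0, 1.0)] * W for _ in range(H)]  # G-buffer normals
albedo_image = [[0.5, 1.0], [0.25, 0.75]]                 # G-buffer albedo

def lighting_kernel(x, y):
    """One work-item: read both G-buffer 'images', output the lit value."""
    normal = normal_image[y][x]
    albedo = albedo_image[y][x]
    light = (0.0, 0.0, 1.0)
    n_dot_l = max(0.0, sum(n * l for n, l in zip(normal, light)))
    return albedo * n_dot_l

# "Enqueue" the kernel over the whole image:
output = [[lighting_kernel(x, y) for x in range(W)] for y in range(H)]
print(output)  # -> [[0.5, 1.0], [0.25, 0.75]]
```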
Just let me know your thoughts about this!