Sampling from and rendering to the same texture and parallel sorting / hashing

As described in the OpenGL wiki (Framebuffer Object, Feedback Loops),
the result of sampling from and rendering to the same texture (a render feedback loop) is undefined.

But if the value in every framebuffer pixel were guaranteed to be the value written by exactly one of the fragments which wrote to that pixel, and not a combination of values from multiple fragments,
this functionality would still be useful for parallel sorting / hashing algorithms implemented in GLSL.

For example :
Suppose you want to store the ids of objects which are close to each other in the same bucket.
Every bucket is a small, fixed range of locations in the draw framebuffer (a different range per bucket).

Algorithm :
The shader computes a target bucket address range per object.
It scans that range of the target buffer for the object id.
If the object id is not in the bucket : write the object id to a free location in the bucket.
If the object id is in the bucket : discard.

Execute this shader maximum-nr-of-object-ids-in-same-bucket times, and all object ids are bucket sorted at the end of the procedure.
In every call at least one of the competing fragment outputs ends up in the framebuffer, so the algorithm terminates.
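
To make the termination argument concrete, here is a minimal CPU-side sketch of the passes in Python. It is not the WebGL shader. It assumes the behaviour asked for in this post: every read sees the state left by the previous call, and among fragments writing to the same location exactly one value survives. It also assumes that no bucket receives more objects than it has slots. The names bucket_pass, bucket_sort and EMPTY are made up for this illustration.

```python
import random

EMPTY = -1  # hypothetical marker for a free slot in a bucket


def bucket_pass(buckets, objects, rng):
    """Simulate one draw call over all objects.

    Assumption: every "fragment" reads the state left by the previous
    call (the snapshot) and never sees writes made during this call.
    """
    snapshot = [slots[:] for slots in buckets]
    order = list(objects.items())
    rng.shuffle(order)  # fragment execution order is arbitrary
    for obj_id, bucket in order:
        slots = snapshot[bucket]
        if obj_id in slots:
            continue  # id already stored: "discard"
        # First free slot as seen by this fragment. Raises ValueError if
        # the bucket is full, which the capacity assumption rules out.
        free = slots.index(EMPTY)
        # Competing fragments overwrite each other; one whole value survives.
        buckets[bucket][free] = obj_id


def bucket_sort(objects, num_buckets, capacity, seed=0):
    """Run `capacity` passes; `objects` maps object id -> bucket index."""
    rng = random.Random(seed)
    buckets = [[EMPTY] * capacity for _ in range(num_buckets)]
    for _ in range(capacity):
        bucket_pass(buckets, objects, rng)
    return buckets


if __name__ == "__main__":
    objects = {10: 0, 11: 0, 12: 0, 20: 1, 30: 2, 31: 2}
    for index, slots in enumerate(bucket_sort(objects, 3, 3)):
        print(index, slots)
```

Each pass stores exactly one new id in every bucket that still has unplaced objects, so a bucket holding k objects is complete after k passes. Which id wins a given slot depends on the (here shuffled) execution order, but the final contents of each bucket do not.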

While reading a pixel, a fragment may miss writes of other fragments to that location, but all writes will have completed before the next draw call.
So the next call will see the correct state left by the previous call, plus possibly some changes made by the current call.

But because binding the same texture as both input and output of a shader is not allowed,
you have to implement the reading and the writing step in separate shaders.

Are there other technical arguments or performance issues why rendering to and sampling from the same texture
is not allowed ?

If not : would it be possible to define the result of reading the pixel from a texture which forms a render feedback loop to be one of :

  • the value of one of the fragments which write to that pixel
  • the initial value of that pixel (state of pixel after the previous call has completed) ?

Use case :
I am using the bucket sorting procedure to find objects close to each other here :
borbitsoft.com (Demo 1000 balls)
(implemented in the WebGL version of glsl)

I could reduce the draw call count for the broad phase by a factor of 2 if render feedback loops worked as described above.

[QUOTE=TiborDenOuden;1261856]Are there other technical arguments or performance issues why rendering to and sampling from the same texture
is not allowed ?[/QUOTE]

Read: “Why no fully programmable blend?” (and the surrounding articles… consider your algorithm’s read-write ordering with thousands of processors running in parallel.)

[QUOTE=TiborDenOuden;1261856]If not : would it be possible to define the result of reading the pixel from a texture which forms a render feedback loop…[/QUOTE]

This has been defined since ~2009, in NV_texture_barrier, which was recently promoted to ARB_texture_barrier and is a core feature of GL 4.5 (2014).

On the mobile side, see EXT_shader_framebuffer_fetch.

Other options (where you manually manage the read-write ordering barriers) include ARB_shader_image_load_store (~2011), OpenCL (~2009), CUDA, etc.
