Closed-loop FBO operation, practical application



CaptainSnugglebottom
10-16-2017, 01:17 PM
Hello,

I was wondering whether there's a definitive answer on closed-loop FBO operations.

Some people say that closed-loop FBO operations don't work at all, while others say that it is possible as long as the read/write operations are done on different textures (texture attachments).

I tried experimenting with deferred shading using a single FBO. Rendering the pre-shading maps worked fine, but the shading pass produced strange (I would say random) results. The shading pass reads the textures used in texture attachments 0-3 and outputs into the texture on attachment 4, without any extra FBO operations except for a step-specific glDrawBuffers call that enables the relevant texture outputs. The shading itself is disabled for testing purposes; the shader simply outputs unchanged values from the input color map.

Is there any other step that needs to be done in order to allow that (assuming the answer to the first question is "yes")?

I've read about the texture barrier, but there are very few actual usage examples. Also, I believe it is not applicable to OGL 3.2 (I can switch to 4.5, but so far I have had no reason to). I would like to avoid using 2 FBOs with the same textures for ping-ponging, although it might be the only solution with decent performance.

The result:
[attached image]

Alfonse Reinheart
10-16-2017, 05:23 PM
What is a "closed-loop FBO operation"? Google says nothing about this term, so it's unclear who the hypothetical "some people" and "others" are who talk about them.

If you're talking about FBO feedback loops (ie: rendering to an attached image while reading from it), that is well-covered (https://www.khronos.org/opengl/wiki/Memory_Model#Framebuffer_objects). Under standard GL 3.x rules, you cannot read from any image that is attached to the framebuffer, period. Under NV/ARB_texture_barrier/OpenGL 4.5, things are more relaxed.

Of course, NV_texture_barrier is widely implemented (http://opengl.gpuinfo.org/gl_listreports.php?listreportsbyextension=GL_NV_texture_barrier), so you probably already have this capability.

CaptainSnugglebottom
10-16-2017, 06:40 PM
Yeah that's exactly it. Closed-loop = feedback operation.


If you're talking about FBO feedback loops (ie: rendering to an attached image while reading from it), that is well-covered. Under standard GL 3.x rules, you cannot read from any image that is attached to the framebuffer, period. Under NV/ARB_texture_barrier/OpenGL 4.5, things are more relaxed.

The page that you provided mentions: "Similarly, if you wrote to an image, then want to read the data you wrote, you can issue the barrier instead of having to detach the image. You can use Write Masks or Draw Buffer state to prevent writing while you are reading." This complements your comment about OpenGL 4.5 being more relaxed.

I am actually using drawBuffers(n, buffs) function between each operation, enabling different texture attachments for each step of my render. I was under the impression that it, by itself, would block writing to the texture I'm reading from (not that I'm doing it). However, since it results in random output, even with OpenGL 4.5 used I was wondering whether there's something else needed.

I need to do a lot more research on NV_texture_barrier before I figure out how to use it, so I would like to avoid it for now.


Of course, NV_texture_barrier is widely implemented, so you probably already have this capability.

Thanks for the link.

GClements
10-16-2017, 07:36 PM
I am actually using drawBuffers(n, buffs) function between each operation, enabling different texture attachments for each step of my render. I was under the impression that it, by itself, would block writing to the texture I'm reading from (not that I'm doing it).

The draw buffer state and write masks prevent the texture from being modified, but the pixels are still deemed to have been modified so far as the memory model is concerned. The implementation is free to discard any cached data for those pixels in all attached textures regardless of draw buffer state and write masks, so reading may yield garbage.

Alfonse Reinheart
10-16-2017, 08:39 PM
I am actually using drawBuffers(n, buffs) function between each operation, enabling different texture attachments for each step of my render. I was under the impression that it, by itself, would block writing to the texture I'm reading from (not that I'm doing it). However, since it results in random output, even with OpenGL 4.5 used I was wondering whether there's something else needed.

Well, there's no way to know what's really going on from just a description, but as stated in the wiki page:


All that it takes to trigger undefined fetches is for the image to be attached, even if you are not rendering to it. So the draw buffers state for the framebuffer is irrelevant. If it is attached to the FBO currently being rendered to, and you try to read from it, you get undefined behavior. Similarly, using Write Masks will also not prevent undefined behavior.

This remains unchanged even with 4.5:


Despite this relaxation of the rules, undefined behavior is still triggered whenever a shader attempts to read from texels written by a prior rendering call (so long as the texture remains a render target).

Visibility of previously written data cannot be achieved unless you actually detach the images/change which FBO is currently being rendered to, or issue a texture barrier.
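
To make the rule above concrete, here is a minimal sketch of a debug check that flags a feedback loop the conservative (pre-4.5) way: a pass is rejected if it samples any texture attached to the bound FBO, regardless of the draw buffer state. It makes no GL calls. The PassSetup struct, the helper names, and the texture IDs 10-14 are all hypothetical and only model the setup described in this thread (inputs on attachments 0-3, output on attachment 4).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical description of one render pass, using plain texture IDs.
struct PassSetup {
    std::vector<unsigned> attachedTextures; // every texture attached to the bound FBO
    std::vector<unsigned> drawBuffers;      // textures enabled via glDrawBuffers
    std::vector<unsigned> sampledTextures;  // textures the shader reads from
};

// True if the pass samples a texture that is attached to the bound FBO.
// drawBuffers is deliberately ignored: per the wiki text quoted above,
// being attached is enough to make the fetch undefined.
bool hasFeedbackLoop(const PassSetup& pass) {
    for (unsigned tex : pass.sampledTextures) {
        if (std::find(pass.attachedTextures.begin(),
                      pass.attachedTextures.end(), tex)
                != pass.attachedTextures.end()) {
            return true;
        }
    }
    return false;
}

// Debug-build guard a renderer could call right before issuing a draw.
void assertNoFeedbackLoop(const PassSetup& pass) {
    assert(!hasFeedbackLoop(pass));
}

// Shading pass with one FBO: textures 10-13 stay attached while being sampled.
PassSetup singleFboShadingPass() {
    return PassSetup{{10, 11, 12, 13, 14}, {14}, {10, 11, 12, 13}};
}

// Shading pass with a second FBO that has only the output texture attached.
PassSetup separateFboShadingPass() {
    return PassSetup{{14}, {14}, {10, 11, 12, 13}};
}
```

Under this model the single-FBO shading pass is flagged even though only attachment 4 is enabled for drawing, while the two-FBO variant passes. With OpenGL 4.5 / ARB_texture_barrier a check like this would be stricter than necessary.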

kRogue
10-17-2017, 04:47 AM
Take a look at https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_texture_barrier.txt

Another thing that can give similar functionality (but not using FBOs) is to use and abuse https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_shader_image_load_store.txt

CaptainSnugglebottom
10-17-2017, 07:00 AM
Visibility of previously written data cannot be achieved unless you actually detach the images/change which FBO is currently being rendered to, or issue a texture barrier.


Can glTextureBarrier() be used for the entire image? The wiki page only provides an example of piece-wise rendering. Should it be called before each draw call, or just before the step (so before the first draw call after the shader program has been changed)? Never mind, it is mentioned further down.

How does glTextureBarrier() do when it comes to performance? Is it an improvement compared to FBO binding/unbinding?


Thanks.

Alfonse Reinheart
10-17-2017, 09:01 AM
How does glTextureBarrier() do when it comes to performance? Is it an improvement compared to FBO binding/unbinding?

If it wasn't cheaper than changing the FBO, why would they bother adding it?

CaptainSnugglebottom
10-17-2017, 11:03 AM
If it wasn't cheaper than changing the FBO, why would they bother adding it?


I wasn't even sure if I needed it. Now that I know I do, I wonder about its performance.

kRogue
10-20-2017, 02:16 AM
glTextureBarrier is essentially a cache flush: all render caches are flushed to video memory and the texture caches are invalidated.

Also note that relying on glTextureBarrier() means that no single draw call may cover the same pixel more than once, and if two draw calls overlap in screen space, then a glTextureBarrier() call needs to be issued between them.
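
As a rough illustration of the overlap rule, the sketch below decides where barriers would have to go for a list of draws, using screen-space bounding boxes as a conservative overlap test. The Rect struct and planBarriers function are hypothetical helpers, not part of any GL API. A real renderer would call glTextureBarrier() at the returned positions, and it would still have to guarantee that each individual draw does not overlap itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Screen-space bounding box of one draw call, in pixels.
// The max edges (x1, y1) are exclusive.
struct Rect {
    int x0, y0, x1, y1;
};

// True if the two boxes share at least one pixel.
bool overlaps(const Rect& a, const Rect& b) {
    return a.x0 < b.x1 && b.x0 < a.x1 && a.y0 < b.y1 && b.y0 < a.y1;
}

// Returns the indices of the draws that need a barrier issued immediately
// before them, so that no two draws between consecutive barriers overlap.
std::vector<std::size_t> planBarriers(const std::vector<Rect>& draws) {
    std::vector<std::size_t> barrierBefore;
    std::size_t batchStart = 0; // first draw issued since the last barrier
    for (std::size_t i = 0; i < draws.size(); ++i) {
        assert(draws[i].x0 <= draws[i].x1 && draws[i].y0 <= draws[i].y1);
        for (std::size_t j = batchStart; j < i; ++j) {
            if (overlaps(draws[i], draws[j])) {
                barrierBefore.push_back(i);
                batchStart = i; // draw i starts a new batch
                break;
            }
        }
    }
    return barrierBefore;
}
```

Bounding boxes can report overlaps that do not exist at the pixel level, so this may place more barriers than strictly needed, but never fewer for draws that fit their boxes.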

If one wants to have one's fragment shader read from the surface of a framebuffer -at- the location of its invocation, I (personally) prefer to use GL_ARB_shader_image_load_store together with GL_ARB_fragment_shader_interlock, but that can have negative consequences as well: on some platforms, lossless color buffer compression is disabled on a surface if one accesses the surface through GL_ARB_shader_image_load_store, and the interlock forces ordering in screen space, which can also hurt performance.

There is also, for GLES, the extension GL_EXT_shader_framebuffer_fetch. If you are using Mesa with Intel hardware, one can enable this in GL (with a different extension name) if one is willing to hack the driver.

CaptainSnugglebottom
10-20-2017, 09:17 AM
So that basically means that I can't use any of the reduced draw call techniques with a single FBO?

Alfonse Reinheart
10-20-2017, 03:32 PM
So that basically means that I can't use any of the reduced draw call techniques with a single FBO?

If by "reduced draw call techniques", you're talking about things like instancing, AZDO and the like, sure you can. So long as all of the stuff between barriers never overlaps anything else in that series of draws.

Read/modify/write operations are not cheap. They're not things you should do willy-nilly, and they're usually not something you do . Most such techniques involve full-screen passes, where AZDO techniques are essentially irrelevant.
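
For the full-screen case, the ordering could look roughly like the sketch below: one barrier between consecutive passes, none before the first. This is only an outline under assumptions. PassCallbacks and runInPlacePasses are hypothetical names, and the callbacks stand in for real work: in an actual renderer, textureBarrier would call glTextureBarrier() (OpenGL 4.5 / ARB_texture_barrier) and drawFullScreenPass would bind the pass's shader and draw a full-screen primitive that does not cover any pixel twice.

```cpp
#include <cassert>
#include <functional>

// Callbacks supplied by the caller; they hide the actual GL calls.
struct PassCallbacks {
    std::function<void(int passIndex)> drawFullScreenPass; // issues one full-screen draw
    std::function<void()> textureBarrier;                  // would wrap glTextureBarrier()
};

// Runs passCount passes against the same FBO, issuing one barrier between
// consecutive passes so each pass can read what the previous one wrote.
// Returns the number of barriers issued.
int runInPlacePasses(int passCount, const PassCallbacks& cb) {
    assert(passCount >= 0);
    assert(cb.drawFullScreenPass && cb.textureBarrier);
    int barriers = 0;
    for (int i = 0; i < passCount; ++i) {
        if (i > 0) {
            cb.textureBarrier(); // make the previous pass's writes visible
            ++barriers;
        }
        cb.drawFullScreenPass(i);
    }
    return barriers;
}
```

Keeping the GL calls behind callbacks makes the ordering easy to check without a context. The same loop shape would apply if the barrier were replaced by switching to a second FBO.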

Also, ARB_fragment_shader_interlock exists for dealing with similar circumstances. But only some hardware supports it.

CaptainSnugglebottom
10-21-2017, 10:15 PM
Thanks for the heads up. I will keep that in mind.