FBO: Render to texture that is currently bound

The FBO spec states that this configuration results in undefined behaviour.

What I tried to do is minimize memory usage for float textures where only one color channel is needed and a ping-pong algorithm is used.

In a fragment shader I read from the green channel and write to the red channel. In the next ping-pong pass I swap the source and target channels, reading from red and writing to green, and so on.
In my head this makes sense, but apparently not in the context of OpenGL.
And because single-channel textures are not color-renderable, I tried it this way (see the sketch below).
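To make it concrete, a pass's fragment shader could look roughly like this; the sampler name and the actual computation are just placeholders, not my real code:

```c
/* Minimal sketch of one ping-pong pass: read green, write red, copy
 * the other channels through unchanged.  Sampler name "src" and the
 * computation are assumptions for illustration only. */
static const char *frag_read_green_write_red =
    "uniform sampler2D src;\n"
    "void main()\n"
    "{\n"
    "    vec4 texel = texture2D(src, gl_TexCoord[0].st);\n"
    "    float result = texel.g;   /* real computation goes here */\n"
    "    /* write red, preserve green/blue/alpha */\n"
    "    gl_FragColor = vec4(result, texel.g, texel.b, texel.a);\n"
    "}\n";
```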
My card is an ATI Radeon 9600 Pro Mobile and the rendering results are really undefined :slight_smile:

Is there an approach that would allow ping-ponging between single-channel float or even 16-bit integer textures?

My next idea is to use two 16-bit stencil textures, but I really don't know whether that would work. Before I shoot in the dark, I would like to know if there is a chance of hitting something.

Regards,
Jan

Your solution is very interesting.

I implemented a simple ping-pong method where I sample a texel and overwrite that exact same texel; it certainly didn't work with a more complex texture sampling kernel. However, with your algorithm it could work fine.

When you say you write to a channel, do you change the write mask with glColorMask? Because I'm very surprised your algorithm is not working.

Thanks…

Writing to a single channel can be done with glColorMask or, in a fragment shader, with gl_FragColor[targetChannel], where targetChannel is a uniform variable that can be set by the application.

When using the uniform approach you must not forget to read all the other channels from the source texture, so that the already existing values are preserved and not overwritten. A rough sketch of both variants is below.
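Something along these lines, as a sketch only (the program handle, the uniform name targetChannel and the quad-drawing calls are placeholders, and it assumes a GL 2.0 capable header or extension loader):

```c
#include <GL/gl.h>

/* Variant (a): restrict writes with the framebuffer write mask. */
void pass_write_red_only(void)
{
    glColorMask(GL_TRUE, GL_FALSE, GL_FALSE, GL_FALSE); /* red only */
    /* ... draw the full-screen quad for this ping-pong pass ... */
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);    /* restore */
}

/* Variant (b): select the channel inside the shader via a uniform.
 * The fragment shader then does something like:
 *     gl_FragColor = texel;                  // keep the old values
 *     gl_FragColor[targetChannel] = result;  // overwrite one channel
 */
void pass_select_channel(GLuint program, int targetChannel)
{
    glUseProgram(program);
    glUniform1i(glGetUniformLocation(program, "targetChannel"),
                targetChannel);
    /* ... draw the full-screen quad ... */
}
```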

Nevertheless, it does not work when using a texture as the render target and as the texture source in the same render batch. At least not on my Radeon.
When I came up with this idea I wondered why I had not already read about it somewhere, but maybe nobody has thought in this direction because of hardware or API limitations. Reading the spec is not my strength :slight_smile:

In my app I read and write from/to the same texel without any problems, but I have a GF 6600 GT. Maybe I should test my app on some Radeon…

I have a single-channel 32-bit float texture.

I have single-channel textures, too, but I cannot render to them :wink:
Are you using NV_float_buffer with the GL_FLOAT_R_NV internal format?

It would be great if you could test your app on a Radeon. I've got one…

I am sure you guys know it, but you still shouldn't be doing this if you expect your application to be portable :wink:

Writing to a single channel can be done with glColorMask or, in a fragment shader, with gl_FragColor[targetChannel], where targetChannel is a uniform variable that can be set by the application.

Sorry, I was thinking about what I would need to do if I wanted to implement your algorithm with my method. I thought you were using the same kind of algorithm. In my case I will need to set the color mask. If I don't output a value for every component, maybe the GPU will just write a default value and overwrite my old one.

Using just one texture as source and destination works fine for me on NVIDIA and ATI as long as I sample and write the exact same texel.

I am sure you guys know it, but you still shouldn't be doing this if you expect your application to be portable :wink:
That's why I'm asking whether there are portable ways to do what I want or not.

Well, you should not render to a texture that is currently bound; as you said yourself, such behaviour is undefined. So you have to use two textures. If there is no way to use single-channel textures, you can try to encode your floating-point value in a plain 8-bit-per-channel RGBA texture. Of course, it is just a hack :frowning:
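For reference, this is a sketch of how such an encoding is commonly done in GLSL: pack a value in [0,1) across the four 8-bit channels and unpack it again when sampling. It is the generic technique, not code from anyone in this thread:

```c
/* Generic pack/unpack helpers for storing a float in [0,1) in an
 * RGBA8 texture, written as a shader string.  Illustration only. */
static const char *pack_unpack_src =
    "vec4 packFloat(float v)\n"
    "{\n"
    "    vec4 enc = fract(v * vec4(1.0, 255.0, 65025.0, 16581375.0));\n"
    "    /* subtract the part already carried by the next channel */\n"
    "    enc -= enc.yzww * vec4(1.0/255.0, 1.0/255.0, 1.0/255.0, 0.0);\n"
    "    return enc;\n"
    "}\n"
    "float unpackFloat(vec4 enc)\n"
    "{\n"
    "    return dot(enc, vec4(1.0, 1.0/255.0, 1.0/65025.0, 1.0/16581375.0));\n"
    "}\n";
```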

It is undefined because fragments are processed in parallel. So when you sample neighbouring texels, you are in trouble, because the results are unpredictable. But when you sample only the current texel, it should work fine :slight_smile:

With this approach I do my own blending: a different equation on each channel, and even 32-bit floating point with good performance :smiley: Normal 32-bit floating-point blending is done in SW on current HW.

It is undefined because fragments are processed in parallel. So when you sample neighbouring texels, you are in trouble, because the results are unpredictable. But when you sample only the current texel, it should work fine
Um, no.

Sampling the current texel can fail because two concurrently running shaders can be sampling the same texel. If you have two overlapping triangles in the same primitive, fragments from both can be running on the same pixel. So, one can do a read, then the other does a read, then the first does a write. Because the hardware is designed to write them out in the correct order, the first one gets to write first, but the second one will not necessarily read the data that the first one wrote.

Korval: never thought of that, but in my case this won't happen, so I should be safe.

Originally posted by shelll:
Korval: never thought of that, but in my case this won't happen, so I should be safe.
You are not safe as long as you are relying on undefined behaviour. Further improvements to hardware may cause that behaviour to change. For example, if your card had a sufficiently big texture cache, your texture might fit entirely into that cache. This is most likely on cards that divide the scene into parts and render each part independently, like Intel's or PowerVR's, but with big texture caches it could theoretically happen on an ordinary card too. If this happens, texels may not be fetched again during the next pass, which will result in old values being read, and your rendering will break.

Also, if the hardware caches output values in some internal buffer before writing them into the texture, for example to conserve memory bandwidth, you may read stale values in the next pass.

OK, thanks for the replies. I got my algorithm to work now after struggling with some FBO settings.
Though I know that binding and rendering to the same texture is not well defined, I use this approach.

In one pass I write only to one color channel and read from another, so there should be no data-interference problem between reading and writing. The per-pass loop looks roughly like the sketch below.

I will test it on different hardware supporting FBO, and if it works everywhere I will assume my FBO configuration is a special case that does not produce undefined behaviour.
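A rough sketch of what such a pass loop could look like; the uniform names, texture handle and quad-drawing call are simplified placeholders rather than my exact code, and a GL 2.0 capable header or extension loader is assumed:

```c
#include <GL/gl.h>

/* Ping-pong between the red (0) and green (1) channels of one texture
 * that stays attached to the FBO and bound as the source. */
void ping_pong_passes(GLuint program, GLuint texture, int passes)
{
    GLint srcLoc = glGetUniformLocation(program, "sourceChannel");
    GLint dstLoc = glGetUniformLocation(program, "targetChannel");

    glUseProgram(program);
    glBindTexture(GL_TEXTURE_2D, texture);

    for (int i = 0; i < passes; ++i) {
        int readCh  = (i % 2 == 0) ? 1 : 0;  /* green first, then red */
        int writeCh = 1 - readCh;
        glUniform1i(srcLoc, readCh);
        glUniform1i(dstLoc, writeCh);
        /* ... draw the full-screen quad; the shader reads readCh,
           writes writeCh and copies the other channels through ... */
    }
}
```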

How is blending handled then? Fragments are processed in parallel, so what happens when two fragments are generated at the same window location, one with blending on and the second with blending off?

Or are the fragments that are processed in parallel always from the same batch (i.e. one glDrawElements call and the like), so my silly example can't occur…?