GL_NV_texture_barrier on ATI

I’m having problems getting the NV_texture_barrier extension working on my ATI Radeon HD 5570 (Catalyst 10.12, Win7x64). I tried to do the following:

  1. Create an FBO and attach to it an integer rectangle texture (GL_R16UI) whose contents have been randomly generated.

GLuint texture,fbo;
glGenFramebuffers(1,&fbo);
glBindFramebuffer(GL_FRAMEBUFFER,fbo);
glGenTextures(1,&texture);
unsigned short data[512*512] = {0};
for (int i = 0; i < 512*512; i++) {
	data[i] = rand();              /* random low bits */
	data[i] |= (rand() % 2) << 15; /* random top bit  */
}
	
glTextureImage2DEXT(texture,GL_TEXTURE_RECTANGLE,0,GL_R16UI,512,512,0,GL_RED_INTEGER,GL_UNSIGNED_SHORT,data);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_RECTANGLE, texture, 0);


  2. Render a colored rectangle to the FBO, but read back the current contents of the render target in the fragment shader and fold them into the final color. The fragment shader I use looks like this:


#version 330
in vec4 fcolor;
out uint FragColor;
uniform usampler2DRect rt_texture;
uint ConvertFloat4ToR16(const in vec4 color)
{
	uvec4 bias = uvec4(color * 31.0); // uvec: GLSL 3.30 has no implicit int-to-uint conversion
	uint r = (bias.b << 10) | (bias.g << 5) | bias.r;
	return r;
}
void main()
{
	uint srccol =  ConvertFloat4ToR16(fcolor);
	uint destcol = texelFetch( rt_texture, ivec2(gl_FragCoord.xy)).r;
	FragColor = destcol+srccol;
}

According to the specification of NV_texture_barrier, texelFetch2DRect could be used, but I get the following shader compiler errors:

ERROR: 0:14: error(#202) No matching overloaded function found texelFetch2DRect
ERROR: 0:14: error(#160) Cannot convert from 'const float' to 'unsigned int'

So I use texelFetch instead. However, destcol is always zero when I test its value. Am I misunderstanding the texture barrier spec, or is this simply a bug in ATI's implementation?

  2. Render a colored rectangle to the FBO, but read back the current contents of the render target in the fragment shader and fold them into the final color. The fragment shader I use looks like this:

You seem to misunderstand what texture barrier is for.

It does not allow a shader to read and write to the same location in a texture at the same time. It isn’t some kind of magical panacea, where the mere presence of the extension allows shaders to do blending and multipass.

What it allows you to do is read and write to different locations of the texture at the same time. It allows you to do ping-ponging without having to change FBOs or texture bindings.

So the idea is that you allocate a render surface that is 2x the size you need, and use the viewport to adjust where you’re rendering to. You render to one location while reading from a different one. Then, when you need to swap them, you make the texture barrier call.

OK, thanks for the clarification. Reading the spec, though, it seemed to imply that you can read a pixel's destination before writing to it:

  • There is only a single read and write of each texel, and the read is in
    the fragment shader invocation that writes the same texel (e.g. using
    “texelFetch2D(sampler, ivec2(gl_FragCoord.xy), 0);”).

It does not allow a shader to read and write to the same location in a texture at the same time. It isn’t some kind of magical panacea, where the mere presence of the extension allows shaders to do blending and multipass.

That is not true. NV_texture_barrier adds three new cases in which rendering to the currently bound texture is allowed; one of them is:

  • There is only a single read and write of each texel, and the read is in the fragment shader invocation that writes the same texel (e.g. using “texelFetch2D(sampler, ivec2(gl_FragCoord.xy), 0);”).

Also, issue #1 talks explicitly about NV_texture_barrier enabling limited programmable blending:

This can be used to accomplish a limited form of programmable blending for applications where a single Draw call does not self-intersect, by binding the same texture as both render target and texture and applying blending operations in the fragment shader.

OK, I got it working. There was a bug in my code: I had called

glBindMultiTextureEXT(GL_TEXTURE0,GL_TEXTURE_RECTANGLE,texture);

after rendering into the texture, rather than before it. With that fixed, the extension works as the spec describes. Thanks for the feedback.