Blending bug with uintBitsToFloat

I’m packing a specific bit pattern into a channel of a 32-bit float texture, but when I retrieve that value in another shader, the bit pattern has changed. I’ve narrowed it down to whether or not blending is enabled. My problem is that I need blending enabled, but I also need my bit pattern preserved. If my destination draw buffer contains 0, my source fragment carries my bit pattern, and I’ve set glBlendFunc(GL_ONE, GL_ZERO), I would expect glBlendEquation(GL_FUNC_ADD) to preserve my bit pattern, but that is not what I’m seeing.

Below is a minimal program to reproduce the problem (source files and shaders):

blend_bug.cpp

Compile the program like this:

g++ blend_bug.cpp -std=c++11 -g -Wall -O0 -lGL -lSDL -lGLEW -lGLU -o blend_bug

And run it like this:

To get correct behavior (green screen):

./blend_bug

To get incorrect behavior (red screen):

./blend_bug 1

I’ve tried to make this easy to reproduce. Any help is greatly appreciated!

System config:
Linux x86-64
Geforce GTX 680
NVIDIA Driver 352.30 (latest)
OpenGL 4.5

Fundamentally, you can’t expect this to work for all possible float values you might write.

Because in math, 1*N = N. But in computers, 1.0 * denorm can come out as 0, depending on whether your hardware flushes denormals to zero (a quick CPU-side illustration is below).

…in other words, the GL spec does not require preserving all possible floating point bit patterns.
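To make that concrete, here’s a small CPU-side analogue (my own untested sketch, nothing to do with the GL pipeline itself): once flush-to-zero is enabled, multiplying a denormal by 1.0 no longer gives back the same bit pattern.

#include <cstdio>
#include <cstdint>
#include <cstring>
#include <xmmintrin.h>   // _MM_SET_FLUSH_ZERO_MODE

int main()
{
    // Smallest positive denormal float: exponent bits 0, mantissa 1.
    std::uint32_t inBits = 0x00000001u;
    float denorm;
    std::memcpy(&denorm, &inBits, sizeof(float));   // CPU-side "uintBitsToFloat"

    // Emulate hardware that flushes denormal results to zero.
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

    volatile float one = 1.0f;        // volatile: stop the compiler folding the multiply
    float result = one * denorm;      // mathematically a no-op

    std::uint32_t outBits;
    std::memcpy(&outBits, &result, sizeof(float));
    std::printf("in : 0x%08x\nout: 0x%08x\n", (unsigned)inBits, (unsigned)outBits);
    return 0;
}

On hardware/compilers where that flush happens, the output bits come back as 0x00000000 rather than 0x00000001.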

As a possible workaround, you could write instead to a 32-bit integer render target, which implicitly bypasses blending (blending is never applied to integer color buffers, so with MRT only that render target skips it), and use the float/int bit casts (floatBitsToUint / uintBitsToFloat) in the shaders reading and writing the data (at the high expense of manual filtering, if you need that when reading…)
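Roughly like this (an untested sketch; the size, attachment slot and names are just placeholders, and it assumes your FBO is already bound):

const GLsizei width = 1280, height = 720;   // placeholder size

// Extra color attachment with an integer format; blending never applies to it.
GLuint packedTex = 0;
glGenTextures(1, &packedTex);
glBindTexture(GL_TEXTURE_2D, packedTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_R32UI, width, height, 0,
             GL_RED_INTEGER, GL_UNSIGNED_INT, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);  // integer textures can't be filtered
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1,
                       GL_TEXTURE_2D, packedTex, 0);

// Writing shader (GLSL):  layout(location = 1) out uint packedOut;
//                         packedOut = floatBitsToUint(valueCarryingYourBits);
// Reading shader (GLSL):  uniform usampler2D packedSampler;
//                         float v = uintBitsToFloat(texelFetch(packedSampler, coord, 0).r);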

I’m curious as to why you need blending enabled, when your blend function/equation is just “take the source”.

Even if you’re blending with a different function/equation for a separate output, just [i]turn off blending[/i] on the floating-point buffer. You can enable/disable blending on a per-buffer basis, just like you can set the blend function/equation on a per-buffer basis.
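For reference, the per-buffer calls look like this (buffer indices and blend factors below are just for illustration; glEnablei/glDisablei are GL 3.0, the indexed blend funcs are GL 4.0):

glEnablei(GL_BLEND, 0);                                   // buffer 0: blend as usual
glBlendFunci(0, GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glBlendEquationi(0, GL_FUNC_ADD);

glDisablei(GL_BLEND, 1);                                  // buffer 1 (the float buffer):
                                                          // no blending, bits written verbatim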

It’s also the case that the section titled “Invariance” in the GL spec typically contains the warning:

The OpenGL specification is not pixel exact

If you’re trying to do something that depends on OpenGL being what it is not, then maybe OpenGL is not the right tool for you.

So, working from the 4.5 core spec, Appendix A, Section A.3, we read that changes to the blend parameters (other than the enable) are in the “strongly suggested” rather than the “required” category for state changes that have no side effects. Enabling or disabling blend is in neither category. This means that glBlendEquation(GL_FUNC_ADD) with glBlendFunc(GL_ONE, GL_ZERO) is allowed to produce different results from simply disabling blend, and the GL implementation remains conformant. So you don’t have a bug, you’ve got expected behaviour, and you’ll have to find a workaround (as suggested - are you totally sure you can’t just disable blend?)

Thanks for the responses, guys.

@Alfonse, @mhagain: The “no-op” blending setup in the sample files was just meant to illustrate that even a mostly-disabled blending config still affects the bits. In my application, I need blending to aggregate the deferred lighting passes into RGB, while writing 0 to A. Channel A contains 3 packed integer values written from the pass that draws the geometry, and I don’t want it to be modified by the deferred lighting passes, which is why I write 0 in those passes. The reason I’m keeping the output color and the blurring values in one texture is so a post-process shader makes fewer texture samples to do a few different kinds of blurring (DoF, velocity). So I can’t disable blending entirely, because I need the lighting passes to aggregate light contribution. Unfortunately, I can’t enable it only for a specific buffer either, since the output buffer needs blending for RGB, but not A.
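For context, the lighting-pass state is conceptually something like this (simplified; the exact factors aren’t the point, additive GL_ONE/GL_ONE is just for illustration):

// Deferred lighting pass: accumulate light into RGB, and (in exact math) leave A alone.
glEnable(GL_BLEND);
glBlendEquation(GL_FUNC_ADD);
glBlendFunc(GL_ONE, GL_ONE);          // result = src*1 + dst*1

// Each light's fragment shader writes 0 to alpha, e.g. (GLSL):
//   outColor = vec4(lightContribution, 0.0);   // 0 + dstA should leave the packed bits in A untouched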

What’s also interesting (I didn’t put this in the sample for simplicity): if I turn off color writing with glColorMask for the channel with my bit pattern, the problem still persists. I guess this is still conformant with the invariance rules you pointed me towards, @mhagain. As a user though, it is unexpected that the blending would affect a color I had masked off from writing altogether.

It seems like I’m kind of screwed here. Maybe my optimization of packing values to do fewer texture samples is too aggressive if I also need blending.

In my application, I need blending to aggregate the deferred lighting passes into RGB, while writing 0 to A. Channel A contains 3 packed integer values written from the pass that draws the geometry, and I don’t want it to be modified by the deferred lighting passes, which is why I write 0 in those passes. The reason I’m keeping the output color and the blurring values in one texture is so a post-process shader makes fewer texture samples to do a few different kinds of blurring (DoF, velocity).

That would imply that you’re rendering your colors into a 32-bit floating-point texture. That’s usually overkill, even for HDR colors. RGBA16F tends to be sufficient for those needs. Indeed, some have managed to get away with R11F_G11F_B10F with acceptable results.

If you use 16-bit floats (or less), then you’ll find that you can easily afford an additional R32UI texture. See, it’s not (just) the number of samples that’s the issue; it’s the bandwidth. By halving your color bandwidth from 128 bits per pixel to 64, you should be able to maintain performance even with an additional fetch, since you’ve reduced your overall bandwidth from 128 bits to 96 (64 for the color + 32 for the integers). This can be even better if you can manage to use the smaller format.
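In GL terms, the split would look something like this (a sketch; sizes and attachment points are placeholders):

const GLsizei width = 1280, height = 720;   // placeholder size
GLuint colorTex = 0, bitsTex = 0;

glGenTextures(1, &colorTex);
glBindTexture(GL_TEXTURE_2D, colorTex);
glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA16F, width, height);   // 64 bits/pixel of HDR color

glGenTextures(1, &bitsTex);
glBindTexture(GL_TEXTURE_2D, bitsTex);
glTexStorage2D(GL_TEXTURE_2D, 1, GL_R32UI, width, height);     // +32 bits/pixel of packed integers

glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, colorTex, 0);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, bitsTex, 0);

const GLenum bufs[] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
glDrawBuffers(2, bufs);   // blend state only matters for attachment 0;
                          // the integer attachment is never blended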

As a user though, it is unexpected that the blending would affect a color I had masked off from writing altogether.

Unexpected, perhaps. But not unreasonable.

Blending requires a read/modify/write of the pixel. And since a pixel’s components are stored right next to each other in memory, the hardware has to fetch all of them, no matter what color mask is in play. Most importantly, the same is true of writing the pixel. So if you only mask off some of the components, the masked components will still have to be read, “modified”, and written.

And it’s during that reading and writing that things can still, theoretically, happen to the bits.

That would imply that you’re rendering your colors into a 32-bit floating-point texture. That’s usually overkill, even for HDR colors. RGBA16F tends to be sufficient for those needs. Indeed, some have managed to get away with R11F_G11F_B10F with acceptable results.

If you use 16-bit floats (or less), then you’ll find that you can easily afford an additional R32UI texture. See, it’s not (just) the number of samples that’s the issue; it’s the bandwidth. By halving your color bandwidth from 128 bits per pixel to 64, you should be able to maintain performance even with an additional fetch, since you’ve reduced your overall bandwidth from 128 bits to 96 (64 for the color + 32 for the integers). This can be even better if you can manage to use the smaller format.

Very cool, I’ll give that a shot. I’m in the process of optimizing my GBuffer bandwidth too, so shuffling the formats around is perfect timing. I was rendering to six 4-channel 32-bit textures in my draw pass, and performance was getting poor. I tried switching them all to 16-bit textures for fun, and saw my rendering time decrease by 20%. According to this (28.2.1 Raster Operations), which is in line with what you’re saying, it’s a bandwidth issue.