OpenGL float -> normalized integer truncation (need to add 0.5/255 to outgoing data?)

Hi guys

I maintain a deferred renderer, so we write albedo (among other things) to a G-buffer that consists mainly of uint8 RGBA render targets.

So… for instance I may write albedo out like this…


out vec4 gbuffer_albedoAlpha;  // uint8 RGBA target

void main()
{
    // ...
    vec3 albedo = vec3( 0.2, 0.3, 0.4 );   // for example...
    // ...
    gbuffer_albedoAlpha = vec4( albedo, 1.0 );
}

which will get converted to uint8 data by OpenGL automatically.

After reading the OpenGL spec (section 2.3.4.2, “Conversion from Floating-Point to Normalized Fixed-Point”) I see that the conversion formula is this…

uint8 uintvalue = uint8( clamp( fvalue, 0.0, 1.0 ) * 255.0 );

So this means that OpenGL is not rounding the data to the closest integer. It is simply truncating.
For example…

  • only 1.0 exactly will ever get mapped to 255
  • e.g. 0.99999 × 255 = 254.997, which gets mapped to 254 (not 255)

It would be much better if it rounded for us automatically, e.g.

uint8 uintvalue = uint8( clamp( fvalue, 0.0, 1.0 ) * 255.0 + 0.5 );  // OpenGL does NOT do this.  :(

Interestingly, the packUnorm4x8 (etc.) functions DO perform rounding automatically… sad to have this inconsistency… : (
i.e.
https://www.opengl.org/registry/specs/ARB/shading_language_packing.txt
“packUnorm4x8 fixed_val = round(clamp(float_val, 0, +1) * 255.0);”
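
Just to illustrate the difference in shader terms (a sketch; the 254 assumes the cast truncates, which is the question at hand):

float f = 0.99999;
uint truncated = uint( clamp( f, 0.0, 1.0 ) * 255.0 );   // 254 if the cast truncates
uint rounded   = packUnorm4x8( vec4( f ) ) & 0xFFu;      // 255: packUnorm4x8 is specified to round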

Anyway. To the point…
It seems that in my deferred renderer I should add 0.5/255 to any outgoing data to ensure I get the most accurate result.

So the example code should instead be this…


out vec4 gbuffer_albedoAlpha;  // uint8 RGBA target

void main()
{
    // ...
    vec3 albedo = vec3( 0.2, 0.3, 0.4 );   // for example...
    // ...
    gbuffer_albedoAlpha = vec4( albedo, 1.0 ) + vec4( 0.5/255.0 );   // add half an LSB ourselves so the truncation rounds to nearest
}

Does this seem correct to everyone?
Thanks very much
Ren

The conversion from float to fixed point does not state that it truncates (at least, not in any recent OpenGL version). It allows different implementations to decide on the behavior, saying only that the result is one of the two closest fixed-point values. So there’s no guarantee that this code won’t be making the problem worse: on an implementation that already rounds to nearest, adding 0.5/255 pushes every value up by half a step (e.g. 0.2, which should store as 51, would store as 52).

Equally importantly… what’s the point? In the worst case, your colors are off by 1/255. That amounts to ~0.004, or about 0.4 percent. While it’s not insignificant, it’s also not that big of a deal. You’ll likely not notice it, so you’re probably worrying about nothing.

And if it is that big of a deal to you, OpenGL has better ways of resolving that. Rendering to sRGB colorspace images is one way; the error in the transformation is non-linear and makes the question of rounding more or less irrelevant. If you need even more error-free storage, you could render to RGB16F images; obviously this hurts performance, but it’s not nearly as bad as you might think.

Thanks for the reply Alfonse.

You’re right. I was looking at the OpenGL 4.4 spec.
Things changed between OpenGL 4.4 and OpenGL 4.5.

section 2.3.4.2 (OpenGL 4.4):
“…The conversion from a floating-point value f to the corresponding unsigned normalized fixed-point value c is defined by first clamping f to the range [0, 1], then computing f′ = f × (2^b − 1). f′ is then cast to an unsigned binary integer value with exactly b bits…”

section 2.3.5.2 (OpenGL 4.5):
“…The conversion from a floating-point value f to the corresponding unsigned normalized fixed-point value c is defined by first clamping f to the range [0, 1], then computing f′ = convert_float_uint(f × (2^b − 1), b), where convert_float_uint(r, b) returns one of the two unsigned binary integer values with exactly b bits which are closest to the floating-point value r (where rounding to nearest is preferred)…”
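
To make the change concrete, take b = 8 and f = 0.99999, so f′ = 254.997 (this is my reading of the two wordings):

4.4: f′ is “cast” to an 8-bit integer          -> reads like truncation, giving 254
4.5: the nearer of {254, 255}, rounding preferred -> implementation-dependent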

[QUOTE=Alfonse Reinheart]You’ll likely not notice it, so you’re probably worrying about nothing.[/QUOTE]

This is not true. It is significant. With a deferred renderer it is possible to be rendering many light sources (e.g. into the hundreds, if VPL methods are being used). This means a value in the G-buffer can be read, lit and accumulated many times during the course of rendering a frame. Also, some G-buffer values are sensitive to small changes (e.g. specRoughness), and final colorspace conversions can amplify the problem further. A small error of 1/255 can quickly become significant given everything that happens over the course of a frame.

Not all values in a G-buffer make sense to store in sRGB colorspace (e.g. specRoughness, materialReflectance, etc.).
Maybe I confused things by using albedo as an example, sorry.
But it’s a good point that using sRGB for some G-buffer values does help precision.

[QUOTE=Alfonse Reinheart;1271858]
If you need even more error-free storage, you could render to RGB16F images; obviously this hurts performance, but it’s not nearly as bad as you might think.[/QUOTE]

Well… with the number of passes we’re doing, I have noticed a performance penalty with half-floats and uint16. This is why I’m experimenting with different storage mechanisms for our parameters (which is what prompted this post in the first place). I have plans to move to tiled-deferred sometime in the future, which should alleviate fill-rate performance issues, but that is far away and will not solve my immediate issue of lighting inaccuracy.

So getting back to the original issue.
1)
On pre-4.5 drivers it seems I should indeed perform explicit rounding myself in the shader, as suggested in the original post (for uint8 G-buffer values which do not make sense to store in sRGB space).
Is there agreement that my logic is correct there?

2)
On 4.5 drivers the problem is arguably worse, because the rounding behavior is now explicitly implementation-dependent. : (
Does anyone have knowledge of which classes of hardware perform rounding and which do not?
Does anyone know of any hardware docs which might clarify this for me?
I guess I could simply create an OpenGL 4.4 context and use newer features via extensions… : / but this is not ideal…

Another option is that I use an integer-based render target (e.g. GL_R32UI) and perform the packing myself in the shader, along the lines of the sketch below.
But this prevents me from performing any filtered sampling…
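
Roughly what I have in mind (untested; the parameter values are just placeholders):

// Geometry pass: pack four unorm8 parameters into one uint, written to a GL_R32UI target.
out uint gbuffer_packedParams;

void main()
{
    float specRoughness       = 0.5;    // placeholder material values
    float materialReflectance = 0.04;
    gbuffer_packedParams = packUnorm4x8( vec4( specRoughness, materialReflectance, 0.0, 1.0 ) );
}

// Lighting pass: unpack manually (no filtered sampling possible on a uint texture).
uniform usampler2D gbufferPackedParams;

vec4 fetchParams( ivec2 coord )
{
    return unpackUnorm4x8( texelFetch( gbufferPackedParams, coord, 0 ).r );
}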

thoughts?
Thanks a lot
Ren

[QUOTE=Ren]You’re right. I was looking at the OpenGL 4.4 spec.
Things changed between OpenGL 4.4 and OpenGL 4.5.[/QUOTE]

Yes, specific language about the conversion was added. But it did not change the overall meaning; or at least, not in the way you think it did.

The exact wording from GL 4.4 is that “f′ is then cast to an unsigned binary integer value with exactly b bits”. The fact that it says “cast” says nothing about rounding vs. truncation. So 4.4 guarantees you nothing about how the conversion is done.

Really, all the GL 4.5 wording does is clean things up a bit. By declaring the conversion to be a “function” of sorts, it implicitly means that the function must behave identically everywhere. The 4.4 wording of “cast” did not make it clear that the cast operation had to work the same everywhere. The 4.5 wording also makes it clear that the only viable results are the two integers nearest to f′; the wording of “cast” didn’t make that explicit either.

So no, you should not assume that pre-4.5 implementations always round down.

If you really, truly, absolutely need to ensure rounding behavior, then you need to do the normalization yourself. That means using integer texture formats and doing the normalization as you output it, and when you read it in your shader.
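
A minimal sketch of that, assuming a GL_RGBA8UI color attachment (the names here are made up):

// Geometry pass: quantize explicitly, so the rounding is under our control.
out uvec4 gbuffer_albedoAlpha;   // attached as GL_RGBA8UI

void main()
{
    vec4 value = vec4( 0.2, 0.3, 0.4, 1.0 );
    gbuffer_albedoAlpha = uvec4( round( clamp( value, 0.0, 1.0 ) * 255.0 ) );
}

// Lighting pass: undo the normalization manually when reading.
uniform usampler2D gbufferAlbedoAlpha;

vec4 fetchAlbedoAlpha( ivec2 coord )
{
    return vec4( texelFetch( gbufferAlbedoAlpha, coord, 0 ) ) / 255.0;
}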

[QUOTE=Ren]This is not true. It is significant. With a deferred renderer it is possible to be rendering many light sources… a value in the G-buffer can be read, lit and accumulated many times during the course of rendering a frame.[/QUOTE]

Yes, deferred rendering does accumulation. That doesn’t mean that the difference will be significant.

Take the albedo as an example. The maximum possible error for an 8-bit unsigned normalized value, if the conversion truncates, is 1/255, or ~0.004. So let’s say that the red channel is off by exactly that: the computed value was 0.004 higher than the stored value. Now, let’s say there are 1000 lights in the scene, and the light intensity that reaches the point is 1.0 for all of them. And let’s just multiply the light intensity directly by the albedo, to make the math as simple as possible.

So, after doing all of the multiplications and additions, the total lighting intensity you get will be 4 units lower than the lighting intensity you should have computed. That sounds significant. But what do you do next?

Well, having done all of this lighting, you now need to employ tone mapping, since the maximum intensity from 1000 lights could be up to 1000 units. If we do tone mapping by a simple division by the maximum intensity, then we divide by 1000, leaving us with an error of… 0.004. If you allow for 10× overbrightening at this point (maximum intensity is 100), the tone mapping still leaves an error of only 0.04. You would have to use much lower maximum intensities for the error to reach even 5% of the maximum value.
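
Or, to put the arithmetic compactly:

accumulated error              = 1000 × (1/255) ≈ 4.0
after dividing by I_max = 1000 :  4 / 1000 = 0.004
after dividing by I_max = 100  :  4 / 100  = 0.04   (10× overbrightening)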

Even if you use a different tone mapping algorithm, the error gets tone-mapped right along with everything else. So while the error can have a high absolute value before tone mapping, its significance afterwards is scaled down by that same mapping.

Oh, and let’s not forget that 0.004 is the maximum error; the average error will be 0.5/255 or ~0.002. So it will be even less significant.

To be sure, other kinds of terms can have different error characteristics. If you’re using an exponential specular “shininess”, obviously the error will vary with the exponent. But generally speaking, no matter how many lights accumulate into the value, tone mapping will render the absolute error relatively insignificant.

If you want evidence of this, just try it. Do what I suggested about normalizing the values manually. See whether you can tell the difference between flooring and rounding normalization, in a scene with hundreds of lights.

As long as you are maintaining precision during your lighting passes, then you should be fine.

[QUOTE=Ren]Well… with the number of passes we’re doing, I have noticed a performance penalty with half-floats and uint16. This is why I’m experimenting with different storage mechanisms for our parameters (which is what prompted this post in the first place).[/QUOTE]

You could also use GL_RGB10_A2. That makes the average error even more insignificant: 0.5/1023, or ~0.0005. Or you can use GL_RG16F if you have a few values that truly need higher precision; you shouldn’t lose much performance from using that. Even GL_R11F_G11F_B10F is an option, if the values have a large(ish) range but don’t need a lot of mantissa precision.

But again, you shouldn’t be trying these things unless you can actually see the difference in precision, not just if you think it could be a problem.

[QUOTE=Ren]Another option is that I use an integer-based render target (e.g. GL_R32UI) and perform the packing myself in the shader.
But this prevents me from performing any filtered sampling…[/QUOTE]

You should not be doing filtering in your lighting passes. The neighboring texels in your map do not necessarily correspond to neighboring fragments from the same object. So blending with them is decidedly inappropriate.

Ahhh, OK.
That makes things clearer (if more awkward) for me. : (
But great to know. Thanks!

Yes! Good points.
You’re right, this makes sense for the albedo channel and the many-lights issue.
It was bad of me to use albedo as an example, as it has skewed the conversation somewhat. (Also, as you say, the number of lights is not such a big deal after all.)

But we have other parameters which are much more sensitive to small inaccuracies (e.g. hair parameters, specReflectancePower, etc.).
And given that we need to keep parity with our offline renderer, we’re seeing noticeable differences due to quantization.

It may be that I need to increase the number of bits I assign to these sensitive parameters, but I want to at least make our quantization predictable.

Yes, good idea. I’ll try some of this and see how it goes.

Ha! I wish it were that easy. We also use our G-buffer information for impostoring (which does require filtering).
But instead of using our G-buffer data directly, I might convert the data out of our HQ G-buffer textures into some lower-quality textures instead.
Once something is in impostor form it is no longer expected to be 100% accurate to the original material.
And we have memory considerations at that point too… : /

So it seems I have two options:

  • try some of the other formats (e.g. GL_RGB10_A2, GL_R11F_G11F_B10F, etc.)
  • use an integer buffer and do the packing/unpacking myself

Thanks a lot for your feedback Alfonse!
Ren