GLSL: packing a normal in a single float

I’m using a GL_RGBA_FLOAT32_ATI texture (accessed in a vertex shader), out of which the RGB part is being used, but the A part is free for other uses.
I want to calculate a normal on the CPU, and somehow pack it into that single float (A).
Now, I know I’ve got 32 bits to play with, but because GLSL doesn’t have bitwise operator support, I’m a bit stuck on how I’d extract a 3-byte normal vector from this single 32-bit float.
Any ideas?

Have not tried this personally, but I recently had to pack two values into an 8-bit texture lookup.
(So this is more pseudo-code than something you can cut and paste)

C++ packing into a float in the 0…1 range

 
// x, y, z is your 8-bit normal
// (This assumes that x, y, z are unsigned chars that have been biased and scaled to the 0..255 range - like a bump map normal)
uint32 packedColor = (x << 16) | (y << 8) | z;
float packedFloat = ((double)packedColor) / ((double)(1 << 24));
 

GLSL:

 
vec3 normalVector = vec3(1.0, 256.0, 65536.0);
normalVector = fract(normalVector * texture.a);

//Unpack to the -1..1 range
normalVector = (normalVector * 2.0) - 1.0;
 

Thanks! I’m having a hard time working out how that code works, or indeed if it works, but I’ll dry run it on paper and see if all becomes clear.
If it does work you’ve just saved me uploading another huge floating point texture every frame!
The restrictions imposed when sampling a texture in a vertex shader are quite harsh, aren’t they?
Not just this ‘floating point formats only’ thing, but also the ‘point sampling only’ thing. For example, I want to use this dynamic normal map in the vertex shader AND in the fragment shader (at a different scale), but because it’s point sampled I have to manually do bilinear filtering in the vertex shader AND in the fragment shader. The only alternative is to create two textures with the same data in them, but with different filtering parameters…which is an incredible waste of bandwidth and memory. If only filtering could be a texture environment parameter in special circumstances…or if only the vertex shader could ignore the filtering mode on a texture rather than throwing a fit if it’s not point sampling.

If you do this “packing” you will have to do manual sampling anyway. (Cannot interpolate a packed normal properly)

Anyway, I was bored and feeling generous, so here is a complete C++ program to test packing and unpacking of normals:

(the precision is fairly good, but I am willing to bet it could be improved, as I wrote this pretty roughly)

#include <stdio.h>
#include <math.h>

//Helper method to emulate GLSL fract() (matches it for non-negative input)
float fract(float value)
{
  return (float)fmod(value, 1.0f);
}

//Helper method to go from a float to packed char
unsigned char ConvertChar(float value)
{
  //Scale and bias
  value = (value + 1.0f) * 0.5f;
  return (unsigned char)(value*255.0f);
}

//Pack 3 values into 1 float
float PackToFloat(unsigned char x, unsigned char y, unsigned char z)
{
  unsigned int packedColor = (x << 16) | (y << 8) | z;
  float packedFloat = (float) ( ((double)packedColor) / ((double) (1 << 24)) );  

  return packedFloat;
}

//UnPack 3 values from 1 float
void UnPackFloat(float src, float &r, float &g, float &b)
{
  r = fract(src);
  g = fract(src * 256.0f);
  b = fract(src * 65536.0f);

  //Unpack to the -1..1 range
  r = (r * 2.0f) - 1.0f;
  g = (g * 2.0f) - 1.0f;
  b = (b * 2.0f) - 1.0f;
}

//Test pack/unpack 3 values
void DoTest(float r, float g, float b)
{
  float outR, outG, outB;

  printf("Testing %f %f %f\n", r, g, b);

  //Pack
  float result = PackToFloat(ConvertChar(r), ConvertChar(g), ConvertChar(b));
  
  //Unpack
  UnPackFloat(result, outR, outG, outB);

  printf("Result %f %f %f\n", outR, outG, outB);
  printf("Diff   %f %f %f\n\n", r-outR, g-outG, b-outB);
}


int main(int argc, char* argv[])
{
  
  DoTest(1.0f,1.0f,1.0f);
  DoTest(0.0f,0.0f,0.0f);
  DoTest(0.5f,0.5f,0.5f);

  DoTest(-1.0f,-1.0f,-1.0f);
  DoTest(-0.5f,-0.5f,-0.5f);

  DoTest(-0.2f,-0.3f,-0.4f);
  DoTest(0.2f,0.3f,0.4f);

  return 0;
}

Results are:

Testing 1.000000 1.000000 1.000000
Result 1.000000 0.999969 0.992188
Diff 0.000000 0.000031 0.007813

Testing 0.000000 0.000000 0.000000
Result -0.003922 -0.003937 -0.007813
Diff 0.003922 0.003937 0.007813

Testing 0.500000 0.500000 0.500000
Result 0.498039 0.498016 0.492188
Diff 0.001961 0.001984 0.007813

Testing -1.000000 -1.000000 -1.000000
Result -1.000000 -1.000000 -1.000000
Diff 0.000000 0.000000 0.000000

Testing -0.500000 -0.500000 -0.500000
Result -0.505882 -0.505890 -0.507813
Diff 0.005882 0.005890 0.007813

Testing -0.200000 -0.300000 -0.400000
Result -0.208212 -0.302368 -0.406250
Diff 0.008212 0.002368 0.006250

Testing 0.200000 0.300000 0.400000
Result 0.200369 0.294495 0.390625
Diff -0.000369 0.005505 0.009375

Thank you so very much, sqrt(-1) !
That works beautifully, and is now nestling in my render code, with a big comment giving thanks to the great sqrt(-1).
I still haven’t thought through how it actually works…I’ll save that for another day when I’m not too busy.

I haven’t tried it out, but wouldn’t packing in the -1…1 range give you twice the (overall) precision of packing in the 0…1 range? Probably a little more math, but might be worth it…

Actually, 32-bit floats can exactly represent any integer up to 2^24 = 16,777,216. That means any 3-byte integer (including bump map normals, which top out at 0xFFFFFF = 16,777,215) can be passed to the shader as a float without any loss of precision.

Here’s an edited version of sqrt[-1]'s code to do it:

#include <stdio.h>
#include <math.h>

//Helper method to emulate GLSL fract() (matches it for non-negative input)
float fract(float value)
{
    return (float)fmod(value, 1.0f);
}

//Helper method to go from a float to packed char
unsigned char ConvertChar(float value)
{
    //Scale and bias
    value = (value + 1.0f) * 0.5f;
    return (unsigned char)(value*255.0f);
}

//Pack 3 values into 1 float
float PackToFloat(unsigned char x, unsigned char y, unsigned char z)
{
    unsigned int packedColor = (x << 16) | (y << 8) | z;
    return (float)packedColor;
}

//UnPack 3 values from 1 float
void UnPackFloat(float src, float &r, float &g, float &b)
{
    // Unpack to the 0-255 range
    r = floor(src/65536.0f);
    g = floor(fmod(src, 65536.0f)/256.0f);
    b = fmod(src, 256.0f);

    //Unpack to the -1..1 range
    r = (r/255.0f * 2.0f) - 1.0f;
    g = (g/255.0f * 2.0f) - 1.0f;
    b = (b/255.0f * 2.0f) - 1.0f;
}

//Test pack/unpack 3 values
void DoTest(float r, float g, float b)
{
    float outR, outG, outB;

    printf("Testing %f %f %f\n", r, g, b);

    //Pack
    float result = PackToFloat(ConvertChar(r), ConvertChar(g), ConvertChar(b));

    //Unpack
    UnPackFloat(result, outR, outG, outB);

    printf("Result %f %f %f\n", outR, outG, outB);
    printf("Diff   %f %f %f\n\n", r-outR, g-outG, b-outB);
}


int main(int argc, char* argv[])
{

    DoTest(1.0f,1.0f,1.0f);
    DoTest(0.0f,0.0f,0.0f);
    DoTest(0.5f,0.5f,0.5f);

    DoTest(-1.0f,-1.0f,-1.0f);
    DoTest(-0.5f,-0.5f,-0.5f);

    DoTest(-0.2f,-0.3f,-0.4f);
    DoTest(0.2f,0.3f,0.4f);
    return 0;
}

The results show the same precision errors that you’d get by using a char[3]:
Testing 1.000000 1.000000 1.000000
Result 1.000000 1.000000 1.000000
Diff 0.000000 0.000000 0.000000

Testing 0.000000 0.000000 0.000000
Result -0.003922 -0.003922 -0.003922
Diff 0.003922 0.003922 0.003922

Testing 0.500000 0.500000 0.500000
Result 0.498039 0.498039 0.498039
Diff 0.001961 0.001961 0.001961

Testing -1.000000 -1.000000 -1.000000
Result -1.000000 -1.000000 -1.000000
Diff 0.000000 0.000000 0.000000

Testing -0.500000 -0.500000 -0.500000
Result -0.505882 -0.505882 -0.505882
Diff 0.005882 0.005882 0.005882

Testing -0.200000 -0.300000 -0.400000
Result -0.200000 -0.301961 -0.403922
Diff 0.000000 0.001961 0.003922

Testing 0.200000 0.300000 0.400000
Result 0.200000 0.294118 0.396078
Diff 0.000000 0.005882 0.003922

Yes, but keep in mind that my method was designed to be unpacked fast in GLSL.
(as shown by the GLSL code above - it should compile to three ASM instructions: MUL, FRC, MAD)

By a quick look at your code it looks as if it would require quite a few more instructions to unpack.

I’d like to know if it is possible to use this method for storing only 2 components of the normal vector (retrieving the 3rd in the shader) in order to get even more precision?

Sure, you could just store 2 components for extra precision. However, you will also need to store a “sign bit” to multiply into the 3rd reconstructed component. (unless you already know that the component always faces one way)

Thank you, this works great for 3 components!

Can it work for 4?

It seemed to zero out the last component for me - not sure if this is due to something with how floats work? Here is my code:

	float TestPackUnsignedNormalizedFloat4ToFloat(float* aValues)
	{
		assert(	aValues[0] <= 1.0f && aValues[1] <= 1.0f && aValues[2] <= 1.0f && aValues[3] <= 1.0f &&
			aValues[0] >= 0.0f && aValues[1] >= 0.0f && aValues[2] >= 0.0f && aValues[3] >= 0.0f);

		unsigned long packedColor = (UNFloatConvertChar(aValues[0]) << 24) | (UNFloatConvertChar(aValues[1]) << 16) | (UNFloatConvertChar(aValues[2]) << 8) | UNFloatConvertChar(aValues[3]);

		float packedFloat = (float) ( ((double)packedColor) / ((double) (0xFFFFFFFF)) );  

		return packedFloat;
	}


	//Helper method to emulate GLSL
	float frac(float value)
	{
		return (float)fmod(value, 1.0f);
	}

	//UnPack 4 values from 1 float
	CVec4 UnPackUnsignedFloat(float src)
	{
		return CVec4( frac(src),
		              frac(src * 256.0f),
		              frac(src * 65536.0f),
		              frac(src * 16777216.0f) );
	}

What would you multiply by in the unpacking code if you just had 2 components?

NM Figured it out I think:

	float TestPackUnsignedNormalizedFloat2ToFloat(float* aValues)
	{
		assert(	aValues[0] <= 1.0f && aValues[1] <= 1.0f &&
						aValues[0] >= 0.0f && aValues[1] >= 0.0f );

		unsigned long packedColor = (UNFloatConvertShort(aValues[0]) << 16) | UNFloatConvertShort(aValues[1]);

		float packedFloat = (float) ( ((double)packedColor) / ((double) (0xFFFFFFFF)) );  

		return packedFloat;
	}


	//Helper method to emulate GLSL
	float frac(float value)
	{
		return (float)fmod(value, 1.0f);
	}

	//UnPack 2 values from 1 float
	CVec2 UnPackUnsignedFloat2(float src)
	{
		return CVec2( frac(src),
		              frac(src * 65536.0f) );
	}

Well, since this code snippet helped me out so much, I thought I would post another related bit.

Here is the link to my original post in case I add any updates:
http://www.caffeinatedgames.com/profiles/blogs/stop-using-so-much-memory-and

I decided I wanted to pack normals into my GBUFFER as well. It is a 16F buffer and I want to pack the x and y into a single 16-bit float, but this time on the GPU, so no bitwise OR. I asked out on Twitter and @paveltumik was kind enough to reply with what should have been obvious looking at all these posts, but I think at that point my eyes had glazed over :). Basically I just store one number in the fractional part and one number in the non-fractional part. I get two digits of precision, which seem to be just fine for my normals.

It assumes:

  1. You are compressing numbers between -1 and 1.
  2. You only care about two digits of precision which is good since I am packing into a 16bit float format
  3. I deal with the 1.0 case by mapping the second component to the 0.0 - 0.8 range, so the fractional part never carries into the integer part.
//Thanks @paveltumik for the original code in comments

//pack:	f1=(f1+1)*0.5; f2=(f2+1)*0.5; res=floor(f1*1000)+f2; 
inline float PackFloat16bit2(float2 src)
{
	return floorf((src.x+1)*0.5f * 100.0f)+((src.y+1)*0.4f);
}

//unpack:	f2=frac(res);	f1=(res-f2)/1000;	f1=(f1-0.5)*2;f2=(f2-0.5)*2;
inline float2 UnPackFloat16bit2(float src)
{
	float2 o;
	float fFrac = frac(src);
	o.y = (fFrac-0.4f)*2.5f;
	o.x = ((src-fFrac)/100.0f-0.5f)*2;
	return o;
}

Also, I didn’t combine some of the numbers, just to make it more readable. Hope this helps someone else.

Cheers,
Greg

Just thought I would point out that if you are only targeting GeForce 5+ cards, you can probably use the Nvidia OpenGL Cg extensions:

pack_2half()
Converts the components of a into a pair of 16-bit floating point values. The two converted components are then packed into a single 32-bit result. This operation can be reversed using the unpack_2half() function.

(to use this, you will probably have to have something at the top of your shader file saying “enable Cg extensions”)

If you are targeting Shader Model 4 and above cards, you can probably just use bit masks directly.

Why are you “packing” floats in a 16-bit float format? Why not just use an 8-bit format and skip the packing? Are you actually getting any more accuracy this way?

The reason for packing is usually not accuracy; it is typically that you have more data variables than output/input channels.

Eg. Your render target might be RGBA16F (4 components) but you want to store 6 pieces of output data.
