
View Full Version : Floating point cube maps?



wimmer
01-02-2003, 05:07 PM
Floating-point textures require NV_texture_rectangle, but texture rectangles don't support cube maps...

Somehow I thought using HDR light-probe cube maps would be one of the more obvious uses for floating-point textures, so is this going to be implemented soon, or do we have to work around it?

If I have to go back to using HILO textures, I might as well do it completely with texture shaders, but I'd like to do some more stuff in the fragment program with the result...

Or should I burn tens of instructions in the fragment program trying to emulate cube map access?

Michael

pbrown
01-02-2003, 05:15 PM
Using NV_fragment_program (or ARB_fragment_program), you can access HILO textures stored in a cube map. TEX fragment program instructions extract the HI and LO components to the x and y components of the destination register.

You are correct that the current NV_float_buffer does not allow the use of floating-point data in cube map textures.

ehart
01-02-2003, 05:34 PM
ATI_texture_format_float fully supports all texture types, including cube maps and 3D textures.

It is supported on the Radeon 9700 and 9500 cards. The one gotcha is that you cannot use linear filtering or border colors with these texture formats, or you will be forced back to software rendering.

-Evan

jwatte
01-02-2003, 07:24 PM
If you have floating point capable hardware, you also have hardware that can bind 16 (sixteen) texture targets. Just stuff the high order 8 bits in one cube map, and the low order in the other. Your "emulation" then ends up being 2 additional instructions (a TEX and a MAD).




PARAM param1 = { 0.00390625 }; # 1/256

TEX temp1, fragment.texcoord[0], texture[0], CUBE; # low-order 8 bits
TEX temp2, fragment.texcoord[0], texture[1], CUBE; # high-order 8 bits
MAD output, temp1, param1, temp2.xxxx;             # high + low/256

wimmer
01-03-2003, 01:03 AM
jwatte, pbrown:

Ok, but by "emulation" I meant emulating cube maps using the fragment program, and this is probably more expensive. Hmmm, back to dual-paraboloid maps (cube maps would have been so nice)...

--> oops, this should have been "I meant emulating floating-point cube-map access using the fragment program"

Those light probes can have quite a high dynamic range, and if possible, I would like to use them as floats, and HILO is 16 bit fixed precision...


ehart:
I might have considered using a Radeon 9700, but I can't remember seeing any documentation about such an extension, or about floating point pbuffers - can you enlighten me here? Also, would they be available in an ARB_fragment_program (I'm using Cg for convenience)?

Michael

[This message has been edited by wimmer (edited 01-03-2003).]

wimmer
01-03-2003, 10:57 AM
So I found the ATI extension specs: http://www.ati.com/developer/atiopengl.pdf

The ATI floating point extensions seem much more practical - no special case texture target, no restrictions on filtering, no fiddling around with texture rects...

It seems that I now have a choice between a limitation of 4 dependent texture reads and fewer instructions (Radeon) or no cube maps (NV30)...

Michael

ehart
01-03-2003, 11:35 AM
Please note that while the extension places no restriction on filtering, using a filter type of GL_LINEAR on a 9700 will cause it either to run in software or to silently sample nearest in certain cases.

-Evan

jwatte
01-03-2003, 12:31 PM
I think it makes sense to restrict your requirements to some common level of support for each hardware generation. For example, I think 4 dependent reads is quite sufficient -- note that this is the length of the dependency chain. You can do many more than 4 texture reads, as long as they're not one long dependency chain.

Also, very long fragment programs don't really perform very well. Suppose you run at 800x600 resolution and get an overdraw of exactly 1 (once you've done your depth/ambient pass sorted near to far): you'll still invoke your shader 480,000 times per frame per light. On 8-pipeline cards that can do 320 million instructions per pipe per second, a 64-instruction shader would limit you to 83 fps for a single light per pixel, 41 fps for two lights per pixel, etc. This is before factoring in the cost of the ambient/depth fill and any stencil volumes or shadow buffers you may want to do.

Anyway, for me, the choice has been simple: one card is available; the other is not. Once the GeForceFX comes out (and possibly drops < $300) I'll have to update my wife's machine :-)

cass
01-03-2003, 01:24 PM
Originally posted by wimmer:

It seems that I now have a choice between a limitation of 4 dependent texture reads and fewer instructions (Radeon) or no cube maps (NV30)...

Michael

One other approach you might consider with NV30 is using pack/unpack. If you're rendering to a 32-bpp target, you can "unpack" a scalar float into 4 components as the last operation:

UP4 o[COLR], R0.x;

There's an implicit PK4 that happens to convert this back into a 32-bpp quantity when you're rendering to a 32-bpp surface.

Then you can use this texture (with nearest sampling only) by doing:

TEX R0, f[TEX0], TEX0, CUBE;
PK4 R0.x, R0;

The reason this side of it works is that there's an implicit UP4 that happens on a texture read.

Anyway, except for filtering, this works anywhere you could use an RGBA8 texture or frame buffer, so it supports mipmap minification, 3D textures, etc.

Just something you might want to consider...

Thanks -
Cass

davepermen
01-03-2003, 02:54 PM
you can actually store the float in a normal RGBA texture - check Humus' ShadowsThatRock demo. That's a simple way: you get 24-bit fixed-point precision, stored in the range 0..1 (but that's just a scaling factor :) )

compression is 2 instructions, and decompression is 1... or so.

PH
01-03-2003, 06:02 PM
Originally posted by jwatte:

Also, running very long fragment programs don't really perform very well. Suppose you run in 800x600 resolution, and get an overdraw of exactly 1 (once you've done your depth/ambient pass sorted near to far) you'll still invoke your shader 480000 times per frame per light. On 8-pipeline cards that can do 320 million instructions per pipe per second, a 64 instruction shader would limit you to 83 fps for a single light per pixel; 41 fps for two lights per pixel, etc. This is before factoring in the cost of the ambient/depth fill, and any stencil volumes or shadow buffers you may want to do.


I've noticed that using longer programs (around 35-40 instructions) to implement lighting calculations is much more efficient than using textures as lookup tables. Of course, shorter is always better, but texture lookups (especially dependent lookups) really kill performance.

wimmer
01-04-2003, 03:02 AM
cass: so this means if I need an RGB float cube map, I just store it as three "unpacked" RGBA textures? When rendering, I need 6 instructions to get the full floating point RGB color (3 texture lookups, 3 PK4 instructions) instead of one? Well, this sounds like an option...

But then this rules out using the Cg runtime, because there is no PK4 instruction in Cg, I guess...

(btw, the cube map would be a pre-acquired HDR light probe)

davepermen: yes, this is what jwatte suggested. It leaves me without the exponent of a float, but it is of course an option.

BTW, I noticed that arbfp1 generates fewer instructions than fp30, probably due to CMP representing two fp30 instructions...

Michael

simongreen
01-05-2003, 12:41 PM
Another alternative you might want to consider for HDR stuff is Greg Ward's RGBE format.
http://www.graphics.cornell.edu/online/formats/rgbe/

Many light probe images are already in this format, and it's trivial to decode in Cg:

// Lookup in RGBE-encoded cube map texture
float3 texCUBE_RGBE(uniform texobjCUBE tex, float3 t)
{
    float4 rgbe = f4texCUBE(tex, t);
    float e = (rgbe[3] * 255) - 128;  // shared exponent, biased by 128
    e = exp2(e);
    return rgbe.xyz * e;
}

You still need to do the filtering yourself, but that's true for floating point textures too.

-S.


[This message has been edited by simongreen (edited 01-05-2003).]