Floating point mipmapping performance

I’m experimenting with floating point mipmapping on an NVIDIA GeForce 6800 GT.

Enabling automatic mipmap generation for unsigned byte textures usually results in a 50 to 100% performance overhead.

When using GL_RGBA_FLOAT16_ATI, however, enabling automatic mipmap generation makes things roughly 10 times slower.

Is this performance drop to be expected for floating point textures (because they are rather new and not widely used), or can I do something to overcome this problem?

Greetz,

Nico

Just to clarify: are you saying you get a performance drop just by having auto-gen mipmapping enabled when rendering (i.e. not loading/subloading data)?

I always thought there was no overhead when rendering, as it should only kick in when loading texture data.

It happens when I’m loading data to level 0 of the floating point texture. I wrote a small test app that allocates a floating point texture and then repeatedly loads some texture data. When I turn on automatic mipmap generation, the loads are roughly 10 times slower for floating point textures than with mipmap generation disabled.
If I change the internal format to RGBA8, the loads are roughly 2 times slower than with automatic mipmap generation disabled.
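
As a rough sanity check on those numbers, here is a minimal sketch in C that counts how much extra data a full mip chain represents. The helper names mip_level_count and mip_generated_texels are made up for illustration, and it assumes the chain is generated all the way down to 1x1. For a power-of-two texture the generated levels add up to only about one third of the level 0 texel count, so the amount of data written does not by itself explain a 10x slowdown.

```c
#include <assert.h>
#include <stddef.h>

/* Number of mip levels for a w x h texture, counting level 0,
   assuming the chain runs down to 1x1. */
static int mip_level_count(int w, int h)
{
    assert(w > 0 && h > 0);
    int levels = 1;
    while (w > 1 || h > 1) {
        if (w > 1) w /= 2;
        if (h > 1) h /= 2;
        ++levels;
    }
    return levels;
}

/* Total number of texels in levels 1..N, i.e. everything a mipmap
   generator has to write in addition to level 0. */
static size_t mip_generated_texels(int w, int h)
{
    assert(w > 0 && h > 0);
    size_t total = 0;
    while (w > 1 || h > 1) {
        if (w > 1) w /= 2;
        if (h > 1) h /= 2;
        total += (size_t)w * (size_t)h;
    }
    return total;
}
```

For a 256x256 texture this gives 9 levels and 21845 generated texels against 65536 texels in level 0.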

I’m using the GPU for general purpose computation to implement an iterative algorithm which requires many texture loads/subloads to set the output of a previous rendering/iteration as input to the next. This requires many texture loads depending on the number of iterations. I also need the floating point textures for numerical stability.

Nico

I just tried a simple NVIDIA example to demonstrate the problem.

http://download.nvidia.com/developer/SDK/Individual_Samples/DEMOS/OpenGL/simple_fp16_blend.zip

adding:
glTexParameteri(target, GL_GENERATE_MIPMAP_SGIS, GL_TRUE);

to the create_texture function results in very low FPS.

I also tried removing the:
#define USE_FP16 1

so that it doesn’t use floating point.
It still results in the same performance drop. Am I doing something wrong here?

It’s my first attempt at using pbuffers. In the past I used CopyTexSubImage with the back buffer, but right now I really need the floating point precision.

Nico

That’s strange. It should only take about twice as long to generate FP16 mipmaps as RGBA8 ones, because the FP16 bilinear filtering rate is exactly half of the RGBA8 filtering rate on NV40.

If the mipmaps are generated on the CPU, however, such a performance drop is quite plausible, because the CPU has no native FP16 support.
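
To illustrate what a CPU fallback might look like, here is a minimal sketch of one mip level being produced with a 2x2 box filter on 32-bit float RGBA data. This is only an assumption about the general shape of such a path, not what the driver actually does; the function name downsample_rgba_box is hypothetical, and power-of-two dimensions are assumed. With FP16 texels, a software path like this would additionally have to convert every half to float before averaging and back again afterwards, which is where the lack of native support would hurt.

```c
#include <assert.h>
#include <stddef.h>

/* Downsample an RGBA float image (src_w x src_h, tightly packed) to the
   next mip level with a 2x2 box filter. dst must have room for
   max(src_w/2, 1) * max(src_h/2, 1) * 4 floats. */
static void downsample_rgba_box(const float *src, int src_w, int src_h,
                                float *dst)
{
    assert(src != NULL && dst != NULL && src_w > 0 && src_h > 0);

    int dst_w = src_w > 1 ? src_w / 2 : 1;
    int dst_h = src_h > 1 ? src_h / 2 : 1;

    for (int y = 0; y < dst_h; ++y) {
        /* Clamp so that 1-texel-wide or 1-texel-high levels still work. */
        int y0 = 2 * y     < src_h ? 2 * y     : src_h - 1;
        int y1 = 2 * y + 1 < src_h ? 2 * y + 1 : src_h - 1;
        for (int x = 0; x < dst_w; ++x) {
            int x0 = 2 * x     < src_w ? 2 * x     : src_w - 1;
            int x1 = 2 * x + 1 < src_w ? 2 * x + 1 : src_w - 1;
            for (int c = 0; c < 4; ++c) {
                float a = src[((size_t)y0 * src_w + x0) * 4 + c];
                float b = src[((size_t)y0 * src_w + x1) * 4 + c];
                float d = src[((size_t)y1 * src_w + x0) * 4 + c];
                float e = src[((size_t)y1 * src_w + x1) * 4 + c];
                dst[((size_t)y * dst_w + x) * 4 + c] = 0.25f * (a + b + d + e);
            }
        }
    }
}
```

Calling it repeatedly, feeding each output back in as the next input, would build the whole chain down to 1x1.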