Mipmapped glCopyTexImage2D() s-l-o-w. Why?

I am using Nvidia GeForce hardware under Linux. I don’t have easy access to any other system configuration, so this might be Nvidia/Linux specific, but I don’t know. Any help with this is greatly appreciated.

I’m rendering a dynamic texture using OpenGL, updating it every frame. I render the texture image to a region of the back buffer, copy the pixels into the texture with glCopyTexImage2D(), clear the back buffer, and use the generated texture on subsequent objects in the scene.

This works fine, as long as I don’t use mipmapping. Drawing all the mipmap levels takes only a few milliseconds, and copying them back one by one is also quick, but once all mipmap levels have been copied, something kicks in behind the scenes that takes ages (tenths of a second) to execute, slowing my application down from 200+ fps to 10 fps or less. This slowdown occurs even if I never enable a mipmapping mode for GL_TEXTURE_MIN_FILTER.

If I don’t copy back any mipmaps, only a single texture image, everything speeds along fine, with mere milliseconds spent on the call to glCopyTexImage2D().
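For reference, the per-level copy I’m doing looks roughly like this (a sketch, not the actual demo code; the 256x256 GL_RGBA8 base level is just an assumption for illustration):

    /* Sketch of the per-level copy described above. */
    int size = 256;
    for (int level = 0; size >= 1; ++level, size /= 2) {
        /* ... draw the image at the current size into the lower-left
           corner of the back buffer ... */
        glCopyTexImage2D(GL_TEXTURE_2D, level, GL_RGBA8,
                         0, 0, size, size, 0);
    }
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    /* bind the texture and draw the rest of the scene */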

Is this normal? Should I expect the driver to take this long to set up a mipmap pyramid once all the levels are in place?

Stefan G

I’m not aware of anything that would cause this…

  • Matt

Thanks for the response. I really like the presence Nvidia has in these forums.

I have a 20k tarball demonstrating the problem, and I will send it to anyone who wants it. I won’t post it here, though. Not just yet. I want to fiddle around with the code some more myself before I let it out in public.

Stefan G

Stefan: You don’t explain how you come up with your mipmaps. Do you call GetTexImage and scale it yourself? Do you render multiple times? Do you draw a down-scaled version of one image and then call CopyTexImage? Do you enable GENERATE_MIPMAP (1.4) or GENERATE_MIPMAP_SGIS (SGIS_generate_mipmap)?

Probably the fastest method is to enable GENERATE_MIPMAP. (I may have the token wrong.) Don’t know if I’ll have time for a detailed analysis of your code, but I can probably take a quick peek if you email it to me.
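Something along these lines (a rough sketch only; it assumes a 256x256 texture object tex already exists with storage for level 0):

    /* Let the driver build the mipmap chain itself and copy only the
       base level each frame. The token is GL_GENERATE_MIPMAP in GL 1.4,
       or GL_GENERATE_MIPMAP_SGIS with SGIS_generate_mipmap. */
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_GENERATE_MIPMAP, GL_TRUE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
                    GL_LINEAR_MIPMAP_LINEAR);

    /* per frame: render the image, then copy just level 0; the lower
       levels are regenerated automatically */
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, 256, 256);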

For everybody’s information: this problem has now been tracked down to a small bug in the OpenGL driver versions from Nvidia. The 40.XX builds have fixed the bug. A Linux release of the 40.XX generation drivers is not yet available, but I trust it will happen soon.

Many thanks to Matt Craighead and Pat Brown at Nvidia for being extremely responsive and helpful!

Stefan G

I’ve got a similar problem under Win2k with the 30.x and 40.x drivers: if I render some primitives to a pbuffer and glCopy(Sub)TexImage2D() the contents into a texture, from time to time this copy takes more than 1(!) or sometimes even more than 7(!!!) seconds! I’ve timed the glCopy(Sub)TexImage2D() call: all the time gets wasted there. I’ve tested several “internal formats” (GL_RGB, GL_RGBA8, GL_R3_G3_B2, GL_RGB5_A1, …); only the GL_RGB5_A1 format does not show this strange performance behavior. Any suggestions?

The internal format must match what’s in the source AND what’s in the destination, else format conversion needs to happen. If the CPU reads the data back and converts it, you lose.

Note that (Copy)TexImage() is usually slower than (Copy)TexSubImage(), as it has to re-allocate internal data structures pertaining to the texture, whereas subimage just has to update bits in place.
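In code, the difference is roughly this (a sketch, assuming a 256x256 RGBA pbuffer and a texture object tex; adjust the format to whatever your pbuffer actually uses):

    /* Allocate the texture storage once, with an internal format that
       matches the pbuffer (GL_RGBA8 assumed here)... */
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 256, 256, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);

    /* ...then, each frame, update it in place instead of re-creating it */
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, 256, 256);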

Agreed, texture memory allocation and pixel format conversion take time, but converting a 256x256 pixel image between any two formats shouldn’t require seconds. Spending even a tenth of a second on a 256x256 image works out to roughly 1.5 µs per pixel (0.1 s / 65,536 pixels), which is hundreds of clock cycles on the GPU and thousands of cycles on the CPU.

I can agree that optimum performance on consumer-level hardware requires matching the frame buffer and texture memory formats, but a mismatch shouldn’t have such grave consequences.

As stated above, this is a confirmed 3x.xx driver bug. Versions 40.xx have it fixed.