Hello All,
I’ve recently implemented a simple wavelet transform in OpenGL/Cg. It works, but it’s nowhere near as fast as I need it to be, so I’m attempting to optimize it. I can see where the slowdowns are, but I don’t know how to fix them.
Here is how the system works: An input image and a lookup table (LUT) are loaded into two textures. Two pbuffers (TEMP and FINAL) are created. A result texture (to hold the resulting wavelet coefficients) is created.
During the transform, the input image is bound as the source texture (texture unit 0) and the LUT is bound to texture unit 1. The X-pass of the transform renders to the TEMP pbuffer, which is then bound as a texture and used as the input to the Y-pass. The Y-pass renders to the FINAL pbuffer; finally, the result texture is bound and the contents of the FINAL pbuffer are copied into it with glCopyTexSubImage2D().
My MAJOR slowdown is the call to glCopyTexSubImage2D(). The copy is much slower than the rendering passes, and the time it takes is directly proportional to the amount of data being copied.
All textures (and pbuffers) are formatted GL_RGBA, type GL_UNSIGNED_INT_8_8_8_8. The LUT texture is GL_TEXTURE_2D; all the others, including the pbuffers, are GL_TEXTURE_RECTANGLE_EXT. As far as I can tell, all textures are resident: the standard residency check (glAreTexturesResident()) reports GL_FALSE the first time each texture is bound, and GL_TRUE after that.
The pixel format used to create the CGL context is apparently hardware accelerated (kCGLPFAAccelerated = GL_TRUE, kCGLPFARendererID = 0x00021802).
My current system is a 1.3 GHz 17" PowerBook G4 with a Radeon 9600 XT chipset.
Any ideas as to the cause of the slowdown? What can I do to make this faster?
Thanks!
-Josh Senecal