Convolution performance w/ non-separable filters

03-24-2007, 11:29 PM
I have a fragment shader that uses 2 textures. One is the base image and the other is a convolution kernel. The filter that the kernel is generated from is non-separable, so I'm just using a simple brute-force approach: a nested loop that iterates over the image/kernel and sums the results. The kernel can be up to 256x256. Rendering is quite slow on my 7800 GTX. Are there any shader 'tricks' that I should try to improve performance?
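For reference, the brute-force approach described above can be sketched on the CPU as follows. This is only an illustration in Python, not the actual shader: the function name `convolve2d_direct`, the zero padding at the image border, and the centered kernel origin are all assumptions. The two inner loops correspond to the per-fragment work, so a 256x256 kernel means 65,536 texture fetches for every output pixel.

```python
def convolve2d_direct(image, kernel):
    """Brute-force 2D convolution (hypothetical helper, not the poster's shader).

    image and kernel are lists of rows. Samples outside the image are
    treated as zero (an assumption). The output has the image's size.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    cy, cx = kh // 2, kw // 2  # kernel origin assumed at its center
    out = [[0.0] * iw for _ in range(ih)]
    for y in range(ih):              # outer two loops: one iteration per pixel
        for x in range(iw):
            acc = 0.0
            for j in range(kh):      # inner two loops: one iteration per tap
                for i in range(kw):
                    sy = y + cy - j  # flipped kernel index = true convolution
                    sx = x + cx - i
                    if 0 <= sy < ih and 0 <= sx < iw:
                        acc += image[sy][sx] * kernel[j][i]
            out[y][x] = acc
    return out
```

The total cost grows as (image width x image height) x (kernel width x kernel height), which is why large kernels get slow so quickly.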

Bruce Merry
03-25-2007, 06:29 AM
If you don't want to go the full FFT route (which I think can be, and has been, hardware accelerated, BTW), consider just a brute-force discrete Fourier transform. The transform is separable (a 1D pass over the rows followed by a 1D pass over the columns), so by computing the FT of the image and of the kernel, multiplying them element by element, and then taking the inverse FT, you avoid any 2Dx2D loops.
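A minimal CPU sketch of that idea, in Python rather than shader code. The names `dft1d`, `dft2d`, and `convolve2d_dft` are made up for illustration, and the sketch assumes the kernel has already been zero-padded to the same size as the image. Note that multiplying the transforms gives a circular (wrap-around) convolution, so the image would also need zero padding if wrap-around at the borders is not wanted.

```python
import cmath


def dft1d(values, inverse=False):
    """Brute-force 1D DFT, O(n^2); no FFT involved."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)  # 1/n scaling on the inverse
    return out


def dft2d(grid, inverse=False):
    """2D DFT done separably: 1D pass over rows, then 1D pass over columns."""
    rows = [dft1d(row, inverse) for row in grid]
    cols = [dft1d(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row-major


def convolve2d_dft(image, kernel):
    """Circular convolution via the convolution theorem.

    Assumes image and kernel have identical dimensions (kernel zero-padded).
    """
    fi = dft2d(image)
    fk = dft2d(kernel)
    # Pointwise product in the frequency domain replaces the 2D x 2D loop.
    prod = [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(fi, fk)]
    return [[v.real for v in row] for row in dft2d(prod, inverse=True)]
```

Each 1D pass costs one loop over a row or column per output element, so the work per pixel scales with the image width plus height instead of with the kernel area.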