Convolution performance w/ non-separable filters

Foxbat · March 24, 2007, 11:29pm

I have a fragment shader that uses 2 textures. One is the base image and the other is a convolution kernel. The filter that the kernel is generated from is non-separable, so I’m just using a simple brute-force approach: a nested loop that iterates over the image/kernel and sums the results. The kernel can be up to 256x256. Rendering is quite slow on my 7800 GTX. Are there any shader ‘tricks’ that I should try to improve performance?

bmerry · March 25, 2007, 6:29am

If you don’t want to go the full FFT route (which I think can be, and has been, hardware accelerated BTW), consider just a brute-force Fourier transform. The transform itself is separable (1D passes over the rows, then over the columns), so by computing the FT of the image and of the kernel, multiplying them pointwise, and then taking the inverse FT, you avoid any 2Dx2D loops.
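
To make the idea concrete, here is a minimal CPU-side sketch in plain Python, not shader code and not necessarily what bmerry had in mind. The function names (`dft_1d`, `dft_2d`, `convolve_via_dft`, `convolve_direct`) are made up for illustration. It assumes the kernel has been zero-padded to the image size, and it computes a circular (wrap-around) convolution; to get an ordinary linear convolution you would pad both inputs further first.

```python
import cmath

def dft_1d(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for t in range(n):
            total += values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(total / n if inverse else total)
    return out

def dft_2d(grid, inverse=False):
    """2D DFT done separably: 1D transforms over rows, then over columns."""
    rows = [dft_1d(row, inverse) for row in grid]
    cols = [dft_1d(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row-major

def convolve_via_dft(image, kernel):
    """Circular convolution of two equally sized grids via the convolution theorem."""
    fi = dft_2d(image)
    fk = dft_2d(kernel)
    # Pointwise multiply in the frequency domain.
    product = [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(fi, fk)]
    # Inputs are real, so the imaginary part of the result is only rounding noise.
    return [[v.real for v in row] for row in dft_2d(product, inverse=True)]

def convolve_direct(image, kernel):
    """Reference: brute-force circular convolution with the 2Dx2D nested loops."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for j in range(h):
                for i in range(w):
                    out[y][x] += image[j][i] * kernel[(y - j) % h][(x - i) % w]
    return out
```

For an N×N image with a kernel of comparable size, the direct loops do on the order of N^4 multiply-adds, while the row/column DFT passes do on the order of N^3, which is where the saving comes from even without a proper FFT.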
