
CPU vs. GPU on convolutions



tomodachi
03-27-2005, 03:36 PM
Hi,
I tried blurring an image using GLSL, but I find the result too slow: if I want to average 8 texels, I must access the texture 8 times. The fact that a fragment shader can't pass information to its own execution on the next pixel makes me think that reading the texture data back from OpenGL and doing the calculations on the CPU would be faster. Am I wrong?

carl_lewis
03-27-2005, 03:50 PM
Actually, I've found that the GPU outperforms the CPU on 2D convolution.
Simply averaging a 3x3 area should run far faster than whatever screen refresh rate you care to target, but for more complex filters you may need to encode the kernel as a 2D texture (with multitexture support).

texture lookups are one of the fastest instructions on the GPU (assumed by most to be a single cycle)

experimenting with a 9x9 area (with a texture-encoded kernel) I still get around 50 fps (compared to around 20 fps on the CPU)

Korval
03-27-2005, 05:35 PM
texture lookups are one of the fastest instructions on the GPU (assumed by most to be a single cycle)

Huh? If a texture lookup takes only 1 cycle, you're lucky. A texture lookup is, effectively, a memory access. And, as you should already know, memory access == slow.

Indeed, texture lookups are one of the slowest operations you can do on a GPU. You can do a full 4-vector floating-point dot-product in one cycle, but there's no way you can expect a texture lookup to take only that long. The memory fetches that are required to satisfy the lookup request are what makes it take time.

That being said, you'll still easily outstrip your CPU by comparison. The GPU will typically process a 2x2 block of pixels at once, mitigating the memory-fetch and opcode costs; a CPU cannot. GPUs typically have much faster memory than CPUs. And GPUs have caches specially designed for texture images, as well as storing images in a layout specifically designed to make texture fetches faster. CPUs do not.

tomodachi
03-28-2005, 02:42 AM
Imagine I'm doing a 1D convolution averaging 8 pixels. I sum the first 8 pixels, save the sum, and divide it by 8 to get the first averaged pixel. Then, for the next pixel, I take the previous sum, subtract the first pixel of the old window, and add the first pixel beyond it. This way I get the second averaged pixel with only two operations. That cannot be done with GLSL, and it's what I was talking about.