Dct



hlewin
01-04-2015, 05:52 AM
I've a short not-really gl-related question.
Why are the DCT and similar functions used in computer graphics? For example as a scaling filter?
From what I understand, those do not really seem appropriate for pictures. For example, the sharp border in an image that is half black and half white is hardly expressible by a sum of sine waves. When resampling sounds, using an FFT makes kind of sense, but in graphics? I wonder that these work at all, not to mention the fact that JPEG compression uses such transformations with great success.

Agent D
01-04-2015, 07:51 AM
Imagine you watch a signal on an oscilloscope screen. If we sample that signal and turn it into a time-discretized, amplitude-discretized sequence
of values, how can you tell if the sequence of values represents audio or a sequence of image pixels without knowing where the signal comes from?

The theory of signal processing still applies, no matter whether it's color values of an image or amplitude values of an audio signal. An image
simply has an additional dimension we have to deal with, but signal theory can easily be extended to more dimensions in the input signal.

Of course, a high frequency signal, like a black image with a white line in it, can be represented by a weighted summation of infinitely many
sine waves. For practical uses, we have the same problems here as with audio containing high frequency components. We have to decide when
to cut off our infinite series and take into account the errors that this introduces. For audio, we have a relatively low cutoff frequency in human
perception that we can use, but in both cases, we will introduce an error with DFT/DCT and inverse.

Take your favourite image editing program and draw a picture with solid color rectangles and straight lines and export it as a JPEG image. You will
notice blurry artifacts around the high frequency patterns and some sort of blockiness, since the JPEG compressor processes 8x8 blocks. Photographs
usually have little energy in their high frequency components and contain a lot of noise, so the error will go unnoticed, even if our JPEG compressor
starts removing some of the higher frequency coefficients.
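To put a rough number on that difference, here is a minimal sketch in plain Python (not taken from any JPEG implementation). It compares how much of the non-DC energy lands in the upper half of the DCT spectrum for a soft transition versus a hard edge. The two test rows, the names `soft` and `edge`, and the helper `high_band_energy_fraction` are all assumptions made up for illustration.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a list of floats (direct O(n^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def high_band_energy_fraction(x):
    """Share of the non-DC energy that sits in the upper half of the spectrum."""
    c = dct(x)
    n = len(c)
    ac_energy = sum(v * v for v in c[1:])
    high_energy = sum(v * v for v in c[n // 2:])
    return high_energy / ac_energy

n = 64
# Hypothetical "photograph-like" row: a soft transition spread over several pixels.
soft = [1.0 / (1.0 + math.exp(-(i - 31.5) / 4.0)) for i in range(n)]
# Hypothetical "line-art-like" row: a hard black-to-white edge.
edge = [0.0] * (n // 2) + [1.0] * (n // 2)

smooth_fraction = high_band_energy_fraction(soft)
edge_fraction = high_band_energy_fraction(edge)
print("soft transition:", smooth_fraction)
print("hard edge:      ", edge_fraction)
```

The hard edge keeps a noticeably larger share of its energy in the coefficients that a compressor is most likely to drop, which is where the visible artifacts come from.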

An image is just another form of a signal. On the one hand, we have the usual problems of signal processing (sampling & aliasing, cut-off frequencies,
ringing artifacts and so on....), on the other hand we can use the signal processing tools on images. While we can use FIR filters on images just like we use
them on audio signals, we can also transform an image to the frequency domain, do interesting things with it and transform it back (if we don't care about
the error we introduce).

hlewin
01-04-2015, 08:11 AM
Imagine you watch a signal on an oscilloscope screen. If we sample that signal and turn it into a time-discretized, amplitude-discretized sequence
of values, how can you tell if the sequence of values represents audio or a sequence of image pixels without knowing where the signal comes from?
I would have said this is "quite easy" as the audio signal would look more like a smooth curve with hills and valleys without any sharp discontinuities.

GClements
01-04-2015, 03:13 PM
I've a short not-really gl-related question.
Why are the DCT and similar functions used in computer graphics? For example as a scaling filter?
From what I understand, those do not really seem appropriate for pictures. For example, the sharp border in an image that is half black and half white is hardly expressible by a sum of sine waves. When resampling sounds, using an FFT makes kind of sense, but in graphics? I wonder that these work at all, not to mention the fact that JPEG compression uses such transformations with great success.
They don't work particularly well for signals (whether 1D or 2D) with discontinuities (e.g. hard edges in images). E.g. using JPEG for text or line-art tends to result in "ringing", just like if you apply a low-pass filter to a perfect square wave or step function.

But infinite derivatives don't exist in reality. A lot of effort goes into creating worlds which don't have perfectly flat surfaces with zero-radius corners illuminated solely by a finite number of perfect point lights.

hlewin
01-05-2015, 01:30 AM
They don't work particularly well for signals (whether 1D or 2D) with discontinuities (e.g. hard edges in images). E.g. using JPEG for text or line-art tends to result in "ringing", just like if you apply a low-pass filter to a perfect square wave or step function.
This seconds my observations - sharp edges aren't expressible by sums of sines at all.


A lot of effort goes into creating worlds which don't have perfectly flat surfaces with zero-radius corners illuminated solely by a finite number of perfect point lights.
So it is really because of the color gradients in the image that such methods work well? This again makes kind of sense, although I would not have seen the sinusoidal base shape in the gradients.
Could one - in theory - simply exchange the base-function to get something that is closer to the real gradients seen in the image?

GClements
01-05-2015, 04:35 AM
This seconds my observations - sharp edges aren't expressible by sums of sines at all.
"At all" is an overstatement. A DFT or DCT is invertible. Any sequence of samples can be converted to a sequence of DFT/DCT coefficients, and those coefficients can be converted back to the original sequence. But as soon as you try to achieve compression by discarding high-frequency coefficients, sharp edges will suffer the most. If you discarded the low-frequency coefficients, areas of solid colour would suffer.
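A small sketch of those three cases on a 16-sample step, in plain Python. This is only an illustration under assumed parameters (the signal, the cut at 4 coefficients and all variable names are made up here), not how any particular codec does it.

```python
import math

def dct(x):
    """Orthonormal DCT-II."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return out

def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

step = [0.0] * 8 + [1.0] * 8          # hard edge between samples 7 and 8
coeffs = dct(step)

# 1) Keep everything: the transform is invertible up to rounding error.
round_trip = idct(coeffs)
max_round_trip_error = max(abs(a - b) for a, b in zip(step, round_trip))

# 2) Discard the high frequencies (keep only the first 4 coefficients).
low_pass = idct(coeffs[:4] + [0.0] * 12)
errors = [abs(a - b) for a, b in zip(step, low_pass)]
worst_index = errors.index(max(errors))  # lands right next to the edge

# 3) Discard the low frequencies instead (drop the first 4 coefficients).
high_pass = idct([0.0] * 4 + coeffs[4:])

print("round trip error:", max_round_trip_error)
print("low-pass worst sample:", worst_index, "range:", min(low_pass), max(low_pass))
print("high-pass mean level:", sum(high_pass) / len(high_pass))
```

With the high frequencies gone, the largest error sits at the edge and the reconstruction swings below 0 and above 1 (the ringing). With the low frequencies gone, the edge is still there but the flat areas lose their level, since the mean of the signal is carried by the DC coefficient.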


So it is really because of the color-gradients in the image that such methods work well?
It's because the kind of images for which JPEG was designed (mainly photographs) have limited bandwidth. You won't find a pure white pixel adjacent to a pure black pixel in a photograph, or even in synthetic images which attempt to be "photorealistic".


Could one - in theory - simply exchange the base-function to get something that is closer to the real gradients seen in the image?
Which image(s)?

In the general case, you need as many outputs as inputs for the transformation to be reversible (only square matrices have an inverse). Different transforms would result in different types of signal producing zero (or near-zero) coefficients. E.g. the Fourier and Discrete Cosine transforms result in bandwidth-limited signals producing near-zero values for the high-frequency coefficients.

In theory, any invertible square matrix can be used as a discrete transform. The Fourier transform (or variants) is commonly used because it's a convolution (http://en.wikipedia.org/wiki/Convolution) transform: multiplying the coefficients element by element corresponds to a (circular) convolution on the samples, and vice versa.
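The convolution property can be checked numerically. The following is a minimal sketch using a naive DFT in plain Python; the signal, the 3-tap blur kernel and the function names are assumptions for illustration only.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft() above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circular_convolve(x, h):
    """Circular convolution computed directly on the samples."""
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]

signal = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
# Hypothetical 3-tap blur (0.25, 0.5, 0.25) centred on index 0, wrapped around.
kernel = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]

direct = circular_convolve(signal, kernel)
product = [a * b for a, b in zip(dft(signal), dft(kernel))]
via_dft = [v.real for v in idft(product)]

print(direct)
print(via_dft)
```

Both lists agree up to rounding error. Note that the DFT gives circular convolution; the DCT has a related but not identical property, because it implies a mirrored extension of the signal rather than a periodic one.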

dorbie
01-05-2015, 05:53 AM
This seconds my observations - sharp edges aren't expressible by sums of sines at all.

With enough frequency domain coefficients you can do sharp edges. However, because this is commonly used as a compression technique that lops off (or coarsely quantizes) the higher frequency coefficients, you see ringing at sharp edges. Ringing is the spatial domain sinc-like residual of the edge you introduced in the frequency domain due to the sharp truncation of frequency components.



Could one - in theory - simply exchange the base-function to get something that is closer to the real gradients seen in the image?

That would be wavelets. If that interests you there is a lot of published work. It is not a magic bullet.
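As a rough illustration of both halves of that statement, here is a sketch that counts non-negligible coefficients for the simplest wavelet (Haar) and for the DCT. The Haar choice, the test signals and the helper `count_significant` are assumptions made for this example; the step is deliberately placed on a boundary that suits Haar.

```python
import math

def haar(x):
    """Full orthonormal Haar decomposition; len(x) must be a power of two."""
    out = list(x)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        dif = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + dif   # averages first, details after; recurse on averages
        n = half
    return out

def dct(x):
    """Orthonormal DCT-II, for comparison."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return out

def count_significant(coeffs, tol=1e-9):
    """Number of coefficients that are not (numerically) zero."""
    return sum(1 for v in coeffs if abs(v) > tol)

step = [0.0] * 8 + [1.0] * 8            # hard edge in the middle
ramp = [i / 15.0 for i in range(16)]    # plain linear gradient

print("step: haar", count_significant(haar(step)), "dct", count_significant(dct(step)))
print("ramp: haar", count_significant(haar(ramp)), "dct", count_significant(dct(ramp)))
```

The step needs very few Haar coefficients but many DCT coefficients, while the plain gradient touches every Haar coefficient. Which basis is "closer to the image" depends on the image, which fits the "not a magic bullet" remark.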

dorbie
01-05-2015, 06:08 AM
I've a short not-really gl-related question.
Why are the DCT-function and the like used in computer graphics? For example as scaling filter?

They are used most commonly in image and video compression. The DCT works well as a constrained relative of the FFT on small image tiles of fixed size, e.g. 8x8. To explain briefly, images are split into small tiles of luma and chroma components (the chroma might be at lower resolution, so each chroma tile covers a larger image area). These components are independently converted into frequency domain coefficients using a DCT. A quality factor then quantizes the coefficients, which truncates high frequency components, hopefully low amplitude coefficients. The coefficients are read out, often in a diagonal (zigzag) scan pattern, so it may not be a tile raster you're operating on, and are then compressed in some way to minimize their storage, e.g. entropy encoding. Sometimes the image data on the way in is pass band filtered in some way for better results at the truncation stage.
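The transform and truncation part of that description can be sketched on a single 8x8 tile as below. This is a simplified illustration, not the JPEG algorithm: the single uniform step size `STEP` is a hypothetical stand-in for a real quantization table, the gradient tile is made up, and the scan order and entropy coding are left out entirely.

```python
import math

N = 8
STEP = 16.0  # hypothetical uniform quantization step, standing in for a quality setting

def dct_1d(x):
    """Orthonormal DCT-II of one row or column."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return out

def idct_1d(c):
    """Inverse of dct_1d()."""
    n = len(c)
    return [c[0] * math.sqrt(1.0 / n)
            + sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                  for k in range(1, n))
            for i in range(n)]

def transpose(m):
    return [list(r) for r in zip(*m)]

def dct_2d(tile):
    """Separable 2D DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in tile]
    return transpose([dct_1d(c) for c in transpose(rows)])

def idct_2d(coeffs):
    """Inverse of dct_2d(): undo the columns, then the rows."""
    cols = transpose([idct_1d(c) for c in transpose(coeffs)])
    return [idct_1d(r) for r in cols]

# Hypothetical smooth tile: a gentle gradient, as in a photograph.
tile = [[100.0 + 4.0 * x + 2.0 * y for x in range(N)] for y in range(N)]

coeffs = dct_2d(tile)
quantized = [[round(v / STEP) for v in row] for row in coeffs]   # lossy step
restored = idct_2d([[q * STEP for q in row] for row in quantized])

nonzero = sum(1 for row in quantized for q in row if q != 0)
max_error = max(abs(a - b) for ra, rb in zip(tile, restored) for a, b in zip(ra, rb))
print("nonzero coefficients:", nonzero, "of", N * N)
print("max pixel error:", max_error)
```

For a smooth tile like this, most quantized coefficients come out as zero, which is what makes the following entropy coding stage effective, while the reconstruction error stays small relative to the pixel range.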