24/05/2018, 23:56

Fundamentals of Image Processing

Lecture part 00 Lecture part 01 Lecture part 02 Lecture part 03 Lecture part 04 Lecture part 05 Lecture part ...

Lecture part 00 Lecture part 01 Lecture part 02 Lecture part 03 Lecture part 04 Lecture part 05 Lecture part 06 Lecture part 07 Lecture part 08

Continuous-domain, discrete-domain, and finite-size images

An image is a spatially varying signal s(x,y) where x and y are two spatial coordinates. The signal value s(x,y) at each spatial location (x,y) can be either a scalar (e.g. light intensity for gray scale images) or a vector (e.g. 3 dimensional vector for RGB color images, or more general P-dimensional vector for multispectral images). In the latter case, we could treat each vector component separately as a scalar image (referred to as a channel), sometimes after a certain transformation in the P-dimensional vector space.

In digital image processing, images are discretized into samples at discrete spatial locations that are indexed by integer coordinates [m,n]. Typically, a discrete-domain image s[m,n] is related to a continuous-domain image s(x,y) through the sampling operation

s [ m , n ] = s ( m Δ x , n Δ y ) ,

where Δx and Δy are sampling intervals in x and y dimensions, respectively. More general, a discrete-space image is obtained through the generalized-sampling operation

w [ m , n ] = ∫ - ∞ ∞ ∫ - ∞ ∞ s ( x , y ) φ m , n ( x , y ) d x d y ,

where φm,n(x,y) is the point-spread function of the image sensor (e.g. a photometric sensor in a digital camera) at the location indexed by (m,n). Typically, point-spread functions at different locations are simply shifted versions of a single function as

φ m , n ( x , y ) = φ ( x - m Δ x , y - n Δ y ) ,

and φm,n(x,y) is called the sampling kernel.

Furthermore, a discrete image s[m,n] is often of finite size; for example 0≤m≤M-1,0≤n≤N-1. Then s[m,n] can also be treated as an M×N matrix. The image sample s[m,n] and the corresponding location [m,n] is often called a pixel, or picture element.

Fourier transforms and sampling theorem

It is often very effective, conceptually and computationally, to represent images in the frequency domain using the Fourier transform. For a continuous-domain image s(x,y), its Fourier transform is defined as

S ( u , v ) = ∫ - ∞ ∞ ∫ - ∞ ∞ s ( x , y ) e - j 2 π ( x u + y v ) d x d y .

Here, u and v denote frequency variables and they have reciprocal unit with x and y. For example, if the spatial coordinate x has unit in mm, then the corresponding frequency variable u has unit in mm-1. Under certain conditions, the image s(x,y) can be exactly recovered from its frequency-domain S(u,v) by the inverse Fourier transform

s ( x , y ) = ∫ - ∞ ∞ ∫ - ∞ ∞ S ( u , v ) e j 2 π ( x u + y v ) d u d v .

We denote this pair of signals related by the Fourier transform (FT) as

s ( x , y ) ⟷ FT S ( u , v ) .

For a discrete image s[m,n] the discrete-space Fourier transform (DSFT) relation

s [ m , n ] ⟷ DSFT S ( u , v ) .

is defined as

S d ( u , v ) = ∑ m = - ∞ ∞ ∑ n = - ∞ ∞ s [ m , n ] e - j 2 π ( m u + n v ) , s [ m , n ] = ∫ - 1 / 2 1 / 2 ∫ - 1 / 2 1 / 2 S d ( u , v ) e j 2 π ( m u + n v ) d u d v .

It is easy to see that Sd(u,v) is a periodic function

S d ( u + k , v + l ) = S d ( u , v ) , for all k , l ∈ Z ,

and thus we only need to consider the function in one period; e.g. Sd(u,v) with |u|≤1/2,|v|≤1/2.

Theorem 1 (Sampling) Suppose that the discrete-domain image s[m,n] is related to the continuous-domain image s(x,y) through the sampling operation [link]. Then their Fourier transforms are related by

S d ( u , v ) = 1 Δ x Δ y ∑ k ∈ Z ∑ l ∈ Z S u + k Δ x , v + l Δ y .

(Sketch) One way to prove this is to express s[m,n] using [link] by substituting x=mΔx,y=nΔy and then “match” with the right-hand side of [link].

The summation on the right-hand side of [link] consists of S(u/Δx,v/Δy) and its translated copies in frequency by (k,l). These copies with (k,l)≠(0,0) are called alias terms. If s(x,y) is bandlimited such that

S ( u , v ) = 0 , for | u | ≥ 1 / ( 2 Δ x ) , | v | ≥ 1 / ( 2 Δ v ) ,

then these alias terms do not overlap with S(u/Δx,v/Δy), and thus S(u,v) can be exactly recovered from Sd(u,v) simply by

S ( u , v ) = Δ x Δ y rect ( Δ x u ) rect ( Δ y v ) S d ( Δ x u , Δ y v ) .

Here the rectangular function is defined as

rect ( x ) = 1 if | x | ≤ 1 / 2 0 else.

We will show later that [link] in the spatial domain is equivalent to

s ( x , y ) = ∑ k ∈ Z ∑ l ∈ Z s [ m , n ] sinc ( t / Δ x - m ) sinc ( t / Δ y - n ) ,

where the sinc function is defined as

sinc ( x ) = sin ( π x ) π x .

For the discrete image s[m,n] of finite size M×N with 0≤m≤M-1,0≤n≤N-1, we have the discrete Fourier transform (DFT) relation

s [ m , n ] ⟷ DFT S [ k , l ] ,

which is defined as

S [ k , l ] = ∑ m = 0 M - 1 ∑ n = 0 N - 1 s [ m , n ] e - j 2 π ( m k / M + n l / N ) , s [ m , n ] = 1 M N ∑ k = 0 M - 1 ∑ l = 0 N - 1 S [ k , l ] e j 2 π ( m k / M + n l / N ) .

Therefore the DFT maps an M×N image in the spatial domain into an M×N image in the frequency domain; both images can have complex values.

Relating [link] to [link], we see that if the M×N image s[m,n] is zero padded outside its support [0,M-1]×[0,N-1] then S[k,l] is a sampled image of Sd(u,v),

S [ k , l ] = S d ( k / M , l / N ) .

In summary, we have seen the following three Fourier transforms

continuous-domain ⟷ FT continuous-domain discrete-domain ⟷ DSFT continuous-domain discrete-domain ⟷ DFT discrete-domain

Among these transforms, only the last one, the DFT, is computationally feasible (i.e. with summations of finite terms). Moreover, the DFT can be implemented efficiently with fast Fourier transform algorithms. In moving from the FT to the DSFT and then to the DFT, we first discretize the spatial domain and then the frequency domain. Therefore, it is important to understand [link] and [link] so that we can relate the computational results and images by the DFT to the frequency representation of the original image in the real world.

Vector-space framework

In a more abstract framework, we can view each image as a vector in an appropriate vector space (i.e. for continuous-domain, discrete-domain, or discrete-domain of finite support). The associate Fourier transform is a linear mapping or linear operator that maps a vector in the spatial domain into a vector in the frequency domain. We can express the Fourier transform and its inverse using the matrix-vector multiplication notation

Forward or Analysis: S = F s , Inverse or Synthesis: s = F - 1 S ,

In the vector-space framework, it is particularly useful to view the inverse Fourier transform as a basis expansion. For example, the inverse DFT in [link] can be written as

s = ∑ k = 0 M - 1 ∑ l = 0 N - 1 S [ k , l ] f k , l ,

in which the image s is expanded as a linear combination of basis imagesfk,l where

f k , l [ m , n ] = 1 M N e j 2 π ( m k / M + n l / N ) .

Other transforms such as the wavelet transform simply provide other basis expansions.

Problems

  1. Complete the proof of the sampling theorem.
  2. Suppose that s(x,y) is a bandlimited image with its Fourier transform S(fx,fy)=0 for |fx|≥1/(2Δx),|fy|≥1/(2Δy). This image is captured by an CCD array where sensors are placed on a rectangular grid with spacing Δx×Δy and each sensor measures the integral of light intensity falling in an area of size Δx×Δy on this grid. Derive a formula to recover s(x,y) exactly from these discrete CCD measurements.
  3. Prove that the inverse DFT perfectly recovers the image. That is, suppose that S[k,l] is given in [link] then show that the right-hand side of [link] indeed returns s[m,n].
  4. An 3 in by 4 in photo is discretized by a 200 dpi (dots-per-inch) scanner and results in a 600 by 800 digital image. Assume that alias is negligible. Suppose that we want to filter out all spatial high frequency above 50 inch-1 in both horizontal and vertical dimensions out of the image. One way is to zero-pad the image to the size of 1024 by 1024 so that we can use a fast Fourier transform (FFT) algorithm to compute the DFT of size 1024 by 1024 of the zero-padded image. Find the indexes of the DFT coefficients that need to be zero out before taking the inverse DFT to achieve the desired filtering effect.

Convolution operations

The image (linear) filtering operation in the continuous-domain is defined by convolution

r ( x , y ) = ( s * h ) ( x , y ) = ∫ - ∞ ∞ ∫ - ∞ ∞ h ( x ' , y ' ) s ( x - x ' , y - y ' ) d x ' d y ' .

And similarly in the discrete-domain:

r [ m , n ] = ( s * h ) [ m , n ] = ∑ m ' = - ∞ ∞ ∑ n ' = - ∞ ∞ h [ m ' , n ' ] s [ m - m ' , n - n ' ] .

The two-dimensional signal h(x,y) or h[m,n] is called filter, mask, or point-spread function.

Examples

Example 1 (First-order derivatives) The first-order derivatives in the x and y directions of a discrete image s[m,n] can be approximated by finite differences

∂ s ∂ x = s [ m + 1 , n ] - s [ m , n ] = ( s * h x ) [ m , n ] ∂ s ∂ y = s [ m , n + 1 ] - s [ m , n ] = ( s * h y ) [ m , n ] ,

which are convolutions with the following filters

h x ( 1 ) = 1 - 1 , h y ( 1 ) = 1 - 1 .

Here in the matrix form, row and column indexes correspond to x (first) and y (second) dimensions, respectively; and the sample in the box corresponds to the original (i.e. (m,n)=(0,0)).

Example 2 (The Laplacian and image sharping) The Laplacian of a two-dimensional signal s(x,y) is defined as

Δ s = ∂ 2 s ∂ x 2 + ∂ 2 s ∂ x 2

Extending the definition of the first-order derivatives above to the second-order derivatives, we have

∂ 2 s ∂ x 2 = s [ m + 1 , n ] - 2 s [ m , n ] + s [ m - 1 , n ] ∂ 2 s ∂ y 2 = s [ m , n + 1 ] - 2 s [ m , n ] + s [ m , n - 1 ] .

It follows that the Laplacian can be computed as

Δ s [ m , n ] = s [ m + 1 , n ] + s [ m - 1 , n ] + s [ m , n + 1 ] + s [ m , n - 1 ] - 4 s [ m , n ] = ( s * h Δ ) [ m , n ] ,

which is convolution with the following filter

h Δ = 0 1 0 1 - 4 1 0 1 0 .

Sometimes, the Laplacian is extended by adding two more terms for two diagonal directions leading to the following filter

h Δ ' = 1 1 1 1 - 8 1 1 1 1 .

Since the Laplacian is a derivative operator, it highlights intensity discontinuities (or edges) in an image. We can sharpen an image by adding the negative of the Laplacian image to the original

r [ m , n ] = s [ m , n ] - Δ s [ m , n ] .

Using the Laplacian filter [link] we can write the resulting sharpening operation as a convolution

r [ m , n ] = ( s * h sharp ) [ m , n ] ,

with the following filter

h sharp = - 1 - 1 - 1 - 1 9 - 1 - 1 - 1 - 1 .

Example 3 (Gausian smoothing filter) The two-dimensional Gaussian filter, which is often used for image smoothing, is defined as

h Gauss (2D) ( x , y ) = 1 2 π σ 2 e - x 2 + y 2 2 σ 2 .

The 2D Gaussian filter is separable, which means it is a product of 1D filters in each dimension

h Gauss (2D) ( x , y ) = h Gauss (1D) ( x ) h Gauss (1D) ( y ) , where h Gauss (1D) ( x ) = 1 2 π σ e - x 2 2 σ 2 .

Discrete-domain Gaussian filters used in practice are sampled and truncated versions of the above continuous-domain filters.

Example 4 (Sobel edge detector) The Sobel edge detector is obtained by smoothing the image in the perpendicular direction before computing the directional derivatives. The associate Sobel edge detector filters are given by

h x (Sobel) = 1 2 1 0 0 0 - 1 - 2 - 1 , h y (Sobel) = 1 0 - 1 2 0 - 2 1 0 - 1 .

Edges are detected as pixels [m,n] where the magnitude of the gradient is above a certain threshold T; i.e.

| ( s * h x (Sobel) ) [ m , n ] | + | ( s * h y (Sobel) ) [ m , n ] | ≥ T .

Frequency response of a filter

A key result in signal and image processing is that convolution in the space domain becomes multiplication in the frequency domain

r ( x , y ) = ( s * h ) ( x , y ) ⟷ FT R ( u , v ) = S ( u , v ) H ( u , v ) , r [ m , n ] = ( s * h ) [ m , n ] ⟷ DSFT R d ( u , v ) = S d ( u , v ) H d ( u , v ) .

Therefore, the Fourier transform H(u,v) of the filter, called frequency response, indicates how certain frequency components of the input image s(x,y) are amplified or attenuated in the resulting filtered image r(x,y).

However, multiplication in the DFT domain corresponds to circular convolution in the space domain

r [ m , n ] = ( s ⊛ M , N h ) [ m , n ] ⟷ DFT R [ k , l ] = S [ k , l ] H [ k , l ] .

The circular convolution operation for images of size M×N is defined as

( s ⊛ M , N h ) [ m , n ] = ∑ m ' = 0 M - 1 ∑ n ' = 0 N - 1 h [ m ' , n ' ] s [ ⟨ m - m ' ⟩ M , ⟨ n - n ' ⟩ N ] ,

where ⟨n⟩N denotes modulo N of n.

Problems

  1. Prove the property [link].
  2. Prove the property [link].
  3. Find and sketch the frequency response of
0