Lab 3.2 Frequencies and Fourier analysis
In this lab we continue our discussion of time series data. Here, we will focus on frequency analysis.
You have probably heard people talk about the frequency content of sound or other signals. On your music player, you can modify how the "bass" frequencies (that is, the lower frequencies) or the "treble" frequencies (the higher frequencies) are played. What do we mean by frequency, and how can we examine information at different frequencies in a signal?
Continuous functions (or continuous signals) can be written as sums of sines and cosines
The concept of frequency depends upon a fundamental and beautiful result of mathematics: over a finite stretch of time, essentially any time-varying signal we might measure can be expressed as a sum of sines and cosines. (More generally, the same holds for functions of variables other than time. If our signal is based on space, such as in an image, then we can think of frequency in space rather than in time. But let's focus on time here.)
Sines and cosines
Remember sines and cosines? Let's take a look at a few examples to remind ourselves.
Suppose we have a small sample that runs from time 0 to time 10 in steps of 0.01 (that is, a sample interval of 0.01 s or a sample rate of 100 Hz). Let's plot a few sines and cosines of different frequencies.
The function sin(2*pi*f*t) produces a periodic wave with frequency f that varies between -1 and 1. The function cos(2*pi*f*t) also produces a periodic wave with frequency f that varies between -1 and 1, but the cosine is "advanced" in phase by a quarter cycle (pi/2 radians), such that cos(2*pi*f*t - pi/2) = sin(2*pi*f*t), or equivalently cos(2*pi*f*(t - 1/(4*f))) = sin(2*pi*f*t).
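To see the sampled time axis and the quarter-cycle relationship numerically, here is a minimal sketch. It is written in Python with only the standard library rather than Matlab. The helper name wave is hypothetical; y2s and y2c mirror the lab's variable names.

```python
import math

# Time axis from the text: 0 to 10 s in steps of 0.01 s (sample rate 100 Hz).
SR = 100                                    # samples per second
t = [i / SR for i in range(10 * SR + 1)]    # 1001 samples; t[-1] is 10.0

def wave(func, f, times):
    """Sample func(2*pi*f*t) at each time; func is math.sin or math.cos."""
    return [func(2 * math.pi * f * ti) for ti in times]

y2s = wave(math.sin, 2, t)   # 2 Hz sine
y2c = wave(math.cos, 2, t)   # 2 Hz cosine

# Delaying the cosine by a quarter cycle (pi/2 radians) reproduces the sine.
y2c_delayed = [math.cos(2 * math.pi * 2 * ti - math.pi / 2) for ti in t]
max_diff = max(abs(u - v) for u, v in zip(y2c_delayed, y2s))
print(f"largest difference between delayed cosine and sine: {max_diff:.1e}")
```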
Now let's plot these. We'll multiply the 5 Hz signals by a constant so they are easier to tell apart from the 2 Hz signals:
Q1: For each variable y2s, y2c, y5s, and y5c, how many complete cycles are present in each second of data on the graph?
Approximating signals with sines and cosines
Let's look at an example signal, and how we can approximate it with sines and cosines. Let's generate a square wave and look at how we'll represent it:
(Leave this figure open, we'll go back to it.)
We can approximate this signal with sums of several sines and cosines. In this example, I'll just give you the coefficients and frequencies to use; in a few minutes, I'll show you how to figure out which coefficients and frequencies to use for any signal.
Copy and paste these coefficients and frequencies. We'll let sc represent the sine coefficients and cc represent the cosine coefficients.
Let's write the sum of sines and cosines for the first few terms. Why don't you write out the first couple, and copy and paste the rest:
Q2: As we include more sines and cosines, does the approximation get better or worse?
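Here is a rough numerical way to look at Q2. It does not reuse the lab's sc and cc coefficients. Instead it assumes the textbook series for a square wave that switches between +1 and -1, which contains only odd-harmonic sines with coefficients 4/(pi*m). The sketch is in Python and the function names are hypothetical. It prints the mean squared error of the partial sum for several numbers of terms.

```python
import math

SR = 100.0                                   # samples per second
f = 1.0                                      # square-wave frequency in Hz
# Half-sample offset so no sample lands exactly on a jump.
times = [(i + 0.5) / SR for i in range(1000)]

def square_wave(t):
    """+1 during the first half of each cycle, -1 during the second half."""
    return 1.0 if (t * f) % 1.0 < 0.5 else -1.0

def partial_sum(t, n_terms):
    """First n_terms of (4/pi) * sum over odd m of sin(2*pi*m*f*t) / m."""
    total = 0.0
    for k in range(n_terms):
        m = 2 * k + 1                        # odd harmonics only: 1, 3, 5, ...
        total += math.sin(2 * math.pi * m * f * t) / m
    return 4.0 / math.pi * total

target = [square_wave(ti) for ti in times]

def mse(n_terms):
    """Mean squared error between the partial sum and the square wave."""
    approx = [partial_sum(ti, n_terms) for ti in times]
    return sum((a - b) ** 2 for a, b in zip(approx, target)) / len(times)

for n in (1, 3, 10):
    print(n, "terms: MSE =", round(mse(n), 4))
```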
The Fourier Series (for discrete signals):
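We can write any discrete signal (that is, a signal made up of discrete samples such as those that we work with in Matlab) as a sum of sines and cosines as follows:

$$\text{signal}(t) = \sum_{n=0}^{N-1} \left[ a_n \cos(2\pi f_n t) + b_n \sin(2\pi f_n t) \right]$$

where signal is our discrete signal, sampled at times t = 0, 1/SR, 2/SR, ..., (N-1)/SR (measured from the first sample), N is the number of data points in the signal, and the frequencies are

$$f_n = \frac{n \cdot SR}{N}, \qquad n = 0, 1, \ldots, N-1$$

where SR is the sampling rate of the digital signal in Hz.
This series is named for the mathematician Joseph Fourier. You might think that in order to write such an equation, the coefficients a_n and b_n would all depend on each other, but the beauty of the Fourier series is that the coefficients are independent of one another: each a_n and b_n can be calculated on its own, without knowing any of the others. The equations for these coefficients are

$$a_n = \frac{1}{N} \sum_{t} \text{signal}(t) \cos(2\pi f_n t), \qquad b_n = \frac{1}{N} \sum_{t} \text{signal}(t) \sin(2\pi f_n t)$$

where each sum runs over the N sample times.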
Calculating an and bn for a signal
There are 2 steps to understanding how we can use Matlab to calculate the a_n's and b_n's. The first step is to understand which frequencies of sines and cosines we are going to use to fit the signal. Here is a little demo function that slowly displays the basis waves as n runs from 0 up to N-1. Notice that once n passes N/2, the effective frequencies actually start to get lower again; we will see why at the end of this lab. (You could save this function in a new folder called signals inside your tools folder; remember to add it to the path.)
The next step is to actually calculate the a_n's and b_n's. Let's do this for 2 particular values of n for the pulse we drew above (that is, S). The first value we'll look at is n=0. When n=0, the frequency f_0 = 0*SR/N = 0, so the cosine term is just cos(0), which is 1. So

$$a_0 = \frac{1}{N} \sum_{t} \text{signal}(t) \cos(0) = \frac{1}{N} \sum_{t} \text{signal}(t)$$

So a_0 is just the mean of the signal. Since sin(0) is 0, b_0 is always 0 (it's not interesting).
Now let's calculate a_1; we could have picked any nonzero index here, since the principle is the same. We want to calculate

$$a_1 = \frac{1}{N} \sum_{t} \text{signal}(t) \cos(2\pi f_1 t), \qquad f_1 = \frac{SR}{N}$$
which we can do with the following Matlab code:
Now we want to perform an element-by-element multiplication of our signal S with cos(2*pi*f1*t). Let's plot these things:
We can perform the element-by-element multiplication with the .* operator, and then use the function mean, which adds up all the numbers and divides by N:
Now we have a1.
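The same two calculations can be sketched in Python (standard library only) on a hypothetical pulse that stands in for S: 100 samples at 10 Hz, equal to 1 for samples 40 through 59 and 0 elsewhere. The list comprehensions play the role of Matlab's .* followed by mean.

```python
import math

# Hypothetical pulse standing in for S: 100 samples at 10 Hz,
# equal to 1 for samples 40..59 and 0 elsewhere.
SR = 10.0
N = 100
t = [j / SR for j in range(N)]
S = [1.0 if 40 <= j < 60 else 0.0 for j in range(N)]

def mean(values):
    """Add up all the numbers and divide by how many there are."""
    return sum(values) / len(values)

# n = 0: f0 = 0 and cos(0) = 1, so a0 is just the mean of the signal.
a0 = mean(S)

# n = 1: f1 = SR / N; multiply element by element, then average.
f1 = SR * 1 / N
a1 = mean([s * math.cos(2 * math.pi * f1 * tj) for s, tj in zip(S, t)])
b1 = mean([s * math.sin(2 * math.pi * f1 * tj) for s, tj in zip(S, t)])
print("a0 =", a0, " a1 =", round(a1, 4), " b1 =", round(b1, 4))
```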
Here is a function that computes the Fourier coefficients an and bn and frequencies fn (you could put this in signals):
We can calculate an's, bn's, and fn's for our signal S:
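For readers working outside Matlab, here is a direct Python sketch of the same definitions. The name fourier_coefficients matches the Matlab file mentioned in the appendix, but this body is an assumption written from the formulas for a_n, b_n, and f_n, not the lab's code. It evaluates every sum explicitly, which costs on the order of N^2 operations.

```python
import math

def fourier_coefficients(signal, SR):
    """Return lists (a, b, f) for n = 0..N-1, where
    a[n] = mean(signal * cos(2*pi*f[n]*t)),
    b[n] = mean(signal * sin(2*pi*f[n]*t)),
    f[n] = SR * n / N, and t[j] = j / SR."""
    N = len(signal)
    a, b, f = [], [], []
    for n in range(N):
        fn = SR * n / N
        cos_terms = [x * math.cos(2 * math.pi * fn * j / SR) for j, x in enumerate(signal)]
        sin_terms = [x * math.sin(2 * math.pi * fn * j / SR) for j, x in enumerate(signal)]
        a.append(sum(cos_terms) / N)
        b.append(sum(sin_terms) / N)
        f.append(fn)
    return a, b, f

# Example: a 3 Hz cosine plus a constant offset of 0.5, 1 s sampled at 20 Hz.
SR = 20.0
x = [0.5 + math.cos(2 * math.pi * 3 * j / SR) for j in range(20)]
a, b, f = fourier_coefficients(x, SR)
print(f[3], round(a[3], 6), round(a[0], 6))
```

The offset shows up in a[0], and the cosine's amplitude is split evenly between a[3] and its mirror partner a[17].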
Reconstructing the signal from an's and bn's:
We can reconstruct our signal exactly (up to tiny floating-point rounding errors) using matrix multiplication and the equation for signal(t) above. (We'll go over how this brief expression works in the homework!)
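Here is a sketch of the reconstruction step in Python, using explicit sums instead of Matlab's matrix multiplication. At the j-th sample time t = j/SR we have 2*pi*f_n*t = 2*pi*n*j/N, so the sampling rate cancels and the sketch can work with sample indices alone. The test signal and function names are hypothetical.

```python
import math

def fourier_coefficients(signal):
    """a[n], b[n] for n = 0..N-1, written with sample indices (SR cancels)."""
    N = len(signal)
    a = [sum(x * math.cos(2 * math.pi * n * j / N) for j, x in enumerate(signal)) / N
         for n in range(N)]
    b = [sum(x * math.sin(2 * math.pi * n * j / N) for j, x in enumerate(signal)) / N
         for n in range(N)]
    return a, b

def reconstruct(a, b):
    """signal[j] = sum over n of a[n]*cos(2*pi*n*j/N) + b[n]*sin(2*pi*n*j/N)."""
    N = len(a)
    return [sum(a[n] * math.cos(2 * math.pi * n * j / N) +
                b[n] * math.sin(2 * math.pi * n * j / N) for n in range(N))
            for j in range(N)]

# Hypothetical test signal: a pulse on top of a small ramp.
S = [(1.0 if 10 <= j < 20 else 0.0) + 0.01 * j for j in range(32)]
a, b = fourier_coefficients(S)
S_hat = reconstruct(a, b)
max_err = max(abs(u - v) for u, v in zip(S, S_hat))
print(f"largest reconstruction error: {max_err:.1e}")
```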
Q3: How does the reconstruction look against the original?
Examining the frequencies that are present in a sample
Now we can examine the frequencies that are present in our time series data sets.
Let's plot the a_n's and b_n's as a function of frequency for the pulse data. Since we care about the magnitude of the signal at each frequency, and not so much whether the signal is carried by the sine or cosine portion, let's look at what is called the signal power; this is the squared length of the vector (a_n, b_n) at each frequency. (For example, the power at frequency f_1, P_1, is a_1^2 + b_1^2.)
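A minimal Python sketch of the power calculation, assuming a hypothetical test signal (a 2 Hz sine of amplitude 3, sampled at 16 Hz for 2 s). It also checks Parseval's relation for this normalization: summed over all N frequencies, the power equals the mean of the squared signal.

```python
import math

def power_spectrum(signal, SR):
    """Return (f, P) with P[n] = a[n]**2 + b[n]**2 at frequency f[n] = SR*n/N."""
    N = len(signal)
    f, P = [], []
    for n in range(N):
        a = sum(x * math.cos(2 * math.pi * n * j / N) for j, x in enumerate(signal)) / N
        b = sum(x * math.sin(2 * math.pi * n * j / N) for j, x in enumerate(signal)) / N
        f.append(SR * n / N)
        P.append(a * a + b * b)
    return f, P

# Hypothetical signal: a 2 Hz sine with amplitude 3, 2 s sampled at 16 Hz.
SR = 16.0
x = [3 * math.sin(2 * math.pi * 2 * j / SR) for j in range(32)]
f, P = power_spectrum(x, SR)

# Look for the peak in the left half only (indices 0..N/2).
peak = max(range(len(P) // 2 + 1), key=lambda n: P[n])
print("peak at", f[peak], "Hz with power", round(P[peak], 6))
```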
Let's look at a more realistic example of data. Make sure you have downloaded the data from Lab 3.1:
Let's look at the frequency information present in this sample:
Q4: Recall from the last lab that the heart rate of this sample was around 95 beats per minute. How many beats per second is this? Look at the power at the frequency that corresponds to the frequency of the ECG in beats per second. What do you notice?
You may wonder why you see several peaks at frequencies that are integer multiples of the fundamental ECG beat frequency. These are called harmonics. Data that is highly periodic but not perfectly sinusoidal will often exhibit harmonics. (An intuitive example: a nonsinusoidal signal that repeats at 10 Hz will generally also have some power at 20 Hz, 30 Hz, and so on.) The power typically decreases from each harmonic to the next.
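To see harmonics without the ECG file, here is a sketch using a hypothetical sawtooth wave: a ramp that repeats at 10 Hz, sampled at 100 Hz for 1 s. It is periodic but not sinusoidal, so power appears at 10, 20, 30, and 40 Hz and shrinks from one harmonic to the next, while a frequency that is not a multiple of 10 Hz (here 15 Hz) gets essentially none. The helper power_at is a hypothetical name.

```python
import math

def power_at(signal, SR, freq):
    """Power a**2 + b**2 at frequency freq, which must equal SR*n/N for some n."""
    N = len(signal)
    n = round(freq * N / SR)
    a = sum(x * math.cos(2 * math.pi * n * j / N) for j, x in enumerate(signal)) / N
    b = sum(x * math.sin(2 * math.pi * n * j / N) for j, x in enumerate(signal)) / N
    return a * a + b * b

# Hypothetical non-sinusoidal periodic signal: a 10 Hz sawtooth at 100 Hz for 1 s.
SR = 100.0
saw = [(j % 10) / 10 for j in range(100)]

for freq in (10, 15, 20, 30, 40):
    print(freq, "Hz:", round(power_at(saw, SR, freq), 5))
```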
We can also look at the temperature data; for this, we will have to isolate some data that was sampled continuously (that is, we'll ignore regions where there are gaps in the samples). I happen to know that the first 500 data points of the Blue Hills temperature data set are continuous (they contain no gaps):
Q5: At what frequency is the peak power on the left side of the graph for the Blue Hills data?
The higher frequencies in a discrete sample are nearly mirror images of the lower frequencies
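Note: You may have noticed that the power at the higher frequencies of the Fourier series is nearly a mirror image of the power at the lower frequencies. (For a real-valued signal, the power at index n exactly equals the power at index N-n; the plot looks only "nearly" symmetric because index 0, the mean, has no partner.) For this reason, one typically shows only the left half of these frequencies.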
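The exact symmetry can be checked numerically. This Python sketch computes the power of a hypothetical random real-valued signal and compares index n with index N - n; the helper name power is an assumption.

```python
import math
import random

def power(signal):
    """P[n] = a[n]**2 + b[n]**2 for n = 0..N-1 (sample-index form)."""
    N = len(signal)
    out = []
    for n in range(N):
        a = sum(x * math.cos(2 * math.pi * n * j / N) for j, x in enumerate(signal)) / N
        b = sum(x * math.sin(2 * math.pi * n * j / N) for j, x in enumerate(signal)) / N
        out.append(a * a + b * b)
    return out

random.seed(0)
x = [random.gauss(0, 1) for _ in range(50)]   # hypothetical real-valued signal
P = power(x)

# Index 0 (the mean) has no partner; every other index n pairs with N - n.
mismatch = max(abs(P[n] - P[len(x) - n]) for n in range(1, len(x)))
print(f"largest mismatch between P[n] and P[N-n]: {mismatch:.1e}")
```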
Why are these frequencies mirror images of one another? This is a consequence of sampling the waves only at discrete time points. Let's look at the waves that underlie the Blue Hills data.
Let's find the exact locations where the Blue Hills power is greater than 10 (the peaks):
So why do these two peaks behave as if they had the same frequency? Let's look at how these 2 frequencies map onto the cosine function. Let's consider successive time points; without loss of generality, we'll look at the steps from time 0 to 1/12 to 2/12 of a year. What goes into the cosine function for these values?
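Here is a sketch of the phases that go into the cosine at the first few sample times. It assumes 12 samples per year and a hypothetical pair of frequencies, 2 and 10 cycles per year, that add up to the sampling rate. These are illustrative values, not the Blue Hills peaks.

```python
import math

SR = 12.0                      # samples per year, as in the Blue Hills data
f_low = 2.0                    # hypothetical low frequency (cycles per year)
f_high = SR - f_low            # its mirror partner: 10 cycles per year
times = [0.0, 1 / SR, 2 / SR]  # successive sample times in years

for t in times:
    x_low = 2 * math.pi * f_low * t     # phase fed to cos at the low frequency
    x_high = 2 * math.pi * f_high * t   # phase fed to cos at the high frequency
    print(f"t={t:.4f}  phase_low={x_low:.4f}  phase_high={x_high:.4f}  "
          f"cos_low={math.cos(x_low):+.4f}  cos_high={math.cos(x_high):+.4f}")

# Per sample, the low phase advances by x and the high phase by 2*pi - x.
step_low = 2 * math.pi * f_low / SR
step_high = 2 * math.pi * f_high / SR
print("step_low + step_high =", step_low + step_high, "(2*pi =", 2 * math.pi, ")")
```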
The cosine function is symmetric and periodic, so cos(x) = cos(2*pi - x) and cos(y) = cos(y + 2*pi). What is happening in the higher frequency case is that the phase input to cosine advances so quickly that each step is 2*pi - x, compared with a step of x at the lower frequency. After k steps the phases are k*(2*pi - x) and k*x, and these give exactly the same cosine values. We can illustrate this graphically using a new trick, the text function, which places some text at a given x,y point on the graph.
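In general, for frequencies above 1/2 of the sampling rate (this limit is called the Nyquist frequency; in this case it is 6 cycles per year, since our sampling rate is 12 samples per year), the phase advance between successive time points is so great that the "effective" frequency of SR*(N/2+m)/N is equal to SR*(N/2-m)/N. In this example, SR = 12 and N = 500, so the coefficient at frequency 12*(250+m)/500 behaves exactly like the one at 12*(250-m)/500.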
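The folding rule can be written as a small function. The name effective_frequency is hypothetical, and N = 500 with SR = 12 assumes the Blue Hills segment above. The sketch also checks that the cosines at indices N/2 + m and N/2 - m agree at every sample.

```python
import math

def effective_frequency(n, N, SR):
    """Frequency that basis index n actually behaves like at the sample times.

    Index N/2 + m folds back onto index N/2 - m, so only 0..SR/2 is distinct."""
    n = n % N
    return SR * min(n, N - n) / N

N, SR = 500, 12.0          # assumed Blue Hills segment: 500 samples, 12 per year
m = 10                     # any offset from N/2 behaves the same way
hi = [math.cos(2 * math.pi * (N // 2 + m) * j / N) for j in range(N)]
lo = [math.cos(2 * math.pi * (N // 2 - m) * j / N) for j in range(N)]
max_diff = max(abs(u - v) for u, v in zip(hi, lo))

print(effective_frequency(N // 2 + m, N, SR),
      effective_frequency(N // 2 - m, N, SR),
      f"{max_diff:.1e}")
```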
So this is why the frequencies on the right of a plot of Fourier coefficients approximately mirror the left, and further why we can leave the right side out of a summary plot.
Many signals in biology, such as speech, exhibit changes in frequency content over time. If we performed a frequency analysis of the sound of someone speaking an entire sentence, we would have a hard time picking out the components of the different subsounds, which are called phonemes. We can get around this by dividing the data into short portions (called "windows") and analyzing the frequency components of each portion separately.
Consider the Zebra Finch song clip generously provided by Malu Murugan from Rich Mooney's lab at Duke.
wave is the waveform of the sound and fs is the sampling frequency (that is, the sampling rate). We can play this sound using the command sound (see help sound):
Let's look at the power spectrum over the whole waveform:
Q6: Which frequencies carry a lot of the signal?
This power spectrum averages over all of the interesting structure in the song. We can look at the structure of the song with the function spectrogram, which divides the data into little windows, and performs Fourier analysis on each window. The resulting figure is an image with time on the x axis, frequency on the y axis, and power represented by color. Check out help spectrogram to see the meaning of the input arguments we are giving:
Warm colors (red or yellow, depending on the colormap) indicate high power, and blue colors indicate low power; check the colorbar for the exact scale.
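The windowing idea behind spectrogram can be sketched in a few lines of Python. This is a simplified assumption, not Matlab's implementation: it uses non-overlapping rectangular windows, whereas spectrogram can also taper and overlap its windows (see help spectrogram). The test signal is hypothetical: 1 s of a 50 Hz tone followed by 1 s of a 200 Hz tone, sampled at 1000 Hz.

```python
import math

def window_power(segment, SR):
    """Power a**2 + b**2 at frequencies SR*n/W for n = 0..W//2 of one window."""
    W = len(segment)
    freqs, power = [], []
    for n in range(W // 2 + 1):
        a = sum(x * math.cos(2 * math.pi * n * j / W) for j, x in enumerate(segment)) / W
        b = sum(x * math.sin(2 * math.pi * n * j / W) for j, x in enumerate(segment)) / W
        freqs.append(SR * n / W)
        power.append(a * a + b * b)
    return freqs, power

def simple_spectrogram(signal, SR, W):
    """Split signal into non-overlapping windows of W samples.
    Returns (window start times, frequencies, one power list per window)."""
    starts, columns, freqs = [], [], []
    for start in range(0, len(signal) - W + 1, W):
        freqs, power = window_power(signal[start:start + W], SR)
        starts.append(start / SR)
        columns.append(power)
    return starts, freqs, columns

# Hypothetical signal: 50 Hz for the first second, 200 Hz for the second.
SR = 1000.0
sig = [math.sin(2 * math.pi * (50 if j < 1000 else 200) * j / SR) for j in range(2000)]
starts, freqs, cols = simple_spectrogram(sig, SR, 100)

# Frequency with the most power in each window (one value per time step).
peak_freqs = [freqs[max(range(len(c)), key=c.__getitem__)] for c in cols]
print(peak_freqs)
```

Each entry of cols is one column of the time-by-frequency image that spectrogram draws; shorter windows give finer time resolution but coarser frequency spacing (SR/W).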
Q7: Which frequencies are strongly present during the bird's first syllable, at around 0.35 seconds? Which are powerful during the syllable at 0.6 seconds?
Appendix note - the Fast Fourier Transform
In this lab, we calculate Fourier coefficients using sines and cosines. When speed matters, the most common way of calculating Fourier coefficients is to use the complex number form exp(i*theta), which is equal to cos(theta) + i*sin(theta). If you type help fft, you will see how to use the fft function, which computes the Fast Fourier Transform and is a lot faster than the fourier_coefficients.m function above. The bummer of fft is that it doesn't also return the frequencies of the coefficients.

So for my own work I wrote the following function, fourier_coeffs.m, which calculates Fourier coefficients using fft and also returns their associated frequencies. You'll get the same answers as above; the math is a little less intuitive for most people, but the code runs faster. For the class I wanted to make the math easier rather than the code fast. (You could put this in signals.)
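For comparison, here is a sketch of the complex-exponential route in Python. It implements a textbook recursive radix-2 FFT (so the signal length must be a power of 2) and converts its output to the a_n and b_n used in this lab. Since exp(-i*theta) = cos(theta) - i*sin(theta), each FFT output divided by N equals a_n - i*b_n. The function names are hypothetical and this is not the author's fourier_coeffs.m; in practice one would call a library FFT such as Matlab's fft or NumPy's numpy.fft.fft.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT: X[k] = sum_j x[j]*exp(-2i*pi*k*j/N).
    len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    X = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * math.pi * k / N) * odd[k]
        X[k] = even[k] + twiddle
        X[k + N // 2] = even[k] - twiddle
    return X

def fourier_coeffs_fft(signal, SR):
    """a[n], b[n], f[n] from the FFT, using X[n]/N = a[n] - i*b[n]."""
    N = len(signal)
    X = fft(signal)
    a = [Xn.real / N for Xn in X]
    b = [-Xn.imag / N for Xn in X]
    f = [SR * n / N for n in range(N)]
    return a, b, f

def fourier_coeffs_direct(signal, SR):
    """The same coefficients by direct O(N^2) sums, for comparison."""
    N = len(signal)
    a = [sum(x * math.cos(2 * math.pi * n * j / N) for j, x in enumerate(signal)) / N
         for n in range(N)]
    b = [sum(x * math.sin(2 * math.pi * n * j / N) for j, x in enumerate(signal)) / N
         for n in range(N)]
    return a, b, [SR * n / N for n in range(N)]

sig = [math.sin(0.7 * j) + 0.3 * (j % 5) for j in range(64)]   # hypothetical signal
a1, b1, f1 = fourier_coeffs_fft(sig, 8.0)
a2, b2, f2 = fourier_coeffs_direct(sig, 8.0)
max_diff = max(abs(u - v) for u, v in zip(a1 + b1, a2 + b2))
print(f"largest difference between FFT and direct coefficients: {max_diff:.1e}")
```

The recursion halves the problem at each level, which is where the speedup over the direct sums comes from (roughly N*log2(N) instead of N^2 operations).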