Everyday DSP for Programmers: Spectral Peak Detection

For the last couple of weeks we've been looking at how the DFT works and some example DFTs. Throughout those posts we've been using signals with frequencies that fit nicely into the frequency bins of the DFT. Now we're going to see what happens when the signal frequency isn't so clean and how to deal with it. We want a simple algorithm for detecting the main frequency values of the signal without paying too high of a price in computation. This type of algorithm is called spectral peak detection. Let's get started.

Frequency Aligned with Sample Rate

When the frequency of a signal is aligned with the sample rate, the sine wave associated with that frequency will have an integer number of periods within the block of samples used for the DFT. That frequency will then show up as a single peak contained within one bin in the frequency domain. Mathematically, the frequency will satisfy the equation that describes the frequency bins of the DFT:

f_k = f_s·k/N for k = [0..N-1]

where f_s is the sample rate, N is the number of DFT samples, and k is the bin number. As long as the frequency is one of these f_kfrequencies, all of the energy from the frequency will appear in a single frequency bin, like for this frequency of 32 Hz for a 512-point DFT of a signal sampled at 512 Hz:

You can click the graph to set the signal in motion, but it doesn't matter what the phase of the signal is. The frequency is steady at 32 Hz. (This and the rest of the frequency graphs is zoomed in on the range of 0 to 64 Hz to show more detail, if you're wondering.) What happens when the frequency is not in one of these increments?

Frequency Misaligned with Sample Rate

When the frequency of the signal doesn't line up with a frequency bin, some of it leaks out into the surrounding bins. Remember that each frequency in the frequency domain is actually a sinc function with a central peak equal to the frequency's magnitude. A frequency that's centered on a frequency bin will have its peak in the bin corresponding to its frequency value, and all other bins will be where a zero from the sinc function occurs.

If the frequency is not centered on a bin, then an off-peak value from the main lobe will show in the bin closest to the frequency's value, and all other bins will have values from the other lobes of the sinc function. We can better see what's going on with a plot of the underlying sinc function for various frequencies:

Click the graph to have it start sweeping through the frequencies, and notice how the magnitude repeatedly spreads out and then contracts back into a peak as the frequency increases. The grey plot of the sinc function is generated by taking a much longer FFT (equivalent to a DFT, but calculated faster) of the signal that's padded with zeros. The zero padding increases the resolution of the FFT, revealing the sinc function behind the frequencies. The sinc function appears to bounce up and down at first because the left-hand side of it is being reflected back onto itself from the y-axis.

The points on the blue frequency plot are always on the grey sinc function. It's as if the sinc function is sweeping through the frequency bins, and the bins mark where the sinc function is sampled for the magnitude values. The frequency magnitude is at a maximum when the sinc function lines up with the frequency bins, and the leakage is at a maximum when it is halfway between bins.

Now that we see what's happening, how do we calculate what the frequency value is when it falls between bins? We could increase the size of the FFT to increase its resolution by increasing the sample rate, sampling the signal longer, or zero-padding the signal. Then we have a better chance of finding the frequency peaks, but we may not have the resources to do any of these things. There is another way.

The Correction Factor

Another way to calculate spectral peaks is to find the maximum peak, and then apply a correction factor using the values around the peak to estimate where the true peak of the sinc function is. One correction factor that's commonly used is

C = (|X_k+1| - |X_k-1|)/(4|X_k| - 2|X_k+1| - 2|X_k-1|)

Where |X_k| is the magnitude of the kth frequency bin, which should be the bin with the maximum peak. This correction factor is pretty easy to calculate, and all you have to do is add it to the magnitude of the maximum peak to get the estimated frequency. There's just one problem: if you use a normal FFT, the correction is going to be pretty inaccurate!

The problem with this correction factor is that it is assuming it is fitting to the central lobe of the sinc function, but with a normal FFT's frequency bins, most of the time one of the magnitudes will come from one of the side lobes. To force all of the magnitudes to come from the central lobe of the sinc function, dramatically increasing the correction factor's accuracy, the FFT should at least be zero-padded up to the next power of two. That means if you are doing a 512-point FFT, you should pad it with 512 zeros and do a 1024-point FFT. It's still better than having to go to a 16384-point FFT (or larger) to get the necessary resolution without the correction factor.

Now that we have a method of calculating the spectral peak, here is what it looks like as a JavaScript function:

function SpectralPeakDetect(X) {
  var k = 0;
  for (var i = 0; i < X.length; i++) {
    if (X[k] < X[i]) k = i;
  }
  return k + (X[k+1] - X[k-1])/(4*X[k] - 2*X[k-1] - 2*X[k+1]);
}

The function simply scans through the provided frequency spectrum X to find the maximum peak, and then it applies the correction factor to return the fractional bin number corresponding to the estimated peak. The calling code would then convert the bin number into a frequency using the FFT size and sample rate. Here's how the correction factor performs over the frequency sweep we saw before:

You can click the graph to start and stop the sweep, and inspect how the calculated frequencies compare to the actual frequencies. It's nearly always within ±0.02 Hz, which is quite an improvement over the original ±0.5 Hz without the correction factor. Now that we can do spectral peak detection, what can we do about signals with multiple frequencies?

Peak Detection of Multiple Frequencies

The correction factor stays the same if there are multiple peaks, and it should perform nearly as well as long as there are at least two frequency bins between peaks. The problem with our previous algorithm boils down to finding the maximum peak to use. Now there's more than one, so we can't just go looking for the maximum.

It turns out that arbitrary peak detection is not such a simple problem because in general there can be a lot of noise in the frequency spectrum. Differentiation is a possibility, but finding peaks by differentiating doesn't always work because the noise can also be 'peaky.' Thresholds are another option, but peaks that we want to find can occur on top of different levels of noise floors, making fixed thresholds problematic. Having additional knowledge about the signal and constraints on its frequency response can help in specific applications.

For this example, we'll assume there is no noise floor, so we can use a fixed threshold for determining if we're near a peak or not. In some cases it may also be possible to use a relative threshold with a moving average of the frequency spectrum to eliminate the noise floor before finding the peaks. In our case, we'll make sure that the peaks cross a given threshold before marking them as real peaks. Here's what the algorithm looks like:

function SpectralPeakDetect(X, threshold) {
  var freqs = [];
  for (var i = 1; i < X.length-1; i++) {
    if (X[i] >= X[i-1] && X[i] >= X[i+1] && X[i] > threshold) freqs.push(i);
  }
  return freqs.map(function(k) { return k + (X[k+1] - X[k-1])/(4*X[k] - 2*X[k-1] - 2*X[k+1]) });
}

It ends up looking pretty similar to the simple frequency correction function, but instead of scanning for a single maximum, we check if the frequency magnitude crosses the threshold. If it does, the criteria for a peak is met when the frequencies on either side of it are less than or equal to it. If a frequency passes the criteria, it's added to the list, and then the correction factors are applied with a map. Here's what the result looks like visually:

Again, you can start and stop the animation by clicking the graph. The zero-padding is not displayed, but it's still used for the FFT to improve the accuracy of the peak detection. The threshold shown by the grey line is sufficient to allow only detection of the two main peaks while ignoring all of the smaller peaks from the lobes of the sinc functions. It's clear that if the peaks were farther apart in magnitude, then it would be more difficult to choose a good threshold, and it may be necessary to move to a relative threshold scheme.

The peak detection algorithm also does a good job of differentiating between the two peaks right up until they merge together in the center of the frequency spectrum. You can enjoy the mesmerizing sine waves while you wait for the peaks to get to the center in the animation. The frequencies have to get very close together before the smaller peak disappears into the larger peak, and the algorithm can only see one combined peak.

That about wraps up another DSP algorithm that's built on basic tools and turns out to be fairly straightforward in the end. Next week we'll dig into filters, a topic we've touched on a number of times, but now we have the tools to go into more depth.

Other DSP posts:
Basic Signals
Transforms
Sampling Theory
Averaging
Step Response of Averaging
Signal Variance
Edge Detection
Signal Envelopes
Frequency Measurement
The Discrete Fourier Transform
The DFT in Use
Spectral Peak Detection
FIR Filters
Frequency Detection
DC and Impulsive Noise Removal