#### Variance

Real-world signals have noise and other short-term features that obscure the underlying behavior of the signal. Taking an average removes some of that noise, but we may still want to know how much the signal was changing to get a sense of its volatility. One way to gauge how quickly a signal is changing is to look at how quickly the averaged signal changes, but this can be misleading when there is a lot of noise and the underlying signal isn't actually changing. A better way to measure how much a signal is changing is to find the difference between the original signal and its average. The average of the squared differences is called the signal variance, and it is closely related to the concept of variance in statistics. In fact, signal variance over an entire signal is calculated exactly the same way as it is in statistics. The unbiased variance of a signal can be calculated as

*Var_{s} = ∑_{i=1..n} (s[i] - s_{avg})^{2} / (n - 1)*

Where *s[i]* is the *i*th sample and *n* is the number of samples in the signal. The way the equation is currently written, we need to compute the average of the whole signal before we can compute any of the differences. This calculation flow can present a problem in a real-time system, or when the entire signal can't be loaded into memory at once, because the samples are needed twice: once for the average calculation and once for the variance calculation. We can get around this by reorganizing the calculation and expanding the inner polynomial:

*∑_{i=1..n} (s[i] - s_{avg})^{2} = ∑_{i=1..n} s[i]^{2} - 2∑_{i=1..n} s[i]·s_{avg} + ∑_{i=1..n} s_{avg}^{2}*

The last term reduces to *n·s_{avg}^{2}* because there are *n* samples and the average is the same for each sample. The middle term reduces to

*-2∑ s[i]·s_{avg} = -2s_{avg}·∑ s[i] = -2s_{avg}·n·s_{avg} = -2n·s_{avg}^{2}*

Plugging all of that back into the original equation, we get

*Var_{s} = (∑_{i=1..n} s[i]^{2} - n·s_{avg}^{2}) / (n - 1)*

which can more easily be calculated on the fly. Both terms in the numerator can be accumulated separately and then combined at the end of the signal or at regular intervals.
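As an illustration of accumulating both terms on the fly, here's a minimal sketch; the `RunningVariance` name and its methods are mine, not from the original code:

```javascript
// Accumulates the running sums needed for the one-pass variance formula:
// the sum of samples (for the average) and the sum of squared samples.
function RunningVariance() {
  this.n = 0;      // number of samples seen so far
  this.sum = 0;    // running sum of samples
  this.sumSq = 0;  // running sum of squared samples
}

RunningVariance.prototype.add = function(sample) {
  this.n += 1;
  this.sum += sample;
  this.sumSq += sample * sample;
};

// Unbiased variance from the accumulated sums; needs at least two samples.
RunningVariance.prototype.variance = function() {
  var mean = this.sum / this.n;
  return (this.sumSq - this.n * mean * mean) / (this.n - 1);
};
```

Samples can be fed in one at a time as they arrive, and `variance()` can be read off at the end of the signal or at any intermediate point.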

Normally, we talk about standard deviation—the square root of the variance—when looking at signals because the standard deviation is in the same units as the signal and its average. But the square root operation can be expensive in embedded applications, so the variance is commonly used for internal computations in those cases. When we look at the variance in graphs, I'll plot the standard deviation so that the units will agree with the original signal's units.

We can compute the variance with a JavaScript function as follows:

```
// Unbiased variance using the rearranged one-pass formula. Assumes the
// Mean() function from the earlier averaging post and an array with more
// than one sample in it.
function Variance(ary) {
  var mean = Mean(ary);
  // Accumulate the sum of squared samples.
  var ss = ary.reduce(function(a, b) { return a + b*b; }, 0);
  return (ss - ary.length*mean*mean) / (ary.length - 1);
}
```

This function assumes the array has more than one sample in it; otherwise we would have a divide-by-zero problem. Let's see what the variance looks like for the historical gas prices that we were using before, plotted as the mean plus and minus one standard deviation.

This plot shows the range of a single standard deviation. Usually two standard deviations would contain 95% of the samples, but it's clear from this plot that two standard deviations would more than cover 100% of the samples, so what's going on? The discrepancy arises because the gas prices don't follow a normal distribution, and the rule that 95% of samples fall within two standard deviations assumes a normal distribution. So that interpretation of the standard deviation is wrong in this case. We can't assume that the percentages of samples covered by one, two, or three standard deviations apply here, but the standard deviation still gives us a good idea of the range of values the signal will attain. We can compare the standard deviation of the gas prices to sub-regions of the same signal or to other similar signals, such as gas prices in individual states.

#### Variance for Moving Average

Now that we know how to calculate signal variance, can we do it for other types of averages? Indeed, we can. Calculating variance for a block average is the same as it is for the full average we just did, but the calculations are done on a block-by-block basis. The same is true for the moving average, with the variance calculated for each step of the moving average, but the graph starts to get more interesting.

Notice how the standard deviation narrows when the prices are not changing much, and widens dramatically when prices fluctuate more? The variance gives a good indication of how constant a signal is, even after the averaging has smoothed it out. At the beginning of the price history, in the 1996-2004 time frame, the average is moving slowly and the variance is small, meaning that prices were hardly changing at all. Near the end of the price history, the average is also moving slowly, but the variance is much wider, meaning that prices are changing quite a bit while oscillating around a mean, so they're more volatile without shifting much over the longer term. Depending on your application, this can be useful information.
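As a sketch of the step-by-step idea, a windowed version of the variance calculation might look like this; the `MovingVariance` name and the brute-force inner loop are illustrative, and a real implementation could update the sums incrementally as the window slides:

```javascript
// Moving variance: for each full window of `size` samples, compute the
// unbiased variance of that window around the window's own average.
function MovingVariance(ary, size) {
  var result = [];
  for (var i = 0; i + size <= ary.length; i++) {
    var sum = 0, sumSq = 0;
    for (var j = i; j < i + size; j++) {
      sum += ary[j];
      sumSq += ary[j] * ary[j];
    }
    var mean = sum / size;
    result.push((sumSq - size * mean * mean) / (size - 1));
  }
  return result;
}
```

Each element of the result lines up with one step of the moving average, so the standard deviation band in the graph is just the square root of these values around each windowed mean.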

#### Variance for Exponential Average

Bringing the variance calculation to the moving average was fairly straightforward, but it's not quite the same for the exponential average. The exponential average doesn't have a history of samples to use for calculating the sum-of-squares term, so we need to figure out a different way to approximate the variance of the signal. Since the variance is a way to measure the difference in the current sample of a signal from the signal's average, we can calculate that difference using the exponential average as the average:

*Var[i] = |s[i] - s_{exp-avg}|*

This kind of works, but it doesn't look quite right because the variance reacts immediately to each sample. It doesn't use any of the older samples in the calculation, so it ends up not looking very smooth. One way to fix this is to apply the same method of exponential averaging to the variance:

*Var[i] = w·|s[i] - s_{exp-avg}| + (1-w)·Var[i-1]*

This modification has the effect of smoothing out the variance and giving it an exponential behavior as the variance widens or narrows. It can be translated into one simple line of code as well:

`variance = w*Math.abs(s[i] - exp_avg) + (1-w)*variance;`
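Wrapped into a complete loop, the pair of exponential updates might look like the following sketch; the function name, the choice to seed the average with the first sample, and the choice to update the average before measuring the deviation are my assumptions, not from the original:

```javascript
// Exponential average and exponential variance (here a smoothed absolute
// deviation) updated sample by sample with weight w.
function ExpVariance(samples, w) {
  var expAvg = samples[0];  // assumption: seed the average with the first sample
  var variance = 0;
  var result = [];
  for (var i = 1; i < samples.length; i++) {
    // assumption: update the average first, then measure deviation from it
    expAvg = w * samples[i] + (1 - w) * expAvg;
    variance = w * Math.abs(samples[i] - expAvg) + (1 - w) * variance;
    result.push(variance);
  }
  return result;
}
```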

Let's see what its effect is on the gas price data. When gas prices change suddenly, the variance grows exponentially, and when prices stay constant, the variance shrinks exponentially. It doesn't behave as nicely as the moving average version, but if you need a quick-and-dirty calculation with limited resources, the exponential variance may be just what you need.

The variance calculation for an FIR filter is also possible, but it will look similar to the moving average calculation because those two types of averaging are so similar. The FIR filter is just a moving average with different scale factors for each sample in the block. To calculate variance for an FIR-filtered signal, simply use the filter output as the *s_{avg}* value for each sample and run the same calculation.

With that, we've pretty much covered the basics of signal variance. The variance is useful when you need to know how volatile or noisy a signal is without needing to know its exact value at every point in time. This type of measurement is especially useful when you're evaluating the quality of a signal or whether it could be too noisy to be of any use. Next week we'll look at a more concrete application of averaging by solving a common DSP problem: edge detection.

**Other DSP posts:**

- Basic Signals
- Transforms
- Sampling Theory
- Averaging
- Step Response of Averaging
- Signal Variance
- Edge Detection
- Signal Envelopes
- Frequency Measurement
- The Discrete Fourier Transform
- The DFT in Use
- Spectral Peak Detection
- FIR Filters
- Frequency Detection
- DC and Impulsive Noise Removal