exponential filtering algorithm
Here I'll briefly describe the algorithm for filtering data with an exponential convolution.
First, I'll define the following:
- n: an index for the point in the series. The first point is n = 1, the next n = 2, the next n = 3, etc.
- yn: the series of unfiltered data values, for example altitudes.
- tn: the series of time values for the unfiltered points.
- Δt: the time separation of the data for uniformly spaced time points.
- Δtn: tn ‒ tn‒1, the time separation when the points are not necessarily uniformly spaced.
- τ: the smoothing time constant.
- un: normalized time values, tn / τ.
- Δu: Δt / τ (for uniformly spaced time points).
- Δun: Δtn / τ.
- zn: the series of smoothed data values.
So using these definitions, I want to convert from the unsmoothed series yn to the smoothed series zn.
First, if an initial point is encountered, we start things rolling with the following:
z1 = y1
Then assuming uniformly spaced data, a naive approach is the following:
zn = zn‒1 exp(‒|Δt| / τ) + yn [1 ‒ exp(‒|Δt| / τ)].
The above formula extends easily enough to data where the time points aren't uniformly spaced:
zn = zn‒1 exp(‒|Δtn| / τ) + yn [1 ‒ exp(‒|Δtn| / τ)].
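For concreteness, here's a minimal Python sketch of this recursion; the function name and the sample numbers are mine, purely for illustration, not from any particular head unit or program:

```python
import math

def exp_filter_simple(t, y, tau):
    """Exponential smoothing of samples y taken at times t, with time
    constant tau, assuming each y[n] was constant over the interval
    leading up to t[n] (the formula above).  A sketch, not production code."""
    z = [y[0]]                      # start things rolling: z1 = y1
    for n in range(1, len(y)):
        w = math.exp(-abs(t[n] - t[n - 1]) / tau)
        z.append(z[-1] * w + y[n] * (1.0 - w))
    return z

# hypothetical unevenly spaced altitude samples, smoothed with tau = 10 s
times = [0.0, 1.0, 2.5, 4.0, 7.0]
alts  = [100.0, 102.0, 101.0, 105.0, 104.0]
print(exp_filter_simple(times, alts, tau=10.0))
```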
This works fairly well (but not always). Convolution, which is what we're doing here, is a continuous operation, where variables are defined for all values of time. However, the data we have are discrete: the data are described only at particular values of time. In general, one first converts the discrete data stream to a continuous function, then applies the convolution operation (smoothing function) to the continuous data, then converts back to a discrete time-series.
In this case, the above formula assumes the continuous data series is:
- y = y1 for all time ≤ t1.
- y jumps immediately to y2 as t passes t1.
- y = y2 for t1 < t ≤ t2.
- y jumps immediately to y3 as t passes t2.
- y = y3 for t2 < t ≤ t3.
- etc.
Depending on the nature of the samples, this assumption may or may not be a good one. Suppose, for example, an Edge computer is reporting speed. It counts the number of wheel rotations in a second (interpolating for fractional rotations, perhaps), divides by the time since the last data point, and multiplies by the rolling distance per rotation. The reported speed is then actually the average since the last reported value, not the speed at a particular point in time. On the other hand, suppose it reads a temperature or altitude exactly at the time point it is reporting. Then the value isn't an average over the last interval, but rather an estimate of the quantity right now. In that case, the above assumptions may not be the best.
Suppose instead I assume the following about the continuous function:
- y = y1 for all time ≤ t1.
- y varies linearly from y1 to y2 as t goes from t1 to t2.
- y varies linearly from y2 to y3 as t goes from t2 to t3.
- etc.
This seems more reasonable for an intrinsically continuous function like altitude.
With this assumption, exponential convolution gets a bit trickier, but still well within the domain of high school calculus. Here's the result I got:
z1 = y1
zn = zn‒1 exp(‒|Δun|) + yn [1 ‒ exp(‒|Δun|)] + (Δyn / Δun) [(Δun + 1) exp(‒|Δun|) ‒ 1], n > 1
where Δyn = yn ‒ yn‒1, and where I've for the first time used the "u" variable as a substitute for t / τ. A quick check: if Δyn = 0, this gives the same result as the previous version, which is of course what's expected, because in each case the assumptions make the function y constant over the interval from tn‒1 to tn.
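Here's the corresponding Python sketch of this version (again, the function name and the test values are my own, for illustration only):

```python
import math

def exp_filter_linear(t, y, tau):
    """Exponential smoothing of samples y at times t, with time constant tau,
    assuming y varies linearly between consecutive samples (the formula above,
    with du = dt / tau and dy = y[n] - y[n-1]).  A sketch under those assumptions."""
    z = [y[0]]                                  # z1 = y1
    for n in range(1, len(y)):
        du = (t[n] - t[n - 1]) / tau
        dy = y[n] - y[n - 1]
        if du == 0.0:                           # repeated timestamp: no time elapsed
            z.append(z[-1])
            continue
        w = math.exp(-abs(du))
        z.append(z[-1] * w + y[n] * (1.0 - w)
                 + (dy / du) * ((du + 1.0) * w - 1.0))
    return z

# quick check: with constant y (every dy = 0) this matches the simpler filter exactly
times = [0.0, 1.0, 2.5, 4.0]
flat  = [50.0] * 4
print(exp_filter_linear(times, flat, tau=10.0))   # stays at 50.0 throughout
```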
I like this one a lot better, since for a given point yn and tn, I'm no longer assuming the data are going in the forward-time direction. With the old assumptions, I assumed the point yn was an average of the real-time value over the preceding interval. If I were to flip the time axis, this would become an average over the next time interval. With this newer, linear-interpolation assumption, I can flip the time axis without any issue.
A quick comment on that: there are two types of filters, causal and noncausal. A causal filter must evaluate the smoothed value at time tn using only data from times t ≤ tn. For example, if I'm writing firmware for a Garmin Edge computer, and I want it to display power but want the displayed value to jump around less than the power meter reports, I can write a causal filter to smooth the data: I don't have the luxury of looking at future data. On the other hand, if I'm programming Golden Cheetah to display power data from a ride after that ride is over, it would be foolish to restrict myself to causal filters, since I have access to all of the data from the ride at once. In this case, my goal is to estimate power after a ride is finished, so there's no reason to restrict myself to a causal filter. Simple exponential convolution filters are excellent causal filters, but relatively poor when the causality constraint is absent. So I'll go beyond simple exponential filtering when I continue on this subject.