Monday, February 20, 2012

SmoothData : a simple Ruby data smoothing class

Ruby self-teach continues, so I put together a data smoothing class for Ruby. Nothing new here: another in a series of codes I've used for comparing Java to Perl to Ruby. Data smoothing is an important part of anything I do with cycling data. So this is a critical component for me on any sort of project I might want to accomplish. For example, suppose I wanted to write a code to identify and rate climbs in an activity. I know -- Strava already does this, but I've got my own ideas about algorithms. Well, if I were to take altitude data raw, a small glitch yielding a 50% grade between two closely-spaced points might generate a huge spike in the climb rating for just those two points. Instead it would be important to smooth the altitude out, for example with a 50 meter characteristic smoothing distance. 50 meters is a fairly good number on climbs this length and shorter you can generally use momentum to blunt the blow. It also covers up small errors in position and altitude. So here it is.... I also added the file to a Google Docs folder to avoid formatting issues with this Blogger page:
#! /usr/bin/ruby

class SmoothData
  def self.smoothData(x, y, tau = 0)
    ys = Array.new(y.length)

    return ys if (ys.length == 0)

    # if no smoothing is requested, simply copy over numbers
    if (tau == 0)
      ys y.clone
      return ys
    end

    # otherwise run filter in both positive or negative directions
    [-1, 1].each do |d|
      warn "direction = #{d}"

      xold = 0
      yold = 0
      ysold = 0

      # along each direction, we run an exponential convolution
      (0 ... x.length).each do |i|

        # if this is the negative direction, use points starting
        # from the last, otherwise go in forward order.
        # So i is an initial counter but n is the point we're going
        # to process
        n = (d == 1) ? i : x.length - 1 - i


        # for either the first point in the sequence or if there is
        # a gap much larger than the smoothing constant, use the unsmoothed
        # point
        if ((i == 0) || (x[n] - x[n - d]).abs > 100 * tau)
          ys[n] = ys[n].nil? ? y[n] : (ys[n] + y[n]) / 2
          xold  = x[n]
          yold  = y[n]
          ysold = y[n]

        else
          # calculate the proper contribution of the new point with a running
          # average of old points
          # u is a normalized difference between this point and the preceding one
          u = (x[n] - xold).abs / tau

          # apply the exponential decay to z
          z = Math.exp(-u)

          # dy is the difference between the present point and the previous point
          dy = y[n] - yold

          # the following works with non-uniform point spacing
          ysdir = y[n].zero? ? ysold : ysold * z + y[n] * (1 - z) + (dy / u) * ((u + 1) * z - 1)
          
          # update ys
          ys[n] = (d == -1) ? ysdir : (ys[n] + ysdir) / 2
          

          # update "previous" points before going to next point
          xold = x[n]
          yold = y[n]
          ysold = ysdir
        end
      end
    end
    
    return ys
  end
end

2 comments:

andre said...

Hi,

to smooth out glitches you should check out Median filters/smoothers. They're perfect for that and should be easy to imlement. Keep up the work!

djconnel said...

Thanks, Andre! Key issue here is the x-spacing of points may be non-uniform. I test the filter by using a cosine function with normally distributed x-spacings and a normal random perturbation applied to y-values. Then I reflect it to make it purely symmetric. I run this through the filter and (1) the result should preserve the symmetry of the original data, (2) there need to be no artifacts associated with the non-uniform spacing in x (the first and second derivatives of the result, not just the smoothed result, should look smooth). This test code is also in that Google folder.

A challenge with smoothing is how to treat the end-points. I apply a series of two exponential convolutions, one forward, one backward, so the end points get smoothed half as much as the middle, since exponential smoothing in a given direction can't do anything to the first point encountered (unless you assume some value like zero for all points prior, but this fails a translational invariance test).