Sunday, January 30, 2011

exponential filtering algorithm

Here I'll briefly describe the algorithm for filtering data with an exponential convolution.

First, I'll define the following:
  1. n: an index for the point in the series. The first point is n = 1, the next n = 2, the next n = 3, etc.
  2. yn: the series of unfiltered data values, for example altitudes.
  3. tn: the time values of the unfiltered points.
  4. Δt: the time separation of the data for uniformly spaced time points.
  5. Δtn: when time is not necessarily uniformly spaced, tn ‒ tn‒1.
  6. τ: the smoothing time constant.
  7. un: normalized time values, tn / τ.
  8. Δu: Δt / τ (for uniformly spaced time points).
  9. Δun: Δtn / τ.
  10. zn: the series of smoothed data values.

So using these definitions, I want to convert from the unsmoothed series yn to the smoothed series zn.

First, when the initial point is encountered, we start things rolling with the following:

z1 = y1

Then assuming uniformly spaced data, a naive approach is the following:

zn = zn‒1 exp(‒|Δt| / τ) + yn [1 ‒ exp(‒|Δt| / τ)].

The above formula can be extended to data where the time points aren't uniform simply enough:

zn = zn‒1 exp(‒|Δtn| / τ) + yn [1 ‒ exp(‒|Δtn| / τ)].
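Here's a minimal sketch of that last formula in Python. The function name and the plain-list inputs are my own choices for illustration, not taken from any existing code:

```python
import math

def exp_filter(t, y, tau):
    """Exponential filter for possibly non-uniform time points.

    t, y: equal-length lists of times and values; tau: smoothing time constant.
    Each sample is treated as constant over the interval preceding it.
    """
    if not t:
        return []
    z = [y[0]]  # first point: z1 = y1
    for n in range(1, len(t)):
        a = math.exp(-abs(t[n] - t[n - 1]) / tau)  # attenuation of the old value
        z.append(z[-1] * a + y[n] * (1.0 - a))
    return z

# Example: a 1-unit step, sampled unevenly, smoothed with tau = 10
smoothed = exp_filter([0, 1, 3, 4, 8], [0, 1, 1, 1, 1], tau=10.0)
```

With uniformly spaced points this reduces to the first formula, since every Δtn is the same Δt.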

This works fairly well (but not always). Convolution, which is what we're doing here, is a continuous operation, where variables are defined for all values of time. However, the data we have are discrete: the data are described only at particular values of time. In general, one first converts the discrete data stream to a continuous function, then applies the convolution operation (smoothing function) to the continuous data, then converts back to a discrete time-series.

In this case, the above formula assumes the continuous data series is:
  1. y = y1 for all time ≤ t1.
  2. y jumps immediately to y2 as t passes t1.
  3. y = y2 for t1 < t ≤ t2.
  4. y jumps immediately to y3 as t passes t2.
  5. y = y3 for t2 < t ≤ t3.
  6. etc.

Depending on the nature of the samples, this assumption may or may not be a good one. Suppose, for example, an Edge computer is reporting speed. It counts the number of wheel rotations in a second (interpolating for fractional rotations, perhaps) and divides by the time since the last data point, multiplying by the rolling distance per rotation. The speed reported is then actually the average since the last reported number, not the speed at a particular point in time. On the other hand, suppose it reads a temperature or altitude exactly at the time point it is reporting. Then this isn't an average over the last interval, but rather an estimate of the value at that instant. In this case, the above assumptions may not be the best.

Suppose instead I assume the following about the continuous function:
  1. y = y1 for all time ≤ t1.
  2. y varies linearly from y1 to y2 as t goes from t1 to t2.
  3. y varies linearly from y2 to y3 as t goes from t2 to t3.
  4. etc.

This seems more reasonable for an intrinsically continuous function like altitude.

With this assumption, exponential convolution gets a bit trickier, but is still well within the domain of high school calculus. Here's the result I got:

z1 = y1

zn = zn‒1 exp(‒|Δun|) + yn [1 ‒ exp(‒|Δun|)] + (Δyn / Δun) [(Δun + 1) exp(‒|Δun|) ‒ 1], n > 1

where Δyn = yn ‒ yn‒1, and where I've for the first time used the "u" variable as a substitute for t / τ. A quick check: if Δyn = 0 it gives the same result as the previous version, which is of course what's expected, because in each case the assumptions lead to the function y being constant over the interval from tn‒1 to tn.
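The linear-interpolation formula can be sketched in Python as follows. The function name is hypothetical, and I've taken Δyn to mean yn ‒ yn‒1. When two points share the same time, the extra term vanishes in the limit, so the sketch just carries the previous value forward:

```python
import math

def exp_filter_linear(t, y, tau):
    """Exponential filter assuming y varies linearly between samples."""
    if not t:
        return []
    z = [y[0]]  # z1 = y1
    for n in range(1, len(t)):
        du = abs(t[n] - t[n - 1]) / tau  # normalized time step
        if du == 0.0:
            z.append(z[-1])  # limit of the formula as du -> 0
            continue
        a = math.exp(-du)
        dy = y[n] - y[n - 1]
        z.append(z[-1] * a + y[n] * (1.0 - a)
                 + (dy / du) * ((du + 1.0) * a - 1.0))
    return z
```

One sanity check beyond the Δyn = 0 case: feed it a steady ramp, and the smoothed value should settle to the input delayed by τ, which is what an exponential convolution does to a linear function.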

I like this one a lot better, since for a given point yn and tn, I'm no longer assuming the data is going in the forward-time direction. With the old assumptions, I assumed the point yn was an average over the real-time value over the preceding interval. If I were to flip the time axis, this becomes an average over the next time interval. With this newer, linear interpolation assumption, I can flip the time axis without any issue.

A quick comment on that: there are two types of filters, causal and noncausal. A causal filter must evaluate the smoothed value at time tn using only data from times t ≤ tn. For example, if I'm writing firmware for a Garmin Edge computer, and I want it to display power, but want the displayed value to jump around less than the power meter reports, I can write a causal filter to smooth the data. I don't have the luxury of looking at future data. On the other hand, if I'm programming Golden Cheetah to display power data from a ride after that ride is over, it would be foolish to restrict myself to causal filters, since I have access to all of the data from the ride at once. Simple exponential convolution filters are excellent causal filters, but relatively poor when the causality constraint is absent. So I'll go beyond simple exponential filtering when I continue on this subject.

Saturday, January 29, 2011

smoothing Garmin altitude profiles for power estimation

Recall that I was contemplating using power in lieu of speed for determining whether a Garmin FIT file contains motorized segments. The speed criterion was fairly simple: if the data indicate the user got 500 meters ahead of a 14 m/sec pace over any segment, that segment was judged to be a portion of motorized travel. The entire region between load/unload opportunities was thus tagged and pruned. While cyclists may be able to sustain a high speed for short periods, requiring that they get a certain distance ahead of a threshold pace means they must exceed that pace for an extended time, or vastly exceed it for a shorter time.

critical power example
Example of critical power model applied to more realistic curve

In the power-time domain, the critical power model accounts for that ability to sustain high intensity for short times, but only a lower intensity for longer times. The critical power model is very similar to the criterion used for speed: it says a rider, given sufficient time, can do a given amount of work (the anaerobic work capacity, or AWC) above a critical rate of work production (the critical power, or CP). Taking a critical power typical of a world class cyclist in the golden era of EPO use, CP = 6.5 W/kg (normalized to body mass) makes a good threshold for suspicion of motorized assistance. A fairly typical AWC is 90 seconds multiplied by the critical power, so an AWC of near 0.6 kJ/kg would be applicable. Given these choices, if a rider produces more than 0.6 kJ/kg above 6.5 W/kg × Δt then the segment is judged to be motorized.

So we need a way to estimate power. We could use power meter data, but of course when a rider is in a car or train the power meter isn't registering. So instead we need to estimate power from the rider's speed and the altitude data. To provide a margin of error, we can make reasonably optimistic (low) assumptions about wind and rolling resistance. We then do as we did with speed: find data points where the rider crosses the 6.5 W/kg threshold, then look for segments connecting low-high and high-low transitions which are more than AWC above CP multiplied by the duration.

However, here's the rub: to estimate power at any given time we need to be able to estimate road grade. The iBike, for example, has accelerometers configured to directly measure grade, but with the Garmin Edge 500 you need to take differences of altitude and divide by differences in distance.

Unfortunately the Garmin is anything but smooth in its reporting of altitude versus time (and by implication versus distance). Here's some data measured on a stationary trainer... the altitude remains constant, then increases, then is constant, then increases. The jumps in altitude will tend to produce rapid spikes in the grade. So we need to smooth the data.

Garmin trainer data
Garmin Edge 500 data from riding a trainer

So what sort of smoothing do we need? Assume the Garmin can jump ±1 meter at any time. If this happens over a 1-second interval, that's a rate of climbing of 1 meter per second, or 3600 meters per hour, which has a minimum energy cost of 9.8 W/kg (assuming the acceleration of gravity is 9.8 m/sec²). But the rider is likely lifting at least 1.15 kg/kg of body mass, and drivetrain efficiency is no better than 98%, so the actual cost of this climbing is closer to 11.5 W/kg. So every time the altitude jumped upward, we'd assume the rider blew past that 6.5 W/kg threshold. If I want to reduce the instantaneous error from this jump in altitude to be no more than 0.1 W/kg, a smoothing time constant of 120 seconds should do the trick, assuming an exponential convolution filter for computational efficiency.
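The arithmetic behind those numbers can be checked with a few lines of Python. This is only a sketch: the function names are made up, and it leans on the fact that an exponential filter with time constant τ turns a sudden step of h meters into a response whose steepest rate of change is h / τ.

```python
G = 9.8                # m/sec^2, acceleration of gravity
MASS_RATIO = 1.15      # kg lifted per kg of body mass
DRIVETRAIN_EFF = 0.98  # drivetrain efficiency

def climb_power_per_kg(vertical_speed):
    """W per kg of body mass needed to climb at vertical_speed (m/sec)."""
    return G * MASS_RATIO * vertical_speed / DRIVETRAIN_EFF

def smoothed_step_rate(step, tau):
    """Steepest rate of change (m/sec) of an exponentially smoothed step."""
    return step / tau

raw = climb_power_per_kg(1.0 / 1.0)  # 1 meter jump over 1 second
smoothed = climb_power_per_kg(smoothed_step_rate(1.0, 120.0))
print(f"raw spike: {raw:.1f} W/kg, smoothed: {smoothed:.3f} W/kg")
```

The smoothed figure comes out just under the 0.1 W/kg target, which is why 120 seconds is about the shortest time constant that meets it.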

So why is an exponential filter so computationally efficient? With an exponential filter, I calculate a running average. But instead of summing all of the samples which contribute to the average, I simply attenuate the smoothed value I had for the previous point, and add in the component for the newest sample. I'm only processing one point per point. Consider instead the case of a convolution with another function, for example cosine-squared. With cosine-squared I need to sum the contribution from multiple points each time I move to an additional point. There's no way (at least no way I know) to reuse the sum calculated for the previous point.

For example, if I have points uniformly spaced in time with spacing Δt, and I have a smoothing time constant τ, I can go from an original time series yn to a smoothed time series zn as follows:

zn = zn‒1 exp(‒|Δt| / τ) + yn [1 ‒ exp(‒|Δt| / τ)]

It can't get much simpler than that. I'll revisit this formula later, as it has some weaknesses, in particular when data aren't uniformly spaced.
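To make the efficiency point concrete, here's a sketch in Python comparing the one-multiply-per-point recursion with the explicit weighted sum it replaces. Both function names are mine, a stands for exp(‒|Δt| / τ), and I've assumed the series is started with the first smoothed value equal to the first raw value:

```python
import math

def smooth_recursive(y, a):
    """One update per point: attenuate the old value, add the new sample."""
    z = [y[0]]
    for yn in y[1:]:
        z.append(a * z[-1] + (1.0 - a) * yn)
    return z

def smooth_direct(y, a):
    """Same result via the full weighted sum: work grows with n at each point."""
    z = []
    for n in range(len(y)):
        total = a ** n * y[0]  # what remains of the starting value
        for k in range(n):
            total += (1.0 - a) * a ** k * y[n - k]
        z.append(total)
    return z

a = math.exp(-1.0 / 120.0)  # 1-second samples, 120-second time constant
y = [100.0, 100.0, 101.0, 101.0, 102.0, 101.0]
z_fast = smooth_recursive(y, a)
z_slow = smooth_direct(y, a)
```

The two agree to rounding error, but the recursive version does a fixed amount of work per point no matter how long the series gets.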

There's also the simple running average. Running averages can also be done efficiently, but they aren't so smooth. A point has no effect if it's outside the box, full influence if it's in. With an exponential average, a point has a sudden influence when it is first reached, but beyond that its influence slowly fades. I'll stick with exponential filters here.

More discussion next time.

Wednesday, January 26, 2011

Bike Nüt closed

I heard today that Bike Nüt closed a few weeks ago.

Sad news. I always enjoyed stopping there. It was a good destination for a long run, or a good place just to stop and say hello and see what new bike jewelry Huseyin, the owner, had inside his glass case.

Plenty of shops sell high-end stuff to build up a bike to the highest standards of the boutique fashion weenies. The number of shops selling custom Seven, Parlee, or Serotta cycles in or near San Francisco is nothing short of staggering. Bike Nüt, however, didn't sell custom frames. Huseyin's philosophy was that frames are commodity: that the Taiwanese factories have produced a high, uniform standard, so save your dollars there. Instead you want to spend your money on good wheels and quality parts.

On the frame area, until a year ago he was buying frames from Giant, then sending them out to have the paint stripped and clearcoat applied to reduce them to the raw, black bare carbon. In a way this seems silly: Giant was spending time and resources painting the frames, then Bike Nüt was spending time and resources to remove it. But the result, in my view, was a huge improvement.

The Bike Nüt Umlaut, produced by Martec

Later, though, he arranged with Martec in Taiwan to send him unpainted frames directly. These were under a kg without compromising reliability, with BB30 bottom brackets, tapered head tubes, and thin, vertically compliant seat stays. Don't confuse them with shady eBay price-too-good-to-be-true deals which were salvaged from the dumpster after failing quality control checks: they were legitimate orders from the factory similar to those Leopard Cycles or other small-scale operators might make. They had all the right features which are in fashion now for "lateral stiffness, vertical compliance". After applying his "Umlaut" Bike Nüt stickers, he could sell these for not much more than $1k, allowing the customer to pick and choose parts, rather than relying on the set component groups provided by the big companies, who in many cases do little more than he did: buy a frame from China, apply graphics, and bolt on a few parts.

On the parts side, Bike Nüt sold a lot of Shimano, but where they really set themselves apart was on weight-weenie bits like EE, Camino, Tune, and the odd piece of AX Lightness. Nobody else this side of FairwheelBikes in Arizona dipped into this stuff, the sort of parts which set apart the sect of the Weight Weenie from those who simply wanted to drop their credit cards on whatever the pros ride. I'd been eyeing a very nice set of red-anodized EE brakes there. Sure, I could get them on-line, but isn't it better to deal with a local shop where I can see them before I commit? I waited a bit too long, it turns out.

While he certainly was up to the task of providing super-top-end parts to those with unlimited budgets, Huseyin seemed to really enter his element when someone not sure what they wanted would come in with a budget target. With the game thus defined, he would then put together the best bike he could for that person with that budget, focusing the most on the parts which were more important to function, cutting back on parts where the primary advantage was only weight. Honestly I think he got way more satisfaction out of the customers with limited budgets than those ready to drop whatever it takes to get "the best" of everything.

Sigh. There are other excellent shops in the City. Roaring Mouse, who sponsors my cycling club, is certainly at the top level, with customer support which is universally acknowledged to be outstanding. But I'll miss Bike Nüt for sure. Huseyin took a truly creative approach to running his shop, and nobody else has a jewelry case which comes close to the one which was there. I was impressed enough I even asked Cara to get me one of their kits for a birthday present a few years ago. Even though it's from a jersey manufacturer which doesn't fit me well, I like to wear it, and will still do so. I wonder if I'll ever see another shop like it in San Francisco.

Tuesday, January 18, 2011

Garmin FIT activity splitter to eliminate large time gaps

I've gotten useful code on my third project for processing FIT files using Kiyokazu Suto's Garmin::FIT package for Perl. First, I described fit_to_cols, which extracted selected data from FIT files and formatted it in a space-delimited file. Next I described fit_filter_motor_segments, which attempted to identify segments where the Garmin was accidentally left running in a fast car or train. That project's still being refined, as I described last post.

Here is perhaps the most useful of the three, fit_split_on_gaps, which finds gaps of some specified minimum duration (default 8 hours, or otherwise specified with the -tgap option) and splits the FIT data into multiple sub-files at any gaps found which meet or exceed this threshold.

This and the other codes can be found here, on Google Docs.

It's fairly common in my experience to forget to "reset" my Garmin between activities, despite having set it to warn me at the start of a ride if I have not. GoldenCheetah, for example, contains some nice code to deal with this problem, allowing rides to be split at selected gaps. However, since I've not been using a power meter since last summer, I've been using Strava a lot more than GoldenCheetah. And Strava lacks any ride-splitting capability.

Enter fit_split_on_gaps. Its default behavior is to take each FIT file and, if it finds any gaps between meaningful data (ignoring the "power down" and "power up" records which sometimes pollute idle periods), it partitions the original file into sub-files, naming each according to the Garmin convention of "yyyy-mm-dd-hh-mm-ss" with a "" suffix (although the suffix can be changed) in the same directory. Alternatively the files can be written to a separate directory with or without (default) a suffix.

There are a few options. The most important is "-h" which prints help. I won't try and describe the other options, as the code may still be modified and these may change. However, so far there are two primary modes:
  1. To take one or more input files, and produce output files (by default only if gaps are found) of individual segments.
  2. To write one file, or standard input, to standard output, retaining only one segment, and discarding the others.

The project took longer than I expected, and honestly I wonder how robust it is. Presently I assume every record without a time stamp is a "definition" and should be included in each split file. So these are always written. Records with time stamps are included in a given file only if the record falls between the gaps which delimit the segment. It's assumed the records are in chronological order, so if gaps are found, everything in between is written to the same file (assuming it's not discarded as a "fragment").
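The splitting rule itself is simple enough to sketch. The following Python is an illustration only: it is not the Perl script, it works on a bare list of time stamps in seconds rather than on FIT records, and the function name is hypothetical. It ignores the records without time stamps, which as described above get copied into every output file.

```python
def split_on_gaps(timestamps, min_gap=8 * 3600):
    """Group chronologically ordered time stamps into segments.

    A new segment starts wherever the gap to the previous time stamp
    meets or exceeds min_gap (default 8 hours, in seconds).
    """
    segments = []
    for ts in timestamps:
        if segments and ts - segments[-1][-1] < min_gap:
            segments[-1].append(ts)  # still within the current activity
        else:
            segments.append([ts])    # first point, or a large gap: new segment
    return segments

# Two rides recorded without a reset in between, 9 hours apart
rides = split_on_gaps([0, 1, 2, 3, 9 * 3600, 9 * 3600 + 1])
```

Like the real script, this relies on the records being in chronological order.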

Thanks again to Kiyokazu Suto for doing such nice work on Garmin::FIT.

Saturday, January 15, 2011

adding altitude to motorized segment identification?

A friend of mine, Brian, took his Garmin with him in his car, drove on local roads, did a run, got back in his car, and drove back home. He was careful never to drive much over 30 mph the whole time. He then loaded the data into my motorized FIT filter and..... nothing. It didn't identify any of the segments as motorized.

The reason for this is that the criterion I used, that the unit must record progress at least 500 meters ahead of a 14 m/sec pace for some time interval, is designed to avoid tagging segments where a rider is on a relatively fast mountain descent. Descending on a bike is faster than driving on local streets, so it would be relatively hopeless to expect, using crude speed measures only, that I'd be able to pick up local driving as suspicious.

I was discussing this with Bill, another friend, when the idea came up of throwing altitude into the mix. Now this is hardly a new idea: it had already been proposed on the Strava forums that calculated power could be used to pick up suspicious segments. A short-term issue is that calculated power uses road grade. Since the Garmin Edge 500 doesn't have any accelerometers for grade measurement, grade has to come from differentiating the altitude signal. The altitude signal is temperature-compensated, and the temperature measurement reacts to changing conditions only very slowly, so if the bike is taken from a warm indoor environment into a cold outdoor environment the altitude can be very confused for a while. Here's an example from DC Rainmaker:

from DC RainMaker Edge 500 review

In the plot you can see it took around 30 minutes for the Edge to accurately record the outdoor temperature. The result is it takes awhile for the altitude to settle out. Here's DC RainMaker's plot of altitude over a short loop:

from DC RainMaker Edge 500 review

Maybe things have improved with subsequent firmware upgrades, but these data show a 100 foot swing in altitude over the course of a ride lasting less than an hour. So in any consideration of altitude, it's important not to place too much weight on altitude changes of this order.

The good news is that while the plot looks pretty bad, cyclists can typically climb in excess of 3000 vertical feet in an hour, and descend at around twice this. So over enough time that little fluctuations in altitude aren't important, using the altitude signal to gain added insight into the pragmatic speed limits of cycling is a good idea.

So assuming a valid altitude signal is available, what's a good modified criterion to use?

One option is to forget about speed and just focus on altitude. 1800 meters per hour (0.5 meters / second) is about as fast as the best riders in the world can climb, so I could say if data shows progress in excess of 30 vertical meters (about 100 feet) ahead of a 0.5 meter / second pace then the transport is probably motorized. I picked the 30 meters as a number pulled from that DC RainMaker plot, but it also accounts for the ability of riders to climb faster for shorter durations.

But a better approach is to combine the climbing and speed. Here I can go to that idea of a power threshold. Work done is proportional to "effective" vertical meters gained added to a proportionality constant times the square of speed multiplied by distance traveled (force is proportional to speed squared and work is proportional to force times distance). In this calculation "effective" vertical meters equal actual vertical meters plus a component equal to a rolling resistance coefficient times distance traveled. For example, 0.3% is a low-end estimate for a rolling resistance coefficient.

So referring back to my distance/speed approach, the approach which might be taken here is to say the best riders in the world can ride at up to 6.5 W/kg. So if a rider can produce work in excess of, for example, 600 J/kg above an average of 6.5 W/kg then I tag the segment. This would require 26.5 W/kg for 30 seconds, 16.5 W/kg for a minute, 8.5 W/kg for 5 minutes, down to 6.67 W/kg for an hour. This is a high threshold, but it's important to consider the rider may be in a pack or riding with a strong tailwind, and we really want to avoid "false positives" of tagging legitimate riding as motorized.

The 600 J/kg, by the way, I got from the critical power model assuming a time constant AWC/CP of around 90 seconds.
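Those thresholds follow directly from the two numbers above: the average power needed to trip the criterion over a given duration is the 6.5 W/kg plus the 600 J/kg spread over that duration. A quick Python sketch (the names are my own):

```python
CP = 6.5     # W/kg, threshold power
AWC = 600.0  # J/kg, work allowed above the threshold

def trigger_power(duration):
    """Average W/kg needed over `duration` seconds to exceed the criterion."""
    return CP + AWC / duration

for seconds in (30, 60, 300, 3600):
    print(f"{seconds:5d} s: {trigger_power(seconds):.2f} W/kg")
```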

For the power-force relationship, a crude approximation can be made. Consider, for example, a rider weighing 70 kg with CdA = 0.3 m² with a density of air of 1 kg/m³. Then traveling @ 1 m/sec takes 0.15 W, or traveling at 14 m/sec takes 411 W, close to 5.9 W/kg from wind resistance only. A coefficient of rolling resistance of 0.3% adds around 0.4 W/kg to this total, or 6.3 W/kg total. Add in 2% of power for drive train losses brings it to 6.4 W/kg, approximately.
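Here's the same estimate as a small Python sketch, so the numbers can be played with. The function name and argument names are mine; the default values are the ones from the paragraph above, plus 9.8 m/sec² for gravity, and I've treated the drivetrain loss as a 98% efficiency. Like the text, it counts rolling resistance per kg of rider only.

```python
def flat_power_per_kg(v, mass=70.0, cda=0.3, rho=1.0, crr=0.003,
                      g=9.8, efficiency=0.98):
    """Rough W/kg needed to ride at v (m/sec) on flat ground in still air."""
    aero = 0.5 * rho * cda * v ** 3 / mass  # wind resistance, per kg
    rolling = crr * g * v                   # rolling resistance, per kg
    return (aero + rolling) / efficiency    # include drivetrain losses

print(f"{flat_power_per_kg(14.0):.2f} W/kg at 14 m/sec")
```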

With these numbers, inspired by a highly competitive time trial position, the 14 m/sec threshold on the flats looks like it will be hard to budge. So if you really want to avoid false positives from top-level racers, you need to set the bar high. Where this sort of approach will really help is where there's substantial altitude gain. There the car is much more likely to be flagged.

So how does the 500 meters compare with my 600 J/kg? With the old criterion, consider that 14 m/sec over 300 seconds would take you 4200 meters, so you'd need to go 500 meters more than this, or 11.9% faster. With the power criterion, the threshold becomes 14.07 m/sec on flat ground, and for 300 seconds you'd need to produce 2 W/kg more, which, treating power as proportional to the cube of speed, results in a speed higher by a factor of (8.5/6.5)^(1/3), or about 9.4% faster. So for times on the order of 5 minutes, on the flat, the new criterion is still easier to trigger.

I'll give it a try and see what I get.

Thursday, January 13, 2011

Garmin FIT motorized segment filter in Perl

Finally, squeezing in work in my train commute between San Francisco and Mountain View (which unfortunately hasn't been a bike commute as much as I'd like due to the pressures of two big projects at work, but digression opportunities are limited in the middle of a sentence), I've managed to finish a working draft of my "motorized segment filter" in Perl.

Like the fit_to_cols code I described recently, this code uses Kiyokazu Suto's Garmin::FIT package for Perl. Unlike fit_to_cols, this one needs to be able to write as well as read FIT data. The Garmin::FIT Perl module allows this, but to figure out how you really need to dig into the provided example, fitsed. Fitsed is uncommented and does a lot more than just read and write FIT files: it also has a parser for selecting and/or changing fields in records. All good stuff, but a simple equivalent of hello.c, an example which minimally demonstrates writing FIT data, would have been useful. But I worked it all out, well most of it anyway, and my code seems to work.

So without further discussion, you can get it here (from Google Docs).

After downloading the code and placing it in your execution path (and possibly typing "rehash" if you use a csh-variant shell), the first thing to do is to type:

fit_filter_motor_segments -h

This prints help information.

Some examples, where "% " is the command prompt in these examples and is not typed:
  1. Check all files in the current directory for motorized segments. If any are found, then create a new FIT file for that file with ".fit" replaced with "" :
    % fit_filter_motor_segments *.fit
  2. The same as the previous example, but create new FIT files with the same name but in directory /tmp with segments identified as motorized, if any, stripped:
    % fit_filter_motor_segments -a -dir /tmp *.fit
  3. From within Perl on a Linux or Unix based system, read an existing FIT file $ffit, filtering out the motorized segments (this is not a shell command, but rather Perl code):
    open FP, "fit_filter_motor_segments < $ffit |" or
         die("error opening FIT file $ffit: $!\n");

Next on my plate is code to split FIT files at breaks which exceed a certain threshold, such as 8 hours. This is to correct the error of neglecting to reset a Garmin computer at the beginning of a ride. This turns out to be a bit trickier than I originally thought, since if a computer is left on the bike with the Garmin on, it has a tendency to generate "power up" and "power down" records at odd times. So how best to deal with these? Running Kiyokazu's fitdump is instructive. But I'm getting ahead of myself...

Friday, January 7, 2011

a Perl Garmin-FIT to space-delimited table converter

I've been working on code to filter motorized segments from FIT files, using Kiyokazu Suto's Garmin::FIT package for Perl. However, to help with that work, I needed a way to quickly plot the data in a FIT file. So I wrote a script, fit_to_cols, to create a readable table from FIT data.

The code produces a space-delimited file with each column describing a field either contained in the FIT file or derived from data in one or more fields of the FIT file. I have a series of scripts for handling such files and generating plots from them, or they can be trivially loaded into a spreadsheet like oocalc or even Excel, or almost any other plotting package, using the "CSV" format. When importing the CSV, make sure spaces and not commas are selected as the delimiter.

The code's available here.

On Linux, simply save the file, make sure it has executable permission, then run it in the standard Linux fashion. On Windows, well, you're on your own. I avoid Windows whenever possible.

An example, where "% " is the command prompt, and where it is assumed "fit_to_cols" is in the execution path:

% fit_to_cols ~/Library/GoldenCheetah/djconnel/

This plots data from a FIT file stored in my GoldenCheetah data directory.

% fit_to_cols /media/GARMIN/Garmin/Activities/

This plots data directly off my Edge 500, which is plugged into the USB port.

% fit_to_cols -fn /media/GARMIN/Garmin/Activities/2011-01-*.fit

This plots all of my files from January 2011, identifying each file with a column "fn". Had I not used the "-fn" option, files would have been identified with a column "file" which contained the file name.

% fit_to_cols -h

This prints help information.

The major weakness of the code is that it assumes fixed data in the FIT file. FIT is a lot more sophisticated than this: the data can vary from device to device. I've hard-coded my script with the data I'm most interested in from my Edge 500. Honestly I'm still trying to figure it all out.

After I'd gotten a functional version of this code together I found someone else had done something similar: I think my code is somewhat more sophisticated than that one, since the latter assumes a particular order for data fields to be stored (I use the hash tables to look up the order) and it doesn't handle undefined values well. My code also allows for multiple files to be listed at once and has several derived parameters like dx, dy, and r.

Thanks to Kiyokazu Suto for doing such nice work on Garmin::FIT.

Thursday, January 6, 2011

Thomas Novikoff

Thomas during week 3 of the 2010 Low-Key Hillclimbs (Judy Colwell photo)

I was really shattered, yet not surprised, to read of Thomas Novikoff's death. Dead from cancer at only 29.

I'd last seen Thomas on Mount Hamilton at the final Low-Key Hillclimb. He was looking thin and pale. He told me, in a forced-cheery voice which seemed to hide a deep depression, even fear, that his cancer was back. It was in his liver, he said. The Whipple procedure he'd received the preceding November hadn't worked as hoped. He'd wanted to climb, but instead was going to volunteer. The stomach pain was too great, he told me. Every time he rode hard his breathing caused too much pain.

I was shocked. What do I say to that? Thomas had been riding the Low-Keys since 2007, first with Team Cambio, and later Webcor/Alto Velo. I was to later learn he'd finished third in the Race to the Sun, up mighty Haleakala, just 15 months before, and finished a very close second in 2003. Yet in the few climbs he'd done in this year's series he'd been only an average climber. He had been badly anemic from his radiation therapy. Yet he'd thought he was past that, when the stomach pain started. And now his cancer had spread.

I asked what sort of treatment he was going to receive. Recurrent cancer is notoriously difficult: it's already survived the gauntlet of chemo & radiation. More of the same wasn't going to help.

He said he wasn't in shape for more therapy until his condition improved. An unspoken message was there, filling my silence. It didn't look good for poor Thomas.

Yet he drove to the top, carrying people's jackets so they would be able to descend from the summit in the near-freezing air. It was Thomas' chance to give back. Maybe, I was thinking, his last chance.

Two weeks later was the Low-Key awards ceremony. The last few years we've awarded a "Spirit Award" to the rider who demonstrates the "Low-Key" spirit. Low-Key is about friendly competition, of giving what you have no matter what you have, and helping others do the same. Sure, we had some great candidates this year, but I didn't think at all about who would get the award this time. Thomas was the obvious choice.

Thomas wasn't there; he was at UCSF Medical Center. But we passed around some Lance Armstrong Foundation dedication cards for people to write a message to Thomas. Cara collected them and later delivered them to UCSF for delivery to Thomas.

He responded by email. His prognosis, he wrote, had gotten worse. Worse?

So I wasn't surprised to read this blog post. Thomas had his own blog, eschewing the shallowness of Facebook which has largely replaced the blogosphere as an outlet for people's expression. I read the posts from his final four months, and looked at his Strava record from his last rides. Really hard stuff. We leave a wake of electronic data behind us as we go through our lives. Then, one day, it freezes in time.

Anyway, I'll miss Thomas. But he certainly won't be the last. 2011 is still very young. Who's next?

Monday, January 3, 2011

tagging motorized segments in FIT files: results

Last time I described my proposed algorithm for detecting motorized ride segments. The only detail I was missing was how to find the transitions between motorized and nonmotorized transport. I'd set a threshold speed of 14 meters per second, but obviously one doesn't ride a bike up to that speed and then hop onto a passing vehicle. So instead, starting from where the speed crosses the 14 m/sec threshold, I search for the first interval of at least 10 seconds during which there are either no points (GPS off or out of signal range, for example) or else the speed fails to exceed a 2 m/sec threshold. This seems to work fairly well, although it can be tricky, as it may not take long to get off a train after it's stopped. Car-bike transitions probably tend to be slower. I use the time at the mid-point of these points as the threshold time for motorized versus non-motorized.

I then scanned all stored activities for Oct, Nov, and Dec 2010. I found five which showed clear evidence of motor transport: four train, one car. At least one car ride, descending Hicks and Mt Umunhum Roads after the Low-Key Hillclimb, was not tagged. This is because the descent was at a speed consistent with cycling, so it is right that it wasn't marked.

Here's perhaps the most interesting example, an activity where I rode from work to the train station in Mountain View, got a large hole in my tire approaching the station, got off in Palo Alto to buy a new tire, then rode to Menlo Park to catch a train back to San Francisco. The algorithm worked perfectly. Train rides are particularly easy to pick up, because the train builds up speed then sustains it between stations. There's little stop-and-go.


So not perfect, but a lot better than nothing. Next up I need to finish my code to generate a revised FIT file without the motorized segments.

Sunday, January 2, 2011

finding motorized segments in FIT files: describe algorithm

Well, I've made some progress on the Garmin::FIT module, and so am coding my "motorized segment" filter.

My algorithm isn't tested yet, but here's what I have in mind, slightly simplified:
  1. Find all segments where the speed is no less than some threshold for driving, for example 14 m/sec. This step is for computational efficiency only, to assist with the following step.
  2. Using this list of "fast" segments, find segments in which the Garmin reported progress at least some threshold distance ahead of the threshold for driving. For example, make a list of all segments where the distance covered was at least 500 meters greater than the distance which would be covered at 14 m/sec over the same time period. These segments may include points which are less than the speed threshold, and will exclude some points which exceed the speed threshold, since bicycles are capable of going fast when descending, but are generally unable to go sufficiently faster than 14 m/sec to get 500 meters ahead of that pace.
  3. Using these "motorized segments" as a start, extend them to points in which progress was sufficiently small to allow for a bike to be loaded onto a car or train, or to the start or end of the data, or to points where enough data are missing to have allowed for a bike to be loaded or unloaded while the Garmin wasn't recording.
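The heart of step 2 can be sketched in a few lines of Python. This isn't the Perl I'm writing: it skips the step 1 shortcut and the step 3 extension, the function name is hypothetical, and it only reports the single segment which gets furthest ahead of the pace. The idea is to track how far the recorded distance is ahead of (or behind) the threshold pace, remember where that lead was lowest, and measure the gain from there.

```python
def best_fast_segment(t, d, v_thresh=14.0, margin=500.0):
    """Find the stretch which gains the most on a v_thresh pace.

    t: times (sec); d: cumulative distances (m), same length as t.
    Returns (start index, end index, meters gained on the pace),
    or None if no stretch gains at least `margin` meters.
    """
    best = None
    i_min = 0  # index where the lead over the pace was lowest so far
    for j in range(len(t)):
        lead_j = d[j] - v_thresh * t[j]
        lead_min = d[i_min] - v_thresh * t[i_min]
        if lead_j < lead_min:
            i_min = j  # new low point: later gains are measured from here
            continue
        gain = lead_j - lead_min
        if gain >= margin and (best is None or gain > best[2]):
            best = (i_min, j, gain)
    return best

# Example: 200 sec at 5 m/sec, 100 sec at 25 m/sec, then 5 m/sec again
t = list(range(600))
d = [0.0]
for k in range(1, 600):
    d.append(d[-1] + (25.0 if 200 < k <= 300 else 5.0))
segment = best_fast_segment(t, d)
```

In the example the fast stretch gains 11 m/sec on the pace for 100 seconds, so it clears the 500 meter margin, while a 60-second descent at 20 m/sec would gain only 360 meters and be left alone.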

Hopefully this sort of thing will go a long way towards me avoiding the mistake of grabbing Strava segments due to forgetting to shut off my Edge 500 when I'm on the road or rail.