Tuesday, October 4, 2011

Improving on Strava's Segment Timing for Low-Key Hillclimb?

Strava, as I've written many times, has been a real paradigm shift in cycling in the San Francisco Bay Area. Riders compete for rankings on "segments", typically climbs, of various degrees of obscurity. Now that we've entered the Low-Key Hillclimb season, I've gotten a striking metric of how much market share Strava is grabbing here: more than a third of the 121 riders in week one's Montebello Road climb uploaded their data to Strava.

This brings up the possibility of using Strava for timing. Well, as I've described before, Strava's timing is far less reliable than a good-old-fashioned stopwatch. However, I have been batting around the idea of doing something for the 2012 series.

The issue is that some climbs require permits for events: in particular, those in state parks and open space. "Event" is never clearly defined, and it's always a dance to have that definition include things like the Low-Key Hillclimbs but not include groups of friends riding a hill together. Permits aren't an issue if they are willingly issued. The problem is the state parks have a very low threshold for inconveniencing auto traffic, their primary constituency, while the open space just flat out denies permits to any event where riders are timed. Yet there's no restriction against informal groups of riders timing themselves. That's fine. Strava, for example, which reports the times of riders who climb a hill, is not an "event".

For the full history of the Low-Keys, it's been suggested we allow riders to self-time on these climbs and report their results. But self-timing is a burden. You need to have a watch, remember to start it at the precise time, then remember to stop it precisely at the finish. If riders are late to stop the timer, there's a tendency to overestimate how much time passed since the finish, and underreport actual time. We want Low-Key to be an accurate archive of times. So relying fully on self-reported times is something we've avoided.

But now GPS is becoming virtually ubiquitous. Using GPS to time riders is now becoming, for the first time, a feasible option for climbs for which organized timing is prohibited. Sure, we'd need to let riders without GPS self-time, but most riders can just mindlessly attack the hill and upload their data.

The problem is the timing: Strava's timing algorithm is far from optimal. The reason is Strava uses course points. There's a start point for a course, a finish point, and points along the way. How many actual races have a start point and finish point? None I know of. They all use lines for the start and finish.

Now a point of clarification: in geometry, lines are infinite, while "line segments" are generally finite. So when I say "line" here I actually mean "line segment". I avoid the term "segment" because with Strava, "segment" means something different: it's used to describe "courses". So I'll stick with "line".

So to improve Strava's timing, it's been proposed on the Strava support forums that Strava adopt a "start line" and "finish line" model for timing. A start line is defined by a center point and a vector to one of the end-points of the line. A finish line is similarly defined. The choice of the end-point for each implicitly defines the direction in which the line should be crossed (the defined end-point must be to the rider's right). Timing is then the minimum time elapsed between an interpolated crossing of the start line in the correct direction and an interpolated crossing of the finish line in the correct direction. For example, suppose I have the following start line and finish line crossings, in the appropriate directions:

SSSFSFFFSSFSS

You can see there are three timing events here, three instances of a start crossing immediately followed by a finish crossing.
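The pairing rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the timestamps are made up, and each crossing is assumed to have already been reduced to a kind ("S" or "F") and an interpolated time in seconds.

```python
def timing_events(crossings):
    """Return (start_time, finish_time) for each start crossing that is
    immediately followed by a finish crossing.

    crossings: list of (kind, time_seconds), sorted by time, kind "S" or "F".
    """
    events = []
    for (kind_a, t_a), (kind_b, t_b) in zip(crossings, crossings[1:]):
        if kind_a == "S" and kind_b == "F":
            events.append((t_a, t_b))
    return events


def best_time(crossings):
    """Minimum elapsed time over all timing events, or None if there are none."""
    elapsed = [finish - start for start, finish in timing_events(crossings)]
    return min(elapsed) if elapsed else None


# The sequence from the text, paired with hypothetical timestamps (seconds).
sequence = "SSSFSFFFSSFSS"
times = [0, 60, 120, 1500, 1600, 2900, 2950, 3000, 3100, 3200, 4700, 4800, 4900]
crossings = list(zip(sequence, times))

print(len(timing_events(crossings)))  # 3 timing events
print(best_time(crossings))           # 1300 (the second event: 2900 - 1600)
```

Only the last start before a finish, and the first finish after a start, form a pair. Extra start crossings while milling around at the bottom, or extra finish crossings at the top, don't create additional events.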

This may be all I need to define a course. Some routes, like Old La Honda Road, have no reasonable short-cuts and therefore if you cross the start line at the bridge and then the finish line at the stop sign, the best times will all be from riders who stayed on Old La Honda Road. But many potential climbs have potential short-cuts.

Here's where an option for "gates" comes in. Gates can be defined by additional lines. So the rider must cross, in order, the start line, the first gate, the second gate, etc., until finally crossing the finish line. Each of these gate crossings would need to be in the appropriate direction.
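Whether the line is a start, a finish, or a gate, the crossing test is the same. Here is a rough sketch of a directed, interpolated crossing between two consecutive GPS samples. It assumes the positions have already been projected to a local flat x/y grid (x east, y north, in meters) and that the rider moves in a straight line at constant speed between samples. The function name and argument layout are my own, not anything Strava provides.

```python
def cross(a, b):
    """2-D cross product (z component) of vectors a and b."""
    return a[0] * b[1] - a[1] * b[0]


def crossing_time(p0, t0, p1, t1, center, to_end):
    """Interpolated time at which the move p0 -> p1 crosses a line, or None.

    The line runs from center - to_end to center + to_end. A crossing counts
    only if the defined end-point (center + to_end) is on the rider's right.
    """
    d = (p1[0] - p0[0], p1[1] - p0[1])          # rider displacement
    denom = cross(d, to_end)
    if denom >= 0:
        # Zero: moving parallel to the line (or not moving at all).
        # Positive: defined end-point is on the rider's left, wrong direction.
        return None
    w = (center[0] - p0[0], center[1] - p0[1])
    s = cross(w, to_end) / denom                # fraction along the rider's move
    u = cross(w, d) / denom                     # position along the line, -1..1
    if 0.0 <= s < 1.0 and -1.0 <= u <= 1.0:
        return t0 + s * (t1 - t0)
    return None


# A 10 m wide line centered at (0, 10) whose defined end-point is to the east.
center, to_end = (0.0, 10.0), (5.0, 0.0)

# Heading north: the east end-point is on the right, so this counts.
print(crossing_time((1.0, 0.0), 100.0, (1.0, 20.0), 104.0, center, to_end))  # 102.0
# Heading south across the same line: wrong direction.
print(crossing_time((1.0, 20.0), 100.0, (1.0, 0.0), 104.0, center, to_end))  # None
```

The half-open range on `s` is there so that a sample landing exactly on the line is counted once, not once for each of the two moves that share it.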

It's possible a rider will cross a gate more than once, or even backtrack and repeat multiple gate crossings, but as long as the rider crosses the start line, then the gates in order, then the finish line, the course can be considered to have been completed.
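One way to apply this rule, sketched here under my own assumptions about the data layout: number the lines 0 (start), 1 through n (gates), and n+1 (finish), reduce the ride to a time-ordered list of valid crossings, and for each line remember the latest start time from which every line up to that one has been crossed in order. Repeats and backtracking then do no harm, and a finish reached without the required gates is simply ignored.

```python
def best_course_time(events, n_lines):
    """Fastest start-to-finish time with all gates crossed in order, or None.

    events:  time-ordered list of (line_index, time_seconds) for crossings
             made in the correct direction.
    n_lines: total number of lines (start + gates + finish), at least 2.
    """
    # latest_start[k]: most recent start time from which lines 0..k
    # have all been crossed in order.
    latest_start = [None] * n_lines
    best = None
    for line, t in events:
        if line == 0:
            latest_start[0] = t
        elif latest_start[line - 1] is not None:
            latest_start[line] = latest_start[line - 1]
            if line == n_lines - 1:
                elapsed = t - latest_start[line]
                if best is None or elapsed < best:
                    best = elapsed
    return best


# Hypothetical ride: start (0), gate A (1), gate B (2), finish (3).
events = [
    (0, 0), (1, 300), (1, 320),   # gate A crossed twice
    (3, 400),                     # reached the finish without gate B: ignored
    (2, 700), (3, 900),           # first valid completion: 900 s
    (0, 1000), (1, 1200), (2, 1500), (3, 1800),  # second attempt: 800 s
]
print(best_course_time(events, 4))  # 800
```

With no gates at all (just a start and a finish), this gives the same best time as pairing each finish with the start crossing immediately before it.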

So if I am going to define a segment for a complex route like Lomas Cantadas via Alta Vista out of Orinda in the Berkeley Hills of California, I'd define multiple gates to make sure the rider didn't take El Toyonal to bypass Alta Vista, for example. But for the purpose of writing timing code for Low-Keys, I'd use only enough gates to make sure no short-cuts were taken.

Fortunately Strava provides an API by which I can download raw rider data. So if I am provided a list of ride numbers for riders who wish to participate, I can download the data and do my course timing using a Perl script. Then I can use any timing algorithm I want rather than relying on Strava's native timing.

3 comments:

Josh said...

I don't have a horse in this race, or any race, I suppose, but here are my 2 cents anyway: the Strava model (even assuming "perfect" GPS performance) is completely different than the Low-Key model.

With Strava, your time is "delta between the start and finish points" (lines, whatever).

With Low-Key, your time is (usually) "delta between the time the whistle blows and the time you cross the finish line". These are NOT the same thing. There's additional strategy with Low-Key, in positioning yourself at the start, finding/holding the right wheels, and knowing when/where to go (or, not).

I can understand possibly using Strava for maybe one event, in places where permitting a "race" (sorry..."mass start event") is an issue. But I definitely would not like to see the bulk of Low-Key events switch to Strava timing as I feel this would really ruin the character of Low-Key.

And don't get me wrong...I'm a huge Strava fan, even got myself a paid membership and one o' them fancy Garmins...but I don't really consider my results against others as "real"...ballpark, at best.

--Josh

jpo said...

Love the idea of gates (or some other form of intermediate checkpoints). This would solve a huge number of the segment (mis)matching issues we currently see on Strava.

I'm less concerned about the issue of start/end lines vs. points - I don't think this would make a big difference in the timing. I'm assuming that the current point-based matching looks for the closest data point to the segment start point. If the segment start were instead defined by a perpendicular line across the road, I think you're going to get pretty much the same results (particularly if Strava were to begin interpolating between data points, which they've hinted might be coming soon).

djconnel said...

Agree Low-Key is different for mass-start events, but for time trial events, we use difference between start and finish time. Agreed, however, were we to run a "self-timed or GPS-timed" week, optimal strategy might be distorted from that of either a time trial or a mass-start event. I don't see big issues with this, however. And I agree we'd only use it for maybe two climbs, max. Otherwise the series wouldn't have much point: people can chase Strava segments on their own already.

The idea on start and finish lines is that lateral errors in position have no effect, while if you use proximity to a point, then lateral errors increase the time, assuming proximity is a fixed radius. Maybe not a huge difference, but honestly I don't know what Strava does.

I agree the main thing for Strava to do isn't to worry about lines or points, but to allow customization over segment selection such that "check-points" can be placed at positions where I care about the rider passing, but omitting them where the rider has no choice. If there are parallel paths, for example a dirt road next to a paved road, and I want to make sure the rider stays on the dirt, I'd need periodic checkpoints along the dirt and tighten down on the threshold radius to make sure riders on the road don't trigger these points. On the other hand if the dirt road is surrounded by sheer cliffs, there's no need for "checkpoints" except the end-points of the road, and the threshold can be defined relatively loosely to avoid GPS error failing to match a legitimate rider. Presently Strava doesn't know what portions of a segment definition are important.