Monday, May 18, 2015

identifying non-overlapping "climbs" in a profile: a recursive approach

Recently I became interested in Jobst Brandt's climbing-descending algorithm for the Avocet 50. Jobst died recently, and I think it's safe to say the man was absolutely brilliant. Among his many contributions to cycling is the Avocet 50 altimeter climbing-descending algorithm, which has been the "gold standard" for determining total climbing on routes. For many years it was the most reliable determination of what a cyclist would consider "climbing", filtering out small altitude fluctuations which could easily be due to "measurement noise". But to compare Jobst's algorithm to mine I needed hill profiles.

So I wrote a little code to make random hill profiles. But it gave unrealistic results: rounded profiles, where the hills were more like sine waves than jagged peaks. This wasn't too surprising because the algorithm I used was simple: generate random-normal distributed altitude points then convolve the resulting profile with a well-behaved smoothing function (I used cosine-squared as a good finite-domain alternative to a Gaussian). Functions like this attenuate high frequency components, and what's left tends to be sinusoid in nature.

I modified the profile with an exponentially-based analytic truncation function to avoid negative altitudes, but still, the profiles were clearly unrealistic.

What was the problem? The problem was that roads are designed, they don't just happen. Engineers decided they need to get from point A to point B at different altitude, then based on the terrain, the anticipated traffic characteristics, and the relative cost of roadway length, they pick one or more design grades for the road. It won't be perfect: the nature of the terrain imposes local modifications to the design grade considerations. But realistic roads tend to behave differently than the random functions I was using. So I figured if I could identify the "climbs" in my random profiles I could generate more realistic profiles by making the grade more uniform between the bottom and the top.

So I needed to determine what a "climb" was. Well, I'd already given that plenty of thought, in particular in response to John Summerson's determination that Mount Hamilton Road was 3 climbs, not one. John Summerson, in his excellent "Guide to Climbing by Bike" series, rates climbs using a formula which I attribute to Fiets magazine but which no doubt pre-dates that: net altitude gained squared divided by distance.

Given this rating, then, a "climb" is simply a segment of road which maximizes this rating. If I add relatively flat road to the end of a steep road, I'll increase the denominator faster than the numerator. If I add steep road, however, the numerator will increase more rapidly. It doesn't need to be as steep as what precedes it, since the altitude in the numerator is squared, but it needs to be steep enough. And if I add a short descent followed by an additional climbing portion, the net altitude will need to increase sufficiently, overcoming the loss from the intermediate descent but including the distance from the descent, to increase the rating relative to if the descent-climb sequence was omitted and the climb terminated earlier. In the case of Mount Hamilton it turns out the 3 climbs combined with the 2 descents rates higher than any other option (for example just a climb, or a climb-descent-climb).

Long ago I implemented this definition in a code I wrote to determine the "climb rating" for climbs in the Low-Key Hillclimbs. To rate the climb, I wanted to apply the Fiets formula, but rather than doing it automatically to the full profile, I applied it to the portion of the full profile which would maximize the rating. The point here is every climb is at least as hard as its hardest subset. In the Low-Key Hillclimbs I use a different rating than the simple Fiets formula but it has similar behavior.

But here I was interested in identifying multiple climbs. What is the best way to do that?

The obvious approach is to evaluate the climb rating for possible combinations of start and end points on a route. But that's really silly -- the hardest climb might be a 10.00 km segment, then the "2nd hardest" climb would be a 9.99 segment excluding either 10 meters at the start or finish, etc. Climbs which sufficiently overlap shouldn't be counted separately. I wanted to set of individual climbs within a longer route profile which were relatively non-overlapping. This is related to an issue Strava has been dealing with lately, which is how to filter through the enormous surplus of largely redundent user-defined segments. Separate segments on the same portion of road which substantially overlap don't typically represent different climbs, and including both in a ride report with a default verbosity level may be overkill.

So I developed an iterative algotithm to identify multiple climbs within a longer ride, eliminating those on-the-fly which essentially overlapped. I was feeling good about this but to my surprise it didn't work. The reason was subtle and unanticipated: It would find climb A, then it would find climb B which was higher rated than A and substantially overlapped it. So it deleted A. Then it found climb C which substantially overlapped B and out-rated it. So it deleted B. But C didn't overlap A. So bringing an "obvious" climb A was unmarked, not part of any of the surviving climbs.

I then kicked myself because this wasn't a problem that is best solved iteratively. It's really much simpler represented recursively.

So here's the approach:

  1. identify all non-overlapping rateable climbs in a profile:
    1. If there's no rateable climbs of some minimum climb rating, then we're done.
    2. identify the highest-rated climb in a profile.
    3. identify all rateable climbs preceding this climb (recursive)
    4. identify all rateable climbs following this climb (recursive)

The only subtlety is how to define "preceding" and "following". Does this mean includes no common points? Maybe. But maybe instead it allows for some degree of overlap. Or maybe it requires some minimum degree of underlap. I'm thinking now that overlap is not desirable. So the rule would be adjacent segments can share at most one point.

Consider the following example, where I list distance and altitude for each point in normalized units:

0 0
1 1
2 2
3 5
4 8

First I find the highest rated climb. From 3-4 is a rating of 9. From 2-4 is a rating of 36/2 = 18 which is higher. From 1-4 is a rating of 49/3 which is smaller. 2-4 is the highest rated climb: rating = 18.

So then I want to find the 2nd-highest-rated climb. To do this I find the highest rated climb of points preceding, and the highest rated climb of points following the identified segment. The latter is easy: the climb goes to the end of the profile; there are no points following. So that leaves preceding.

I will allow re-use of the start point but that's it: my remaining profile is:

0 0
1 1
2 2

The highest rated climb is 0-2, which is all the points, so I'm done. There's two climbs in this profile: a gradual one, then a steeper one, with no descents in between. The full profile is by any reasonable definition a "climb", but the steeper portion is a better climb, so the full profile gets suppressed. If I were to allow some segment overlap, however, this would have resulted in an artificial first "climb", ending at some point within the steep portion.

Note the Fiets formula as I wrote it: the square of altitude gained divided by distance, rates climbs with the same number in each direction. So if I have a road which climbs, then descends to the original altitude, this should be marked as two "climbs". It's really a climb and a descent but I don't care. A small modification would be a signed rating: one which is positive for altitude gained, negative for altitude lost.

No comments: