Saturday, November 12, 2011

testing 2012 Low-Key Hillclimbs scoring code

I seem to have debugged the new Low-Key Hillclimbs scoring algorithm, so I tested it on 2011 data for the completed first six weeks.

Recall the method is to calculate a rider's rating (not used for overall rankings) based on the natural logarithm of the ratio of his time each week to that climb's reference time. Meanwhile, the climb's reference time is calculated from the average of the natural logs of the times of the riders in the climb, after subtracting their ratings. These "averages" are weighted by heuristic statistical weights which assign more importance to riders who did more climbs, and to a lesser extent to climbs with more riders. Each of these factors depends on the others, so the solution is iterated self-consistently until it converges, in this case until the sum of the squares of the reference times changes by less than 10⁻⁶ seconds². This took 8 iterations in my test.
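For anyone who wants to play with the idea, here is a minimal Python sketch of that self-consistent loop. It is not my actual scoring code: the function name `solve_ratings`, the input format (a dict keyed by rider and climb), and especially the weights (rider weight equal to the number of climbs done, climb weight equal to the square root of the number of riders) are assumptions made for illustration. It also reads the convergence test as the sum of squared changes in the reference times between iterations.

```python
import math

def solve_ratings(times, tol=1e-6, max_iter=1000):
    """Iterate rider ratings and climb reference times until they agree.

    times: dict mapping (rider, climb) -> time in seconds.
    Returns (rating, reference, iterations_used).
    """
    riders = sorted({r for r, _ in times})
    climbs = sorted({c for _, c in times})
    log_t = {key: math.log(t) for key, t in times.items()}

    # Assumed heuristic weights (not the real ones): riders who did more
    # climbs count more, and climbs with more riders count a bit more.
    w_rider = {r: float(sum(1 for rr, _ in times if rr == r)) for r in riders}
    w_climb = {c: math.sqrt(sum(1 for _, cc in times if cc == c)) for c in climbs}

    rating = {r: 0.0 for r in riders}      # log(time / reference), 0 = average
    reference = {c: 0.0 for c in climbs}   # seconds

    for iteration in range(1, max_iter + 1):
        # Reference time: weighted mean of log times with ratings removed.
        new_reference = {}
        for c in climbs:
            present = [r for r in riders if (r, c) in log_t]
            num = sum(w_rider[r] * (log_t[(r, c)] - rating[r]) for r in present)
            den = sum(w_rider[r] for r in present)
            new_reference[c] = math.exp(num / den)

        # Rating: weighted mean of log(time / reference) over climbs ridden.
        for r in riders:
            ridden = [c for c in climbs if (r, c) in log_t]
            num = sum(w_climb[c] * (log_t[(r, c)] - math.log(new_reference[c]))
                      for c in ridden)
            den = sum(w_climb[c] for c in ridden)
            rating[r] = num / den

        # Convergence: summed squared change in reference times (seconds^2).
        change = sum((new_reference[c] - reference[c]) ** 2 for c in climbs)
        reference = new_reference
        if change < tol:
            return rating, reference, iteration

    raise RuntimeError("ratings did not converge")
```

With this sign convention a positive rating means a rider is slower than the reference, and a negative rating means faster.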

To avoid contaminating the results I check for annotations that a rider experienced a malfunction or wrong turn during a climb, or that he was on a tandem or unicycle, or was running. These factors would generally invalidate week-to-week comparisons for these results, so I don't use them. So a rider whose wheel pops out of true during a climb, forcing him to make time-consuming adjustments before continuing, won't have his rating penalized by this, assuming that incident makes it into the results data.
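That filter could look something like the sketch below. The record layout (a dict with an "annotations" list) and the annotation strings are hypothetical; the real results data surely labels these things differently.

```python
# Hypothetical annotation labels; the actual results files may use others.
EXCLUDED_ANNOTATIONS = {"malfunction", "wrong turn", "tandem", "unicycle", "runner"}

def usable_for_ratings(result):
    """Return True if a result carries none of the excluded annotations."""
    return not (set(result.get("annotations", ())) & EXCLUDED_ANNOTATIONS)

def filter_results(results):
    """Keep only the results that are fair to compare week to week."""
    return [r for r in results if usable_for_ratings(r)]
```

Filtered results are only left out of the rating calculation; nothing here changes how they appear in the week's own standings.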

All times here are adjusted for division (male, female, or hybrid-electric), as I've described.

week 1 median    = 2149.50
week 1 reference = 2054.26
week 1 ratio     = 104.636%
week 1 quality   = 0.0398
week 2 median    = 1760.50
week 2 reference = 1762.51
week 2 ratio     = 99.886%
week 2 quality   = 0.0096
week 3 median    = 2614.00
week 3 reference = 2559.27
week 3 ratio     = 102.139%
week 3 quality   = 0.0237
week 4 median    = 2057.50
week 4 reference = 2119.96
week 4 ratio     = 97.054%
week 4 quality   = -0.0140
week 5 median    = 1237.50
week 5 reference = 1246.35
week 5 ratio     = 99.290%
week 5 quality   = 0.0310
week 6 median    = 2191.00
week 6 reference = 2322.56
week 6 ratio     = 94.335%
week 6 quality   = -0.0254
Here the week "quality" is the average rating score of riders in the climb. You can see in general the ratio of the median to reference times tracks this quality score, although one is based on a weighted geometric mean, and the other is a population median.
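As a quick check on the numbers above, the "ratio" line is just the median time divided by the reference time. The small Python sketch below recomputes it from the printed values for the first two weeks; the helper name `median_to_reference` is only for illustration. Since the printed median and reference are themselves rounded, a recomputed ratio can differ from the printed one in the last digit.

```python
def median_to_reference(median_seconds, reference_seconds):
    """Ratio of a week's median time to its reference time."""
    return median_seconds / reference_seconds

# (median, reference) in seconds, copied from the output above.
weeks = {
    1: (2149.50, 2054.26),
    2: (1760.50, 1762.51),
}

for week, (median, reference) in weeks.items():
    ratio = median_to_reference(median, reference)
    print(f"week {week} ratio = {ratio:.3%}")
```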

In general, less steep, more popular climbs (1, 3, 5) have positive rider "qualities", meaning riders were somewhat slower, while steeper, more challenging climbs (4 and 6, and to a lesser extent 2) tended to have negative "qualities", indicating riders were generally faster. The exception here is week 2, Sierra Road. While this road is considered cruelly steep by the Tour of California, apparently Low-Keyers have a higher standard of intimidation, and it still managed a positive quality score with a ratio quite close to 100%. It essentially fell between the super-steep climbs and the more gradual climbs.

A side effect of this, even if I don't use this analysis for the overall scores (this year's score algorithm can't be changed mid-stream, obviously, although it's tempting, I admit...), is that I get to add a new ranking to the overall result: rider "rating". This is a bit like the ratings that are sometimes published in papers for professional teams: not a statement of accomplishment, but a guide to bettors on who is likely to beat whom. Don't take these results to Vegas, though, as they're biased towards riders who did steeper climbs, which produce a greater spread in scores. I could compensate for this with an additional rating for climbs (how spread the scores were), but I'll leave it as it is. I like "rewarding" riders for tackling the steep stuff, even if it's only in such an indirect fashion.

For the test, I posted the overall results with the official algorithm and with this test scoring algorithm so they can be compared. One thing to note is that only this single page is available with the test algorithm; any linked results will use the official score:

  1. 2011 scoring algorithm
  2. 2012 scoring algorithm

Riders who did both Mix (week 6) and Bohlman (week 4) really benefit from this new approach. Coincidentally that includes me and my "team" for the series (Team Low-Key, even though my racing team is Team Roaring Mouse, which I strongly support).
