New scoring scheme for Low-Key 2012?
Low-Key scoring has gone through various phases.
In the 1990s, we scored based on the fastest rider. The fastest man and the fastest woman each week would score 100. Slower riders would score a percentage of the fastest rider's score. This was super-simple, but when an exceptionally fast rider showed up, everyone else would score lower than normal. Additionally, this was frustrating for the fastest rider (typically Tracy Colwell among the men), since no matter how hard he or she pushed, the result would be the same 100 points.
So with Low-Key 2.0 in 2006, we switched to using the median rider (again treating men and women separately). The median is much less sensitive to whether a particular individual shows up or not, so scores were now more stable. However, there was still an issue with women, and most especially with our hybrid-electric division, since smaller turnouts in these again made the score sensitive to who showed up.
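To make the difference between the two schemes concrete, here is a minimal sketch. It assumes a score of 100 times the reference time divided by the rider's time, which is my reading of "percentage of the fastest rider" rather than a documented formula, and the climb times are invented for illustration.

```python
from statistics import median

def scores(times, reference):
    """Score each time as 100 * reference / time (assumed formula)."""
    return [round(100.0 * reference / t, 2) for t in times]

# Hypothetical climb times in seconds for one week.
field = [1800.0, 2000.0, 2200.0, 2400.0, 2600.0]
# The same field plus one exceptionally fast visitor.
field_with_star = [1500.0] + field

# 1990s scheme: the reference is the fastest time.
fastest_before = scores(field, min(field))
fastest_after = scores(field_with_star, min(field_with_star))[1:]

# Low-Key 2.0 scheme: the reference is the median time.
median_before = scores(field, median(field))
median_after = scores(field_with_star, median(field_with_star))[1:]

print(fastest_before[0], fastest_after[0])  # former winner under fastest-rider scoring
print(median_before[0], median_after[0])    # same rider under median scoring
```

In this made-up field the visitor moves the fastest-rider reference from 1800 to 1500 seconds, but moves the median only from 2200 to 2100 seconds, so the regulars' scores shift far less under the median scheme. With a larger field the median would move even less.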
So in 2010 I updated the system so that all riders were scored using a single median time, except that instead of actual time I used an "effective men's time", using our history of Low-Key data to generate conversion factors from women's and hybrid-electric times to men's times. Mixed tandems were scored by averaging a men's and a women's effective time.
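A rough sketch of the effective-time idea follows. The conversion factors in `FACTOR` are placeholders I made up, not the real Low-Key values, and the tandem rule is one interpretation of "averaging a men's and a women's effective time". The scoring formula (100 times median over effective time) is likewise an assumption.

```python
from statistics import median

# Hypothetical conversion factors to "effective men's time".
# The real factors come from historical Low-Key results; these do not.
FACTOR = {"men": 1.00, "women": 0.90, "hybrid": 1.20}

def effective_time(time_s, division):
    """Convert an actual time to an effective men's time."""
    if division == "mixed_tandem":
        # Average of the time treated as a men's and as a women's result.
        return 0.5 * (time_s * FACTOR["men"] + time_s * FACTOR["women"])
    return time_s * FACTOR[division]

# Invented results: (rider, actual time in seconds, division).
results = [
    ("A", 2000.0, "men"),
    ("B", 2400.0, "women"),
    ("C", 1700.0, "hybrid"),
    ("D", 2300.0, "men"),
    ("E", 2200.0, "mixed_tandem"),
]

effective = {name: effective_time(t, div) for name, t, div in results}
reference = median(effective.values())  # one median for the whole field
week_scores = {name: 100.0 * reference / eff for name, eff in effective.items()}

for name in sorted(week_scores, key=week_scores.get, reverse=True):
    print(name, round(effective[name], 1), round(week_scores[name], 2))
```

Because every division feeds the same median, a week with only a handful of women no longer depends on which of them happened to show up.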
This worked even better. Now if just a few women show, it's possible for them to all score over 100 points, as happened at Mix Canyon Road this past Saturday.
But the issue with Mix Canyon Road was that, because the climb is so challenging and for many it was a longer-than-normal drive to reach, the turn-out was relatively poor and skewed toward more endurance-oriented riders. The average rider at Mix would have scored over 100 points during, for example, Montebello (data here). Judging by the scores, it would seem almost everyone who did both climbs had "a bad day" at Mix. That is far from the truth!
There is another scoring scheme I've been contemplating for many years. It's one which doesn't use a median time for each week, but rather compares the times of riders who did multiple weeks to come up with a relative time ratio for each climb. So if, for example, five riders did both Montebello and Mix, and if each one of them took exactly 10% longer to climb Mix, then a rider on Mix should score the same as a different rider on Montebello as long as the Mix rider's time was exactly 10% longer than the Montebello rider's time, once again after adjusting for whether the rider is a male, female, or hybrid-electric.
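Here is that Montebello and Mix example as a small sketch. The times are invented, they are assumed to already be effective men's times, and using the geometric mean of the per-rider ratios is my own choice of average for illustration.

```python
from math import exp, log

def climb_ratio(times_a, times_b):
    """Geometric-mean ratio of climb B time to climb A time,
    using only riders who did both climbs."""
    common = set(times_a) & set(times_b)
    if not common:
        raise ValueError("no riders did both climbs")
    mean_log_ratio = sum(log(times_b[r] / times_a[r]) for r in common) / len(common)
    return exp(mean_log_ratio)

# Hypothetical times in seconds. r1..r5 did both climbs; r6 did only
# Montebello and r7 did only Mix.
montebello = {"r1": 1700.0, "r2": 1900.0, "r3": 2100.0,
              "r4": 2300.0, "r5": 2500.0, "r6": 2000.0}
mix = {r: 1.10 * montebello[r] for r in ("r1", "r2", "r3", "r4", "r5")}
mix["r7"] = 2200.0

ratio = climb_ratio(montebello, mix)
# Express r7's Mix time on the Montebello scale.
r7_montebello_equivalent = mix["r7"] / ratio

print(round(ratio, 3), round(r7_montebello_equivalent, 1), montebello["r6"])
```

Rider r7 took exactly 10% longer on Mix than r6 took on Montebello, so after the conversion the two land on the same equivalent time and would earn the same score.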
So why haven't I made this switch yet? It sounds good, right?
Well, for one it's more work for me. I'd need to code it. But that's not too bad because I know exactly what I need to do to make it work.
Another is it's harder to explain. It involves an iterative solution, for example. I like things which are easy to explain. Median time is simple.
But another is it would mean scores for any week wouldn't be final until the entire series was complete. So a rider might celebrate scoring 100.01 points on Montebello, only to see that score drop to below 100 points later in the series. Why? Because the time conversion factor for a given climb would depend on how all riders did on that climb versus other climbs. And it's not as simple as I described: for example if rider A does climbs 1 and 2, and rider B does climbs 2 and 3, then that gives me valuable information about how climb 1 compares to climb 3. In effect I need to use every such connection to determine the conversion factor between these climbs.
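To show how the chained connections can be used, here is one possible iterative scheme. It is a sketch under my own assumptions, not the algorithm promised for the next post: it models log(time) as a rider term plus a climb term and alternately averages each against the other. The function name `fit_climb_factors` and the data are hypothetical.

```python
from math import exp, log

def fit_climb_factors(results, reference_climb, iterations=100):
    """results maps (rider, climb) -> time in seconds.
    Returns each climb's time factor relative to reference_climb."""
    riders = {r for r, _ in results}
    climbs = {c for _, c in results}
    rider_log = dict.fromkeys(riders, 0.0)
    climb_log = dict.fromkeys(climbs, 0.0)
    for _ in range(iterations):
        # Rider term: mean log time after removing climb difficulty.
        for r in riders:
            vals = [log(t) - climb_log[c]
                    for (rr, c), t in results.items() if rr == r]
            rider_log[r] = sum(vals) / len(vals)
        # Climb term: mean log time after removing rider ability.
        for c in climbs:
            vals = [log(t) - rider_log[r]
                    for (r, cc), t in results.items() if cc == c]
            climb_log[c] = sum(vals) / len(vals)
    base = climb_log[reference_climb]
    return {c: exp(v - base) for c, v in climb_log.items()}

# Rider A did climbs 1 and 2; rider B did climbs 2 and 3 (invented times).
results = {
    ("A", "climb1"): 2000.0, ("A", "climb2"): 2200.0,
    ("B", "climb2"): 2640.0, ("B", "climb3"): 2904.0,
}
factors = fit_climb_factors(results, reference_climb="climb1")
print({c: round(f, 3) for c, f in sorted(factors.items())})
```

Nobody in this toy data rode both climb 1 and climb 3, yet the fit recovers a factor for climb 3 relative to climb 1 through the shared climb 2. Adding another week of results would change the inputs to every average, which is why scores could not be final until the series ends.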
But while scores might change for a climb, the ranking between riders during the climb would not. That's the most important thing. Finish faster than someone and you get a higher score. The conversion factor between men and women, for example, would stay the same. That's based on close to 10 years of data, so no need to continue to tweak that further.
I'll need to get to work on this and see if I can make progress. I'll describe my proposed algorithm next post.
Comments
This could be done by weighting each result in proportion to the climb rating. For example, a climb rated 200 could count twice in the results, while a climb rated 100 (OLH, by definition) could count once. Then, to rank in the overall standings, a rider would need climbs whose ratings sum to at least half the total of the weightings of all climbs ridden so far.
But I don't want to do this because I view being fast on short climbs to be as valid as being fast on long climbs. Some may excel in one versus the other, and each type of rider should get their chance.
Rather the key issue here is the average ability of riders in a particular week. The new algorithm will automatically figure that out. I already started writing the code on the train this morning.
Note I did something very similar in the distant, distant past. See:
http://lowkey.djconnel.com/1995/results_analysis.html
That code is now lost, written in an inferior scripting language (awk)... to be honest a bit of what I did then is now somewhat over my head :). But I'll do the best I can.
The approach I propose here essentially uses everyone as a week-to-week reference. Everyone who does both weeks 1 and 2, for example, contributes to the reference time of week 2 relative to week 1. But so do people who do week 1 and week 3, along with those who do week 3 and week 2. The connections are infinitely complex, yet code-wise it's extremely simple.
With this scheme nobody, not even I, will be able to predict what will be needed for rider A to beat rider B. For example, rider A may be ahead of rider B going into the final week, with every score higher than rider B's; rider A then beats rider B in the final, and yet rider B moves ahead. That's extremely unlikely but possible.
But it will solve the theoretical problem of riders killing themselves on Mix then getting crappy scores compared to what riders got on Montebello when the Mix riders were busy coordinating :).
The first ride of the series is Montebello and generally has a good turn-out so you can use that to establish the baseline field quality score for the series. For each subsequent climb you divide that climb's field quality score by the Montebello figure and this is the weighting applied to that climb's results. So if the field quality is 10% higher on Mix than on Montebello, all the scores get boosted by 10%, though I bet the effect is more like low single digit percentage points.
Interesting effect of that: were I to reorder the weeks the scores would be different. That wouldn't be true with this scheme, nor with the prior one. It's not a major problem, though, just a curiosity.
I've got the code probably half-way done, not counting outlier pruning. I'll experiment on it with this year's results. But I really like your suggestion. It wouldn't be hard to code, either. We have enough regulars that there would be decent statistical weight for the comparison.
Also, you might want to use a bias value rather than a weighting. I mean that you would subtract the Montebello score from this climb's score, and just add that into all the results. The effect is to shift the distribution curve up, and gives more benefit to riders with lower scores. One can argue whether that is fairer or not though.
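A quick sketch of the two variants side by side, to show how they treat lower scores differently. The field quality numbers are taken as given inputs here (how to compute them is not specified above), and all values are invented.

```python
def apply_weighting(scores, field_quality, baseline_quality):
    """Multiply scores by the field quality relative to the baseline climb."""
    weight = field_quality / baseline_quality
    return [s * weight for s in scores]

def apply_bias(scores, field_quality, baseline_quality):
    """Add the field quality difference from the baseline climb to every score."""
    bias = field_quality - baseline_quality
    return [s + bias for s in scores]

# Hypothetical field quality scores: Montebello is the baseline and
# Mix is taken to be 10% higher.
montebello_quality = 100.0
mix_quality = 110.0

raw_mix_scores = [80.0, 100.0, 120.0]
weighted = apply_weighting(raw_mix_scores, mix_quality, montebello_quality)
biased = apply_bias(raw_mix_scores, mix_quality, montebello_quality)

print([round(s, 1) for s in weighted])
print([round(s, 1) for s in biased])
```

The rider at 100 ends up at 110 either way. The rider at 80 gains 8 points from weighting but 10 from the bias, while the rider at 120 gains 12 from weighting and only 10 from the bias, which is the shift toward lower scores described above.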