2013 Low-Key Hillclimbs: rider score variability and the scoring algorithm
One of the goals of this scoring system was that rider scores vary as little as possible from week to week. Of course, this is trivially accomplished: just give each rider a score of 100 each week, and then the variation is zero. But of course that's not what's wanted. So an additional goal is that scores are roughly proportional to rider speed in a given week.
I'll consider three scoring schemes here for the Low-Key 2013 data:
- score 1 is 100 × median time / rider time
- score 2 is 100 × a reference time / rider time
- score 3 is 100 × (a reference time / rider time)^slope factor
Here the reference time for the week is a geometric average for all solo riders, adjusted for the rider division (male, female, hybrid-electric). The slope factors are calculated for each week based on how spread out the rider times are, but have a weighted average of one.
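As a minimal sketch, the three schemes can be written as small Python functions. The function names and the sample times are made up for illustration; the reference time and slope factor are simply passed in here, since computing them is a separate step.

```python
import statistics


def score_median(rider_time, week_times):
    """Score 1: 100 x median time of the week / rider time."""
    return 100 * statistics.median(week_times) / rider_time


def score_reference(rider_time, reference_time):
    """Score 2: 100 x reference time / rider time."""
    return 100 * reference_time / rider_time


def score_slope(rider_time, reference_time, slope_factor):
    """Score 3: 100 x (reference time / rider time) ^ slope factor."""
    return 100 * (reference_time / rider_time) ** slope_factor


# Illustrative times in seconds (not real Low-Key data).
week_times = [1500, 1800, 2400]
s1 = score_median(1500, week_times)
s2 = score_reference(1500, 1800)
s3 = score_slope(1500, 1800, 0.5)
```

With a slope factor of one, score 3 reduces to score 2. A slope factor below one pulls scores toward 100, which is what compresses the spread on a week where times are unusually spread out.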
I then calculated, for each rider doing at least two climbs, the standard deviation of their scores under each scheme, and took the root-mean-square average of these standard deviations. The results for the three scores were:
- score 1 : 4.47
- score 2 : 4.07
- score 3 : 3.74
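The variability measure can be sketched in Python roughly as follows. This assumes the sample standard deviation (n - 1 in the denominator); the post does not say which form was used, and the function name and input layout are hypothetical.

```python
import math
import statistics


def rms_score_spread(scores_by_rider):
    """Root-mean-square of per-rider standard deviations.

    scores_by_rider maps a rider name to that rider's list of weekly
    scores. Riders with fewer than two climbs are skipped.
    Assumption: sample standard deviation (statistics.stdev).
    """
    stdevs = [
        statistics.stdev(scores)
        for scores in scores_by_rider.values()
        if len(scores) >= 2
    ]
    return math.sqrt(sum(sd ** 2 for sd in stdevs) / len(stdevs))


# Illustrative scores (not real Low-Key data).
example = {
    "rider_a": [100.0, 104.0],
    "rider_b": [90.0, 98.0],
    "rider_c": [110.0],  # only one climb, so excluded
}
spread = rms_score_spread(example)
```

Running the same function on the scores produced by each scheme gives one number per scheme, which is the kind of comparison listed above.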
So the first score resulted in the most variability in scores for a given rider, the second (calculating a reference time adjusted for rider quality) reduced the variation, and the third (adjusting for score slope) reduced the variation even more.
This comparison is related to an analysis of variance. The analysis of variance calculation is based on the assumption that there are multiple, independent sources of variation. In this case, one source of variation for a given rider is how he rides from week to week. This is a desired source of variation: we want riders to score better when they ride better.
Another source of variation is who happens to show up for a given week. Mostly faster riders? Mostly endurance-oriented riders? This is an undesired source of score variation. A rider shouldn't be penalized in a given week just because the endurance-oriented riders stayed home.
Another source of variation is how much the climb spreads out the riders. If a hill is particularly steep, the faster riders will be proportionately faster than they would be if the road were flatter, or included descents where climbing ability is of little benefit. This is another source of undesired variation.
The assumption is that since independent sources of variation are generally uncorrelated, each tends to increase the total variation. So the scoring system with the least total variation for a given rider will generally have the least undesired variation, and is thus preferred.
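The point that uncorrelated sources add in variance can be checked with a tiny Python example. The two series below are invented so that they are exactly uncorrelated; the names `rider_form` and `field_effect` are hypothetical stand-ins for a desired and an undesired source.

```python
import statistics

# Two zero-mean, exactly uncorrelated contributions to a rider's score
# over four weeks (made-up numbers).
rider_form = [1.0, -1.0, 1.0, -1.0]    # desired: how well the rider rode
field_effect = [1.0, 1.0, -1.0, -1.0]  # undesired: who showed up

observed = [a + b for a, b in zip(rider_form, field_effect)]

var_form = statistics.pvariance(rider_form)
var_field = statistics.pvariance(field_effect)
var_total = statistics.pvariance(observed)
```

Here `var_total` equals `var_form + var_field`, so removing the undesired contribution can only lower the total. That is why the scheme with the smallest total variation is taken to be the one with the least undesired variation.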
Comments
For each week I develop a slope factor and a reference time. Times for women are converted by a fixed factor based on the historical average, which gives an effective male time for them. A rider's score for a given week is:
rider score = (reference time / effective time)^slope factor
The reference time is what the "average" rider in the series would get for that week. Some weeks tend to attract faster riders, other slower riders, so the reference time needs to consider the quality of the riders who showed up on that particular week.
I could go into more detail, but that's the basic idea:
1. determine reference times for each week.
2. determine slope factors which compensate for how much spread there was.
3. calculate scores.
The process is iterative because everything depends on everything else, so I just start with initial guesses and keep repeating the calculation until the results stop changing significantly.
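The iterate-until-stable idea can be illustrated with a simplified Python sketch. This is not the actual Low-Key code: it leaves out slope factors and division adjustments, and it assumes each rider has a single hypothetical "strength" (1.0 being the series-average rider) so that the week's reference time is the geometric mean of rider time × strength.

```python
import math


def geomean(values):
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def fit_reference_times(times, tol=1e-12, max_iter=1000):
    """times maps (rider, week) -> time in seconds.

    Alternates between estimating weekly reference times from rider
    strengths and rider strengths from reference times, until the
    strengths stop changing. Returns (reference_times, strengths).
    """
    riders = sorted({r for r, _ in times})
    weeks = sorted({w for _, w in times})
    strength = {r: 1.0 for r in riders}  # initial guess: everyone average
    ref = {}
    for _ in range(max_iter):
        # Reference time: what a strength-1.0 rider would have ridden.
        ref = {
            w: geomean(t * strength[r] for (r, wk), t in times.items() if wk == w)
            for w in weeks
        }
        # Strength: geometric mean of reference time / rider time.
        new = {
            r: geomean(ref[w] / t for (rd, w), t in times.items() if rd == r)
            for r in riders
        }
        norm = geomean(new.values())  # keep the average strength at 1.0
        new = {r: s / norm for r, s in new.items()}
        change = max(abs(math.log(new[r] / strength[r])) for r in riders)
        strength = new
        if change < tol:
            break
    return ref, strength


# Made-up data: not every rider does every week.
times = {
    ("A", 0): 800.0, ("B", 0): 1000.0,
    ("B", 1): 2000.0, ("C", 1): 2500.0,
    ("A", 2): 1200.0, ("B", 2): 1500.0, ("C", 2): 1875.0,
}
ref, strength = fit_reference_times(times)
```

In this example week 0 drew only the two faster riders, so a plain average of that week's times would understate the reference time. The iteration corrects for that because each rider's strength is informed by the other weeks they rode.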