Sunday, November 17, 2013

Low-Key scoring and X-weeks

Low-Key scoring got complicated when we started balancing the contributions from each week. The results from one week can affect the results of other weeks. This is fine. There are several goals in the scoring:

1. the average of all rider scores should be 100.
2. rider scores should be as consistent as possible week-to-week
3. if I plot the logarithm of scores versus the logarithm of times in a given week, I get a straight line, the average slope for all weeks being one.

A trivial example for this might be the following:

Suppose rider A does climb 1 and gets a time. He's the only rider in climb 1. Climb 1 has been the only climb. He gets 100 points, consistent with the first goal.

Now rider A does climb 2 and gets a time. Again he's the only rider, but now there have been two climbs. He gets scores of 100 and 100. This is consistent with goals 1 and 2.

But suppose now I realize there was also a rider B in week 2. Rider B was 20% faster than rider A. Using the third goal, this means rider B should score 20% more than rider A in this week.

So what do I do? I could assign rider A scores of 100 and 100, as before, and give rider B a score of 120. But this would be inconsistent with the first goal: I want the average to be 100. Call each of rider A's two scores x. The score for rider B is then 1.2 x. Requiring the three scores to average 100, I get the following algebraic equation:

x + x + 1.2x = 300

The solution is x = 93.75: rider A scores 93.75 for weeks 1 and 2, and rider B scores 112.5 in week 2.
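The arithmetic in this toy example can be checked with a few lines of Python. This is only a sketch of the three-score case: the function name `normalize_scores` and the idea of a single global scale factor are my illustrative assumptions here, not the actual Low-Key scoring code, which uses statistical weights.

```python
def normalize_scores(relative_scores, target_mean=100.0):
    """Scale relative scores by one common factor so their mean is target_mean.

    Hypothetical helper for the toy example only.
    """
    scale = target_mean * len(relative_scores) / sum(relative_scores)
    return [scale * r for r in relative_scores]


# Relative scores: rider A week 1, rider A week 2, rider B week 2.
# Rider B is taken to score 20% more than rider A in week 2.
scores = normalize_scores([1.0, 1.0, 1.2])
print(scores)  # approximately 93.75, 93.75, 112.5
```

The ratio between rider B and rider A in week 2 stays at 1.2, and the three scores average 100.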

However, here's where I get into a problem when I use "X-weeks". X-weeks are climbs which are scored like regular climbs, but don't affect the regular series scores. We had examples in the 1990's, but the first "X-week" in the "modern era" with this sort of scoring scheme was yesterday: The Lomas Cantadas - Marin Ave double.

Suppose the second week was an X week. With the calculation above, the score from the first week was affected by what happened in the X-week. I can't let week 1 scores change due to the presence of week 2.

So if week 2 is an X week, the scores should be 100, 100, and 120, as I initially calculated. The first goal, that the average score should be 100, and indeed the third goal, that the average slope of scores versus time on a log-log basis should be one, should exclude X weeks. I will still calculate a slope for the X week, consistent with the second goal (rider scores should be as consistent as possible), in a more complicated case (in this trivial case I don't have enough scores to do so). But the result shouldn't feed into a series average.
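Here is a sketch of that exclusion for the same toy example. The function `score_with_x_weeks`, the entry format, and the single scale factor are hypothetical simplifications of mine. The point is just that the normalization is computed from regular weeks only and then applied to every week, X weeks included.

```python
def score_with_x_weeks(entries, target_mean=100.0):
    """Score entries so that only regular weeks set the normalization.

    entries: list of (label, relative_score, is_x_week) tuples.
    Hypothetical helper for the toy example only.
    """
    # Only regular (non-X) weeks feed into the series average.
    regular = [rel for _, rel, is_x in entries if not is_x]
    scale = target_mean * len(regular) / sum(regular)
    # X-week entries are scored with the same scale but do not influence it.
    return {label: scale * rel for label, rel, _ in entries}


entries = [
    ("A, week 1", 1.0, False),
    ("A, week 2x", 1.0, True),
    ("B, week 2x", 1.2, True),
]
x_scores = score_with_x_weeks(entries)
print(x_scores)  # approximately 100, 100, 120
```

Rider A's week 1 score stays at 100 whether or not the X week exists, which is the behavior I want.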

This problem only manifested itself when I calculated the week 7x scores. I saw that scores had dropped for the week 7 climb, which was just Lomas Cantadas. That wasn't desirable.

The problem obviously gets more complicated when you have up to 10 sets of scores for on the order of 100 riders per week. I end up calculating statistical weights based on the number of riders per week and the number of climbs per rider. I'm not sure the method I use formally optimizes to these stated goals. But in testing, it does much better than the previous, far simpler algorithm. I described some of this testing back in 2011.

But the "X-week" model may still have a few surprises in store in terms of unintended consequences. Hopefully I can isolate any of these before the end of the series.