Friday, November 11, 2011

proposed 2012 Low-Key Hillclimbs scoring algorithm description

The whole key to comparing scores from week to week is to come up with a reference time for each week. The rider's score is then 100 × that reference time / the rider's time, where times have first been adjusted if the rider is a woman or a hybrid-electric rider.

Presently this reference time is the time of the median rider finishing the climb that week. But if riders who would normally finish in more than the median time don't show up one week (Mix Canyon Road, for example), everyone there gets a lower-than-normal score. That's not fair.

So instead we can do an iterative calculation. Iterative calculations are nice because you can simplify a complicated problem by converting it into a series of simpler problems. The solution of each depends on the solution of every other. But if you solve them in series, then solve them again, then again, eventually you approach the self-consistent solution you would have gotten by solving the full, unsimplified problem in one go, except that problem might be too difficult to solve directly. So here's how we proceed:
  1. For each climb, there is a reference time, similar to the median time now. The reference time is the average of the adjusted times for riders doing the climb.
  2. For each rider, there is a time adjustment factor. The time adjustment factor is the average of the ratio of the rider's time for a week to that week's reference time. So if a rider always does a climb 10% over that climb's reference time, that rider's adjustment factor will be 1.1.
We have a problem here. The climb reference times depend on rider adjustment factors, and rider adjustment factors depend on climb reference times. We need to know the answer to get the answer. But this is where an iterative solution comes in. We begin by assuming each rider's adjustment factor is 1. Then we calculate the reference times for the climbs. Then we assume these reference times are correct and we calculate the rider adjustment factors. Then we assume these are correct and we recalculate the climb reference times. Repeat this process enough times and we get the results we're after.

Once we have a reference time for each climb, we plug these into the present scoring algorithm in place of the median time, and we're done. The rest is the same.

One minor tweak: not everyone's time should contribute equally to a climb's reference time, and not every climb should contribute equally to a rider's adjustment factor. This is in the realm of weighted statistics. Riders doing more climbs get a higher weighting factor, and climbs with more riders get a higher weighting factor. The climb weighting factor depends on the sum of the weighting factors of the riders doing the climb, and the rider weighting factor depends on the sum of the weights of the climbs the rider did. So this is another part of the iterative solution. But this tweak is unlikely to make a significant difference. The basic idea is as I described it.

There's an alternative which was suggested by BikeTelemetry in comments on my last post on this topic. That would freeze scores for each week rather than re-evaluating them based on global ratings. I haven't had time to test that, but the code for the algorithm described here is basically done; I'm just ironing out a few bugs.
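For anyone who wants to see the back-and-forth calculation in concrete terms, here is a minimal Python sketch. It is not the actual Low-Key code. It assumes that a rider's "adjusted time" is the raw time divided by that rider's adjustment factor, it leaves out the weighting tweak and the adjustment for women and hybrid-electric riders, and the function names, data layout, and times are all made up for illustration.

```python
from statistics import mean


def solve_reference_times(times, n_iter=50):
    """Alternate between climb reference times and rider adjustment factors.

    `times` maps (rider, climb) -> finishing time in seconds (hypothetical layout).
    """
    riders = {rider for rider, _ in times}
    climbs = {climb for _, climb in times}
    factor = {rider: 1.0 for rider in riders}  # initial guess: every factor is 1
    reference = {}
    for _ in range(n_iter):
        # Climb reference time: mean of times divided by each rider's factor
        # (assumed meaning of "adjusted time").
        for climb in climbs:
            reference[climb] = mean(
                t / factor[r] for (r, c), t in times.items() if c == climb
            )
        # Rider factor: mean ratio of the rider's time to the climb's reference.
        for rider in riders:
            factor[rider] = mean(
                t / reference[c] for (r, c), t in times.items() if r == rider
            )
    return reference, factor


def score(time, reference_time):
    """Score as in the post: 100 x reference time / rider's time."""
    return 100.0 * reference_time / time


# Made-up data: the slowest rider, "C", skips climb "Y".
times = {
    ("A", "X"): 900.0, ("B", "X"): 1000.0, ("C", "X"): 1100.0,
    ("A", "Y"): 1800.0, ("B", "Y"): 2000.0,
}
reference, factor = solve_reference_times(times)
for climb in ("X", "Y"):
    print(climb, round(score(times[("B", climb)], reference[climb]), 2))
```

In this made-up data every rider takes exactly twice as long on Y as on X, so a fair system should give rider B the same score on both. Median scoring gives B 100 on X but only 95 on Y, because the median on Y (1900 s) is pulled down by C's absence. With the iterated reference times, the two printed scores for B agree.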


Bike Telemetry said...

Sounds good! I think many can see the rationale for boosting the scores across the board in the Mix Canyon situation. But this will cut both ways. Say some ride brings out proportionately more endurance riders rather than elite riders. This might be due to weather, a date clash, or the type of climb. I wonder if Mt Hamilton brings out a statistically different crowd for example. How will competitors react when you downgrade their scores across the board because of a weak field?

djconnel said...

Ideally riders should score based on how well they rode, independent of the field, as opposed to the present situation where it depends heavily on the field. So if a climb tends to attract more riders below the global climb-weighted average climber, then on that climb, more than 50% of the riders might score below 100%. This will compensate for the fact that on steep climbs (or in poor weather), which tend to attract above-average climbers, more than 50% might score over 100 points.

Robert said...

If you want a scoring system that's independent of who shows up then you'll have to standardize on what's always there: the climb.

fulmar2 said...

Dan -
I haven't seen how you calculate this yet, and I'm still trying to wrap my head around it. Nevertheless, it sounds like, ultimately, you are trying to home in as accurately as possible on a "reference time" for each climb.

I like your attempt at doing this, and I am interested in learning more. But, if your main concern is "Ideally riders should score based on how well they rode, independent of the field" then why not calculate an objective reference time based on the climb stats? There are a number of calculators where you can simply plug in some numbers, and get a projected time. Tim Clark's calculator comes to mind. I think it could be really objective. If you do this, the variable is no longer "who shows up for the ride," but rather the environmental conditions (i.e. was there a strong headwind). I believe that the environment will have a smaller impact on scoring than the fluctuations in ridership.

By the way, I'm not trying to knock your system. In fact, it might be interesting to see it run for a year. The complexity of having scores change from week to week could add an additional element of excitement. On the other hand, though, I'm typically a fan of the KISS rule. (Keep it Simple, Stupid!)

djconnel said...

That's an interesting idea, and in fact I'd never before considered that. Part of it is that only in the past few years have good profile data been universally available for climbs: back in 1995 detailed profiles were precious.

But there are a variety of factors that a simple profile-based result will miss. For example, a time trial stage might be slower than a mass start, wind conditions vary, and rolling resistance depends on the surface quality. These aren't enough to invalidate an objective approach, but even a ±5% difference in a rider's speed from the modeled value for a given P/M (power-to-mass ratio) would have an enormous impact on the standings. For example, rolling resistance is typically around 7% of total power, and wind resistance up to around 12%, for a fast climber on Old La Honda. So a 50% change in rolling resistance would be around ±3%, and a 30% difference in wind resistance around ±4%, so the errors add up quickly. With the enormous statistical pool we have, I think we should be able to do better.
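The arithmetic behind those rough percentages can be checked in a few lines. This is only a sketch: it multiplies each component's share of total power by its relative change, and it does not model how a power change turns into a speed change. The helper name is made up for illustration.

```python
def share_shift(share_of_total_power, relative_change):
    """Fraction of total power shifted when one component changes (simple product)."""
    return share_of_total_power * relative_change


rolling = share_shift(0.07, 0.50)  # rolling resistance: 7% share, 50% change
wind = share_shift(0.12, 0.30)     # wind resistance: 12% share, 30% change

print(f"rolling resistance: {rolling:.1%} of total power")
print(f"wind resistance:    {wind:.1%} of total power")
print(f"both together:      {rolling + wind:.1%} of total power")
```

The two products come out at about 3.5% and 3.6%, in line with the rounded figures above, and if both errors point the same way the combined shift is about 7%.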