Tuesday, December 31, 2013

blog posts per year: 2013 update

As the year ends, it's time to check on my post rate for 2013... I plot the accumulated posts per year by month:

posts per month

2013 is a purple line. There's a weak period following my injury in June, but then I had a burst of inspiration, and I ended up right up there with 2009 and 2011. 2010 was the most productive year, whereas 2012 was a slow year.

Here's the totals by year, along with a trend line I established last year, as well as a revised trend line:

posts per year

2013 was tied with 2009 for 2nd most after 2010, while 2011 was only one post less for 3rd. I recovered from the declining post trajectory I seemed to be following last year.

The new fit uses a somewhat different formula consisting of two components: a hyperbolic tangent for the upward transient followed by an exponential decay. My 2012 formula used a linear ramp for the upward transient, but a hyperbolic tangent is better, as it saturates:

posts = 169.2 tanh [ 1.13 ( year - 2007.89 ) ] exp [ -0.063 ( year - 2007.89 ) ]
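As a rough sketch, the fit can be evaluated directly. The function name is my own, and I'm assuming the calendar year is plugged in as a plain number (2013, not 2013.5), which the formula doesn't spell out:

```python
import math

def predicted_posts(year):
    """Trend fit quoted above: tanh ramp multiplied by an exponential decay."""
    t = year - 2007.89  # years since the fitted start date
    return 169.2 * math.tanh(1.13 * t) * math.exp(-0.063 * t)

# Assumption: "year" is the integer calendar year.
for year in range(2008, 2015):
    print(year, round(predicted_posts(year), 1))
```

Once the tanh term saturates near one, only the slow exponential decay remains, which is why the curve ramps up quickly and then drifts downward.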

Here's months where I've posted at least 15 times:

year  month  posts
2010      1     27
2012     12     21
2012     11     20
2009     12     19
2012     10     18
2010      7     18
2013     10     17
2011      5     17
2011      4     17
2010     10     17
2009     10     17
2009      7     17
2013     12     16
2012      2     16
2012      1     15
2011      6     15
2009      3     15

I'm looking forward to 2014...

Monday, December 30, 2013

some 2014 New Year's resolutions

It's approaching the end of 2013, and it's time for some New Year's resolutions.

I used to be against New Year's resolutions, because I felt if there was something worth doing, it should be done immediately, not held off until New Year's for the purpose of providing a resolution. Indeed, that may be true, but it's still worthwhile to take time to reflect at the end of the calendar year and think about changes worth making.

So with that in mind, some resolutions:

  1. to eat more apples
  2. to get my hair cut -- it's getting long
  3. to work from home more often. This requires definable goals for the day. I spend too much time on the Caltrain commuter rail.
  4. to ride from home to work at least 52 times during the year, barring issues such as the injury which got in the way of me doing so this year. This is not a top-priority goal, as riding to work gets in the way of other goals, such as running at lunch, or going to yoga class after work, since it gets me to work later. But 52 times is a good target.
  5. to continue attending yoga classes through the year. Holiday travel has me on a 2-week void in yoga classes. I'd been on a good run, inspired by recovering from my injury.
  6. to do a mountain bike event
  7. to run a 50 km trail race. My longest run so far is a marathon (42 km).
  8. to go camping more than in 2013.
  9. to do a self-supported bike tour with Cara.

Sunday, December 29, 2013

running km per day: more trend analysis

Yesterday I did a run followed by a hilly hike, each approximately 10.5 km. This put a 21 km point on my trend analysis plot. I then redid the plot, including my least-square regression of an exponential curve. Here's the result, along with two other curves I'll explain in a bit:

plot

This was my longest day so far, but it was just one day. However, the result was a profound change in the exponential trend line.

Small aside: in the plot I did yesterday, I made a small error, which was to assume that when you fit a curve K exp(α t) to time-series data, if α is in units of 1/week, then this represents a 100×α% per week increase. This is a good approximation only for small values of α; the actual increase is 100×(exp(α) − 1)% per week. I corrected this error in the text of yesterday's blog (not the plot), and did it correctly in this plot.

Anyway, the problem with the exponential trend line is least-square fits are highly influenced by outliers, especially when they occur at the edge of the data. So yesterday's long day had an exceptional influence on the parameters. This suggests the parameters aren't so good.

So I returned to an established analysis mode: CTS and ATS. These are typically applied to data related to the work done during cycling events, but distance and work in running and hiking are strongly correlated, so applying them directly to distance seems suitable here. ATS (acute training stress) is a running exponential average with time constant 7 days, while CTS (chronic training stress) is a running exponential average with time constant 42 days. The shorter time constant represents fatigue and responds more rapidly to changes in daily distance; the longer time constant represents fitness and responds more slowly.
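Here's a minimal sketch of the two running averages. The update rule (decay factor exp(-1/τ) per day, starting from zero) is one common way to write an exponential average and is an assumption on my part, as are the function name and the daily distances:

```python
import math

def running_exponential_average(daily_km, time_constant_days):
    """Exponentially weighted running average of a daily distance series."""
    decay = math.exp(-1.0 / time_constant_days)  # weight carried over from the previous day
    average = 0.0  # assumption: start from zero before the first logged day
    history = []
    for km in daily_km:
        average = decay * average + (1.0 - decay) * km
        history.append(average)
    return history

# Hypothetical daily distances (km), oldest first, ending on a 21 km day.
daily_km = [0, 5, 0, 8, 0, 0, 10, 0, 6, 0, 0, 12, 0, 21]
ats = running_exponential_average(daily_km, 7)   # acute: 7-day time constant
cts = running_exponential_average(daily_km, 42)  # chronic: 42-day time constant
print(round(ats[-1], 2), round(cts[-1], 2))
```

With a short log like this one, ATS ends well above CTS, since the 7-day average gives recent days far more weight than the 42-day average does.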

I see ATS spiked at San Diego 3 weeks ago. After that I recovered a bit, then ATS ramped up, hitting an even higher spike at the point 0 (yesterday). It's now at 7 km/day. CTS, representing fitness, is only at 3 km/day, however. This implies I'm still ramping up mileage, building endurance. Were I to taper for a marathon, for example, I'd want ATS to dip below CTS by race day.

Saturday, December 28, 2013

running km per day trend analysis

With December travel now complete following completion of my post-injury physical therapy, I've been focusing the past weeks on running versus cycling. I decided to analyze how my trend has been in running distance.

It's common to plot running distance versus week, but this can yield artifacts. For example, if the week starts on Sunday versus Monday, it could shift a Sunday long run a full week. This is too crude a time precision for good trend analysis.

So instead I plotted distance per day, including running and hiking but not walking (since walking is so ubiquitous in daily activity it's hopeless to try and track it without wearable sensors). I then fit an exponential curve using an unweighted least-squares fit to analyze the trend.
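For illustration, here's one way such a fit could be done without any libraries. This is a sketch, not necessarily the method behind the plot: I'm assuming a direct (nonlinear) fit of K exp(α t) to the daily distances, since a fit to log(distance) would break on rest days with zero distance. The function name, the search bracket for α, and the sample data are all hypothetical:

```python
import math

def fit_exponential(days, km):
    """Unweighted least-squares fit of km ~ K * exp(alpha * day).

    For a fixed alpha the best K has a closed form, so only alpha needs a
    one-dimensional search (golden-section search here).
    """
    def best_k(alpha):
        num = sum(y * math.exp(alpha * t) for t, y in zip(days, km))
        den = sum(math.exp(2 * alpha * t) for t in days)
        return num / den

    def sse(alpha):
        k = best_k(alpha)
        return sum((y - k * math.exp(alpha * t)) ** 2 for t, y in zip(days, km))

    lo, hi = -1.0, 1.0  # assumed bracket for alpha, in 1/day
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(200):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if sse(a) < sse(b):
            hi = b  # minimum lies in [lo, b]
        else:
            lo = a  # minimum lies in [a, hi]
    alpha = (lo + hi) / 2
    return best_k(alpha), alpha

# Hypothetical two-week log: day 0 is today, rest days are zeros.
days = list(range(-13, 1))
km = [3.0, 0.0, 0.0, 4.0, 0.0, 5.0, 0.0, 0.0, 6.0, 0.0, 5.0, 0.0, 0.0, 8.0]
k, alpha = fit_exponential(days, km)
print(f"today: {k:.1f} km/day, alpha = {7 * alpha:.3f} per week")
```

Putting day 0 at the most recent day means K reads directly as the fitted current distance per day.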

The plotted data extend back to when I started running again post-injury. I had done a few treadmill workouts well before this, in August, but these left me hobbled, and so I basically restarted from scratch on 31 Oct due to the encouragement of my physical therapist (Dave @ Potrero Physical Therapy).
plot

So I'm basically at 6.4 km per day right now with a rapid rate of increase over the trend period (it's shown as 23.8%/week, but that should really be 26.9%/week, since the plot makes a linearization assumption of the exponential function which is invalid at this rapid rate of increase). That increase makes sense with very intermittent running, but I won't be able to sustain it, obviously. But it's good to know the trend has been upward and the distance number gives me a good baseline to judge future running as I think about going after a trail run in January.
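The correction is quick to check, assuming the 23.8% on the plot is 100 times the fitted exponent in units of 1/week (the function name is mine):

```python
import math

def weekly_percent_increase(alpha_per_week):
    """Actual percent growth per week for a curve K * exp(alpha * t), t in weeks."""
    return 100.0 * (math.exp(alpha_per_week) - 1.0)

print(round(weekly_percent_increase(0.238), 1))  # 26.9
```

For small exponents the two numbers nearly agree (an exponent of 0.02 gives about 2.02%), which is why the linearization usually goes unnoticed.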

Friday, December 27, 2013

2014 Low-Key Hillclimbs pages posted today

Low-Key

Every year it seems like more work than the year before, and that's probably because it is more work than the year before. But I just finished preparing a workable draft for the 2014 Low-Key Hillclimbs web pages. They're here. The default page remains 2013, as I want to keep the results from this year up at least through the end of the calendar year, which is just a few days away.

Why more work? Two reasons. One is the increased prevalence of new courses. In the past, when I'd select climbs for the next season (with rider input) I'd intentionally go for a mix: the annuals (Montebello and Hamilton), a few favorites on a roughly 3-year cycle (like Old La Honda, Kings Mountain Road, Sierra Road, Bohlman, Welch Creek, previously Diablo before we dropped it due to hassles from the rangers), and others which were good for rarer repeat visits (like Soda Springs, Page Mill, Highway 9). Then each year we'd add in a few, maybe two or three, fresh climbs just to stay creative. This way I'd just find old copies of the data for each climb, update as needed, and paste in the data. Add a new photo and I'm basically done.

But starting last year I went to "Coordinator's Choice" and coordinators like to exercise their creativity by picking new climbs. This is awesome: I love doing fresh climbs we've not visited before. But it means I have my work cut out for me, in particular creating a new route profile and generating climb stats.

The other time cost has been the "time saving" GPS climbs. These are for climbs where it's impractical or politically impossible to set up an actual event. So we invite riders to do the routes on their own and submit GPS data. But with these climbs, which are invariably fresh courses so far, I not only need to do the usual new course things, but then I need to set up the GPS checkpoints and test these against real-world data.

It's not that big a deal for any given week, but with 9 weeks total in the series, it adds up. Still, I really do love it. I feel like I'm creating something good and valuable.

If I'm looking forward to any weeks the most, they are weeks 4, 5, and 7. Week 4 will be a fun tour of the Berkeley Hills. In the spirit of Paul McKenzie's brilliant Nifty Ten-Fifty, it's more of a "Nifty Lite": 3825 feet of timed climbing (5000 feet of climbing in all). These short, steep climbs are a blast, but they're typically too short for a full event, so combining six of them in this way is something enabled only by the GPS scoring (manual timing would be way too much work).

The next one I really look forward to is the Los Gatos Bike Racing Club's week: week 5. It's a fantastic climb in the Santa Cruz Mountains west of Loma Prieta. These are rugged, lightly trafficked roads which have a very different character than the more popularly traveled roads north of San Jose. They have a feeling of rugged remoteness which is fantastic.

And finally I'll point out my climb: week 7: San Bruno Mountain (W). This is a home climb for residents of San Francisco, is easily accessible by public transit, and offers fantastic views from the summit. It's short by Low-Key standards so results will be on the challenging side, but we always manage.

So I look forward to the fall! It should be a fun series. But in the meantime, don't forget the upcoming MegaMonster Enduro in February!

Friday, December 20, 2013

Marin Headlands: Miwok and Marincello

An amazingly nice mountain bike loop in the Marin Headlands, just across the Golden Gate from San Francisco, is the Miwok trail - Marincello trail climb combination. The two are connected by the Old Springs trail descent with its fun series of modest steps. The return from the Marincello summit is the Bobcat Trail descent.

A profile of the climbs with the Old Springs descent in between is here:

profile

The smoothed grade versus distance is here: grade

The grades omit the transitions at the bottom and top. Still, they don't do full justice to the difference. Miwok is an undulating grade, with a series of steeper portions, while Marincello is more of a steady grind with a brief recovery followed by a final short, steep bit at the end. Marincello is a smoother surface: there's some ruts on Miwok. But both trails are easily rideable on a road bike. The Old Springs descent is a bit rough going on a road bike but it's still not a problem.

This is an awesome loop and easily extendable with Coastal Trail from/to Conzelman for Golden Gate Bridge access, or to the north via the more technical portion of Miwok and the steeper trails surrounding.

A Strava route is here.

Thursday, December 19, 2013

Mount San Bruno, Price to summit

Pen Velo's annual New Years Day San Bruno Hillclimb climbs San Bruno mountain from the east side. The climb has two sections, one up Guadalupe Canyon Road, then there's a loop down and under that road onto Radio Road, then the final, steeper climb to the summit. I did a profile of that road awhile ago, to describe that race:

San Bruno (E)

But there's multiple ways up the mountain, 3 (at present) completely paved, perhaps a fourth to kick in when an unfortunate housing development marring the north side of the mountain is completed (Mount San Bruno, except for the very top, has failed to enjoy the general level of protection from development that Bay Area mountainsides have received). The two major approaches are the two sides of Guadalupe Canyon Road, the other side being the west side, from Daly City. This climb begins in earnest at Price Road.

An advantage of the western approach is freedom from traffic lights. The eastern side has a traffic light at Carter Road. Additionally the western approach avoids the descent to pass under Guadalupe Canyon. You get on Radio Road by passing through a gate on the side of the road. With an event permit, hopefully that gate could be opened.

Here's the profile of the western approach. The top portion of the climb, Radio Road, is shared. In this profile I drew the grade for Radio Road as an average over a longer stretch, rather than as a tangential peak grade:

San Bruno (W)

My climb rating rewards continuous climbing, so this side gets a benefit versus the eastern side for the lack of the intermediate descent. But net climbing is less, and there's some steeper grades on the eastern side of Guadalupe Canyon, so the eastern approach comes out ahead in climb rating.

Wednesday, December 18, 2013

2013 was the steepest Low-Key Hillclimb series yet

On a short flight from Malta NY to Philadelphia, I decided to take a quick look at the net climbing statistics from Low-Key Hillclimb years. To do this, I took the stats for each climb from the last year the climb was used (rather than the stats claimed for each given year) since on some climbs there were revisions as better data became available. For each year I summed the stats over the climbs for which there were finishers, omitting "X" weeks.

Here's a plot of the result. I superpose lines representing constant average grades from 3% up to 10%.

plot

1995 has the most total distance and climbing since there were 12 climbs that year, including Mt Hamilton, Mount Diablo, and Soda Springs, all long climbs. It falls near the middle of the average grade spectrum. 1998 had the least climbing despite climbing Mt Hamilton twice. The series started out short and had two climbs canceled, leaving only five.

Since we stopped doing Mount Diablo after 2009, climbing in the series has generally been less. 2013 wasn't exceptional in climbing, but distances were relatively short (2nd shortest total series in history), making average grade the highest so far by a small margin. What may be surprising is the average grade is still relatively modest at 6.48%. Flattish portions, or for that matter descending like on Mount Hamilton, provide a lot of dilution, even if the 15%+ grades which were plentiful this year provide a disproportionate fraction of the pain.

Here's some numbers:

year  climbs  net meters  net km  avg meters  avg km  avg grade
2013       9        5616   86.49       624.1    9.61     0.0648
2009       9        6495  100.68       721.7   11.19     0.0644
2010       9        5806   90.38       645.2   10.04     0.0641
2008       9        4928   77.62       547.6    8.62     0.0634
2011       9        5894   97.98       655     10.89     0.0601
1995      12        8166  136.01       680.5   11.33     0.0599
2006       7        5320   92.13       760.1   13.16     0.0577
2007       9        5932  108.58       659.2   12.06     0.0546
1997       9        6463  119.51       718.2   13.28     0.054
2012       8        5545  111.93       693.2   13.99     0.0495
1996       9        6071  124.13       674.6   13.79     0.0489
1998       7        4324   89.72       617.8   12.82     0.0481

Tuesday, December 17, 2013

In Malta, NY

I'm in Malta NY on a business trip. It was a cold night.

I wanted to get some food supplies. I knew there was a PriceChopper food market nearby, so I went to the desk. It's a cold morning. Weather Underground says 0F now (8am) but it was probably colder then. Fortunately I brought layers.

Even earlier, in the pre-dawn darkness at 6:15 am, I'd seen a woman going toward the lobby wearing cold-weather running gear. "You're running outside?" I'd asked. "If so, it's been good knowing you..."

"I used to live in Lake Placid. I'm used to it," she responded confidently.

So now it was my turn. I asked directions for Price Chopper.

"You go here, then around this traffic circle...." It was clear I was getting driving directions.

"No -- I'm walking."

She looked at me incredulously. "It's a half mile away. Do you want me to call you a taxi?"

"I'll be fine..." I responded, and left.

Sidewalks were partially shoveled from snow two days ago. This made them somewhat treacherous. There was one set of footsteps on the sidewalk I was on, despite heavy car traffic and a strip mall being directly across the road. I had to wait quite a while for a gap in traffic to scamper across, and then made it only because a car stopped for me. Half the trip was through long access roads and across extended parking lots because land is cheap and squandered.

It was clear bipedal motion was considered an atavism, or at best a way to and from the parking lot.

At the check-out of the food store, each shopper was given a 10 second speech cheerily informing them that their shopping card resulted in them getting an "X-cent" discount on a gallon of gas. I have no interest in gallons of gas.

If 300 million people listen to a 10 second speech about savings on gas once per week, that's around 60 human lifetimes squandered per year. That made this speech around twice as deadly as lightning strikes.
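The arithmetic holds up if you assume a lifetime of around 80 years, which is my assumption for this check and not a number from the estimate above:

```python
people = 300e6
seconds_per_speech = 10
speeches_per_year = 52       # once per week
lifetime_years = 80          # assumed lifetime

seconds_per_lifetime = lifetime_years * 365.25 * 24 * 3600
lifetimes_per_year = people * seconds_per_speech * speeches_per_year / seconds_per_lifetime
print(round(lifetimes_per_year))  # 62
```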

It was a nice walk. I was heavily dressed and was sweating from my combination run-walk. I don't get many opportunities to experience real cold in San Francisco, where we complain about sub-50F. 0F is a different beast. In a way, since it's viewed as a challenge and not an annoyance, it's easier to take, at least as a visitor. But the contrast between that runner and everything else was striking.

Monday, December 16, 2013

December travel

December for me this year is dominated by travel.

Last weekend and the last part of the preceding week was consumed by a trip to a company internal conference in San Diego. San Diego is cool: I'd previously been there for a Christmas bike tour supporting Hostels International maybe four years ago. But that was just in-and-out. This was my first time spending real time there. The hotel was near the convention center, so it had excellent access to a bike-ped trail along the bay. At 6 am the first day was a 4 km running race for conference attendees along the path. That was fun: my first "speed work" since my injury, and all things considered I did okay, finishing 5th. In total Wed PM - Sun AM I managed 3 yoga classes (two in a local studio, one affiliated with the conference) and 4 runs (the race, a stair climbing session in the hotel stairwells, and two long runs). This overindulgence in running took a certain toll, and my legs are still a bit tired a week later. I look forward to getting my running legs back.

Then after returning on Sunday, on the following Saturday (two days ago) it was back on a plane for an important family event in Philadelphia. I was on a Saturday night red-eye, since it required late-in-the-game flight changes, and changing flights during the holiday season is difficult, more difficult every year.

Today I'm on a plane again up the east coast for a work trip. I'll be in the frigid Albany area until Wednesday, then it's back to Philadelphia Wednesday night.

From there: a train to New Jersey for more family visits, back to Philadelphia, and eventually back to San Francisco.

It's great seeing family, of course, and the work trip is what it is. New York state has its charm, anyway. On the fitness side, however, all of this is a real challenge. At Philadelphia, instead of hopping in a taxi at the airport, I took the regional SEPTA train, then walked around 1.5 miles to my destination, reversing that this morning on the return. This is what I call "incidental exercise": getting exercise from tasks which need to be done anyway. It's gotten a lot of attention in the wearable activity sensor market, but no technology is needed. The walk to the main train station is a nice one along the river. Had I been in a taxi, it would have been a totally unrewarding trip.

The Albany area will be a bigger challenge. There's a yoga studio near my hotel. I'd like to get some running in but I'm afraid it will be simply too cold, with temperatures dropping as low as -4F in the forecast for my stay there. Maybe I'll find a treadmill in the hotel. Once back to Philadelphia, then in suburban New Jersey, I'll get some running in for sure: hopefully my legs suffered no sustained damage from my San Diego excesses, and I'll be able to get some aerobic work in.

You'd think the running would be better in "rural" New Jersey than in urban Philadelphia, but this is very much not the case. There's great trails along the river in Philadelphia, and the steps at the Art Museum are a must-do for anyone who's seen a Rocky movie. Philadelphia is a great place to run.

So running, some yoga... not the best preparation for the San Bruno Hillclimb on 1 Jan near San Francisco. I'm still on the fence about whether to do that. It's a great tradition, but going with only two decent rides in my legs in the previous month doesn't sound like a winning plan. But maybe I should just do it and not worry about how well I place.

Saturday, December 14, 2013

cumulative SF2G rides by year

With a huge amount of travel this month, I'll get only two real rides in, and no SF2Gs. So it's a good time to make an accounting of my SF2G totals.

I last did so at the end of 2011, so somehow 2012 slipped through the cracks. I have a rough goal of averaging one per week, but haven't attained that yet. It's important for me to have a goal to kick my butt out the pre-dawn door to ride the more than 70 km into work.

I try to keep a running total when I upload rides, but since I usually do this at work, I'm always in a rush and relying on memory to do so occasionally fails. So I'm forced to go through my Strava record and count.

Here's the plot, which starts when I signed up for Strava in 2010:

SF2Gs by month

2012 started with happy memories of New Zealand riding. I had a down-time in May when I had back pain, but overall it was a solid riding year until I redirected my focus on running for the Sacramento Marathon (CIM).

At the start of 2013 I was still running, in anticipation of running the Napa marathon in March, but my legs just weren't recovered from Sacramento. I was getting zinging pains kicking in around mile 15. So I took a break and refocused on riding. This resulted in a very solid SF2G March, then a bit of a taper as I started doing weekend events: Murphy Mack's Spring Classic in March, Devil Mountain Double in late April, the Berkeley Hills Road Race in May, then the Memorial Day tour (a training ride) in late May. But everything came crashing to a halt with a stupid bike-path crash dodging an erratic walker in June. I only really emerged from this in August, slowly getting back up to speed, with my last physical therapy session this past week. November saw my return to riding Low-Key Hillclimbs (ending Thanksgiving). Then December has been a mess, as it so often is, with a return to running, most notably a solid block last weekend at a conference in San Diego, including a 4 km "fun run" race.

Anyway, the moral of the story is I need more consistency in 2014. In particular, don't get injured. And to average 1/week, you need to target 2/week, because stuff always happens, and some weeks there will be none.

Friday, December 13, 2013

Montebello and Mount Hamilton: climbing speed trend in Low-Key Hillclimbs

Montebello Road and Mount Hamilton Road are the two climbs we've done pretty much every year in the Low-Key Hillclimbs. They are thus the best source of data on speed trends in the series.

For men and women solo riders, I took the geometric mean of rider times for each of the climbs each time they were done. Hamilton was done twice in 1998, while Montebello was skipped that year, but every other year Montebello was week 1, Hamilton on Thanksgiving.
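A minimal sketch of the geometric mean, with made-up rider times (the function name and the numbers are hypothetical):

```python
import math

def geometric_mean(times):
    """Geometric mean: exponential of the average of the logs."""
    return math.exp(sum(math.log(t) for t in times) / len(times))

# Hypothetical finishing times in minutes for one climb.
times_min = [30.0, 36.0, 45.0, 60.0]
print(round(geometric_mean(times_min), 2))
print(round(sum(times_min) / len(times_min), 2))  # arithmetic mean, for comparison
```

One appeal of the geometric mean here is that it treats times and speeds symmetrically: the geometric mean time corresponds exactly to the geometric mean speed, and a single very slow rider pulls it less than it would pull the arithmetic mean.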

Here's the result, with men in blue and women in pink (original, I know):

avg time data

There's some interesting trends. In 1995-1996-1997, as the series got more popular, the average speed dropped for both climbs. 1998 was a slight down year for turn-out, but there's no Montebello data. There were two Hamiltons that year: the first was week 1 and it went off as normal, with faster times for men and slower for women. The second one, on Thanksgiving, was even quicker, but that one was broken into two portions due to a motorcycle crash, with the times added, so riders got additional recovery.

When the series started up again in 2006, times were faster than even in 1995. Again turn-out was small. Series turnout built through 2009, and as it did, average times increased. Starting in 2009, however, times have come down every year for men and the trend has been downward for women. 2013 was the fastest year yet on both climbs for men, and was relatively fast for women (there's many fewer women than men, so the result depends more heavily on who happens to come that year, yielding substantially more variation).

Running regressions, the rate of improvement is substantial: between 1.1% and 2.0% per year, depending on the climb and whether you look at men or women.
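One way to get a percent-per-year number from data like this is a straight-line fit to the log of the mean time versus year. This is only a sketch under that assumption (the regressions above may have been set up differently), and the function name and sample times are hypothetical:

```python
import math

def percent_change_per_year(years, mean_times):
    """Least-squares slope of log(time) versus year, as a percent change per year."""
    n = len(years)
    logs = [math.log(t) for t in mean_times]
    year_avg = sum(years) / n
    log_avg = sum(logs) / n
    slope = (sum((y - year_avg) * (v - log_avg) for y, v in zip(years, logs))
             / sum((y - year_avg) ** 2 for y in years))
    return 100.0 * (math.exp(slope) - 1.0)

# Hypothetical mean times (minutes) that shrink by 1.5% every year.
years = [2009, 2010, 2011, 2012, 2013]
times = [40.0 * 0.985 ** i for i in range(5)]
print(round(percent_change_per_year(years, times), 2))  # -1.5
```

A negative result means times are getting shorter, so riders are getting faster.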

So the end result is that just because you score less than you may have in the past, you're not necessarily slower. The fields have been getting faster. In general, as the Low-Key Hillclimbs get more popular, the growth comes preferentially from relatively slower riders, and average times increase. And, as has been the trend since 2009, as turn-out drops, the speeds increase.

Thursday, December 12, 2013

updated annual trends in Low-Key Hillclimb turnout

Part-way through the 2013 Low-Key series, I did a blog post on the downturn in attendance versus last year. For completeness, with the series done for the year, I wanted to update that plot.

Here's the numbers through end of 2013. I plot a trend line from the 2009 peak through 2013. There's a loss in average finishers of 6.5% per year, with the rate of loss visibly accelerated the previous two years:

But one change the past two years has been the GPS timed events. These have started out a bit slowly, with Kennedy Fire Trail last year attracting only 45 finishers, still above expectations and still what I'd consider a great success.

Then this year we extended it to two GPS timed events: Portola Valley Hills, week 4, had 69 finishers. Montara Mountain, week 8, had 51 finishers, despite a somewhat remote starting location (at the coast) and a quite challenging dirt climb (too hard for most cyclists to do on a road bike).

So the GPS climbs dragged the numbers down a bit. I plot the turnout for non-GPS climbs here:

The decline is less, down to 4.2% per year.

As enjoyable as the lower stress associated with smaller numbers may have been, with a schedule for next year which is less top-heavy on grade, I'd expect to see the numbers rebound a bit, barring regional trends in the popularity of road cycling.

One relative constant in the series has been Mount Hamilton on Thanksgiving. There was additionally an October Mt Hamilton to open the 1998 series. Mount Hamilton is a better example than Montebello, perhaps, because we've always been willing to relax the 150 rider limit at Mt Hamilton.

Results from Mount Hamilton also peaked in 2009, but have held fairly steady since. Finish rates are dependent on weather, however. This year the weather was excellent.

Wednesday, December 11, 2013

2013 Low-Key Hillclimbs: rider score variability and the scoring algorithm

One of the goals of the scoring system was that rider scores vary as little as possible from week to week. Of course, this is simply accomplished: just give each rider a score of 100 each week, and then variation is zero. But of course that's not what's wanted. So an additional goal is that scores are roughly proportional to rider speed in a given week.

I'll consider three scoring schemes here for the Low-Key 2013 data:

  1. score 1 is 100 × median time / rider time
  2. score 2 is 100 × a reference time / rider time
  3. score 3 is 100 × (a reference time / rider time)^(slope factor)

Here the reference time for the week is a geometric average for all solo riders adjusted for the rider division (male, female, hybrid-electric), and the slope factors are calculated for each week based on how spread out the rider times are, but have a weighted average of one.
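The three schemes can be sketched as follows. The function names are mine, and the reference time and slope factor are simply passed in as numbers, since how they're derived isn't shown here:

```python
from statistics import median

def score_median(rider_time, all_times):
    """Score 1: 100 x median time / rider time."""
    return 100.0 * median(all_times) / rider_time

def score_reference(rider_time, reference_time):
    """Score 2: 100 x reference time / rider time."""
    return 100.0 * reference_time / rider_time

def score_slope(rider_time, reference_time, slope_factor):
    """Score 3: 100 x (reference time / rider time) ** slope factor."""
    return 100.0 * (reference_time / rider_time) ** slope_factor

# Hypothetical week: times in minutes, with an assumed reference time and slope factor.
times = [30.0, 36.0, 40.0, 48.0, 60.0]
for t in times:
    print(t,
          round(score_median(t, times), 1),
          round(score_reference(t, 41.0), 1),
          round(score_slope(t, 41.0, 0.9), 1))
```

With a slope factor of exactly one, score 3 reduces to score 2.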

I then calculated, for each rider doing at least two climbs, the standard deviation of their scores under each scoring scheme, and took the root-mean-square average of these standard deviations. The result was the following for the three scores:

  1. score 1 : 4.47
  2. score 2 : 4.07
  3. score 3 : 3.74

So the first score resulted in the most variability in scores for a given rider, the second (calculating a reference time adjusting for rider quality) reduced the variation, and the third score (adjusting for score slope) reduced the variation even more.
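The variability measure can be sketched like this. I'm assuming the sample standard deviation (n - 1 in the denominator); the population version would work the same way. The function name and rider data are hypothetical:

```python
import math
from statistics import stdev  # sample standard deviation (assumption)

def rms_score_spread(scores_by_rider):
    """RMS average of per-rider score standard deviations, for riders with 2+ climbs."""
    spreads = [stdev(scores) for scores in scores_by_rider.values() if len(scores) >= 2]
    return math.sqrt(sum(sd * sd for sd in spreads) / len(spreads))

# Hypothetical scores: rider "c" did only one climb and is skipped.
scores_by_rider = {"a": [100.0, 100.0], "b": [90.0, 110.0], "c": [95.0]}
print(round(rms_score_spread(scores_by_rider), 6))  # 10.0
```

Running this once per scoring scheme, on the same set of riders, gives one number per scheme to compare.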

This comparison is related to an analysis of variance. The analysis of variance calculation is based on the assumption there are multiple, independent sources of variation. In this case, one source of variation for a given rider is how he rides from week to week. This is a desired source of variation: we want riders to score better when they ride better.

Another source of variation is who happens to show up for a given week. Mostly faster riders? Mostly endurance oriented riders? This is an undesired source of score variation. A rider shouldn't be penalized in a given week just because the endurance oriented riders stayed home.

Another source of variation is how much the climb spreads out the riders. If a hill is particularly steep, the faster riders will be proportionately faster than they would be if the road were primarily flatter, or included descents where faster climbing ability failed to be of much benefit. This is another source of undesired variation.

The assumption is that since independent sources of variation are generally uncorrelated, each tends to increase the total variation, and so the scoring system with the least total variation for a given rider will generally have the least amount of undesired variation, and is thus preferred.

Tuesday, December 10, 2013

Low-Key Hillclimbs 2013: weekly score parameters

In the Low-Key Hillclimbs scoring, I calculate two parameters for each week's climb: a "rider quality" parameter which describes the average strength of riders in the climb, and a "score slope" parameter which describes how spread out the riders are. These parameters are determined from rider identification and score alone. Only riders who do more than one climb contribute to these calculations, because these riders provide a basis for comparing one climb to the next. After a single climb, if riders finish close together, then it could be because the riders are similar in ability. But if the same riders do two climbs, and assuming the riders don't naturally spread or converge in ability, then if they score closer together in one of the climbs it might be assumed this is due to the nature of the climb, for example that the climb where they finished closer together had shallower grades where wind resistance was more important, or maybe even descents where descending speed is only partially correlated with climbing speed.

Here's the results for the 2013 climbs. First I show "quality", where I've mapped the actual variable used in the code to something close to average rider score:

week  quality
1      95.5997
2      95.6284
3      99.6606
4      97.8925
5      97.4627
6      97.268
7     101.167
7x    105.022
8     100.491
9      97.5895

The week with the riders with the lowest score here is Montebello. This is historically typical. The first week tends to draw a broader variety of riders. More dedicated riders tend to be stronger, or if they weren't strong to start with, they get stronger than those who chose to ride only the early climbs.

The quality score increases through the first three weeks, the third week being the intimidating Bohlman climb. The score again peaks with Lomas Cantadas week 7, with an even more select group moving on to accept the 7x challenge and climb Marin Ave. Interestingly Montara's score wasn't much above 100. Then Mount Hamilton, week 9, was another popular favorite, attracting a broader range of riders.

Here's the slope score:

week   slope
1      1.0114
2      0.9930
3      0.9304
4      0.7744
5      1.1854
6      1.1860
7      1.0050
7x     0.7519
8      0.9205
9      1.0598

A slope score of less than one means the scores were spread out and need to be compressed. A slope score of more than one means the scores were compressed and need to be spread out. The conjecture that this is related to the steepness of the climb is fairly well borne out. The 7x challenge had the most extreme slope score: not only was there an issue with rider speed but also with motivation, since for many it was challenge enough just to make it to the top of Marin Ave. Curiously, Portola Valley, week 4, was next. It seems that on the series of short, steep climbs, which challenged recovery as well as anaerobic power, the difference between the fastest and slowest was amplified substantially. Week 8, Montara, was next. This was the steepest point-scoring climb in the series. Next was Bohlman, week 3, which was also next on the steepness scale. Lomas Cantadas, which has some very steep sections, ends up with a slope score of more than one due to the dilution effect of the descent.

On the other end, weeks 5 (Black Road) and 6 (Patterson Pass) had slope scores well above one. These climbs both had some steepness, but also had extended sections of relatively gradual grade. Mount Hamilton was next, where the descents dilute the time cost of the climbing, and where drafting can be a considerable factor on the first climb.
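To make "compressed" and "spread out" concrete, here's a minimal sketch of how a slope factor could act on a rider's time. The post doesn't give the actual score formula, so the form below (100 times the time ratio raised to the slope) is an assumption, and the function name, reference time and rider times are all hypothetical.

```perl
use strict;
use warnings;

# Assumed form (not taken from the post): score = 100 * (reference / time) ** slope,
# which is the same as multiplying the log time ratio by the slope.
sub slope_adjusted_score {
  my ($reference_time, $time, $slope) = @_;
  return 100 * exp($slope * log($reference_time / $time));
}

# Hypothetical times in seconds against a hypothetical 1800 s reference time.
my @slopes = (0.75, 1.00, 1.19);
for my $time (1500, 1800, 2400) {
  my @scores = map { slope_adjusted_score(1800, $time, $_) } @slopes;
  printf "time %4d s: slope 0.75 -> %6.2f, slope 1.00 -> %6.2f, slope 1.19 -> %6.2f\n",
    $time, @scores;
}
```

Under this assumption, a rider at the reference time scores 100 whatever the slope is, a slope below one pulls both fast and slow riders toward 100, and a slope above one pushes them away from it.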

It's encouraging when the numbers resulting from the analysis correlate with identifiable features of the climb in a way which was anticipated when the algorithm was first developed in 2011.

Sunday, December 8, 2013

2013 Low-Key Hillclimbs: examining the score algorithm

With the 2013 Low-Key Hillclimbs now over, it's a good time to reflect on the scoring scheme and to see if it accomplished its goal of making similar relative performances on substantially different climbs score similarly.

To check this, I took the scores from each week and adjusted them for the quality of the riders that week. This should in theory result in a scoring distribution similar to what you'd get if riders of similar speed showed up each week. The rider quality adjustment is needed because primarily faster or primarily slower riders show up certain weeks. For example, particularly challenging climbs like Montara tend to attract primarily stronger riders.

Then I plotted these scores versus rank. I used a normalized rank r which goes from 0 to 1, then applied a log-normal transformation to that number to map the interval from 0 to 1 onto the range from -infinity to +infinity.
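Here's a rough sketch of that kind of rank mapping. The post doesn't give the formula for the transformation, so this uses a logit as a stand-in with the same end behavior. The half-position offset in the normalized rank is also an assumption, as are the function names.

```perl
use strict;
use warnings;

# Map a 1-based finishing position to a normalized rank strictly inside (0, 1).
# The half-position offset is an assumption; it keeps r away from exactly 0 and 1.
sub normalized_rank {
  my ($position, $n_riders) = @_;
  return ($position - 0.5) / $n_riders;
}

# Logit transform: maps (0, 1) onto (-infinity, +infinity), with 0.5 going to 0.
# This is a stand-in, not necessarily the transformation used for the plots.
sub logit {
  my ($r) = @_;
  return log($r / (1 - $r));
}

my $n = 5;    # hypothetical field of five riders
for my $position (1 .. $n) {
  my $r = normalized_rank($position, $n);
  printf "position %d: r = %.2f, transformed = %+.3f\n", $position, $r, logit($r);
}
```

The middle finisher lands at zero and the two ends of the field stretch out symmetrically, which makes it easier to compare the tails of different weeks on a single plot.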

Each week is scored using two adjustable parameters: a reference time and a slope factor. The goal of these parameters is to make each rider's scores during the series as tight as possible. The slope factor is needed because on some climbs, like Mount Hamilton, riders tend to finish relatively closer together due to the influence of the descents and of packs riding together up the first of the three climbs. On other climbs, most notably the Portola Valley short hills, riders tend to finish with a relatively larger spread of times. The 7x challenge, including the super-steep Marin Ave, also had a relatively broad spread of times, due no doubt to the fact that some riders were forced to climb in survival mode up Marin.

You might think the slope factor would result in each climb having a similar score-versus-rank curve. The curves are similar, but not identical. Nor should they be identical: if they overlapped, that would mean rank and only rank counted. If riders finish in a group I want them to have similar scores. Groups create plateaus on a plot of score versus rank. But generally I want the curves to be such that it's hard to differentiate one climb from another, except for the plateaus.

Here's the result.

plot

Montara was remarkable for some very high scores. But you can see from the plot that it's only the top 2 scores which are unusual. Montara attracted two particularly good dirt riders, and they finished close together, so it's appropriate they scored highly.

Actually the only curve which looks a bit weird there is Lomas Cantadas. I'll need to look into that. Here's the curves for Lomas Cantadas, Montara, and the aggregate curve from combining all weeks. It seems Lomas is a bit flatter than the others.

plot

I thought perhaps this was due to the influence of so many one-time riders, 12 of 72. They don't contribute to the slope term, since with only one climb the code has no basis for comparison for them. So I recalculated the Lomas curve with only returning riders and plotted it as a dashed line. That is a bit better, but not much:

plot

So I take a step back and look at the code, which I wrote back in 2011 for the "score slope" factors. Does it make sense?

sub iterate_score_slopes {
  for my $w ( @weeks ) {
    my $sum0 = 0;   # weighted sum of squared rider ratings
    my $sum1 = 0;   # weighted sum of squared log time ratios
    warn("iterating score slope for week $w.\n");
    # riders with zero statistical weight are skipped
    for my $r ( @{$week_riders{$w}} ) {
      if ( $rider_statistical_weight{$r} > 0 ) {
        $sum0 += $rider_statistical_weight{$r} * $rider_rating{$r} ** 2;
        $sum1 += $rider_statistical_weight{$r} *
          log($reference_time_eff{$w} / $rider_time_eff{$w}->{$r}) ** 2;
      }
    }
    # slope = RMS rider rating divided by RMS log time ratio
    $score_slope{$w} = sqrt($sum0 / $sum1)
      if ($sum1);
  }

  # normalize score slopes so their weighted geometric mean is 1
  # (declared with "my" here: the sums above are scoped to the loop body)
  my $sum0 = 0;   # sum of week weights
  my $sum1 = 0;   # weighted sum of log slopes
  for my $w ( @weeks ) {
    if ($score_slope{$w} > 0) {
      $sum0 += $week_statistical_weight{$w};
      $sum1 += $week_statistical_weight{$w} * log($score_slope{$w});
    }
  }
  if ($sum0) {
    for my $w ( @weeks ) {
      # skip weeks for which no slope could be calculated
      next unless $score_slope{$w};
      $score_slope{$w} = exp(log($score_slope{$w}) - $sum1 / $sum0);
      warn("normalized slope for week $w = $score_slope{$w}\n");
    }
  }
}

The key here is that it calculates a rating for each rider based on all of his results, then tries to minimize the deviation of the rider's scores from that rating. This explains the shallower slope of the curve for Lomas Cantadas. It's not that the algorithm failed to yield the same curve, but that the riders who showed up for the climb were more similar (as judged by how they did in other climbs) than riders for other climbs tended to be. So the algorithm is good. To have spread out the scores for Lomas more than they were, to match the spreads of scores from other weeks, would have been artificial, since the riders were of more similar abilities than the riders in other weeks.
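To see the two steps of iterate_score_slopes in isolation, here's a small self-contained sketch using made-up numbers. The helper names and rider data are hypothetical. The ratings are taken to be on the same log scale as the time ratios, which is what the sums in the code above imply.

```perl
use strict;
use warnings;

# Step 1: slope = sqrt(sum w * rating^2 / sum w * log_ratio^2), as in the loop above.
# Each rider is [weight, rating, log(reference time / rider time)].
sub score_slope {
  my (@riders) = @_;
  my ($sum_rating_sq, $sum_log_sq) = (0, 0);
  for my $rider (@riders) {
    my ($weight, $rating, $log_ratio) = @$rider;
    next unless $weight > 0;
    $sum_rating_sq += $weight * $rating ** 2;
    $sum_log_sq    += $weight * $log_ratio ** 2;
  }
  return $sum_log_sq ? sqrt($sum_rating_sq / $sum_log_sq) : undef;
}

# Step 2: divide every slope by the weighted geometric mean of the slopes.
sub normalize_slopes {
  my ($slopes, $weights) = @_;
  my ($sum_w, $sum_w_log) = (0, 0);
  for my $week (keys %$slopes) {
    $sum_w     += $weights->{$week};
    $sum_w_log += $weights->{$week} * log($slopes->{$week});
  }
  my $geometric_mean = exp($sum_w_log / $sum_w);
  my %normalized = map { $_ => $slopes->{$_} / $geometric_mean } keys %$slopes;
  return \%normalized;
}

# Hypothetical data: the same three riders (same ratings) on two climbs.
# On "tight" they finish close together; on "spread" they finish far apart.
my %raw = (
  tight  => score_slope([1, 0.10, 0.05], [1, 0.00, 0.00], [1, -0.10, -0.05]),
  spread => score_slope([1, 0.10, 0.20], [1, 0.00, 0.00], [1, -0.10, -0.20]),
);
my $normalized = normalize_slopes(\%raw, { tight => 1, spread => 1 });
printf "%-6s raw %.3f normalized %.3f\n", $_, $raw{$_}, $normalized->{$_}
  for sort keys %raw;
```

The climb where these riders finished close together gets a slope above one, so its scores are spread out, and the climb where they finished far apart gets a slope below one. The normalization only rescales the slopes so that, taken together, they average to one on a log scale.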

So here's a plot of rider scores versus rider rating. This plot only makes sense for riders who've ridden at least two climbs; otherwise their rating equals their single score. The goal of the scoring is to make this cloud as tight as possible with the two adjustable parameters per week.

plot

It's a cluttered plot, so I isolate the two weeks of primary interest here:

plot

Montara resulted in scores which deviated from rating more than most weeks, as expected, but here Lomas Cantadas seems quite typical. You can see David Collet's big score from Montara (brown point) but not Keith Hillier's. Keith isn't shown because this was the only Low-Key he did.

In any case, the conclusion is that the Montara scores in particular were not anomalously high. There were two exceptional dirt riders there and they scored exceptionally highly. Was dirt too much of an influence on the series this year? I think it had a relatively high influence compared to past years, but part of the fun of Low-Key, like other races such as the Tour de France, is that it's a bit different from year to year. Just as the 2011 Tour was affected by the cobbles of Paris-Roubaix, an atypical influence, and the Tour of 2014 will feature an exceptionally long time trial, we can't expect every year to favor the same riders in Low-Key Hillclimbs.