Tuesday, October 29, 2013

recursive course timing for Low-Key Hillclimbs week 4: Portola Valley Road

There were a number of challenges with organizing the Portola Valley Hills Low-Key Hillclimbs. For example, the Edge 500 GPS quality on Joaquin Road. But one of the more interesting was the scoring conundrum represented by a rider who repeated one of the climbs.

The scoring code I used here was originally developed for Kennedy Fire Road last year. There we had a long climb with a number of intermediate check-points. To get an overall time I assigned a time to each checkpoint, but overall time was essentially the time crossing the finish minus the time crossing the start. If the rider completed the course multiple times I'd keep track of times between the start and finish and take the shortest one. If a rider recrossed the start line I'd start over. If he crossed the finish line I'd ignore everything until he recrossed the start line. It was easy.

For Portola Valley Hills I added in the cocnept of time budgets for time segments between checkpoints. If a rider took less than the time budget between checkpoints then no time was assessed for that segment. The idea was to keep riders moving on the course but not put any time pressure on riders to rush from one climb to the base of the next. This worked fairly well, except a regoup at the top of Cerventes resulted in the first riders there exceeding the 10 minutes I had allocated for the transfer to Golden Oak West. No problem: I increased the time to 15 minutes, since I wanted to encourage groups to stick together to help with navigation. And navigation went very well. Nobody missed any checkpoints, which is remarkable.

With a climb like Kennedy Fire Trail, suppose a rider recrossed an earlier checkpoint due to backtracking. That wouldn't matter: the time was ticking. If I was ranking riders on splits between checkpoints I'd just take the shortest time between checkpoints. But for total time, how I allocated total time between checkpoints was irrelevent. Total time was just the finish time minus the start time.

With Portola Valley Hills, however, it did matter. For example, suppose I have the following checkpoints:


where A-B has a time budget of 10 minutes, C-D has a time budget of 10 minutes, but B-C has no time budget and therefore contributes directly to total time. If the rider passes right through, crossing A-B-C-D, there's no question about how to score this. Suppose A-B takes 5 minutes, B-C takes 2 minutes, and C-D takes 7 minutes. 5 minutes and 7 minutes are each less than the 10 minute budget, so they contribute zero. That leaves the 2 minutes from B-C, which is thus the overall time for this portion of the route.

Now consider a rider who repeats the climb of B-C. That rider then crosses the checkpoints in order A-B-C-B-C-D. I had anticipated this, considering riders might want to redo segments if they had difficulty such as a dropped chain. If they repeated a climb they would likely go faster the second time, because if they'd been fast the first time, they wouldn't have repeated it. So I took for each checkpoint the last time the rider crossed that checkpoint, unless the checkpoint was the beginning of a segment with a time budget, then I took the first time the rider crossed it. The reason for this special treatment is it assigned more total time to segments with time budgets. But for repeats of timed segments the last time would count.

But what happend at Portola Valley is one rider repeated a climb but was slower the second time. He expected his first try should have counted, and with good reason. But the code essentially ignored his first crossings of checkpoints B and C. Thus his checkpoint crossing pattern of A-B-C-B-C-D became simplified by ignoreing the crossings in (): A-(B-C-)B-C-D. The first climb and return to the start of that climb was considered part of the transition from A to B. Using the numbers before, with A-B taking 5 minutes, and B-C taking 2 minutes, the code would assume A-B was 7 minutes. The time for the second climb of B-C, for example 3 minutes, was then taken as the time for that. The time from C to D was thus 7 minutes and under the 10 minute budget.

The rider would have preferred eliminating the checkpoint crossing in () as follows: A-B-C(-B-C)D. Then the return to start and second climb of B-C would be considered part of the budgeted C-D transition. The second climb B-C was 3 minutes and the subsequent ride to D was 7 minutes so this is exactly the 10 minute budget so A-B and C-D would each be counted as zero, with B-C counted as 2 minutes, for a total time of 2 minutes.

I could have tried to kludge some ad hoc rule of thumb into my scoring to deal with this. But it would have also been flawed. Consider a similar situation: A-B-C-B-C-D. Again, A-B takes 5 minutes, B-C takes 2 then 3 minutes, but this time C-D takes 9 minutes. The rider could choose the first, faster B-C at 2 minutes, but then the second B-C would get lumped in with the C-D and total 12 minutes, exceeding the 10 minute budget. So the rider would receive a total time of 4 minutes. On the other hand, taking the second, slower of the B-C climbs would result in 7 total for A-B (under budget), 3 for B-C, and 9 for C-D (under budget). The total would be 3 minutes, less time despite using the slower climb of B-C.

So I had to come up with a new look at the problem. Kludged fixes wouldn't do. It gets worse when you add in accidental line crossing, a too-common occurance with so many of the climbs ending at Peak Road, and with some riders taking an alternate route to Summit Spring which crossed the Hillbrook pre-start.

I realized I needed the computer to consider all possible interpretations of a rider's data. A completion of a course involves passing all checkpoints in order. There may be other checkpoint passings, for example repeating climbs, or perhaps accidentlly triggering checkpoints from earlier climbs when moving from one checkpoint to the next. Checkpoint recrossings were another issue with which I'd had to deal, and I'd added more complexity to ignore checkpoint crossings which were inconsistent with repeating climbs. It was turning into a mess.

But by considering all possible interpretations of the data, without trying to be intelligent about it, I could calculate all possible times for the rider, and simply take the fastest. It was just a matter of how to figure out all possible interpretations of the data. After I'd thought about it a bit, I realized recursion was the simplest and most reliable approach.

But first consider the example:


I can number these crossings 1=>A, 2=>B, 3=>C, 4=>B, 5=>C, and 6=>D. From this, I need a sequence A-B-C-D. I have several options for this:


Note each of these maps to A-B-C-D. For each of these I can then calculate a total time. I then simply take the one with the best time.

With recursion, the problem is simplified. I need to consider only three cases: how to know when I've got a solution, how to know when there is no solution, and taking the next step towards constructing a legal sequence of checkpoint crossings. To do this I divide the list into two portions: those checkpoints which I've decided to use so far, and those checkpoints crossed after the last checkpoint on that first list for which I have not yet decided. I then need to keep track of which checkpoint needs to be crossed next.

I start with no checkpoints in the first list, all in the second list. So I have 1=>A, 2=>B, 3=>C, 4=>B, 5=>C, and 6=>D in the second list. The next checkpoint to be crossed is A. I'm not done, because D isn't at the end of the first list (there is no first list). I'm not out of solutions, since there's an A in the list of unprocessed checkpoints. So I pick an A. There's only one, so this is easy. That goes into the first list. The others remain in the second list.

Now the lists are 1=>A for the first list, and 2=>B, 3=>C, 4=>B, 5=>C, and 6=>D for the second list. The next checkpoint is B. I'm not done (the first list doesn't end in D) and I'm not stuck without a solution (there's at least one B in the second list). Indeed, there's two B's in the second list. So I need to try both cases. This is where recursion is so useful. The computer can keep track of each separate case using the language's (in this case Perl) recursion support.

So I first consider the case where I put 2=>B on the first list, and solve the problem for the next checkpoint is C. I then get 1=>A, 2=>B for the first list, and 3=>C, 4=>B, 5=>C, and 6=>D for the second list. I then need to solve the second list for the next node being C. There's two possibilities, so I need to try both.

Eventually I'll finish solving these problems and then I return to solve the case for the second choice of B. In this case the first list would be 1=>A, 4=>B, while the second list would be 5=>C, and 6=>D.

Eventually in each case where I don't walk down a dead-end (this is a lot like my favorite maze-solving algorithm). Then the first list contains a sequence A-B-C-D and I know I'm done. The second list may or may not be empty. The rider may have crossed checkpoints after the finish line. But as soon as D is crossed having crossed A-B-C, the rider obviously doesn't want to continue.

Once I came up with this algorithm, laying in bed at 3 am, things became so much simpler. The rider could ride multiple laps, cross random checkpoints accidentally, etc, and while these may add to the work the computer needs to do, it will always find the interpretation most favorable to a rider. My growing list of ad-hoc rules I'd used before could be discarded. No rider can ever get himself into trouble by crossing a checkpoint versus not crossing it.

Unfortunately, when I thought of this at 3 am on Monday morning, I had to code it immediately. A 1:27 Caltrain delay on my morning commute allowed me to finish the job. But then later that day I got a recommendation for another, unrelated enhancement which would require me to rip open the code again. No doubt I'll get sucked back in and get to work again.

No comments: