Tuesday, July 31, 2012

Hacking the Sudoku solver from The Ruby Programming Language

On only page 18 of Flanagan and Matsumoto's excellent "The Ruby Programming Language" a relatively complex example is presented: a Sudoku puzzle solver. I've had a decades-long interest in maze solvers, and this tied in nicely with that. I resolved to give the code a close look when I'd finished the book.

And surprising myself, I soon did. I read it cover-to-cover, frequently even re-reading sections I'd already covered when a lack of mastery of specific material became evident later. The language is a nice alternative to Perl, which had been my scripting language of choice since 1998. Ruby has a lot of cool characteristics, even if it sometimes falls victim to Perl's desire to accommodate too much diversity in coding styles. I personally think being able to do things effectively in two different ways is inferior to being able to do them just as effectively in only one, as the latter makes multi-author code easier to follow and coherently maintain. But as usual, I digress.

When I felt I'd given enough attention to the rest of the text, I returned to the Sudoku solver. I couldn't resist making some on-the-fly changes immediately. For example, certain look-up tables are hard-coded when they can be trivially generated algorithmically, such as the table mapping each cell to its "box number". But beyond that, I didn't like that the code was hard-coded for a 9×9 puzzle. A Sudoku must be a square array of square boxes, so the side length must be the square of an integer. This integer I call the "size" of the puzzle. So a 9×9 puzzle has size 3.
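As an aside, here's a minimal sketch (mine, not the book's code) of how that box table can be generated for an arbitrary size n, where the side length is n²:

  # Generate the cell-index -> box-number table for a puzzle of "size" n.
  # Cell (row, col) belongs to box (row / n) * n + (col / n).
  def box_table(n)
    l = n * n                        # side length
    (0...l * l).map do |i|
      row, col = i.divmod(l)         # flat cell index -> (row, col)
      (row / n) * n + (col / n)      # box number, 0 .. l-1
    end
  end

  # box_table(3) should reproduce the 81-entry table hard-coded for 9x9.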

9×9 is convenient because squares can be filled with the symbols 1, 2, 3, 4, 5, 6, 7, 8, and 9. I say "symbols" rather than "numbers" because there's nothing numerical about Sudoku. The only test of the elements is the test of uniqueness. Any set of symbols which can be tested for uniqueness will do. You can use letters, pictograms, colors, anything which can be compared for match or no match. So while Sudoku is commonly described as a mathematical puzzle, the only mathematics involved is the mathematics of sets. The elements of the set are arbitrary.

Given this, I was surprised that the example code begins by converting the symbols in the puzzle into integers using an obscure sequence of digits stored in a string. Mapping character sequences to integer representations is an efficiency already taken care of by the symbol object in Ruby. This approach seems more appropriate for C code than Ruby. Instead of storing cell contents as numbers and converting to strings, I converted the code to use symbols, which I store in an array. This is much simpler. The only question is what symbols to use. I picked digits 1-9 followed by letters (A-Z, then a-z), then on to circled numbers available in Unicode. This allowed me to solve puzzles up to at least 100×100 (size 10). An alternative would be to allow users of the puzzle "class" to specify the symbols to be used.
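Here's roughly how the symbol list gets built (a sketch; the exact symbols are arbitrary, and the Unicode extension beyond letters is omitted here):

  # Candidate symbols: digits, then upper- and lower-case letters (61 total).
  # For larger puzzles I append further Unicode characters (circled numbers).
  SYMBOLS = [*'1'..'9', *'A'..'Z', *'a'..'z']

  # A size-n puzzle uses only the first n*n symbols.
  def symbols_for(n)
    raise "not enough symbols for size #{n}" if n * n > SYMBOLS.length
    SYMBOLS.first(n * n)
  end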

Then there was the issue of how the class was organized. A Sudoku module was defined containing a puzzle "class", and then a module method was defined to solve the puzzle. This seemed artificial to me: since the class had specific functionality required to support the solution algorithm, why act as if the solver was some sort of client of a more general class? I instead incorporated the solution method directly into the class, formalizing the justification for the class methods specific to the solution algorithm.

So with the code now supporting arbitrary puzzle sizes, I gave it a try. 9×9: no problem. It blows through those, even for a starting puzzle which is a totally blank board. Then 16×16 was also no problem. But get up to 25×25 and it started to develop hiccups... and 36×36 was a real challenge.

These large puzzles revealed some more fundamental weaknesses of the original code. Here's a general outline of the algorithm:

  1. If there are any squares with no options, the puzzle is impossible, so indicate that.
  2. If there's a square which has only one option, choose that option, and return to the previous step (making sure the puzzle isn't impossible).
  3. At this point, every blank square has at least two options. Pick a square with the fewest options; think of this as the square most likely to yield a correct "guess". Pick one of the available options for this square (the original code picks the lowest-value option; in my code I randomize the selection order).
  4. Try to solve the puzzle with the "guess" in this square. If it fails, try the next guess. If no more guesses are available for this square (all have been tried and each yielded an impossible puzzle), then mark the puzzle as impossible.

This algorithm naturally lends itself to recursion, so that's what Flanagan and Matsumoto do. Every time a guess is made at a cell value, the solution method calls itself with a full copy of the puzzle with the guess filled in. Unfortunately, a large puzzle can involve a LOT of guesses, and making a full copy of the puzzle for every recursive call is inefficient.
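In outline, the recursive step looks something like this (my paraphrase with hypothetical helper names, not the book's exact code):

  # Paraphrase of the recursive approach: each guess gets a recursive call
  # on a full copy of the puzzle.
  def solve(puzzle)
    puzzle = scan(puzzle)              # fill forced cells; raises Impossible if stuck
    index, options = cell_with_fewest_options(puzzle)
    return puzzle if index.nil?        # no blank cells left: solved

    options.shuffle.each do |guess|    # I randomize; the book tries them in order
      begin
        copy = puzzle.dup              # full copy for every guess
        copy[index] = guess
        return solve(copy)             # recurse; unwinds on success
      rescue Impossible
        next                           # dead end; try the next guess
      end
    end
    raise Impossible                   # every guess at this cell failed
  end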

So I changed this to a hierarchical data structure. Every time a guess was made, only the new cell's value was stored, in a hash. If a value wasn't found in that hash, it was looked up in the parent hash belonging to the solution call which had recursively invoked the present one. Now memory wasn't a problem, but cell value lookups really slowed things down.

The approach I took was to have the code cache these value look-ups. Furthermore, instead of evaluating from scratch what values were unused in each row, column, and box, I stored these values in separate hashes. With this combination, memory use was substantially reduced while decent speed was restored.
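Schematically, the layered lookup works like this (class and method names are mine, not the actual code):

  # Each solver level stores only its own guesses; a miss falls back to the
  # parent level, and the answer is cached locally so repeat lookups are cheap.
  class Layer
    def initialize(parent = nil)
      @parent = parent
      @values = {}       # cells assigned at this level
      @cache  = {}       # memoized lookups from ancestor levels
    end

    def []=(index, symbol)
      @values[index] = symbol
    end

    def [](index)
      return @values[index] if @values.key?(index)
      return @cache[index]  if @cache.key?(index)
      @cache[index] = @parent ? @parent[index] : nil
    end
  end

The per-row, per-column, and per-box sets of unused symbols get the same treatment.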

A problem remained however, and that was stack size. Lisp is a language in which recursion is a preferred technique: implementations support what's called "tail call optimization" so that recursion can be done efficiently. Ruby doesn't do this by default -- it's optimized for loops, not recursion. I was hitting hard stack limits with the larger puzzles, and there seemed to be no way around them.

So I decided to bite the bullet and get rid of the recursion. Recursion is super-convenient for book-keeping. Eliminating the recursion required that I keep track of where I was, where I'd been, and how to get back to previous states if the present solution direction yielded a dead end. Eventually I got it working, but only with considerable effort.
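The resulting loop looks roughly like this (helper names are mine; the forced-cell propagation step is omitted for brevity):

  # An explicit stack of [cell, untried symbols] frames replaces the call
  # stack, so puzzle size is no longer limited by Ruby's recursion depth.
  def solve_iteratively(puzzle)
    stack = []
    loop do
      cell, options = puzzle.cell_with_fewest_options
      return puzzle if cell.nil?                # no blank cells left: solved

      if options.empty?                         # dead end: back up
        loop do
          raise Impossible, "no solution" if stack.empty?
          cell, options = stack.pop
          puzzle.unassign(cell)                 # retract that frame's guess
          break unless options.empty?           # found a frame with guesses left
        end
      end

      puzzle.assign(cell, options.shift)        # try the next guess here
      stack.push([cell, options])               # remember what's left to try
    end
  end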

Now once again I had no problem with the smaller puzzles. But I really wanted to solve 36×36 and beyond. On these puzzles, the code would get hung up for many hours, seemingly stuck in a loop. But from the algorithm, a loop should be impossible... to check for this, I installed code which kept a record of a "hash value" for each puzzle state, then checked to make sure no solution loops occurred. I checked for loops up to 10 thousand steps long, and never found any. Yet the solution would grind on for 10 million total guesses or more...
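The check looked something like this (a sketch; the state fingerprint and the hook into the solver loop are hypothetical):

  require 'set'
  require 'digest'

  WINDOW = 10_000                  # how far back to look for repeats
  recent = []                      # fingerprints in order, oldest first
  seen   = Set.new

  # called once per solver iteration with the current puzzle state
  check_state = lambda do |puzzle|
    h = Digest::MD5.hexdigest(puzzle.cells.join(','))   # cheap state fingerprint
    warn "solution loop detected" if seen.include?(h)
    recent << h
    seen   << h
    seen.delete(recent.shift) if recent.length > WINDOW
  end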

I retreated to smaller puzzles. Generally the code could solve these, but every so often it would get hung up. It would grind on and on, modifying a small group of squares in an attempt to break through to a solution path. Once the path was found, the rest of the solution might go quickly.

The issue is that Sudoku puzzles are like mazes, but in solving a maze the number of options available at a given point is at most 2×N, where N is the number of dimensions. In a Sudoku there may be as many as L options available, where L is the side length of the puzzle. It's not surprising it's easy to get lost in a 32-dimensional maze...

So I've concluded that solving big Sudokus is simply difficult. Without researching it, I wonder if it's possible to apply more intelligence to the situation than simply guessing, in order, the options of the square with the fewest options. But I'll leave it at that: my mission of exercising Ruby skills was accomplished. (Note: I just noticed there's a forum dedicated to the topic!)

Finally, here's a sample test case. I start with a minimal puzzle, one with a single row and a single column randomly filled in with no box violations:

 .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  C  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  6  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  N  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  3  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  5  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  9  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  7  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  F  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  E  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  D  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  4  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  K  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  P  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  I  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  M  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  8  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  O  .  .  .  .  .  .  .  .  .  . 
 3  C  P  4  9  G  B  K  1  F  7  L  5  N  A  D  O  I  6  H  E  M  8  2  J 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  G  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  L  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  H  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  J  .  .  .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  .  .  .  .  .  .  .  B  .  .  .  .  .  .  .  .  .  . 

Then I solve. Progress is shown here, with a logarithmic axis for iteration number and the y-axis showing the number of cells solved (not every point is shown). You can see how the progress comes in short bursts: the solver becomes bogged down, then there's another short burst. These stretches of no visible progress can last for millions of iterations if I'm unlucky.

puzzle progress

Then here's the solution (non-unique in this case). Cells which received at least as many guesses as there are total cells in the puzzle (in this case 625) are enclosed in [brackets]:

 G  L  4  B  1  K  P  H  J  9  M  D  I  O  2  3  N  7  C  8  6  A  5  E  F 
 F  K  6  A  M  D  N  C  O  8  G  5  L  B  1  P  H  E  2  4  9  3  J  7  I 
 8  O  D  2  E  B  L  1 [5][I] H  3  7  4  C [A][6][J][9] F  M  G  N  K  P 
 I  N  C  3  J  E  F  G  2  7  P  8  9  A  6  M  5  L  D  K  1  H  B  4  O 
 7  5  9  P  H  4  A  M  3  6  K  E  F  J  N  B  I  G  O  1  8  2  D  C  L 
 K  H  G  7  C  9  I  B  D  P  E  4  J  M  3  1  A  2  F  L  N  O  6  8  5 
 2  8  E  D  A  L  3  J  F  C  I  P  O  1  5  G  4  6  H  N  K  B  9  M  7 
 5  F  N  J  P  A  K  O  G  M  L  2  B  6  9  8  E  D  7  I  C  4  H  3  1 
 M  6  O  9  4  8  1  5  E  2  N  A  H  G  7  C  K  3  B  P  L  J  F  I  D 
 1  3  L  I  B  N  H  7  6  4  8  C  D  K  F  O  M  9  J  5  A  E  G  P  2 
 4  M  K  1  8  F  J  I  L  B  5  7  N  3  E  6  G  P  A  D  2  C  O  9  H 
 B  J  7  F  O  1  E  4  M  K  2  9  A  L  D  H  3  8  N  C  5  I  P  6  G 
 A  G  2  6  L  H  7  P  C  N  O  F  M  I  4  K  J  5  E  9  B  1  3  D  8 
 P  E  H  C  I  O  D  3  9  5  J  6  G  8  K  4  7  B  1  2  F  L  M  N  A 
 D  9  5  N  3  6  G  2  8  A  B  H  1  C  P  L  F  M  I  O  4  7  K  J  E 
 N  1  M  L  2  J  9  8  P  H  3  G  E  D  I  5  C  F  4  7  O  6  A  B  K 
 6  7  B  8  F  2  O  N  4  D  9  1  K  P  M  J  L  A  G  E  3  5  I  H [C]
 O  D  J  5  G  I  M  6  A  E  C  B  4  H  8  2  1  K  P  3  7  N  L  F  9 
 H  I  A  E  K  C  5  L  7  3  F  J  6  2  O  9  B  N  8  M [D][P][1][G] 4 
 3  C  P  4  9  G  B  K  1  F  7  L  5  N  A  D  O  I  6  H  E  M  8  2  J 
 E  4  1  M  6  5  C  D  K  L  A  N  8  9  G  I  2  H  3  J  P  F  7  O  B 
 9  B  I  G  N  P  2  F  H  O  1  M  3  E  L  7  D  C  K  A  J  8  4  5  6 
 J  2  3  K  D  M  8  A  B  1  4  I  C  7  H  F  P  O  5  6  G  9  E  L  N 
 C  A  F  H  7  3  6  9  N  G  D  O  P  5  J  E  8  4  L  B  I  K  2  1  M 
 L  P  8  O  5  7  4  E  I  J  6  K  2  F  B  N  9  1  M  G  H  D  C  A  3 

Friday, July 27, 2012

Mount Diablo Time Trial: the numerator problem

Last post I talked about the denominator issue I had with the Diablo hillclimb time trial. There I adjusted my time for the difference between my present weight and my weight when I set my Old La Honda PR late last year, and concluded that the adjusted time, which would have put me 3rd in my category, was pretty good, and that therefore my numerator was fine: my power was basically on target.

But that level of analysis really doesn't pass muster. I have the data available to estimate my power: I just need to run the numbers.

I could use Strava's estimated power but that's based on an undocumented model. It's easy enough to download data from Strava myself and run my own calculations.

So here I go: Diablo starts at relatively low altitude and the time trial climbs a net of only 520 vertical meters so the air is fairly thick: I assumed 1.15 kg/m³. I know my mass at ride time (59 kg), my bike's mass (approx 5.1 kg), and the mass of stuff on my body (1.5 kg estimated). My tires are Vittoria Chronos pumped to 160 psi: good stuff. I estimate my rolling resistance coefficient at 0.3%. I don't know much about my wind resistance coefficient and area, but I'm tempted to use around 0.36 m². There might have been a bit of a tailwind, especially in the opening kilometers. I assume drivetrain losses are around 0.3%.
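For reference, here's the model I'm applying, with the numbers from above plugged in (a sketch: wind and kinetic-energy changes are neglected, and a small-angle approximation is used for the grade):

  RHO  = 1.15              # air density, kg/m^3
  MASS = 59.0 + 5.1 + 1.5  # rider + bike + stuff on body, kg
  CRR  = 0.003             # rolling resistance coefficient
  CDA  = 0.36              # wind resistance coefficient times area, m^2
  LOSS = 0.003             # drivetrain loss fraction
  G    = 9.8               # gravitational acceleration, m/s^2

  # Estimated power at the pedals from ground speed v (m/s) and road grade.
  def power(v, grade)
    (MASS * G * (grade + CRR) + 0.5 * RHO * CDA * v**2) * v / (1 - LOSS)
  end

The CdA gets refined below.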

But before estimating power, it's good to try and validate the model against known data. I didn't have a power meter, but a friend, who I'll refer to as K, had a very nice Quarq crank-based unit. K claims to be 64 kg. I guessed his tires were slightly higher rolling resistance, so used 0.4% for his Crr. I assumed his bike weighs close to the UCI limit (which didn't apply at this race) so I used 7 kg for the bike mass. That might have been slightly high. That leaves CdA which is uncertain. He's taller than me and rail thin, which would imply a higher CdA, but he also had a skin suit while I had a standard jersey + bibs. I had tape on some of my helmet vents; I'm not sure about him. So overall I assume his CdA isn't that much larger than mine.

Unfortunately, using a fixed CdA I couldn't get a good fit to his data unless I used a very low value and boosted his mass to unrealistic values. So instead I decided to use a two-phase CdA: a lower value for the flatter portion near the beginning, then a higher value for the climb proper. This made sense, both because I know I paid more attention to aerodynamics on the flat bit and then focused more on staying relaxed on the steep climbing, and I assume he did similarly. Additionally, there might have been a more significant tailwind on the bottom portion. A tailwind can be modeled directly, but I decided to just fold it into an effective CdA reduction, since I wanted to reduce the number of knobs to turn.

So I modeled CdA with an error function, starting at 0.26 m² for K and then increasing to 0.36 m². For myself I assumed a similar pattern, but reduced both values by 0.01 m² for the reasons described above.
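In code the blend looks like this (the transition point and width along the course are placeholders of mine):

  CDA_LOW  = 0.26     # m^2, flatter opening section (rider K)
  CDA_HIGH = 0.36     # m^2, climb proper (rider K)
  X0       = 3000.0   # m along course, center of transition (placeholder)
  WIDTH    = 500.0    # m, transition width (placeholder)

  # CdA as a function of distance x along the course, in meters.
  def cda(x)
    CDA_LOW + (CDA_HIGH - CDA_LOW) * 0.5 * (1 + Math.erf((x - X0) / WIDTH))
  end

For my own ride I shift both values down by 0.01 m².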

Here's K's data along with the power I calculated, each smoothed with a 15-second time constant. He rode an excellent time trial, with power virtually constant throughout at 5.04 W/kg. And it paid off for him with a win in his category.

K data and fit
Calculated and measured power from rider K

Okay, now my turn. I felt fairly good about the model at this point so I applied it to myself. Here's what I get:

my calculated power
Calculated power from rider Me

My average power, according to this calculation, was 4.3 W/kg. Even were I a lean and mean 56 kg, as I was last winter, that would boost this only to 4.6 W/kg. That's well below my target of 5 W/kg. On pacing, I clearly started harder than I was able to sustain going into the finish. But that's not, I would say, because I started too hard but rather because I finished too weakly. Of course, this depends on the assumption that the power modeling I applied to rider "K" also applies to me. Wind conditions may have changed somewhat. But I'm going to stick with this analysis: I didn't feel as if I was going especially hard near the start.

There are a lot of assumptions in power modeling. For example, I could well be substantially off on the rolling resistance coefficient of my tires. But a more reliable calculation is to compare the result from Diablo to what an equivalent time would be up Old La Honda. Then if I underestimate rolling resistance up Diablo, that error is to a large degree canceled by the reverse calculation underestimating the rolling resistance up Old La Honda.

Calculating instantaneous "Old La Honda" time needs to be done with a bit of care, because if power goes to zero, instantaneous OLH time goes to infinity. So some smoothing needs to be applied to the power before calculating the time (you don't want to smooth the time). Anyway, I applied 15-second and 60-second smoothings to power before calculating "instantaneous" OLH time.
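The smoothing step is just a first-order exponential filter (a sketch; the sample format of time/power pairs is assumed, and the OLH conversion itself just inverts the power model above for OLH's length and grade):

  # Exponentially smooth power samples with time constant tau (seconds).
  # samples: array of [time_in_seconds, power_in_watts] pairs.
  def smooth(samples, tau)
    out = []
    y = samples.first[1]
    samples.each_cons(2) do |(t0, _), (t1, p1)|
      y += (1 - Math.exp(-(t1 - t0) / tau)) * (p1 - y)
      out << [t1, y]
    end
    out
  end

  smoothed_15 = smooth(power_samples, 15.0)   # power_samples: my calculated power
  smoothed_60 = smooth(power_samples, 60.0)

Here's the result: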

effective OLH time
Instantaneous OLH time

In contrast, rider "K" was around 17 minutes.

So obviously I was not at my best at Diablo. Granted, I've given myself no reason to expect to be at my best: after hurting my back, I did virtually no training for a month in May-June, then came back doing a lot of volume without much intensity. But I don't like making excuses. Clearly neither the numerator nor the denominator was up to the task on Saturday.

Saturday, July 21, 2012

Mount Diablo Time Trial: the denominator problem

As I was standing at the finish of the Mount Diablo Time Trial, I suddenly remembered that I'd left a pack of Gu in my pocket. Gu was sponsoring the race and there had been a box of peanut butter Gu's at registration. I'd taken one and put it in my back pocket, intending to leave it with the backpack I had stashed at the start line. But I'd forgotten... would have been better to push it under the leg of my bib shorts where I'd be less likely to overlook it.

All of this seems like a lot of fuss over a little Gu, but Gu packs are 30 grams each. That's around 0.5 seconds extra climbing time for the course, which consists of the first half of the climb to Diablo summit from the north (formula for estimating this sort of thing). Sure, usually 0.5 seconds doesn't matter, but sometimes it does. As it turns out, I ended up 5th in my age group, 1.9 seconds out of 4th, 5.0 seconds out of 3rd. So 0.5 seconds alone didn't make a difference. But people spend a lot of money to save 30 grams: Anyone going with a top-end group because it's lighter is spending around $120 for each 30 grams saved. So it's important to save weight where one can.
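Here's the back-of-envelope version of that estimate (the ride duration and total mass are assumed round numbers of mine):

  # On a climb most of the resistance is gravity, so time scales roughly with
  # total mass: extra time ~ ride_time * extra_mass / total_mass.
  total_mass = 66.0        # rider + bike + gear, kg (assumed)
  ride_time  = 18 * 60.0   # seconds (assumed, roughly)
  extra_mass = 0.030       # one Gu packet, kg

  puts ride_time * extra_mass / total_mass   # => about 0.5 seconds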

Of course, it's important to not lose sight of the big picture, and the dominant factor in weight is the weight of the rider. And here was the real problem: right now I'm around 3.2 kg heavier than when I set my Old La Honda PR back in December. In fact, I don't think I've ever raced as heavy as I am right now.

Body mass is always a balance between input and output. So it's important to look at both sides.

On the input side, the obvious approach is to identify some relatively calorie dense choices I tend to make and replace them with less calorie dense options. For example, when I ride in I tend to stop for one or even two oat cakes at a coffee place near work, which is in a car-dominated wasteland of very limited choices. I need to come up with a better alternative to these dense bricks of oats and sugar.

But on the output side, I need to start running again. I am convinced regular running promotes a lower steady-state mass than cycling generally does. One can hypothesize all one wants about why this might be, but whatever the reason, I have observed that when I run I'm lighter than when I don't, and participants in running races seem to be generally leaner than participants in cycling events. Even bike racers have a tendency toward a bit of a gut, something I simply don't see at the typical trail run. On a microscopic level, there's always the calories-in, calories-out argument. But few of us count our calories: we eat to a point of satiation, driven by hunger and/or perceived weakness. There's something different between running and cycling in the way these activities stimulate hunger and promote metabolism.

Ah, yes... Diablo. My time wasn't that bad, actually, but there was a bit of a tail wind this year, and I think times were generally strong. I can say for sure I wasn't climbing at the level I was last fall. On the positive side, my power seems fine, but gravity has the irrefutable property that retarding force is proportional to mass, so power alone doesn't get you up a hill. I know plenty of riders with impressive power who don't climb as well as they should because of issues with the denominator rather than the numerator of W/kg.

3.2 kg should have been worth around 52 seconds, which would have had me within 20 seconds of Carl Nielson, solidly in third.

In any case, I'm registered for the California International Marathon in December, so I need to start running again anyway, and sooner rather than later. It will be interesting to see if this helps resolve my denominator problem.

Tuesday, July 17, 2012

proposed crash rule for Grand Tours

My pick for the bottom step of the Tour podium was Ryder Hesjedal. Unfortunately, I never got a chance to see whether he was up to the task. He crashed on the sixth stage and abandoned before the next stage.

For as long as I've followed cycling, crashes in the Tour, Vuelta, and Giro have taken their toll on the general classification. Serious contenders are either taken out of the race outright, or lose so much time they are no longer in a position to contend. Crashing is a part of racing. Bike racing isn't about finding the physiologically superior rider, although physical aptitude is a part of success. It's also about strategy, about tactics, and about luck. Luck is a critical component of the interest of cycling.

Crashing is just one component of cycling's luck. Bike racing is carried out on real roads, for example, and real roads provide the risk of changes in weather, shifts in the wind, and debris which can lead to punctures. And since real roads aren't fully predictable, cyclists who want to go fast will always take risks, and risk implies failure. Crashing will always be a part of cycling's risk budget.

crash!
Simon Gerrans does a Hoogerland in stage 3, 30 km from finish. Sydney Morning Herald.

But perhaps what we have today goes beyond an optimal dose of that risk. Rules in sport are designed to mitigate risk. In auto racing, rules are in place which limit the speeds cars can reach. In cycling, there are rules imposing safety margins on equipment, including the mandatory use of helmets. And in virtually all international sports, we have anti-doping rules which attempt to reduce the risks to which athletes expose their health. We want riders to take risks in competing, but we want the advantage of truly reckless behavior to be mitigated by rules.

The issue with the crashes is that there are so many of them that the general classification riders need to try to stay ahead of them. There's nothing new about this, which is why the UCI has rule 2.6.027, which attempts to rectify the effect of crashes which happen in the last 3 km. But what may be new is for how long the GC riders need to be at the front. This rule at one time covered only the final kilometer, since that was long enough to cover dangerous sprints, but with the rise of team coherence and communication the paradigm of sprints changed, and by 3 km out full-on mayhem has already begun. So the safety margin was extended in 2005. Now GC riders are at least relieved of the obligation to stay at the front of the pack in the last 3 km. Here's the rule:

2.6.027 In the case of a duly noted fall, puncture or mechanical incident in the last three kilometers of a road race stage, the rider or riders involved shall be credited with the time of the rider or riders in whose company they were riding at the moment of the accident. His or their placing shall be determined by the order in which he or they actually cross the finishing line. If, as the result of a duly noted fall in the last three kilometers, a rider cannot cross the finishing line, he shall be placed last in the stage and credited with the time of the rider or riders in whose company he was riding at the time of the accident.

But now even 3 km may not be sufficient. And it's not just for how much distance the riders need to be at the front: it's how many riders need to be there. You've got your GC guys trying to stay out of trouble, the guys supporting the GC guys who brought them there, then there's the sprinters, of course, but also the sprinters' lead-out trains, and of course the riders chasing down the ubiquitous breakaway. That's simply more riders than spots available. It's like taking a gas and compressing it into a small volume. According to the universal gas law, the pressure builds. And when gas pressure gets out of control, the result is always the same.

Boom.

So can anything be done? Let's consider another rule which removes the temptation for risk, one which has been in the sport for many decades if not a full century, the rule neutralizing the effect of rail crossings:

2.3.035 The following rules shall apply:
  1. One or more riders who have broken away from the field are held up at a level crossing but the gates open before the field catches up. No action shall be taken and the closed level crossing shall be considered a mere race incident.
  2. One or more riders with more than 30 seconds' lead on the field are held up at a level crossing and the rest of the field catches up while the gates are still closed. In this case the race shall be neutralised and restarted with the same gaps, once the official vehicles preceding the race have passed. If the lead is less than 30 seconds, the closed level crossing shall be considered a mere race incident.
  3. If one or more leading riders make it over the crossing before the gates shut and the remainder of the riders are held up, no action shall be taken and the closed level crossing shall be considered a race incident.
  4. Any other situation (prolonged closure of the barrier, etc.) shall be resolved by the commissaires. This article shall apply equally to similar situations (mobile bridges, obstacles on the route, etc.).

So here we have two rules designed to mitigate the effect of luck on the general classification. In one case, time gaps due to crashes in the final 3 km on flat stages are discounted if possible. And in another case, if a crossing gate closes on a portion of the field, other portions of the pack not directly affected are held up to neutralize the effect of the crossing gate.

The solution, I think, is to apply this same principle to crashes which happen outside the 3 km sprint zone. All the required language is there; it just needs to be appropriately combined. This isn't a new idea: @steephill proposed something like it on Twitter in response to a comment by @Vaughters.

So what I propose is that, in a stage race, any crash which "substantially retards the progress" of at least "approximately twenty riders" outside the 3 km sprint zone be treated similarly to a closed level crossing, with the unaffected riders held up for whatever time, in the judgement of the officials, is sufficient to allow those able to continue racing to rejoin within a prudent interval, for example no more than five minutes. This could take some of the extreme pressure off riders to be at the front all the time when the going gets hot.

Ironically the effect of such a rule, reducing the cost of large crashes, could make such crashes far less likely. With less need for everyone to be at the front, it would allow for a relaxation of the reckless risk taking needed to be there.

Cycling may not always be about the best rider winning, but it would be nice if more of the best riders were able to stay in contention in the biggest races for a bit longer than they do in this time of "everyone to the front!"