Monday, May 2, 2011

San Francisco Giants 2011 vs 2010: luck or skill?

The San Francisco Examiner devotes more than 1/3 of its content to sports, and of course a popular subject is the San Francisco Giants, who won the Major League Baseball championship last year. During the season in 2010, they won 92 games and lost 70, a 56.8% winning fraction. As of last Friday they are 12 wins and 12 losses, only a 50% win fraction. Obviously, one concludes from reading the headlines, something has gone terribly wrong.

The issue is I've not read past the headlines. Maybe there are compelling arguments made about how the team is playing. But invariably, in the analysis of baseball and everything else, there is a lack of appreciation for the statistics of random numbers.

Baseball games aren't fully random events, but there is clearly a random component to them. I think everyone recognizes that luck is a big factor.

So a quick test: I'm going to assume the Giants had luck on their side last year, since they won the division (and ended up going on to win the championship, but that's irrelevant here). Teams toward the top of the standings tend to have been luckier, and those toward the bottom of the standings tend to have been less lucky. Failing to recognize this is a flaw of the dice-based games I played as a kid, but that's another topic. I'll simply assume that if the Giants replayed last season with everything essentially equivalent, they would get a record at least as good as the one they got only one year in three.

Now to some statistics: assuming they had an equal probability to win each game, and the probability of winning was 92/162, the variance in the number of wins would be 92 × 62 / 100. The standard deviation is the square root of this, or 7.55. I'm assuming they had 1/3 good luck, which for a normal probability distribution of wins puts their record about 0.43 standard deviations above the mean. This implies they won about 3.25 games more than expected based on skill, or that their expected record for the season was 88.75 wins and 73.25 losses, a 54.8% winning percentage.

So given this winning percentage, what is the chance their record so far would be as bad as 12-12? I won't assume a normal distribution here; there aren't quite enough games for the central limit theorem to apply. Instead I can use Perl to generate a quick simulation:
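use strict;
use warnings;

my $p = 0.548;   # estimated per-game win probability

my $sum = 0;     # trials with 12 or fewer wins
for my $n ( 0 .. 99999 ) {
  my $w = 0;     # wins in this 24-game trial
  for my $g ( 0 .. 23 ) {
    $w ++
      if (rand() < $p);
  }
  $sum ++
    if ($w <= 12);
}
print $sum, "\n";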

The result: 38958 of 100 thousand trials had the Giants finishing the first 24 games no better than 12-12. This implies they went from around 1/3 good luck to 1/3 bad luck for the first 24 games this year.
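
With only 24 games, the simulated fraction can also be cross-checked against the exact binomial tail probability. The sketch below is not part of the original program: `binomial_cdf` is a hypothetical helper name, and it assumes the same fixed per-game win probability of 0.548 with independent games. The value it prints should land close to the simulated fraction, differing only by sampling noise.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original program): exact probability
# of at most $k wins in $n independent games, each won with probability $p.
sub binomial_cdf {
    my ($k, $n, $p) = @_;
    my $term  = (1 - $p) ** $n;   # probability of zero wins
    my $total = $term;
    for my $i (1 .. $k) {
        # P(i wins) follows from P(i-1 wins) via the ratio of successive terms
        $term *= ($n - $i + 1) / $i * $p / (1 - $p);
        $total += $term;
    }
    return $total;
}

printf "P(at most 12 wins in 24 games) = %.4f\n", binomial_cdf(12, 24, 0.548);
```

Building each term from the previous one avoids computing large factorials directly, which keeps the arithmetic well within floating-point range for a 24-game or even a 162-game season.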

Next I modified the program to simulate a full season. What are the Giants' chances of winning only 81 (50%) or fewer of their games for the full year? Their record was this bad in 10929 of the 100 thousand trials. In other words, you would expect them to tank to .500 or worse around 1 year in 9, given the estimated winning probability for last year (which assumed 1-in-3 luck).
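
The modified program isn't shown, so here is one guess at what the full-season version might look like. The subroutine name `count_seasons_at_most` and its parameters are my own for illustration; the win probability is the 0.548 from the listing above. Counts will vary from run to run and shift noticeably with small changes in the assumed probability.

```perl
use strict;
use warnings;

# Hypothetical helper: simulate $trials seasons of $games games each,
# with a fixed per-game win probability $p, and count the seasons
# that end with $max_wins or fewer wins.
sub count_seasons_at_most {
    my ($p, $games, $max_wins, $trials) = @_;
    my $count = 0;
    for my $trial (1 .. $trials) {
        my $wins = 0;
        for my $game (1 .. $games) {
            $wins++ if rand() < $p;
        }
        $count++ if $wins <= $max_wins;
    }
    return $count;
}

my $trials = 100_000;
my $count  = count_seasons_at_most(0.548, 162, 81, $trials);
print "$count of $trials seasons at .500 or worse\n";
```

Calling it with 24 games and a threshold of 12 wins repeats the early-season test, so one routine covers both questions.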

In baseball, winning 100 games is considered exceptional. So I checked the chance for them to win 100 games. They did so in 5359 of the 100 thousand trials: around 1 in 19. If I start the simulation with a 12-and-12 start, the number of successes falls to 2695 / 100 thousand (2.7%), and their success at matching last year falls to 28.5% from 36.4%, so the slow start does hurt them. The question is whether it indicates poor preparation or just a change in the winds of fortune.
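
Starting the simulation from a 12-and-12 record just means banking 12 wins and playing out the remaining 138 games. A minimal sketch of that variation, again assuming a fixed win probability of 0.548 and independent games (the variable names are mine, and the counts it prints depend on the probability chosen and on the random draws):

```perl
use strict;
use warnings;

# Sketch: begin each simulated season from a given record and play
# out the remaining games with a fixed win probability.
my $p         = 0.548;      # assumed per-game win probability
my $start_w   = 12;         # wins already banked after 24 games
my $remaining = 162 - 24;   # games still to play
my $trials    = 100_000;

my ($hundred, $match) = (0, 0);
for my $trial (1 .. $trials) {
    my $wins = $start_w;
    for my $game (1 .. $remaining) {
        $wins++ if rand() < $p;
    }
    $hundred++ if $wins >= 100;   # a 100-win season
    $match++   if $wins >= 92;    # at least last year's 92 wins
}
printf "100+ wins: %d, 92+ wins: %d (of %d trials)\n",
    $hundred, $match, $trials;
```

Setting `$start_w` to 0 and `$remaining` to 162 gives the from-scratch comparison.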

Sure, each baseball game is not an independent random trial. Treating each one as such is the "worst case" assumption. But looking only at the Giants' win-loss records, the result so far is indistinguishable from random. It is important to realize that luck may play an enormous role in baseball standings, even at the end of 162 games.
