Predicting the World Series using Python

By Rathan Haran, October 29, 2009 7:45 am

Last week, I started learning Python through a peer-to-peer learning session set up through nextNY. The material we’ve gone through has made programming very easy to wrap our heads around, and the cooperative learning environment has been awesome. I’m looking forward to being a Python ninja* pretty soon.

With four and a half chapters of Python at my disposal, I wanted to put my skills to the test. Since I’m a huge baseball fan, I thought I’d try my hand at simulating who would lose the World Series this year, a pillow-fight match-up between the New York Yankees and the Philadelphia Phillies.

The first thing to do was to crunch the numbers. Crunching the numbers means exactly that: figuring out the probabilities of events occurring over a seven-game series. I incorporated things like Ryan Howard’s immense strikeout rate, Derek Jeter’s incredible lack of range at shortstop, and Brad Lidge’s ninth-inning ERA. I also made sure to incorporate correlations, or how strongly the variables are related to one another. Funnily enough, the highest correlation I found was between having a runner on first base with fewer than two outs from the seventh inning onwards and A-Rod weakly grounding into a double play. Numbers never lie.
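For the curious, here is a minimal sketch of what the series-level part of that number crunching might look like. It assumes every game is an independent coin flip with one made-up win probability (`p_yankees_win_game`), so it ignores the player stats and correlations entirely. The function names and the 0.5 value are purely illustrative and are not taken from the actual program.

```python
import random


def play_series(p_yankees_win_game, rng, wins_needed=4):
    """Play one best-of-seven series and return the team that LOSES it."""
    yankees = phillies = 0
    # Keep playing games until one team reaches four wins.
    while yankees < wins_needed and phillies < wins_needed:
        if rng.random() < p_yankees_win_game:
            yankees += 1
        else:
            phillies += 1
    return "Phillies" if yankees == wins_needed else "Yankees"


def estimate_loser_odds(p_yankees_win_game, trials=100_000, seed=2009):
    """Estimate how often each team loses the series by repeated simulation."""
    rng = random.Random(seed)  # fixed seed so reruns give the same estimate
    phillies_losses = sum(
        play_series(p_yankees_win_game, rng) == "Phillies"
        for _ in range(trials)
    )
    share = phillies_losses / trials
    return {"Phillies": share, "Yankees": 1 - share}


if __name__ == "__main__":
    # 0.5 is a placeholder assumption: two evenly matched teams.
    print(estimate_loser_odds(0.5))
```

With evenly matched teams the two shares should land close to 50/50; nudging `p_yankees_win_game` away from 0.5 shows how quickly a small per-game edge compounds over seven games.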

Now this got me a pretty good picture of who would lose the World Series, but I hadn’t taken into consideration the qualitative variables, the intangibles, the “Cole Hamels is a playoff pitcher” and the “Mariano is unhittable in the World Series” bullshit. These are usually the ‘statistics’ that overzealous fans throw out (with no meaningful data except their distorted memories) in defense of a player’s immortality.

The classic intangible lies on the shoulders of the Yankee captain, Derek Jeter, a ballplayer who seems to find himself in the right place at the right time in the postseason. Yankee fans constantly spout his ‘greatness’ and refuse to admit that he was horribly out of position on the Jeremy Giambi play at the plate, or that he doesn’t even register as having the highest batting average in a World Series (that designation goes to Billy Hatcher, who hit a sickening .750 for the Reds in 1990 in 12 ABs). Heck, Jeter doesn’t even deserve the nickname “Mr. November” for his play in the 2001 World Series. He had 1 HR, 1 RBI, and 2 runs scored in November, numbers that were almost matched by a pitcher for the Arizona Diamondbacks (1 RBI and 2 runs scored). Oh, and that pitcher also won two potentially series-ending games in two days that November with a 2.22 ERA, 0.96 WHIP, and 8 Ks in 8.1 innings. Derek Jeter, I’d like you to meet the real “Mr. November,” Randy Johnson.

Okay, so I wrote my little Python program to capture all of this: the stats, the pseudo-stats, the Phillie Phanatic’s rants, and the countless times we’ll hear “26 World Series rings.” With so many probabilities and interactions, the program chugged along for two days, and finally, yesterday before the first pitch, I got the result: ValueError: Let’s Go Mets.

*Looking forward to the day when ninja is no longer used in start-up world employment searches and reverts to its original awesomeness: stealthy nighttime assassin.