Category: Math

Predicting the World Series using Python

By Rathan Haran, October 29, 2009 7:45 am

Last week, I started learning Python through a peer-to-peer learning session set up through nextNY.  The material we’ve covered has made programming very easy to wrap our heads around, and the environment of cooperative learning has been awesome.  I’m looking forward to being a Python ninja* pretty soon.

With four and a half chapters of Python at my disposal, I wanted to put my skills to the test.  Since I’m a huge baseball fan, I thought I’d try my hand at simulating who would lose the World Series this year, a pillow-fight match-up between the New York Yankees and the Philadelphia Phillies.

The first thing to do was to crunch the numbers.  Crunching the numbers means exactly that: figuring out the probabilities of events occurring over a seven-game series.  I incorporated things like Ryan Howard’s immense strikeout rate, Derek Jeter’s incredible lack of range at shortstop, and Brad Lidge’s ninth-inning ERA.  I also made sure to incorporate correlations, or how closely the variables are related to one another.  Funnily enough, the highest correlation I found was between having a runner on first base with fewer than two outs from the seventh inning onwards and A-Rod weakly grounding into a double play.  Numbers never lie.
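For the curious, here is a minimal sketch of the simplest version of that number crunching: the chance of taking a best-of-seven series given a single per-game win probability.  It assumes every game is independent and has the same odds (no home field, no pitching match-ups, no correlations), and the function name and the 0.55 figure are made up for illustration; this is not the program described in this post.

```python
from math import comb


def series_win_probability(p, wins_needed=4):
    """Chance of winning a best-of-(2 * wins_needed - 1) series.

    Assumes each game is independent and won with the same probability p.
    """
    q = 1.0 - p
    # To win the series while losing exactly k games, a team must win the
    # final game and split the earlier (wins_needed - 1 + k) games into
    # (wins_needed - 1) wins and k losses.
    return sum(
        comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
        for k in range(wins_needed)
    )


# A hypothetical 55% per-game favorite is more than a 55% series favorite.
print(series_win_probability(0.55))
```

A longer series amplifies a small per-game edge, which is why even a slightly better team is a bigger favorite over seven games than over one.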

Now this got me a pretty good picture of who would lose the World Series, but I hadn’t taken into consideration the qualitative variables, the intangibles, the “Cole Hamels is a playoff pitcher” and the “Mariano is unhittable in the World Series” bullshit.  These are usually the ‘statistics’ that overzealous fans throw out (with no meaningful data except their distorted memories) in defense of a player’s immortality.

The classic intangible lies on the shoulders of the Yankee captain, Derek Jeter, a ballplayer who seems to find himself in the right place at the right time in the postseason.  Yankee fans constantly spout his ‘greatness’ and refuse to admit that he was horribly out of position on the Jeremy Giambi play at the plate, and that he doesn’t even register as having the highest batting average in a World Series (that designation goes to Billy Hatcher, who hit a sickening .750 for the Reds in 1990 in 12 ABs).  Heck, Jeter doesn’t even deserve the nickname “Mr. November” for his play in the 2001 World Series.  He had 1 HR, 1 RBI, and 2 runs scored in November, numbers that were almost matched by a pitcher for the Arizona Diamondbacks (1 RBI and 2 runs scored).  Oh, and that pitcher also won two potentially series-ending games in two days that November with a 2.22 ERA, a 0.96 WHIP, and 8 Ks in 8.1 innings.  Derek Jeter, I’d like you to meet the real “Mr. November,” Randy Johnson.

Okay, so I wrote my little Python program to capture all of this: the stats, the pseudo-stats, the Phillie Phanatic’s rants, and the countless times we’ll hear “26 World Series rings.”  With so many probabilities and interactions, this program chugged along for two days, and finally, yesterday before the first pitch, I got the result:  ValueError: Let’s Go Mets.

*Looking forward to the day when ninja is not used in start-up world employment searches and reverts back to its original awesomeness of stealthy nighttime assassin.

xkcd on Purity

By Rathan Haran, June 14, 2009 10:14 pm

xkcd - purity

They should just rename it the NYU Prize

By Rathan Haran, June 1, 2009 12:35 pm

Congratulations to Mikhail Gromov, the third NYU recipient in the past five years of the Abel Prize for greatness in math.  The Abel Prize is considered the counterpart to the Nobel Prize, which has no award for math.  NYU’s dominance of this award began in 2005 with Dr. Peter D. Lax, who also worked on the Manhattan Project, and continued in 2007 when Dr. S. R. Srinivasa Varadhan won for his work on the probability of rare events.

How rare it is to have this type of success was summarized by the expert himself, Dr. Varadhan: “When I met with the king and queen, he said, ‘Since you’re a specialist in probabilities, what is the probability that you’ll have another prize winner from your institution?’ ” Professor Varadhan recalled. “I said, ‘Probably very small,’ but I was wrong.”

Big ups to the Courant Institute of Mathematical Sciences at NYU, where I’m proud to say I spent many undergraduate days studying the infinite dopeness of math.

Pandora’s (Beat)Box

By Rathan Haran, May 13, 2009 12:04 pm

The story of Pandora and her box goes a little something like this.  Zeus gets pretty pissed off at this guy Prometheus for throwing fire in a game of RoShamBo and, in retaliation, creates this woman Pandora to punish all of mankind (OK, I’m speculating about the game of Rock-Paper-Scissors, but I’d get pretty upset if I got beaten by someone using their once-in-a-lifetime throw of fire).  Pandora is given many seductive gifts by the gods, and one in particular, the gift of curiosity, leads her to open a box, releasing all the awfulness into the world (including the credit crisis).  Realizing what she has just done, Pandora quickly slams the box shut, trapping only Hope inside … or maybe not.  After using Pandora.com, it is easy to see why President Obama sees so much hope in the world.

Pandora is an online, streaming music player where users can “customize” their own radio stations.  It’s absolutely brilliant, since the only thing you really have to do is enter a song or artist, and Pandora will automatically stream music that is similar to that song or artist.  Now, instead of spending hours customizing the perfect 80s party playlist (or mix tape/CD for the romantics out there), we can just tell Pandora what song fits our mood at the time and get hours of music delivered to us.  Best of all, if a song comes on and we’re not sure why it was there in the first place (Blame It on the Rain made it onto all my mix tapes for some reason), we can easily skip it, and Pandora will exclude songs like it.

So how does Pandora do this?  Well, they’ve hired a team of music analysts who essentially measure each song on 100+ musical characteristics, an idea inspired by the Music Genome Project.  These characteristics, or metrics, make up the “genes” of a song, and their measurements are used to construct a song vector, a mathematical attempt to value the essence of a song.  The similarity of two songs is figured out by measuring the differences between all the musical characteristics of the two songs.  To do this well, Pandora uses a complex distance function, which is essentially asking “how far apart, or different, are these two songs?”  The shorter the distance, the more similar the songs are, and the more likely that song will be played next in your Pandora station.
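To make the song vector idea concrete, here is a minimal sketch that uses plain Euclidean distance as a stand-in.  Pandora’s actual distance function isn’t described here, so treat this as an assumption; the three-number vectors, the song names, and the function names are all invented for illustration (a real song vector would have 100+ entries).

```python
import math


def song_distance(a, b):
    """Euclidean distance between two song vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("song vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(seed, catalog):
    """Return the title in catalog whose vector is closest to the seed."""
    return min(catalog, key=lambda title: song_distance(seed, catalog[title]))


# Made-up measurements on three made-up characteristics.
seed = [0.8, 0.2, 0.5]
catalog = {
    "song_a": [0.1, 0.9, 0.5],
    "song_b": [0.7, 0.3, 0.5],
    "song_c": [0.8, 0.2, 0.0],
}
next_song = most_similar(seed, catalog)
print(next_song)
```

The song with the shortest distance to the seed is the one most likely to play next; skipping a song could be modeled by removing its neighbors from the catalog.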

This is a very powerful framework, but there is one important assumption that shouldn’t be overlooked, and it could be a major drawback to implementing this particular recommendation engine.  That critical assumption is that we have identified every factor that captures the je ne sais quoi of a song, which for the non-French speaking means an intangible quality that makes something distinctive.  Do you smell the conundrum brewing?  How does one measure the intangible?  Can you find all the right factors to accurately describe Kris Allen’s performance of Kanye West’s Heartless?  Now while it might be next to impossible to figure out everything that makes a song click, it is very important that you catch the most influential factors in your recommendation model.  Failure to do this could get you voted off.

Pandora is doing a pretty damn good job recommending songs using this framework, and they understand that there are a lot of factors that make a song a unique piece of work.  They have developed a framework where they have identified a lot of the measurable, tangible metrics, and have used them to effectively relate songs to one another.  The next big step in recommendation models would be to understand how each individual values a song and what aspects are more important on a case-by-case basis, and eventually to deliver a personalized Rathan and Rathan’s Infinite Playlist just for me.
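One hedged way to picture that next step is a per-listener weighted distance, where each listener’s weights say which characteristics matter most to them.  This is only a sketch of the idea, not anything Pandora is known to do; the two characteristics, the weights, and the names below are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance where each characteristic is scaled by a weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def pick_for_listener(seed, candidates, weights):
    """Return the candidate title closest to the seed for this listener."""
    return min(
        candidates,
        key=lambda title: weighted_distance(seed, candidates[title], weights),
    )


# Two made-up characteristics: [tempo, vocal style].
seed = [0.5, 0.5]
candidates = {
    "same_tempo": [0.5, 0.9],   # matches the seed on tempo only
    "same_vocals": [0.9, 0.5],  # matches the seed on vocal style only
}
tempo_fan = [1.0, 0.1]  # cares mostly about tempo
vocal_fan = [0.1, 1.0]  # cares mostly about vocal style

print(pick_for_listener(seed, candidates, tempo_fan))
print(pick_for_listener(seed, candidates, vocal_fan))
```

The same seed song sends the two listeners to different next songs, which is the whole point of valuing a song on a case-by-case basis.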

Whatchu talkin’ about Warren

By Rathan Haran, May 8, 2009 1:19 pm

Berkshire Hathaway held its annual shareholders’ meeting last Saturday (May 2), and Warren Buffett and Charlie Munger totally hated on the “higher-order” mathematics used in finance.  Come on guys, what did little ol’ math do to you?  Math and modern portfolio theory were picked on by these investment gurus more than Arnold was picked on by the Gooch!  Don’t worry math, I got your back.

The truth of the matter is that while Mr. Buffett and Mr. Munger are right about Wall Street’s reliance on complex math, the real blame should be focused on the consultants and investment managers who hawked these models as the end-all, be-all, best thing since sliced bread.  This is one case where it is totally fine to shoot the messenger in the face.  However, we shouldn’t abandon using math to help us make better decisions.  We just need to find a better translator, because the message has some very valuable insights.

The reason we build financial models, or really any models, is to keep track of numerous and complex relationships, something that is very difficult to do in our heads.  The world does not move in simple, predictable ways, and the real value in modeling frameworks is to find the best representation of how the world actually behaves.  Sometimes a simple relationship just doesn’t make sense; Mr. Buffett would surely agree that modeling investment growth as a simple linear change is not nearly as good as modeling it as an exponential change (there are a number of high school curricula that consider this “higher math”).
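Here is a quick sketch of how far apart those two models drift.  The $1,000 principal, 10% annual rate, and 30-year horizon are made-up numbers chosen only for illustration, and the function names are mine.

```python
def linear_growth(principal, rate, years):
    """Simple (linear) growth: the same dollar gain every year."""
    return principal * (1 + rate * years)


def compound_growth(principal, rate, years):
    """Exponential growth: each year's gain is earned on the new balance."""
    return principal * (1 + rate) ** years


principal, rate, years = 1000.0, 0.10, 30
linear_value = linear_growth(principal, rate, years)      # about 4,000
compound_value = compound_growth(principal, rate, years)  # about 17,449
print(round(linear_value, 2), round(compound_value, 2))
```

Over 30 years the linear model understates the compounded balance by more than a factor of four, so the “simple” model is simply the wrong picture of how the money behaves.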

The key is to fully understand, and make transparent, that as we increase the complexity of our models, we increase the number of things that can go wrong and therefore decrease our certainty of performance.  Think back to our first calculator, which for a lot of us often doubled as our first watch (wicked).  Simple, easy, and reliable.  Now add in a 2.66GHz Intel processor, 8GB of RAM, 320GB of storage, and a super-fly, aluminum-cased, glow-in-the-dark keyboard.  We have a kick-ass laptop that lets us do all sorts of things a whole lot better, but it’s not surprising that its average lifespan is somewhere around 2 to 4 years.  And when it goes, we lose everything (yes, even that awesome illegally downloaded music collection that was the envy of our less tech-savvy and risk-averse friends).  The funny thing is that the Casio can still multiply two five-digit numbers, even after 20+ years!  But that doesn’t make it better.

Unfortunately, the certainty of performance only really bothers us in the worst of times, like when our computers crash and the stock market collapses.  Now, just like backing up our hard drives, there are ways that we can create more security around financial modeling.  A few things that come to mind are good stress-testing frameworks (if your models can’t do this easily for you, then be very cautious with their results), putting good translators (i.e., people who get how the model works AND understand its limitations) in front of decision makers early and often, and moving to a risk-based incentive compensation model (a discussion for another time).

Modeling frameworks are very useful, but they shouldn’t be used as a reason to stop thinking about what we are doing.  The human element in analyzing data can never be replaced by a pure modeling framework.  We shouldn’t cite the blatant disregard of rational thought by high-paid consultants and star investment analysts as a failure of mathematical modeling.  Because remember, when you point your finger at your model, there are three fingers pointing back at you … wait for it … wait for it … okay, you got it, cool.

Overage!

By Rathan Haran, May 5, 2009 11:12 pm

I came across this from my buddy Hamish, who by the way just finished his MBA at London Business School and is starting an exciting investment banking job at Credit Suisse.  Congrats buddy!

fail owned pwned pictures

UPDATE:  My friend Justin just informed me that the actual formula in here is .002 + e ^ (i * pi) + Sum (1/(2^n)) from n = 1 to infinity.  This reduces to .002 - 1 + 1, which is much more interesting than .002 + (some random number) + 1.  This is a brilliant response to George Vaccaro’s unbelievable encounter with Verizon’s billing department.
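For anyone who wants to double-check Justin’s arithmetic, here is a small numerical sketch.  It uses floating point, so e ^ (i * pi) comes out as -1 plus a tiny imaginary rounding error, and the infinite sum is cut off after 59 terms (my choice), which is already indistinguishable from 1.

```python
import cmath
import math

# Euler's identity: e^(i*pi) = -1 (up to floating-point rounding).
euler_term = cmath.exp(1j * math.pi)

# Geometric series 1/2 + 1/4 + 1/8 + ... which converges to 1.
geometric_sum = sum(1 / 2**n for n in range(1, 60))

total = 0.002 + euler_term.real + geometric_sum
print(round(total, 6))
```

The -1 and the +1 cancel, so all that is left is the .002.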

Lost in Translation

By Rathan Haran, May 4, 2009 2:49 pm

There are 10 types of people in this world, those that know binary and those that don’t.

Quant Finance D-Bag

Monday Morning - Quant Finance Department
