Posts tagged: statistics

“So this (Fantasy Baseball) team is perfect”

By Rathan Haran, March 2, 2011 12:50 pm

It’s almost fantasy baseball time again, and although my slow-starting roster made a respectable run to finish in 4th, I didn’t recapture the title I won in 2008.  The consolation prize is that our commissioner, Hans Ruddilicious, failed to win a title for the 9th straight season, settling for his record-setting fourth 2nd place finish (see Bills, Buffalo – 1990, 1991, 1992, 1993).

Last year’s champ christened this year’s bulletin board with this fine piece of smack talk:

“i just took a quick glance at the initial rankings and the top 10 players looked like the 2010 Championship Bag of Poo lineup. weird…”

This got me thinking about what it would take to build a team that would perfectly beat any other roster of players, even if you were able to pick the same players (barring obvious ties).  It would seem that a baseball season, with 162 games, would provide enough data for statistical analysis, although we’d probably want to dampen the volatility a bit by looking at it on a weekly basis.  In this case, building a Monte Carlo simulation model and running thousands of scenarios could give you a good idea of which players are worth a high draft pick.
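To make the Monte Carlo idea a little more concrete, here is a minimal sketch in Python.  Everything in it is an assumption for illustration: the player names, the weekly point projections, and the choice to model each player’s weekly score as an independent normal draw.  A real model would need real projections and a better distribution.

```python
import random

# Hypothetical weekly point projections: (mean, standard deviation) per player.
# These numbers are invented for illustration, not real projections.
ROSTER_A = {"Slugger": (22.0, 9.0), "Ace": (25.0, 12.0), "Speedster": (15.0, 6.0)}
ROSTER_B = {"Veteran": (20.0, 5.0), "Closer": (14.0, 8.0), "Rookie": (18.0, 11.0)}


def simulate_week(roster, rng):
    """Draw one week of points for every player and return the team total."""
    return sum(rng.gauss(mean, sd) for mean, sd in roster.values())


def win_probability(roster_a, roster_b, weeks=10_000, seed=2011):
    """Estimate how often roster_a outscores roster_b across simulated weeks."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    wins = sum(
        simulate_week(roster_a, rng) > simulate_week(roster_b, rng)
        for _ in range(weeks)
    )
    return wins / weeks


if __name__ == "__main__":
    p = win_probability(ROSTER_A, ROSTER_B)
    print(f"Roster A wins about {p:.1%} of simulated weeks")
```

Swapping one player in and out of a roster and re-running the simulation gives a rough sense of how much that player moves the weekly win rate, which is one way to think about draft value.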

This would be a pretty good start, but what about other factors like injury proneness, skills progression/deterioration, team/lineup changes, and to a lesser degree strength of schedule and home field advantages/disadvantages?  It would be a pretty cool analysis to do to try to capture a bit of an edge in your fantasy league (i.e., look for Give It More Hand to return to the top of the standings this year).  Plus, it would be pretty fun to put our fantasy expertise to the test, especially for bragging rights.

I came across this site that is bringing this concept to life, although not for fantasy baseball.  fantisserie.com looks like a game that lets you put your fantasy shit-talking to the test in a weekly competition for cash prizes.  The entry fee is pretty nominal, about the cost of most iPhone and Android apps or a drunk impulse purchase of a Slim Jim from the local deli at 4:30AM, and they are offering a pretty large cash prize to anyone who can hit perfection for a week.

Not a bad trade-off, much like indigestion for the tasty deliciousness of previously mentioned 4:30AM Slim Jim.  Gotta have beef, gotta have spice, need a little excitement.  SNAP INTO A SLIM JIM.

How Bad is my Fantasy Baseball Team?

By Rathan Haran, May 3, 2010 8:54 pm

I’m sitting in 9th place, 67 whopping points behind the league leader after the first month of the baseball season.  That’s pretty awful.  The question is, do I have any hope?  I decided to take a statistical approach to see if I should jump ship on my team, or try to hold through this miserable start.

The first thing I looked at was the historical monthly averages and standard deviations of the players on my team.  When I did this, I got a pretty ugly picture for April, but some hope for the future.

The green represents the categories where my players are performing better than their historical averages, while the red …. well, you know.  A lot of red on this chart … a lot.  Should I be selling low right now?

The data is telling us that April 2010 was an uncharacteristically bad month for my team when compared to typical months in my players’ careers.  If the statistics hold, I should expect a reasonable bounce back from nearly all my players for the rest of the year.  I should hold out a little longer before making any significant panic moves, but I will certainly be looking for improvements across the board.  I’ll be keeping a close eye on riskier players like Ichiro (age) and Hamilton (injuries), and they will be the guys I’d look to move for a younger/healthier, equally poor-starting ‘star.’
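The comparison against historical monthly averages and standard deviations boils down to a z-score.  Here is a small Python sketch of that calculation; the home run totals are hypothetical numbers made up for the example, not real data from my roster.

```python
from statistics import mean, stdev


def monthly_z_score(history, current):
    """Return how many standard deviations `current` sits from the historical mean."""
    return (current - mean(history)) / stdev(history)


# Hypothetical monthly home run totals for one player (not real data).
past_months = [5, 7, 6, 4, 8, 6, 5, 7]
april_2010 = 2

z = monthly_z_score(past_months, april_2010)
print(f"April z-score: {z:.2f}")
```

A strongly negative z-score says the month was unusual relative to the player’s own history, which is the case for holding.  It says nothing about whether age or injuries have changed the underlying player, which is why guys like Ichiro and Hamilton still deserve a closer look.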

What have I done so far?  I’ve moved Molina for the ‘no timeshare’ Miguel Olivo in Colorado, and I’ve gotten Lance Berkman back to take the spot of Curtis Granderson (who actually wasn’t that much of a drag on my team since I only play him against right-handers; check his career splits).   If Rich Harden can put it together, Greinke gets some offensive help, and Dave Duncan has really fixed Brad Penny, I might have a fighting shot.

Your turn.  Who would you target right now in a trade, and who would you move off my team?

Stats in Sociology – Not Boring Version

By Rathan Haran, December 30, 2009 11:23 am

Hans Rosling shows you how to make stats cool again.  Visualization of data is a great learning tool, and creating ways to do this will be of great value to those with less mathematical sophistication than others.

The insights that Hans Rosling was able to illustrate in this presentation were pretty amazing as well, and it was definitely surprising to see how much the world has leveled in the past 40 years.  Check it out:

Bill Belichick + Statistics = Usually Good Outcomes

By Rathan Haran, November 16, 2009 3:48 pm

I’ve been a die-hard Giants fan for as long as I can remember, and although I was technically alive for the 86 season and barely remember parts of it (and nothing from that Super Bowl), the 90 Super Bowl team made me love the G-Men.  They had me at Mark Ingram’s 3rd and 13 conversion.

Bill Belichick was the Giants’ defensive coordinator in that Super Bowl and designed a genius defensive game plan, predicated on the statistics of his defensive unit.  Knowing that Jim Kelly and the Buffalo Bills could rip apart the Giants’ secondary, he had his defensive linemen and linebackers give up yards on 1st and 2nd down.  He believed that this would push Buffalo to run the ball rather than pass in longer 3rd down situations, where the Giants had been statistically strong all season.

His gutsy calls are not limited to defense.  The 07 Patriots’ offense was a great example of not playing into a defense’s strengths (case in point, going to five WR sets against the top ranked Minnesota run defense), and of taking gratuitous, unsportsmanlike advantage of your own strengths when the game is well out of hand.  Thankfully, honor prevailed and the Giants laid the smack-down on the Patriots to win their 3rd Super Bowl and squash a rather presumptuous book before it made its way onto bookshelves.

Last night’s Patriots-Colts game is just another example of Bill Belichick making football analysts (like Mike Francesa) and armchair quarterbacks look like idiots.  It was absolutely the right move to go for it on 4th and 2, and the odds were completely in his favor, as described on Advanced NFL Stats:

“With 2:00 left and the Colts with only one timeout, a successful conversion wins the game for all practical purposes. A 4th and 2 conversion would be successful 60% of the time. Historically, in a situation with 2:00 left and needing a TD to either win or tie, teams get the TD 53% of the time from that field position. The total WP for the 4th down conversion attempt would therefore be:

(0.60 * 1) + (0.40 * (1-0.53)) = 0.79 WP

A punt from the 28 typically nets 38 yards, starting the Colts at their own 34. Teams historically get the TD 30% of the time in that situation. So the punt gives the Pats about a 0.70 WP.

Statistically, the better decision would be to go for it, and by a good amount.”
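The arithmetic in that quote is easy to check in a few lines of Python.  The probabilities below are the ones quoted above; the function names are my own, and the break-even calculation is an extra step I added using the same numbers.

```python
def go_for_it_wp(p_convert, p_td_after_fail):
    """Win probability when going for it.

    A successful conversion is treated as a win; after a failure the
    defense still wins unless the opponent scores a touchdown.
    """
    return p_convert * 1.0 + (1 - p_convert) * (1 - p_td_after_fail)


def punt_wp(p_td_after_punt):
    """Win probability when punting: win unless the opponent scores a TD."""
    return 1 - p_td_after_punt


def break_even_conversion(p_td_after_fail, p_td_after_punt):
    """Conversion rate at which going for it and punting are equally good."""
    return (p_td_after_fail - p_td_after_punt) / p_td_after_fail


go = go_for_it_wp(p_convert=0.60, p_td_after_fail=0.53)
punt = punt_wp(p_td_after_punt=0.30)
needed = break_even_conversion(p_td_after_fail=0.53, p_td_after_punt=0.30)

print(f"Go for it: {go:.2f} WP, punt: {punt:.2f} WP")
print(f"Break-even conversion rate: {needed:.0%}")
```

With these inputs, going for it only needs to succeed a bit more than 43% of the time to beat the punt, so the call holds up even if you think the 60% conversion estimate is generous.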

It didn’t work out for Bill last night, but the decision was sound, and in the long run, he’s going to come out on top more often than not.  It’s why he’s a great NFL coach, and we shouldn’t be convinced that our “conventional” wisdom is better than his statistical prowess.  Just be content knowing that he’s a prick and move on.

Predicting the World Series using Python

By Rathan Haran, October 29, 2009 7:45 am

Last week, I started learning Python through a peer-to-peer learning session set up through nextNY.  The material that we’ve gone through has made programming very easy to wrap our heads around, and the environment of cooperative learning has been awesome.  I’m looking forward to being a Python ninja* pretty soon.

With four and a half chapters of Python at my disposal, I wanted to put my skills to the test.  Since I’m a huge baseball fan, I thought I’d try my hand at simulating who would lose the World Series this year, a pillow-fight match-up between the New York Yankees and the Philadelphia Phillies.

The first thing to do was to crunch the numbers.  Crunching the numbers means exactly that, figuring out the probabilities of events occurring over a seven game series.  I incorporated things like Ryan Howard’s immense strike-out rate, Derek Jeter’s incredible lack of range at shortstop, and Brad Lidge’s ninth inning ERA.  I also made sure to incorporate correlations, or how related the variables are to each other.  Funny enough, the highest correlation I found was between having a runner on first base with fewer than two outs from the seventh inning onwards and A-Rod weakly grounding into a double-play.  Numbers never lie.

Now this got me a pretty good picture of who would lose the World Series, but I hadn’t taken into consideration the qualitative variables, the intangibles, the “Cole Hamels is a play-off pitcher” and the “Mariano is unhittable in the World Series” bullshit.  These are usually the ‘statistics’ that overzealous fans throw out (with no meaningful data except their distorted memories) in defense of a player’s immortality.

The classic intangible lies on the shoulders of the Yankee captain, Derek Jeter, a ball player who seems to find himself in the right place at the right time in the postseason.  Yankee fans have constantly spouted his ‘greatness’, and refuse to admit that he was horribly out of position on the Jeremy Giambi play at the plate, and that he doesn’t even register as having the highest batting average in a World Series (that designation goes to Billy Hatcher, who hit a sickening .750 for the Reds in 1990 in 12 ABs).  Heck, Jeter doesn’t even deserve the nickname “Mr. November” for his play in the 2001 World Series.  He had 1 HR, 1 RBI, and 2 runs scored in November, numbers that were almost matched by a pitcher for the Arizona Diamondbacks (1 RBI and 2 runs scored).  Oh, and that pitcher also won two potentially series-ending games in two days that November with a 2.22 ERA, 0.96 WHIP, and 8 Ks in 8.1 innings.  Derek Jeter, I’d like you to meet the real “Mr. November,” Randy Johnson.

Okay, so I wrote my little Python program to capture all of this.  The stats, the pseudo-stats, the Phillie Phanatic’s rants, and the countless times we’ll hear “26 World Series rings.”  With so many probabilities and interactions, this program chugged along for two days, and finally, yesterday before the first pitch, I got the result:  ValueError: Let’s Go Mets.

*Looking forward to the day when ninja is not used in start-up world employment searches and reverts back to its original awesomeness of stealthy nighttime assassin.

Blackjack, Basic Strategy, Battle of Wits – Part III

By Rathan Haran, August 5, 2009 12:28 pm

Have you ever been at a blackjack table and accidentally hit a hard 14 with the dealer showing a 5 while playing basic strategy?  Replay it in your mind: you bust on the King, the dealer makes his 21 on a 6, and the entire table gives you the death stare and curses your first born, all while mumbling under their breath, “Never hit on a hard 14 with the dealer showing a 5, idiot. That King was the dealer’s bust card. We all would have won.”  Tough room.

First things first, those people have no idea what they are talking about.  There is no such thing as “that was the dealer’s bust card.”  The deck doesn’t know whether the dealer or the player is hitting or staying, and the cards don’t change because of how someone plays their hand.  The probabilities that guide basic strategy haven’t been altered because someone did not make the optimal play, and theoretically the dealer still has the same likelihood of busting (in practice though, since the deck has a fixed number of cards, the distribution of remaining cards changes the underlying probabilities of basic strategy.  Card counting attempts to exploit this by identifying random deck distributions that happen to have a large number of 10-value cards remaining).
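To see how much the mix of remaining cards can matter, here is a rough Python sketch that plays out dealer hands starting from a 5.  The assumptions are mine: cards are drawn with replacement (an “infinite shoe”), the dealer stands on all 17s, and the ten-rich shoe with double the usual share of ten-value cards is a made-up extreme, not a measured casino distribution.

```python
import random

RANKS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # 1 = ace, 10 = any ten-value card

NEUTRAL = [1] * 9 + [4]   # ten-value cards are 4 of every 13
TEN_RICH = [1] * 9 + [8]  # hypothetical shoe with twice the usual share of tens


def dealer_busts(up_card, weights, rng):
    """Play out one dealer hand (stand on all 17s) and report whether it busts."""
    total, soft_aces = 0, 0
    card = up_card
    while True:
        if card == 1:
            total += 11
            soft_aces += 1
        else:
            total += card
        # Demote an ace from 11 to 1 whenever the hand would otherwise bust.
        while total > 21 and soft_aces:
            total -= 10
            soft_aces -= 1
        if total >= 17:
            return total > 21
        # Draw the next card with replacement, weighted by shoe composition.
        card = rng.choices(RANKS, weights=weights)[0]


def bust_rate(up_card, weights, hands=20_000, seed=5):
    """Estimate the dealer's bust probability for a given up-card and shoe mix."""
    rng = random.Random(seed)
    return sum(dealer_busts(up_card, weights, rng) for _ in range(hands)) / hands


if __name__ == "__main__":
    print(f"Neutral shoe:  {bust_rate(5, NEUTRAL):.1%}")
    print(f"Ten-rich shoe: {bust_rate(5, TEN_RICH):.1%}")
```

The dealer busts noticeably more often when the shoe is loaded with tens, which is the effect card counters are chasing.  Nothing about how the guy next to you played his 14 enters into it.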

The important thing to remember here is that basic strategy gives you the probabilistically best play given that the deck has a RANDOM DISTRIBUTION OF CARDS.  That means that if the deck is not random, basic strategy might not be the optimal play.  So what everyone should consider before baptizing themselves in the holy waters of basic strategy is what it takes to make a deck random (and who controls what it takes to make a deck random).

To make a single deck random, the deck must be riffle shuffled about 7 times.  Since suit doesn’t matter in blackjack, and K, Q, J, and 10 hold the same value, you actually need to shuffle a single deck less, about 4 times, to make it random.  Most casino blackjack tables play with 6 or 8 decks at once which are shuffled together and played from a dealer’s shoe.  In order to randomize a shoe of 8 decks, it takes about 12 riffle shuffles.  Does your casino shuffle a shoe 12 times?  Probably not.  Most casinos shuffle a shoe 4 times, and that has some interesting implications when an entire table is playing basic strategy.
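If you want to play with this yourself, here is a small Python sketch of a riffle shuffle.  It uses the Gilbert-Shannon-Reeds model, the standard mathematical model of a riffle and the one behind the “seven shuffles” result; that choice is my assumption, since real dealers riffle more neatly than the model does.  As a crude randomness check it counts rising sequences: a fresh deck has 1, each riffle can at most double the count, and a well-mixed 52-card deck averages about 26.5.

```python
import random


def riffle_shuffle(deck, rng):
    """One Gilbert-Shannon-Reeds riffle: binomial cut, then interleave."""
    # Cut point: flip a fair coin for every card in the deck.
    cut = sum(rng.random() < 0.5 for _ in deck)
    left, right = deck[:cut], deck[cut:]
    shuffled = []
    i = j = 0
    while i < len(left) or j < len(right):
        remaining_left = len(left) - i
        remaining_right = len(right) - j
        # Drop the next card from a packet with probability proportional to its size.
        if rng.random() * (remaining_left + remaining_right) < remaining_left:
            shuffled.append(left[i])
            i += 1
        else:
            shuffled.append(right[j])
            j += 1
    return shuffled


def rising_sequences(deck):
    """Count rising sequences for a deck holding the cards 0..n-1."""
    position = {card: idx for idx, card in enumerate(deck)}
    # A new sequence starts whenever card c+1 sits earlier in the deck than card c.
    return 1 + sum(position[c + 1] < position[c] for c in range(len(deck) - 1))


if __name__ == "__main__":
    rng = random.Random(7)
    deck = list(range(52))
    for shuffle_number in range(1, 9):
        deck = riffle_shuffle(deck, rng)
        print(shuffle_number, rising_sequences(deck))
```

After only a few riffles the count is capped well below what a random deck would show, which is a simple way to see why an under-shuffled shoe can still carry structure from the previous round.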

So let’s take a look at what happens to a shoe when the entire table is playing basic strategy.  The first thing is that anyone who has a strong hand on their first two cards (17+) is instructed to stay, and their cards remain on the table until the deal is over.  Players with weak hands play out their hands, and if they bust, the cards are removed from the table and placed in the discard shoe.  This begins to create layers of cards in the discard shoe: clusters of low cards go in first, followed by clusters of high cards that were left on the table.  Since most casinos do not shuffle the shoe enough times, these layers loosely persist in the new shoe, and are further propagated when the entire table plays basic strategy (some people attempt to exploit this by using a technique called cluster counting).

Clustering of cards creates decks that are not random, which violates one of the critical assumptions that basic strategy is built on.  This creates opportunities for dealers to win/push more hands than basic strategy predicts.  During a high cluster deal, a dealer is likely to have high cards to push, or even beat, the table’s “strong” 19s and 20s.  In the case where low card clusters are being dealt, a dealer will likely have a low up-card, a situation where basic strategy has players stop hitting at about 14.  The thing is that since it’s a low cluster deck, the dealer has a better chance to make a hand!  The player also has a better chance to make a hand, but basic strategy actually advises them not to try.  INCONCEIVABLE!

Basic strategy is still by far the best way to reduce the house edge, but since decks are not completely random, there is certainly room for improvement in game play.  For example, in high cluster deck situations, it may be worthwhile to split face cards, while in low cluster situations, taking another card to try to make a better hand may be your best bet.  Playing this way may add a little bit more excitement to the rule-based approach of basic strategy, as you’d be trying to exploit the rest of the table playing the basic strategy system.  And if it pisses anyone off at your table, just turn to them and say “You fell victim to one of the classic blunders!  The most famous is never get involved in a land war in Asia, but only slightly less well-known is this: never play basic strategy against a dealer when the deck isn’t random!”  I’d get a kick out of that if I heard it at a blackjack table.

SmackDown Headliner – Google VS Facebook

By Rathan Haran, June 23, 2009 12:26 pm

Me at 7, with bigger guns

I haven’t watched WWF, or WWE, or Friday Night Smackdown since I was a kid (see right), but after reading Wired magazine’s article on Google vs. Facebook, I could not help but think about what is, in my opinion, the greatest wrestling match of all time.  This battle pitted the up and coming, wildly popular, eccentric and electric young superstar against the stalwart, power punching, mega-myth champion of the world.  Of course, I’m talking about the headliner at WrestleMania 6 where the Heavyweight Champion of the World Hulk Hogan fought the Intercontinental Champ, The Ulllttiiimmmatteeeee Warrrrrioorrrrrrr!

Champion against champion, title for title, that’s what it’s all about.

Google and Facebook are waging their own war over what the Internet’s future will look like.  They both have an underlying mission to share information, but their core approaches and visions of the web are very different.  Google has historically viewed the web as the great equalizer, the place where information can be accessed by anyone and everyone, and where that information can be efficiently found by harnessing the power of cold, hard algorithms.  Facebook sees the web not as the source of information per se, but rather as the medium through which people can share information across their social net.  Rather than necessarily relying on complex math, Facebook puts the power of human sharing at the forefront of spreading information.

Both of these approaches have their place on the web.  What good is a platform to share information easily from the people that matter most if the people that matter the most can’t find the information in the first place, and vice versa?  In my mind, the bigger challenges lie in front of Facebook, because the future of sourcing information from hundreds of friends (if not thousands for the Facebook junkies “power users”) will come down to powerful ranking, grouping, sorting, and prioritizing algorithms, a space in which Google has done very well.

“So wha’cha gonna do brother … when the Hulkster (read as Google) comes for youuuu (read as Facebook)!”  Well, Facebook has been able to pull some ex-Googlers into their shop, to the tune of nearly 9% of their staff, and they have a virtual lock on the social network space (although I begin to worry about the hipness of it when my parents’ generation is “friending” me).  As difficult as it may seem, they may be putting together the pieces and the relationships to really challenge Google’s web dominance.  And maybe, just maybe, they’ll have enough to gorilla slam the powerhouse, avoid the leg-drop, and big splash their way to the top, just like the greatest character wrestler of all time was able to do.  R.I.P. The Ultimate Warrior.

Bonus Footage:  Top Ultimate Warrior Promos Ever

xkcd

By Rathan Haran, May 21, 2009 5:14 pm

One of my favorite online comics, xkcd, does a great job creating funquations (please see About Webinometry if you are not familiar with funquations).  Here’s their latest one:

fermirotica
