I analyze data for a living. First @CarnegieEndow, then @Yammer and @Microsoft, now @ModeAnalytics.

Read this first

Are Home Run Derby Hitters Different?

Tonight’s Home Run Derby features ten of the world’s best home run hitters. But the Derby isn’t about just hitting home runs—it’s about seeing how far they fly.

The graphic below explores how this year’s participants’ homers compare to the 40,000 home runs that have been hit over the last seven-and-a-half MLB seasons. Do they hit the ball harder, higher, and farther than average?

Data for the graphic was provided by ESPN. The full dataset for all regular season home runs can be found and analyzed on Mode.

Continue reading →

Jul 23, 2014

Five Public Datasets, and Lots of Ideas for Exploring Them

The world is full of interesting datasets. But even though data is increasingly accessible, it’s sometimes hard to think up an interesting problem to analyze. Maybe there are just too many possible questions, maybe it’s a pain to set up analytical tools, or maybe it’s just too easy to get distracted by animal GIFs.

Whatever the case, we want to make it easier to start working on interesting problems right away. Here are five datasets, already loaded into Mode’s public database, that you can query, analyze, and visualize right now.

For each dataset, I’ve provided a link to the table in Mode’s public data warehouse. If you’re feeling lazy and only want to work with a tiny amount of data (as in, one row), I found the best single row of data from each dataset. And if you’re feeling ambitious—and want to get popular on the internet or explain some things—I added some ideas for turning these...

Continue reading →

Jun 23, 2014

Are Taxi Drivers Racist?

Last week, Chris Whong published a massive dataset of every taxi trip taken in New York in 2013. The data, provided through a Freedom of Information Law request, includes an incredible amount of detail on where trips started, where they ended, when they occurred, how much they cost, and how many passengers there were.

A number of people have already done incredible things with this data, including making a remarkably detailed map of where cabs typically pick up and drop off passengers. A dataset of this detail opens the door for countless questions and angles of exploration.

One such question surrounds accusations that New York City cabs discriminate against potential passengers. A number of anecdotes claim that New York cabs are reluctant to stop for black passengers, especially after dark. Could this new dataset shed any light on this issue?

The dataset, which provides no...

Continue reading →

May 5, 2014

Where Americans Think They Live

Last week, FiveThirtyEight’s Walt Hickey wrote a couple of interesting articles about which states are in the Midwest and South. Being from North Carolina—where people definitely consider themselves Southern—I was surprised to see that only two-thirds of all respondents to the FiveThirtyEight survey said North Carolina was in the South.

This made me wonder: How do people from different states define the South and Midwest? And specifically, how do the views of people who live in a state differ from those of people who don’t live there?

In keeping with their commitment to release an article’s underlying data, FiveThirtyEight published the full survey results on GitHub. Opening this data enables further exploration, as in the interactive graphic below. I looked at how every state in the U.S. views the South and Midwest, and how local opinions compare to national views.

The data...

Continue reading →

Apr 29, 2014

Are the Playoffs Taking Forever?

We’re over ten days into this year’s NBA playoffs, and several of the opening series have made it all the way to…Game 5. The NHL playoffs, which are entering their third week but only the second round, are also in no apparent hurry.

So far, this year’s exciting playoffs are keeping critics quiet. But if tight overtime games and buzzer beaters give way to blowouts and snoozers, the annual complaints about the length of the NBA and NHL playoffs will probably resurface.

So I decided to take a detailed look. Sure, the playoffs feel long, but are they really?

Measuring playoff length isn’t actually straightforward. Because each league structures its playoffs differently, direct comparisons aren’t always appropriate. However, by attempting to standardize the pace and length across leagues—and by collecting data on every regular season and playoff game in the MLB, NBA, NFL, and...

Continue reading →

Apr 16, 2014

Finding the Most Gerrymandered Districts

Yesterday, I came across an interesting Vox.com article discussing Congressional gerrymandering. In one of the article’s cards, author Andrew Prokop highlighted several of the country’s most gerrymandered districts. Having recently crunched some numbers on geographic data, I wondered: why not try to quantitatively define the most gerrymandered districts and states?

Defining Gerrymandering

As Prokop noted, there’s not a great way to determine if a district is gerrymandered. Nevertheless, researchers have proposed a few ideas to approximate it. The proposals largely measure gerrymandering in one of two ways: By calculating how far various points on the district’s boundary are from the district’s geographic center, and by comparing the perimeter of the district to that of a similar-sized district with a regular shape (in this case, a circle). Both calculations are far from perfect—the first...
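The second approach, comparing a district's perimeter to that of a circle with the same area, can be sketched in a few lines. This is only an illustration of the general idea, not the calculation used in the post: the function name `perimeter_ratio` and the example shapes are made up, and real district boundaries would need areas and perimeters computed from projected geographic data.

```python
import math


def perimeter_ratio(area, perimeter):
    """Ratio of a shape's perimeter to the circumference of an equal-area circle.

    A circle scores 1.0; more contorted shapes score higher.
    (Hypothetical helper for illustration only.)
    """
    if area <= 0 or perimeter <= 0:
        raise ValueError("area and perimeter must be positive")
    # A circle with this area has radius sqrt(area / pi),
    # so its circumference is 2 * sqrt(pi * area).
    circle_circumference = 2 * math.sqrt(math.pi * area)
    return perimeter / circle_circumference


# Made-up shapes, all in the same arbitrary units.
circle = perimeter_ratio(math.pi, 2 * math.pi)         # unit circle
square = perimeter_ratio(1.0, 4.0)                     # 1 x 1 square
sliver = perimeter_ratio(100 * 0.01, 2 * (100 + 0.01)) # 100 x 0.01 strip
```

The square and the strip cover the same area, but the strip's long boundary gives it a far higher ratio, which is the kind of signal this measure is meant to pick up.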

Continue reading →

Apr 11, 2014

Plotting the Rest of the Baseball Season

We’re less than two weeks into the 2014 baseball season, and most people would say that it’s too early to make any forecasts about the rest of the year.

Still, as others have noted, though ten games represent only 6% of an MLB season, surely these early games provide some indication of how a team will finish. Does Milwaukee’s 7-2 start mean that they may not be the sub-.500 team they were predicted to be? Are the 4-8 Diamondbacks likely to be even worse than expected?

The graphic below explores this question. It plots the full season for every team over the last 10 years, or 300 seasons in total. By filtering by record, you can see how teams with similar starts fared over the rest of the season, and how this compares to an average season. (Because the graphic is loading nearly 50,000 games, it takes a moment to first display.)

Note that for records with fewer than 5 teams, the...

Continue reading →

Mar 27, 2014

FiveThirtyEight vs. The Oddsmakers

You come at the king, you best not miss. - Omar Little

At the start of this year’s NCAA tournament, FiveThirtyEight, the new website of reigning forecast champion Nate Silver, predicted each team’s chances of making it to different rounds of the tournament. In an update yesterday, FiveThirtyEight looked into how their forecasts were doing. Having made my own predictive bracket based on Las Vegas odds, I figured I’d do the same—and see who comes out on top.

How Did FiveThirtyEight Do?

Rather than simply forecasting winners, FiveThirtyEight’s predictions—like mine—calculate each team’s probability of winning every game. To assess how well these forecasts performed, it’s not appropriate to see how many of their “favorites” won. Instead, it’s better to see if favorites win more or less often than expected. In other words, if FiveThirtyEight identified 100 games in which the favorite had...
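One way to check whether favorites win more or less often than expected is to group games by the forecast probability and compare the average forecast in each group to the actual win rate. The sketch below assumes a simple list of (probability, outcome) pairs. The sample numbers are invented for illustration and are not FiveThirtyEight's forecasts, and `calibration_table` is a hypothetical helper name.

```python
from collections import defaultdict


def calibration_table(forecasts):
    """Group forecasts into 10-point bins and compare expected to actual wins.

    forecasts: iterable of (favorite_win_probability, favorite_won) pairs.
    Returns {bin_start_percent: (games, expected_win_rate, actual_win_rate)}.
    """
    bins = defaultdict(list)
    for prob, won in forecasts:
        # Work in whole percentage points to avoid float edge cases,
        # and fold a 100% forecast into the top (90-100) bin.
        bucket = min(int(round(prob * 100)) // 10, 9)
        bins[bucket * 10].append((prob, won))

    table = {}
    for start in sorted(bins):
        games = bins[start]
        n = len(games)
        expected = sum(p for p, _ in games) / n
        actual = sum(1 for _, won in games if won) / n
        table[start] = (n, expected, actual)
    return table


# Invented sample data: (forecast probability for the favorite, did it win?)
sample = [
    (0.55, True), (0.58, False),
    (0.72, True), (0.75, True), (0.71, False), (0.78, True),
    (0.91, True), (0.95, True),
]
table = calibration_table(sample)
for start, (n, expected, actual) in table.items():
    print(f"{start}-{start + 10}%: {n} games, expected {expected:.2f}, actual {actual:.2f}")
```

A well-calibrated forecast would show expected and actual rates close to each other in every bin, given enough games per bin.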

Continue reading →

Mar 18, 2014

The Odds of Your NCAA Bracket

This year, Warren Buffett promised a billion dollars to anyone who picks a perfect bracket. Unfortunately, the odds aren’t in your favor—the chance of picking a perfect bracket if you pick every game at random is one in 9 quintillion (or 9,000,000,000,000,000,000).

But that’s just a hypothetical bracket. What are the odds of your bracket? The interactive below lets you figure that out. Using the betting lines for each game and each team’s chances of making the Final Four and winning the NCAA Championship, the graphic calculates the odds of every possible NCAA matchup—and every possible NCAA bracket. The graphic also shows how each team affects your bracket’s odds, and which picks lower your chances of winning a billion dollars the most.
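The arithmetic behind a bracket's odds can be sketched as a product of per-game probabilities. This is a minimal illustration, not the method behind the graphic: it assumes a 63-game bracket (a 64-team single-elimination field), treats each value as the chance that a pick is right given that the earlier picks held, and uses a made-up 70% figure for the non-random case.

```python
import math
from fractions import Fraction

GAMES_IN_BRACKET = 63  # assumption: 64-team field, single elimination


def bracket_probability(pick_probabilities):
    """Chance that every pick is right: the product of the per-pick chances."""
    return math.prod(pick_probabilities)


# Picking every game with a coin flip: (1/2) ** 63.
random_bracket = bracket_probability([Fraction(1, 2)] * GAMES_IN_BRACKET)
print(f"1 in {random_bracket.denominator:,}")

# Hypothetical bracket in which every pick has a 70% chance of being right.
favorites_bracket = bracket_probability([0.7] * GAMES_IN_BRACKET)
print(f"1 in {1 / favorites_bracket:,.0f}")
```

The coin-flip case reproduces the one-in-9-quintillion figure, since 2 to the 63rd power is about 9.2 quintillion. Better-than-even picks shrink the odds against you considerably, though they remain very long.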

Click on the bracket to see the interactive

The odds for first round games are calculated using the betting lines for those games. In...

Continue reading →

Feb 27, 2014

Engineering a Best Picture

When Netflix wanted to create a hit TV show, it turned to data. By analyzing its viewers’ habits, Netflix uncovered that its customers particularly liked Kevin Spacey, director David Fincher, and political thrillers. In part because of these interests, Netflix brought the three together to create House of Cards—and thus far, the results have been tremendous.

Having binged our way through Season 2 of House of Cards, we in the entertainment world now turn our attention to the Oscars, and particularly, the race for Best Picture. In doing so, perhaps we could take a page from Netflix’s book. Perhaps, using data about movies and the relationships between them, we can identify a perfect cocktail of movie attributes—PG-13-rated biopics about celebrities, or heart-wrenching World War II stories directed by Steven Spielberg, or anything related to Michael Bay—that strikes every Best Picture...

Continue reading →