Urban transportation expertise gives HuMNet Lab an edge in MIT Big Data Challenge

April 1, 2014

By Denise Brehm
Civil & Environmental Engineering

When the MIT Big Data Challenge asked, “What can you learn from data about 2.3 million taxi rides?” graduate students in Professor Marta González’s research lab had some answers.

Drawing on their experience writing machine-learning algorithms that find meaningful patterns in very large data sets, and their skill in applying those patterns to understand how people use transportation in urban areas, the students were able to predict the number of taxi pickups that had occurred in 700 time intervals at 36 locations in the Boston area.

This image from Jameson Toole's scientific visualization of Boston taxi data shows the roads used by taxis that picked up people in Boston's financial district on an average day. Bright white indicates that one or more taxis traveled on a road after leaving the financial district. Colors shading from pink to purple indicate increasing taxi use on a road segment. Image / Jameson Toole

Their predictions were the best in the competition, earning them the number one spot and $4,000 in prize money. The scientific visualization of the data prepared by one team member garnered a second-place prize and an additional $1,000. The awards were announced in mid-March.

Graduate student Yingxiang Yang of the Department of Civil and Environmental Engineering (CEE) led the prediction team, which included CEE graduate students Lauren Alexander, Serdar Colak and Suma Desu, and Engineering Systems Division graduate student Jameson Toole, whose visualization won second prize.

Experience in city science applications gives an edge

The students work with González, an assistant professor in CEE, whose Human Mobility and Networks (HuMNet) Lab sifts through massive repositories of passive data generated by cellphones and other networked systems. The lab uses methods from statistical physics and network theory to identify relevant patterns and make inferences about human mobility and other aspects of city science.

Yingxiang Yang

Their familiarity with human mobility in urban areas gave the team an edge in the MIT Big Data Challenge, Yang and Toole say.

“The key, and the hard part really, is to figure out what features in all these data sets are going to be useful and which can be ignored,” Yang says. For instance, they already knew that precipitation is the most relevant weather predictor in transportation decisions, so they could ignore other weather data.

The MIT Big Data Challenge: Transportation in the City of Boston was sponsored by MIT’s Computer Science and Artificial Intelligence Laboratory, the City of Boston and Transportation@MIT. The goal was for competitors to develop algorithms that could take several large data sets and anticipate the need for taxis in prescribed locations around Boston. Some also created compelling spatial visualizations to convey the data in insightful ways.

Competition officials provided roughly six months of data, including hourly weather conditions, numbers and locations of taxi dropoffs and pickups divided into two-hour intervals, transaction data from the MBTA, information about events, and geolocations of tweets. The taxi pickup data omitted information from 700 of the two-hour intervals, and teams were required to predict that missing information.
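
For readers curious about what such a prediction task looks like in practice, here is a minimal illustrative sketch in Python. It is not the team's method, which the article does not describe. It assumes a hypothetical record layout (location, weekday, two-hour slot, a rain flag, and a pickup count) and simply averages the known pickups for matching conditions, falling back to an average that ignores rain when no exact match exists.

```python
from collections import defaultdict
from statistics import mean


def fit_baseline(records):
    """Build a lookup of average pickups.

    `records` is an iterable of hypothetical tuples:
    (location_id, weekday, slot, rained, pickups),
    where `slot` indexes a two-hour interval within the day.
    """
    groups = defaultdict(list)
    for loc, weekday, slot, rained, pickups in records:
        # Average conditioned on precipitation.
        groups[(loc, weekday, slot, rained)].append(pickups)
        # Fallback average that ignores precipitation.
        groups[(loc, weekday, slot)].append(pickups)
    return {key: mean(values) for key, values in groups.items()}


def predict(model, loc, weekday, slot, rained, default=0.0):
    """Estimate pickups for an interval whose count is missing."""
    exact = (loc, weekday, slot, rained)
    if exact in model:
        return model[exact]
    return model.get((loc, weekday, slot), default)


# Made-up example data, for illustration only.
records = [
    (1, 0, 4, False, 10),
    (1, 0, 4, False, 14),
    (1, 0, 4, True, 20),
    (1, 1, 4, False, 6),
]
model = fit_baseline(records)
estimate = predict(model, loc=1, weekday=0, slot=4, rained=True)
```

A baseline like this only captures regular weekly rhythms and the rain effect mentioned above; competitive entries would be expected to use richer features and models.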

While many of the other teams placing among the top finalists (250 teams registered for the competition) focused on machine-learning tools and/or artificial intelligence as an end in themselves, that’s only the beginning of the process for the HuMNet Lab.

Jameson Toole

“I always tell my s