Urban transportation expertise gives HuMNet Lab an edge in MIT Big Data Challenge

April 1, 2014

By Denise Brehm
Civil & Environmental Engineering

When the MIT Big Data Challenge asked, “What can you learn from data about 2.3 million taxi rides?” graduate students in Professor Marta González’ research lab had some answers.

Drawing on their experience writing machine-learning algorithms that find meaningful patterns in very large data sets, and their skill in applying those patterns to understand how people use transportation in urban areas, the students were able to predict the number of taxi pickups that had occurred in 700 time intervals at 36 locations in the Boston area.

This image from Jameson Toole's scientific visualization of Boston taxi data shows the roads used by taxis that picked up people in Boston's financial district on an average day. Bright white indicates one or more taxis traveled on a road after leaving the financial district. Pink increasing in hue to purple correlates with increased taxi use on a road segment.  Image / Jameson Toole

Their predictions were the best in the competition, earning them the number one spot and $4,000 in prize money. The scientific visualization of the data prepared by one team member garnered a second-place prize and an additional $1,000. The awards were announced mid-March.

Graduate student Yingxiang Yang of the Department of Civil and Environmental Engineering (CEE) led the prediction team, which included CEE graduate students Lauren Alexander, Serdar Colak and Suma Desu, and Engineering Systems Division graduate student Jameson Toole, whose visualization won second prize.

Experience in city science applications gives an edge

The students work with González, an assistant professor in CEE, whose Human Mobility and Networks (HuMNet) Lab sifts through massive repositories of passive data generated by cellphones and other networked systems. Using methods from statistical physics and network theory, the lab identifies relevant patterns and makes inferences about human mobility and other aspects of city science.

Yingxiang Yang

Their familiarity with human mobility in urban areas gave the team an edge in the MIT Big Data Challenge, Yang and Toole say.

“The key, and the hard part really, is to figure out what features in all these data sets are going to be useful and which can be ignored,” Yang says. For instance, they knew already that precipitation is the most relevant weather predictor in transportation decisions, so they could ignore other weather data.

The MIT Big Data Challenge: Transportation in the City of Boston was sponsored by MIT’s Computer Science and Artificial Intelligence Laboratory, the City of Boston and Transportation@MIT. The goal was for competitors to develop algorithms that could take several large data sets and anticipate taxi need in prescribed locations around Boston. Some also created compelling spatial visualizations to convey the data in insightful ways.

Competition officials provided roughly six months of data including hourly weather conditions, numbers and locations of taxi dropoffs and pickups divided into two-hour intervals, transaction data from the MBTA, information about events, and geolocations of tweets. The taxi pickup data omitted information from 700 of the two-hour intervals, and teams were required to predict that missing information.
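
For readers curious about what such a task looks like in practice, the following is a minimal sketch of a naive baseline, not the HuMNet team's method, which the article does not detail. It assumes the known pickup counts arrive as (location, interval start, count) records and estimates a missing interval from the average of known counts at the same location, day of week and two-hour slot. The function names, record layout and sample values are all invented for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def slot_key(location, start):
    """Group by location, day of week, and two-hour slot of the day."""
    return (location, start.weekday(), start.hour // 2)


def fit_baseline(observed):
    """Average the known pickup counts per slot and per location.

    `observed` is an iterable of (location, interval_start, pickups).
    """
    by_slot = defaultdict(list)
    by_location = defaultdict(list)
    for location, start, pickups in observed:
        by_slot[slot_key(location, start)].append(pickups)
        by_location[location].append(pickups)
    slot_means = {key: mean(vals) for key, vals in by_slot.items()}
    location_means = {loc: mean(vals) for loc, vals in by_location.items()}
    return slot_means, location_means


def predict(model, location, start):
    """Estimate pickups for an interval that is missing from the data."""
    slot_means, location_means = model
    key = slot_key(location, start)
    if key in slot_means:
        return slot_means[key]
    # No matching slot seen: fall back to the location average, then zero.
    return location_means.get(location, 0.0)


# Invented sample records; the names and counts are illustrative only.
observed = [
    ("loc_a", datetime(2012, 5, 7, 8), 40),
    ("loc_a", datetime(2012, 5, 14, 8), 50),
    ("loc_a", datetime(2012, 5, 8, 8), 10),
]
model = fit_baseline(observed)
estimate = predict(model, "loc_a", datetime(2012, 5, 21, 8))
```

With these invented records, the estimate for the missing interval is the average of the two known counts (40 and 50) that share its location, weekday and time slot. A competitive entry would presumably go further and fold in the other data sets, such as weather or events, which this sketch ignores.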

While many of the other top finalists (250 teams registered for the competition) focus on machine-learning tools and artificial intelligence as ends in themselves, that’s only the beginning of the process for the HuMNet Lab.

Jameson Toole

“I always tell my students: Use human intelligence to inform artificial intelligence,” says González. “We want to apply our results to real-world problems.”

This is why Toole, in preparing his visualization, determined the likely routes taken by taxis on airport runs. “Our brain doesn’t think in terms of census tracts; it thinks about streets,” Toole says. “So I wanted to map things to roads because that’s the way we know the city.”

Instead of showing only the basics (the number of taxi pickups and dropoffs by date and time of day), he added the numbers per census tract, displayed when the cursor rolls over a tract. The visualization also goes beyond the traditional heatmap to show taxi routes and the magnitude of road use when a user clicks on a census tract.

He included the census tract boundaries, because the demographics available for census tracts are valuable information for the HuMNet Lab to use in later research for making inferences about population groups and activities at locations visited.

HuMNet Lab mines passive data to find underlying motifs

Examples of González’ and the HuMNet Lab’s research include mining cellphone and census data to pinpoint the feeder roads and source communities that generate most of the traffic congestion in the Boston and San Francisco metropolitan areas, and discovering common underlying motifs in the daily travel behavior of entire populations of cities on different continents.

Marta González

González hopes her work can one day be fed back online to the urbanites whose data she uses, helping them make better travel decisions and allowing them to interact with the information to help other users.

She teaches a graduate subject on big data, 1.204 Transportation Networks, and beginning next spring will offer a new undergraduate subject, 1.022 Urban Networks, that will draw on engineering, applied mathematics, computer science and statistical physics to analyze real-world data sets. She’s also excited about a n