
Before autonomous vehicles scale up, researchers call for stronger scientific standards


A new study finds autonomous mobility-on-demand research must become more transparent and reproducible to guide real-world transportation decisions

As cities and companies move towards fleets of self-driving taxis, a group of researchers is urging the field to build stronger scientific standards first.

In a new study published in IEEE Transactions on Robotics, a team from MIT and collaborating institutions examined a growing problem in research on autonomous mobility-on-demand (AMoD) systems: a lack of reproducibility.

AMoD systems are fleets of self-driving cars that respond to ride requests via an app. They promise to improve urban efficiency by reducing traffic, emissions, and the cost of car ownership. But behind that promise lies a complex challenge: controlling thousands of vehicles in real time so that passenger wait times, traffic, and operating costs are minimized.
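To make the control problem concrete, here is a minimal sketch of one of its simplest ingredients: assigning each incoming ride request to the nearest idle vehicle. The fleet positions, Euclidean distances, and greedy matching rule are toy assumptions for illustration, not the study's actual control strategy.

```python
import math

def nearest_idle_dispatch(vehicles, requests):
    """Greedy baseline: assign each request to the closest idle vehicle.

    `vehicles` maps vehicle id -> (x, y) position; `requests` is a list of
    (x, y) pickup points. Returns (request_index, vehicle_id) pairs.
    """
    idle = dict(vehicles)  # copy, so assigned vehicles can be removed
    assignments = []
    for i, pickup in enumerate(requests):
        if not idle:
            break  # more requests than vehicles: remaining riders wait
        vid = min(idle, key=lambda v: math.dist(idle[v], pickup))
        assignments.append((i, vid))
        del idle[vid]
    return assignments

fleet = {"v1": (0.0, 0.0), "v2": (5.0, 5.0), "v3": (9.0, 1.0)}
pickups = [(8.0, 2.0), (1.0, 1.0)]
print(nearest_idle_dispatch(fleet, pickups))  # → [(0, 'v3'), (1, 'v1')]
```

Real AMoD controllers must go far beyond this greedy rule, balancing future demand, rebalancing empty vehicles, and congestion effects across the whole network.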

“Research in autonomous mobility sits at the intersection of transportation, operations research, robotics, and control,” says Xinling Li, co-lead author and graduate student in civil and environmental engineering at MIT. “Each field has distinct norms for what counts as a reproducible result.”

Co-lead author Meshal Alharbi, a graduate student in electrical engineering and computer science at MIT, says the challenge stems from the many interconnected assumptions that shape these studies.

“Reproducibility becomes especially challenging because the results depend on many interconnected choices at once — the system model, demand assumptions, congestion model, simulator setup,” he says. “If these elements are not reported clearly, the results become difficult to reproduce.”

A hidden problem in simulation-based research

Most AMoD work is tested on simulations. Researchers build mathematical digital models of cities, generate virtual rider demand, and then test dispatching and routing algorithms to see how fleets perform. But the researchers found that those simulations often differ in subtle ways that are not fully documented.

Two studies may appear comparable on paper while actually relying on very different assumptions.

Even small choices about how traffic congestion is modeled or how ride requests are generated can substantially change results. Without clear documentation, it becomes difficult to understand why one method outperforms another.
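A toy sketch can show how a single undocumented assumption shifts a headline metric. Here, two runs use identical demand but different congestion assumptions; the numbers and the load-dependent delay factor are invented purely for illustration.

```python
def simulate_mean_wait(demand, congestion_model):
    """Toy simulation: mean travel time under a given congestion assumption.

    `demand` is a list of trip distances; `congestion_model` maps a trip's
    distance and the current network load to a travel time.
    """
    times = []
    load = 0
    for dist in demand:
        times.append(congestion_model(dist, load))
        load += 1  # each active trip adds to network load
    return sum(times) / len(times)

demand = [2.0, 3.0, 4.0, 5.0]

# Assumption A: free-flow travel; congestion is ignored.
free_flow = lambda dist, load: dist

# Assumption B: travel time grows 25% per concurrent trip (invented factor).
load_dependent = lambda dist, load: dist * (1 + 0.25 * load)

print(simulate_mean_wait(demand, free_flow))       # → 3.5
print(simulate_mean_wait(demand, load_dependent))  # → 5.125
```

Both runs look like "the same experiment" on paper, yet the reported metric differs by nearly 50 percent; if only one of the two congestion assumptions is documented, the comparison is meaningless.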

Gioele Zardini, senior author of the study, the Rudge (1948) and Nancy Allen Career Development Professor, and assistant professor in civil and environmental engineering at MIT, compares the situation to cooking without a clear recipe.

“Think of two chefs following the same recipe but not telling anyone they changed the ingredients or oven temperature,” Zardini says. “When the dishes come out differently and no one knows why, you can’t tell which recipe actually works — or whether either one does.”

Why reproducibility matters for cities

If AMoD research is not reproducible, cities and transportation agencies that rely on these studies could make risky decisions about deploying large-scale autonomous fleets.

“A municipality could deploy a large-scale robo-taxi fleet based on simulation studies that report congestion reduction, but whose assumptions are not fully documented,” says Zardini. “The real-world system may generate more empty vehicle miles, worsen traffic, under-serve certain neighborhoods.”

The problem also slows scientific progress. When researchers leave out important assumptions, data, or details about implementation, other researchers cannot build upon their work.

“What surprised us most,” says Li, “was how rarely code and dataset processing steps are available. Without them, replication is either impossible or requires re-implementing large parts of the work from scratch.”

A framework for research

To address the problem, the researchers created a framework for conducting and reporting AMoD research. They broke the research process into stages: how the transportation system is modeled, how control strategies are designed, how simulations are implemented, and how performance is evaluated. For each stage, they identified common sources of inconsistency and created a checklist for researchers to follow.
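One way to picture such a checklist is as a reporting manifest that flags missing information before a study is published. The field names and the example study below are hypothetical stand-ins, not the paper's actual checklist items.

```python
# Hypothetical reporting checklist, loosely echoing the stages described above.
REQUIRED_FIELDS = [
    "system_model",        # network representation, fleet size, vehicle model
    "demand_assumptions",  # trip data source, arrival process, spatial pattern
    "congestion_model",    # free-flow, volume-delay function, microsimulation
    "simulator_setup",     # time step, horizon, warm-up period, random seeds
    "evaluation_metrics",  # wait time, empty vehicle miles, operator cost
    "code_and_data",       # repository link, dataset preprocessing steps
]

def check_report(report):
    """Return the checklist fields missing from a study's reporting manifest."""
    return [field for field in REQUIRED_FIELDS if not report.get(field)]

study = {
    "system_model": "grid network, 500 vehicles",
    "demand_assumptions": "synthetic Poisson arrivals",
    "evaluation_metrics": "mean wait time, empty vehicle miles",
}
print(check_report(study))  # → ['congestion_model', 'simulator_setup', 'code_and_data']
```

A reviewer or replicator could run such a check in seconds, turning "are the assumptions documented?" from a judgment call into a mechanical test.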

The researchers recommend that modeling assumptions be reported, evaluation metrics be standardized, and simulation code and data be shared for greater transparency. If researchers share their tools and establish clear benchmark scenarios, the field can move towards more comparable results. Such changes could accelerate progress rather than slow it. When research is reproducible, weak approaches are easier to identify, refine, and improve. It also builds trust, both within the scientific community and among policymakers who are considering AMoD adoption.

Building trust in autonomous mobility

The need for stronger scientific standards comes at a critical moment for the autonomous vehicle industry. While robo-taxis are not yet widely deployed, companies like Waymo continue to expand pilot programs in different cities.

“When companies try to scale robo-taxi systems they must rely on large-scale control studies, and if those studies hide assumptions or can’t be replicated the deployed system will behave differently, often worse, than promised,” says Zardini. “That breaks public trust, wastes money, and slows safe roll-out.”

Reproducibility will also be essential as cities try to adapt these systems to their own transportation networks.

If the city of Boston were to launch an AMoD system, the researchers say their framework could help reveal underreported assumptions about travel demand, traffic congestion patterns, passenger behavior, or data preprocessing that can dramatically change outcomes once a system is deployed at scale. By bringing those assumptions into the open, the framework provides a structured way to document the modeling choices, constraints, datasets, and evaluation criteria behind a result, so a city can stress-test policies under different demand scenarios and local conditions before committing to large-scale deployment.

“Our hope is that this paper can support a more transparent, rigorous research culture for future urban mobility and large-scale autonomous systems,” Zardini says.

The study includes researchers from MIT, Stanford University, ETH Zurich, Google DeepMind, the Technical University of Denmark, and the Technical University of Munich.

The research is supported by the U.S. Department of Energy, the Sidara Urban Seed Grant Program at MIT’s Leventhal Center for Advanced Urbanism, and the MIT Amazon Science Hub.
