Tier IV to build massive autonomous driving dataset with Japan’s NEDO support
Press release, 28 December 2025
Japanese autonomous driving software innovator Tier IV has been selected by the government’s New Energy and Industrial Technology Development Organization (NEDO) to lead a major initiative aimed at creating one of the most comprehensive data resources ever assembled for autonomous vehicle AI training and validation. The project will combine large-scale real-world driving data with synthetic data generated by advanced multimodal artificial intelligence models, helping to overcome one of the biggest bottlenecks in self-driving development — the scarcity of diverse, high-quality scenarios necessary to train and validate autonomous systems.
Tier IV, renowned for its open-source Autoware software used by many vehicle makers and autonomous technology developers worldwide, will gather extensive sensor data from a variety of vehicles, covering different driving environments, weather conditions, traffic densities and rare edge cases that are difficult to capture in ordinary testing. Collecting and accurately labeling this kind of real-world data is traditionally slow, costly and labor-intensive, especially when it comes to unusual but safety-critical situations such as near-miss collisions or complex urban interactions. By integrating automated labeling technologies and optimizing dataset construction workflows, Tier IV aims to make this enormous data trove not only richer but also more accessible and usable for engineers across the industry.
Realistic training data is the lifeblood of AI-driven autonomous driving systems. While modern self-driving stacks ingest data from cameras, LiDAR, radar and other sensors, much of today’s machine learning still struggles to generalize to conditions that are rare yet crucial for safety. To bridge this gap, the Tier IV project will leverage multimodal generative AI: models that can synthesize plausible sensor data across multiple sensor modalities and a wide range of scenarios. The aim is not simply to create more data but to generate the right data: the moments leading up to an accident, unusual weather patterns, nighttime traffic flow and combinations of factors that a typical test fleet might never encounter in millions of kilometers of driving. Such synthetic augmentation can dramatically expand the variety and depth of training sets while reducing dependence on purely real-world captures.
Beyond data collection and synthetic generation, the initiative also includes development of an AI data platform and ecosystem. Tier IV plans to create infrastructure that enables autonomous driving developers, from startups to established automakers, to securely access and use this dataset for research, simulation, validation and algorithm refinement. By lowering the barrier to high-quality data, the project aspires to accelerate innovation across the Japanese autonomous vehicle sector and help smaller players contribute to safety improvements without shouldering the full cost of dataset creation themselves.
The collaboration aligns with broader strategic goals within Japan’s mobility landscape. As the nation encourages safe deployment of advanced driver-assistance systems and autonomous services — including low-speed robotaxis and shuttle services in controlled environments — the availability of extensive, diverse, well-labeled data becomes a competitive advantage. High-level autonomy, particularly beyond SAE Level 3, demands robust models capable of handling complex real-world variability; building a dataset that addresses edge cases and rare circumstances is essential for meeting those requirements.
Machine learning models trained on both real and synthetic data will be essential not just for perception — the ability to recognize and classify objects around the vehicle — but also for prediction and planning systems that forecast other road users’ behavior and make split-second decisions. Synthetic data generated by multimodal AI can help simulate scenarios that would be impractical, unsafe or prohibitively expensive to stage in physical testing, speeding up development cycles and improving safety verification before any model sees the open road.
For Tier IV, which already serves as a cornerstone of many autonomous development efforts through Autoware, this dataset project represents a natural extension of its mission to democratize access to autonomous technology tools and accelerate industry-wide progress. By partnering with NEDO and leveraging cutting-edge AI, the company is positioning itself at the forefront of data-centric approaches to autonomy — an area increasingly recognized as vital for achieving truly reliable self-driving systems.
As the automotive world transitions toward more sophisticated levels of automation, initiatives like this could help solve one of the field’s most persistent challenges: how to provide AI systems with enough meaningful, diverse experience to handle the unpredictability of real traffic. If successful, the Tier IV dataset could not only boost Japan’s competitiveness in autonomous driving technology but also serve as a foundational resource for developers globally.


