Awesome World Models for Autonomous Driving

A curated collection of papers on world models for autonomous driving.

If you notice papers that we have missed, feel free to create a pull request, open an issue, or email me. Contributions in any form that make this list more comprehensive are welcome. 📣📣📣

If you find this repository useful, please consider citing it and giving it a star 🌟.

Feel free to share this list with others! 🥳🥳🥳

Workshop & Challenge

CVPR 2024 Workshop & Challenge | OpenDriveLab Track #4: Predictive World Model.

As an abstract spatio-temporal representation of reality, a world model can predict future states from the current state. Learning a world model has the potential to elevate a pre-trained foundation model to the next level. Given vision-only inputs, the neural network outputs future point clouds to demonstrate its ability to predict the world.

CVPR 2023 Workshop on Autonomous Driving | Challenge 3: Argoverse Challenges, 3D Occupancy Forecasting using the Argoverse 2 Sensor Dataset. Predict the spacetime occupancy of the world for the next 3 seconds.

Papers

World model original paper

Using Occupancy Grids for Mobile Robot Perception and Navigation [paper]

Technical blog or video

Yann LeCun: A Path Towards Autonomous Machine Intelligence [paper] [Video]
CVPR'23 WAD Keynote - Ashok Elluswamy, Tesla [Video]
Wayve Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy [blog]

World models are the basis for the ability to predict what might happen next, which is fundamentally important for autonomous driving. They can act as a learned simulator, or a mental “what if” thought experiment for model-based reinforcement learning (RL) or planning. By incorporating world models into our driving models, we can enable them to understand human decisions better and ultimately generalise to more real-world situations.
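The "learned simulator" idea in the paragraph above can be illustrated with a toy sketch. It is not taken from GAIA-1 or any paper in this list. It assumes a hypothetical 1D vehicle with state `[position, velocity]` and an acceleration action, fits a linear world model to random transitions, and then plans by scoring imagined rollouts (random shooting). All function names, the dynamics, and the cost function are illustrative assumptions.

```python
import numpy as np


def true_step(state, action, dt=0.1):
    """Hypothetical ground-truth dynamics: state = [position, velocity]."""
    pos, vel = state
    return np.array([pos + vel * dt, vel + action * dt])


def fit_linear_world_model(n_samples=500, seed=0):
    """Fit next_state = [state, action] @ W by least squares on random transitions."""
    rng = np.random.default_rng(seed)
    states = rng.uniform(-5.0, 5.0, size=(n_samples, 2))
    actions = rng.uniform(-2.0, 2.0, size=(n_samples, 1))
    next_states = np.array([true_step(s, a[0]) for s, a in zip(states, actions)])
    inputs = np.hstack([states, actions])  # shape (n_samples, 3)
    W, *_ = np.linalg.lstsq(inputs, next_states, rcond=None)
    return W  # shape (3, 2)


def imagine(W, state, actions):
    """Roll the learned model forward without touching the real environment."""
    trajectory = [state]
    for a in actions:
        state = np.concatenate([state, [a]]) @ W
        trajectory.append(state)
    return np.array(trajectory)  # shape (len(actions) + 1, 2)


def plan(W, state, target_pos, horizon=20, n_candidates=256, seed=1):
    """Random-shooting planner: keep the action sequence whose imagined
    rollout ends closest to target_pos with a small final velocity."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-2.0, 2.0, size=(n_candidates, horizon))
    costs = []
    for seq in candidates:
        final = imagine(W, state, seq)[-1]
        costs.append((final[0] - target_pos) ** 2 + 0.1 * final[1] ** 2)
    best = int(np.argmin(costs))
    return candidates[best], costs[best]


if __name__ == "__main__":
    W = fit_linear_world_model()
    start = np.array([0.0, 0.0])
    best_actions, best_cost = plan(W, start, target_pos=1.0)
    print("imagined final state:", imagine(W, start, best_actions)[-1])
    print("imagined cost:", best_cost)
```

The planner only ever queries the learned model, which is the "what if" thought experiment in miniature. Real driving world models replace the linear map with large generative networks over images, point clouds, or occupancy grids.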

Survey

A survey on multimodal large language models for autonomous driving. WACVW 2024 [Paper] [Code]
World Models for Autonomous Driving: An Initial Survey. 2024.3, arxiv [Paper]

2024

[ViDAR] Visual Point Cloud Forecasting enables Scalable Autonomous Driving. CVPR 2024 [Paper] [Code]
[GenAD] Generalized Predictive Model for Autonomous Driving. CVPR 2024 [Paper] [Data]
[Cam4DOcc] Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications. CVPR 2024 [Paper] [Code]
[Drive-WM] Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving. CVPR 2024 [Paper] [Code]
[DriveWorld] DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving. CVPR 2024 [Code]
[Panacea] Panacea: Panoramic and Controllable Video Generation for Autonomous Driving. CVPR 2024 [Paper] [Code]
[MagicDrive] MagicDrive: Street View Generation with Diverse 3D Geometry Control. ICLR 2024 [Paper] [Code]
[Copilot4D] Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion. ICLR 2024 [Paper]
[SafeDreamer] SafeDreamer: Safe Reinforcement Learning with World Models. ICLR 2024 [Paper] [Code]
[RoboDreamer] RoboDreamer: Learning Compositional World Models for Robot Imagination. 2024.4, arxiv [Paper] [Code]
[LidarDM] LidarDM: Generative LiDAR Simulation in a Generated World. 2024.4, arxiv [Paper] [Code]
[3D-VLA] 3D-VLA: A 3D Vision-Language-Action Generative World Model. 2024.3, arxiv [Paper]
[DriveDreamer-2] DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation. 2024.3, arxiv [Paper] [Code]
[Think2Drive] Think2Drive: Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving. 2024.2, arxiv [Paper]

2023

[TrafficBots] TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction. ICRA 2023 [Paper] [Code]
[WoVoGen] WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation. 2023.12, arxiv [Paper] [Code]
[CTT] Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent. 2023.11, arxiv [Paper]
[OccWorld] OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving. 2023.11, arxiv [Paper] [Code]
[MUVO] MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations. 2023.11, arxiv [Paper]
[DrivingDiffusion] DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model. 2023.10, arxiv [Paper] [Code]
[GAIA-1] GAIA-1: A Generative World Model for Autonomous Driving. 2023.9, arxiv [Paper]
[ADriver-I] ADriver-I: A General World Model for Autonomous Driving. 2023.9, arxiv [Paper]
[DriveDreamer] DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving. 2023.9, arxiv [Paper] [Code]
[UniWorld] UniWorld: Autonomous Driving Pre-training via World Models. 2023.8, arxiv [Paper] [Code]

2022

[MILE] Model-Based Imitation Learning for Urban Driving. NeurIPS 2022 [Paper] [Code]
[Symphony] Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation. ICRA 2022 [Paper]
Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving. IROS 2022 [Paper]

Other World Model Papers

2024

[Genie] Genie: Generative Interactive Environments. DeepMind [Paper] [Blog]
[Sora] Video generation models as world simulators. OpenAI [Technical report]
[IWM] Learning and Leveraging World Models in Visual Representation Learning. Meta AI [Paper]
[V-JEPA] V-JEPA: Video Joint Embedding Predictive Architecture. Meta AI [Blog] [Paper] [Code]
[Newton] Newton™ – a first-of-its-kind foundation model for understanding the physical world. Archetype AI [Blog]
[MAMBA] MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning. ICLR 2024 [Paper] [Code]
[Compete and Compose] Compete and Compose: Learning Independent Mechanisms for Modular World Models. 2024.4, arxiv [Paper]
[MagicTime] MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators. 2024.4, arxiv [Paper] [Code]
[Dreaming of Many Worlds] Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization. 2024.3, arxiv [Paper] [Code]
[ManiGaussian] ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation. 2024.3, arxiv [Paper] [Code]
[LWM] World Model on Million-Length Video And Language With RingAttention. 2024.2, arxiv [Paper] [Code]
Planning with an Ensemble of World Models. OpenReview [Paper]
[WorldDreamer] WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens. 2024.1, arxiv [Paper] [Code]
