
LMD0311/Awesome-World-Model


Awesome World Models for Autonomous Driving

A collection of papers on World Models, with a focus on Autonomous Driving.

If you notice any papers that are missing, feel free to create a pull request, open an issue, or email me. Contributions in any form that make this list more comprehensive are welcome. 📣📣📣

If you find this repository useful, please consider citing and giving us a star 🌟.

Feel free to share this list with others! 🥳🥳🥳

Workshop & Challenge

Papers

World model original paper

  • Using Occupancy Grids for Mobile Robot Perception and Navigation [Paper]

Technical blog or video

  • Yann LeCun: A Path Towards Autonomous Machine Intelligence [Paper] [Video]

  • CVPR'23 WAD Keynote - Ashok Elluswamy, Tesla [Video]

  • Wayve: Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy [Blog]

    World models are the basis for the ability to predict what might happen next, which is fundamentally important for autonomous driving. They can act as a learned simulator, or a mental “what if” thought experiment for model-based reinforcement learning (RL) or planning. By incorporating world models into our driving models, we can enable them to understand human decisions better and ultimately generalise to more real-world situations.
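To make the "learned simulator / what-if" idea concrete, below is a minimal sketch of planning by imagined rollouts. It is a generic random-shooting model-predictive-control loop, not GAIA-1's or Wayve's method. Everything in it is a hypothetical assumption: `world_model` is a fixed toy point-mass model standing in for a learned network, and `reward`, the target speed, the action bounds and `plan` are illustrative names and values. The planner imagines many candidate action sequences inside the model, scores them, and executes only the first action of the best one.

```python
import numpy as np

def world_model(state, action, dt=0.1):
    """Hypothetical stand-in for a learned world model.

    state = [position, velocity] along a lane, action = acceleration.
    A real world model would be a neural network; this toy uses
    semi-implicit Euler point-mass dynamics with a 0.1 s step.
    """
    pos, vel = state
    vel = vel + action * dt
    pos = pos + vel * dt
    return np.array([pos, vel])

def reward(state, target_speed=10.0):
    # Hypothetical objective: penalize squared deviation from a target speed.
    return -(state[1] - target_speed) ** 2

def plan(state, rng, horizon=10, n_candidates=256, max_accel=3.0):
    """Random shooting: imagine rollouts in the model, return the best first action."""
    candidates = rng.uniform(-max_accel, max_accel, size=(n_candidates, horizon))
    returns = np.zeros(n_candidates)
    for i, actions in enumerate(candidates):
        s = state.copy()
        for a in actions:
            s = world_model(s, a)  # "what if" step: no real environment is touched
            returns[i] += reward(s)
    return candidates[int(np.argmax(returns)), 0]

# Receding-horizon loop: replan at every step, apply only the first action.
rng = np.random.default_rng(0)
state = np.array([0.0, 5.0])
for _ in range(40):
    state = world_model(state, plan(state, rng))  # a real agent would act in the world here
print(state)
```

Replanning at every step means errors in the model's long-horizon predictions are partly corrected by fresh observations, which is one reason this pattern is common in model-based RL and planning.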

Survey

  • A survey on multimodal large language models for autonomous driving. WACVW 2024 [Paper] [Code]
  • World Models for Autonomous Driving: An Initial Survey. 2024.3, arxiv [Paper]

2024

  • [ViDAR] Visual Point Cloud Forecasting enables Scalable Autonomous Driving. CVPR 2024 [Paper] [Code]
  • [GenAD] Generalized Predictive Model for Autonomous Driving. CVPR 2024 [Paper] [Data]
  • [Cam4DOcc] Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications. CVPR 2024 [Paper] [Code]
  • [Drive-WM] Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving. CVPR 2024 [Paper] [Code]
  • [DriveWorld] DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving. CVPR 2024 [Code]
  • [Panacea] Panacea: Panoramic and Controllable Video Generation for Autonomous Driving. CVPR 2024 [Paper] [Code]
  • [MagicDrive] MagicDrive: Street View Generation with Diverse 3D Geometry Control. ICLR 2024 [Paper] [Code]
  • [Copilot4D] Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion. ICLR 2024 [Paper]
  • [SafeDreamer] SafeDreamer: Safe Reinforcement Learning with World Models. ICLR 2024 [Paper] [Code]
  • [RoboDreamer] RoboDreamer: Learning Compositional World Models for Robot Imagination. 2024.4, arxiv [Paper] [Code]
  • [LidarDM] LidarDM: Generative LiDAR Simulation in a Generated World. 2024.4, arxiv [Paper] [Code]
  • [3D-VLA] 3D-VLA: A 3D Vision-Language-Action Generative World Model. 2024.3, arxiv [Paper]
  • [DriveDreamer-2] DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation. 2024.3, arxiv [Paper] [Code]
  • [Think2Drive] Think2Drive: Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving. 2024.2, arxiv [Paper]

2023

  • [TrafficBots] TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction. ICRA 2023 [Paper] [Code]
  • [WoVoGen] WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation. 2023.12, arxiv [Paper] [Code]
  • [CTT] Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent. 2023.11, arxiv [Paper]
  • [OccWorld] OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving. 2023.11, arxiv [Paper] [Code]
  • [MUVO] MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations. 2023.11, arxiv [Paper]
  • [DrivingDiffusion] DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model. 2023.10, arxiv [Paper] [Code]
  • [GAIA-1] GAIA-1: A Generative World Model for Autonomous Driving. 2023.9, arxiv [Paper]
  • [ADriver-I] ADriver-I: A General World Model for Autonomous Driving. 2023.9, arxiv [Paper]
  • [DriveDreamer] DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving. 2023.9, arxiv [Paper] [Code]
  • [UniWorld] UniWorld: Autonomous Driving Pre-training via World Models. 2023.8, arxiv [Paper] [Code]

2022

  • [MILE] Model-Based Imitation Learning for Urban Driving. NeurIPS 2022 [Paper] [Code]
  • [Symphony] Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation. ICRA 2022 [Paper]
  • Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving. IROS 2022 [Paper]

Other World Model Paper

2024

  • [Genie] Genie: Generative Interactive Environments. DeepMind [Paper] [Blog]
  • [Sora] Video generation models as world simulators. OpenAI [Technical report]
  • [IWM] Learning and Leveraging World Models in Visual Representation Learning. Meta AI [Paper]
  • [V-JEPA] V-JEPA: Video Joint Embedding Predictive Architecture. Meta AI [Blog] [Paper] [Code]
  • [Newton] Newton™ – a first-of-its-kind foundation model for understanding the physical world. Archetype AI [Blog]
  • [MAMBA] MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning. ICLR 2024 [Paper] [Code]
  • [Compete and Compose] Compete and Compose: Learning Independent Mechanisms for Modular World Models. 2024.4, arxiv [Paper]
  • [MagicTime] MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators. 2024.4, arxiv [Paper] [Code]
  • [Dreaming of Many Worlds] Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization. 2024.3, arxiv [Paper] [Code]
  • [ManiGaussian] ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation. 2024.3, arxiv [Paper] [Code]
  • [LWM] World Model on Million-Length Video And Language With RingAttention. 2024.2, arxiv [Paper] [Code]
  • Planning with an Ensemble of World Models. OpenReview [Paper]
  • [WorldDreamer] WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens. 2024.1, arxiv [Paper] [Code]