





Zero-Shot Text-Guided Object Generation with Dream Fields

by Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, and Ben Poole

Project website | arXiv paper

Watch the video


This code implements Dream Fields, a way to synthesize 3D objects from natural language prompts.

Abstract: We combine neural rendering with multi-modal image and text representations to synthesize diverse 3D objects solely from natural language descriptions. Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision. Due to the scarcity of diverse, captioned 3D data, prior methods only generate objects from a handful of categories, such as ShapeNet. Instead, we guide generation with image-text models pre-trained on large datasets of captioned images from the web. Our method optimizes a Neural Radiance Field from many camera views so that rendered images score highly with a target caption according to a pre-trained CLIP model. To improve fidelity and visual quality, we introduce simple geometric priors, including sparsity-inducing transmittance regularization, scene bounds, and new MLP architectures. In experiments, Dream Fields produce realistic, multi-view consistent object geometry and color from a variety of natural language captions.
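The training objective described in the abstract can be sketched in a few lines of NumPy. The repository itself uses JAX. This is a minimal illustration under stated assumptions, not the authors' implementation. It makes three assumptions:

- NeRF-style volume rendering, where a ray's final transmittance is exp(-Σ σ_i δ_i).
- A CLIP loss equal to the negative cosine similarity between image and caption embeddings.
- A transmittance regularizer of the form -min(τ, mean transmittance). Rewarding see-through rays up to a target τ pushes density toward compact, sparse objects.

The function names and the `tau` and `lam` defaults are hypothetical and chosen only for illustration.

```python
import numpy as np


def ray_transmittance(sigma, deltas):
    """Final transmittance of each ray under NeRF-style alpha compositing.

    sigma:  (num_rays, num_samples) non-negative densities along each ray.
    deltas: (num_rays, num_samples) distances between adjacent samples.
    Returns (num_rays,) values exp(-sum_i sigma_i * delta_i) in (0, 1].
    """
    return np.exp(-np.sum(sigma * deltas, axis=-1))


def clip_loss(image_emb, text_emb):
    """Negative cosine similarity between image and caption embeddings."""
    image_emb = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    return -np.mean(np.sum(image_emb * text_emb, axis=-1))


def transmittance_loss(transmittance, tau):
    """Rewards mean transmittance up to the target tau (sparsity prior).

    Beyond tau the loss is flat, so the model is not pushed to erase the object.
    """
    return -np.minimum(tau, np.mean(transmittance))


def total_loss(image_emb, text_emb, sigma, deltas, tau=0.8, lam=0.5):
    # tau and lam are illustrative placeholders, not the paper's settings.
    t = ray_transmittance(sigma, deltas)
    return clip_loss(image_emb, text_emb) + lam * transmittance_loss(t, tau)
```

Consider identical nonzero embeddings `v` and an empty scene. The CLIP term is -1 and the regularizer saturates at -tau. So `total_loss(v, v, np.zeros((2, 3)), np.ones((2, 3)), tau=0.8, lam=0.5)` evaluates to about -1.4.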

Running with Docker

We provide a Dockerfile based on an NVIDIA NGC container. Pull the base container:

docker pull

Run at low quality on all GPUs:

bash all "matte painting of a bonsai tree; trending on artstation."

Videos will be written to results/. To specify a subset of GPUs, use:

bash '"device=0,1,2,3"' "matte painting of a bonsai tree; trending on artstation."
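The nested quotes in the device argument appear to follow Docker's `--gpus` syntax. A device list containing commas must itself be double-quoted, and the shell then needs outer single quotes to preserve those double quotes. If you script GPU selection, Python's standard `shlex.quote` produces exactly that form. `gpu_arg` below is a hypothetical helper. It assumes the launcher forwards this argument to `docker run --gpus`.

```python
import shlex


def gpu_arg(device_ids):
    """Build the shell-quoted GPU selector for a list of device indices."""
    # Docker reads the --gpus value as comma-separated fields, so a list of
    # devices has to be wrapped in double quotes to stay one field.
    value = '"device=' + ",".join(str(i) for i in device_ids) + '"'
    # shlex.quote adds outer single quotes so the shell keeps the double quotes.
    return shlex.quote(value)


print(gpu_arg([0, 1, 2, 3]))  # '"device=0,1,2,3"'
```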

To monitor training, run TensorBoard with:

bash 6006

We provide three configuration files. config/ provides lower-quality but faster (~30 minute) text-to-3D synthesis. config/ and config/ provide higher quality by rendering at higher resolutions and using more augmentations during training. For low-quality results, four 16 GB GPUs should be enough. Modify to use these configs if sufficient resources are available. If you run out of memory, lower render_width and crop_width, or n_local_aug, in config/.
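The out-of-memory advice can be applied consistently by scaling the resolution knobs together. The sketch below is hypothetical. It treats a config as a plain dict holding the `render_width`, `crop_width`, and `n_local_aug` fields named above, although the actual config files may use a different container. The helper name and the example values are made up for illustration.

```python
import copy


def shrink_for_memory(config, width_scale=0.5, aug_scale=1.0):
    """Return a copy of `config` with smaller render/crop sizes.

    Hypothetical helper: `config` is assumed to be a dict with integer
    'render_width', 'crop_width', and 'n_local_aug' entries.
    """
    new = copy.deepcopy(config)  # leave the caller's config untouched
    new["render_width"] = max(1, int(config["render_width"] * width_scale))
    # Keep crops no larger than the (now smaller) rendered image.
    new["crop_width"] = min(new["render_width"],
                            max(1, int(config["crop_width"] * width_scale)))
    new["n_local_aug"] = max(1, int(config["n_local_aug"] * aug_scale))
    return new


example = {"render_width": 168, "crop_width": 160, "n_local_aug": 8}  # made-up values
print(shrink_for_memory(example))
```

For square renders, pixel count grows with the square of the width. Halving both widths therefore cuts the per-image pixel count roughly fourfold, while `aug_scale` trims the number of augmentations independently.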

Running in a virtual environment

Python 3 is required. Create and activate a virtual environment:

python -m venv env
source env/bin/activate

Install JAX with GPU or TPU support following the JAX docs, depending on the accelerator you have available. For example, for CUDA 11.1:

pip install --upgrade pip
pip install --upgrade "jax[cuda]" -f

Test your installation with:

python -c "print(__import__('jax').local_devices())"
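A slightly more informative check groups the visible devices by platform. JAX device objects expose a `.platform` attribute, such as "cpu", "gpu", or "tpu". The sketch below is a hypothetical diagnostic, not part of the repository. It reports when JAX falls back to CPU only, which usually means the accelerator build was not installed.

```python
def summarize_devices(devices):
    """Count devices by platform, e.g. {'gpu': 4}.

    `devices` is any iterable of objects with a `.platform` attribute,
    such as the list returned by jax.local_devices().
    """
    counts = {}
    for d in devices:
        counts[d.platform] = counts.get(d.platform, 0) + 1
    return counts


if __name__ == "__main__":
    try:
        import jax
    except ImportError:
        print("JAX is not installed in this environment.")
    else:
        counts = summarize_devices(jax.local_devices())
        print(counts)
        if set(counts) == {"cpu"}:
            print("Only CPU devices found; check the GPU/TPU install of JAX.")
```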

Then, install dependencies:

pip install -r requirements.txt

To run on all visible GPUs:

python --config=config/ --query="bouquet of flowers sitting in a clear glass vase."
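To queue several prompts, build each command as an argument list. Prompts containing shell metacharacters, such as the `;` in the bonsai example, then pass through intact. In the sketch below, `TRAIN_SCRIPT` and `CONFIG` are hypothetical placeholders for the script and config paths, which are not spelled out here. Only the `--config` and `--query` flags come from the command above.

```python
import shlex
import subprocess

TRAIN_SCRIPT = "path/to/train_script.py"  # hypothetical placeholder
CONFIG = "path/to/config.py"  # hypothetical placeholder


def build_command(query, config=CONFIG, script=TRAIN_SCRIPT):
    """Argument list for one run; no shell is involved, so no quoting is needed."""
    return ["python", script, f"--config={config}", f"--query={query}"]


def run_queries(queries, dry_run=True):
    """Print (and optionally run) one training command per prompt, sequentially."""
    for query in queries:
        cmd = build_command(query)
        # shlex.join shows the equivalent, safely quoted shell command.
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run_queries([
    "bouquet of flowers sitting in a clear glass vase.",
    "matte painting of a bonsai tree; trending on artstation.",
])
```

Keeping `dry_run=True` by default lets you inspect the commands before committing GPU time.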


Please cite our paper if you find this code or research relevant:

  author = {Jain, Ajay and Mildenhall, Ben and Barron, Jonathan T. and Abbeel, Pieter and Poole, Ben},
  title = {Zero-Shot Text-Guided Object Generation with Dream Fields},
  journal = {arXiv},
  month = {December},
  year = {2021},