DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation

🔥🔥🔥DiT-3D is a novel Diffusion Transformer for 3D shape generation that performs the denoising process directly on voxelized point clouds using plain Transformers.

DiT-3D Illustration

DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation
Shentong Mo, Enze Xie, Ruihang Chu, Lanqing Hong, Matthias Nießner, Zhenguo Li
arXiv 2023.
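
At its core, DiT-3D voxelizes each noisy point cloud and splits the voxel grid into 3D patches that a plain Transformer consumes as tokens. The sketch below illustrates those two steps under simplifying assumptions: voxelize here is a bare occupancy scatter (the repo inherits a richer, feature-based voxelization from PVD/PVCNN), and all function names are illustrative, not the repo's API.

import torch

def voxelize(points, V):
    # Scatter a point cloud with coordinates in [-1, 1] onto a V x V x V
    # occupancy grid (simplified nearest-voxel version).
    B, N, _ = points.shape
    idx = ((points.clamp(-1, 1 - 1e-6) + 1) / 2 * V).long()     # (B, N, 3)
    flat = idx[..., 0] * V * V + idx[..., 1] * V + idx[..., 2]  # (B, N)
    grid = torch.zeros(B, V * V * V, device=points.device)
    grid.scatter_(1, flat, 1.0)
    return grid.view(B, 1, V, V, V)

def patchify(vox, p):
    # Split a (B, C, V, V, V) grid into (V/p)^3 non-overlapping 3D patches,
    # each flattened into a token of length C * p^3.
    B, C, V = vox.shape[:3]
    n = V // p
    x = vox.view(B, C, n, p, n, p, n, p)
    return x.permute(0, 2, 4, 6, 1, 3, 5, 7).reshape(B, n ** 3, C * p ** 3)

pts = torch.rand(2, 2048, 3) * 2 - 1    # two random clouds in [-1, 1]^3
tokens = patchify(voxelize(pts, 32), 4)
print(tokens.shape)                     # torch.Size([2, 512, 64])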

Requirements

Make sure the following dependencies are installed.

python==3.6
pytorch==1.7.1
torchvision==0.8.2
cudatoolkit==11.0
matplotlib==2.2.5
tqdm==4.32.1
open3d==0.9.0
trimesh==3.7.12
scipy==1.5.1

Install PyTorchEMD by

cd metrics/PyTorchEMD
python setup.py install
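# copy the built CUDA extension (.so) next to the wrapper so it can be imported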
cp build/**/emd_cuda.cpython-36m-x86_64-linux-gnu.so .
# PointNet++
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
# GPU kNN
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
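
To verify the compiled extensions, a quick import check can help; the module names below are assumptions based on the install steps above, so adjust them if your build differs.

import torch
import emd_cuda                            # built in metrics/PyTorchEMD
from pointnet2_ops import pointnet2_utils  # from Pointnet2_PyTorch
from knn_cuda import KNN                   # from KNN_CUDA

print("CUDA available:", torch.cuda.is_available())
print("all extensions imported")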

Alternatively, simply run

pip install -r requirements.txt

Data

For generation, we use the ShapeNet point cloud dataset, which can be downloaded here.
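
A minimal loading sketch, assuming the PointFlow/PVD release layout (per-synset folders such as 03001627 for chair, with train/val/test splits of 15k-point .npy files); the file name below is a placeholder.

import numpy as np

pc = np.load("ShapeNetCore.v2.PC15k/03001627/train/<model_id>.npy")  # (15000, 3)
idx = np.random.choice(pc.shape[0], 2048, replace=False)             # random subsample
print(pc[idx].shape)                                                 # (2048, 3)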

Pretrained models

Pretrained models can be downloaded here.

Note that this pretrained model uses the Small configuration with a patch size of 4; the XL models were reported in the main table of our paper for the final comparisons.

Training

Our DiT-3D supports multiple configuration settings (the sketch after this list shows how they determine the token count):

  • voxel sizes: 16, 32, 64
  • patch dimensions: 2, 4, 8
  • model complexity: Small (S), Base (B), Large (L) and Extra Large (XL)
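
As a quick reference, the voxel size V and patch size p set the Transformer's sequence length, since the grid is split into (V / p)^3 tokens:

# Sequence length for each voxel-size / patch-size combination.
for V in (16, 32, 64):
    for p in (2, 4, 8):
        print(f"voxel {V:2d}, patch {p}: {(V // p) ** 3:6d} tokens")
# The default S/4 model at voxel size 32 attends over 512 tokens; voxel
# size 64 with patch size 2 gives 32768, which is where window attention helps.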

For training the DiT-3D model (Small, patch size 4) with a voxel size of 32 on the chair category, please run

$ python train.py --distribution_type 'multi' \
    --dataroot /path/to/ShapeNetCore.v2.PC15k/ \
    --category chair \
    --experiment_name /path/to/experiments \
    --model_type 'DiT-S/4' \
    --bs 16 \
    --voxel_size 32 \
    --lr 1e-4 \
    --use_tb
# to use window attention, add the flags below
    --window_size 4 --window_block_indexes '0,3,6,9'
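
The window flags read as follows (a hedged sketch; the parsing and block semantics are inferred from the flag format rather than copied from train.py):

window_block_indexes = [int(i) for i in "0,3,6,9".split(",")]
print(window_block_indexes)  # [0, 3, 6, 9]
# With a 12-block DiT-S backbone, blocks 0, 3, 6 and 9 would use windowed
# attention with window_size 4, while the remaining blocks stay global.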

Please check more training scripts in the scripts folder.

During training, we train each model on each category for 10,000 epochs. We evaluate on the test set using checkpoints saved every 25 epochs and report the best results.

Testing

For testing and visualization on the chair category using the DiT-3D model (S/4, no window attention) with a voxel size of 32, please run

$ python test.py --dataroot ../../../data/ShapeNetCore.v2.PC15k/ \
    --category chair --num_classes 1 \
    --bs 64 \
    --model_type 'DiT-S/4' \
    --voxel_size 32 \
    --model MODEL_PATH

Testing this S/4 model should give performance close to the table below.

Model        Train Class   Test Class   1-NNA-CD (↓)   1-NNA-EMD (↓)   COV-CD (↑)   COV-EMD (↑)
DiT-3D-S/4   Chair         Chair        56.31          55.82           47.21        50.75
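
Here, 1-NNA (leave-one-out 1-nearest-neighbor accuracy; lower is better, 50% is ideal) and COV (coverage; higher is better) are each computed under both Chamfer distance (CD) and earth mover's distance (EMD). Below is a minimal, hedged sketch of both metrics given precomputed pairwise distance matrices; it is illustrative, not the repo's exact implementation.

import torch

def one_nna_and_cov(d_gg, d_rr, d_gr):
    # d_gg: (G, G) distances among generated clouds,
    # d_rr: (R, R) distances among reference clouds,
    # d_gr: (G, R) distances between generated and reference clouds.
    G, R = d_gr.shape
    # COV: fraction of reference clouds that are the nearest neighbor
    # of at least one generated cloud.
    cov = d_gr.argmin(dim=1).unique().numel() / R
    # 1-NNA: leave-one-out 1-NN classification accuracy on the merged set;
    # 50% means generated and reference sets are indistinguishable.
    d_gg = d_gg.clone().fill_diagonal_(float("inf"))
    d_rr = d_rr.clone().fill_diagonal_(float("inf"))
    gen_correct = d_gg.min(dim=1).values < d_gr.min(dim=1).values
    ref_correct = d_rr.min(dim=1).values < d_gr.t().min(dim=1).values
    nna = (gen_correct.sum() + ref_correct.sum()).float() / (G + R)
    return 100 * nna.item(), 100 * cov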

For point cloud rendering, we use Mitsuba for visualization.

Citation

If you find this repository useful, please cite our paper:

@article{mo2023dit3d,
  title = {DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation},
  author = {Shentong Mo and Enze Xie and Ruihang Chu and Lewei Yao and Lanqing Hong and Matthias Nießner and Zhenguo Li},
  journal = {arXiv preprint arXiv:2307.01831},
  year = {2023}
}

Acknowledgement

This repo is inspired by DiT and PVD. Thanks for their wonderful work.
