DEIM: DETR with Improved Matching for Fast Convergence


DEIM is an advanced training framework designed to enhance the matching mechanism in DETRs, enabling faster convergence and improved accuracy. It serves as a robust foundation for future research and applications in the field of real-time object detection.


Shihua Huang¹, Zhichao Lu², Xiaodong Cun³, Yongjun Yu¹, Xiao Zhou⁴, Xi Shen¹*

¹ Intellindust AI Lab   ² City University of Hong Kong   ³ Great Bay University   ⁴ Hefei Normal University

**📧 Corresponding author:** [email protected]

If you find our work helpful, please consider giving us a ⭐!


🚀 Updates

  • [2024.12.03] Release of the DEIM series. This repo also supports re-implementations of D-FINE and RT-DETR.

Table of Contents

1. Model Zoo

DEIM-D-FINE

| Model | Dataset | AP<sup>val</sup> | #Params | Latency | GFLOPs | config | checkpoint |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| S | COCO | 49.0 | 10M | 3.49ms | 25 | yml | ckpt |
| M | COCO | 52.7 | 19M | 5.62ms | 57 | yml | ckpt |
| L | COCO | 54.7 | 31M | 8.07ms | 91 | yml | ckpt |
| X | COCO | 56.5 | 62M | 12.89ms | 202 | yml | ckpt |

DEIM-RTDETRv2

| Model | Dataset | AP<sup>val</sup> | #Params | Latency | GFLOPs | config | checkpoint |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| S | COCO | 49.0 | 20M | 4.59ms | 60 | yml | ckpt |
| M | COCO | 50.9 | 31M | 6.40ms | 92 | yml | ckpt |
| M* | COCO | 53.2 | 33M | 6.90ms | 100 | yml | ckpt |
| L | COCO | 54.3 | 42M | 9.15ms | 136 | yml | ckpt |
| X | COCO | 55.5 | 76M | 13.66ms | 259 | yml | ckpt |

2. Quick start

Setup

    conda create -n deim python=3.11.9
    conda activate deim
    pip install -r requirements.txt

Data Preparation

COCO2017 Dataset
  1. Download COCO2017 from OpenDataLab or COCO.

  2. Modify paths in coco_detection.yml

    train_dataloader:
        img_folder: /data/COCO2017/train2017/
        ann_file: /data/COCO2017/annotations/instances_train2017.json
    val_dataloader:
        img_folder: /data/COCO2017/val2017/
        ann_file: /data/COCO2017/annotations/instances_val2017.json
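
After editing the paths, it is worth a quick sanity check that the annotation files and image folders actually line up. Below is a minimal sketch using pycocotools (install it separately with `pip install pycocotools` if it is not already in your environment); the paths are the ones from the config above.

```python
# Sanity-check the COCO paths configured above.
# Assumes pycocotools is installed: pip install pycocotools
import os
from pycocotools.coco import COCO

ann_file = "/data/COCO2017/annotations/instances_val2017.json"
img_folder = "/data/COCO2017/val2017/"

coco = COCO(ann_file)  # parses the annotation JSON
img_ids = coco.getImgIds()
print(f"{len(img_ids)} images, {len(coco.getCatIds())} categories in {ann_file}")

# Confirm that the first few image files referenced by the annotations exist on disk.
for info in coco.loadImgs(img_ids[:5]):
    path = os.path.join(img_folder, info["file_name"])
    print(path, "OK" if os.path.exists(path) else "MISSING")
```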
Custom Dataset

To train on your custom dataset, you need to organize it in the COCO format. Follow the steps below to prepare your dataset:

  1. Set remap_mscoco_category to False:

    This prevents the automatic remapping of category IDs to match the MSCOCO categories.

    remap_mscoco_category: False
  2. Organize Images:

    Structure your dataset directories as follows:

    dataset/
    β”œβ”€β”€ images/
    β”‚   β”œβ”€β”€ train/
    β”‚   β”‚   β”œβ”€β”€ image1.jpg
    β”‚   β”‚   β”œβ”€β”€ image2.jpg
    β”‚   β”‚   └── ...
    β”‚   β”œβ”€β”€ val/
    β”‚   β”‚   β”œβ”€β”€ image1.jpg
    β”‚   β”‚   β”œβ”€β”€ image2.jpg
    β”‚   β”‚   └── ...
    └── annotations/
        β”œβ”€β”€ instances_train.json
        β”œβ”€β”€ instances_val.json
        └── ...
    • images/train/: Contains all training images.
    • images/val/: Contains all validation images.
    • annotations/: Contains COCO-formatted annotation files.
  3. Convert Annotations to COCO Format:

    If your annotations are not already in COCO format, you'll need to convert them. You can use the following stub as a reference or rely on existing tools (a fuller sketch appears after this list):

    import json
    
    def convert_to_coco(input_annotations, output_annotations):
        # Implement conversion logic here. The output JSON needs the three
        # top-level COCO keys: "images", "annotations" (bbox in
        # [x, y, width, height] order), and "categories".
        pass
    
    if __name__ == "__main__":
        convert_to_coco('path/to/your_annotations.json', 'dataset/annotations/instances_train.json')
  4. Update Configuration Files:

    Modify your custom_detection.yml.

    task: detection
    
    evaluator:
      type: CocoEvaluator
      iou_types: ['bbox', ]
    
    num_classes: 777 # your dataset classes
    remap_mscoco_category: False
    
    train_dataloader:
      type: DataLoader
      dataset:
        type: CocoDetection
        img_folder: /data/yourdataset/train
        ann_file: /data/yourdataset/train/train.json
        return_masks: False
        transforms:
          type: Compose
          ops: ~
      shuffle: True
      num_workers: 4
      drop_last: True
      collate_fn:
        type: BatchImageCollateFunction
    
    val_dataloader:
      type: DataLoader
      dataset:
        type: CocoDetection
        img_folder: /data/yourdataset/val
        ann_file: /data/yourdataset/val/ann.json
        return_masks: False
        transforms:
          type: Compose
          ops: ~
      shuffle: False
      num_workers: 4
      drop_last: False
      collate_fn:
        type: BatchImageCollateFunction
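
For step 3 above, here is a slightly fuller sketch of a conversion script. It assumes a hypothetical input format (a JSON list of per-image records with `file_name`, `width`, `height`, boxes in `[x, y, w, h]` order, and string labels); adapt the parsing to whatever your labels actually look like.

```python
# Hedged sketch of an annotation converter. Assumed (hypothetical) input format:
#   [{"file_name": "image1.jpg", "width": 640, "height": 480,
#     "boxes": [[x, y, w, h], ...], "labels": ["cat", ...]}, ...]
import json

def convert_to_coco(input_annotations, output_annotations):
    with open(input_annotations) as f:
        records = json.load(f)

    # Build the category table from every label seen in the source file.
    names = sorted({label for rec in records for label in rec["labels"]})
    categories = [{"id": i + 1, "name": name} for i, name in enumerate(names)]
    name_to_id = {c["name"]: c["id"] for c in categories}

    images, annotations = [], []
    ann_id = 1
    for img_id, rec in enumerate(records, start=1):
        images.append({"id": img_id, "file_name": rec["file_name"],
                       "width": rec["width"], "height": rec["height"]})
        for box, label in zip(rec["boxes"], rec["labels"]):
            x, y, w, h = box
            annotations.append({
                "id": ann_id, "image_id": img_id,
                "category_id": name_to_id[label],
                "bbox": [x, y, w, h],   # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1

    with open(output_annotations, "w") as f:
        json.dump({"images": images, "annotations": annotations,
                   "categories": categories}, f)

if __name__ == "__main__":
    convert_to_coco("path/to/your_annotations.json",
                    "dataset/annotations/instances_train.json")
```

Category IDs in this sketch start at 1 and are assigned alphabetically; make sure num_classes in your custom_detection.yml matches the number of categories you end up with.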

3. Usage

COCO2017
  1. Training

    CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/deim_dfine/deim_hgnetv2_${model}_coco.yml --use-amp --seed=0

  2. Testing

    CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/deim_dfine/deim_hgnetv2_${model}_coco.yml --test-only -r model.pth

  3. Tuning

    CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/deim_dfine/deim_hgnetv2_${model}_coco.yml --use-amp --seed=0 -t model.pth
Customizing Batch Size

For example, if you want to double the total batch size when training D-FINE-L on COCO2017, here are the steps you should follow:

  1. Modify your dataloader.yml to increase the total_batch_size:

    train_dataloader:
        total_batch_size: 64  # Previously it was 32, now doubled
  2. Modify your deim_hgnetv2_l_coco.yml. Here’s how the key parameters should be adjusted:

    optimizer:
        type: AdamW
        params:
            -
              params: '^(?=.*backbone)(?!.*norm|bn).*$'
              lr: 0.000025  # doubled, linear scaling law
            -
              params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
              weight_decay: 0.

        lr: 0.0005  # doubled, linear scaling law
        betas: [0.9, 0.999]
        weight_decay: 0.0001  # need a grid search

    ema:  # added EMA settings
        decay: 0.9998  # adjusted by 1 - (1 - decay) * 2
        warmups: 500  # halved

    lr_warmup_scheduler:
        warmup_duration: 250  # halved
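
The comments above encode two rules of thumb: learning rates follow the linear scaling law (they scale with the batch size), while the EMA decay and the warmup lengths are adjusted so that the effective averaging and warmup horizons stay roughly the same. A small sketch of that arithmetic follows; the helper name is hypothetical (it is not a repo API), and the base values are the pre-doubling ones implied by the comments.

```python
# Rule-of-thumb rescaling of training hyperparameters when the total batch
# size changes, matching the comments in the config above.
def scale_hyperparams(base, old_batch=32, new_batch=64):
    k = new_batch / old_batch
    return {
        # linear scaling law: learning rates grow with the batch size
        "lr": base["lr"] * k,
        "backbone_lr": base["backbone_lr"] * k,
        # EMA decay: keep a similar averaging horizon, 1 - (1 - decay) * k
        "ema_decay": 1 - (1 - base["ema_decay"]) * k,
        # warmups are typically counted in iterations; with k times fewer
        # iterations per epoch, divide them by k
        "ema_warmups": int(base["ema_warmups"] / k),
        "warmup_duration": int(base["warmup_duration"] / k),
    }

print(scale_hyperparams({"lr": 0.00025, "backbone_lr": 0.0000125,
                         "ema_decay": 0.9999, "ema_warmups": 1000,
                         "warmup_duration": 500}))
# roughly: lr 0.0005, backbone_lr 0.000025, ema_decay 0.9998,
# ema_warmups 500, warmup_duration 250 -- the values used above
```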
Customizing Input Size

If you'd like to train DEIM on COCO2017 with an input size of 320x320, follow these steps:

  1. Modify your dataloader.yml:

    train_dataloader:
        dataset:
            transforms:
                ops:
                    - {type: Resize, size: [320, 320], }
        collate_fn:
            base_size: 320
  2. Modify your dfine_hgnetv2.yml:

    eval_spatial_size: [320, 320]

4. Tools

Deployment
  1. Setup

    pip install onnx onnxsim

  2. Export ONNX (a quick load check is sketched after this list)

    python tools/deployment/export_onnx.py --check -c configs/deim_dfine/deim_hgnetv2_${model}_coco.yml -r model.pth

  3. Export TensorRT

    trtexec --onnx="model.onnx" --saveEngine="model.engine" --fp16
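
After exporting, a quick way to confirm the ONNX file loads and to inspect its input/output signature (as mentioned in step 2) is onnxruntime. A minimal sketch, assuming onnxruntime is installed (pip install onnxruntime):

```python
# Load the exported ONNX graph and list its input/output names, shapes, types.
# Assumes onnxruntime is installed: pip install onnxruntime
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
for inp in sess.get_inputs():
    print("input ", inp.name, inp.shape, inp.type)
for out in sess.get_outputs():
    print("output", out.name, out.shape, out.type)
```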
Inference (Visualization)
  1. Setup

    pip install -r tools/inference/requirements.txt

  2. Inference (onnxruntime / tensorrt / torch)

    Inference on images and videos is now supported.

    python tools/inference/onnx_inf.py --onnx model.onnx --input image.jpg  # video.mp4
    python tools/inference/trt_inf.py --trt model.engine --input image.jpg
    python tools/inference/torch_inf.py -c configs/deim_dfine/deim_hgnetv2_${model}_coco.yml -r model.pth --input image.jpg --device cuda:0
Benchmark
  1. Setup

    pip install -r tools/benchmark/requirements.txt

  2. Model FLOPs, MACs, and Params

    python tools/benchmark/get_info.py -c configs/deim_dfine/deim_hgnetv2_${model}_coco.yml

  3. TensorRT Latency

    python tools/benchmark/trt_benchmark.py --COCO_dir path/to/COCO2017 --engine_dir model.engine
Fiftyone Visualization
  1. Setup

    pip install fiftyone

  2. Voxel51 Fiftyone Visualization (fiftyone)

    python tools/visualization/fiftyone_vis.py -c configs/deim_dfine/deim_hgnetv2_${model}_coco.yml -r model.pth
Others
  1. Auto Resume Training

    bash reference/safe_training.sh

  2. Converting Model Weights

    python reference/convert_weight.py model.pth

5. Citation

If you use DEIM or its methods in your work, please cite the following BibTeX entry:

```bibtex
@misc{huang2024deim,
      title={DEIM: DETR with Improved Matching for Fast Convergence},
      author={Shihua Huang and Zhichao Lu and Xiaodong Cun and Yongjun Yu and Xiao Zhou and Xi Shen},
      year={2024},
      eprint={2412.04234},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

6. Acknowledgement

Our work is built upon D-FINE and RT-DETR.

✨ Feel free to contribute and reach out if you have any questions! ✨