Commit 6fad548: support rec

hhaAndroid committed Nov 22, 2023
1 parent 44d1281, commit 6fad548

Showing 4 changed files with 198 additions and 45 deletions.
configs/grounding_dino/README.md (29 additions, 3 deletions)

@@ -78,16 +78,42 @@ Note:
## LVIS Results

| Model | MiniVal APr | MiniVal APc | MiniVal APf | MiniVal AP | Val1.0 APr | Val1.0 APc | Val1.0 APf | Val1.0 AP | Pre-Train Data | Config | Download |
| :---------------: | :---------: | :---------: | :---------: | :--------: | :--------: | :--------: | :--------: | :-------: | :----------------------------------------------: | :------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------: |
| Grounding DINO-T | 18.8 | 24.2 | 34.7 | 28.8 | 10.1 | 15.3 | 29.9 | 20.1 | O365,GoldG,Cap4M | [config](lvis/grounding_dino_swin-t_pretrain_zeroshot_mini-lvis.py) | [model](https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swint_ogc_mmdet-822d7e9d.pth) |
| Grounding DINO-B | 27.9 | 33.4 | 37.2 | 34.7 | 19.0 | 24.1 | 32.9 | 26.7 | COCO,O365,GoldG,Cap4M,OpenImage,ODinW-35,RefCOCO | [config](lvis/grounding_dino_swin-b_pretrain_zeroshot_mini-lvis.py) | [model](https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swinb_cogcoor_mmdet-55949c9c.pth) |

Note:

1. The above are zero-shot evaluation results.
2. The evaluation metric we used is LVIS Fixed AP. For details, please refer to [Evaluating Large-Vocabulary Object Detectors: The Devil is in the Details](https://arxiv.org/pdf/2102.01066.pdf).
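
The zero-shot LVIS results above can be reproduced with the same distributed test script used for the referring-expression evaluation below. A sketch, using the config paths and checkpoints from the table above; the trailing `8` is the GPU count and should match your machine:

```shell
cd mmdetection
# Grounding DINO-T, zero-shot on the LVIS MiniVal split
./tools/dist_test.sh configs/grounding_dino/lvis/grounding_dino_swin-t_pretrain_zeroshot_mini-lvis.py https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swint_ogc_mmdet-822d7e9d.pth 8
# Grounding DINO-B, zero-shot on the LVIS MiniVal split
./tools/dist_test.sh configs/grounding_dino/lvis/grounding_dino_swin-b_pretrain_zeroshot_mini-lvis.py https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swinb_cogcoor_mmdet-55949c9c.pth 8
```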

## Referring Expression Comprehension Results

| Method | Grounding DINO-T | Grounding DINO-B |
|------------------------|-------------------|-------------------|
| RefCOCO val @1,5,10 | 50.77/89.45/94.86 | 84.61/97.88/99.10 |
| RefCOCO testA @1,5,10 | 57.45/91.29/95.62 | 88.65/98.89/99.63 |
| RefCOCO testB @1,5,10 | 44.97/86.54/92.88 | 80.51/96.64/98.51 |
| RefCOCO+ val @1,5,10 | 51.64/86.35/92.57 | 73.67/96.60/98.65 |
| RefCOCO+ testA @1,5,10 | 57.25/86.74/92.65 | 82.19/97.92/99.09 |
| RefCOCO+ testB @1,5,10 | 46.35/84.05/90.67 | 64.10/94.25/97.46 |
| RefCOCOg val @1,5,10 | 60.42/92.10/96.18 | 78.33/97.28/98.57 |
| RefCOCOg test @1,5,10 | 59.74/92.08/96.28 | 78.11/97.06/98.65 |

Note:

1. `@1,5,10` refers to precision at the top 1, 5, and 10 positions in a predicted ranked list.
2. The pretraining data used by Grounding DINO-T is `O365,GoldG,Cap4M`, and the corresponding evaluation configuration is [grounding_dino_swin-t_pretrain_zeroshot_refcoco](refcoco/grounding_dino_swin-t_pretrain_zeroshot_refcoco.py).
3. The pretraining data used by Grounding DINO-B is `COCO,O365,GoldG,Cap4M,OpenImage,ODinW-35,RefCOCO`, and the corresponding evaluation configuration is [grounding_dino_swin-b_pretrain_zeroshot_refcoco](refcoco/grounding_dino_swin-b_pretrain_zeroshot_refcoco.py).

Test Command

```shell
cd mmdetection
./tools/dist_test.sh configs/grounding_dino/refcoco/grounding_dino_swin-t_pretrain_zeroshot_refexp.py https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swint_ogc_mmdet-822d7e9d.pth 8
./tools/dist_test.sh configs/grounding_dino/refcoco/grounding_dino_swin-b_pretrain_zeroshot_refexp.py https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swinb_cogcoor_mmdet-55949c9c.pth 8
```
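
With a single GPU, the same configs can be run through MMDetection's plain test entry point instead of the distributed launcher. A sketch, assuming the standard `tools/test.py` interface:

```shell
cd mmdetection
python tools/test.py configs/grounding_dino/refcoco/grounding_dino_swin-t_pretrain_zeroshot_refexp.py https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swint_ogc_mmdet-822d7e9d.pth
```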

## Custom Dataset

New file (@@ -0,0 +1,14 @@): the Swin-B RefExp evaluation config referenced by the test command above.

```python
_base_ = './grounding_dino_swin-t_pretrain_zeroshot_refexp.py'

# Same RefExp evaluation setup as the Swin-T base config, with the backbone
# and neck switched to the Swin-B settings of the pre-trained checkpoint.
model = dict(
    type='GroundingDINO',
    backbone=dict(
        pretrain_img_size=384,
        embed_dims=128,
        depths=[2, 2, 18, 2],
        num_heads=[4, 8, 16, 32],
        window_size=12,
        drop_path_rate=0.3,
        patch_norm=True),
    neck=dict(in_channels=[256, 512, 1024]),
)
```

This file was deleted.

New file (@@ -0,0 +1,155 @@): the Swin-T RefExp evaluation config that the file above builds on and that the test command references.

```python
_base_ = '../grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py'

model = dict(test_cfg=dict(max_per_img=15))

data_root = 'data/coco/'

test_pipeline = [
    dict(
        type='LoadImageFromFile', backend_args=None,
        imdecode_backend='pillow'),
    dict(
        type='FixScaleResize',
        scale=(800, 1333),
        keep_ratio=True,
        backend='pillow'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor', 'text', 'custom_entities', 'tokens_positive'))
]

# -------------------------------------------------#
ann_file = 'mdetr_annotations/final_refexp_val.json'
val_dataset_all_val = dict(
    type='MDETRStyleRefCocoDataset',
    data_root=data_root,
    ann_file=ann_file,
    data_prefix=dict(img='train2014/'),
    test_mode=True,
    return_classes=True,
    pipeline=test_pipeline,
    backend_args=None)
val_evaluator_all_val = dict(
    type='RefExpMetric',
    ann_file=data_root + ann_file,
    metric='bbox',
    iou_thrs=0.5,
    topk=(1, 5, 10))

# -------------------------------------------------#
ann_file = 'mdetr_annotations/finetune_refcoco_testA.json'
val_dataset_refcoco_testA = dict(
    type='MDETRStyleRefCocoDataset',
    data_root=data_root,
    ann_file=ann_file,
    data_prefix=dict(img='train2014/'),
    test_mode=True,
    return_classes=True,
    pipeline=test_pipeline,
    backend_args=None)

val_evaluator_refcoco_testA = dict(
    type='RefExpMetric',
    ann_file=data_root + ann_file,
    metric='bbox',
    iou_thrs=0.5,
    topk=(1, 5, 10))

# -------------------------------------------------#
ann_file = 'mdetr_annotations/finetune_refcoco_testB.json'
val_dataset_refcoco_testB = dict(
    type='MDETRStyleRefCocoDataset',
    data_root=data_root,
    ann_file=ann_file,
    data_prefix=dict(img='train2014/'),
    test_mode=True,
    return_classes=True,
    pipeline=test_pipeline,
    backend_args=None)

val_evaluator_refcoco_testB = dict(
    type='RefExpMetric',
    ann_file=data_root + ann_file,
    metric='bbox',
    iou_thrs=0.5,
    topk=(1, 5, 10))

# -------------------------------------------------#
ann_file = 'mdetr_annotations/finetune_refcoco+_testA.json'
val_dataset_refcoco_plus_testA = dict(
    type='MDETRStyleRefCocoDataset',
    data_root=data_root,
    ann_file=ann_file,
    data_prefix=dict(img='train2014/'),
    test_mode=True,
    return_classes=True,
    pipeline=test_pipeline,
    backend_args=None)

val_evaluator_refcoco_plus_testA = dict(
    type='RefExpMetric',
    ann_file=data_root + ann_file,
    metric='bbox',
    iou_thrs=0.5,
    topk=(1, 5, 10))

# -------------------------------------------------#
ann_file = 'mdetr_annotations/finetune_refcoco+_testB.json'
val_dataset_refcoco_plus_testB = dict(
    type='MDETRStyleRefCocoDataset',
    data_root=data_root,
    ann_file=ann_file,
    data_prefix=dict(img='train2014/'),
    test_mode=True,
    return_classes=True,
    pipeline=test_pipeline,
    backend_args=None)

val_evaluator_refcoco_plus_testB = dict(
    type='RefExpMetric',
    ann_file=data_root + ann_file,
    metric='bbox',
    iou_thrs=0.5,
    topk=(1, 5, 10))

# -------------------------------------------------#
ann_file = 'mdetr_annotations/finetune_refcocog_test.json'
val_dataset_refcocog_test = dict(
    type='MDETRStyleRefCocoDataset',
    data_root=data_root,
    ann_file=ann_file,
    data_prefix=dict(img='train2014/'),
    test_mode=True,
    return_classes=True,
    pipeline=test_pipeline,
    backend_args=None)

val_evaluator_refcocog_test = dict(
    type='RefExpMetric',
    ann_file=data_root + ann_file,
    metric='bbox',
    iou_thrs=0.5,
    topk=(1, 5, 10))
# -------------------------------------------------#
datasets = [
    val_dataset_all_val, val_dataset_refcoco_testA, val_dataset_refcoco_testB,
    val_dataset_refcoco_plus_testA, val_dataset_refcoco_plus_testB,
    val_dataset_refcocog_test
]
dataset_prefixes = [
    'val', 'refcoco_testA', 'refcoco_testB', 'refcoco+_testA',
    'refcoco+_testB', 'refcocog_test'
]
metrics = [
    val_evaluator_all_val, val_evaluator_refcoco_testA,
    val_evaluator_refcoco_testB, val_evaluator_refcoco_plus_testA,
    val_evaluator_refcoco_plus_testB, val_evaluator_refcocog_test
]

val_dataloader = dict(
    dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets))
test_dataloader = val_dataloader

val_evaluator = dict(
    _delete_=True,
    type='MultiDatasetsEvaluator',
    metrics=metrics,
    dataset_prefixes=dataset_prefixes)
test_evaluator = val_evaluator
```
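
From the paths in this config (`data_root = 'data/coco/'`, `data_prefix=dict(img='train2014/')`, and the `mdetr_annotations/*.json` files), the expected data layout is roughly the following; only the files referenced above are listed:

```text
data/coco/
├── train2014/                          # COCO train2014 images used by RefCOCO/RefCOCO+/RefCOCOg
└── mdetr_annotations/
    ├── final_refexp_val.json           # combined val split
    ├── finetune_refcoco_testA.json
    ├── finetune_refcoco_testB.json
    ├── finetune_refcoco+_testA.json
    ├── finetune_refcoco+_testB.json
    └── finetune_refcocog_test.json
```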
