The model and loaded state dict do not match exactly #140

Open
LeonBytes opened this issue May 22, 2024 · 2 comments

@LeonBytes

I want to use the pre-trained co_deformable_detr_r50_1x_coco.pth to test on my own dataset, but it reports "The model and loaded state dict do not match exactly". I have modified def coco_classes(): in ./mmdet/core/evaluation/class_names.py to return my own classes, modified the classes of CocoDataset(CustomDataset) in ./mmdet/datasets/coco.py, and also changed everything related to num_classes in projects/configs/co_deformable_detr/co_deformable_detr_r50_1x_coco.py.
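For reference, a minimal sketch of the two class-name changes described above, assuming a hypothetical two-class dataset with categories 'defect' and 'scratch' (substitute your real names):

# ./mmdet/core/evaluation/class_names.py -- evaluator class names
def coco_classes():
    # hypothetical categories; replace with your own
    return ['defect', 'scratch']

# ./mmdet/datasets/coco.py -- keep the dataset classes in sync
class CocoDataset(CustomDataset):
    CLASSES = ('defect', 'scratch')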
The content of co_deformable_detr_r50_1x_coco.py is below:
_base_ = [
    '../_base_/datasets/coco_detection.py',
    '../_base_/default_runtime.py'
]
# model settings
num_dec_layer = 6
lambda_2 = 2.0

model = dict(
    type='CoDETR',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='ChannelMapper',
        in_channels=[512, 1024, 2048],
        kernel_size=1,
        out_channels=256,
        act_cfg=None,
        norm_cfg=dict(type='GN', num_groups=32),
        num_outs=4),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            octave_base_scale=4,
            scales_per_octave=3,
            ratios=[0.5, 1.0, 2.0],
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0*num_dec_layer*lambda_2),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0*num_dec_layer*lambda_2)),
    query_head=dict(
        type='CoDeformDETRHead',
        num_query=300,
        num_classes=2,
        in_channels=2048,
        sync_cls_avg_factor=True,
        with_box_refine=True,
        as_two_stage=True,
        mixed_selection=True,
        transformer=dict(
            type='CoDeformableDetrTransformer',
            num_co_heads=2,
            encoder=dict(
                type='DetrTransformerEncoder',
                num_layers=6,
                transformerlayers=dict(
                    type='BaseTransformerLayer',
                    attn_cfgs=dict(
                        type='MultiScaleDeformableAttention', embed_dims=256, dropout=0.0),
                    feedforward_channels=2048,
                    ffn_dropout=0.0,
                    operation_order=('self_attn', 'norm', 'ffn', 'norm'))),
            decoder=dict(
                type='CoDeformableDetrTransformerDecoder',
                num_layers=num_dec_layer,
                return_intermediate=True,
                look_forward_twice=True,
                transformerlayers=dict(
                    type='DetrTransformerDecoderLayer',
                    attn_cfgs=[
                        dict(
                            type='MultiheadAttention',
                            embed_dims=256,
                            num_heads=8,
                            dropout=0.0),
                        dict(
                            type='MultiScaleDeformableAttention',
                            embed_dims=256,
                            dropout=0.0)
                    ],
                    feedforward_channels=2048,
                    ffn_dropout=0.0,
                    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
                                     'ffn', 'norm')))),
        positional_encoding=dict(
            type='SinePositionalEncoding',
            num_feats=128,
            normalize=True,
            offset=-0.5),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=2.0),
        loss_bbox=dict(type='L1Loss', loss_weight=5.0),
        loss_iou=dict(type='GIoULoss', loss_weight=2.0)),
    roi_head=[dict(
        type='CoStandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[8, 16, 32, 64],
            finest_scale=112),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=2,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            reg_decoded_bbox=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0*num_dec_layer*lambda_2),
            loss_bbox=dict(type='GIoULoss', loss_weight=10.0*num_dec_layer*lambda_2)))],
    bbox_head=[dict(
        type='CoATSSHead',
        num_classes=2,
        in_channels=256,
        stacked_convs=1,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            ratios=[1.0],
            octave_base_scale=8,
            scales_per_octave=1,
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[0.1, 0.1, 0.2, 0.2]),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0*num_dec_layer*lambda_2),
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0*num_dec_layer*lambda_2),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0*num_dec_layer*lambda_2)),],
    # model training and testing settings
    train_cfg=[
        dict(
            assigner=dict(
                type='HungarianAssigner',
                cls_cost=dict(type='FocalLossCost', weight=2.0),
                reg_cost=dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'),
                iou_cost=dict(type='IoUCost', iou_mode='giou', weight=2.0))),
        dict(
            rpn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.3,
                    min_pos_iou=0.3,
                    match_low_quality=True,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=256,
                    pos_fraction=0.5,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=False),
                allowed_border=-1,
                pos_weight=-1,
                debug=False),
            rpn_proposal=dict(
                nms_pre=4000,
                max_per_img=1000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False)),
        dict(
            assigner=dict(type='ATSSAssigner', topk=9),
            allowed_border=-1,
            pos_weight=-1,
            debug=False),],
    test_cfg=[
        dict(max_per_img=100),
        dict(
            rpn=dict(
                nms_pre=1000,
                max_per_img=1000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                score_thr=0.0,
                nms=dict(type='nms', iou_threshold=0.5),
                max_per_img=100)),
        dict(
            nms_pre=1000,
            min_bbox_size=0,
            score_thr=0.0,
            nms=dict(type='nms', iou_threshold=0.6),
            max_per_img=100),
        # soft-nms is also supported for rcnn testing
        # e.g., nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.05)
    ])

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
# train_pipeline, NOTE the img_scale and the Pad's size_divisor is different
# from the default setting in mmdet.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='AutoAugment',
        policies=[
            [
                dict(
                    type='Resize',
                    img_scale=[(480, 1333), (512, 1333), (544, 1333),
                               (576, 1333), (608, 1333), (640, 1333),
                               (672, 1333), (704, 1333), (736, 1333),
                               (768, 1333), (800, 1333)],
                    multiscale_mode='value',
                    keep_ratio=True)
            ],
            [
                dict(
                    type='Resize',
                    # The ratio of all images in the train dataset < 7
                    # follow the original impl
                    img_scale=[(400, 4200), (500, 4200), (600, 4200)],
                    multiscale_mode='value',
                    keep_ratio=True),
                dict(
                    type='RandomCrop',
                    crop_type='absolute_range',
                    crop_size=(384, 600),
                    allow_negative_crop=True),
                dict(
                    type='Resize',
                    img_scale=[(480, 1333), (512, 1333), (544, 1333),
                               (576, 1333), (608, 1333), (640, 1333),
                               (672, 1333), (704, 1333), (736, 1333),
                               (768, 1333), (800, 1333)],
                    multiscale_mode='value',
                    override=True,
                    keep_ratio=True)
            ]
        ]),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=1),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]

# test_pipeline, NOTE the Pad's size_divisor is different from the default
# setting (size_divisor=32). While there is little effect on the performance
# whether we use the default setting or use size_divisor=1.
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=1),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]

data = dict(
    samples_per_gpu=1,
    workers_per_gpu=1,
    train=dict(filter_empty_gt=False, pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))

# optimizer
optimizer = dict(
    type='AdamW',
    lr=2e-4,
    weight_decay=1e-4,
    paramwise_cfg=dict(
        custom_keys={
            'backbone': dict(lr_mult=0.1),
            'sampling_offsets': dict(lr_mult=0.1),
            'reference_points': dict(lr_mult=0.1)
        }))
optimizer_config = dict(grad_clip=dict(max_norm=0.1, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
Here is part of the error output:
The model and loaded state dict do not match exactly

size mismatch for query_head.cls_branches.0.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([2, 256]).
size mismatch for query_head.cls_branches.0.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for query_head.cls_branches.1.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([2, 256]).
size mismatch for query_head.cls_branches.1.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for query_head.cls_branches.2.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([2, 256]).
size mismatch for query_head.cls_branches.2.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for query_head.cls_branches.3.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([2, 256]).
size mismatch for query_head.cls_branches.3.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for query_head.cls_branches.4.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([2, 256]).
size mismatch for query_head.cls_branches.4.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for query_head.cls_branches.5.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([2, 256]).
size mismatch for query_head.cls_branches.5.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for query_head.cls_branches.6.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([2, 256]).
size mismatch for query_head.cls_branches.6.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for roi_head.0.bbox_head.fc_cls.weight: copying a param with shape torch.Size([81, 1024]) from checkpoint, the shape in current model is torch.Size([3, 1024]).
size mismatch for roi_head.0.bbox_head.fc_cls.bias: copying a param with shape torch.Size([81]) from checkpoint, the shape in current model is torch.Size([3]).
size mismatch for roi_head.0.bbox_head.fc_reg.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([8, 1024]).
size mismatch for roi_head.0.bbox_head.fc_reg.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for bbox_head.0.atss_cls.weight: copying a param with shape torch.Size([80, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 256, 3, 3]).
size mismatch for bbox_head.0.atss_cls.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
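
These shapes follow directly from num_classes=2 in the config: the sigmoid-based heads (query_head.cls_branches, bbox_head.0.atss_cls) output one logit per class, so 80 becomes 2; the RoI head's fc_cls uses softmax (use_sigmoid=False) and adds a background class, so 81 = 80 + 1 becomes 3 = 2 + 1; and fc_reg predicts 4 box deltas per class because reg_class_agnostic=False, so 320 = 4 × 80 becomes 8 = 4 × 2.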

@TempleX98
Collaborator

TempleX98 commented May 29, 2024

These mismatched weights are the classifier branches. This is normal for the fine-tuning setting and does not affect the performance.
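
If you want to avoid the warning altogether, one option (a sketch using plain PyTorch, not an official recipe; the output filename is made up) is to drop the class-dependent branches listed in the log from the checkpoint and then point load_from at the filtered file:

import torch

# Keep only the weights whose shapes are independent of num_classes.
ckpt = torch.load('./checkpoints/co_deformable_detr_r50_1x_coco.pth',
                  map_location='cpu')
state = ckpt.get('state_dict', ckpt)
drop_prefixes = ('query_head.cls_branches',
                 'roi_head.0.bbox_head.fc_cls',
                 'roi_head.0.bbox_head.fc_reg',
                 'bbox_head.0.atss_cls')
state = {k: v for k, v in state.items() if not k.startswith(drop_prefixes)}
if 'state_dict' in ckpt:
    ckpt['state_dict'] = state
else:
    ckpt = state
# Hypothetical output path; set load_from to this file instead.
torch.save(ckpt, './checkpoints/co_deformable_detr_r50_1x_coco_2cls.pth')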

@LeonBytes
Author

> These mismatched weights are the classifier branches. This is normal for the fine-tuning setting and does not affect the performance.

Thank you for your answer. I want to train on my own dataset, i.e. transfer learning. My dataset has only two categories, and they are not among the 80 pre-training categories. During training it also reports the size mismatch. Is this also normal? (I have modified num_classes and the corresponding class names, and added the following to co_deformable_detr_r50_1x_coco.py:
load_from = './checkpoints/co_deformable_detr_r50_1x_coco.pth'
resume_from = None)
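
(Standard mmdet behavior, for anyone landing here: load_from loads the checkpoint non-strictly, so the mismatched classifier branches are skipped and trained from their fresh initialization while all other weights start from the pre-trained values; resume_from would additionally restore the optimizer state and epoch counter, which you only want when continuing an interrupted run. load_from with resume_from = None is therefore the right combination for transfer learning, and the size-mismatch message during training is expected as well.)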
