
Questions about two stage training in PETR #11

Open
flyinglynx opened this issue Aug 5, 2022 · 6 comments

Comments

@flyinglynx

Thank you for sharing the code! I notice that PETR is set up as two-stage in the code, i.e., the top-K proposals from the encoder output are selected as the query embeddings as well as the initial reference points in the decoder. This is also very similar to the two-stage version of Deformable-DETR.

However, in section 3.3 of the paper, the authors mention that the query embeddings are randomly initialized and learned, which is not a two-stage approach. I wonder whether the reported results are from two-stage models or one-stage ones. Besides, how much improvement can the two-stage variant bring?
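(For readers unfamiliar with the two-stage scheme, here is a minimal sketch of the top-K proposal selection it refers to. The shapes, the scoring head, and the function name are assumptions for illustration only, not the actual PETR/opera code.)

```python
import torch
import torch.nn as nn

def select_topk_proposals(enc_memory: torch.Tensor, cls_head: nn.Module, k: int = 100):
    """enc_memory: (batch, num_tokens, dim) flattened encoder features.
    cls_head: a hypothetical head that scores each encoder token."""
    scores = cls_head(enc_memory).max(-1).values            # (batch, num_tokens)
    topk_idx = scores.topk(k, dim=1).indices                # (batch, k)
    # Gather the K highest-scoring encoder tokens; a two-stage model builds the
    # decoder's initial reference points (and, in Deformable-DETR, the query
    # embeddings themselves) from these proposals.
    topk_feats = torch.gather(
        enc_memory, 1,
        topk_idx.unsqueeze(-1).expand(-1, -1, enc_memory.size(-1)))
    return topk_feats, topk_idx

# Toy usage:
# feats, idx = select_topk_proposals(torch.randn(2, 1000, 256), nn.Linear(256, 1), k=100)
```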

@dae-sun

dae-sun commented Aug 5, 2022

In PETR, they used randomly initialized queries without query positional encoding from reference points. So, section 3.3 seems consistent with the code of this repo.

@dae-sun

dae-sun commented Aug 5, 2022

In contrast, two-stage Deformable-DETR embeds its queries from its initial bounding boxes (the encoder proposals).
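(As a rough illustration of that difference, the sketch below contrasts a Deformable-DETR-style derivation of queries from proposal boxes with learned, box-independent queries. The sinusoidal-encoding helper and layer names are assumptions, not code from either repo.)

```python
import torch
import torch.nn as nn

embed_dim = 256

def box_pos_encoding(boxes: torch.Tensor, dim: int = embed_dim) -> torch.Tensor:
    """Hypothetical sinusoidal encoding of (batch, k, 4) normalized boxes -> (batch, k, dim)."""
    freqs = torch.arange(dim // 8, device=boxes.device, dtype=boxes.dtype)
    freqs = 10000 ** (-freqs / (dim // 8))
    angles = boxes.unsqueeze(-1) * freqs                     # (batch, k, 4, dim//8)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)    # (batch, k, 4, dim//4)
    return enc.flatten(-2)                                   # (batch, k, dim)

# Two-stage Deformable-DETR style: both the query content and the query
# positional embedding are projected from the encoding of the top-K proposal boxes.
pos_trans = nn.Linear(embed_dim, 2 * embed_dim)

def queries_from_proposals(topk_boxes: torch.Tensor):
    query_pos, query = pos_trans(box_pos_encoding(topk_boxes)).chunk(2, dim=-1)
    return query, query_pos

# PETR style as described in section 3.3: queries are a learned embedding,
# independent of any proposal boxes.
learned_queries = nn.Embedding(100, embed_dim)
```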

@flyinglynx
Author

In PETR, they used randomly initialized queries without query positional encoding from reference points. So, section 3.3 seems consistent with the code of this repo.

Thank you for your answer! I checked the code, and I think the two-stage mode (the default setting in the code) means that the initial reference points for the decoder are initialized from the top 100 proposals, while the query embedding vectors are still randomly initialized.

I checked the code in opera/models/utils/transformer.py, lines 856-908: encoder features with high confidence are selected as proposals, and K keypoint coordinates are predicted for each of them. These keypoints are used as the initial reference points in the decoder (note that the deformable cross-attention uses 17 reference points). Hence, I think there is still some difference from section 3.3, where the locations of the initial reference points are randomly initialized and learned. I am a little curious how much improvement this modification can bring; actually, such a setting is quite reasonable.
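(To make that reading concrete, here is a minimal sketch of it, assuming simplified shapes and hypothetical layer names rather than the actual opera code: the reference points come from keypoints predicted on the top proposals, while the content queries remain a learned embedding.)

```python
import torch
import torch.nn as nn

num_queries, embed_dim, num_keypoints = 100, 256, 17

# Content queries: a learned embedding, not derived from the encoder output.
query_embed = nn.Embedding(num_queries, embed_dim)

# Hypothetical head that regresses 17 (x, y) keypoints per proposal.
kpt_head = nn.Linear(embed_dim, num_keypoints * 2)

def init_decoder_inputs(topk_proposal_feats: torch.Tensor):
    """topk_proposal_feats: (batch, 100, embed_dim) top-scoring encoder tokens."""
    batch = topk_proposal_feats.size(0)
    # Keypoints predicted from the proposals become the decoder's initial
    # reference points for the deformable cross-attention (17 per query).
    ref_points = kpt_head(topk_proposal_feats).sigmoid()
    ref_points = ref_points.view(batch, num_queries, num_keypoints, 2)
    # The query content vectors are NOT taken from the proposals.
    queries = query_embed.weight.unsqueeze(0).expand(batch, -1, -1)
    return queries, ref_points
```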

This setting is very close to the recent DINO, which only uses positional information from the encoder proposals and randomly initialized content vectors for the queries. DINO reports that this yields better performance.

@dae-sun

dae-sun commented Aug 5, 2022

Hence, I think there is still some difference from section 3.3, where the locations of the initial reference points are randomly initialized and learned.
-> Sorry, I checked it. In the paper, the initial reference point P0 is a randomly initialized matrix that is jointly updated with the model parameters during training. I also think it's weird.

This setting is very close to the recent DINO, which only uses positional information from the encoder proposals and randomly initialized content vectors for the queries. DINO reports that this yields better performance.
-> DINO uses mixed query selection, which takes the initial reference points as the content queries and uses randomly initialized positional encodings, while this repo sets randomly initialized values as the content queries as well.

Thank you for your feedback :)

@dae-sun

dae-sun commented Aug 5, 2022

I checked the code, and I think the two-stage mode (the default setting in the code) means that the initial reference points for the decoder are initialized from the top 100 proposals, while the query embedding vectors are still randomly initialized.

-> Yes, I think so too!

@flyinglynx
Author

DINO uses mixed query selection, which takes the initial reference points as the content queries and uses randomly initialized positional encodings, while this repo sets randomly initialized values as the content queries as well.

-> I have not finished reading DINO's code yet, but I think the idea of only passing the positional information of the proposals is quite similar here. Thank you very much!
