
Load weights from original pretrained model #1

adrelino opened this issue Feb 24, 2020 · 11 comments

@adrelino commented Feb 24, 2020
Thanks for your effort in migrating the slightly modified Faster R-CNN from Caffe to PyTorch!

Your README.md states that

The detectron2 system with the exact same model and weights as the Caffe VG Faster R-CNN provided in bottom-up-attention.

The features extracted from this repo are compatible with the LXMERT code and pre-trained models here. The original bottom-up-attention is implemented in Caffe, which is not easy to install and is inconsistent with the training code in PyTorch. Our project thus transfers the weights and models to detectron2, which can be installed in a few lines and has a PyTorch front-end.

When going through your code and extracting the diff against detectron2, I could see how you transferred the model from bottom-up-attention using the additional options:

  • CAFFE_MAXPOOL
  • PROPOSAL_GENERATOR.HID_CHANNELS
  • ROI_BOX_HEAD.RES5HALVE

See defaults.py and faster_rcnn_R_101_C4_caffemaxpool.yaml.
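For illustration, here is a minimal sketch of how such options can be attached to a detectron2 config; the option names come from the diff above, but the values and exact `MODEL.*` paths are assumptions, not the repo's actual defaults in defaults.py:

```python
from detectron2.config import get_cfg

cfg = get_cfg()  # unfrozen copy of detectron2's default config
# Hypothetical placements/values -- check defaults.py for the real ones.
cfg.MODEL.CAFFE_MAXPOOL = True                    # emulate Caffe-style max pooling
cfg.MODEL.PROPOSAL_GENERATOR.HID_CHANNELS = 512   # RPN hidden-layer channels (assumed)
cfg.MODEL.ROI_BOX_HEAD.RES5HALVE = True           # stride halving in the res5 box head (assumed)
```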

However, I cannot see in the code how the original weights are transferred.

Is it possible to load the weights of the original Caffe model (the alternative pretrained model resnet101_faster_rcnn_final.caffemodel from bottom-up-attention#demo) into your modified PyTorch/detectron2 model?

@airsplay (Owner)

Thanks. I actually converted the Caffe weights to PyTorch and dumped them. The converted Detectron2 weights are saved on my server at http://nlp.cs.unc.edu/models/faster_rcnn_from_caffe.pkl and loaded into the model here:

cfg.MODEL.WEIGHTS = "http://nlp.cs.unc.edu/models/faster_rcnn_from_caffe.pkl"
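For anyone who wants to try it end to end, a minimal loading sketch using the standard detectron2 APIs; the config path below is a guess based on the yaml name mentioned above, and it assumes this repo's extended defaults (with the extra options) are in effect:

```python
from detectron2.config import get_cfg
from detectron2.modeling import build_model
from detectron2.checkpoint import DetectionCheckpointer

cfg = get_cfg()  # in this repo, the defaults already include the extra options
cfg.merge_from_file("configs/VG-Detection/faster_rcnn_R_101_C4_caffemaxpool.yaml")  # assumed path
cfg.MODEL.WEIGHTS = "http://nlp.cs.unc.edu/models/faster_rcnn_from_caffe.pkl"

model = build_model(cfg)                              # build the modified Faster R-CNN
DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)  # download and load the converted .pkl
model.eval()
```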

Trying to load the weights from Caffe was a disaster... It took me some time to fix everything. Besides the naming differences of the weights, there are mainly three tricky parts:

  1. The anchors of Detectron2 and Caffe are saved differently.
  2. In Caffe, the first class is the background, while in Detectron2 the last class is the background.
  3. Caffe uses a 2-class softmax to calculate the probabilities, whereas Detectron2 uses a sigmoid (see the sketch below).
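A minimal sketch (not the actual conversion script offered below; the array names and row ordering are hypothetical) of how points 2 and 3 can be handled when remapping the weights:

```python
import numpy as np

def move_background_last(cls_w: np.ndarray, cls_b: np.ndarray):
    """Point 2: Caffe puts the background class at row 0, Detectron2 expects
    it last, so roll the class axis by -1 to move row 0 to the end."""
    return np.roll(cls_w, -1, axis=0), np.roll(cls_b, -1, axis=0)

def softmax2_to_sigmoid(rpn_w: np.ndarray, rpn_b: np.ndarray):
    """Point 3: Caffe scores each anchor with a 2-class softmax, Detectron2
    with a single sigmoid logit. Since softmax([z_bg, z_fg])[fg] equals
    sigmoid(z_fg - z_bg), subtracting the background rows from the foreground
    rows yields an equivalent single-logit head (rows assumed ordered as
    [all background logits, then all foreground logits])."""
    a = rpn_w.shape[0] // 2
    return rpn_w[a:] - rpn_w[:a], rpn_b[a:] - rpn_b[:a]
```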

All these tricky differences (they were really hard to locate...) are carefully handled in the provided weights at http://nlp.cs.unc.edu/models/faster_rcnn_from_caffe.pkl.

I am willing to provide the conversion code (it is badly written and organized...) if you are interested.

@airsplay (Owner)

By the way, I also found a bug in the previous Caffe implementation which introduces some disagreement in the proposed objects.

The Caffe anchor generator, starting from here, shows the desired anchors in its comments. However, if you print out the anchors by actually running this Python file, you will be surprised to find that they do not match the ones shown!

In any case, this issue has also been fixed in this py-butd repo!

@adrelino (Author) commented Feb 27, 2020

Ok, thanks for the hints. Yes, I also figured that your .pkl file contains the converted weights, but is it derived from the pretrained model or from the alternative pretrained model from bottom-up-attention/#demo? In any case, I would also be interested in the conversion code.

I continued my work in adrelino/py-bottom-up-attention-extracted to compare the original 36 features from bottom-up-attention with newly generated ones.

1. Reference

As an example, I extracted the first line of the "36 features per image (fixed)" file from bottom-up-attention/#pretrained-features via

head -n 1 trainval_36/trainval_resnet101_faster_rcnn_genome_36.tsv > trainval_36_trainval_resnet101_faster_rcnn_genome_36.tsv_head-n_1

and saved it here: trainval_36_trainval_resnet101_faster_rcnn_genome_36.tsv_head-n_1. It contains the features for image COCO_train2014_000000150367.jpg with id 150367.

2. Generated

I ran python demo/detectron2_mscoco_proposal_maxnms.py to re-generate the features for image COCO_train2014_000000150367.jpg, but the resulting .tsv file train2014_d2obj36_batch.tsv differs from the reference above.

Base64

To investigate further, I base64-decoded and numpy-reshaped the boxes and features fields, but already the boxes do not correspond:

python tools/compare_tsv.py 
data/mscoco_imgfeat/train2014_d2obj36_batch.tsv
vs.
data/mscoco_imgfeat/trainval_36_trainval_resnet101_faster_rcnn_genome_36.tsv_head-n_1
compare each line
1 unequal
lines unequal, checking columns
id: 150367
True image_id 150367
True image_w 640
True image_h 480
True num_boxes 36
False boxes
[[  0.        188.5551    535.74023   480.       ]
 [391.2551    239.6029    475.4248    328.97556  ]
 [119.180885  138.27368   290.65204   235.4842   ]
 [  2.8959718   0.        244.67422   141.791    ]
 [299.64725   265.55765   388.95096   348.58557  ]
 [281.2899    149.15913   348.04684   263.60767  ]
 [152.14993   350.28207   349.89642   452.60596  ]
 [386.367      39.579952  524.26495   157.42542  ]
 [315.0122    365.19833   398.1106    454.09235  ]
 [262.68954   366.97705   314.90378   444.80786  ]
 [  0.          1.1936646 582.2236    240.3336   ]
 [440.60825   158.0549    500.29575   194.05539  ]
 [452.13193   385.67383   543.4787    425.13986  ]
 [116.60308   133.08482   447.29678   330.63364  ]
 [  3.7774842 316.48044    67.91557   365.57773  ]
 [151.07129   153.60335   192.82156   188.1479   ]
 [392.41113     3.8454957 632.65576   224.55     ]
 [585.87494     1.1048218 639.1245    107.25152  ]
 [504.7712     47.508728  580.7663    157.84637  ]
 [160.06209     0.        214.58      108.320755 ]
 [403.6871     66.14704   437.82407   103.37439  ]
 [206.1182      3.6981018 372.3026     81.27689  ]
 [ 65.026146  215.8811    250.89737   349.80844  ]
 [ 40.752773  171.78362   217.24268   257.4409   ]
 [222.46846   263.7919    282.78467   338.67926  ]
 [539.40594     3.344104  613.6925     48.21457  ]
 [353.35086   240.06107   550.5021    389.00613  ]
 [ 72.10183     1.4079803 140.93443    28.015684 ]
 [125.49993    23.131165  470.27658   135.51962  ]
 [215.8018     60.098904  389.03024   126.324234 ]
 [388.0399     10.245563  537.3754     67.513374 ]
 [472.96646    96.88859   639.6357    470.88174  ]
 [292.469     301.09494   640.        480.       ]
 [297.42258   133.76959   338.74496   181.6238   ]
 [219.10094   386.8395    248.67676   425.58932  ]
 [  1.0515381 416.78702   520.20654   480.       ]]
vs.
[[  0.         277.6827     583.3728     479.2       ]
 [389.56876    239.18001    475.52667    330.16315   ]
 [300.33197    265.5234     387.11652    347.883     ]
 [118.00602    138.60199    289.6468     235.06424   ]
 [  0.           0.         252.79614    140.41837   ]
 [156.60852    352.786      344.48508    450.08588   ]
 [280.75128    146.12346    347.84033    265.48737   ]
 [386.7497      44.484303   522.03094    154.81404   ]
 [338.34506      0.         639.2        164.73306   ]
 [321.5712     358.26862    377.40723    448.2745    ]
 [279.20203      0.         584.5702     132.85336   ]
 [263.4151     366.8589     314.9948     444.99268   ]
 [440.21368    156.93144    500.5992     193.9488    ]
 [  0.           0.         608.7625     350.63208   ]
 [452.19727    384.70477    542.59       425.482     ]
 [  0.86831665 315.65247     70.51543    366.01852   ]
 [149.9029     152.9564     193.36127    188.3182    ]
 [ 27.668285     4.043219   181.89998    120.18465   ]
 [276.54065     54.26455    639.2        454.42773   ]
 [141.57083    140.46927    455.4457     344.03836   ]
 [584.67584      0.         639.2        107.73751   ]
 [169.18588    359.8093     271.23724    456.87762   ]
 [158.44717      0.         214.74826    108.832054  ]
 [  0.         302.57465     78.91972    395.2979    ]
 [  0.          18.055298   286.9795     464.57227   ]
 [503.02118     49.59502    582.64844    156.79427   ]
 [ 75.55986    327.47873    421.8147     463.39697   ]
 [403.1822      65.44243    437.54654    103.289085  ]
 [277.84735    345.26016    382.9519     455.149     ]
 [306.05084    353.73135    386.62222    454.88385   ]
 [242.39853    355.86627    333.31012    453.9502    ]
 [203.40878      2.5947204  379.92694     80.30624   ]
 [263.6872     144.34995    479.49268    319.98187   ]
 [ 26.716516   131.85942    322.95587    259.661     ]
 [ 39.608967   170.50273    219.11008    257.4533    ]
 [ 67.062485   213.83627    249.61057    352.60757   ]]
False features
...

Do you have any idea what could be the reason?
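For reference, the decoding step looks roughly like the following minimal sketch, which assumes the standard butd TSV layout (float32 values, one row per box); the row accessor is hypothetical:

```python
import base64
import numpy as np

def decode_field(encoded: str, num_boxes: int) -> np.ndarray:
    """Decode one base64-encoded TSV field into a (num_boxes, -1) float32 array."""
    buf = base64.b64decode(encoded)
    return np.frombuffer(buf, dtype=np.float32).reshape(num_boxes, -1)

# e.g., with `row` being one parsed TSV line (hypothetical dict):
# boxes = decode_field(row["boxes"], int(row["num_boxes"]))     # -> (36, 4)
# feats = decode_field(row["features"], int(row["num_boxes"]))  # -> (36, 2048)
```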

@airsplay (Owner)

Thanks. If the boxes were the same, the model would predict the same features as butd. The discrepancy in the proposed boxes is caused by two things:

  1. The main reason is that the NMS implementations are different. I use the max-NMS method (sketched after this list), while the butd repo uses a fancier NMS method which is much slower. I have checked that these two proposal methods give almost the same results in training/pre-training/fine-tuning.

  2. The second reason is that the anchors in butd are wrongly shifted. As I mentioned previously:

    The Caffe anchor generator, starting from here, shows the desired anchors in its comments. However, if you print out the anchors by actually running this Python file, you will be surprised to find that they do not match the ones shown!

    Since the anchors are shifted in butd, the boxes proposed by butd cannot touch the edges of the images (e.g., 640. and 480. in your example) but only come very close to them (i.e., 639.2 and 479.2). At the same time, you will find that the boxes proposed by my model do touch the edges, because I fixed this bug.
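A minimal sketch of the max-NMS selection described in point 1: class-agnostic NMS on each proposal's best class score, keeping a fixed number of boxes. The IoU threshold and box count below are placeholders, not the repo's exact settings:

```python
import torch
from torchvision.ops import nms

def select_fixed_boxes(boxes: torch.Tensor, scores: torch.Tensor,
                       num_keep: int = 36, iou_thresh: float = 0.5) -> torch.Tensor:
    # boxes: (N, 4) proposals; scores: (N, num_classes) per-class scores.
    best_scores, _ = scores.max(dim=1)           # best class score for each box
    keep = nms(boxes, best_scores, iou_thresh)   # class-agnostic suppression, sorted by score
    return keep[:num_keep]                       # indices of the fixed-size region set
```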

@MIL-VLG commented Apr 27, 2020

@adrelino @airsplay
Hi guys,
We have a parallel project that reimplements bottom-up-attention in PyTorch, and coincidentally we also use Detectron2 as our backend :)

We have converted the Caffe model (with K in [10, 100]) and verified that it extracts the same visual features (with a tiny deviation of <0.01) as the original Caffe version.

Our repo: https://github.com/MILVLG/bottom-up-attention.pytorch

@airsplay (Owner)

Great to know that. I have added a link to your project in the README file.

BTW, I noticed that the attribute head is disabled here. It would be better if this part were made available.

@MIL-VLG commented May 10, 2020

> Great to know that. I have added a link to your project in the README file.
>
> BTW, I noticed that the attribute head is disabled here. It would be better if this part were made available.

We have added this feature in our latest update :)

@airsplay (Owner)

Cool :).

@Einstone-rose

Hi, in the README you said "the features extracted from this repo are compatible with the LXMERT code and pre-trained models here". That is to say, if I want to directly use the extracted features, can I download the features (MSCOCO) from this repo: https://github.com/airsplay/lxmert?

@184446223

Can your work extract a fixed number of 36 regions per image?

@184446223

> Great to know that. I have added a link to your project in the README file.
> BTW, I noticed that the attribute head is disabled here. It would be better if this part were made available.
>
> We have added this feature in our latest update :)

Can your work extract a fixed number of 36 regions per image?
