Bottom-up Attention with Detectron2

A detectron2 implementation with exactly the same model and weights as the Caffe VG Faster R-CNN provided in bottom-up-attention.

The original bottom-up-attention is implemented in Caffe, which is not easy to install and is inconsistent with training code written in PyTorch. This project therefore transfers the weights and models to detectron2, which can be installed in a few lines and has a PyTorch front end.

The features extracted with this repo are compatible with the LXMERT code and pre-trained models. Results have been locally verified.

Installation

git clone https://github.com/airsplay/py-bottom-up-attention.git
cd py-bottom-up-attention

# Install python libraries
pip install -r requirements.txt
pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

# Install detectron2
python setup.py build develop

# or if you are on macOS
# MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build develop

# or, as an alternative to `setup.py`, do
# pip install [--editable] .
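
After installing, a quick check that the packages can be found may save some debugging time. The sketch below is only an illustration: the list of module names is an assumption about what the steps above install (`detectron2` from `setup.py`, `pycocotools` from cocoapi, and `torch` as a detectron2 dependency), and `missing_packages` is a hypothetical helper, not part of this repo.

```python
import importlib.util

# Assumed import names of the packages installed by the steps above.
REQUIRED = ["torch", "detectron2", "pycocotools"]


def missing_packages(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported as not importable, rerun the corresponding install step in the same Python environment.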

Demos

Object Detection

demo vg detection

Feature Extraction

With Attributes:

Single image: demo extraction
Single image (Given boxes): demo extraction

Without Attributes:

Single image: demo extraction
Single image (Given boxes): demo extraction

Feature Extraction Scripts for MS COCO

Note: this script does not extract attributes. If you want attributes, please modify it following the "With Attributes" demo.

For MS COCO (VQA): vqa script

Note

If the weights are not downloaded automatically, please download them manually:

wget --no-check-certificate https://nlp1.cs.unc.edu/models/faster_rcnn_from_caffe_attr.pkl -P ~/.torch/fvcore_cache/models/
wget --no-check-certificate https://nlp1.cs.unc.edu/models/faster_rcnn_from_caffe.pkl -P ~/.torch/fvcore_cache/models/
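
Where `wget` is not available, the same download could be done from Python. This is a minimal sketch using only the standard library; the URLs and cache directory are the ones from the commands above, while `plan_downloads` and `download` are hypothetical helper names. Like `--no-check-certificate`, it disables TLS certificate verification, so use it only if you accept that trade-off.

```python
import os
import shutil
import ssl
import urllib.request

WEIGHT_URLS = [
    "https://nlp1.cs.unc.edu/models/faster_rcnn_from_caffe_attr.pkl",
    "https://nlp1.cs.unc.edu/models/faster_rcnn_from_caffe.pkl",
]
CACHE_DIR = os.path.expanduser("~/.torch/fvcore_cache/models")


def plan_downloads(urls=WEIGHT_URLS, cache_dir=CACHE_DIR):
    """Return (url, destination) pairs for weights not yet in the cache."""
    plan = []
    for url in urls:
        dest = os.path.join(cache_dir, os.path.basename(url))
        if not os.path.exists(dest):
            plan.append((url, dest))
    return plan


def download(plan):
    """Fetch each planned file, skipping certificate verification."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # mirrors wget --no-check-certificate
    for url, dest in plan:
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with urllib.request.urlopen(url, context=ctx) as resp, open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)


# download(plan_downloads())  # uncomment to fetch whatever is missing
```

Files that already exist in the cache directory are left untouched, so the call can be repeated safely.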

The default weights are the same as the 'alternative pretrained model' in the original GitHub repository, which was trained with 36 boxes per image. If you want to use the original detection model trained with 10 to 100 boxes per image, please use the following weights:
```
http://nlp.cs.unc.edu/models/faster_rcnn_from_caffe_attr_original.pkl
```

External Links

The original Caffe implementation, https://github.com/peteanderson80/bottom-up-attention, and its Docker image.
bottom-up-attention.pytorch maintained by MIL-LAB.

Proof of Correctness

As shown in demo

Note: You might find small differences between the Caffe features and the PyTorch features in this verification demo. This is because the verification uses the "Given boxes" setup instead of "Predicted boxes". If the features are extracted from scratch (i.e., with predicted boxes), they are exactly the same.

In detail: with "Given boxes", features are pooled again from the final predicted boxes (after box regression), whereas regular extraction uses the features of the proposals. The two pipelines are illustrated below:

Feature extraction (using predicted boxes):

ResNet --> RPN --> RoiPooling + Res5 --> Box Regression --> BOX
                                      |-------------------> Feature --> Label
                                                                  |-> Attribute

Feature extraction (using given boxes):

ResNet --> RPN --> RoiPooling + Res5 --> Box Regression --> BOX
                                           |--> RoIPooling + Res5 --> Feature --> Label
                                                                              |-> Attribute
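
The effect of the two pipelines can be reproduced with a toy example. The sketch below is an illustration under stated assumptions, not the code of this repo: `apply_deltas` uses the standard R-CNN box parameterization (center shift plus log-scale resize), `toy_pool` is a hypothetical stand-in for RoIPooling + Res5 that simply averages a small grid, and the regression output is made up.

```python
import math


def apply_deltas(box, deltas):
    """Standard R-CNN box regression: shift the center, rescale width/height."""
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = deltas
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


def toy_pool(feature_map, box):
    """Stand-in for RoIPooling + Res5: mean of the grid cells inside the box."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    cells = [feature_map[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    return sum(cells) / len(cells)


# Toy 8x8 "feature map" whose value depends on position.
feature_map = [[float(x + 10 * y) for x in range(8)] for y in range(8)]
proposal = (1.0, 1.0, 5.0, 5.0)
deltas = (0.25, 0.0, 0.0, 0.0)  # made-up regression output: shift right by one cell

final_box = apply_deltas(proposal, deltas)

# "Predicted boxes": the feature is pooled from the proposal,
# while the reported box is the regressed one.
predicted_mode_feature = toy_pool(feature_map, proposal)

# "Given boxes": the regressed box is fed back in and pooled again.
given_mode_feature = toy_pool(feature_map, final_box)

print(final_box, predicted_mode_feature, given_mode_feature)
```

Because the regressed box covers a slightly different region than the proposal it came from, the two modes pool different cells and return different features for the same reported box, which is the kind of small discrepancy described in the note above.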

References

Detectron2:

@misc{wu2019detectron2,
  author =       {Yuxin Wu and Alexander Kirillov and Francisco Massa and
                  Wan-Yen Lo and Ross Girshick},
  title =        {Detectron2},
  howpublished = {\url{https://github.com/facebookresearch/detectron2}},
  year =         {2019}
}

Bottom-up Attention:

@inproceedings{Anderson2017up-down,
  author = {Peter Anderson and Xiaodong He and Chris Buehler and Damien Teney and Mark Johnson and Stephen Gould and Lei Zhang},
  title = {Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering},
  booktitle={CVPR},
  year = {2018}
}

LXMERT:

@inproceedings{tan2019lxmert,
  title={LXMERT: Learning Cross-Modality Encoder Representations from Transformers},
  author={Tan, Hao and Bansal, Mohit},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
  year={2019}
}
