SelfAttnScoring-MPDocVQA

Official implementation of the ICDAR 2024 paper "Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism".

Dataset

The MP-DocVQA dataset is available through RRC Task 4. More details can be found in Rubèn Tito's GitHub repo.

Once you have downloaded the dataset and placed it in your folder, update lines 9-10 of dataset.py to match its location.

Train the model

All hyperparameters can be modified in train.py. To train the model, run python train.py.

Weights

The trained weights for the scoring module are provided in scoring_pix2struct.model.ANLS0.6199.
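
The ANLS0.6199 suffix in the file name appears to record an ANLS (Average Normalized Levenshtein Similarity) score, the metric commonly used for DocVQA-style benchmarks. As a rough illustration of how that metric is usually computed, here is a minimal sketch. It is not taken from metrics.py in this repo: the function names `levenshtein` and `anls`, the lowercasing and stripping of answers, and the 0.5 threshold are all assumptions based on the common definition.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions, ground_truths, threshold=0.5):
    """Average, over questions, of the best thresholded similarity
    between the prediction and any accepted answer.

    predictions: list of predicted answer strings.
    ground_truths: list of lists of accepted answer strings.
    threshold: assumed cut-off; distances at or above it score 0.
    """
    if not predictions:
        return 0.0
    total = 0.0
    for pred, answers in zip(predictions, ground_truths):
        p = pred.strip().lower()
        best = 0.0
        for ans in answers:
            g = ans.strip().lower()
            longest = max(len(p), len(g))
            # Normalized distance in [0, 1]; two empty strings count as identical.
            nl = levenshtein(p, g) / longest if longest else 0.0
            score = 1.0 - nl if nl < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)
```

Under these assumptions, an exact match scores 1, a near miss scores slightly less, and a prediction whose normalized edit distance reaches the threshold scores 0, so the metric tolerates small recognition errors without rewarding unrelated answers.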

Benchmark

On the leaderboard, this method is listed as "(OCR-Free) Retrieval-based Baseline".

Citation

If you find our work helpful for your research or use it as a baseline model, please cite our paper as follows:

@inproceedings{kang2024multi,
  title={Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism},
  author={Kang, Lei and Tito, Rub{\`e}n and Valveny, Ernest and Karatzas, Dimosthenis},
  booktitle={International Conference on Document Analysis and Recognition},
  year={2024},
  organization={Springer}
}
