Official implementation of the ICDAR 2024 paper "Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism".
Please find the MP-DocVQA dataset in RRC Task 4. More details can be found in Ruben's GitHub repo.
Once you have downloaded the dataset and placed it in a local folder, update lines 9-10 of dataset.py to point to that location.
All hyperparameters can be modified in train.py. To train the model, run python train.py.
The trained weights for the scoring module are provided in scoring_pix2struct.model.ANLS0.6199.
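The suffix of the weights file name appears to record an ANLS score of 0.6199 (this reading of the file name is an assumption). For readers unfamiliar with the metric, here is a minimal sketch of ANLS (Average Normalized Levenshtein Similarity) as it is commonly defined for DocVQA-style benchmarks. The function names `levenshtein` and `anls`, the lowercase/strip normalization, and the 0.5 threshold default are illustrative choices, not code from this repository; the official evaluation script may differ in details.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions: Sequence[str],
         ground_truths: Sequence[List[str]],
         threshold: float = 0.5) -> float:
    """Average over questions of the best per-answer similarity.

    Similarity is 1 - NL when the normalized distance NL is below
    `threshold`, and 0 otherwise. Normalization here (strip + lowercase)
    is an assumption made for this sketch.
    """
    scores = []
    for pred, answers in zip(predictions, ground_truths):
        pred_norm = pred.strip().lower()
        best = 0.0
        for ans in answers:
            ans_norm = ans.strip().lower()
            max_len = max(len(pred_norm), len(ans_norm))
            nl = levenshtein(pred_norm, ans_norm) / max_len if max_len else 0.0
            best = max(best, 1.0 - nl if nl < threshold else 0.0)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `anls(["hallo"], [["hello", "hi"]])` compares the prediction against each accepted answer and keeps the best similarity, so a near-miss spelling still earns partial credit.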
On the leaderboard, this method is listed as "(OCR-Free) Retrieval-based Baseline".
If you find our work helpful for your research or use it as a baseline model, please cite our paper as follows:
@inproceedings{kang2024multi,
title={Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism},
author={Kang, Lei and Tito, Rub{\`e}n and Valveny, Ernest and Karatzas, Dimosthenis},
booktitle={International Conference on Document Analysis and Recognition},
year={2024},
organization={Springer}
}