Reproducing results from papers #21

Open
orionw opened this issue Jun 9, 2023 · 7 comments

orionw commented Jun 9, 2023

Hi there! Great work - this is a very interesting line of research!

I was hoping to replicate your results on BEIR but seem to be having some trouble. For example, in both the InPars v1 and v2 papers you mention using a learning rate of 1e-3, but I can't find any example scripts that use it (in legacy or otherwise, they seem to use 3e-4). When I use the hyperparameters from the papers (or the default example), I get much worse results.

I'm sure it's just some config I'm missing from the papers/code, but if you happen to have the commands that reproduce the numbers in the paper, I'd really appreciate it!

Thanks for your time!

@lhbonifacio (Collaborator)

Hey @orionw
Thank you for your interest in our work!
Could you give us more information about how you are trying to replicate the results? (the dataset you are using, whether you are generating new synthetic data or using the data we made available, whether you are fine-tuning/evaluating on TPU or GPU, etc.)
And regarding the learning rate, we used 3e-4 (we will correct this in the paper).

Moreover, we are about to release a reproduction paper of InPars with further details on how to reproduce the results.

Thank you!

orionw (Author) commented Jun 21, 2023

Thanks for the reply @lhbonifacio!

I've tried a couple of datasets (SciFact, SciDocs) but can't reproduce the results. I'm using GPUs and the code in inpars, not in legacy. I am generating new questions using Hugging Face models (not the available InPars v1 questions; I haven't seen the InPars v2 generated questions made publicly available).
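
(For concreteness, here is a minimal sketch of this kind of few-shot question generation with a Hugging Face causal LM; the model name, prompt, and decoding settings are illustrative placeholders, not the InPars prompt or generator.)

```python
# Minimal sketch of few-shot synthetic query generation with a Hugging Face
# causal LM. The model name, prompt, and decoding settings are placeholders,
# not the InPars prompt or generator.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-1.3B"  # placeholder open model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

document = "Vitamin C is an essential nutrient involved in tissue repair."
prompt = (
    "Example 1:\n"
    "Document: Aspirin is commonly used for short-term relief of headaches.\n"
    "Relevant Query: does aspirin help with headaches\n\n"
    "Example 2:\n"
    f"Document: {document}\n"
    "Relevant Query:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=32,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
# Keep only the newly generated text and take its first line as the query.
generated = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
query = generated.strip().splitlines()[0] if generated.strip() else ""
print(query)
```

In practice this would be run over many sampled corpus documents and followed by a filtering step before fine-tuning; the sketch only shows the shape of a single generation call.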

I've tried several learning rates (including 3e-4) and optimizers, but for both datasets any amount of re-ranker fine-tuning on the synthetic data makes performance worse than just using castorini/monot5-3b-msmarco-10k without fine-tuning (and worse than reported in the paper).

If you have the fine-tuning hyperparameters for any of the BEIR runs, that would be great (optimizer, learning rate, scheduler, steps, etc.).

Obviously there will be some randomness in the generated questions and in training, but I was hoping to minimize differences due to the training setup.
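
(For reference, a minimal, hypothetical sketch of the kind of fine-tuning setup being asked about, written against the Hugging Face Trainer API. The 3e-4 learning rate is the value confirmed above; the optimizer, scheduler, batch size, and step count are placeholders, not the authors' settings.)

```python
# Hedged sketch only: a Trainer-based monoT5-style re-ranker fine-tuning setup.
# Aside from the 3e-4 learning rate confirmed in this thread, every
# hyperparameter below is a placeholder, not a setting confirmed by the authors.
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# The 3B checkpoint mentioned in the thread; a smaller monoT5 checkpoint can be
# substituted for quick tests.
model_name = "castorini/monot5-3b-msmarco-10k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def to_features(example):
    # monoT5-style formatting: the model is trained to emit "true" for relevant
    # query-passage pairs and "false" for sampled negatives.
    enc = tokenizer(
        f"Query: {example['query']} Document: {example['passage']} Relevant:",
        truncation=True,
        max_length=512,
    )
    enc["labels"] = tokenizer("true" if example["relevant"] else "false").input_ids
    return enc

# Toy triples standing in for the synthetic data; in practice these come from
# the generation and filtering steps plus negative sampling.
triples = [
    {"query": "does vitamin c aid tissue repair",
     "passage": "Vitamin C is an essential nutrient involved in tissue repair.",
     "relevant": True},
    {"query": "does vitamin c aid tissue repair",
     "passage": "The Eiffel Tower was completed in 1889.",
     "relevant": False},
]
train_dataset = Dataset.from_list(triples).map(
    to_features, remove_columns=["query", "passage", "relevant"]
)

args = Seq2SeqTrainingArguments(
    output_dir="monot5-inpars-ft",
    learning_rate=3e-4,            # value confirmed earlier in this thread
    lr_scheduler_type="constant",  # assumption, not a confirmed setting
    per_device_train_batch_size=2, # placeholder
    max_steps=10,                  # placeholder for illustration only
    logging_steps=1,
)

# Trainer defaults to AdamW; the optimizer used in the papers is not confirmed here.
Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
).train()
```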

cramraj8 commented Nov 7, 2023

Hi @orionw, I wonder whether you fine-tuned from the castorini/monot5-3b-msmarco-10k checkpoint or from the t5-base checkpoint. Any luck sorting this out?

orionw (Author) commented Nov 7, 2023

Hi @cramraj8! I didn't use t5-base, but I don't think they did either? I never did sort it out and moved on from this, as it didn't seem like the reproduction details would be released soon.

If they do (or if you have time to figure it out), I would love to see the results become reproducible.

cramraj8 commented Nov 9, 2023

@orionw Got it. I tried generating unsupervised data with other tools, and with all of them the performance seems to drop in some cases.

cramraj8 commented Apr 8, 2024

Hi @orionw, I found out the reason behind the performance drop and proposed an effective solution in my recent NAACL paper. You can find it here: https://arxiv.org/pdf/2404.02489.pdf

orionw (Author) commented Apr 8, 2024

Awesome @cramraj8! Thank you, I'm very excited to read the paper 🙏
