Reproducing results from papers #21
Comments
Hey @orionw Moreover, we are about to release a reproduction paper of InPars with further details on how to reproduce the results. Thank you!
Thanks for the reply @lhbonifacio! I've tried a couple of datasets (SciFact, SciDocs) but can't reproduce the results. I'm using GPUs and the code in I've tried several learning rates (including 3e-4) and optimizers, but for both datasets any amount of re-ranker fine-tuning on the synthetic docs makes the performance worse than just using If you have the fine-tuning hyperparameters for any of the BEIR runs, that would be great (optimizer, learning rate, scheduler, steps, etc.). Obviously, with non-determinism there will be randomness in the generated questions and in training, but I was hoping to minimize differences due to model training.
Hi @orionw, I wonder if you fine-tune from the
Hi @cramraj8! I didn't use If they do (or you have time to figure it out), would love to see it be reproducible.
@orionw Got it. I tried generating unsupervised data with other tools, and with all of them the performance seems to drop in some cases.
Hi @orionw, I found the reason behind the performance drop and proposed an effective solution in my recent NAACL paper. You can find it here: https://arxiv.org/pdf/2404.02489.pdf
Awesome @cramraj8! Thank you, I'm very excited to read the paper 🙏
Hi there! Great work - this is a very interesting line of research!
I was hoping to replicate your results on BEIR but seem to be having some trouble. For example, in both the InPars v1 and v2 papers you mention using a learning rate of 1e-3, but I can't find any example scripts that use that value (in legacy or otherwise, they seem to use 3e-4). When I use the hyperparameters from the papers (or the default example), I get much worse results.
I'm sure it's just some config that I'm missing from reading the papers/code, but if you happen to have the commands that reproduce the numbers in the paper I'd really appreciate it!
Thanks for your time!