
In train.py's test() process, almost every pred_answer output is a single character like 'e' #3

Open
Wu-tn opened this issue Jun 17, 2024 · 6 comments



Wu-tn commented Jun 17, 2024

Hi,
I ran into this problem in the test() process.


Wu-tn commented Jun 18, 2024

Hi,
I found another problem: in inference_dense.py, faiss retrieves almost the same 100 passages for every question in train.json. I followed your steps in train_dense.py, installing Luyu/co-condenser-wiki from Hugging Face and training it on wikipedia-nq from https://github.com/luyug/Dense. At which step might I have made a mistake?

sunnweiwei (Owner) commented:

Hi! It's strange that the retrieval results are the same. Maybe you could try using this model (https://huggingface.co/Luyu/co-condenser-marco-retriever) to run dense retrieval inference and see if the results are normal?


Wu-tn commented Jun 20, 2024

Hi,
Is it necessary to train the pre-trained co-condenser model on the wikipedia-nq dataset, or can it be used directly to encode the corpus and queries?

sunnweiwei (Owner) commented:

This model (https://huggingface.co/Luyu/co-condenser-marco-retriever) has been trained on MS MARCO, so it can be used directly to encode the corpus and the queries.
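Once the corpus and queries are encoded (co-condenser retrievers use the final-layer CLS vector as the embedding), retrieval itself is maximum inner product search. A minimal numpy sketch of that search step, with a brute-force dot product standing in for faiss's IndexFlatIP and random vectors standing in for real embeddings:

```python
import numpy as np

def dense_search(query_vecs, passage_vecs, k=100):
    """Brute-force maximum inner product search over unnormalized
    embeddings -- the same ranking faiss's IndexFlatIP produces."""
    scores = query_vecs @ passage_vecs.T           # (n_queries, n_passages)
    return np.argsort(-scores, axis=1)[:, :k]      # best-scoring passages first

# Toy corpus: 768-dim random vectors standing in for CLS embeddings.
rng = np.random.default_rng(0)
passage_vecs = rng.normal(size=(1000, 768))
query_vecs = rng.normal(size=(5, 768))
top_ids = dense_search(query_vecs, passage_vecs, k=100)  # shape (5, 100)
```

With distinct query embeddings, the rows of `top_ids` should differ substantially from query to query; identical rows would point to the encoder producing (near-)identical query vectors.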


Wu-tn commented Jun 21, 2024

Thanks, I will try it!


Wu-tn commented Jun 21, 2024

By the way, would it be possible to provide the 9.pt checkpoint for download?
