Tutorial 4, FAQ retrieval: all results have near-identical scores #2668
-
I've run Tutorial 4, Utilizing Existing FAQs, and noticed that all of the returned answers have almost identical scores, between 0.500 and 0.502. Is this expected? This happened when I ran the Colab without any changes, and it happens whether the question exactly matches an existing question in the database (e.g. "What is a novel coronavirus?") or is completely made up.

This seems odd, since the extractive Q&A systems produce a score that meaningfully ranges between 0 and 1. My use case is merging results from an extractive Q&A pipeline with results from an FAQ-retrieval pipeline, and I'm not sure how to do that if FAQ answers always score 0.5 regardless of how well they actually matched the question.

In short: can I make the FAQ answers produce a more useful score?
-
Hi @stevenhaley, thanks for raising the issue. In the tutorial we're using the wrong similarity function for this embedding model: sentence-transformers models are trained with cosine similarity, while the document store defaults to dot product. You should in fact see a warning about this when running the example. Because dot-product scores are scaled into the [0, 1] range before being returned, almost everything ends up near 0.5; with cosine similarity the scores spread out and become meaningful again.

We opened a PR to fix the tutorial.
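For reference, here's a minimal sketch of what the fix looks like, assuming the tutorial's setup (an `ElasticsearchDocumentStore` plus an `EmbeddingRetriever` with a sentence-transformers model); the field names and model name follow the tutorial and may differ in your copy:

```python
# A sketch of the fix, not the exact PR diff: the key change is
# similarity="cosine", which matches how sentence-transformers
# models are trained. The default ("dot_product") is what produced
# the near-constant ~0.5 scores after scaling.
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import EmbeddingRetriever

document_store = ElasticsearchDocumentStore(
    host="localhost",
    embedding_field="question_emb",
    embedding_dim=384,  # matches all-MiniLM-L6-v2
    excluded_meta_data=["question_emb"],
    similarity="cosine",  # the fix: default is dot product
)

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
    use_gpu=True,
)
```

Depending on your Haystack version, `EmbeddingRetriever` also accepts `scale_score=False` if you'd rather work with the raw cosine similarities instead of values scaled into [0, 1], which may make merging with your extractive pipeline's scores easier to reason about.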