
Performance Test Strategy #41

Open
4 of 5 tasks
svenseeberg opened this issue Sep 25, 2024 · 6 comments
Labels: analysis (Analyse/comparative study of features), component:chat (Chat Back End)

Comments

@svenseeberg
Member

svenseeberg commented Sep 25, 2024

We want to run performance tests on each of our modules:

  1. Embedding model (already done?)
  2. Chunking methods: Test chunking strategies #38
  3. Prompt
  4. LLM: Evaluate MiniLLM Performance #10
  5. Translation models

All but one of the components listed above should be held fixed while we vary the remaining one and test different approaches with our benchmark questions.
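Varying one component while keeping the others fixed can be sketched as generating configuration variants from a baseline. This is a minimal illustration, not project code: the `PipelineConfig` fields mirror the five listed modules, `one_factor_variants` is a hypothetical helper, and apart from `llama3.1:8b` and `h2` chunking (the settings used in the test run later in this thread) all values are placeholders.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PipelineConfig:
    # One field per module from the list above (values are placeholders).
    embedding_model: str
    chunking: str
    prompt: str
    llm: str
    translation_model: str


def one_factor_variants(baseline, alternatives):
    """Yield (field, config) pairs that differ from `baseline` in exactly one field.

    `alternatives` maps a field name to candidate values; values equal to the
    baseline are skipped, and every other field keeps its baseline value.
    """
    for field_name, values in alternatives.items():
        for value in values:
            if value != getattr(baseline, field_name):
                yield field_name, replace(baseline, **{field_name: value})


baseline = PipelineConfig(
    embedding_model="embedding-a",   # hypothetical
    chunking="h2",
    prompt="prompt-v1",              # hypothetical
    llm="llama3.1:8b",
    translation_model="translator-a",  # hypothetical
)
variants = list(one_factor_variants(baseline, {
    "chunking": ["h2", "h3", "fixed-512"],  # "h3" and "fixed-512" are hypothetical
    "llm": ["llama3.1:8b", "other-llm"],    # "other-llm" is hypothetical
}))
```

Each variant would then be run against the same benchmark questions, so any difference in answers can be attributed to the single changed component.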

Benchmark questions, ordered by priority and based on our user stories:

  1. simple question: How can I learn German?
  2. simple question with complicated words: I need to know the German language for a job. What do I need to do?
  3. question with no answer in content: When was JFK assassinated?
  4. complicated question (double question, more context, etc.): How can a 17 years old person from Ukraine learn German?
  5. malformed question (spelling / grammar mistakes): I are Ukraina. Need job.
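To make runs comparable, the benchmark questions can be kept as data and fed through the pipeline in priority order. A minimal sketch: the questions are copied verbatim from the list above (including the intentional mistakes), the category labels are shortened from it, and `ask` is a hypothetical hook standing in for whatever function sends a question through the pipeline under test.

```python
from typing import Callable

# Benchmark questions from the list above, in priority order (copied verbatim).
BENCHMARK_QUESTIONS = [
    ("simple", "How can I learn German?"),
    ("complicated words", "I need to know the German language for a job. What do I need to do?"),
    ("no answer in content", "When was JFK assassinated?"),
    ("complicated", "How can a 17 years old person from Ukraine learn German?"),
    ("malformed", "I are Ukraina. Need job."),
]


def run_benchmark(ask: Callable[[str], str]) -> list[dict]:
    """Send every benchmark question through `ask` and collect the answers."""
    results = []
    for priority, (category, question) in enumerate(BENCHMARK_QUESTIONS, start=1):
        results.append({
            "priority": priority,
            "category": category,
            "question": question,
            "answer": ask(question),
        })
    return results


# A dummy pipeline that always declines, only to show the shape of the results.
results = run_benchmark(lambda question: "no answer")
```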
@svenseeberg changed the title from "Testing-Strategy" to "Performance Test Strategy" on Sep 25, 2024
@svenseeberg added this to the "v3 Basic Answer Retrieval" milestone on Sep 25, 2024
@svenseeberg added the analysis (Analyse/comparative study of features) label on Sep 25, 2024
@steffenkleinle
Member

Possible training data with questions about Integreat content: https://huggingface.co/datasets/digitalfabrik/integreat-qa
The questions are relatively simple and well phrased, so they only cover a subset of the cases mentioned above.

@svenseeberg
Member Author

svenseeberg commented Oct 1, 2024

Test results based on 9f57f80 (llama3.1:8b, questions with no matching documents skipped, chunking at h2 tags):

  1. You can learn German through SPEAK's online language learning groups, where you can interact with others and meet daily for 90 minutes over 2 weeks. Alternatively, you can use various online resources such as Mein Deutschbuch, Deutsch-Uni Online (DUO), and the Basic language course from Deutsche Welle to study at your own pace.
  2. To learn German for a job in Germany, you should take a vocational German course such as DeuFöV. This will help improve your speaking skills and prepare you for the workplace. You can find more information on how to apply for these courses through the Jobcenter or Employment Agency.
  3. no answer
  4. A 17-year-old person from Ukraine can learn German through SPEAK's online language courses, which include interactive groups and video conferencing. Additionally, they can use various free apps and websites such as Ankommen, Serlo ABC, Mein Deutschbuch, and Deutsch-Uni Online to learn German at their own pace. Some of these resources also offer placement tests and certification.
  5. no answer
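The run above chunks content at h2 tags. A minimal sketch of that idea, assuming pages are available as HTML; it uses only Python's standard-library `html.parser` and is not the project's actual chunker. `H2Chunker` and `chunk_at_h2` are hypothetical names.

```python
from html.parser import HTMLParser


class H2Chunker(HTMLParser):
    """Collect text fragments, starting a new chunk at every <h2> tag.

    Text before the first <h2> forms its own chunk; all markup is dropped.
    """

    def __init__(self):
        super().__init__()
        self.chunks = [[]]  # one list of text fragments per chunk

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == "h2":
            self.chunks.append([])

    def handle_data(self, data):
        self.chunks[-1].append(data)


def chunk_at_h2(html: str) -> list[str]:
    parser = H2Chunker()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    # Fragments are joined with spaces and whitespace is normalized, so the
    # h2 heading text ends up at the start of its section's chunk.
    texts = [" ".join(" ".join(parts).split()) for parts in parser.chunks]
    return [text for text in texts if text]  # drop empty chunks


html = (
    "<p>Welcome</p>"
    "<h2>Language courses</h2><p>Integration courses</p>"
    "<h2>Apps</h2><p>DUO</p>"
)
chunks = chunk_at_h2(html)
```

Keeping the heading inside its section's chunk is a design choice: the embedding then sees the section title together with its body, at the cost of slightly longer chunks.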

@svenseeberg
Member Author

svenseeberg commented Oct 1, 2024

I need to know the German language for a job. What do I need to do?

This question does not always yield a result. In roughly 1 of 4 runs, the message apparently is not classified as a question that requires an answer.
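Intermittent failures like this are easier to compare across configurations when the same question is run many times and the failure rate is recorded. A minimal sketch: `run_once` is a hypothetical hook that sends the question through the pipeline once and returns `True` if an answer was produced; the fake pipeline below only simulates a 1-in-4 failure.

```python
from typing import Callable


def failure_rate(run_once: Callable[[], bool], runs: int = 20) -> float:
    """Call `run_once` `runs` times and return the fraction of failed runs."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = sum(1 for _ in range(runs) if not run_once())
    return failures / runs


# Simulated pipeline: every 4th call is "not classified as a question".
_calls = iter(range(1_000_000))


def fake_run() -> bool:
    return next(_calls) % 4 != 3


rate = failure_rate(fake_run, runs=20)
```

With only four runs, a 1-in-4 failure rate is a very rough estimate; more runs per question narrow it down.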

This was referenced Oct 1, 2024
@svenseeberg
Member Author

svenseeberg commented Oct 4, 2024

Another interesting prompt:

Is there a cinema in Munich that shows English movies?

{
  "answer": "I don't know. The provided context does not mention cinemas or movie showings in Munich.",
  "sources": [
    "/muenchen/en/culture-leisure-sport/general-information/",
    "/muenchen/en/culture-leisure-sport/be-creative/youth-theatre-workshop-in-the-bellevue-di-monaco/",
    "/muenchen/en/culture-leisure-sport/meet-people/meetings-in-the-neighbourhood/"
  ],
  "details": [
    {
      "source": "/muenchen/en/culture-leisure-sport/be-creative/youth-theatre-workshop-in-the-bellevue-di-monaco/",
      "score": 0.7928134202957153
    },
    {
      "source": "/muenchen/en/culture-leisure-sport/general-information/",
      "score": 0.855070948600769
    },
    {
      "source": "/muenchen/en/culture-leisure-sport/meet-people/meetings-in-the-neighbourhood/",
      "score": 1.0023198127746582
    }
  ],
  "status": "success"
}
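Here three sources are returned even though none of them is about cinemas. The thread does not say how "skip questions with no matching documents" is implemented; one possible approach is a score threshold on the `details` list. A sketch under two assumptions: `score` is a distance where lower means closer (as with an L2 metric; if the backend returns similarities, flip the comparison and sort order), and the threshold value is purely hypothetical. `matching_sources` is a hypothetical helper.

```python
def matching_sources(response: dict, max_distance: float) -> list[str]:
    """Return sources whose score is at or below `max_distance`, closest first.

    Assumes `score` is a distance (lower = more similar).
    """
    details = sorted(response.get("details", []), key=lambda d: d["score"])
    return [d["source"] for d in details if d["score"] <= max_distance]


# The "details" from the response above.
response = {
    "details": [
        {"source": "/muenchen/en/culture-leisure-sport/be-creative/youth-theatre-workshop-in-the-bellevue-di-monaco/",
         "score": 0.7928134202957153},
        {"source": "/muenchen/en/culture-leisure-sport/general-information/",
         "score": 0.855070948600769},
        {"source": "/muenchen/en/culture-leisure-sport/meet-people/meetings-in-the-neighbourhood/",
         "score": 1.0023198127746582},
    ],
}

sources = matching_sources(response, max_distance=0.75)  # hypothetical threshold
# Placeholder: with matching sources, the pipeline would call the LLM here.
answer = "no answer" if not sources else None
```

The threshold itself would become another parameter to tune against the benchmark questions.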

@svenseeberg added the component:chat (Chat Back End) label on Oct 4, 2024
@svenseeberg
Member Author

Another test question that frequently produces bad results:

Hi I'm from Afghanistan and 17 years old. How can I learn German?

@svenseeberg
Member Author

svenseeberg commented Oct 16, 2024

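We tried to get more consistent documents from Milvus (see #60) by switching to flat indexes, which perform exact (brute-force) search, but still got varying results. Since the search itself is then exact, the only remaining explanation seemed to be that the embedding model produces different vectors for the same query.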

*edit: see #61 (comment)
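To test this hypothesis directly, without Milvus in the loop, the same query can be embedded several times and the vectors compared. A minimal sketch: `embed` is a hypothetical hook for whatever embedding call the pipeline uses, and `fake_embed` / `noisy_embed` are toy stand-ins (not real models) that show a deterministic and a non-deterministic case.

```python
import math
import random
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def embedding_drift(embed: Callable[[str], Sequence[float]], query: str, runs: int = 10):
    """Embed `query` `runs` times and compare each vector with the first one.

    Returns (max absolute element-wise difference, min cosine similarity).
    A deterministic model should give a difference of 0 and a similarity of ~1.
    """
    reference = list(embed(query))
    max_abs_diff, min_cos = 0.0, 1.0
    for _ in range(runs - 1):
        vector = list(embed(query))
        max_abs_diff = max(max_abs_diff, max(abs(x - y) for x, y in zip(reference, vector)))
        min_cos = min(min_cos, cosine_similarity(reference, vector))
    return max_abs_diff, min_cos


def fake_embed(text: str) -> list[float]:
    # Deterministic toy "embedding" built from character codes.
    codes = [ord(c) for c in text]
    return [float(len(codes)), float(sum(codes)), float(max(codes))]


_rng = random.Random(0)


def noisy_embed(text: str) -> list[float]:
    # Toy non-deterministic model: adds small random noise on every call.
    return [x + _rng.gauss(0, 1e-3) for x in fake_embed(text)]


diff, cos = embedding_drift(fake_embed, "How can I learn German?")
noisy_diff, noisy_cos = embedding_drift(noisy_embed, "How can I learn German?")
```

A non-zero difference for the real model would confirm the hypothesis; a zero difference would point back at retrieval or chunking.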

Another observation: the chunking (and chunk encoding) might be problematic as well.
