
Model won't query when enabling Rerank in settings-local.yaml [BUG] #2032

Open
Castolus opened this issue Aug 2, 2024 · 0 comments
Labels
bug Something isn't working

Comments

Castolus commented Aug 2, 2024

Pre-check

  • I have searched the existing issues and none cover this bug.

Description

When using a different embeddings model than the project's default (BAAI/bge-small-en-v1.5), I get issues when I enable Rerank. If I use the default model, everything works fine. If I change to a different embeddings model (with a different dimensions value), I see that my model only uses the token count of my system prompt, but it won't retrieve the info from Qdrant.

As a test, I deleted my Qdrant DB and ingested a bunch of documents in Local Mode with my settings in place, so that I could rule out a dimensionality problem. Unfortunately, it happened again.
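For anyone reproducing this, the stored vector size can also be checked directly in Qdrant to confirm dimensionality is not the culprit. A minimal sketch with qdrant-client, assuming PrivateGPT's default local storage path and collection name (both are assumptions and may differ per install):

from qdrant_client import QdrantClient

# Assumed defaults for a local PrivateGPT install; adjust path/collection to yours.
client = QdrantClient(path='local_data/private_gpt/qdrant')
info = client.get_collection('make_this_parameterizable_per_api_call')

# With an unnamed vector config this is a VectorParams object exposing .size.
# jinaai/jina-embeddings-v2-base-de produces 768-dimensional vectors, so this
# should print 768 and match embed_dim in the settings below.
print(f'Stored vector size: {info.config.params.vectors.size}')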

Steps to Reproduce

  1. Use this setup in your settings-local.yaml:
llm:
  mode: llamacpp
  # Should be matching the selected model
  max_new_tokens: 4096
  context_window: 8192
  # Select your tokenizer. Llama-index tokenizer is the default.
  tokenizer: meta-llama/Meta-Llama-3.1-8B-Instruct #DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1
  prompt_style: "llama3"
  temperature: 0.6     #The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual. (Default: 0.1)

rag:
  similarity_top_k: 10 #6
  #This value controls how many "top" documents the RAG returns to use in the context.
  similarity_value: 0.45
  #This value is disabled by default.  If you enable these settings, the RAG will only use articles that meet a certain percentage score.
  rerank:
    enabled: true
    model: cross-encoder/ms-marco-MiniLM-L-2-v2
    top_n: 5

llamacpp:
  llm_hf_repo_id: lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF 
  llm_hf_model_file: Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf 

embedding:
  # Should be matching the value above in most cases
  mode: huggingface
  ingest_mode: simple
  embed_dim: 768

huggingface:
  embedding_hf_model_name: jinaai/jina-embeddings-v2-base-de

vectorstore:
  database: qdrant
  2. Modify line 113 of chat_memory_buffer.py like this, so you can see the initial token count on your console:

     if initial_token_count > self.token_limit:
         raise ValueError(
             f'Initial token count {initial_token_count} exceeds token limit {self.token_limit}'
         )

     # Added for debugging: log the token count on every call.
     print(f'Initial token count: {initial_token_count}. Token limit: {self.token_limit}.')

  3. Ask a question related to the ingested documents.

  4. If the initial token count equals the token count of your system prompt alone, then Rerank is causing the bug. Otherwise everything is fine.

  5. Disable Rerank and try again. It should work fine.

  6. Enable Rerank again and check once more. It should fail again (a standalone sanity check for the rerank model is sketched after this list).
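To rule out a broken rerank model download as the cause, the cross-encoder can be exercised on its own, outside PrivateGPT. A minimal sketch, assuming sentence-transformers is installed (the query and passages are made-up examples):

from sentence_transformers import CrossEncoder

# Load the same cross-encoder configured under rag.rerank.model.
model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-2-v2')

# Score one query against two toy passages; a higher score means more relevant.
pairs = [
    ('What does the manual say about setup?', 'The setup chapter explains installation.'),
    ('What does the manual say about setup?', 'Unrelated text about the weather.'),
]
print(model.predict(pairs))

If the first pair scores clearly higher, the model file itself is fine and the problem sits in how the rerank postprocessor interacts with retrieval.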

Expected Behavior

Initial token count = System prompt token count + query token count

Actual Behavior

Initial token count = System prompt token count.
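For comparison, the two counts can be reproduced by hand with the configured tokenizer. A minimal sketch, assuming access to the gated Hugging Face repo (the prompt and query strings are placeholders, not the actual values):

from transformers import AutoTokenizer

# Same tokenizer as configured under llm.tokenizer (gated repo; requires HF access).
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3.1-8B-Instruct')

system_prompt = 'You are a helpful assistant.'  # placeholder for the actual system prompt
query = 'What do the ingested documents say about X?'  # placeholder question

sys_tokens = len(tokenizer.encode(system_prompt))
query_tokens = len(tokenizer.encode(query))

# Expected: the printed initial token count covers both; with Rerank enabled it
# matches sys_tokens only.
print(f'system={sys_tokens}, query={query_tokens}, combined={sys_tokens + query_tokens}')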

Environment

Windows 11. Local Mode. GPU Nvidia Quadro M620

Additional Information

No response

Version

v0.5.0

Setup Checklist

  • Confirm that you have followed the installation instructions in the project’s documentation.
  • Check that you are using the latest version of the project.
  • Verify disk space availability for model storage and data processing.
  • Ensure that you have the necessary permissions to run the project.

NVIDIA GPU Setup Checklist

  • Check that all CUDA dependencies are installed and compatible with your GPU (refer to CUDA's documentation).
  • Ensure an NVIDIA GPU is installed and recognized by the system (run nvidia-smi to verify).
  • Ensure proper permissions are set for accessing GPU resources.
  • Docker users - Verify that the NVIDIA Container Toolkit is configured correctly (e.g. run sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi)
Castolus added the bug label Aug 2, 2024