
Model won't query when enabling Rerank in settings-local.yaml [BUG] #2032

Open
Castolus opened this issue Aug 2, 2024 · 0 comments
Labels
bug Something isn't working

Comments

Castolus commented Aug 2, 2024

Pre-check

  • I have searched the existing issues and none cover this bug.

Description

When using a different embeddings model than the project's default (BAAI/bge-small-en-v1.5), I get issues when I enable Rerank. If I use the default model, everything works fine. If I change to a different embeddings model (with a different dimensions value), I see that my model only uses the token count of my system prompt, but it won't retrieve the info from Qdrant.

As a test, I deleted my Qdrant DB and ingested a bunch of documents in Local Mode with my settings in place, so that I could rule out a dimensionality problem. Unfortunately, it happened again.
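For anyone reproducing this, the stored vector size can also be checked directly in Qdrant to confirm dimensionality is not the culprit. A minimal sketch with qdrant-client, assuming PrivateGPT's default local storage path and collection name (both are assumptions and may differ per install):

from qdrant_client import QdrantClient

# Assumed defaults for a local PrivateGPT install; adjust path/collection to yours.
client = QdrantClient(path='local_data/private_gpt/qdrant')
info = client.get_collection('make_this_parameterizable_per_api_call')

# With an unnamed vector config this is a VectorParams object exposing .size.
# jinaai/jina-embeddings-v2-base-de produces 768-dimensional vectors, so this
# should print 768 and match embed_dim in the settings below.
print(f'Stored vector size: {info.config.params.vectors.size}')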

Steps to Reproduce

  1. Use this setup in your settings-local.yaml:
llm:
  mode: llamacpp
  # Should be matching the selected model
  max_new_tokens: 4096
  context_window: 8192
  # Select your tokenizer. Llama-index tokenizer is the default.
  tokenizer: meta-llama/Meta-Llama-3.1-8B-Instruct #DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1
  prompt_style: "llama3"
  temperature: 0.6     #The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual. (Default: 0.1)

rag:
  similarity_top_k: 10 #6
  #This value controls how many "top" documents the RAG returns to use in the context.
  similarity_value: 0.45
  #This value is disabled by default.  If you enable these settings, the RAG will only use articles that meet a certain percentage score.
  rerank:
    enabled: true
    model: cross-encoder/ms-marco-MiniLM-L-2-v2
    top_n: 5

llamacpp:
  llm_hf_repo_id: lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF 
  llm_hf_model_file: Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf 

embedding:
  # Should be matching the value above in most cases
  mode: huggingface
  ingest_mode: simple
  embed_dim: 768

huggingface:
  embedding_hf_model_name: jinaai/jina-embeddings-v2-base-de

vectorstore:
  database: qdrant
  2. Modify line 113 of chat_memory_buffer.py like this, so you can see the initial token count on your console:

     if initial_token_count > self.token_limit:
         raise ValueError(
             f'Initial token count {initial_token_count} exceeds token limit {self.token_limit}'
         )

     # Added for debugging: log the token count on every call.
     print(f'Initial token count: {initial_token_count}. Token limit: {self.token_limit}.')

  3. Ask a question related to the ingested documents.

  4. If the initial token count equals the token count of your system prompt alone, then Rerank is causing the bug. Otherwise everything is fine.

  5. Disable Rerank and try again. It should work fine.

  6. Enable Rerank again and check once more. It should fail again (a standalone sanity check for the rerank model is sketched after this list).
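To rule out a broken rerank model download as the cause, the cross-encoder can be exercised on its own, outside PrivateGPT. A minimal sketch, assuming sentence-transformers is installed (the query and passages are made-up examples):

from sentence_transformers import CrossEncoder

# Load the same cross-encoder configured under rag.rerank.model.
model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-2-v2')

# Score one query against two toy passages; a higher score means more relevant.
pairs = [
    ('What does the manual say about setup?', 'The setup chapter explains installation.'),
    ('What does the manual say about setup?', 'Unrelated text about the weather.'),
]
print(model.predict(pairs))

If the first pair scores clearly higher, the model file itself is fine and the problem sits in how the rerank postprocessor interacts with retrieval.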

Expected Behavior

Initial token count = System prompt token count + query token count

Actual Behavior

Initial token count = System prompt token count.
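For comparison, the two counts can be reproduced by hand with the configured tokenizer. A minimal sketch, assuming access to the gated Hugging Face repo (the prompt and query strings are placeholders, not the actual values):

from transformers import AutoTokenizer

# Same tokenizer as configured under llm.tokenizer (gated repo; requires HF access).
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3.1-8B-Instruct')

system_prompt = 'You are a helpful assistant.'  # placeholder for the actual system prompt
query = 'What do the ingested documents say about X?'  # placeholder question

sys_tokens = len(tokenizer.encode(system_prompt))
query_tokens = len(tokenizer.encode(query))

# Expected: the printed initial token count covers both; with Rerank enabled it
# matches sys_tokens only.
print(f'system={sys_tokens}, query={query_tokens}, combined={sys_tokens + query_tokens}')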

Environment

Windows 11. Local Mode. GPU Nvidia Quadro M620

Additional Information

No response

Version

v0.5.0

Setup Checklist

  • Confirm that you have followed the installation instructions in the project’s documentation.
  • Check that you are using the latest version of the project.
  • Verify disk space availability for model storage and data processing.
  • Ensure that you have the necessary permissions to run the project.

NVIDIA GPU Setup Checklist

  • Check that all CUDA dependencies are installed and compatible with your GPU (refer to CUDA's documentation).
  • Ensure an NVIDIA GPU is installed and recognized by the system (run nvidia-smi to verify).
  • Ensure proper permissions are set for accessing GPU resources.
  • Docker users - Verify that the NVIDIA Container Toolkit is configured correctly (e.g. run sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi)
Castolus added the bug label Aug 2, 2024