Pre-check
I have searched the existing issues and none cover this bug.
Description
When using a different embeddings model than the project's default (BAAI/bge-small-en-v1.5), I get some issues when I enable Rerank. If I use the default model, everything works fine. If I change to a different embeddings model (with a different dimensions value), I see that my model only uses the token count of my "System Prompt", but it won't retrieve the info from Qdrant.
As a test, I deleted my Qdrant DB and ingested a bunch of documents in "Local Mode" with my settings in place, so that I could rule out a dimensionality problem. Unfortunately, it happened again.
Steps to Reproduce
Use this setup in your settings-local.yaml:
llm:
  mode: llamacpp
  # Should be matching the selected model
  max_new_tokens: 4096
  context_window: 8192
  # Select your tokenizer. Llama-index tokenizer is the default.
  tokenizer: meta-llama/Meta-Llama-3.1-8B-Instruct # DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1
  prompt_style: "llama3"
  temperature: 0.6 # The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual. (Default: 0.1)
rag:
  similarity_top_k: 10 # 6
  # This value controls how many "top" documents the RAG returns to use in the context.
  similarity_value: 0.45
  # This value is disabled by default. If you enable these settings, the RAG will only use articles that meet a certain percentage score.
  rerank:
    enabled: true
    model: cross-encoder/ms-marco-MiniLM-L-2-v2
    top_n: 5
llamacpp:
  llm_hf_repo_id: lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF
  llm_hf_model_file: Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
embedding:
  # Should be matching the value above in most cases
  mode: huggingface
  ingest_mode: simple
  embed_dim: 768
huggingface:
  embedding_hf_model_name: jinaai/jina-embeddings-v2-base-de
vectorstore:
  database: qdrant
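As a rough sketch of how the rag settings above interact (illustrative names only, not PrivateGPT's actual implementation): similarity_top_k controls how many candidates the retriever returns, similarity_value filters them by score, and rerank.top_n keeps only the best-reranked subset:

```python
# Hypothetical sketch of the retrieval pipeline implied by the rag settings.
# The function name and node representation are illustrative, not the project's API.

def apply_rag_settings(scored_nodes, similarity_top_k=10,
                       similarity_value=0.45, rerank_top_n=5):
    """scored_nodes: list of (node_text, similarity_score) tuples."""
    # 1. Retriever returns the top-k most similar nodes.
    candidates = sorted(scored_nodes, key=lambda n: n[1], reverse=True)[:similarity_top_k]
    # 2. Similarity cutoff drops nodes scoring below the threshold.
    candidates = [n for n in candidates if n[1] >= similarity_value]
    # 3. Reranker keeps only top_n nodes (reranker scores omitted in this
    #    sketch; the similarity score is reused for ordering).
    return candidates[:rerank_top_n]

nodes = [("doc A", 0.9), ("doc B", 0.7), ("doc C", 0.5), ("doc D", 0.3)]
print(apply_rag_settings(nodes))  # "doc D" (0.3) is dropped by the cutoff
```

With the bug described above, the context that should survive this pipeline never makes it into the prompt at all.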
Modify Line 113 of chat_memory_buffer.py as follows, so you can see the initial token count in your console:
print(f'Initial token count: {initial_token_count}. Token limit: {self.token_limit}.')
Ask a question related to the ingested documents.
If the initial token count equals the token count of your system prompt alone, then Rerank is causing the bug. Otherwise it is fine.
Disable Rerank and try again. It should now work fine.
Enable Rerank again and check once more. It should fail again.
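The debug print from the step above lands inside llama-index's ChatMemoryBuffer. A simplified stand-in (not the library's exact code; field names are approximations) shows where it sits and why the count matters:

```python
# Hedged sketch of ChatMemoryBuffer.get() around line 113 of
# chat_memory_buffer.py. Simplified stand-in, not llama-index's real code.

class ChatMemoryBuffer:
    def __init__(self, token_limit):
        self.token_limit = token_limit
        self.chat_history = []

    def get(self, initial_token_count=0):
        # The debug print added in the reproduction step:
        print(f'Initial token count: {initial_token_count}. '
              f'Token limit: {self.token_limit}.')
        if initial_token_count > self.token_limit:
            raise ValueError("Initial token count exceeds token limit!")
        # ...the real method then trims chat_history to fit the
        # remaining token budget.
        return self.chat_history
```

If the query and retrieved context never reach this point, initial_token_count reflects the system prompt only, which is exactly the symptom reported here.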
Expected Behavior
Initial token count = System prompt token count + query token count
Actual Behavior
Initial token count = System prompt token count.
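The gap between expected and actual behavior can be illustrated with a toy whitespace tokenizer (a stand-in for the configured HF tokenizer; the prompt and query strings are made up):

```python
# Toy illustration of the expected vs. actual initial token counts.
def count_tokens(text):
    # Whitespace split as a stand-in tokenizer; real counts come from the
    # tokenizer configured in settings-local.yaml.
    return len(text.split())

system_prompt = "You can only answer questions about the provided context."
query = "What does the ingested document say about rerank?"

expected = count_tokens(system_prompt) + count_tokens(query)  # expected behavior
actual_with_bug = count_tokens(system_prompt)                 # actual, with Rerank enabled

print(expected, actual_with_bug)  # the query tokens are missing from the actual count
```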
Environment
Windows 11. Local Mode. NVIDIA Quadro M620 GPU.
Additional Information
No response
Version
v0.5.0
Setup Checklist
Confirm that you have followed the installation instructions in the project's documentation.
Check that you are using the latest version of the project.
Verify disk space availability for model storage and data processing.
Ensure that you have the necessary permissions to run the project.
NVIDIA GPU Setup Checklist
Check that all CUDA dependencies are installed and are compatible with your GPU (refer to CUDA's documentation).
Ensure an NVIDIA GPU is installed and recognized by the system (run nvidia-smi to verify).
Ensure proper permissions are set for accessing GPU resources.
Docker users - Verify that the NVIDIA Container Toolkit is configured correctly (e.g. run sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi).