[IDEA] Support other quantizations #654
-
Never mind, for the first one I got it: I symlinked the file from my model folder and renamed mistral-7b-instruct-v0.2.Q5_K_M.gguf to mistral-7b-instruct-v0.2.Q5_K_M.gguf3.gguf. Still curious about the second one though!
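For anyone else trying this, here's roughly what the symlink step looks like; the source and target paths below are placeholders for wherever your GGUF files and the app's model folder actually live:

```python
# Rough sketch of the symlink workaround; both paths are placeholders.
from pathlib import Path

source = Path.home() / "models" / "mistral-7b-instruct-v0.2.Q5_K_M.gguf"
# The model only loaded once the filename carried the extra ".gguf3.gguf" suffix.
target = Path.home() / "model-folder" / "mistral-7b-instruct-v0.2.Q5_K_M.gguf3.gguf"

target.parent.mkdir(parents=True, exist_ok=True)
if not target.exists():
    # Symlink instead of copying the multi-gigabyte model file.
    target.symlink_to(source)
```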
-
Nice! The symlink was to allow mistral-7b-instruct-v0.2 with the Q5_K_M quantization to work? What's the response quality like? Maybe also try some of the other higher-quality Mistral fine-tunes like OpenChat-0106.
Try the docs on setting up an OpenAI-compatible proxy server to use whatever model you want. Let me know if that doesn't work? PS: Converting this issue into a GitHub discussion for now.
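In case it helps, here's a minimal sketch of pointing the standard openai Python client at a local OpenAI-compatible proxy; the base_url, api_key, and model name are placeholders for whatever your proxy actually exposes:

```python
# Minimal sketch: chat completion against a local OpenAI-compatible proxy.
# base_url, api_key, and model are placeholders; adjust to your setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # your proxy's OpenAI-compatible endpoint
    api_key="not-needed-locally",         # most local proxies ignore the key value
)

response = client.chat.completions.create(
    model="mistral-7b-instruct-v0.2.Q5_K_M",  # whatever model name the proxy serves
    messages=[{"role": "user", "content": "Summarize this article in three bullet points."}],
)
print(response.choices[0].message.content)
```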
-
Thank you! I generally use Q5_K_M or Q6 if available, and anecdotally they give more coherent/sensible responses than Q4. I'm using it to chat with a bunch of articles and raise questions to improve my skills. I also use 13B/34B models at times, so being able to switch models is a boon. The symlink was there just so I needn't make a copy of the file, but for some reason the name "mistral-7b-instruct-v0.2.Q5_K_M.gguf" gave me an internal server error, while "mistral-7b-instruct-v0.2.Q5_K_M.gguf3.gguf" worked. Thank you for the docs link! I'm not sure how I missed it!
-
Hi!
Maybe I overlooked the documentation, but is there a way to: