guidance load models as int8? #110
Replies: 4 comments 3 replies
-
I know this doesn't answer your question about int8, but it might still be helpful: I got guidance working with GPTQ, which lets you use 4-bit quantized models. It was pretty easy; I just took the model-loading code from GPTQ and wrote a small subclass to hook it into guidance. Here's the raw code. It's quite hacky and can certainly be improved, but for experimenting it does the job. I wrote more about it here.
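For context, a rough sketch of the idea (this is not the code linked above; `load_quant` stands in for GPTQ-for-LLaMa's loader, whose name, import path, and signature vary by fork, and the paths are placeholders):

```python
# Rough sketch only: load a 4-bit GPTQ checkpoint with GPTQ-for-LLaMa's own
# loading code, then hand the model/tokenizer to guidance's Transformers wrapper.
import guidance
from transformers import AutoTokenizer

# Hypothetical import: GPTQ-for-LLaMa is usually vendored/copied, not pip-installed.
from gptq_for_llama import load_quant

class QuantizedTransformers(guidance.llms.Transformers):
    def __init__(self, model_path, checkpoint_path, wbits=4, **kwargs):
        # Build the quantized model ourselves instead of letting guidance load it.
        model = load_quant(model_path, checkpoint_path, wbits)
        tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
        super().__init__(model=model, tokenizer=tokenizer, **kwargs)

guidance.llm = QuantizedTransformers("/path/to/model", "/path/to/4bit.safetensors")
```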
-
To reduce VRAM usage, you can use GPTQ-for-LLaMa.
You can check my code for loading wizard-mega-13B-GPTQ. Hope this helps!
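Not the code linked above, but roughly the shape such a loader can take, assuming the AutoGPTQ library and TheBloke's wizard-mega-13B-GPTQ checkpoint (exact arguments, such as the safetensors basename, may need adjusting):

```python
# Sketch: load a 4-bit GPTQ checkpoint with AutoGPTQ and plug it into guidance.
import guidance
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name = "TheBloke/wizard-mega-13B-GPTQ"  # assumed checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    device="cuda:0",
    use_safetensors=True,  # the quantized weights ship as .safetensors
)

# Hand the already-loaded model/tokenizer to guidance, as in the other replies.
guidance.llm = guidance.llms.Transformers(model=model, tokenizer=tokenizer)
```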
-
You can simply initialize the model and tokenizer yourself, even with peft:

```python
import torch
from peft import PeftModelForCausalLM as PeftCls
from transformers import AutoModelForCausalLM as ModelCls
from transformers import AutoTokenizer as TkCls
import guidance

model_path = "/path/to/model"
peft_path = "/path/to/peft"
use_peft = True

# Load the base model in int8 (requires bitsandbytes) and spread it across
# the available devices.
model: ModelCls = ModelCls.from_pretrained(
    model_path,
    device_map="auto",
    load_in_8bit=True,
    torch_dtype=torch.float16,
)

# Optionally wrap the base model with a PEFT adapter (e.g. a LoRA).
if use_peft:
    model: PeftCls = PeftCls.from_pretrained(model, peft_path)

tokenizer: TkCls = TkCls.from_pretrained(model_path)

# Hand the already-initialized model and tokenizer to guidance.
guidance.llm = guidance.llms.Transformers(model=model, tokenizer=tokenizer)
```
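For completeness, a minimal usage sketch once `guidance.llm` is set this way (the prompt is illustrative and assumes the `{{gen}}`-style template syntax of the guidance version used above):

```python
# Run a simple program against the int8 model configured above.
program = guidance("""Question: {{query}}
Answer: {{gen 'answer' max_tokens=64}}""")

result = program(query="What does load_in_8bit do?")
print(result["answer"])
```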
-
@andysalerno Hey, thanks for sharing your work on the GPTQ integration. How did things go with this? Are you still using Guidance, or have you moved on? It seems to be a dead project, but I'm struggling to see what people are using instead. LMQL might be a viable alternative, but its syntax seems overly complex compared to Guidance. It would be good to know what you thought of Guidance overall. (I still haven't gotten it working locally on a 4090, and I'm starting to wonder if I'm wasting my time.) Cheers :)
-
How do I call guidance but load the models as int8, so I can fit them even on an 80 GB GPU?