[Bug]: Problem loading EXL2 in rc_054 #561
Your current environment

🐛 Describe the bug

When trying to run any exl2 model, the following error occurs, even if you manually download the model and specify the path for it:
Comments
I've been working on this same problem with a Llama 3.1 exl2 model against rc_054. I think part of the problem is that the model.safetensors.index.json file points at files that do not exist. For example, in the turboderp Llama 3.1 8B repository, model.safetensors.index.json calls for "model-00004-of-00004.safetensors", which doesn't exist in that exl2 repo (there is a single model shard called output.safetensors). The same problem exists in other Llama 3.1 exl2 repositories (e.g., the two 70B repositories LoneStriker/Meta-Llama-3.1-70B-Instruct-4.0bpw-h6-exl2 and turboderp/Llama-3.1-70B-Instruct-exl2).

I had Claude write me a script to introspect the safetensors files and generate a corrected model.safetensors.index.json (see below). This gets rid of the error that no model weights can be found, but further problems remain. First, loading fails with "ValueError: torch.bfloat16 is not supported for quantization method exl2. Supported dtypes: [torch.float16]". If you try to work around this by explicitly setting --dtype to float16 (I'm not sure whether that is legitimate), you run into a further error (also below). I read the code for a fair bit and tried to debug; my best guess is that the exl2 model quantizes lm_head whereas the loader is not expecting that layer to be quantized. So, possible actionable fixes for aphrodite: tolerating (or auto-casting) bfloat16 configs when loading exl2, and handling a quantized lm_head during weight loading.
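As a sanity check, here is a minimal sketch that verifies both observations. It assumes the single-shard exl2 layout described above (an output.safetensors next to the stock model.safetensors.index.json); the diagnose helper is just for illustration:

```python
# Minimal diagnostic sketch; assumes the single-shard exl2 layout described
# above (output.safetensors alongside model.safetensors.index.json).
import json
import os

from safetensors import safe_open


def diagnose(model_dir):
    # 1. Do the shards named in the index actually exist on disk?
    with open(os.path.join(model_dir, "model.safetensors.index.json")) as f:
        index = json.load(f)
    for shard in sorted(set(index["weight_map"].values())):
        present = os.path.exists(os.path.join(model_dir, shard))
        print(f"{shard}: {'found' if present else 'MISSING'}")

    # 2. Does the real shard contain quantized lm_head tensors (e.g. q_weight)?
    with safe_open(os.path.join(model_dir, "output.safetensors"),
                   framework="pt", device="cpu") as f:
        print("lm_head tensors:", [k for k in f.keys() if k.startswith("lm_head")])
```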
It is possible I do not understand the issues in enough detail to have pinned this down, so any other information you have on this topic would be much appreciated. Thank you for your wonderful work on this project.
The introspection script below rewrites the model.safetensors.index.json file based on an analysis of the safetensors files:

```python
import json
import os
import sys

from safetensors import safe_open


def get_weights_in_file(filename):
    """Record the shape and dtype of every tensor in a safetensors file."""
    weights = {}
    with safe_open(filename, framework="pt", device="cpu") as f:
        for key in f.keys():
            tensor = f.get_tensor(key)
            weights[key] = {
                "shape": list(tensor.shape),
                "dtype": str(tensor.dtype),
            }
    return weights


def create_model_index(directory):
    """Map each weight name to the safetensors shard that contains it."""
    weight_map = {}
    metadata = {"total_size": 0}
    for filename in os.listdir(directory):
        if filename.endswith(".safetensors"):
            full_path = os.path.join(directory, filename)
            metadata["total_size"] += os.path.getsize(full_path)
            for weight_name in get_weights_in_file(full_path):
                weight_map[weight_name] = filename
    return {"metadata": metadata, "weight_map": weight_map}


def write_model_index(directory, output_file="model.safetensors.index.json"):
    """Write a corrected index file into the model directory."""
    index = create_model_index(directory)
    output_path = os.path.join(directory, output_file)
    with open(output_path, "w") as f:
        json.dump(index, f, indent=2)
    print(f"Created {output_file} with {len(index['weight_map'])} weights mapped.")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python script_name.py <directory_path>")
        sys.exit(1)
    write_model_index(sys.argv[1])
```
Exl2 is currently broken in the rc_054 branch. Please read the PR description at #481.
When will exl2 be supported again? Or should I stick to 0.5.3 until the problem is resolved?