Save tensors in lower precision #273
Conversation
inseq-team#202 Adds functionality for saving feature attribution objects and tensors in float16 or float8 format, depending on the `scores_precision` parameter. Tensors are saved in the Hugging Face safetensors format and quantized using zero-point quantization. Because safetensors are bytes objects, they are base64-encoded to be saved in the output JSON and decoded upon reloading.
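For illustration, here is a minimal sketch of that serialization scheme using the `safetensors` library (the key name and payload layout are made up for the example, not the PR's actual format):

```python
import base64
import json

import torch
from safetensors.torch import load, save

# Serialize a tensor dict to the raw safetensors byte format.
raw = save({"scores": torch.randn(3, 4, dtype=torch.float16)})

# Raw bytes are not JSON-serializable, so base64-encode them first.
payload = json.dumps({"scores": base64.b64encode(raw).decode("ascii")})

# On load, reverse both steps: base64-decode, then parse the safetensors blob.
restored = load(base64.b64decode(json.loads(payload)["scores"]))
assert restored["scores"].dtype == torch.float16
```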
* Add device_map support
* Fix device setter in HF model
* …nseq into feature/score_precision
Hey @LuukSuurmeijer, thanks a lot for this PR! I had a look and added some very minor fixes (added a `Literal` type for the allowed precision strings and a docstring for the new parameter in `save`). However, when running the following:

```python
import torch
from inseq import load_model, FeatureAttributionOutput

saliency_mt_model = load_model("Helsinki-NLP/opus-mt-en-it", "attention")
out_path = "tmp_attr_8bit.json"
out = saliency_mt_model.attribute("This is a test.", device="cpu", show_progress=False)
out.save(out_path, scores_precision="float8", overwrite=True)
loaded_out = FeatureAttributionOutput.load(out_path)
assert torch.allclose(
    out.sequence_attributions[0].source_attributions,
    loaded_out.sequence_attributions[0].source_attributions,
    atol=1e-02,
)
```

you get an error in the parsing of the JSON metadata header. From a very quick exploration, it seems like this is caused by the selection of the header …
The JSON decode error seemed to be a one-off problem with quantizing to 8-bit. I managed to reproduce the error even without …
Hi @LuukSuurmeijer, the code still had some issues due to the FP8 conversion not handling …
Description
Added support for saving attributions in a lower tensor precision.
Upon saving, tensors are converted to Hugging Face safetensors, then optionally quantized to float16, int8, or uint8 (if there are no negative values) using zero-point quantization. The quantization parameters are stored in the safetensor object so that the float32 values can be recovered upon loading. Safetensors are bytes objects, so they need to be base64-encoded to be written to JSON.
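As a rough sketch of what zero-point quantization to uint8 looks like (illustrative helpers only; these are not the PR's actual functions):

```python
import torch

def quantize_zeropoint(t: torch.Tensor) -> tuple[torch.Tensor, float, int]:
    # Affinely map the tensor's [min, max] range onto the uint8 range [0, 255].
    scale = (t.max() - t.min()).item() / 255
    zero_point = round(-t.min().item() / scale)
    q = torch.clamp(torch.round(t / scale) + zero_point, 0, 255).to(torch.uint8)
    return q, scale, zero_point

def dequantize_zeropoint(q: torch.Tensor, scale: float, zero_point: int) -> torch.Tensor:
    # Invert the affine map; the error per element is at most one quantization step.
    return (q.to(torch.float32) - zero_point) * scale

t = torch.randn(4, 4)
q, scale, zp = quantize_zeropoint(t)
assert torch.allclose(t, dequantize_zeropoint(q, scale, zp), atol=2 * scale)
```

Storing `scale` and `zero_point` alongside the quantized tensor is what makes the float32 recovery on load possible.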
List of changes:
- `save` has an extra parameter `scores_precision`, with default value `float32`.
- `FeatureAttributionSequenceOutput` has two new private methods, `_convert_to_safetensors` and `_recover_from_safetensors`, to convert the object's tensors from torch tensors to safetensors and vice versa; they are used in saving and loading respectively.
- `torch_utils` gains `convert_to_safetensor` and `dequantize_safetensor`, which convert a tensor both ways respectively.
- New tests in `test_attribution.py`.
This is my first PR on this project and my first time properly diving into inseq, so please be critical and help me improve the feature! There are several points where I am not sure about the implementation, in particular the changes to `FeatureAttributionSequenceOutput`, and whether the new `torch_utils` functions need unit tests: I saw that most of them do not have unit tests, but am happy to add them (a possible shape is sketched below). All tests run clean with no errors.
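If useful, a roundtrip unit test along these lines could cover them (purely a sketch: the import path and the exact signatures of `convert_to_safetensor` and `dequantize_safetensor` are assumptions here, not taken from this PR):

```python
import pytest
import torch

# Import path and signatures are assumptions for this sketch, not the PR's API.
from inseq.utils.torch_utils import convert_to_safetensor, dequantize_safetensor

@pytest.mark.parametrize("precision", ["float32", "float16", "float8"])
def test_scores_precision_roundtrip(precision: str) -> None:
    t = torch.randn(5, 5)
    # Quantize to the requested precision, then recover float32 values on load.
    recovered = dequantize_safetensor(convert_to_safetensor(t, scores_precision=precision))
    # Lower precision trades exactness for size, so use a coarse tolerance.
    assert torch.allclose(t, recovered, atol=1e-2)
```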
Related Issue
inseq-team#202
Type of Change
Checklist
- I have read the `CODE_OF_CONDUCT.md` document.
- I have read the `CONTRIBUTING.md` guide.
- I have updated the code style using `make fix-style`.
- I have run the tests using `make test`.