v0.4.0: Perturbation-based methods, Int8 backward attribution, contrastive step function and more
What’s Changed
Perturbation-based Attribution Methods (#145)
Thanks to @nfelnlp, this version introduces the PerturbationAttributionRegistry and the OcclusionAttribution (`occlusion`) and LimeAttribution (`lime`) methods, both adapted from Captum's original implementations.
- Our implementation of Occlusion (Zeiler and Fergus, 2014) estimates feature importance by replacing each input token embedding with a baseline (default: UNK) and computing the difference in output, producing coarse-grained attribution scores (one per token).
- LIME (Ribeiro et al., 2016) samples points around a specified input example and uses model evaluations at these points to train a simpler interpretable "surrogate" model, such as a linear model. We adapt the implementation by Atanasova et al. for use in the generative setting.
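The occlusion idea can be illustrated with a small framework-free sketch (the function and the linear "model" below are hypothetical stand-ins for illustration, not Inseq's implementation):

```python
import numpy as np

def occlusion_importance(features, model_fn, baseline=0.0):
    """Toy occlusion: replace each feature with a baseline value and
    record the drop in the model's output (one score per position)."""
    full_score = model_fn(np.array(features, dtype=float))
    scores = []
    for i in range(len(features)):
        occluded = np.array(features, dtype=float)
        occluded[i] = baseline  # stand-in for the UNK embedding baseline
        scores.append(full_score - model_fn(occluded))
    return np.array(scores)

# Toy "model": a weighted sum, so each importance equals weight * value
weights = np.array([0.5, 2.0, -1.0])
model_fn = lambda x: float(np.dot(weights, x))
print(occlusion_importance([1.0, 1.0, 1.0], model_fn))  # importance per position: 0.5, 2.0, -1.0
```

The real method operates on token embeddings and model outputs rather than raw feature vectors, but the replace-and-diff loop is the same.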
Attribute `bitsandbytes` Int8 Quantized Models (#163)
Since the 0.37 release of `bitsandbytes`, an efficient backward pass for matrix multiplications is enabled for all int8-quantized models loaded with 🤗 Transformers. This release adds support for attributing int8 models with attribution methods relying on a backward pass (e.g. `integrated_gradients`, `saliency`). In the following simple example, we attribute the generation steps of a quantized GPT-2 XL (1.5B parameters) model using the `input_x_gradient` method, with the whole process requiring less than 6GB of GPU RAM:
```python
import inseq
from transformers import AutoModelForCausalLM

hf_model = AutoModelForCausalLM.from_pretrained("gpt2-xl", load_in_8bit=True, device_map="auto")
inseq_model = inseq.load_model(hf_model, "input_x_gradient", tokenizer="gpt2-xl")
out = inseq_model.attribute("Hello world, this is the Inseq", generation_args={"max_new_tokens": 20})
out.show()
```
Contrastive and Uncertainty-weighted Attribution (#166)
This release introduces two new pre-registered step functions, `contrast_prob_diff` and `mc_dropout_prob_avg`.
- `contrast_prob_diff` computes the difference in probability between a generation target (e.g. *All the dogs are barking loudly*) and a contrastive alternative (e.g. *All the dogs are crying strongly*) at every generation step, with the constraint of a 1-1 token correspondence between the two strings. If used as `attributed_fn` in `model.attribute`, it corresponds to the contrastive attribution setup of Yin and Neubig (2022).
- `mc_dropout_prob_avg` computes an uncertainty-weighted estimate of each generated token's probability using `n_mcd_steps` of the Monte Carlo Dropout method. If used as an attributed function instead of the vanilla `probability`, it can produce more robust attribution scores at the cost of additional computation.
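As a rough illustration of the quantity `contrast_prob_diff` computes, here is a toy sketch over per-step logits (the helper names and logits below are invented for the example and do not reflect Inseq's internals):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def contrast_prob_diff(step_logits, target_ids, contrast_ids):
    """For every generation step, the probability of the target token
    minus the probability of the contrastive token (1-1 correspondence)."""
    diffs = []
    for logits, tgt, alt in zip(step_logits, target_ids, contrast_ids):
        probs = softmax(logits)
        diffs.append(probs[tgt] - probs[alt])
    return np.array(diffs)

# Two generation steps over a toy 4-token vocabulary
step_logits = [np.array([2.0, 1.0, 0.0, -1.0]), np.array([0.0, 3.0, 1.0, 0.0])]
print(contrast_prob_diff(step_logits, target_ids=[0, 1], contrast_ids=[1, 2]))
```

In Inseq, these per-step differences become the quantity that gradients are taken with respect to, instead of the plain target probability.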
See this tutorial in the documentation for a reference on how to register and use custom attributed functions.
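For intuition, the averaging behind `mc_dropout_prob_avg` can be sketched as follows, assuming a generic probability function and inverted-dropout masking (all names here are illustrative, not Inseq's API):

```python
import numpy as np

rng = np.random.default_rng(42)

def mc_dropout_prob_avg(prob_fn, x, n_mcd_steps=100, p_drop=0.1):
    """Average the predicted probability over several stochastic forward
    passes with random dropout masks (Monte Carlo Dropout)."""
    probs = []
    for _ in range(n_mcd_steps):
        keep = rng.random(x.shape) >= p_drop              # random dropout mask
        probs.append(prob_fn(x * keep / (1.0 - p_drop)))  # inverted-dropout scaling
    return float(np.mean(probs))

# Toy "model": sigmoid of a weighted sum
weights = np.array([0.4, -0.2, 0.7])
prob_fn = lambda x: 1.0 / (1.0 + np.exp(-np.dot(weights, x)))

x = np.array([1.0, 2.0, 0.5])
print(mc_dropout_prob_avg(prob_fn, x), "vs vanilla", prob_fn(x))
```

In practice the dropout layers of the attributed model itself are kept active at inference time, rather than masking inputs as done in this toy version.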
Multilingual MT and Factual Information Location Examples (#166)
The Inseq documentation contains two new examples:
- Attributing Multilingual MT Models shows how to use Inseq to attribute the generations of multilingual MT models like M2M100 and NLLB, which require setting target language flags before generation.
- Locating Factual Knowledge in GPT-2 shows how layer-specific attribution methods can be used to obtain intermediate attributions of language models like GPT-2. Using the quantized and contrastive attribution approaches described above, the example reproduces some observations made by Meng et al. (2022) on the localization of factual knowledge in large language models.
All Merged PRs
🚀 Features
- Add OcclusionAttribution and LimeAttribution (#145) @nfelnlp
- `bitsandbytes` compatibility (#163) @gsarti
🔧 Fixes & Refactoring
- Demo Release Changes (#166) @gsarti
- Fix EOS baseline for models with `pad_token_id` != 0 (#165) @gsarti
👥 List of contributors
This release wouldn't have been possible without the contributions of these amazing folks. Thank you!