
v0.4.0: Perturbation-based methods, Int8 backward attribution, contrastive step function and more

@github-actions github-actions released this 27 Feb 16:17
cb8080c

What’s Changed

Perturbation-based Attribution Methods (#145)

Thanks to @nfelnlp, this version introduces the PerturbationAttributionRegistry and the OcclusionAttribution (occlusion) and LimeAttribution (lime) methods, both adapted from Captum's original implementations.

  • Our implementation of Occlusion (Zeiler and Fergus, 2014) estimates feature importance by replacing each input token embedding with a baseline (default: UNK) and computing the difference in model output, producing coarse-grained attribution scores (one per token).

  • LIME (Ribeiro et al., 2016) samples points around a specified input example and uses model evaluations at these points to train a simpler interpretable ‘surrogate’ model, such as a linear model. We adapt the implementation by Atanasova et al. for use in the generative setting.
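The two ideas above can be illustrated with a minimal, self-contained sketch. The `score` function below is a stand-in for a black-box model output, and all names (`occlusion`, `lime`, the `[UNK]` baseline string) are illustrative assumptions, not the actual Inseq or Captum APIs:

```python
# Toy sketch of the two perturbation-based ideas above, for a black-box
# scoring function f(tokens) -> float. Names are illustrative only.
import random
import numpy as np

def score(tokens):
    # Stand-in for a model output: counts two "important" tokens.
    return float(sum(t in {"quick", "fox"} for t in tokens))

def occlusion(tokens, baseline="[UNK]"):
    """Occlusion: replace each token with a baseline, record the score drop."""
    full = score(tokens)
    return [full - score(tokens[:i] + [baseline] + tokens[i + 1:])
            for i in range(len(tokens))]

def lime(tokens, n_samples=200, baseline="[UNK]", seed=0):
    """LIME: fit a linear surrogate on random token-masking perturbations."""
    rng = random.Random(seed)
    masks, scores = [], []
    for _ in range(n_samples):
        mask = [rng.randint(0, 1) for _ in tokens]
        perturbed = [t if keep else baseline for t, keep in zip(tokens, mask)]
        masks.append(mask)
        scores.append(score(perturbed))
    # Least-squares fit: surrogate weights approximate per-token importance.
    weights, *_ = np.linalg.lstsq(np.array(masks, dtype=float),
                                  np.array(scores), rcond=None)
    return weights

tokens = ["the", "quick", "brown", "fox"]
print(occlusion(tokens))  # [0.0, 1.0, 0.0, 1.0]
```

Both methods only need forward passes, which is why they apply to any black-box model; occlusion perturbs one token at a time, while LIME aggregates many random perturbations into a single linear fit.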

Attribute bitsandbytes Int8 Quantized Models (#163)

Since the 0.37 release of bitsandbytes, an efficient backward pass through int8 matrix multiplications is enabled for all int8-quantized models loaded with 🤗 Transformers. This release adds support for attributing int8 models with attribution methods relying on a backward pass (e.g. integrated_gradients, saliency). In the following simple example, we attribute the generation steps of a quantized GPT-2 XL (1.5B parameters) model using the input_x_gradient method, with the whole process requiring less than 6GB of GPU RAM:

import inseq
from transformers import AutoModelForCausalLM

hf_model = AutoModelForCausalLM.from_pretrained("gpt2-xl", load_in_8bit=True, device_map="auto")
inseq_model = inseq.load_model(hf_model, "input_x_gradient", tokenizer="gpt2-xl")
out = inseq_model.attribute("Hello world, this is the Inseq", generation_args={"max_new_tokens": 20})
out.show()

Contrastive and Uncertainty-weighted Attribution (#166)

This release introduces two new pre-registered step functions, contrast_prob_diff and mc_dropout_prob_avg.

  • contrast_prob_diff computes the difference in probability between a generation target (e.g. All the dogs are barking loudly) and a contrastive alternative (e.g. All the dogs are crying strongly) at every generation step, with the constraint of having a 1-1 token correspondence between the two strings. If used as attributed_fn in model.attribute, it corresponds to the Contrastive Attribution setup by Yin and Neubig, 2022.

  • mc_dropout_prob_avg computes an uncertainty-weighted estimate of each generated token's probability by averaging over n_mcd_steps forward passes with Monte Carlo Dropout enabled. If used as an attributed function instead of the vanilla probability, it can produce more robust attribution scores at the cost of additional computation.
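The two step functions above can be sketched with plain floats standing in for real model outputs. The function names mirror the registry identifiers, but the signatures below are illustrative assumptions, not the Inseq API:

```python
# Minimal sketches of the two step-function ideas above. Plain floats stand
# in for model outputs; signatures are illustrative, not the Inseq API.
import random

def contrast_prob_diff(target_probs, contrast_probs):
    """Per-step p(target) - p(contrast); sequences must align 1-1."""
    if len(target_probs) != len(contrast_probs):
        raise ValueError("targets need a 1-1 token correspondence")
    return [pt - pc for pt, pc in zip(target_probs, contrast_probs)]

def mc_dropout_prob_avg(prob_fn, n_mcd_steps=10, seed=0):
    """Average a stochastic probability estimate over n_mcd_steps runs."""
    rng = random.Random(seed)
    samples = [prob_fn(rng) for _ in range(n_mcd_steps)]
    return sum(samples) / len(samples)

# e.g. step probabilities for "barking loudly" vs. "crying strongly"
diffs = contrast_prob_diff([0.62, 0.55], [0.20, 0.31])

# A fake stochastic model: a base probability perturbed by dropout noise.
avg = mc_dropout_prob_avg(lambda rng: 0.6 + rng.uniform(-0.05, 0.05))
```

In the real setting, `prob_fn` would be a forward pass with dropout layers kept active at inference time, so the average reflects the model's epistemic uncertainty about each generated token.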

See this tutorial in the documentation for a reference on how to register and use custom attributed functions.

Multilingual MT and Factual Information Location Examples (#166)

Inseq documentation contains two new examples:

  • Attributing Multilingual MT Models shows how to use Inseq to attribute the generations of multilingual MT models like M2M100 and NLLB, which require setting target language flags before generation.

  • Locating Factual Knowledge in GPT-2 shows how layer-specific attribution methods can be used to obtain intermediate attributions of language models like GPT-2. Using the quantized and contrastive attribution approaches described above, the example reproduces some observations made by Meng et al. 2022 on the localization of factual knowledge in large language models.

All Merged PRs

🚀 Features

🔧 Fixes & Refactoring

👥 List of contributors

This release wouldn't have been possible without the contributions of these amazing folks. Thank you!

@gsarti, @nfelnlp, and @lsickert