Releases: TransformerLensOrg/TransformerLens
v1.8.1
v1.8.0
What's Changed
- Add status icons to the readme by @alan-cooney in #415
- Improve attention masking and key-value caching. by @UFO-101 in #386
- Remove Python 3.7 support by @alan-cooney in #423
- Update docs to show prepend_bos and padding_side are optional by @UFO-101 in #418
- Add Python 3.11 Support by @alan-cooney in #425
- Return the current residual when stop_at_layer is not None. by @UFO-101 in #420
- Add contributing instructions by @alan-cooney in #426
Full Changelog: v1.7.0...v1.8.0
v1.7.0
What's Changed
- Add start_at_layer parameter to HookedTransformer by @UFO-101 in #382
- fix: Set IGNORE value in mask to -torch.inf by @connor-henderson in #366
- bug fix attention inf by @jbloomAus in #389
- Bugfix attn left padding by @jbloomAus in #390
- Add developer tooling defaults & VS Code extensions by @alan-cooney in #394
- Fix llama tokenization issue by using tokenizer initialized with add_bos_token=True by @soheeyang in #379
- Move towards consistent commenting/docstring style by @alan-cooney in #395
- Speed up docs generation by @alan-cooney in #396
- Add hook_attn_in by @ArthurConmy in #336
- Added explicit dtype option by @neelnanda-io in #388
- Support PyTorch 2 with Poetry by @alan-cooney in #397
- Remove .venv from formatting checks by @alan-cooney in #401
- Add auto-organise imports on save in VsCode by @alan-cooney in #410
- Fix SolidGoldMagikarp tokenization test by @ArthurConmy in #408
- Add a summary docstring to the sub-modules by @alan-cooney in #407
- Remove unused root directory files by @alan-cooney in #406
- Simplify Docs Generation by @alan-cooney in #398
- Fix GitHub pages deploy by @alan-cooney in #412
- Fix optional types in HookedTransformer by @alan-cooney in #403
- Improve Hooked Transformer Docs by @alan-cooney in #400
- Improve API Docs Organization by @alan-cooney in #399
- Add docstring testing by @alan-cooney in #402
- Fix tokenization in utils.test_prompt by @Felhof in #334
New Contributors
- @UFO-101 made their first contribution in #382
- @connor-henderson made their first contribution in #366
Full Changelog: v1.6.1...v1.7.0
v1.6.1
What's Changed
- Add support for left padding by @soheeyang in #344
- Added gated MLP Hooks by @neelnanda-io in #374
- added support for pythia 160m seeds by @will-hath in #377
- Remove lru caching of weights by @ArthurConmy in #381
- Implement hook_mlp_in for parallel attention/MLP models by @ArthurConmy in #380
New Contributors
- @will-hath made their first contribution in #377
Full Changelog: v1.6.0...v1.6.1
v1.6.0
What's Changed
- Fix FactoredMatrix bug by @callummcdougall in #367
- Fix to automatically infer add_special_tokens for tokenizer by @soheeyang in #370
Full Changelog: v1.5.0...v1.6.0
(Release requested by @callummcdougall for a bugfix.)
v1.5.0
What's Changed
- Fix generate() by adding greedy decoding code for do_sample=False by @soheeyang in #358
- Updated readme by @neelnanda-io in #360
- Fix bug in rotary embedding for models other than llama and gpt-neo by @soheeyang in #365
- Switch to beartype by @dkamm in #325
Full Changelog: v1.4.0...v1.5.0
v1.4.0
Note: There is a bug in GPT-J in this version.
What's Changed
- Halve GPU memory when loading by @slavachalnev in #333
- Update to hook_mlp_in by @ArthurConmy in #316
- names_filter bug fix by @ArthurConmy in #321
- [Ready] Enable Pytorch GPU acceleration for M1 chips by @luciaquirke in #326
- Introduce Global prepend_bos Attribute to HookedTransformer by @soheeyang in #343
- Fix hook_result shape comment by @ckkissane in #347
- Support for reduced precision (#104) by @glerzing in #317
- Added tiny pythia models by @neelnanda-io in #350
- Add Llama-2 7B and 13B models by @ArthurConmy in #352
- Fix API docs by @Smaug123 in #339
- Enhance the API for default_prepend_bos by @soheeyang in #345
- Integrate StableLM (#254) by @glerzing in #354
- add colab buttons to demos by @ckkissane in #359
- Remove n_devices assert in config by @slavachalnev in #357
- Updated readme by @neelnanda-io in #351
- Scalar multiplication by @matthiasdellago in #355
New Contributors
- @soheeyang made their first contribution in #343
- @Smaug123 made their first contribution in #339
- @matthiasdellago made their first contribution in #355
Full Changelog: v1.3.0...v1.4.0
v1.3.0
What's Changed
- fix outdated link in Exploratory Analysis Demo by @daspartho in #259
- Finish patching docs by @ckkissane in #261
- Fix from_pretrained with redwood_attn_2l by @ArthurConmy in #268
- Added list of demos to tutorial section. by @JayBaileyCS in #263
- Improving head detector by @MatthewBaggins in #255
- Optimize imports in HookedTransformer by @rusheb in #260
- Baidicoot main - Implemented functionality for loading mingpt-style models off HF (e.g. othello-gpt) by @jbloomAus in #272
- Upgrade to typeguard 3 by @dkamm in #269
- Install autoformatting tools and add formatting checks to CI by @rusheb in #270
- Add TransformerLens logo to docs and GitHub by @koayon in #273
- Wrap docstrings and comments in HookedTransformer by @luciaquirke in #274
- Format array in test_transformer_lens.py by @rusheb in #275
- Introducing HookedEncoder by @rusheb in #276
- Add tests for tokenization methods by @Aprillion in #280
- Fix broken link in issue template by @rusheb in #278
- Various memory solutions. Ultimately used gc to "hide" memory issue which should be solved soon. by @jbloomAus in #296
- FactoredMatrix getitem (#224) by @glerzing in #295
- Add tiny stories by @Felhof in #292
- from_pretrained custom parameters (#288) by @glerzing in #298
- Add better __name__ annotation to full_hooks by @ArthurConmy in #302
- Multiple minor corrections by @glerzing in #301
- Add get_basic_config util function by @adamyedidia in #294
- Fix bug: HookedEncoder not being moved to GPU by @rusheb in #307
- Fix tokenization tests on GPU by @rusheb in #308
- Add prepend option to model.add_hook by @ArthurConmy in #303
- Fix tiny stories model names by @Felhof in #305
- Add hook_mlp_in by @ArthurConmy in #313
- Ignore some functions in the documentation (#310) by @glerzing in #312
- Add assertion to refactor_factored_attn_matrices by @ArthurConmy in #320
- Update evals.py to not directly call cuda, instead have default cuda … by @dennis-akar in #324
- Add SVD interpretability feature to TransformerLens by @JayBaileyCS in #311
- Fix svd tests on GPU by @slavachalnev in #330
- Reduce memory use when loading model by @slavachalnev in #327
New Contributors
- @MatthewBaggins made their first contribution in #255
- @koayon made their first contribution in #273
- @luciaquirke made their first contribution in #274
- @Aprillion made their first contribution in #280
- @glerzing made their first contribution in #295
- @Felhof made their first contribution in #292
- @dennis-akar made their first contribution in #324
Full Changelog: v1.2.2...v1.3.0
v1.2.2
What's Changed
Too many commit messages so let's summarise them.
General Features
- Pipeline Parallelism
- Cache now doesn't move tensors across devices unless told to
New Models:
- Redwood 2L
- New Pythia Models
- LLaMA
Analysis Features:
- Add apply_ln to stack_head_results and stack_neuron_results
- Context Manager for Hooks
- Attention Head Detectors
Thanks to all the Contributors!
Many thanks to: @rusheb, @ckkissane, @slavachalnev, @JayBaileyCS, @zshn-gvg, @jbloomAus, @adzcai, @adamyedidia, @ArthurConmy, @bryce13950, @daspartho, @haileyschoelkopf, @0amp
Full Changelog: v1.2.1...v1.2.2
v1.2.1
New minor release with a variety of improvements relating to testing, documentation and development. The transition from torchtyping to jaxtyping is one of the most significant changes.
What's Changed
- Replace torchtyping with jaxtyping by @dkamm in #171
- Run poetry lock by @rusheb in #178
- Add verbose flag to disable tqdm on model.generate(...) by @afspies in #185
- Make tracr plot show outside on colab by @ArthurConmy in #184
- Add positional_embedding_type to model properties table by @ckkissane in #176
- Run poetry lock --check in CI by @rusheb in #182
- Slice: doc and tests by @Xmaster6y in #166
- Separate tests into unit and acceptance tests by @rusheb in #191
- Grokking demo by @neelnanda-io in #193
- Configure coverage reports to measure branch coverage by @rusheb in #192
- Silence DeprecationWarning for distutils by @rusheb in #187
- Clone pos embed by @slavachalnev in #194
- Add pos embed hook tests by @slavachalnev in #196
- Fix test command in Readme by @valedan in #197
- Test constructor of FactoredMatrix by @rusheb in #188
- Add helper for logit attribution by @dkamm in #135
- issue and pr templates by @jbloomAus in #203
New Contributors
- @Xmaster6y made their first contribution in #166
- @slavachalnev made their first contribution in #194
- @valedan made their first contribution in #197
Full Changelog: v1.2...v1.2.1