Issues: TransformerLensOrg/TransformerLens

Issues list

[Bug Report] Global and Local Attn layer order of Gemma2 is wrong?
Labels: complexity-moderate (Moderately complicated issues for people who have intermediate experience with the code), implementation-inaccuracy (Any issues related to our implementation being off from the official version)
#778 opened Nov 9, 2024 by huangxt39
[Bug Report] use_past_kv_cache yields weird outputs when used with Bloom model family
Labels: complexity-moderate
#776 opened Nov 8, 2024 by degenfabian
1 task done
[Proposal] prepend_bos should by default be set to false for the Bloom model family
Labels: complexity-moderate
#774 opened Nov 8, 2024 by degenfabian
1 task done
[Question] Would it be possible to adopt TransformerLens on models with a different layernorm implementation?
Labels: complexity-high (Very complicated changes for people to address who are quite familiar with the code), question (Further information is requested)
#773 opened Nov 8, 2024 by Steven-Yiran
[Question] compatibility for 'Qwen/Qwen2.5-14B'
Labels: complexity-moderate, model-request (Any issues related to requesting additional model support)
#762 opened Oct 25, 2024 by hgftrdw45ud67is8o89
[Proposal] Ensure TransformerLens does not load from hugging face when config is passed in
Labels: complexity-moderate
#754 opened Oct 11, 2024 by hamind
1 task done
[Bug Report] hook_normalized is inconsistent between RMSNorm and LayerNorm
Labels: breaking-change, bug (Something isn't working), complexity-moderate
#747 opened Oct 6, 2024 by neelnanda-io
[Proposal] Add example of collecting activations from a single layer.
Labels: demo (Creating a demo or tutorial)
#746 opened Oct 5, 2024 by adamkarvonen
1 task done
[Bug Report] Q cannot be reshaped correctly when model is loaded in 4bit
Labels: bug, needs-investigation (Issues that need to be recreated, or investigated before work can be done)
#737 opened Sep 28, 2024 by po13on
Fine tune model and using this framework
Labels: needs-information (More information is needed from the issue creator before moving forward), question
#730 opened Sep 26, 2024 by nitay16
[Proposal] Guide to adding new models
Labels: complexity-moderate, documentation (Improvements or additions to documentation)
#729 opened Sep 26, 2024 by deven367
1 task done
[Bug Report] Review current matmul function usages
Labels: bug, complexity-high
#720 opened Sep 10, 2024 by bryce13950
1 task done
[Proposal] Add MVP Support For 1-2 Models Per-Modality
Labels: complexity-high, discussion (No action needed yet)
#710 opened Aug 31, 2024 by 4gatepylon
1 task done
[Proposal] Add support for TracrBench
Labels: complexity-high, new-architecture (This card involves adding a new architecture)
#704 opened Aug 14, 2024 by HannesThurnherr
How to get the Activation cache while the LLM is generating new tokens?
Labels: complexity-moderate
#697 opened Aug 7, 2024 by Meehaohao
[Bug Report] Gemma-2-2b-it output logit doesn't match with huggingface
Labels: complexity-high, implementation-inaccuracy
#693 opened Aug 2, 2024 by yeutong
1 task done
[Bug Report] Different results from HuggingFace when using the GPT2 small example
Labels: complexity-high, implementation-inaccuracy, needs-investigation
#685 opened Jul 27, 2024 by nreHieW
1 task done
[Proposal] Expand quantization model support
Labels: complexity-high
#684 opened Jul 26, 2024 by miguel-kjh
[Bug Report] Qwen model implementation is too inaccurate
Labels: complexity-high, implementation-inaccuracy, needs-investigation
#683 opened Jul 23, 2024 by bryce13950
1 task done
[Proposal] Allow tied embeddings
Labels: complexity-moderate, enhancement (New feature or request)
#671 opened Jul 12, 2024 by neelnanda-io
ValueError: microsoft/Phi-3-mini-128k-instruct not found.
Labels: complexity-moderate, model-request
#670 opened Jul 12, 2024 by joykirat18
[Proposal] Allow recent versions of beartype
Labels: complexity-simple (Simple issues, which may be good for beginners), tooling (Anything pertaining to outside tools used within the codebase)
#665 opened Jul 10, 2024 by jettjaniak
1 task done
[Bug Report] Pythia output inconsistent across batch sizes when use_split_qkv_input=True
Labels: bug, complexity-high, implementation-inaccuracy
#661 opened Jul 8, 2024 by oliveradk
1 task done