Issues: TransformerLensOrg/TransformerLens
Issues list
[Bug Report] Global and Local Attn layer order of Gemma2 is wrong?
complexity-moderate
Moderately complicated issues for people who have intermediate experience with the code
implementation-inaccuracy
Any issues related to our implementation being off from the official version
#778
opened Nov 9, 2024 by
huangxt39
[Bug Report] use_past_kv_cache yields weird outputs when used with Bloom model family
complexity-moderate
Moderately complicated issues for people who have intermediate experience with the code
#776
opened Nov 8, 2024 by
degenfabian
1 task done
[Proposal] prepend_bos should by default be set to false for the Bloom model family
complexity-moderate
Moderately complicated issues for people who have intermediate experience with the code
#774
opened Nov 8, 2024 by
degenfabian
1 task done
[Question] Would it be possible to adopt TransformerLens on models with a different layernorm implementation?
complexity-high
Very complicated changes for people to address who are quite familiar with the code
question
Further information is requested
#773
opened Nov 8, 2024 by
Steven-Yiran
[Question] compatibility for 'Qwen/Qwen2.5-14B'
complexity-moderate
Moderately complicated issues for people who have intermediate experience with the code
model-request
Any issues related to requesting additional model support
#762
opened Oct 25, 2024 by
hgftrdw45ud67is8o89
[Proposal] Ensure TransformerLens does not load from hugging face when config is passed in
complexity-moderate
Moderately complicated issues for people who have intermediate experience with the code
#754
opened Oct 11, 2024 by
hamind
1 task done
[Bug Report] hook_normalized is inconsistent between RMSNorm and LayerNorm
breaking-change
bug
Something isn't working
complexity-moderate
Moderately complicated issues for people who have intermediate experience with the code
#747
opened Oct 6, 2024 by
neelnanda-io
[Proposal] Add example of collecting activations from a single layer.
demo
Creating a demo or tutorial
#746
opened Oct 5, 2024 by
adamkarvonen
1 task done
[Bug Report] Q cannot be reshaped correctly when model is loaded in 4bit
bug
Something isn't working
needs-investigation
Issues that need to be recreated, or investigated before work can be done
#737
opened Sep 28, 2024 by
po13on
Fine tune model and using this framework
needs-information
More information is needed from the issue creator before moving forward.
question
Further information is requested
#730
opened Sep 26, 2024 by
nitay16
[Proposal] Guide to adding new models
complexity-moderate
Moderately complicated issues for people who have intermediate experience with the code
documentation
Improvements or additions to documentation
#729
opened Sep 26, 2024 by
deven367
1 task done
[Bug Report] Review current matmul function usages
bug
Something isn't working
complexity-high
Very complicated changes for people to address who are quite familiar with the code
#720
opened Sep 10, 2024 by
bryce13950
1 task done
[Proposal] Add MVP Support For 1-2 Models Per-Modality
complexity-high
Very complicated changes for people to address who are quite familiar with the code
discussion
No action needed yet
#710
opened Aug 31, 2024 by
4gatepylon
1 task done
[Proposal] Add support for TracrBench
complexity-high
Very complicated changes for people to address who are quite familiar with the code
new-architecture
This card involves adding a new architecture.
#704
opened Aug 14, 2024 by
HannesThurnherr
How to get the Activation cache while the LLM is generating new tokens?
complexity-moderate
Moderately complicated issues for people who have intermediate experience with the code
#697
opened Aug 7, 2024 by
Meehaohao
[Bug Report] Gemma-2-2b-it output logit doesn't match with huggingface
complexity-high
Very complicated changes for people to address who are quite familiar with the code
implementation-inaccuracy
Any issues related to our implementation being off from the official version
#693
opened Aug 2, 2024 by
yeutong
1 task done
[Bug Report] Different results from HuggingFace when using the GPT2 small example
complexity-high
Very complicated changes for people to address who are quite familiar with the code
implementation-inaccuracy
Any issues related to our implementation being off from the official version
needs-investigation
Issues that need to be recreated, or investigated before work can be done
#685
opened Jul 27, 2024 by
nreHieW
1 task done
[Proposal] Expand quantization model support
complexity-high
Very complicated changes for people to address who are quite familiar with the code
#684
opened Jul 26, 2024 by
miguel-kjh
[Bug Report] Qwen model implementation is too inaccurate
complexity-high
Very complicated changes for people to address who are quite familiar with the code
implementation-inaccuracy
Any issues related to our implementation being off from the official version
needs-investigation
Issues that need to be recreated, or investigated before work can be done
#683
opened Jul 23, 2024 by
bryce13950
1 task done
[Proposal] Allow tied embeddings
complexity-moderate
Moderately complicated issues for people who have intermediate experience with the code
enhancement
New feature or request
#671
opened Jul 12, 2024 by
neelnanda-io
ValueError: microsoft/Phi-3-mini-128k-instruct not found.
complexity-moderate
Moderately complicated issues for people who have intermediate experience with the code
model-request
Any issues related to requesting additional model support
#670
opened Jul 12, 2024 by
joykirat18
does run_with_cache method support data parallel , how can I do it ?
#669
opened Jul 12, 2024 by
Yang-bug-star
[Proposal] Allow recent versions of beartype
complexity-simple
Simple issues, which may be good for beginners
tooling
Anything pertaining to outside tools used within the codebase
#665
opened Jul 10, 2024 by
jettjaniak
1 task done
[Bug Report] Pythia output inconsistent across batch sizes when use_split_qkv_input=True
bug
Something isn't working
complexity-high
Very complicated changes for people to address who are quite familiar with the code
implementation-inaccuracy
Any issues related to our implementation being off from the official version
#661
opened Jul 8, 2024 by
oliveradk
1 task done