Issues: EleutherAI/gpt-neox
Issues list
Training crashes when "(hidden_size * num_kv_heads) / (num_attention_heads * num_attention_heads)" is not an integer.
bug
Something isn't working
#1314
opened Oct 28, 2024 by
tiandeyu-cs
DeeperSpeed cannot support BFloat16 and PipelineParallelism
bug
Something isn't working
#1307
opened Oct 15, 2024 by
jahatef
Error with rotary embeddings and BFloat16
bug
Something isn't working
#1305
opened Oct 15, 2024 by
jahatef
Allow training without knowing num_iters
feature request
New feature or request
#1268
opened Sep 6, 2024 by
StellaAthena
How to Load Model from pytorch_model.bin into Trained Model for Text Generation?
feature request
New feature or request
#1254
opened Jul 15, 2024 by
lieh1203
what's the biggest dataset you've tried?
bug
Something isn't working
#1253
opened Jul 15, 2024 by
exnx
Assertion Error when Setting pipe_parallel_size or model_parallel_size in GPT-NeoX
bug
Something isn't working
#1251
opened Jul 10, 2024 by
lieh1203
batch_input and elapsed time per iteration suddenly slow down during model training
bug
Something isn't working
#1248
opened Jun 29, 2024 by
Yuhanleeee
LoRA Support
feature request
New feature or request
#1204
opened Apr 23, 2024 by
Quentin-Anthony
4 tasks
My servers used for multi-node training do not have ssh. How can I launch multi-node training using the torchrun command?
feature request
New feature or request
#1203
opened Apr 23, 2024 by
dingning97
PyTorch Lightning Fused optimizer step
feature request
New feature or request
#1160
opened Feb 29, 2024 by
jahatef
Tests fail when run with pytest --forked
bug
Something isn't working
#1132
opened Jan 25, 2024 by
segyges
[BUG?] Higher "gradient_accumulation_steps" still increases memory usage a lot
bug
Something isn't working
#1123
opened Jan 15, 2024 by
exnx
Create Singularity Container
feature request
New feature or request
good first issue
Good for newcomers
help wanted
This issue needs assistance
#1119
opened Jan 11, 2024 by
Quentin-Anthony
Integrate TransformerEngine
feature request
New feature or request
#1098
opened Dec 21, 2023 by
Quentin-Anthony
Interoperability and GPT-NeoX
documentation
Improvements or additions to documentation
question
#1058
opened Oct 12, 2023 by
StellaAthena
Support for Mosaic Models
feature request
New feature or request
#1057
opened Oct 6, 2023 by
rajveer43
[BUG] Inconsistent loss between overlap_comm=true and overlap_comm=false
bug
Something isn't working
#1004
opened Jul 27, 2023 by
0x6b64
Convert HF Llama Checkpoints to Neox Checkpoints
feature request
New feature or request
#994
opened Jul 10, 2023 by
sxthunder
AssertionError: zero stage 1 requires an optimizer
bug
Something isn't working
good first issue
Good for newcomers
help wanted
This issue needs assistance
#987
opened Jul 4, 2023 by
yonglianglan
How to preserve Pythia's sampling order but for different batch size.
bug
Something isn't working
#984
opened Jul 3, 2023 by
lintangsutawika
Why we need to average LayerNorm values over mp ranks when converting to HFformat checkpoint?
#983
opened Jun 26, 2023 by
forceshorty