Releases · linkedin/Liger-Kernel

01 Oct 20:55

shimizust

v0.3.1

1520999

v0.3.1: Patch Release Latest

Summary

This patch release brings important updates and fixes to Liger-Kernel. Notable changes include:

KLDiv calculation fix: KLDiv now works correctly with larger vocab sizes.
SwiGLU/GeGLU casting fix: Program IDs are now cast to int64 in the SwiGLU/GeGLU kernels to prevent memory errors with larger dimensions.
AutoLigerKernelForCausalLM fix: The model now properly passes through all original keyword arguments.
Post-init model patching fix: Post-init model patching now works correctly with the HF Trainer integration.
Relaxed transformers dependency: The transformers version requirement is relaxed to improve compatibility with a broader range of versions.
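
To see why the int64 cast in the SwiGLU/GeGLU fix matters, consider the offset arithmetic a kernel performs when it turns a program ID into a memory offset. The sketch below is plain Python rather than Triton, and it assumes the offset is computed as program ID times row stride. The helper `wrap_int32` and the sizes are hypothetical, chosen only to show the wraparound.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Emulate two's-complement wraparound of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


# Hypothetical sizes: row 70_000 of a tensor with 32_768 elements per row.
program_id = 70_000
row_stride = 32_768

# With 64-bit arithmetic the product is exact.
true_offset = program_id * row_stride
# With 32-bit arithmetic the same product wraps to a negative number.
int32_offset = wrap_int32(program_id * row_stride)

print(true_offset > INT32_MAX)
print(true_offset, int32_offset)
```

A negative or otherwise wrong offset points outside the intended row, which is consistent with the memory errors the fix describes for larger dimensions.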

What's Changed

Remove debug print statement by @EdoardoLuciani in #247
[Easy] Cast program_id to int64 in SwiGLU/GeGLU kernels by @hansonw in #251
Fix a comment typo in flce by @Tcc0403 in #256
Fix AutoLigerKernelForCausalLM to pass through original kwargs by @shimizust in #263
Update contributing guide for adding a new model by @shivam15s in #260
chore: Add Qwen2.5 and Phi3.5 to Readme by @tyler-romero in #265
rename cuda mode to gpu mode by @msaroufim in #267
Fix sharing a ResBlock layer for each head in Medusa example by @chiwanpark in #269
Fix/kldiv by @S1ro1 in #262
Post-init model patching fix by @shimizust in #280
Relaxed transformers dependency by @shimizust in #270
Disable gemma2 and qwen2_vl tests by @shimizust in #288
Release version 0.3.1 by @shimizust in #286

New Contributors

@EdoardoLuciani made their first contribution in #247
@msaroufim made their first contribution in #267

Full Changelog: v0.3.0...v0.3.1

Contributors

hansonw, chiwanpark, and 7 other contributors

13 Sep 21:45

qingquansong

v0.3.0

793785f

v0.3.0 Release Note

Opening Thoughts

Thank you, everyone! Your overwhelming support continues to fuel our passion for innovation. With your engagement, we've pushed the boundaries further in this release!

We are hosting our 1st IRL event, 'Scaling AI Infra - GPUs, Kernels, LLMs and More'. We will discuss Liger-Kernel and invite speakers to talk about DeepSpeed, SGLang, and the TensorCore team. Please RSVP at our event page.

What's New

🌐 Large Vision Language Model Support

Welcome Qwen2-VL, our first venture into large vision language models! This expansion allows more versatility in applying our solutions across different AI domains.

✨ Patch Kernels on Model Instances

To enhance flexibility, our latest API update accepts either a model name string or a model instance as input, streamlining the integration with Hugging Face's SFT trainer. You can now patch Liger kernels into your models whether you're starting from scratch or adapting an existing model setup.
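
As a rough illustration of what patching an existing instance means, the sketch below rebinds the `forward` method on layers of an already-constructed model, so the weights stay in place. This is not Liger-Kernel's code: `Norm`, `TinyModel`, `optimized_forward`, and `patch_instance` are all hypothetical stand-ins written in plain Python.

```python
import types


class Norm:
    """Stand-in for a stock layer that holds a weight (hypothetical)."""

    def __init__(self, weight):
        self.weight = weight

    def forward(self, x):
        return [self.weight * v for v in x]


def optimized_forward(self, x):
    """Stand-in for an optimized kernel that returns the same result (hypothetical)."""
    w = self.weight
    return [w * v for v in x]


class TinyModel:
    """Stand-in for a model that has already been instantiated (hypothetical)."""

    def __init__(self):
        self.layers = [Norm(2.0), Norm(3.0)]


def patch_instance(model):
    """Rebind forward on each existing layer; the layer weights are untouched."""
    for layer in model.layers:
        layer.forward = types.MethodType(optimized_forward, layer)
    return model


model = TinyModel()
before = model.layers[0].forward([1.0, 2.0])
patch_instance(model)
after = model.layers[0].forward([1.0, 2.0])
print(before == after)
```

Patching the instance rather than the class is useful when the model object is created by someone else's code, such as a trainer that receives an already-loaded model.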

🚀 SWIFT Trainer Integration

We're excited to be integrated into the SWIFT Trainer Framework. This integration signifies our commitment to delivering cutting-edge tools that empower the community toward enhancing training efficiency across all supported models.

🔧 New Kernels and Features

KL Divergence Kernel: Dive deeper into model behaviors with our new KL divergence kernel, perfect for those needing model distillation, alignment, and beyond.
Experimental Kernel for Embedding: Explore acceleration possibilities with our experimental kernel that optimizes embedding operations.
Extended Cross Entropy Functionality: Now we support label smoothing and sum reduction, enabling more robust training and flexible loss calculations for neural networks.
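
For readers unfamiliar with the two cross entropy options, here is a minimal pure-Python sketch of label smoothing and sum versus mean reduction. It follows the common formulation in which the smoothed loss mixes the usual negative log-likelihood with a uniform average over all classes; it is an assumption that the Liger kernel matches this definition term for term, and the function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_entropy(batch_logits, targets, label_smoothing=0.0, reduction="mean"):
    """Cross entropy with optional label smoothing and 'mean' or 'sum' reduction."""
    losses = []
    for logits, target in zip(batch_logits, targets):
        logp = log_softmax(logits)
        nll = -logp[target]                    # loss against the true class
        uniform = -sum(logp) / len(logp)       # loss against a uniform target
        losses.append((1.0 - label_smoothing) * nll + label_smoothing * uniform)
    total = sum(losses)
    if reduction == "sum":
        return total
    if reduction == "mean":
        return total / len(losses)
    raise ValueError(f"unsupported reduction: {reduction}")


logits = [[2.0, 0.0], [0.0, 0.0]]
targets = [0, 1]
plain = cross_entropy(logits, targets)
smoothed = cross_entropy(logits, targets, label_smoothing=0.1)
summed = cross_entropy(logits, targets, reduction="sum")
print(plain, smoothed, summed)
```

With two samples, the sum reduction is exactly twice the mean, and smoothing raises the loss on the confidently correct first sample.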

Get Involved and Stay Tuned

Join us on our journey! Connect with us on our CUDA MODE server's Discord channel, and don't forget to follow our official account on X for the latest updates: https://x.com/liger_kernel.

A Look Ahead

We're not stopping here! Looking forward, we plan to expand our support to include even more model families and to explore further optimizations and innovative features. Your feedback is invaluable, so please keep it coming as we shape the future of Liger together!

🌟 Acknowledgments

Your contributions make a difference! Thanks to everyone who has starred, contributed, and provided feedback. Each contribution enriches our community and helps us grow stronger together.

What's Changed

Skip Tests for GPUs Not Supporting bf16 by @austin362667 in #159
[Operators] LayerNorm Kernels + LigerLayerNorm by @AndreSlavescu in #169
README: ensure modeling code is patched before model instantiation by @tmm1 in #170
Updated wave snippet to use AutoLigerKernelForCausalLM by @shimizust in #181
[Documentation] LayerNorm added to README by @AndreSlavescu in #180
Remove torch compile from benchmark scripts by @shimizust in #183
Update release guide by @yundai424 in #167
Extract forward/backward core computation bits outside of torch autograd context for easy reuse by @qingquansong in #178
custom Embedding kernel by @AndreSlavescu in #135
Feat/functional api by @S1ro1 in #172
[feat] FusedLinearCrossEntropy support for Mixtral by @ryankert01 in #136
[Docs] Update README to include LigerEmbedding by @AndreSlavescu in #186
compute quantiles for memory usage by @kvignesh1420 in #187
TypoFixed repo_foward -> rope_forward by @LucioPalmucci in #191
Switch Lightning 1 GPU example to Qwen2 0.5B instruct model with 1024 max seq length by @qingquansong in #193
[BUILD] Add pyproject.toml by @AndreSlavescu in #150
ci fix by @AndreSlavescu in #202
Update the casting logic of RMSNorm by @lancerts in #201
Update test_rms_norm.py by @lancerts in #203
Refactored benchmark tests by @shimizust in #196
Update layer_norm.py by @lancerts in #207
Uplift kernel APIs to top level by @austin362667 in #210
Feat: Kl Divergence kernel by @S1ro1 in #194
minor refactor of rms and layernorm by @lancerts in #213
Fix compatibility issue on triton=2.3.1 by @Tcc0403 in #219
Elaborate ack section by @ByronHsu in #222
Add license in ack section by @ByronHsu in #224
Reference Unsloth in header by @momochen in #216
Add label smoothing for cross entropy by @Tcc0403 in #198
Added HF use-case benchmark script by @shimizust in #223
(fix) fix pyproject.toml by @wizyoung in #218
Update swiglu and geglu forward: zeros_like -> empty_like by @IvanYashchuk in #217
add repr infomation for layer_norm and rms_norm by @wizyoung in #220
(fix) fix pyproject.toml by @wizyoung in #226
Refactor/benchmarking visualizer by @S1ro1 in #212
Feat: add kl div to readme by @S1ro1 in #229
Monkeypatch for Qwen2-VL by @tyler-romero in #175
Optimize fused_linear_cross_entropy when weight does not require grads by @hansonw in #237
SWIFT Trainer Integration by @tastelikefeet in #240
Add label smoothing to FLCE and unit tests by @Tcc0403 in #244
Restore monkey patched modules by @austin362667 in #232
Support for patching post-model initialization by @shimizust in #199
Reduction support for CrossEntropy and Division by 0 Fix by @shivam15s in #153
Release Liger-Kernel version 0.3.0 by @qingquansong in #246

New Contributors

@austin362667 made their first contribution in #159
@tmm1 made their first contribution in #170
@S1ro1 made their first contribution in #172
@ryankert01 made their first contribution in #136
@kvignesh1420 made their first contribution in #187
@LucioPalmucci made their first contribution in #191
@momochen made their first contribution in #216
@wizyoung made their first contribution in #218
@IvanYashchuk made their first contribution in #217
@hansonw made their first contribution in #237
@tastelikefeet made their first contribution in #240

Full Changelog: v0.2.1...v0.3.0

Contributors

tmm1, hansonw, and 18 other contributors

29 Aug 22:36

yundai424

v0.2.1

e5d6ad7

v0.2.1

Patch Release

Fixed a bug in the Gemma patch function where the FLCE and CE flags were both true by default (ruh roh).

What's Changed

Bug fix for gemma: fused_linear_cross_entropy flag and cross_entropy flag are mutual exclusive by @JasonZhu1313 in #168
Add gemma 7b it benchmark by @JasonZhu1313 in #166
bump patch ver by @yundai424 in #171

Full Changelog: v0.2.0...v0.2.1

Contributors

JasonZhu1313 and yundai424

29 Aug 18:51

yundai424

v0.2.0

c6fb35e

v0.2.0 Release Note

Opening Thoughts 🫶

Thank You!

We'd love to take this chance to express our sincere gratitude to the community! 2500+ ⭐, 10+ new contributors, 50+ PRs, plus integration into Hugging Face 🤗, axolotl and LLaMA-Factory in less than one week since going open source is totally beyond our expectations. Being able to work together with all the cool people in the community is a joy, and we can't wait for further collaborations down the road!

Looking Ahead

We look forward to deepening our collaboration with the community and working together on a lot of cool stuff: supporting more model families, squeezing out every optimization opportunity for kernels, and, why not, llama.triton? 😉

Get Involved and Stay Tuned

Please feel free to join our Discord channel hosted on the CUDA MODE server, and follow our repo's official account on X: https://x.com/liger_kernel!

Welcome Phi3 and Qwen2 🚀

This release ships with support for other popular models, including Phi3 and Qwen2. All existing kernels in the Liger repo can now be used to boost your training with models from these families. Please refer to our API guide for usage details.

Even Easier API ❤️

Experimenting with different model families and tired of having if-else everywhere just to switch between kernel patching functions? You can now try out our new model-agnostic API to apply Liger kernels. Still a one-liner, but more elegant :) For example:

from liger_kernel.transformers import AutoLigerKernelForCausalLM

# This AutoModel wrapper class automatically monkey-patches the
# model with the optimized Liger kernels if the model is supported.
model = AutoLigerKernelForCausalLM.from_pretrained(...)
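
One way to picture how a model-agnostic entry point can replace scattered if-else branches is a lookup table from model type to patch function. The sketch below is an illustration under that assumption, not the library's implementation; the registry, the patch functions, and `apply_patch` are hypothetical names.

```python
applied = []  # records which hypothetical patches ran, for demonstration only


def patch_llama():
    applied.append("llama")


def patch_gemma():
    applied.append("gemma")


# Hypothetical registry: one entry per supported model family.
PATCH_FN_BY_MODEL_TYPE = {
    "llama": patch_llama,
    "gemma": patch_gemma,
}


def apply_patch(model_type: str) -> bool:
    """Patch if the model type is supported; otherwise leave the model as-is."""
    patch_fn = PATCH_FN_BY_MODEL_TYPE.get(model_type)
    if patch_fn is None:
        return False
    patch_fn()
    return True


print(apply_patch("llama"))
print(apply_patch("some_unsupported_model"))
print(applied)
```

Adding a model family then means adding one registry entry, and callers keep the same one-line call regardless of which model they load.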

More Features

Support optional bias term in FusedLinearCrossEntropy (#144)
Mistral is now equipped with the humongous memory reduction from FusedLinearCrossEntropy (#93)
Gemma is now equipped with the humongous memory reduction from FusedLinearCrossEntropy (#111)

Bug Fixes

Fixed import error when using triton>=3.0.0 on NGC containers (#79)
Fixed the missing offset in Gemma RMSNorm (#85) oops
Added back missing dataclass entries in efficiency callback (#116)
There was some confusion about which Gemma models we support; we now support all of them! (#125)
Fall back to torch native linear + CrossEntropy when no labels are provided (#128)
Match the exact dtype upcasting and downcasting in Llama & Gemma for RMSNorm (#92)
Fixed a bug where RoPE became very slow when using dynamic sequence lengths (#149)
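
To make the Gemma RMSNorm offset fix concrete, the sketch below shows RMSNorm with an optional constant added to the weight. It assumes the offset in question is the 1.0 that Gemma-style RMSNorm adds to its weight before scaling; the function name and the sample values are hypothetical, and the code is plain Python rather than a kernel.

```python
import math


def rms_norm(x, weight, eps=1e-6, offset=0.0):
    """Scale x by 1/RMS(x), then multiply elementwise by (offset + weight)."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * (offset + w) for v, w in zip(x, weight)]


x = [3.0, 4.0]
zero_weight = [0.0, 0.0]  # assumed zero-initialized weight, as in Gemma-style norms

without_offset = rms_norm(x, zero_weight)            # offset missing
with_offset = rms_norm(x, zero_weight, offset=1.0)   # offset applied

print(without_offset)
print(with_offset)
```

With a zero weight, dropping the offset zeroes out the activations entirely, which shows how much a missing offset can change the layer's output.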

What's Changed

Updated test tolerances for H100 by @shimizust in #55
Update README.md by @lancerts in #58
Update benchmark result of Medusa for batch size = 6 setup by @JasonZhu1313 in #59
Add star graph by @shivam15s in #60
Add monkey patch for Qwen2 models by @chiwanpark in #69
Add pytest and datasets to dev dependencies by @chiwanpark in #68
Fix typos by @pchng in #77
Remove unused images in examples/medusa/docs/images/ by @pchng in #78
chore: update cross_entropy.py by @eltociear in #84
Fix incorrect import for triton 3 by @arvindsun in #79
update install from source guide by @yundai424 in #86
Fix Gemma RMSNorm by @davidgonmar in #85
Fix example bugs by @qingquansong in #88
Make tests passing on AMD GPU with 24GB ram by @helloworld1 in #90
modified: README.md by @leaf-soba in #91
pytest without need to dealing with PYTHONPATH by @helloworld1 in #95
Update test_cross_entropy.py by @lancerts in #94
Add FusedLinerCrossEntropy support for Mistral by @Tcc0403 in #93
Remove duplicate images by @qingquansong in #107
Add Qwen benchmarks by @shivam15s in #108
Fix Mixtral typo by @Tcc0403 in #109
Explicitly add dependencies in req.txt for medusa example by @JasonZhu1313 in #110
Add convergence tests and trainer integration test for Qwen2 by @Tcc0403 in #105
[Bug fix] Efficiency callback missing dataclass entries by @tyler-romero in #116
Monkeypatch for Phi3 by @tyler-romero in #76
Add FusedLinearCrossEntropy to Gemma by @Luke-Chesley in #111
Makefile command for env-report by @tyler-romero in #114
[WIP] Fix confusion on Gemma by @yundai424 in #121
[tiny] reformat code by @tyler-romero in #122
Revert "[WIP] Fix confusion on Gemma (#121)" by @yundai424 in #123
Fix gemma 1 and 2 support by @yundai424 in #125
Adding AutoLigerKernelForCausalLM by @shimizust in #115
fallback to torch native linear+CE when without label by @yundai424 in #128
Add code to save medusa heads and model by @JasonZhu1313 in #130
Add FusedLinerCrossEntropy support for Phi3 by @tyler-romero in #103
Add GPU CI support by @helloworld1 in #134
Make GPU CI optional until it is more stable by @helloworld1 in #141
Add gemma lightning example for single L40 GPU by @qingquansong in #120
feat: correct casts in RMSNorm to match references by @davidgonmar in #92
Bias for fused linear cross entropy by @davidgonmar in #144
Rerun FLCE benchmark after bias added by @ByronHsu in #148
updated sl to be non-constexpr by @AndreSlavescu in #149
update readme to use absolute paths by @shaoruu in #157
fix convergence test, phi3 import and update benchmark by @yundai424 in #155
bump lowest HF version by @yundai424 in #158
Add missing tf_keras to req.txt by @JasonZhu1313 in #161
Re-enable GPU CI enforce by @helloworld1 in #142
Bump package ver by @yundai424 in #163
Update version in setup.py to 0.2.0 by @yundai424 in #164

New Contributors

@chiwanpark made their first contribution in #69
@pchng made their first contribution in #77
@eltociear made their first contribution in #84
@arvindsun made their first contribution in #79
@davidgonmar made their first contribution in #85
@leaf-soba made their first contribution in #91
@Tcc0403 made their first contribution in #93
@tyler-romero made their first contribution in #116
@Luke-Chesley made their first contribution in #111
@AndreSlavescu made their first contribution in #149
@shaoruu made their first contribution in #157

Full Changelog: v0.1.1...v0.2.0

Contributors

helloworld1, pchng, and 17 other contributors

23 Aug 05:25

ByronHsu

v0.1.1

b418557

v0.1.1: Add readme on pypi

What's Changed

Fix unwanted scale/bias while testing and simplify _test_memory function by @shivam15s in #50
Update README by @JacobHelwig in #44
Added metadata for PyPI and bumped version by @shimizust in #52
Replace model / data with public HF path, update readme by @JasonZhu1313 in #53

New Contributors

@JacobHelwig made their first contribution in #44

Full Changelog: v0.1.0...v0.1.1

Contributors

JasonZhu1313, shivam15s, and 2 other contributors

20 Aug 18:50

shimizust

v0.1.0

27d2d51

v0.1.0: First Public Release

What's Changed

Update PR template and contribution guide by @lancerts in #20
Add GeGLU and updage readme by @yundai424 in #3
Added CI workflow with checkstyle job by @shimizust in #27
Create bug_report.yaml and feature_request.yaml by @lancerts in #29
Update feature_request.yaml and bug_report.yaml by @lancerts in #30
update gif by @zain-merchant in #31
Add lightning trainer and HF trainer fine-tuning example by @yundai424 in #17
use correct fsdp act ckpt & redo benchmark by @ByronHsu in #32
Update README.md by @ByronHsu in #33
Update README.md with Kernel descriptions by @qingquansong in #34
remove mfu and non used methods by @zain-merchant in #35
Byhsu/readme 3 by @ByronHsu in #37
Zain/singletest by @zain-merchant in #38
Add deepspeed to lightning example by @yundai424 in #36
Update README.md by @lancerts in #39
improve rms norm code quality by @ByronHsu in #43
Refactored convergence tests to be portable by @shimizust in #41
Added more generic monkey patch function by @shimizust in #42
Remove override dependency by @shivam15s in #45
Changed pointer variable names for clarity for SwiGLU by @zain-merchant in #46
Update CONTRIBUTING.md by @lancerts in #47
Release version 0.1.0 by @shimizust in #49

New Contributors

@shimizust made their first contribution in #27
@shivam15s made their first contribution in #45

Full Changelog: v0.0.1...v0.1.0

Contributors

lancerts, zain-merchant, and 5 other contributors

15 Aug 20:41

ByronHsu

v0.0.1

aebb8f6

v0.0.1 pre release Pre-release

What's Changed

Update Readme.md by @lancerts in #1
Update Readme.md by @lancerts in #8
Added requirements from licensing team including notice and contributing by @zain-merchant in #5
Test GitHub PR setting by @ByronHsu in #12
Update README.md with bib cite by @qingquansong in #13
Create pull_request_template.md by @lancerts in #9
Update README.md bib by @qingquansong in #15
Update pull_request_template.md by @lancerts in #14
Updated readme gif by @zain-merchant in #18
make fused linear+CE default for llama by @yundai424 in #22
create rms norm tensor at input.device instead of device 0 by @ByronHsu in #21
Add medusa patching code and example job with memory efficient Liger Kernel by @JasonZhu1313 in #11
forward compatibility with triton 3.0.0 for tanh by @yundai424 in #24
ignore e203 in flake8 to resolve black conflict by @ByronHsu in #25
src directory polishing by @ByronHsu in #23
Polish test/ and others by @ByronHsu in #26

New Contributors

@lancerts made their first contribution in #1
@zain-merchant made their first contribution in #5
@qingquansong made their first contribution in #13
@yundai424 made their first contribution in #22
@JasonZhu1313 made their first contribution in #11

Full Changelog: 0.0.2...v0.0.1

Contributors

lancerts, zain-merchant, and 4 other contributors
