v0.4.7
What's Changed
- Apply prompt style for tp.py and sequentially.py by @Andrei-Aksionov in #1629
- Fix prompt docstring in Python API by @rasbt in #1635
- Update windows cpu-tests.yml by @rasbt in #1630
- Remove NumPy < 2.0 pin by @rasbt in #1631
- Fix kv-cache issue in Python API streaming mode by @rasbt in #1633
- Update installation requirements to install only the minimal packages required for basic use by @rasbt in #1634
- Faster safetensors conversion when downloading model by @awaelchli in #1624
- Add Sebastian as code owner by @awaelchli in #1641
- Add missing super() call in data modules by @awaelchli in #1639
- Update Lightning version to 2.4.0 pre by @awaelchli in #1640
- Add tunable kv-cache with error handling for nonsense inputs by @apaz-cli in #1636
- Use Python API in serve code by @rasbt in #1644
- Fix autodownload + conversion issue by @rasbt in #1645
- Properly clear kv-cache by @rasbt in #1647
- Fix error raising where max_returned_tokens > max_seq_length_setting by @rasbt in #1648
- Add quantization support to litgpt serve by @rasbt in #1646
- Bump for 0.4.7 release by @rasbt in #1649
Full Changelog: v0.4.6...v0.4.7