Lots of accuracy improvements! Several models now behave more like their Transformers counterparts, and a new internal configuration has been added for greater ease of use!
What's Changed
- Fix a bug that prevented attention_mask and past_kv_cache from working together by @yzhhr in #772
- Set prepend_bos to false by default for the Bloom model family by @degenfabian in #775
- Fix incorrect outputs from Bloom-family models when use_past_kv_cache is set to True by @degenfabian in #777
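For readers unfamiliar with why an attention mask and a key/value cache have to be handled together, here is a minimal conceptual sketch in plain Python. It is not the library's implementation and does not describe what #772 changed. The helper names `extend_attention_mask` and `allowed_keys` are hypothetical, chosen only for illustration. The idea it shows: once earlier tokens are cached, the mask for a new token must still cover the cached positions, otherwise padding in the cached prompt would be attended to.

```python
from typing import List


def extend_attention_mask(past_mask: List[int], new_mask: List[int]) -> List[int]:
    """Hypothetical helper: join the mask for cached positions with the mask for new tokens.

    The result has one entry per key position (cached + new), 1 = real token, 0 = padding.
    """
    return past_mask + new_mask


def allowed_keys(full_mask: List[int], query_pos: int) -> List[int]:
    """Hypothetical helper: key positions a query at absolute position `query_pos` may attend to.

    A key is allowed if it is not padding and does not lie in the future (causal attention).
    """
    return [k for k, keep in enumerate(full_mask) if keep == 1 and k <= query_pos]


# A left-padded prompt of three positions (one pad, two real tokens) is already cached.
past_mask = [0, 1, 1]
# One new token is being generated on top of the cache.
new_mask = [1]

full_mask = extend_attention_mask(past_mask, new_mask)
query_pos = len(past_mask)  # absolute position of the new token

print(full_mask)
print(allowed_keys(full_mask, query_pos))
```

In this toy example the new token attends to positions 1 through 3 and skips the padded position 0, which is only possible because the mask spans the cached positions as well as the new one.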
New Contributors
- @yzhhr made their first contribution in #772
- @degenfabian made their first contribution in #775
Full Changelog: v2.8.1...v2.9.0