forked from vllm-project/vllm
Pull requests: HabanaAI/vllm-fork
Pull requests list
[New Feature][Habana main] spec decode PR2 - Medusa, MLP, Eagle
#461 opened Nov 5, 2024 by xuechendi
Resolved alibi bias issue due to porting flat PA pr
#437 opened Oct 28, 2024 by tannervoas742
[PoC] Add max padding ratio to padding aware scheduler
#407 opened Oct 18, 2024 by kzawora-intel (Draft)
WA for OOM in qwen 2 - sync after loading weights
#398 opened Oct 16, 2024 by michalkuligowski
[bucketing overhaul 2/n] Delegate bucket management to HPUBucketingContext
#395 opened Oct 15, 2024 by kzawora-intel
[New Feature][Habana-Main] speculative_decoding HPU support
#375 opened Oct 8, 2024 by xuechendi
Add bucket calibration, allow reading/writing bucketing configs to file
#345 opened Sep 27, 2024 by kzawora-intel
Optimize LoRA mask creation
Label: habana (Issues or PRs submitted by Habana Labs)
#285 opened Sep 13, 2024 by SanjuCSudhakaran (Draft)
[build] Changes for RH build
Label: external (Issues or PRs submitted by external users)
#190 opened Aug 15, 2024 by Xaenalt