Squashed commit of the following:
commit 579337372e694d900a4e01899b81fe0afcf82c10
Merge: e7044a61c 30b0f2156
Author: Max de Bayser <[email protected]>
Date:   Tue Oct 8 10:34:17 2024 -0300

    Merge branch 'bert' into roberta_embedding

    Signed-off-by: Max de Bayser <[email protected]>

commit 30b0f2156bccbfb11def0d7902acb8b56d24a98a
Merge: 80c18855f 8c746226c
Author: Max de Bayser <[email protected]>
Date:   Tue Oct 8 10:33:05 2024 -0300

    Merge branch 'upstream_main' into bert

commit 8c746226c956f7c8a4672689fee91c7d22befed6
Author: Brendan Wong <[email protected]>
Date:   Mon Oct 7 22:51:43 2024 -0700

    [Frontend] API support for beam search for MQLLMEngine (#9117)

commit 80c18855fcff195175b7046923c4b0c3815f141a
Author: laishzh <[email protected]>
Date:   Mon Oct 7 12:04:34 2024 +0800

    feat: update with origin/main

commit 6440795f407c652ecdb045d1b141913afdb8b5e1
Merge: 04b0bc6ff 487678d04
Author: laishzh <[email protected]>
Date:   Mon Oct 7 11:28:19 2024 +0800

    Merge branch 'origin/main'

commit 04b0bc6ff534495a9627f5548767f5bfb95268e8
Author: laishzh <[email protected]>
Date:   Mon Oct 7 02:54:55 2024 +0800

    feat: revert embedding_block_manager

commit 352d8b2641d11ffa0e153462fd89b54525998843
Merge: 3fbfdf429 107d9c207
Author: laishzh <[email protected]>
Date:   Mon Oct 7 00:45:52 2024 +0800

    Merge remote-tracking branch 'maxdebayser/bert'

commit e7044a61cebf6b9229a50a8396fdef104e799a9e
Merge: a14b4e39d 107d9c207
Author: Max de Bayser <[email protected]>
Date:   Wed Oct 2 18:04:38 2024 -0300

    Merge branch 'bert' into roberta_embedding

commit 107d9c207808c6f070ef086e3ea748cecbc9d809
Merge: 57bdd6049 7f60520de
Author: Max de Bayser <[email protected]>
Date:   Wed Oct 2 17:52:52 2024 -0300

    Merge branch 'upstream_main' into bert

    Signed-off-by: Max de Bayser <[email protected]>

commit a14b4e39d26eb953c569ebb219aa3cb7203699ec
Merge: 08f1781d6 57bdd6049
Author: Max de Bayser <[email protected]>
Date:   Thu Sep 26 17:25:28 2024 -0300

    Merge branch 'bert' into roberta_embedding

    Signed-off-by: Max de Bayser <[email protected]>

commit 57bdd6049129b43244d3c70ea876e784762e96e9
Merge: 2c8a5b922 7193774b1
Author: Max de Bayser <[email protected]>
Date:   Thu Sep 26 17:15:18 2024 -0300

    Merge branch 'upstream_main' into bert

    Signed-off-by: Max de Bayser <[email protected]>

commit 3fbfdf42966c7324466e266dc6d4b5c26131aee5
Merge: 2c8a5b922 873edda6c
Author: laishzh <[email protected]>
Date:   Thu Sep 26 23:23:39 2024 +0800

    Merge remote-tracking branch 'origin/main'

    # Conflicts:
    #	vllm/inputs/data.py

commit 08f1781d6bd49653bd62ffdfde4f86d903f0c65a
Author: Max de Bayser <[email protected]>
Date:   Mon Sep 23 17:04:35 2024 -0300

    add head size 32

    Signed-off-by: Max de Bayser <[email protected]>

commit 2c8a5b9224ce9e26b2e43bb2312be91e2c74de9c
Merge: 15be7fa8b f2bd246c1
Author: Max de Bayser <[email protected]>
Date:   Mon Sep 23 13:48:10 2024 -0300

    Merge branch 'main' into bert

    Signed-off-by: Max de Bayser <[email protected]>

commit 30c875e9e61f1e9e4d556014f49362adff76269a
Merge: afd997ba9 464a90f4e
Author: Max de Bayser <[email protected]>
Date:   Mon Sep 23 13:59:23 2024 -0300

    Merge branch 'bert' into roberta_embedding

commit 464a90f4e09165ab724de26b35e9d7913c5d6560
Merge: 15be7fa8b f2bd246c1
Author: Max de Bayser <[email protected]>
Date:   Mon Sep 23 13:48:10 2024 -0300

    Merge branch 'main' into bert

    Signed-off-by: Max de Bayser <[email protected]>

commit afd997ba9f6ec2513145c0ca469a15783e0c96e5
Merge: 7d0ecb90c 15be7fa8b
Author: Max de Bayser <[email protected]>
Date:   Mon Sep 23 13:14:29 2024 -0300

    Merge branch '5447' into roberta_embedding

commit 15be7fa8bce185f64fafecaabdb8c828e83f4ad8
Author: laishzh <[email protected]>
Date:   Mon Sep 9 23:04:44 2024 +0800

    feat: fix lint

commit 0ea4da1c549bf35c8456c47729da46dd33481cac
Author: laishzh <[email protected]>
Date:   Mon Sep 9 23:01:22 2024 +0800

    feat: fix lint

commit 776dcbdae9d693dbd6546b7784712c06e6ef473c
Merge: 3ff2d3637 4ef41b847
Author: laishzh <[email protected]>
Date:   Mon Sep 9 10:32:46 2024 +0800

    Merge branch 'main' of https://github.com/vllm-project/vllm

    # Conflicts:
    #	vllm/core/embedding_model_block_manager.py

commit 3ff2d36375d9560f87c56860ffff8a774a217cf9
Author: laishzh <[email protected]>
Date:   Mon Sep 9 10:29:01 2024 +0800

    feat: some changes on test_embedding.py

commit e351bfd0febe4bbf8030fcd07f39eef5cce97641
Author: laishzh <[email protected]>
Date:   Sun Sep 8 23:50:18 2024 +0800

    feat: bert embedding implemented, but still have some bugs with mistral,

commit 7d0ecb90c5034d41f0d9b38eede25f50bf941e3d
Author: Max de Bayser <[email protected]>
Date:   Wed Aug 28 16:35:25 2024 -0300

    Add support for Roberta embedding models

    It's almost identical to the Bert models

    Signed-off-by: Max de Bayser <[email protected]>

commit 612cf1a969fa46105c3685b2eb025cde6416747d
Author: laishzh <[email protected]>
Date:   Tue Aug 27 15:19:50 2024 +0800

    feat: modify test_embedding

commit fc1f2b7ceb69f9588799820831145babf29aaa64
Author: laishzh <[email protected]>
Date:   Mon Aug 19 15:39:33 2024 +0800

    chore: fix lint

commit d09860763500b85193230588386f0e3d515e231c
Author: laishzh <[email protected]>
Date:   Mon Aug 19 15:24:51 2024 +0800

    feat: remove embedding_model_block_manager.py

commit 37f698b4241a42c9634030e372e419b47e2a1e9c
Author: laishzh <[email protected]>
Date:   Mon Aug 19 15:16:34 2024 +0800

    feat: move BertEmbeddingModel to the end of file

commit 6f006f5ad698d76599e0b005520e65921042d07b
Author: laishzh <[email protected]>
Date:   Mon Aug 19 15:06:21 2024 +0800

    chore: fix lint

commit bfd7ec9e043cf304e6dea024912eb2a18c786bd6
Author: laishzh <[email protected]>
Date:   Mon Aug 19 14:59:06 2024 +0800

    feat: model input

commit 8b107a24a4ef9abb194686066c3bebc6923c6876
Author: laishzh <[email protected]>
Date:   Mon Aug 19 13:41:49 2024 +0800

    feat: fix lint

commit e15d0cce60e3f39f2aaf8c3f62314a6d6b4ea091
Merge: b76da51c0 f710fb526
Author: laishzh <[email protected]>
Date:   Mon Aug 19 12:45:26 2024 +0800

    Merge branch 'main' into main

commit b76da51c0d9ba1b4e39d432b8fb557ed8319034f
Author: laishzh <[email protected]>
Date:   Mon Aug 19 11:35:22 2024 +0800

    feat: enc_dec_runner base

commit b99d783bd852eb4cae228fcd8faf3344cd9a6fed
Author: laishzh <[email protected]>
Date:   Sun Aug 18 00:49:57 2024 +0800

    feat: remove embedding block space manager

commit 7e1196d25054d76d92b3777bc077d3cffd742599
Author: laishzh <[email protected]>
Date:   Sat Aug 17 14:43:32 2024 +0800

    fix: fix hint

commit ce9a599194dbc3a208a6a4a21fdccaaa5c26ece8
Author: laishzh <[email protected]>
Date:   Sat Aug 17 02:18:54 2024 +0800

    feat: bos_token_id

commit 275f49de32136eb9e4298d42aa85a1e2dc56924c
Author: laishzh <[email protected]>
Date:   Sat Aug 17 01:03:55 2024 +0800

    feat: embedding model prompt

commit 0b3f55c66e5eb40808f46ebde3c38213478050c7
Author: laishzh <[email protected]>
Date:   Fri Aug 16 15:12:51 2024 +0800

    feat: fix lint

commit 91e23d8ad2b45790590889d6ee437702f5003792
Author: laishzh <[email protected]>
Date:   Fri Aug 16 15:04:30 2024 +0800

    feat: fix lint

commit 7657af3f49cdb567bc96b44157c89f18cc4d0a22
Author: laishzh <[email protected]>
Date:   Fri Aug 16 15:01:26 2024 +0800

    feat: fix lint

commit f2158848b9abd839c515c568acd592d0416c6682
Author: laishzh <[email protected]>
Date:   Fri Aug 16 11:21:54 2024 +0800

    chore: recover

commit a0ad0df28c9de89bdd66b587502f6af9265065be
Author: laishzh <[email protected]>
Date:   Fri Aug 16 11:15:28 2024 +0800

    chore: recover unchanged files

commit 872e79531b39d1bf12ea81ddcd5bf919dd97265d
Author: laishzh <[email protected]>
Date:   Thu Aug 15 21:40:55 2024 +0800

    feat: embedding model forward

commit 682c455bb0b8c950e1e00b43a6841f433f62db97
Author: laishzh <[email protected]>
Date:   Thu Aug 15 14:36:40 2024 +0800

    feat: recover sequence

commit aca786e4359ef55d0af006199728c8b941558579
Author: laishzh <[email protected]>
Date:   Thu Aug 15 13:44:03 2024 +0800

    feat: default bos_token_id of encoder model

commit 76b47fb1b7920fb50a889f19e1c1421e4385d1ca
Author: laishzh <[email protected]>
Date:   Thu Aug 15 13:18:53 2024 +0800

    chore: recover

commit 37bcba01408d37b192063e2ee2b9ac1c3087393c
Author: laishzh <[email protected]>
Date:   Wed Aug 14 17:47:05 2024 +0800

    feat: full pipeline

commit 63fb7a582cef08ec29a8b30024a01602dc5ee636
Author: laishzh <[email protected]>
Date:   Wed Aug 14 02:39:31 2024 +0800

    WIP: bert embedding

commit 53c5148e9f5024f2eb6a83bbf7af191dc88fe555
Author: laishzh <[email protected]>
Date:   Tue Aug 13 16:11:53 2024 +0800

    (WIP)feat: EmbeddingModelRunner support encoder model

commit 12a9869b5324fa9a4f7090eb8967c81f47f87f75
Merge: 59bf8c44d 97a6be95b
Author: laishzh <[email protected]>
Date:   Tue Aug 13 11:22:44 2024 +0800

    Merge remote-tracking branch 'origin/main'

    # Conflicts:
    #	.buildkite/test-pipeline.yaml
    #	examples/offline_inference_encoder_decoder.py
    #	tests/conftest.py
    #	tests/core/test_scheduler_encoder_decoder.py
    #	tests/kernels/test_encoder_decoder_attn.py
    #	tests/models/test_bart.py
    #	tests/worker/test_encoder_decoder_model_runner.py
    #	vllm/core/scheduler.py
    #	vllm/engine/llm_engine.py
    #	vllm/inputs/__init__.py
    #	vllm/inputs/data.py
    #	vllm/model_executor/models/bart.py
    #	vllm/sequence.py
    #	vllm/utils.py
    #	vllm/worker/enc_dec_model_runner.py
    #	vllm/worker/worker.py

commit 59bf8c44dd79c832a37949d0698bacef6ecc2136
Merge: a40828921 a936faa57
Author: laishzh <[email protected]>
Date:   Thu Jul 25 23:02:34 2024 +0800

    Merge remote-tracking branch 'bert_deps/afeldman-nm/infra_enc_dec_model_runner'

commit a936faa57000aca5be159de260fae8c8849148b6
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:52:50 2024 -0400

    removed prefix caching from enc/dec modelrunner

commit 4bb7fc442f67dd162a001900e485d02d64fa24ed
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:45:03 2024 -0400

    removed chunked prefill logic/docstring text from enc/dec modelrunner

commit f0abcc27e642dda6371eb1440de519166642a9e7
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:37:45 2024 -0400

    format

commit d1751db42bac1baf50b5fa542c770fbab13ba9ff
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:35:45 2024 -0400

    removed flashinfer references from enc/dec modelrunner

commit 64685acfe52177d1e01362ece71d3faab73e8e45
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:13:44 2024 -0400

    Sequence docstring

commit 035d90dfc21bbc12d12d2368a2d5d5175ead31ca
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:01:31 2024 -0400

    updated RequestOutput docstring

commit 1bb7ad9f2f5e4c84e283c5c0c59006d817440609
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 09:59:34 2024 -0400

    updated RequestOutput docstring

commit 47c5548936cd7bfe476d31e8248e3208a8a663d1
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 09:53:23 2024 -0400

    checked out examples/offline_inference.py from main

commit 3327e5be3b07bc35a607a1f4fa1fba2fc4f5904e
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 09:49:44 2024 -0400

    removed lora & vision & mm code from enc/dec modelrunner

commit 175ea95baf0537209a8aa0e9c94f711f794f0f51
Merge: c2cc010ac 316a41ac1
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 09:25:53 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit c2cc010acc1bb632bb7297da970ff865b22c7f27
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 01:33:04 2024 -0400

    Removed lora from enc/dec model runner

commit fb5a2bcb2baa984b884ba8bdd6293dd06cb8756b
Merge: 393515eb0 9e169a4c6
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 00:52:21 2024 -0400

    upstream merge

commit 393515eb07a84c3d1604f0c0bc52eb2d8f7c5ae0
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 00:50:27 2024 -0400

    formatting

commit 47b4eb2a06bf0811f143668fbfe1f8c2caedc827
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 00:50:08 2024 -0400

    fixed bug caused by upstream refactoring

commit bed9bcd356c3526f5697ddfc2052d5bfca5fa9d2
Merge: 0af58ec10 740374d45
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 21:04:09 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 0af58ec10ac6eb9cab3f78abfa62390ade9ca64c
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 05:10:20 2024 -0400

    responses to feedback

commit d82b27346b444778eeba42e015ac716883c37f76
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 05:01:27 2024 -0400

    enc/dec example comments

commit 4b5b2cf956141e3adbc22a7a2aa2ebbb9bad8979
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 04:51:48 2024 -0400

    removed unnecessary argument reordering

commit ed4a56b9ca31cdf06033611887114920318ad397
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 04:46:49 2024 -0400

    formatting

commit 5a270ff49f3ebafecf8fb45e090f08d705aa416a
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 04:46:32 2024 -0400

    refactoring

commit 02114bdcd5a832c3610318a8d0b8cfb26070f3ef
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 04:31:32 2024 -0400

    _free_seq_group() -> _free_seq_group_cross_attn_blocks()

commit be58d8ab92fd4ddab1f48b246a5233ee3a71bcf0
Merge: c493d4029 ccc4a7325
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 04:20:18 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit c493d402929d023a0924018a928502cb05605a2f
Merge: f36ffb569 5e8ca973e
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 00:34:07 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit f36ffb5695b0694947f4ae9e7417cc1afa85e19c
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 00:33:47 2024 -0400

    example includes prompt zipper

commit 61d2ad2cc7791b6e32c8678b8e88ed99bbab4118
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 00:28:20 2024 -0400

    fixed bugs in handling non-text formats for individual prompts

commit dd784b5423ba21fc6b8188908df417d128376a1f
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 21:37:19 2024 -0400

    typing fix

commit 0b29fd27f17f2751550262f218e6ef1afbef7087
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 21:35:25 2024 -0400

    enc/dec handles empty str and None decoder prompts correctly

commit aa01d71f90f0c3cda8a7ea419ff4f1fb6dc9d13c
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 20:56:51 2024 -0400

    empty-string decoder input is now handled for encoder/decoder

commit 4a6e39e67c2bb4c2d685df9031cbf64956be4255
Merge: 7e7bbd9e1 87525fab9
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 20:16:21 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 7e7bbd9e16900449e350bf8634d584e4b1a5c2f0
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 16:57:41 2024 -0400

    deleted unnecessary dependency

commit 229847b431469bd17b2d13f3651b322c7b280274
Merge: 059273f3c 1bedf210e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 16:56:27 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 059273f3ca43947413572a0014c1437a53e33b8a
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 16:56:07 2024 -0400

    wip

commit b283544d820bfd96ac80845d2ddd7ad057cca6e9
Merge: 48a742d41 b01937f0c
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 04:15:18 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_correctness' into infra_enc_dec_model_runner_reviews

commit 48a742d4155cba0ffc7effb1c9fdad0706493c43
Merge: 427032a08 bb2fc0807
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 04:15:03 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit b01937f0ce29bc9e417e85cb4dd18ddb47a98e3b
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 04:14:06 2024 -0400

    set up None/empty str tests which are not passing

commit c51a1682be7443ec7d32062491868bd49c631eb8
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 01:47:43 2024 -0400

    fixed bug in how conftest was handling HF encoder/decoder outputs; disabled HF n-gram repeat checks

commit 427032a085cd48701f7abf64518563929a844d6c
Merge: 14831b09d fea59c771
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 17:14:13 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 14831b09da05f6d8e689568c77f7dfc5c33895ab
Merge: c43a6ed19 b90b6b6ff
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 13:52:34 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_model_runner

commit b90b6b6ffb4417ec64b382e9211273bca1eebbb7
Merge: b174c7ab2 739b61a34
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 13:51:35 2024 -0400

    upstream merge

commit a40828921c18faf70f4239d90e599da4311b284e
Merge: 7ace684da c43a6ed19
Author: laishzh <[email protected]>
Date:   Mon Jul 22 19:00:06 2024 +0800

    Merge remote-tracking branch 'bert_deps/afeldman-nm/infra_enc_dec_model_runner'

commit c43a6ed191e76f81bfd27f25e2ca8bac1fc01bcc
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 04:03:59 2024 -0400

    commented out BART TP=4

commit b174c7ab2da60e24a2ca576eccee671541ae142a
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 04:02:56 2024 -0400

    bart is parallelized, modulo an unfortunate hack for QKVParallelLinear in cross-attention

commit 3551b6bf56ab74228c923b698e59a88b06bac81c
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 03:59:22 2024 -0400

    fixed bug where underlying Attention was constructed using full head-count

commit fdf71de8557d588ff3b5767e96df09de4e9278d5
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 03:48:35 2024 -0400

    parallelized enc/dec cross-attention, using a slight hack

commit 9bbed43ab159063a8dff27587dae909b11e1a703
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 03:20:20 2024 -0400

    parallelized LM head

commit 74abe22287374c9dd801ef059692016ef09777cb
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 03:01:07 2024 -0400

    encoder attention & decoder self-attention parallelized

commit e5bb9de596bd7f4b5d85ab6d0a2440cae06f982a
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 02:33:02 2024 -0400

    all attention layer output linears are parallelized

commit fb3227f68714ba6ed00e67e8a242db88288cdb8e
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 02:25:12 2024 -0400

    parallelized BART learned positional embedding

commit 00198a633605b786c5f1fdef007c965d6284b39b
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 02:22:01 2024 -0400

    BART MLPs parallelized

commit abbb42749a628f5d199b62046200a6eb85025ab8
Merge: a33b50171 a16cabb90
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 01:54:59 2024 -0400

    Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_runner_parallel_bart

commit a16cabb9029d86221a69975935622dd53084a554
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 01:54:22 2024 -0400

    equalized some generation/sampling config settings between enc/dec HF,vLLM, nonetheless still not perfect match

commit a33b50171b6147ad1ff3db16adef4bb3a7819b33
Merge: 584c01e87 32967c1ca
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 01:35:22 2024 -0400

    Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_runner_parallel_bart

commit 32967c1ca7d706f1e59cbd604b58588210aeeee3
Merge: c00e0a8b5 89c1c6a19
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 01:30:53 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit c00e0a8b561a8243080ef40b1c1b8f0b8257d026
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 00:28:29 2024 -0400

    CommonMetadataBuilder sets block_tables constructor arg of metadata

commit a22f56c8bbb1dde2bd3a440bb0c037ed73ca17e1
Merge: ffa99b2dd 42de2cefc
Author: Andrew Feldman <[email protected]>
Date:   Sun Jul 21 22:28:38 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit ffa99b2dd61cfe21222a98ed2f95d608d6f6a8a2
Merge: 41ccf0c8c 9364f74ee
Author: Andrew Feldman <[email protected]>
Date:   Sat Jul 20 16:08:20 2024 -0400

    additional merge

commit 41ccf0c8ce9079a89ace594a3a0f2eb573c2d6c0
Merge: 9fdd04705 a5314e869
Author: Andrew Feldman <[email protected]>
Date:   Sat Jul 20 16:06:16 2024 -0400

    wip merge

commit 7ace684da139b43f38a4ebc328e17056ebfbe18a
Merge: fe7786c8a c092ed476
Author: laishzh <[email protected]>
Date:   Fri Jul 19 00:27:56 2024 +0800

    Merge remote-tracking branch 'bert_deps/afeldman-nm/infra_enc_dec_model_runner'

commit 584c01e875e12d870312ab210dec809325482ae3
Merge: 69f0379d2 9fdd04705
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 16:59:40 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_model_runner_parallel_bart

commit 9fdd0470597025057a473eb8e20946f71db54daf
Merge: c092ed476 5f0b9933e
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 16:59:18 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 69f0379d24323958dd9b332884f7c57a222acfc6
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 13:23:42 2024 -0400

    wip:

commit d7bd617c84880f477a0ce7ae3d1de1215e26748f
Merge: 31e335fd2 c092ed476
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 13:13:04 2024 -0400

    Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_runner_parallel_bart

commit c092ed47621f9061395ea3e89386c997f856c6b3
Merge: 949ac02c5 2fa4623d9
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 13:09:14 2024 -0400

    merged in upstream changes; left some formatting issues which I expect to be fixed upstream

commit 31e335fd206985f5b3791b6a3cfaa021d21d3629
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 13:03:58 2024 -0400

    wip activation parallelization

commit 88c058e8fe5ae00b39f88f57be745d1b819dbca5
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 12:23:31 2024 -0400

    wip parallelizing BART

commit 949ac02c5694069edf3338b2202717dffda276e6
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 11:18:01 2024 -0400

    formatting

commit 6c940f886950ba0ae77ccb9002a161cf95b686ad
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 11:00:34 2024 -0400

    modified HF behavior in BART test to be truly greedy

commit f15eacf140810512335a7ac422b09788a1c1964e
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 10:55:46 2024 -0400

    wip

commit 180884605ffd911c553c6b2585c2993204e4a629
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 09:34:42 2024 -0400

    formatting

commit 1f8c52fac27ed8f10b94a3ecb08e15c1118c186a
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 09:34:29 2024 -0400

    tweaks to enc/dec example

commit 9da8fb3ef77b64c0152e3699513053e1ea4e21a4
Merge: 94c904fb5 a9a2e74d2
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 09:24:19 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 94c904fb5ff01f7e1c93b8d4a5f195ca2bea5bc0
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 08:43:16 2024 -0400

    wip parallel bart but encountering GPU count issue

commit 9f5a02c21e785704114f8c15bb829f4fe4cded55
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 08:27:53 2024 -0400

    RequestOutput & SequenceGroup now include encoder prompt in output, as does encoder/decoder example.

commit 597a07da54fa4c399e42bccbb4a14957d782e37c
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:59:42 2024 -0400

    refactor

commit f54f2762f4b4d14165371e3dfc300f1ef3afa9b6
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:53:12 2024 -0400

    wip refactoring

commit cac6283f60f1edc55950eaae54e74db0902ebfd8
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:25:58 2024 -0400

    added encoder/decoder example to examples test

commit b277180575d7d9c85708e2622cc6c32afbc0a383
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:17:40 2024 -0400

    formatting

commit 50ad5ffc753d1e7b39dfd55822ac0e405533168d
Merge: ef9462321 e09ce759a
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:16:28 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit ef94623218a718a437526917a8c95e933d614ee9
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:16:10 2024 -0400

    added examples utils w/ context manager for backend override; applied to enc/dec example to force XFormers

commit aee5f1615347dcfe2acea9abe16ac61df3404a99
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 06:14:51 2024 -0400

    fixed sequence bug

commit 3656dc6c843cbf41b99ab4b0c88a974d1cedba2e
Merge: 0cc14abc5 5fa6e9876
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 05:23:05 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 0cc14abc5a5569c6ae641c5d3efc0251fd946507
Merge: 1c6e06d0b 10383887e
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 02:10:34 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 1c6e06d0be66bf8cbf98cc8429a060b60bb65700
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 02:10:12 2024 -0400

    bugfix

commit 31127faf0c4637c6b80540c9693c7d5f135416d5
Merge: c2ff615de 1d094fd7c
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 00:48:22 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit c2ff615deebea4457721a457103d8e405346b1a5
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 00:44:16 2024 -0400

    format

commit f8dd4a5955ec478720531c47945ddc26e450f743
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 00:43:52 2024 -0400

    fixed scheduler bug

commit ef80c85f7dd3febc9c76c793427c444f9e62caa6
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 00:35:57 2024 -0400

    wip

commit 03aea187652fc0418d9a66f7eb5af6bc53c9e535
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 00:34:45 2024 -0400

    wip

commit 16c9aa2278e7f9d9b5f5ccffb085b0142a7e20ec
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 22:36:44 2024 -0400

    bugfix

commit 159c7bcf47aa86e4abbd88ad72a34e196c56626e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 21:58:15 2024 -0400

    fixed decoder-only bug

commit aea8d34385a64d6e6efa87729fee8fa4c4f15818
Merge: 713d095b4 7f62077af
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 21:09:06 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 713d095b4036404f4580225720da17d7d4e431cb
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 14:49:17 2024 -0400

    incorporated encoder sequence into request-add functionality

commit 87ed3b6fe380f75ebdafd3bc4da003b42802c18c
Merge: 97d81f0a5 94162beb9
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 14:17:29 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 97d81f0a53506cf6292f24117e8ecbfca5803805
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 14:17:09 2024 -0400

    encoder/decoder input processing; formatting

commit e534ffc156479d1b4dbec905ccc0877b746cc068
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 13:25:27 2024 -0400

    wip

commit 3c7e19d3d0e4c53ca363f40712fe2df160be1d9e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 10:44:23 2024 -0400

    zip enc/dec prompts; formatting

commit 850a97e812662645452989341eb44b79aa4b3276
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 10:25:38 2024 -0400

    bart parallel vocab

commit 42ac66b469891ba3085eaa1265c2bd9d445e0839
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:59:04 2024 -0400

    VllmRunner encoder/decoder methods

commit 796d7a3e7f8a67b644f6a88446e4162a09a1fbac
Merge: 374880f71 7508a3dc3
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:55:37 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 374880f71d6f81bd2a933b237ff6fa46e0324e6b
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:49:30 2024 -0400

    input preparation now includes encoder-oriented input setup:

commit c5846ac9b31777d131bb0e3af2ad62a74eab1978
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:40:46 2024 -0400

    Hfrunner greedy logprobs limit

commit 92d9f486b2455ff5ea5215eb61b9cb1e375b17ff
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:33:41 2024 -0400

    conftest: encoder/decoder example prompts

commit 54ff1420cac3edccff6c751e4930f7fa1b3be247
Merge: ddaf0ade2 7a3d2a5b9
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:28:46 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit ddaf0ade21142daafc504df83e15d31911dee497
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:28:21 2024 -0400

    wip

commit 914134749aee12e273f38273ed4cfda866ec837f
Merge: 251f899ea ec9933f4a
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 16:33:24 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 251f899ea158af33ffe1367c57137ac9ed9212ad
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 16:33:10 2024 -0400

    wip

commit f85997b4bb63352fc1bad72b54eea358f89ec5b0
Merge: 46397c74e 64fdc08c7
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 13:30:57 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 46397c74e7c094d86d4f49fc3230cb92985d5fc5
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 13:30:21 2024 -0400

    wip

commit 336a77d62d2d31a2ed6c9eba9e36190b50cca713
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 09:34:47 2024 -0400

    formatting

commit 8dccaa510a67e8de71811c13371468024843b71d
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 09:34:14 2024 -0400

    correctly constructing enc/dec sequences

commit dd4031c8e3201ee2e874e40df69c1bd52e7c54be
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 09:11:34 2024 -0400

    wip but having vllm.commit_id error

commit 552551137b19a9e9c2ebc13856c8e5a22834ae1b
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 08:51:18 2024 -0400

    Sequence may be constructed with encoder/decoder LLMInput configurations

commit 7b0803b1bb9fbf222be2b719729b3494ade79087
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:41:25 2024 -0400

    formatting?

commit 304caed04dcbc25b76d8e80321da00414ac7dc17
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:36:33 2024 -0400

    formatting

commit 6c953808f11122a0c5482786b41825a79788a9a4
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:25:01 2024 -0400

    wip engine is_encoder_decoder() setting

commit 78d3d3c00d30af324dbd1ca0973c1dd68d4cdb5b
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:20:50 2024 -0400

    modified LLM.generate() error message

commit 10ed7145053546d2112ed98252dc46f782a04b72
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:18:13 2024 -0400

    Format

commit 83c5c43dd6e06d13d9d05c01882b6d705a5aefaa
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:14:34 2024 -0400

    prompt type checks

commit 94c083cabff971da175eca616ff4b2c94299573b
Merge: 64d71980c 0cca1646d
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:00:30 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_model_runner

commit 0cca1646dce64fbdf2419b7f075e15da6264ee84
Merge: db5539a85 6ae1597dd
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:00:07 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 64d71980c823c167239d5c7338096a144586b7f3
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 06:59:49 2024 -0400

    wip

commit ff940f7adf771465e92a6fad350fb2f1aca4f694
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 06:18:58 2024 -0400

    formatting

commit 8b8d9812f7b7317448d4872db32cffcb45444c02
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 06:17:41 2024 -0400

    refactored AttentionType and related imports; skip BART test definitions entirely if on vllm CPU version (to avoid xformers import)

commit 590a240fe53dd78e62c78f7ac0263b0c3fda6949
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 06:05:18 2024 -0400

    Formatting

commit 760355bfeea93c7b85cf440f597485e11a7357b1
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 06:04:43 2024 -0400

    bart test skipped on CPU version of vllm

commit db5539a85f83ceaa929e2c02129a1a174fa29424
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 05:00:25 2024 -0400

    format

commit 3d5bb888cfc10c835ff17c18ca367c930a335785
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 04:48:48 2024 -0400

    EncoderDecoderModelInput correctly handles encoder token/position fields

commit 447a5c7e10b09c1e5cff95e907198d6d050f1ffa
Merge: 9ce2da454 22e79ee8f
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 04:29:30 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 9ce2da45412de77bb358c2ce97521fa6a8b7990d
Merge: c5ceb2348 eeceadaec
Author: Andrew Feldman <[email protected]>
Date:   Sat Jul 13 19:26:27 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit c5ceb23486c3f3ddd15faf8fcf06fcc1ba722fe1
Merge: 196f30cd7 41708e503
Author: Andrew Feldman <[email protected]>
Date:   Sat Jul 13 02:18:32 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 196f30cd7f25a682dc3d2320d994f949b00084a2
Author: Andrew Feldman <[email protected]>
Date:   Fri Jul 12 11:15:56 2024 -0400

    enc/dec decoder test working, sans sampling check

commit 9c898f5b28113ea53758c447175fd9cfd67b2066
Merge: 685604cfc f7160d946
Author: Andrew Feldman <[email protected]>
Date:   Fri Jul 12 09:41:15 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 685604cfcb90b6e74e37dbf5b5ee478e157f8191
Author: Andrew Feldman <[email protected]>
Date:   Fri Jul 12 09:40:42 2024 -0400

    wip modelrunner

commit f6499442e7c434c3ce4a187b344481988f106471
Merge: 9a63f51bd b422d4961
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 10 12:51:51 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner

commit 9a63f51bde8059fc361cc7abb2249ce1efb54163
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 10 12:50:40 2024 -0400

    wip model runner

commit fe7786c8a510d2280f3e25a8461474bb17ab8e11
Merge: 26b6271ca a5c28fca8
Author: laishzh <[email protected]>
Date:   Thu Jul 11 00:27:08 2024 +0800

    Merge remote-tracking branch 'bert_deps/afeldman-nm/infra_enc_dec_model_runner'

commit 6a71f8f4359dab04b9811b84d338db40dafa72bc
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 17:23:01 2024 -0400

    formatting

commit b4a461d983ed0215777c89f6b2ecbaa754422d4e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 17:18:56 2024 -0400

    formatting

commit d1343aac0fe6c0063f950e3600f9264aacb0836d
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 17:07:43 2024 -0400

    scheduler test passes

commit c95adf50adcdc315f63b276f52ac9a6a2d35b5fa
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 16:49:34 2024 -0400

    scheduler supports encoder-/cross-attention & passes existing scheduler tests, but needs new encoder/decoder-specific tests

commit 4c01f1300161bb4a16fdc27612cdace516aedebc
Merge: 2c80185fb 4d6ada947
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 16:38:22 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 2c80185fb81602a9a39afe4137bc5f59bcb69f57
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 16:36:11 2024 -0400

    formatting

commit bd14d29177dda7bd1f2ddd41ccba71703dbaa07d
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 16:17:24 2024 -0400

    wip scheduler

commit c90140fba9d3ec2ee8a065a267aef571e93c64db
Merge: 88e284a53 4f0e0ea13
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 17:55:07 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner2

commit 88e284a5344699e099e5510e5a353b9c5a54d0c7
Merge: db49d48f2 543aa4857
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 13:26:10 2024 -0400

    merge from main

commit db49d48f2a0913251385e324b28af06bd81cc121
Merge: 22d013c1d 6cd595c3c
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 11:15:43 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner2

commit 6cd595c3c879d4ee603bb6a5bc0f1724647a5135
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:47:20 2024 -0400

    formatting

commit 5df73fc708bf3370a5f6d7f85cce4772d5c679b5
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:47:04 2024 -0400

    xformers backend cleanup

commit d8a692b7dde0656696b726497030970aac0b53d3
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:39:37 2024 -0400

    cleaning up a number of backends & backends utils.py

commit 097aff2029e4560ae28bd7a7acf0f20509f803fe
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:36:05 2024 -0400

    vllm/attention/backends/flash_attn.py cleanup

commit 45fc9f71641bdd17c67997598463f12ead3998b2
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:35:00 2024 -0400

    vllm/attention/backends/blocksparse_attn.py cleanup

commit 5ee30fed1d27dbef98dc3e4512741c9ca301197c
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:31:09 2024 -0400

    vllm/attention/backends/abstract.py cleanup

commit 4f27946dcfb73f0a60420eb3ca6c9a74f6c6d3d1
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:27:35 2024 -0400

    tests/kernels/utils.py cleanup

commit a1bf65212cab0933b2520d8557a9d9132fff8c3d
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:17:04 2024 -0400

    test_encoder_decoder_attn.py cleanup

commit 9ae6728ecfe48769f578b0fad3f8e3950daa683d
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 09:46:58 2024 -0400

    fixed specific point-changes requested by woosuk

commit 7ce9a51d4fb3e286fdaa3a3ba12e60d0908d2d64
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 09:38:03 2024 -0400

    merged in first pieces of woosuk feedback & latest main; formatting

commit e837a73be0b61434116d1f332a84266d05cd61fc
Merge: 07df0e158 7e0bc5725
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 09:36:30 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn_reviews' into infra_enc_dec_cross_attn

commit 7e0bc572541e6018a7cfcebd16ea08b26826b975
Merge: 13f5b5078 717f4bcea
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 09:35:30 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 07df0e158a60b7d2a90407eecc868eaa10a58180
Author: afeldman-nm <[email protected]>
Date:   Mon Jul 8 09:33:03 2024 -0400

    Update vllm/attention/layer.py

    Co-authored-by: Woosuk Kwon <[email protected]>

commit 5dbebbc6f3aafe706a5555119fefa519b71c4634
Author: afeldman-nm <[email protected]>
Date:   Mon Jul 8 09:32:43 2024 -0400

    Update vllm/attention/backends/torch_sdpa.py

    nit: This will reduce the number of line changes and make the code look better.

    Co-authored-by: Woosuk Kwon <[email protected]>

commit 13f5b5078cdd81f58ed88a653ecc8ddc0968c073
Merge: d81662c57 abad5746a
Author: Andrew Feldman <[email protected]>
Date:   Fri Jul 5 15:07:21 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 22d013c1de08aa8bc5747c513b12e0c3dd59d144
Merge: ba09fbcd6 d81662c57
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 4 00:24:29 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner2

commit d81662c572948ca9e01db21ec5f14f71c9fd1764
Merge: 2f0eb9b59 3dd507083
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 22:59:32 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 2f0eb9b591f298879df48be6d0a74196cf32a5cf
Merge: 65e47db5a 966fe7214
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 18:58:24 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit ba09fbcd6b7efff359b1a0cef47c385d130b777d
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 11:32:18 2024 -0400

    refactored where a number of constants are stored, primarily constants related to encoder/decoder

commit b085795eefcf31303c5e38bd734544664b5757c5
Merge: 44c62708f 65e47db5a
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 11:22:23 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner2

commit 44c62708f3645f8a82b17a63849c1822a2dca645
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 10:15:57 2024 -0400

    manually merged BART code in from previous modelrunner attempt, it won't work tho until new modelrunner is finished

commit 65e47db5a59087af005e97df20f9d1a5be466a3c
Merge: 2828aa793 7cd2ebb02
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 07:52:12 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 2828aa7936adab0d2ee3b49ffb0cfd01848581ab
Merge: 5ff9c7686 af9ad46fc
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 30 20:16:34 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 5ff9c7686339f8d5f8e42060c1772f43468f2459
Merge: 8d36458fb 7836fdcc1
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 30 18:21:25 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 8d36458fb640e61fd70844739d107f41c0f3e631
Merge: 64981b535 75aa1442d
Author: Andrew Feldman <[email protected]>
Date:   Sat Jun 29 14:15:30 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 64981b535c557ada816b338f83cccf8c11ad0f83
Merge: 83d474e93 2cd402e16
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 28 15:37:00 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 83d474e93559ebbaf51194ef818f2308fd16ef1a
Merge: a5018499e 57f09a419
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 28 10:18:17 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit a5018499e3b8475749a8d1af80e14c8d172cf2c7
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 27 18:57:56 2024 -0400

    reverted unnecessary vllm/utils.py changes

commit c8f8d59d4ce7e1a3c104bd417f256e9b8f954815
Merge: bcccc3486 c3dde367f
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 27 17:34:16 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit bcccc34863f5864307ef9c781471cef4e5d38ba8
Merge: 75756b91e 3fd02bda5
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 27 13:59:00 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 75756b91e3753a9c2a60dbae42b2e46d3612ece5
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 27 11:28:19 2024 -0400

    removed redundant elif

commit c24697fe82c844e13c820db916efef0a6b789374
Merge: 7ca0d7a39 e9d32d077
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 27 11:23:21 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 7ca0d7a399da475099cf501b1f4981a7dffc067a
Merge: 4dabe1974 294104c3f
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 26 19:37:30 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit a5c28fca8f5e21653c6e5874719467e08d3d8503
Merge: ba4e2c12e 4dabe1974
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 15:52:22 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit 4dabe1974766c6db8fd6ce8b6688c25bbd85b633
Merge: e2a46e3b7 dd248f767
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 15:48:31 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit ba4e2c12e6f1a03e3381cabda8902d55df9a292e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 04:05:23 2024 -0400

    Removed unnecessary position arguments from BART routine; formatting

commit 41e31e861b01896a99fba2f2ea44b717164c4398
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 03:59:48 2024 -0400

    BART with new explanatory comments & passing formatting tests

commit e61385d90e40b423e1e5d98839413a76ffcd11fb
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 03:49:18 2024 -0400

    fixed bug caused by overzealous refactoring

commit 4400d7733f7dca2acffac916a00f5edc6a89e14e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 03:36:28 2024 -0400

    some reformatting

commit 5169a2a6518d5ae338001eae0eae6dad64bf52eb
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 03:25:40 2024 -0400

    removed unnecessary positions arguments from BART encoder, decoder forward()

commit d43141f20514e77963e1c13ba857b1d3cb71c210
Merge: 753bab068 e2a46e3b7
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 03:16:19 2024 -0400

    merge; a lot of formatting fixes to bart code but not fully passing

commit e2a46e3b7b9f9d1a9cc751046c3cddd1522620ed
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:53:35 2024 -0400

    formatting

commit 1a6e5a31846e2ef886b66e9cc9216ffe983d0ec0
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:52:04 2024 -0400

    moved make_tensor_with_pad() helper function back to vllm.utils

commit d23c28466765496049a1696d0a053a0a2505ce9a
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:38:08 2024 -0400

    typing and formatting; fixed escape sequences in comments

commit 2f0b05bb805513e73eb0609ea87b6367ec9d4803
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:35:34 2024 -0400

    typing and formatting

commit 47c9f396fdcd40895597423ebfefe585b014c2f3
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:32:52 2024 -0400

    removed attention_type

commit 06c7f7500140c574d20a12079dbd1ef83db29688
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:28:42 2024 -0400

    reorganized helper functions that were only being used for testing into tests/kernels/utils.py from vllm/utils.py

commit a178b7a8c9838665ee7e169471206b70d62e1b71
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:20:00 2024 -0400

    changed nested if/else to elif/else in xformers mask computation code

commit 597526a49e041ec99329add79ef272ce6e457b9e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:18:02 2024 -0400

    removed extra line

commit 125e5dc46724155f5d81e93a7644a3889e864a2f
Merge: 5ce2dd083 e9de9dd55
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:16:21 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 753bab06880a05726b2b8274a20d8f9d179c9576
Merge: 919bf88f8 e9de9dd55
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:14:20 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 919bf88f8925b2e60c765f309df655318c392c2e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:13:52 2024 -0400

    BART e2e test runs but does not pass

commit b7ff75fc3d3cb5d447503daa8a4a78aa6bf1a18d
Merge: 2d8429e1b ba991d5c8
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 19:25:24 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 2d8429e1b0002eccb7deaa805d25ebb6d5616187
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 18:47:19 2024 -0400

    fixed a number of bugs related to BART decode-phase; added support for the particular architecture alias used by bart-large-cnn

commit 8f9ee625557ec34ec29787b6b66ec760ff390e77
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 18:06:10 2024 -0400

    wip bart-cnn summarization example

commit d58e8c8464d5bcf41121a582b035f5f290658657
Merge: 6fd4c020a 1744cc99b
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 15:50:28 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 6fd4c020a9c5ee8ecbf6e086d8b9dfefb3f8396f
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 15:42:09 2024 -0400

    fixed prompt processing bug that was preventing inference from starting

commit 7d2fcf90a6516be432ffd39f4571ed0a524438b2
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 15:39:07 2024 -0400

    BART passes profile run

commit 3b95225850af9b81a15142344c4c8bae7257a519
Merge: 8b8c40943 b8d5637c5
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 13:19:42 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_bart' into infra_enc_dec_model_runner_reviews

commit 8b8c40943e2e0a4b104ca65c76441d3db03a017d
Merge: 42c364439 5ce2dd083
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 13:04:54 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit 5ce2dd08345da9e5a19a913214e5a73ed4923c8d
Merge: ce88fa36e c24621295
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 12:55:03 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit b8d5637c510b42a6503d9b0c4d810fe3568314dd
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 12:50:25 2024 -0400

    wip bart

commit 59caabecf2666c33306625843908b1d9dc2ffa8b
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 21:42:39 2024 -0400

    BART almost passing profile_run()

commit f2dac1ce0ae1033b5143b8f1cd234e1eee5e67ee
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 20:13:05 2024 -0400

    wip

commit 082be510533d1e39008db19ca8754a91aa4879d3
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 19:36:46 2024 -0400

    loading tied weights

commit 42c36443981dd89c9defaf2f51c1481ddb0a5751
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 16:24:26 2024 -0400

    encoder decoder model runner fails for unsupported scenarios

commit 9ad5143ab290419d27fcde1287d9bea853a58be3
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 16:00:15 2024 -0400

    refactored backend constants

commit 001cb185141278b6ea3a2fbbf6200032104229e0
Merge: 6219d9590 ce88fa36e
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:40:19 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit ce88fa36e6cdbe0352348207a6a4dc405fcd9d76
Merge: ca68c63db f1e72cc19
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:39:06 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 6219d9590dfae14c574d598ce879af58fe97177f
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:36:36 2024 -0400

    Formatting

commit 576c26c86a9b210fcca29180ed20fd15770f2660
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:35:11 2024 -0400

    first pass at BART load_weights; probably not handling qkv correctly

commit c11db0fd30e326d2273da95439c5087e83725b04
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:21:15 2024 -0400

    integrating BART weight loading code

commit 2123517ef5fc8a5593e693b7d28d8c217c729282
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:13:36 2024 -0400

    formatting

commit 97cad4b875ee09ebeff455a20fdf351eef9d2f16
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 14:40:40 2024 -0400

    wip BART model cleanup

commit 45a53877dc815398f1f190fa7e7d513db7928b6f
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 14:28:59 2024 -0400

    pruning out training functionality & unnecessary code from BART

commit 30becae9d35d4b994bcd995c81603a97b93d0e3d
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 13:45:48 2024 -0400

    profiling fix; wip bart

commit d2ad2328e41ad7a8898ddbb37db8c1bfaf2ae803
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 13:37:27 2024 -0400

    wip bart integration

commit ed610b0b9a6abcdaf874d16225a441509a207076
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 12:09:51 2024 -0400

    pulled in bart model code

commit 28f0d2fff6752a90227aa8aa07ca32e43bee395d
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 12:06:56 2024 -0400

    pulled in bart code

commit 213dc597274da4c963510b1d72166d0a8eddbc7b
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 12:03:50 2024 -0400

    test_bart.py

commit 49c7162d70441963ec6c26430a8e36426fbfe1aa
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 12:01:59 2024 -0400

    formatting

commit 84c0dcc5fe2b653cb0517df523504a107055061a
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 11:58:45 2024 -0400

    scheduler tests

commit c15731710bd5c317638fef4d861567031d6126b8
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 11:30:25 2024 -0400

    free sequence groups

commit 614de4e13869f1b2938d1f30369bbb98752a20c6
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 10:54:25 2024 -0400

    formatting

commit b6d4383e141e1fc23ee0c8c6bb9a7d172949266a
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 10:46:15 2024 -0400

    enc/dec integrated in Scheduler.schedule()

commit 89b0e445bb32bbd5758bdcc05cd1bb869101029e
Merge: beec4f571 ca68c63db
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 10:27:42 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit ca68c63db6ef8b9fcd132e84ffc6db1b7c7f618f
Merge: e9d7ede3b bd620b01f
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 10:26:54 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit beec4f5717d5c8193d70449c066f2aa469bf50b0
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 10:24:50 2024 -0400

    enc/dec support in LLMEngine._add_processed_request()

commit a1ab7a110c334f54dc451f1b273c3b0f0345332e
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 09:50:37 2024 -0400

    removing BART test

commit 7000573396666a58cf5ca06d626f2b4c2e4f8bb2
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 09:49:37 2024 -0400

    temporarily removing BART work

commit 1bd916c2f91f7b8d755a9142ee3daeb7d5e489cb
Merge: 2b2d2e9df bd620b01f
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 09:38:05 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 2b2d2e9df2b1535883e36b8353a26d52200f7783
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 08:55:19 2024 -0400

    wip encoder/decoder API integration; WIP BART integration; WIP BART example

commit e9ecd25cb733b220785611056295ea9787b1ce47
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 05:48:50 2024 -0400

    added unoptimized BART example

commit 2fccd1832a0933dca8537e436449dad4d52fa0c3
Merge: de967174d 0f645112d
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 02:28:07 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_model_runner_bart

commit 0f645112de4e1784cd43be505e659f3d3bd56581
Merge: 58139e380 e9d7ede3b
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 02:27:25 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit e9d7ede3bfef92527a643809f4beb20cb780e7c0
Merge: 67ed41961 d9a252bc8
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 02:26:01 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit de967174dcbbdb5e81d975edf158416bcbeb74cd
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 02:25:36 2024 -0400

    wip bart test

commit 58139e3808060c550264c800e605129d0082af5c
Merge: f8569facd d9a252bc8
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 01:55:08 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit f8569facd10b0cbf05689bfc364831a37bb48b45
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 00:35:24 2024 -0400

    formatting

commit eb5819be6025f0e598831e7e13c0656e184e9524
Merge: a0068fc91 1f5674218
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 00:23:07 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit a0068fc9112c5acefe69f5a8e30470c73a90a039
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 00:21:05 2024 -0400

    Encoder/decoder model runner passes prefill/decode/empty-SG tests

commit f0094bd8a90cc26325f1ea7ca1506fc459a312c9
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 20 10:59:52 2024 -0400

    wip enc/dec modelrunner prepare_prompt test

commit 736cf45223517f5720aedc53b65258ee8a75a25c
Merge: 1581eb7f9 f9f9ae39e
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 19 22:56:31 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_model_runner_bart

commit f9f9ae39eea1dd6367cec3b2e878e1d2f3bef4ad
Merge: a8a52d293 67ed41961
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 19 22:31:41 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit 67ed419619301a39c04417b29c90822a837e6362
Merge: ea37e17ab 3730a1c83
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 19 22:29:04 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 1581eb7f978a83690e0aaa2b390be491b42ffb15
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 19 22:28:28 2024 -0400

    wip

commit fbec309f0cc8d94df6ba7ab3f71f172d30f73531
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 19 01:14:35 2024 -0400

    moved enc/dec error strings to top-level vllm utils

commit a8a52d2935d5a2ab969c05d498ec2423ae19507b
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 23:39:15 2024 -0400

    some formatting fixes

commit 37aeed66141b10b0d43c8e6d56613806dc7108ff
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 23:35:11 2024 -0400

    enc dec model runner testable if only for encoder decoder model

commit e3ba61e368f0085fe64e8dae3d80494f5254164c
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 22:44:23 2024 -0400

    wip

commit 3311aac9bddd474d0a7037b53c53dfc515df0bcc
Merge: f9314fd7d 59a1eb59c
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 22:43:23 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit f9314fd7d1ae0d3146d7456eb41e6885f0055a5d
Merge: 89fdb8116 ea37e17ab
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 22:43:07 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit ea37e17ab5ad7c084c13bf8e8492039d6a9bcdbf
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 19:16:38 2024 -0400

    merge conflict; typing; formatting

commit 91cbaa63d35e72ed0c14b65ed7f79bffdda2da97
Merge: 525303c7c 2bd231a7b
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 19:15:10 2024 -0400

    merge; resolve conflicts

commit 525303c7c61127900680ff06b6cc09610001b71e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 18:06:33 2024 -0400

    num encoder tokens

commit 5f8c7f6cd6776cbda8289a5cee28e5cd8b858f4d
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 11:26:24 2024 -0400

    Moved attention type for attn_metadata to attention forward(); added NotImplement failures to backends in non-decoder-only scenarios

commit c3f7da7620921e14e6c7efabeb0c54fd3d08b30b
Merge: 7b9cb7f43 13db4369d
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 11:01:28 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 7b9cb7f4339364b66180bf5cf7015f8fea67479d
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 11:01:05 2024 -0400

    Replace attn_metadata.attention_type and attn_metadata._attn_type with attn_type argument to forward()

commit d0fd9e10ff13157183fc24dfcb558f83c716ead6
Merge: addde7d22 4ad7b53e5
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 09:58:57 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 26b6271caa9b776b0093b874ab94dc8df0bb36b9
Merge: 3ea38598e db5ec52ad
Author: laishzh <[email protected]>
Date:   Tue Jun 18 17:49:40 2024 +0800

    Merge branch 'vllm-project:main' into main

commit addde7d22cda9ab0d006538ec0f900ac593c9292
Merge: 47586807a 114d7270f
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 00:53:01 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 89fdb811629bfe86ce5aaf85e078ce953e03e700
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 00:52:29 2024 -0400

    first pass at _prepare_encoder_model_input()

commit c7bf81228dc06a1ed2c9d7e7e6f0d61e476e7e7b
Merge: 830a05126 47586807a
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 17 10:37:42 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit 47586807a3e8e75c6e9c27d1d17aeb22b0dff63d
Merge: 90aec385a e2b85cf86
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 17 10:35:45 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 830a051267732f60b04b99a15552ea984b9f43f8
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 17 01:16:25 2024 -0400

    format

commit e5c029926043518e63b85739d369b6cbbb9eddda
Merge: 9cb8ee685 90aec385a
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 16 22:59:32 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit 90aec385a0e77574f5b575257e29b194f6974521
Merge: e229e0018 845a3f26f
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 16 22:50:21 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit e229e0018138698bf13135f067eaf32a8cbf9167
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 16 22:47:04 2024 -0400

    format

commit 4dccd51c91fd3c1ae3a9ecea4baa46cad2a5f7dd
Merge: b3c3411e2 f07d51332
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 16 20:26:41 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit b3c3411e26b7cf6f27604825d99a920c34605c9c
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 14 16:39:35 2024 -0400

    formatting

commit f06c6873d77962c7b27fc7f0c29397381dd0a5be
Merge: 708a4b39a e2afb03c9
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 14 16:38:18 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_encoder_only

commit 708a4b39a73c48e…
maxdebayser committed Oct 11, 2024
1 parent bc59bd3 commit e755251
Showing 18 changed files with 845 additions and 163 deletions.
6 changes: 6 additions & 0 deletions csrc/attention/attention_kernels.cu
@@ -739,6 +739,9 @@ void paged_attention_v1_launcher(
     // NOTE(woosuk): To reduce the compilation time, we only compile for the
     // head sizes that we use in the model. However, we can easily extend this
     // to support any head size which is a multiple of 16.
+    case 32:
+      LAUNCH_PAGED_ATTENTION_V1(32);
+      break;
     case 64:
       LAUNCH_PAGED_ATTENTION_V1(64);
       break;
@@ -903,6 +906,9 @@ void paged_attention_v2_launcher(
     // NOTE(woosuk): To reduce the compilation time, we only compile for the
     // head sizes that we use in the model. However, we can easily extend this
     // to support any head size which is a multiple of 16.
+    case 32:
+      LAUNCH_PAGED_ATTENTION_V2(32);
+      break;
     case 64:
       LAUNCH_PAGED_ATTENTION_V2(64);
       break;
Expand Down
6 changes: 6 additions & 0 deletions csrc/cpu/attention.cpp
@@ -375,6 +375,9 @@ void paged_attention_v1_impl_launcher(
   int* seq_lens_ptr = seq_lens.data_ptr<int>();

   switch (head_size) {
+    case 32:
+      LAUNCH_V1_ATTENTION_KERNEL(T, 32, BLOCK_SIZE);
+      break;
     case 64:
       LAUNCH_V1_ATTENTION_KERNEL(T, 64, BLOCK_SIZE);
       break;
@@ -692,6 +695,9 @@ void paged_attention_v2_impl_launcher(
   int* seq_lens_ptr = seq_lens.data_ptr<int>();

   switch (head_size) {
+    case 32:
+      LAUNCH_V2_ATTENTION_KERNEL(T, 32, BLOCK_SIZE);
+      break;
     case 64:
       LAUNCH_V2_ATTENTION_KERNEL(T, 64, BLOCK_SIZE);
       break;
16 changes: 16 additions & 0 deletions examples/offline_inference_bert_embedding.py
@@ -0,0 +1,16 @@
+from vllm import LLM
+
+# Sample prompts.
+prompts = [
+    "This is an example sentence.",
+    "Another example sentence.",
+]
+
+# Create an LLM.
+model = LLM(model="bert-base-uncased", enforce_eager=True)
+outputs = model.encode(prompts)
+
+# Print the outputs.
+for output in outputs:
+    print(output.outputs.embedding)  # list of 768 floats
+    print(len(output.outputs.embedding))
13 changes: 10 additions & 3 deletions examples/offline_inference_embedding.py
@@ -1,15 +1,22 @@
 from vllm import LLM
+from vllm.inputs import build_decoder_prompts

 # Sample prompts.
-prompts = [
+prompts = build_decoder_prompts([
     "Hello, my name is",
     "The president of the United States is",
     "The capital of France is",
     "The future of AI is",
-]
+])

 # Create an LLM.
-model = LLM(model="intfloat/e5-mistral-7b-instruct", enforce_eager=True)
+model = LLM(
+    model="intfloat/e5-mistral-7b-instruct",
+    enforce_eager=True,
+    # NOTE: sliding_window is not supported by encoder_decoder_model
+    disable_sliding_window=True,
+    gpu_memory_utilization=0.95,
+)
 # Generate embedding. The output is a list of EmbeddingRequestOutputs.
 outputs = model.encode(prompts)
 # Print the outputs.
36 changes: 30 additions & 6 deletions tests/models/embedding/language/test_embedding.py
@@ -6,9 +6,22 @@
 import torch
 import torch.nn.functional as F

+from vllm.inputs import build_decoder_prompts
+
 MODELS = [
-    "intfloat/e5-mistral-7b-instruct",
-    "BAAI/bge-multilingual-gemma2",
+    {
+        "name": "intfloat/e5-mistral-7b-instruct",
+        "is_decoder_only": True
+    },
+    {
+        "name": "BAAI/bge-multilingual-gemma2",
+        "is_decoder_only": True
+    },
+    {
+        "name": "bert-base-uncased",
+        "is_decoder_only": False,
+        "max_model_len": 512
+    },
 ]


@@ -26,7 +39,7 @@ def test_models(
     hf_runner,
     vllm_runner,
     example_prompts,
-    model: str,
+    model: dict,
     dtype: str,
 ) -> None:
     # The example_prompts has ending "\n", for example:
@@ -37,11 +50,22 @@ def test_models(
     # So we need to strip the input texts to avoid test failing.
     example_prompts = [str(s).strip() for s in example_prompts]

-    with hf_runner(model, dtype=dtype, is_embedding_model=True) as hf_model:
+    model_name = model["name"]
+    is_decoder_only = model["is_decoder_only"]
+    max_model_len = model.get("max_model_len", 1024)
+    with hf_runner(model_name, dtype=dtype,
+                   is_embedding_model=True) as hf_model:
         hf_outputs = hf_model.encode(example_prompts)

-    with vllm_runner(model, dtype=dtype) as vllm_model:
-        vllm_outputs = vllm_model.encode(example_prompts)
+    with vllm_runner(
+            model_name,
+            dtype=dtype,
+            disable_sliding_window=True,
+            max_model_len=max_model_len,
+    ) as vllm_model:
+        prompt_inputs = build_decoder_prompts(
+            example_prompts) if is_decoder_only else example_prompts
+        vllm_outputs = vllm_model.encode(prompt_inputs)

     similarities = compare_embeddings(hf_outputs, vllm_outputs)
     all_similarities = torch.stack(similarities)
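Reviewer note: the test above compares Hugging Face and vLLM embeddings through `compare_embeddings`, whose body is not part of this diff. A minimal sketch of that kind of check, assuming cosine similarity and a tolerance of `1e-2` (both the function names and the tolerance here are illustrative assumptions, not taken from the test file), might look like this:

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embeddings_match(reference: List[Sequence[float]],
                     candidate: List[Sequence[float]],
                     tolerance: float = 1e-2) -> bool:
    """True when every pair has cosine similarity within `tolerance` of 1.

    `tolerance` is an assumed value for illustration only.
    """
    sims = [cosine_similarity(r, c) for r, c in zip(reference, candidate)]
    return all(abs(s - 1.0) <= tolerance for s in sims)
```

Cosine similarity ignores vector scale, so two backends that differ only by a normalization step would still pass this kind of comparison.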
2 changes: 1 addition & 1 deletion vllm/attention/ops/paged_attn.py
@@ -34,7 +34,7 @@ class PagedAttention:

     @staticmethod
     def get_supported_head_sizes() -> List[int]:
-        return [64, 80, 96, 112, 120, 128, 192, 256]
+        return [32, 64, 80, 96, 112, 120, 128, 192, 256]

     @staticmethod
     def get_kv_cache_shape(
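Reviewer note: the head size is normally the hidden size divided by the number of attention heads, so `bert-base-uncased` (hidden size 768, 12 heads) uses 64, while a smaller encoder with hidden size 384 and 12 heads would need the newly added 32. The sketch below only illustrates that arithmetic; `head_size_for` is a hypothetical helper, not a vLLM function, and the list is copied from the changed line above.

```python
from typing import List

# Copied from the list returned by get_supported_head_sizes() after this change.
SUPPORTED_HEAD_SIZES: List[int] = [32, 64, 80, 96, 112, 120, 128, 192, 256]


def head_size_for(hidden_size: int, num_attention_heads: int) -> int:
    """Hypothetical helper: derive the per-head width and validate it."""
    if hidden_size % num_attention_heads != 0:
        raise ValueError(
            "hidden_size must be divisible by num_attention_heads")
    head_size = hidden_size // num_attention_heads
    if head_size not in SUPPORTED_HEAD_SIZES:
        raise ValueError(f"head size {head_size} is not supported; "
                         f"supported sizes: {SUPPORTED_HEAD_SIZES}")
    return head_size


print(head_size_for(768, 12))  # 64
print(head_size_for(384, 12))  # 32
```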
7 changes: 7 additions & 0 deletions vllm/config.py
@@ -556,6 +556,13 @@ def is_encoder_decoder_model(self) -> bool:
             (hasattr(self.hf_config, "text_config") and getattr(
                 self.hf_config.text_config, "is_encoder_decoder", False)))

+    @property
+    def is_encoder_model(self) -> bool:
+        is_encoder_decoder = getattr(self.hf_config, "is_encoder_decoder",
+                                     False)
+        is_decoder = getattr(self.hf_config, "is_decoder", False)
+        return is_encoder_decoder is False and is_decoder is False
+
     @property
     def is_embedding_model(self) -> bool:
         """Extract the embedding model flag."""
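Reviewer note: `is_encoder_model` depends only on two config flags, so a config that leaves both unset is also classified as an encoder model. The sketch below copies the property's logic into a free function and runs it on hypothetical stand-in configs built with `SimpleNamespace`; real configs come from the Hugging Face config object.

```python
from types import SimpleNamespace


def is_encoder_model(hf_config) -> bool:
    """Stand-alone copy of the property's logic, for illustration only."""
    is_encoder_decoder = getattr(hf_config, "is_encoder_decoder", False)
    is_decoder = getattr(hf_config, "is_decoder", False)
    return is_encoder_decoder is False and is_decoder is False


# Hypothetical stand-in configs.
encoder_only = SimpleNamespace(is_encoder_decoder=False, is_decoder=False)
encoder_decoder = SimpleNamespace(is_encoder_decoder=True, is_decoder=False)
decoder_flagged = SimpleNamespace(is_encoder_decoder=False, is_decoder=True)
flags_absent = SimpleNamespace()

print(is_encoder_model(encoder_only))     # True
print(is_encoder_model(encoder_decoder))  # False
print(is_encoder_model(decoder_flagged))  # False
print(is_encoder_model(flags_absent))     # True
```

The last case is worth checking during review: any model whose config does not set `is_decoder=True` falls into the encoder path under this definition.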
7 changes: 7 additions & 0 deletions vllm/core/embedding_model_block_manager.py
@@ -62,9 +62,16 @@ def free(self, seq: Sequence) -> None:
         # No operation on free
         return

+    def free_cross(self, seq: Sequence) -> None:
+        # No operation on free
+        return
+
     def get_block_table(self, seq: Sequence) -> List[int]:
         return None  # type: ignore

+    def get_cross_block_table(self, seq: Sequence) -> List[int]:
+        return None  # type: ignore
+
     def get_num_free_gpu_blocks(self) -> int:
         return 1
7 changes: 5 additions & 2 deletions vllm/inputs/__init__.py
@@ -1,7 +1,8 @@
 from .data import (EncoderDecoderLLMInputs, ExplicitEncoderDecoderPrompt,
                    LLMInputs, PromptType, SingletonPrompt, TextPrompt,
-                   TokensPrompt, build_explicit_enc_dec_prompt,
-                   to_enc_dec_tuple_list, zip_enc_dec_prompts)
+                   TokensPrompt, build_decoder_prompt, build_decoder_prompts,
+                   build_explicit_enc_dec_prompt, to_enc_dec_tuple_list,
+                   zip_enc_dec_prompts)
 from .registry import InputContext, InputRegistry

 INPUT_REGISTRY = InputRegistry()
@@ -21,6 +22,8 @@
     "ExplicitEncoderDecoderPrompt",
     "LLMInputs",
     "EncoderDecoderLLMInputs",
+    "build_decoder_prompt",
+    "build_decoder_prompts",
     "build_explicit_enc_dec_prompt",
     "to_enc_dec_tuple_list",
     "zip_enc_dec_prompts",
12 changes: 12 additions & 0 deletions vllm/inputs/data.py
@@ -179,6 +179,18 @@ def to_enc_dec_tuple_list(
             for enc_dec_prompt in enc_dec_prompts]


+def build_decoder_prompt(
+    prompt: _T2, ) -> ExplicitEncoderDecoderPrompt[SingletonPrompt, _T2]:
+    return build_explicit_enc_dec_prompt(encoder_prompt="",
+                                         decoder_prompt=prompt)
+
+
+def build_decoder_prompts(
+    prompts: Iterable[_T2],
+) -> List[ExplicitEncoderDecoderPrompt[SingletonPrompt, _T2]]:
+    return [build_decoder_prompt(prompt) for prompt in prompts]
+
+
 def __getattr__(name: str):
     if name == "PromptInput":
         import warnings
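Reviewer note: the two new helpers wrap each prompt as the decoder half of an explicit encoder/decoder prompt, with an empty string as the encoder half. The sketch below shows the resulting shape without importing vLLM. `build_explicit_enc_dec_prompt` is replaced by a stand-in that returns a plain dict with `encoder_prompt` and `decoder_prompt` keys (an assumption about the real return type), and `prepare_inputs` is a hypothetical name for the branching used in the updated test.

```python
from typing import Any, Dict, Iterable, List, Union


def build_explicit_enc_dec_prompt(encoder_prompt: Any,
                                  decoder_prompt: Any) -> Dict[str, Any]:
    """Stand-in for the vLLM helper; assumed to pair the two prompts."""
    return {"encoder_prompt": encoder_prompt, "decoder_prompt": decoder_prompt}


def build_decoder_prompt(prompt: Any) -> Dict[str, Any]:
    # Same logic as the diff: empty encoder prompt, text goes to the decoder.
    return build_explicit_enc_dec_prompt(encoder_prompt="",
                                         decoder_prompt=prompt)


def build_decoder_prompts(prompts: Iterable[Any]) -> List[Dict[str, Any]]:
    return [build_decoder_prompt(prompt) for prompt in prompts]


def prepare_inputs(prompts: List[str],
                   is_decoder_only: bool) -> Union[List[str], List[Dict[str, Any]]]:
    """Hypothetical wrapper mirroring the branch in test_models."""
    return build_decoder_prompts(prompts) if is_decoder_only else prompts


print(prepare_inputs(["Hello, my name is"], is_decoder_only=True))
print(prepare_inputs(["This is an example sentence."], is_decoder_only=False))
```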
19 changes: 14 additions & 5 deletions vllm/inputs/preprocess.py
@@ -23,6 +23,7 @@
                                 Optional["MultiModalDataDict"]]
 DecoderPromptComponents = Tuple[Optional[str], Optional[List[int]],
                                 Optional["MultiModalDataDict"]]
+_DEFAULT_BOS_TOKEN_ID = 1


 class InputPreprocessor:
@@ -52,7 +53,13 @@ def get_bos_token_id(self,
                            "is not initialized")
             return None

-        return self.tokenizer.get_lora_tokenizer(lora_request).bos_token_id
+        bos_token_id = self.tokenizer.get_lora_tokenizer(
+            lora_request).bos_token_id
+
+        if bos_token_id is None and self.model_config.is_encoder_model:
+            bos_token_id = _DEFAULT_BOS_TOKEN_ID
+
+        return bos_token_id

     def get_eos_token_id(self,
                          lora_request: Optional[LoRARequest] = None
@@ -84,9 +91,10 @@ def get_decoder_start_token_id(self) -> Optional[int]:
         dec_start_token_id = getattr(self.model_config.hf_config,
                                      'decoder_start_token_id', None)
         if dec_start_token_id is None:
-            print_warning_once("Falling back on <BOS> for decoder start token "
-                               "id because decoder start token id is not "
-                               "available.")
+            if not self.model_config.is_encoder_model:
+                logger.warning(
+                    "Falling back on <BOS> for decoder start token id "
+                    "because decoder start token id is not available.")
             dec_start_token_id = self.get_bos_token_id()

         return dec_start_token_id
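Reviewer note: taken together, the two preprocess.py changes above form a fallback chain for the decoder start token: the config's `decoder_start_token_id`, then the tokenizer's BOS id, then (for encoder models only) the hard-coded default of 1. A minimal sketch of that chain as plain functions follows; the function names and flat arguments are illustrative and replace the `self.model_config` and tokenizer lookups of the real methods.

```python
from typing import Optional

_DEFAULT_BOS_TOKEN_ID = 1  # same constant as in the diff


def resolve_bos_token_id(tokenizer_bos_token_id: Optional[int],
                         is_encoder_model: bool) -> Optional[int]:
    """BOS id from the tokenizer, or the default for encoder models."""
    if tokenizer_bos_token_id is None and is_encoder_model:
        return _DEFAULT_BOS_TOKEN_ID
    return tokenizer_bos_token_id


def resolve_decoder_start_token_id(config_decoder_start_token_id: Optional[int],
                                   tokenizer_bos_token_id: Optional[int],
                                   is_encoder_model: bool) -> Optional[int]:
    """Config value if present, otherwise fall back on the BOS id."""
    if config_decoder_start_token_id is not None:
        return config_decoder_start_token_id
    return resolve_bos_token_id(tokenizer_bos_token_id, is_encoder_model)


print(resolve_decoder_start_token_id(None, None, True))   # 1
print(resolve_decoder_start_token_id(None, None, False))  # None
print(resolve_decoder_start_token_id(0, 2, False))        # 0
```

The second call shows the remaining gap: a non-encoder model with neither value set still resolves to `None`.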
@@ -543,4 +551,5 @@ async def preprocess_async(
         )

     def is_encoder_decoder_model(self):
-        return self.model_config.is_encoder_decoder_model
+        return self.model_config.is_encoder_decoder_model \
+            or self.model_config.is_encoder_model
12 changes: 12 additions & 0 deletions vllm/model_executor/layers/pooler.py
@@ -12,6 +12,7 @@ class PoolingType(IntEnum):
     """Enumeration for different types of pooling methods."""
     LAST = 0
     ALL = 1
+    MEAN = 2


 class Pooler(nn.Module):
class Pooler(nn.Module):
Expand Down Expand Up @@ -50,6 +51,17 @@ def forward(
for prompt_len in prompt_lens:
pooled_data.append(hidden_states[offset:offset + prompt_len])
offset += prompt_len
elif self.pooling_type == PoolingType.MEAN:
# Calculate mean pooling
cumsum = torch.cumsum(hidden_states, dim=0)
start_indices = torch.cat([
torch.tensor([0], device=hidden_states.device),
torch.cumsum(prompt_lens[:-1], dim=0)
])
end_indices = torch.cumsum(prompt_lens, dim=0)
pooled_data = (
cumsum[end_indices - 1] - cumsum[start_indices] +
hidden_states[start_indices]) / prompt_lens.unsqueeze(1)
else:
raise ValueError(f"Invalid pooling type: {self.pooling_type}")

Expand Down
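Reviewer note: the MEAN branch computes per-sequence sums from one running sum over the packed token axis. For a sequence spanning rows `start..end-1`, the sum is `cumsum[end - 1] - cumsum[start] + hidden_states[start]`, because the inclusive running sum at `start` already contains the first row. The sketch below reproduces the same indexing in pure Python on nested lists, as an illustration rather than a replacement for the tensor code; `mean_pool` is a hypothetical name and it assumes every prompt length is positive.

```python
from itertools import accumulate
from typing import List

Vector = List[float]


def mean_pool(hidden_states: List[Vector], prompt_lens: List[int]) -> List[Vector]:
    """Mean of each sequence's rows in a packed (total_tokens x dim) list."""
    if not hidden_states:
        return []
    if sum(prompt_lens) != len(hidden_states):
        raise ValueError("prompt_lens must sum to the number of rows")
    dim = len(hidden_states[0])

    # Inclusive running sum over the token axis (the role of torch.cumsum).
    cumsum = [hidden_states[0][:]]
    for row in hidden_states[1:]:
        cumsum.append([c + x for c, x in zip(cumsum[-1], row)])

    end_indices = list(accumulate(prompt_lens))
    start_indices = [0] + end_indices[:-1]

    pooled = []
    for start, end, n in zip(start_indices, end_indices, prompt_lens):
        # Same indexing as the diff: add back the first row of the sequence.
        total = [
            cumsum[end - 1][d] - cumsum[start][d] + hidden_states[start][d]
            for d in range(dim)
        ]
        pooled.append([t / n for t in total])
    return pooled


rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
print(mean_pool(rows, [2, 3]))  # [[2.0, 3.0], [7.0, 8.0]]
```

One design consequence of the running-sum approach is that long batches accumulate large intermediate values before the subtraction, which can cost precision in low-precision dtypes; summing each slice separately avoids that at the price of a Python-level loop.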