support qwen-1.8b
wangzhaode committed Dec 5, 2023
1 parent ccc1461 commit 1dee908
Showing 10 changed files with 115 additions and 19 deletions.
79 changes: 79 additions & 0 deletions .github/model-test.yml
@@ -0,0 +1,79 @@
name: model-test
on:
push:
branches:
- master
- 'feature/**'
paths:
- 'src/**'
- '.github/workflows/model-test.yml'
pull_request:
branches: [master]
paths:
- 'src/**'
- '.github/workflows/model-test.yml'

jobs:
llm-build:
name: ${{ matrix.os }}-build
env:
PACAGE_DIR: ${{ matrix.os }}-package
PACAGE_FILE: ${{ matrix.os }}-package.zip
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]

steps:
- uses: actions/checkout@v3
# linux and macos
- name: linux-macos-build-pack
if: matrix.os != 'windows-latest'
run: |
./script/build.sh
./script/package.sh $PACAGE_DIR
zip -r $PACAGE_FILE $PACAGE_DIR
# windows
- name: windows-build-pack
if: matrix.os == 'windows-latest'
run: |
.\script\build.ps1
.\script\package.ps1 windows-package
7z a -r windows-package.zip windows-package
# upload
- name: upload-zip
uses: actions/upload-artifact@v3
with:
path: ./*.zip

model-test:
needs: llm-build
name: ${{ matrix.model }}-${{ matrix.os }}-test
runs-on: ${{ matrix.os }}
env:
PACAGE_DIR: ${{ matrix.os }}-package
PACAGE_FILE: ${{ matrix.os }}-package.zip
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
model: [chatglm-6b, chatglm2-6b, codegeex2-6b, qwen-7b-chat, baichuan2-7b-chat, llama2-7b-chat]

steps:
- uses: actions/download-artifact@v3
with:
name: artifact
path: workspace
- name: linux-macos-test
if: matrix.os != 'windows-latest'
run: |
cd workspace
unzip $PACAGE_FILE
cd $PACAGE_DIR
./script/model_test.sh ${{ matrix.model }}
- name: windows-test
if: matrix.os == 'windows-latest'
run: |
cd workspace
7z x windows-package.zip
cd windows-package
./script/model_test.ps1 ${{ matrix.model }}
2 changes: 1 addition & 1 deletion .github/workflows/linux.yml
@@ -38,7 +38,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
model: [chatglm-6b, chatglm2-6b, chatglm3-6b, codegeex2-6b, qwen-7b-chat, baichuan2-7b-chat, llama2-7b-chat]
model: [qwen-1.8b, chatglm-6b, chatglm2-6b, chatglm3-6b, codegeex2-6b, qwen-7b-chat, baichuan2-7b-chat, llama2-7b-chat]

steps:
- uses: actions/download-artifact@v3
2 changes: 1 addition & 1 deletion .github/workflows/macos.yml
@@ -38,7 +38,7 @@ jobs:
runs-on: macos-latest
strategy:
matrix:
model: [chatglm-6b, chatglm2-6b, chatglm3-6b, codegeex2-6b, qwen-7b-chat, baichuan2-7b-chat, llama2-7b-chat]
model: [qwen-1.8b, chatglm-6b, chatglm2-6b, chatglm3-6b, codegeex2-6b, qwen-7b-chat, baichuan2-7b-chat, llama2-7b-chat]

steps:
- uses: actions/download-artifact@v3
4 changes: 2 additions & 2 deletions .github/workflows/windows.yml
@@ -38,7 +38,7 @@ jobs:
runs-on: windows-latest
strategy:
matrix:
model: [chatglm-6b, chatglm2-6b, chatglm3-6b, codegeex2-6b, qwen-7b-chat, baichuan2-7b-chat, llama2-7b-chat]
model: [qwen-1.8b, chatglm-6b, chatglm2-6b, chatglm3-6b, codegeex2-6b, qwen-7b-chat, baichuan2-7b-chat, llama2-7b-chat]

steps:
- uses: actions/download-artifact@v3
@@ -54,4 +54,4 @@ jobs:
./script/download_model.ps1 ${{ matrix.model }}
cd build
.\Release\cli_demo ..\${{ matrix.model }} prompt.txt
Exit 0
Exit 0
35 changes: 22 additions & 13 deletions README.md
@@ -21,6 +21,10 @@ To export llm models to ONNX, use [llm-export](https://github.com/wangzhaode/llm
| Qwen-7B-Chat | [![Download][download-qwen-7b-chat-onnx]][release-qwen-7b-chat-onnx] | [![Download][download-qwen-7b-chat-mnn]][release-qwen-7b-chat-mnn] |
| Baichuan2-7B-Chat | [![Download][download-baichuan2-7b-chat-onnx]][release-baichuan2-7b-chat-onnx] | [![Download][download-baichuan2-7b-chat-mnn]][release-baichuan2-7b-chat-mnn] |
| Llama-2-7b-chat | [![Download][download-llama2-7b-chat-onnx]][release-llama2-7b-chat-onnx] | [![Download][download-llama2-7b-chat-mnn]][release-llama2-7b-chat-mnn] |
| Qwen-1_8B-Chat | [![Download][download-qwen-1.8b-onnx]][release-qwen-1.8b-onnx] | [![Download][download-qwen-1.8b-mnn]][release-qwen-1.8b-mnn] |

Other versions:
- Qwen-1_8B-Chat-int8: [![Download][download-qwen-1.8b-mnn-int8]][release-qwen-1.8b-mnn-int8]

[download-chatglm-6b-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/chatglm-6b-onnx/total
[download-chatglm2-6b-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/chatglm2-6b-onnx/total
@@ -29,30 +33,38 @@
[download-qwen-7b-chat-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/qwen-7b-chat-onnx/total
[download-baichuan2-7b-chat-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/baichuan2-7b-chat-onnx/total
[download-llama2-7b-chat-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/llama2-7b-chat-onnx/total
[download-qwen-1.8b-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/qwen-1.8b-onnx/total
[release-chatglm-6b-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/chatglm-6b-onnx
[release-chatglm2-6b-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/chatglm2-6b-onnx
[release-chatglm3-6b-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/chatglm3-6b-onnx
[release-codegeex2-6b-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/codegeex2-6b-onnx
[release-qwen-7b-chat-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/qwen-7b-chat-onnx
[release-baichuan2-7b-chat-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/baichuan2-7b-chat-onnx
[release-llama2-7b-chat-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/llama2-7b-chat-onnx
[release-qwen-1.8b-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/qwen-1.8b-onnx
[download-chatglm-6b-mnn]: https://img.shields.io/github/downloads/wangzhaode/mnn-llm/chatglm-6b-mnn/total
[download-chatglm2-6b-mnn]: https://img.shields.io/github/downloads/wangzhaode/mnn-llm/chatglm2-6b-mnn/total
[download-chatglm3-6b-mnn]: https://img.shields.io/github/downloads/wangzhaode/mnn-llm/chatglm3-6b-mnn/total
[download-codegeex2-6b-mnn]: https://img.shields.io/github/downloads/wangzhaode/mnn-llm/codegeex2-6b-mnn/total
[download-qwen-7b-chat-mnn]: https://img.shields.io/github/downloads/wangzhaode/mnn-llm/qwen-7b-chat-mnn/total
[download-baichuan2-7b-chat-mnn]: https://img.shields.io/github/downloads/wangzhaode/mnn-llm/baichuan2-7b-chat-mnn/total
[download-llama2-7b-chat-mnn]: https://img.shields.io/github/downloads/wangzhaode/mnn-llm/llama2-7b-chat-mnn/total
[download-qwen-1.8b-mnn]: https://img.shields.io/github/downloads/wangzhaode/mnn-llm/qwen-1.8b-mnn/total
[download-qwen-1.8b-mnn-int8]: https://img.shields.io/github/downloads/wangzhaode/mnn-llm/qwen-1.8b-mnn-int8/total
[release-chatglm-6b-mnn]: https://github.com/wangzhaode/mnn-llm/releases/tag/chatglm-6b-mnn
[release-chatglm2-6b-mnn]: https://github.com/wangzhaode/mnn-llm/releases/tag/chatglm2-6b-mnn
[release-chatglm3-6b-mnn]: https://github.com/wangzhaode/mnn-llm/releases/tag/chatglm3-6b-mnn
[release-codegeex2-6b-mnn]: https://github.com/wangzhaode/mnn-llm/releases/tag/codegeex2-6b-mnn
[release-qwen-7b-chat-mnn]: https://github.com/wangzhaode/mnn-llm/releases/tag/qwen-7b-chat-mnn
[release-baichuan2-7b-chat-mnn]: https://github.com/wangzhaode/mnn-llm/releases/tag/baichuan2-7b-chat-mnn
[release-llama2-7b-chat-mnn]: https://github.com/wangzhaode/mnn-llm/releases/tag/llama2-7b-chat-mnn
[release-qwen-1.8b-mnn]: https://github.com/wangzhaode/mnn-llm/releases/tag/qwen-1.8b-mnn
[release-qwen-1.8b-mnn-int8]: https://github.com/wangzhaode/mnn-llm/releases/tag/qwen-1.8b-mnn-int8

### Speed

#### CPU speed (4 threads): `prefill / decode` `tok/s`

| model | android(f16/32)| macos (f32) | linux (f32) | windows (f32) |
|:-----------------:|:--------------:|:-------------:|:--------------:|:--------------:|
| qwen-1.8b-int4 | 100.21 / 22.22 | 84.85 / 19.93 | 151.00 / 35.89 | 117.30 / 33.40 |
@@ -64,19 +76,16 @@
| baichuan2-7b-int4 | 13.87 / 6.08 | 17.21 / 6.10 | 30.11 / 10.87 | 26.31 / 9.84 |
| llama-2-7b-int4 | 17.98 / 5.17 | 19.72 / 5.06 | 34.47 / 9.29 | 28.66 / 8.90 |

- android
  - test device: XiaoMi12
  - processor: Snapdragon 8gen1
  - memory: 8 GB
- macos
  - test device: MacBook Pro 2019
  - processor: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  - memory: 16 GB
- linux(wsl)/windows
  - test device: PC
  - processor: Intel(R) Core(TM) i7-13700K @ 3.40 GHz
  - memory: 32 GB
- CPU 4-thread speed: prefill / decode `tok/s`
The systems and devices used for testing are as follows:

| os | device | CPU | Memory |
|:--:|:-------:|:----:|:--------:|
| android | XiaoMi12 | Snapdragon 8gen1 | 8 GB |
| macos | MacBook Pro 2019 | Intel(R) Core(TM) i7-9750H CPU | 16 GB |
| linux | PC | Intel(R) Core(TM) i7-13700K | 32 GB |
| windows | PC | Intel(R) Core(TM) i7-13700K | 32 GB |




### Download int4 models
3 changes: 3 additions & 0 deletions script/download_model.ps1
@@ -7,6 +7,9 @@ $block_num = 28
if ($model.Contains('7b')) {
$block_num = 32
}
if ($model.Contains('1.8b')) {
$block_num = 24
}
Invoke-WebRequest -Uri https://github.com/wangzhaode/mnn-llm/releases/download/$model-mnn/tokenizer.txt -OutFile tokenizer.txt
Invoke-WebRequest -Uri https://github.com/wangzhaode/mnn-llm/releases/download/$model-mnn/embedding.mnn -OutFile embedding.mnn
Invoke-WebRequest -Uri https://github.com/wangzhaode/mnn-llm/releases/download/$model-mnn/lm.mnn -OutFile lm.mnn
4 changes: 4 additions & 0 deletions script/download_model.sh
@@ -7,10 +7,14 @@ model=$1
mkdir $model
cd $model
is_7b=`echo $model | grep '7b'`
is_1_8b=`echo $model | grep '1.8b'`
block_num=27
if [ $is_7b ]; then
block_num=31
fi
if [ $is_1_8b ]; then
block_num=24
fi
# download models
wget -c -nv https://github.com/wangzhaode/mnn-llm/releases/download/$model-mnn/tokenizer.txt
wget -c -nv https://github.com/wangzhaode/mnn-llm/releases/download/$model-mnn/embedding.mnn
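The hunk above selects a per-model block count by substring matching on the model name. One subtlety: in `grep`, the pattern `'1.8b'` treats `.` as "any character", so `'1\.8b'` is the literal-dot form. A small stand-alone sketch of the mapping (the `block_num_for` helper name is hypothetical; the values mirror the script above):

```shell
#!/bin/sh
# Sketch of download_model.sh's model-name -> block-count selection.
# block_num_for is a hypothetical helper, not part of the repo's scripts.
block_num_for() {
    model="$1"
    block_num=27                              # default (6b models)
    if echo "$model" | grep -q '7b'; then
        block_num=31
    fi
    if echo "$model" | grep -q '1\.8b'; then  # '\.' matches a literal dot
        block_num=24
    fi
    echo "$block_num"
}

block_num_for qwen-1.8b      # prints 24
block_num_for qwen-7b-chat   # prints 31
block_num_for chatglm2-6b    # prints 27
```

The order matters only if a name ever matched both patterns; for the current model list the two checks are mutually exclusive.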
2 changes: 1 addition & 1 deletion script/model_test.ps1
@@ -6,7 +6,7 @@ function model_test($model) {
Write-Output "test model : ${model}"
powershell .\script\download_model.ps1 ${model}
cd build
.\Release\cli_demo -m ..\${model}
.\Release\cli_demo ..\${model} prompt.txt
cd ..
}

1 change: 1 addition & 0 deletions script/model_test.sh
@@ -11,6 +11,7 @@ model_test() {
}

test_all() {
model_test qwen-1.8b
model_test chatglm-6b
model_test chatglm2-6b
model_test chatglm3-6b
2 changes: 1 addition & 1 deletion src/llm.cpp
@@ -143,7 +143,7 @@ void Llm::load(const std::string& model_dir) {
ScheduleConfig config;
BackendConfig cpuBackendConfig;
config.type = MNN_FORWARD_CPU;
config.type = MNN_FORWARD_OPENCL;
// config.type = MNN_FORWARD_OPENCL;
config.numThread = 4;
cpuBackendConfig.precision = BackendConfig::Precision_Low;
cpuBackendConfig.memory = BackendConfig::Memory_Low;
