Mortred AI Model Server is a toy web server for deep learning models. The server tries its best to make the most of your CPU and GPU resources. All DL models are trained with TensorFlow/PyTorch,
deployed via the MNN toolkit, and finally served as web services through the workflow framework.
Do not hesitate to let me know if you find bugs here, because I'm a c-with-struct noob 🙃
The three major components are illustrated in the architecture picture below.
A quick overview and examples for both serving and model benchmarking are provided below. Detailed documentation and examples will be provided in the docs folder.
You're welcome to ask questions and help me make it better!
All models and detectors can be downloaded from my Hugging Face Page.
Before proceeding further with this document, make sure you have the following prerequisites:
1. Make sure you have CUDA, the GPU and the driver correctly installed. You may refer to this to install them
2. Make sure you have MNN installed. For install instructions you may find some help here. The MNN-2.7.0 release version is recommended.
3. Make sure you have WORKFLOW installed. For install instructions you may find some help here
4. Make sure you have OPENCV installed. For install instructions you may find some help here
5. Make sure your GCC toolkit supports C++17
6. Segment-Anything needs the ONNXRUNTIME and TensorRT libraries. You may refer to this to install onnxruntime>=1.16.0 and this to install TensorRT-8.6.1.6
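The GCC prerequisite can be checked with a small helper. The following is a minimal sketch, not part of the project: the function names are made up for illustration, and it assumes GCC 7 or newer is enough for C++17, a threshold this README does not state.

```python
import re
import subprocess

# Assumption: GCC 7 is treated as the minimum for C++17 language support.
# The README only asks for C++17 and does not name a minimum GCC version.
MIN_GCC_MAJOR = 7


def gcc_major_version(version_line: str) -> int:
    """Extract the major version from the first line of `gcc --version`."""
    # Drop parenthesised distro info such as "(Ubuntu 9.4.0-1ubuntu1~20.04.2)".
    stripped = re.sub(r"\([^)]*\)", "", version_line)
    match = re.search(r"(\d+)\.\d+(?:\.\d+)?", stripped)
    if match is None:
        raise ValueError(f"cannot parse GCC version from: {version_line!r}")
    return int(match.group(1))


def supports_cpp17(version_line: str) -> bool:
    """Return True if the reported GCC version meets the assumed minimum."""
    return gcc_major_version(version_line) >= MIN_GCC_MAJOR


def read_gcc_version_line() -> str:
    """Run `gcc --version` and return its first output line."""
    result = subprocess.run(
        ["gcc", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0]
```

Calling `supports_cpp17(read_gcc_version_line())` on the build machine gives a quick yes/no answer before starting the build.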
Once all prerequisites are in place, you may start to build the Mortred AI server framework.
Step 1: Prepare 3rd-party Libraries
Copy MNN headers and libs
cp -r $MNN_ROOT_DIR/include/MNN ./3rd_party/include
cp $MNN_ROOT_DIR/build/libMNN.so ./3rd_party/libs
cp $MNN_ROOT_DIR/build/source/backend/cuda/libMNN_Cuda_Main.so ./3rd_party/libs
Copy WORKFLOW headers and libs
cp -r $WORKFLOW_ROOT_DIR/_include/workflow ./3rd_party/include
cp -r $WORKFLOW_ROOT_DIR/_lib/libworkflow.so* ./3rd_party/libs
Copy ONNXRUNTIME headers and libs
cp -r $ONNXRUNTIME_ROOT_DIR/include/* ./3rd_party/include/onnxruntime
cp -r $ONNXRUNTIME_ROOT_DIR/_lib/libonnxruntime*.so* ./3rd_party/libs
Copy TensorRT headers and libs
cp -r $TENSORRT_ROOT_DIR/include/* ./3rd_party/include/TensorRT-8.6.1.6
cp -r $TENSORRT_ROOT_DIR/_lib/libnvinfer.so* ./3rd_party/libs
cp -r $TENSORRT_ROOT_DIR/_lib/libnvinfer_builder_resource.so.8.6.1 ./3rd_party/libs
cp -r $TENSORRT_ROOT_DIR/_lib/libnvinfer_plugin.so* ./3rd_party/libs
cp -r $TENSORRT_ROOT_DIR/_lib/libnvonnxparser.so* ./3rd_party/libs
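Before building, it can help to confirm that every copy command above actually put something in place. The sketch below is an illustrative helper, not part of the project; the list of patterns is derived only from the `cp` destinations in this step, and the function name is hypothetical.

```python
from pathlib import Path

# Paths (relative to ./3rd_party) that the copy commands in Step 1 should create.
# Entries with "*" are glob patterns because the exact version suffix may vary.
EXPECTED_PATTERNS = [
    "include/MNN",
    "include/workflow",
    "include/onnxruntime",
    "include/TensorRT-8.6.1.6",
    "libs/libMNN.so",
    "libs/libMNN_Cuda_Main.so",
    "libs/libworkflow.so*",
    "libs/libonnxruntime*.so*",
    "libs/libnvinfer.so*",
    "libs/libnvinfer_builder_resource.so.8.6.1",
    "libs/libnvinfer_plugin.so*",
    "libs/libnvonnxparser.so*",
]


def missing_third_party(third_party_root):
    """Return the expected patterns that match nothing under third_party_root."""
    root = Path(third_party_root)
    return [pattern for pattern in EXPECTED_PATTERNS if not any(root.glob(pattern))]


if __name__ == "__main__":
    missing = missing_third_party("./3rd_party")
    if missing:
        print("missing 3rd-party files:")
        for pattern in missing:
            print("  " + pattern)
    else:
        print("all expected 3rd-party files are present")
```

Run it from the project root; an empty result means every header directory and library named in this step was found.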
Step 2: Build Mortred AI Server ☕☕☕
mkdir build && cd build
cmake ..
make -j10
Step 3: Download Pre-Built Models 🍵🍵🍵
Download the pre-built image models via BaiduNetDisk (extraction code: 1y98). Create a directory named weights in $PROJECT_ROOT_DIR and unzip the downloaded models into it. The weights directory structure should look like
Step 4: Test MobileNetv2 Benchmark Tool
The benchmark and server apps will be built in $PROJECT_ROOT_DIR/_bin and the libs will be built in $PROJECT_ROOT_DIR/_lib. Benchmark the mobilenetv2 classification model:
cd $PROJECT_ROOT_DIR/_bin
./mobilenetv2_benchmark.out ../conf/model/classification/mobilenetv2/mobilenetv2_config.ini
You should see the mobilenetv2 model benchmark profile as follows:
Step 5: Run MobileNetV2 Server Locally
A detailed description of the web server configuration can be found at Web Server Configuration. Now start serving the model:
cd $PROJECT_ROOT_DIR/_bin
./mobilenetv2_classification_server.out ../conf/server/classification/mobilenetv2/mobilenetv2_server_config.ini
The model service will start at http://localhost:8091 with 4 workers waiting to serve. A demo Python client is supplied to test the service:
cd $PROJECT_ROOT_DIR/scripts
export PYTHONPATH=$PWD:$PYTHONPATH
python server/test_server.py --server mobilenetv2 --mode single
The client repeatedly posts demo images, 1000 times in total. The server and the client each print their own output.
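For readers who want a feel for what such a client does, here is a minimal sketch of posting an image to the service over HTTP. It is not the project's `test_server.py`. The base URL comes from this README, but the endpoint path and the JSON field names (`req_id`, `img_data`) are hypothetical placeholders, so check the server's tutorial for the real request format before using it.

```python
import base64
import json
import urllib.request

SERVER_URL = "http://localhost:8091"  # address given in this README
# Hypothetical: the real route is defined by the server, not by this sketch.
ENDPOINT_PATH = "/hypothetical/classification/mobilenetv2"


def build_request(image_bytes, req_id, url=SERVER_URL + ENDPOINT_PATH):
    """Build a JSON POST request carrying one base64-encoded image."""
    payload = {
        "req_id": req_id,  # hypothetical field name
        "img_data": base64.b64encode(image_bytes).decode("ascii"),  # hypothetical field name
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_repeatedly(image_bytes, times=1000):
    """Post the same image `times` times and collect the raw response bodies."""
    responses = []
    for i in range(times):
        request = build_request(image_bytes, req_id=f"demo-{i}")
        with urllib.request.urlopen(request, timeout=10) as response:
            responses.append(response.read().decode("utf-8"))
    return responses
```

`build_request` can be inspected without a running server, while `post_repeatedly` needs the service from this step to be up.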
More server demos can be found in the Tutorials 👇👇👇
The benchmark test environment is as follows:
OS: Ubuntu 20.04.5 LTS / 5.15.0-87-generic
MEMORY: 32G DIMM DDR4 Synchronous 2666 MHz
CPU: Intel(R) Core(TM) i5-10400 CPU @ 2.90GHz
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
GPU: GeForce RTX 3080
CUDA: CUDA Version: 11.5
GPU Driver: Driver Version: 495.29.05
Each model is run for several loops to avoid the influence of GPU warm-up, and only the model's inference time is counted.
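The measurement approach described above (untimed warm-up runs, then timing inference only) can be sketched as follows. This is an illustration of the general technique, not the project's benchmark tool; the function name and the loop counts are assumptions.

```python
import time


def benchmark(run_inference, warmup_loops=10, timed_loops=100):
    """Time `run_inference` after discarding some warm-up iterations.

    `run_inference` is any zero-argument callable that performs one inference.
    The loop counts are illustrative defaults, not values used by the project.
    """
    # Warm-up: results are not timed, so one-off setup costs
    # (GPU kernel loading, memory allocation) do not skew the average.
    for _ in range(warmup_loops):
        run_inference()

    # Timed section: only the inference calls are inside the clock.
    start = time.perf_counter()
    for _ in range(timed_loops):
        run_inference()
    elapsed = time.perf_counter() - start

    avg_ms = elapsed / timed_loops * 1000.0
    fps = timed_loops / elapsed if elapsed > 0 else float("inf")
    return {"avg_ms": avg_ms, "fps": fps}


if __name__ == "__main__":
    # Stand-in workload; replace with a real model call.
    stats = benchmark(lambda: sum(range(10000)))
    print(f"average: {stats['avg_ms']:.3f} ms, throughput: {stats['fps']:.1f} fps")
```

Image loading and preprocessing should stay outside `run_inference` if only inference time is to be counted. With an asynchronous GPU backend, the callable also has to wait for the result before returning, otherwise the clock stops too early.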
- Image Classification Model Server Tutorials
- Image Segmentation Model Server Tutorials
- Image Object Detection Model Server Tutorials
- Image Enhancement Model Server Tutorials
- Image Feature Point Model Server Tutorials
- Add more models to the model zoo
mortred_model_server refers to the following projects: