TensorRT is a deep learning inference platform. It can be integrated with TensorFlow to accelerate inference, i.e. to reduce inference time. There are 3 different integration workflows:
- TF-TRT Workflow with A SavedModel
- TF-TRT Workflow with A Frozen Graph
- TF-TRT Workflow with MetaGraph and Checkpoint Files
- Workflow: https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/
- Guide: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html
TensorFlow (TF): a Python library used in production for deep learning models.
- TensorFlow Tensor: a tensor represents data of any type and dimension. The "flow" of tensors refers to their movement through the computational graph.
- TensorFlow Graph: the graph is a map of the tensors. It consists of nodes and edges; each node is called an "operation". There are input nodes, middle nodes (nodes between the input and output nodes), and output nodes. Each node can have input and/or output data (tensors). Input data can be "variables" or "constants".
- TensorFlow Session: a session is where the graph is executed. Technically, a session is placed on hardware such as CPUs or GPUs and provides functions for execution.
Keras: a high-level neural network Python library built on top of TensorFlow. It is more user-friendly and easier to use, but offers fewer advanced operations than TensorFlow.
TensorBoard: a suite of web applications used to inspect and understand TensorFlow runs and graphs. To use TensorBoard, run the command
tensorboard --logdir=/path/to/logs/file/
For example, the image shows the output of the TensorBoard command. Each orange line is a graph found in the log directory. The TensorBoard web application is served at localhost:6006/, or at the link printed in the first line of the command's output. You can find the input or output node names by inspecting the details of a selected node at the top right of the TensorBoard web application. The input nodes have no input tensors, as they are the inputs themselves; likewise, the output nodes have no output tensors, as they are the outputs themselves.
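TensorBoard can only display a graph after it has been written to the log directory. A minimal TF 1.x sketch, assuming a hypothetical log path, of writing the current graph so TensorBoard can show it:

```python
# Minimal sketch (TF 1.x): write the current default graph to a log directory,
# then run `tensorboard --logdir=...` on that directory to inspect it.
import tensorflow as tf

with tf.Session() as sess:
    writer = tf.summary.FileWriter("/path/to/logs/file/", sess.graph)  # hypothetical log path
    writer.close()
```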
This project is divided into 2 parts.
The objective of this part is to optimize the Keras model with TensorRT and compare time, accuracy, and other metrics between the original Keras model and the optimized model. The selected integration workflow is the workflow with a frozen graph. It can be summarized in 3 steps.
In this case, I converted the trained Keras Magma model. TensorFlow provides a function named tf.graph_util.convert_variables_to_constants, which is used to freeze the model. This function can freeze both Keras and TensorFlow models and returns a frozen graph. The image shows a helper function used to freeze the model. (Ref)
This function takes 3 required arguments (plus 2 optional ones):
- session: Active TensorFlow session
- input_graph_def: a TensorFlow GraphDef, e.g. loaded from a .pb file
- output_node_names: a list of all output node names in the graph.
- variable_names_whitelist: (optional) the set of variable names to convert. By default (None), all variables are converted.
- variable_names_blacklist: (optional) the set of variable names to omit from conversion to constants. The default is None.
That means both Keras and TensorFlow models can be frozen in the same way. You can use the gfile library to save the frozen model as a .pb file, which allows you to load the model again any number of times. So now you have a frozen Keras model.
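A minimal sketch of this freezing step, assuming TF 1.x with the Keras API; the model and file names are placeholders, not the actual notebook code:

```python
# Minimal sketch (TF 1.x): freeze a trained Keras model into a single .pb file.
# "magma_model.h5" and "frozen_magma_model.pb" are hypothetical file names.
import tensorflow as tf

K = tf.keras.backend
K.set_learning_phase(0)                                   # switch Keras to inference mode
model = tf.keras.models.load_model("magma_model.h5")      # hypothetical trained Keras model

sess = K.get_session()
output_names = [out.op.name for out in model.outputs]     # output node names of the graph

# Replace every variable in the graph with a constant holding its current value
frozen_graph = tf.graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(), output_names)

# Save the frozen graph as a .pb file with gfile so it can be reloaded later
with tf.gfile.GFile("frozen_magma_model.pb", "wb") as f:
    f.write(frozen_graph.SerializeToString())
```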
This step uses the create_inference_graph function to optimize the frozen model with TensorRT. The function returns a TensorRT-optimized graph. Finally, save the TensorRT model as a .pb file.
The arguments of the create_inference_graph function are listed below (a usage sketch follows the list):
- input_graph_def: the frozen graph returned by the convert_variables_to_constants function.
- outputs: a list of all output node names in the graph. The easiest way to find them is to use TensorBoard.
- max_batch_size: the maximum size of the input batch, i.e. how many images can be inferred at the same time. The default value is 1.
- max_work_space: the maximum GPU temporary memory the TensorRT engine can use for execution. The default value is 1 GB, i.e. 1*(10**9) bytes.
- precision_mode: the data type in which the optimized graph stores its parameters. The available modes are "FP32" (float32), "FP16", and "INT8". The default value is "FP32".
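A minimal sketch of this optimization step, assuming the TF 1.x tf.contrib.tensorrt module (where the workspace argument is spelled max_workspace_size_bytes); frozen_graph, output_names, and the output file name reuse the freezing sketch above:

```python
# Minimal sketch (TF 1.x, tf.contrib.tensorrt): optimize a frozen graph with TensorRT.
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,        # frozen graph from convert_variables_to_constants
    outputs=output_names,                # output node names (findable with TensorBoard)
    max_batch_size=1,                    # how many images can be inferred at once
    max_workspace_size_bytes=1 << 30,    # ~1 GB of temporary GPU memory for the engine
    precision_mode="FP32")               # "FP32", "FP16", or "INT8"

# Save the TensorRT-optimized graph as a .pb file
with tf.gfile.GFile("trt_magma_model.pb", "wb") as f:    # hypothetical output path
    f.write(trt_graph.SerializeToString())
```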
Steps 1 and 2 are in 1-main-Converting.ipynb.
Inference is the stage in which a trained model is used to infer/predict on the test samples. It is similar to the forward pass during training. 2-InferenceTRT.ipynb is the inference notebook. The steps (sketched in code after this list) are
- Load the frozen model (.pb file) that has already been optimized with TensorRT
- Import the loaded model using import_graph_def
- Get input and output tensors
- Write logs for TensorBoard (optional)
- Run inference using the run function of the TensorFlow Session.
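A minimal sketch of these steps, assuming TF 1.x; the tensor names, file paths, and input shape are placeholders, so use the node names found with TensorBoard for the real model:

```python
# Minimal sketch (TF 1.x): run inference with a TensorRT-optimized frozen graph.
import time
import numpy as np
import tensorflow as tf

# 1) Load the frozen model (.pb) that was optimized with TensorRT
with tf.gfile.GFile("trt_magma_model.pb", "rb") as f:      # hypothetical path
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    # 2) Import the loaded model
    tf.import_graph_def(graph_def, name="")
    # 3) Get the input and output tensors (placeholder tensor names)
    input_tensor = graph.get_tensor_by_name("input_1:0")
    output_tensor = graph.get_tensor_by_name("dense_1/Softmax:0")

with tf.Session(graph=graph) as sess:
    # 4) (Optional) write logs so the graph can be viewed in TensorBoard
    tf.summary.FileWriter("./trt_logs", sess.graph).close()

    # 5) Inference with Session.run, repeated 50 times to average the time
    batch = np.random.rand(1, 224, 224, 3).astype(np.float32)   # dummy input batch
    start = time.time()
    for _ in range(50):
        preds = sess.run(output_tensor, feed_dict={input_tensor: batch})
    print("average inference time:", (time.time() - start) / 50)
```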
In this case, I run inference 50 times to find the average inference time. You can repeat all these steps with the original frozen model (the frozen model that was not optimized with TensorRT) for comparison.
- Time: inference with the optimized model takes less time than with the original model. But if the number of test images is low, e.g. 30 images, the unoptimized model can sometimes be as fast as or faster than the optimized one.
- Prediction result: both models produce the same predictions, so their accuracy is the same.
- Other metrics such as the confusion matrix, recall score, precision score, f1-score, ROC curve, and AUC (area under the curve): both models give the same values.
- Workflow: https://www.youtube.com/watch?v=AIGOSz2tFP8&list=PLkRkKTC6HZMwdtzv3PYJanRtR6ilSCZ4f
- Converting Code Guide: https://medium.com/@pipidog/how-to-convert-your-keras-models-to-tensorflow-e471400b886a
- Inference Code: https://github.com/ardianumam/Tensorflow-TensorRT.git
- Metrics: https://towardsdatascience.com/understanding-data-science-classification-metrics-in-scikit-learn-in-python-3bc336865019
The objectives of this part are to optimize the object detection models with TensorRT and compare the original models with the optimized ones. The working file is InferenceWithTensorRT.ipynb. The workflow is almost the same as for the Keras model. The main differences are
- The models for object detection are more diverse and complex.
- The dataset comes from a car camera.
- The inference function can receive only 1 image at a time, so there must be a loop that feeds the images to the inference function one by one (see the sketch after this list).
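A minimal sketch of that loop, where run_inference_for_single_image and TEST_IMAGE_LIST are hypothetical names (the image list is described further below):

```python
# Minimal sketch: feed the test images to the detector one at a time and collect the outputs.
import numpy as np
from PIL import Image

results = []
for image_info in TEST_IMAGE_LIST:                        # hypothetical list of image dicts
    image = np.array(Image.open(image_info["path"]))      # load a single image
    output = run_inference_for_single_image(image)        # hypothetical single-image detector
    results.append({"filename": image_info["filename"], "output": output})
```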
object_detection/
    data/           -> contains the label maps of the models
    logs/           -> logs created while loading the original model, used by TensorBoard
    trt_logs/       -> logs created while loading the optimized model, used by TensorBoard
    utils/          -> Python helper code such as visualization
    test_images2/   -> contains 2 images for detection testing
    models/
        ssd_mobilenet_v1_coco_2017_11_17/
            - frozen_inference_graph.pb
        faster_rcnn_resnet101_kitti_2018_01_28/
            - frozen_inference_graph.pb
        aadc2018_frcnn_res101_200k_kitti/   -> given model, dataset, and labels
            test_images_20181027/           -> contains test images taken from the car camera
                - final_config.json         -> lists all the objects that should be detected in each image
            detected_images/                -> contains the images after detection
                - images_detail.json        -> details of the inferenced images, including the detected objects (annotations from final_config.json), filename, path, inference time, and number of detections
            - aadc2018_frcnn_res101_200k_kitti.pb
            - aadc_labels_2018.pbtxt
            - aadc_labels_2018_without_middlelane.pbtxt
            - aadc_labels_2018_slim.pbtxt
InferenceWithTensorRT.ipynb -> main notebook file
There are 3 models considered.
- ssd_mobilenet_v1_coco_2017_11_17 is a Single-Shot multi-box Detection (SSD) network intended to perform object detection. This model is the default model in the tutorial, so it is the starting model. You can download the model here.
- faster_rcnn_resnet101_kitti_2018_01_28 is a network for object detection. It has use cases in self-driving cars, manufacturing, and security. You can download the model here.
- aadc2018_frcnn_res101_200k_kitti is a given model (from the professor). It is similar to the faster_rcnn_kitti model but better suited to the dataset.
- ssd_mobilenet_v1_coco_2017_11_17 can run inference on both test_images2 and test_images_20181027 because it uses less memory for inference. However, this network is not suitable for detecting objects in images from a car camera, so its accuracy is very low.
- faster_rcnn_resnet101_kitti_2018_01_28 cannot run inference because the Jetson TX2 board has only 8 GB of memory and inference requires a lot of it, so the inference process was automatically killed.
- aadc2018_frcnn_res101_200k_kitti was saved with a newer version of TensorFlow, so it cannot be loaded. (The TensorFlow version on the Jetson TX2 board is 1.9.0, which is an old version.)
- Because ssd_mobilenet_v1_coco_2017_11_17 is the only model that can run inference, it is the only model that was optimized with TensorRT. However, the optimized model cannot run inference either: the inference process was automatically killed because it took too much memory. Even after decreasing max_batch_size to 1 and increasing max_work_space to 5 GB, the inference was still killed. Note that a large max_batch_size increases memory consumption, and when max_work_space is exceeded, the process is automatically killed.
Precision mode is one of the arguments that has to be set when optimizing with TensorRT. There are 3 available values: "FP32", "FP16", and "INT8".
Here are the results of optimizing the ssd_mobilenet_v1_coco_2017_11_17 model with the 3 different precision modes.
- FP32: optimization takes around 9 minutes. The optimized model loads normally.
- FP16: optimization takes around 12 minutes. The optimized model loads normally.
- INT8: optimization takes around 16 seconds(!) but the optimized model cannot be loaded.
In conclusion, precision mode INT8 takes the least time and FP16 takes the most, but the INT8-optimized model cannot be loaded. This might be because the model needs to be quantized before optimizing.
Note that the time taken for optimization depends on the free memory space: the more free memory, the less time optimization takes.
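A minimal sketch of this comparison, reusing the earlier optimization sketch and timing the conversion for each precision mode; frozen_graph and output_names are assumed to hold the SSD model's frozen graph and its output node names:

```python
# Minimal sketch: run the TensorRT optimization once per precision mode and time it.
import time
import tensorflow.contrib.tensorrt as trt

for mode in ["FP32", "FP16", "INT8"]:
    start = time.time()
    trt_graph = trt.create_inference_graph(
        input_graph_def=frozen_graph,
        outputs=output_names,
        max_batch_size=1,
        max_workspace_size_bytes=1 << 30,
        precision_mode=mode)
    print(mode, "optimization took %.1f seconds" % (time.time() - start))
```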
Code for checking the inference results has already been added. The ground truth (real values) is in final_config.json.
- When preparing the image list for inference, collect not only each image's filename and path but also its annotations (the objects that should be detected in that image) from final_config.json. The image list will look like this (a sketch for building it follows the example):
TEST_IMAGE_LIST = [{
    'annotations': [list of objects],
    'filename': name string,
    'path': image path
},
{
    ...
}]
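A minimal sketch of building that list, assuming final_config.json maps each filename to its list of annotated objects (the real layout of the file and the directory path may differ):

```python
# Minimal sketch: build TEST_IMAGE_LIST from final_config.json.
# The JSON layout ({filename: [objects], ...}) and the directory path are assumptions.
import os
import json

IMAGE_DIR = "object_detection/models/aadc2018_frcnn_res101_200k_kitti/test_images_20181027"

with open(os.path.join(IMAGE_DIR, "final_config.json")) as f:
    final_config = json.load(f)

TEST_IMAGE_LIST = [{
    "annotations": objects,
    "filename": filename,
    "path": os.path.join(IMAGE_DIR, filename),
} for filename, objects in final_config.items()]
```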
- The visualize_boxes_and_labels_on_image_array function from utils is used to draw a box around every detected object in each image. It returns an image with boxes, so I edited it to also return the box number, which is the number of detected objects. Now it is easy to check the correctness of the detection by comparing the box number with the length of the annotations list.
- Add the box number and the inference time to each image in the list and save it as images_detail.json. For each image, images_detail.json contains the annotations, filename, path, inference time, and number of detections (see the sketch below).
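A minimal sketch of this checking step; the detector call, the category_index, and the extra box-number return value of the edited visualization function are assumptions based on the description above, so the real notebook code may differ:

```python
# Minimal sketch: detect objects in each image, compare the number of drawn boxes with the
# number of expected annotations, and save the details to images_detail.json.
import json
import time
import numpy as np
from PIL import Image

details = []
for image_info in TEST_IMAGE_LIST:
    image = np.array(Image.open(image_info["path"]))
    start = time.time()
    output = run_inference_for_single_image(image)        # hypothetical single-image detector
    elapsed = time.time() - start

    # Edited utils function: assumed to return the image plus the number of drawn boxes
    _, box_number = visualize_boxes_and_labels_on_image_array(
        image,
        output["detection_boxes"],
        output["detection_classes"],
        output["detection_scores"],
        category_index,
        use_normalized_coordinates=True)

    details.append({
        "annotations": image_info["annotations"],
        "filename": image_info["filename"],
        "path": image_info["path"],
        "inference_time": elapsed,
        "detected_numbers": box_number,
        "correct": box_number == len(image_info["annotations"]),
    })

with open("detected_images/images_detail.json", "w") as f:
    json.dump(details, f, indent=2)
```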
- https://github.com/tensorflow/models/tree/master/research/object_detection
- https://github.com/tensorflow/models/tree/master/research/object_detection/samples/configs (configuration files for each model)
Date: Friday 28th June 2019 All code for inferencing the Keras model with TensorRT is done, but there are 2 bugs:
- Cannot optimize the frozen model to TensorRT graph
- Cannot run inference
Date: Wednesday 3rd July 2019 Separated the work into 2 files: converting Keras to a frozen graph and then optimizing it to .pb files, and inferencing. Found the cause of the inference error: it is a problem related to the topK parameter in the DetectionOutput layer. 5000 is so large that TensorRT crashed during runtime; after reducing it to below 2500 it runs fine. Reference: https://devtalk.nvidia.com/default/topic/1037616/tensorrt/problems-with-nvidia-ssddetectionoutputplugin/
Date: Thursday 4th July 2019 The bug in creating the TensorRT inferencer is fixed. The solution is to prevent creating duplicate layers in the frozen graph.
Date: Friday 5th July 2019 Finished calculating and comparing the original model and the model optimized with TensorRT. Available measures are time and accuracy.
Date: Monday 8th July 2019 Fixed the problem with the limit on the number of images in TensorRT inference by increasing the batch size while creating the TensorRT frozen graph.
Date: Monday 15th July 2019 Finished showing the graph from the .pb file in the Jupyter notebook, and added the sklearn accuracy score and f1-score.
Date: Tuesday 16th July 2019 Added other metrics such as the confusion matrix, recall score, precision score, f1-score, ROC curve, and AUC (area under the curve).
Date: Thursday 18th July 2019 Added TensorRT optimization of object detection.
Date: Friday 19th July 2019 Added the faster_rcnn_resnet101_kitti_2018_01_28 model and its test images, but there is an error during prediction.
Date: Tuesday 23rd July 2019 Cleaned up version control: reduced the .git size by removing all large files from all commits (using the command git filter-branch --tree-filter 'rm -f <path/to/file>' -- --all), cleaned up unnecessary files (using the command git gc --aggressive --prune=now), and made sure the repository is correct and up to date.
Date: Wednesday 24th July 2019 Added the aadc2018_frcnn_res101_200k_kitti model and its test images, but the model cannot be imported. It may be because the model was saved with a newer TensorFlow version than the one on the Jetson board.
Date: Thursday 25th July 2019 Added code for checking the annotations (found items) of each image, which is used for calculating accuracy. The cause of the prediction problem is "out of memory": the inference process uses too much memory, so it is killed.
Date: Friday 26th July 2019 Tried the different precision modes FP32, FP16, and INT8, then compared them and summarized the differences.
Date: Wednesday 31st July 2019 Tried to solve the problem that the optimized object detection model cannot run inference by increasing max_work_space and decreasing max_batch_size, but it is still not working.
- https://github.com/the-house-of-black-and-white/hall-of-faces
- The area under the ROC curve shows how good a detection model is (mAP measurement): https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173
- YOLO Realtime: https://github.com/ardianumam/Tensorflow-TensorRT.git