diff --git a/aws/sagemaker/notebook/README.md b/aws/sagemaker/notebook/README.md index 7dc7f4f3..48469a0c 100644 --- a/aws/sagemaker/notebook/README.md +++ b/aws/sagemaker/notebook/README.md @@ -18,7 +18,7 @@ Wait for approximately 20 second and refresh the JupyterLab. You will see the Ja Now, you can try with different Notebooks provided by DJL: -- [DJL Jupyter Notebooks](https://github.com/deepjavalibrary/djl/tree/master/jupyter) +- [DJL Jupyter Notebooks](http://docs.djl.ai/docs/demos/jupyter/index.html) - [Dive into Deep Learning (DJL edition)](https://github.com/deepjavalibrary/d2l-java) ## Setup Scala diff --git a/jupyter/BERTQA.ipynb b/jupyter/BERTQA.ipynb new file mode 100644 index 00000000..472ee54f --- /dev/null +++ b/jupyter/BERTQA.ipynb @@ -0,0 +1,214 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# DJL BERT Inference Demo\n", + "\n", + "## Introduction\n", + "\n", + "In this tutorial, you walk through running inference using DJL on [BERT](https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270) QA models trained with MXNet and PyTorch. \n", + "You can provide a question and a paragraph containing the answer to the model. The model is then able to find the best answer from the answer paragraph.\n", + "\n", + "Example:\n", + "```text\n", + "Q: When did BBC Japan start broadcasting?\n", + "```\n", + "\n", + "Answer paragraph:\n", + "```text\n", + "BBC Japan was a general entertainment channel, which operated between December 2004 and April 2006.\n", + "It ceased operations after its Japanese distributor folded.\n", + "```\n", + "And the model picks the right answer:\n", + "```text\n", + "A: December 2004\n", + "```\n", + "\n", + "One of the most powerful features of DJL is that it's engine agnostic. Because of this, you can run different backend engines seamlessly. We showcase BERT QA first with an MXNet pre-trained model, then with a PyTorch model." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Preparation\n", + "\n", + "This tutorial requires the installation of the Java Kernel. To install the Java Kernel, see the [README](http://docs.djl.ai/docs/demos/jupyter/index.html)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n", + "\n", + "%maven ai.djl:api:0.24.0\n", + "%maven ai.djl.mxnet:mxnet-engine:0.24.0\n", + "%maven ai.djl.mxnet:mxnet-model-zoo:0.24.0\n", + "%maven ai.djl.pytorch:pytorch-engine:0.24.0\n", + "%maven ai.djl.pytorch:pytorch-model-zoo:0.24.0\n", + "%maven org.slf4j:slf4j-simple:1.7.32" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Import java packages by running the following:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ai.djl.*;\n", + "import ai.djl.engine.*;\n", + "import ai.djl.modality.nlp.qa.*;\n", + "import ai.djl.repository.zoo.*;\n", + "import ai.djl.training.util.*;\n", + "import ai.djl.inference.*;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that all of the prerequisites are complete, start writing code to run inference with this example.\n", + "\n", + "\n", + "## Load the model and input\n", + "\n", + "**First, load the input**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var question = \"When did BBC Japan start broadcasting?\";\n", + "var resourceDocument = \"BBC Japan was a general entertainment Channel.\\n\" +\n", + " \"Which operated between December 2004 and April 2006.\\n\" +\n", + " \"It ceased operations after its Japanese distributor folded.\";\n", + "\n", + "QAInput input = new QAInput(question, resourceDocument);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then load the model and vocabulary. Create a variable `model` by using the `ModelZoo` as shown in the following code." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Criteria<QAInput, String> criteria = Criteria.builder()\n", + " .optApplication(Application.NLP.QUESTION_ANSWER)\n", + " .setTypes(QAInput.class, String.class)\n", + " .optEngine(\"MXNet\") // For DJL to use MXNet engine\n", + " .optProgress(new ProgressBar()).build();\n", + "ZooModel<QAInput, String> model = criteria.loadModel();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Run inference\n", + "Once the model is loaded, you can call `Predictor` and run inference as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Predictor<QAInput, String> predictor = model.newPredictor();\n", + "String answer = predictor.predict(input);\n", + "answer" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Running inference on DJL is that easy. Now, let's try the PyTorch engine by specifying `Criteria.optEngine(\"PyTorch\")`. Let's rerun the inference code.\n",
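+ "\n", + "Note that only the `Criteria` changes when we switch engines: `.optEngine(\"PyTorch\")` selects the engine, and `.optFilter(\"modelType\", \"distilbert\")` tells the model zoo to pick the DistilBERT variant of the QA model. The `Predictor` code itself stays the same."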
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var question = \"When did BBC Japan start broadcasting?\";\n", + "var resourceDocument = \"BBC Japan was a general entertainment Channel.\\n\" +\n", + " \"Which operated between December 2004 and April 2006.\\n\" +\n", + " \"It ceased operations after its Japanese distributor folded.\";\n", + "\n", + "QAInput input = new QAInput(question, resourceDocument);\n", + "\n", + "Criteria<QAInput, String> criteria = Criteria.builder()\n", + " .optApplication(Application.NLP.QUESTION_ANSWER)\n", + " .setTypes(QAInput.class, String.class)\n", + " .optFilter(\"modelType\", \"distilbert\")\n", + " .optEngine(\"PyTorch\") // Use PyTorch engine\n", + " .optProgress(new ProgressBar()).build();\n", + "ZooModel<QAInput, String> model = criteria.loadModel();\n", + "Predictor<QAInput, String> predictor = model.newPredictor();\n", + "String answer = predictor.predict(input);\n", + "answer" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "Surprisingly, there are no differences between the PyTorch code snippet and the MXNet code snippet. \n", + "This is the power of DJL. We define a unified API where you can switch to different backend engines on the fly.\n", + "Next chapter: Inference with your own BERT: [MXNet](mxnet/load_your_own_mxnet_bert.ipynb) [PyTorch](pytorch/load_your_own_pytorch_bert.ipynb)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Java", + "language": "java", + "name": "java" + }, + "language_info": { + "codemirror_mode": "java", + "file_extension": ".jshell", + "mimetype": "text/x-java-source", + "name": "Java", + "pygments_lexer": "java", + "version": "14.0.2+12" + }, + "pycharm": { + "stem_cell": { + "cell_type": "raw", + "metadata": { + "collapsed": false + }, + "source": [] + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/jupyter/Dockerfile b/jupyter/Dockerfile new file mode 100644 index 00000000..9c79ec3e --- /dev/null +++ b/jupyter/Dockerfile @@ -0,0 +1,24 @@ +FROM ubuntu:18.04 + +RUN apt-get update || true +RUN apt-get install -y openjdk-11-jdk-headless +RUN apt-get install -y python3-pip git +RUN pip3 install jupyter +RUN apt-get update \ + && DEBIAN_FRONTEND=noninteractive apt-get install -y locales \ + && sed -i -e 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen \ + && dpkg-reconfigure --frontend=noninteractive locales \ + && update-locale LANG=en_US.UTF-8 +RUN apt-get install -y curl + +RUN git clone https://github.com/frankfliu/IJava.git +RUN cd IJava/ && ./gradlew installKernel && cd .. && rm -rf IJava/ +RUN rm -rf ~/.gradle + +WORKDIR /home/jupyter + +ENV LANG en_US.UTF-8 +ENV LC_ALL en_US.UTF-8 + +EXPOSE 8888 +ENTRYPOINT ["jupyter", "notebook", "--ip=0.0.0.0", "--no-browser", "--allow-root", "--NotebookApp.token=''", "--NotebookApp.password=''"] diff --git a/jupyter/README.md b/jupyter/README.md new file mode 100644 index 00000000..38308653 --- /dev/null +++ b/jupyter/README.md @@ -0,0 +1,83 @@ +# DJL - Jupyter notebooks + +## Overview + +This folder contains tutorials that illustrate how to accomplish basic AI tasks with Deep Java Library (DJL).
+ +## [Beginner Tutorial](tutorial/README.md) + +## More Tutorial Notebooks + +- [Run object detection with model zoo](object_detection_with_model_zoo.ipynb) +- [Load pre-trained PyTorch model](load_pytorch_model.ipynb) +- [Load pre-trained Apache MXNet model](load_mxnet_model.ipynb) +- [Transfer learning example](transfer_learning_on_cifar10.ipynb) +- [Question answering example](BERTQA.ipynb) + +You can run our notebooks online: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/deepjavalibrary/djl/master?filepath=jupyter) + +## Setup + +### JDK 11 (not JRE) + +JDK 11 (or above) is required to run the examples provided in this folder. + +To confirm that the Java path is configured properly, run: + +```bash +java --list-modules | grep "jdk.jshell" + +> jdk.jshell@12.0.1 +``` + +### Install Jupyter notebook with Python 3 + +```bash +pip3 install jupyter +``` + +### Install IJava kernel for jupyter + +```bash +git clone https://github.com/frankfliu/IJava.git +cd IJava/ +./gradlew installKernel +``` + +## Start jupyter notebook + +```bash +jupyter notebook +``` + +## Docker setup + +You may want to use Docker for a simple installation, or if you are using Windows. + +### Run docker image + +```sh +cd jupyter +docker run -itd -p 127.0.0.1:8888:8888 -v $PWD:/home/jupyter deepjavalibrary/jupyter +``` + +You can open `http://localhost:8888` to see the instance hosted on Docker. + +### Build docker image by yourself + +You can read the [Dockerfile](https://docs.djl.ai/docs/demos/jupyter/Dockerfile) for details. To build the docker image: + +```sh +cd jupyter +docker build -t deepjavalibrary/jupyter . +``` + +### Run docker compose + +```sh +cd jupyter +docker-compose build +docker-compose up -d +``` + +You can open `http://localhost:8888` to see the instance hosted on Docker Compose. diff --git a/jupyter/docker-compose.yml b/jupyter/docker-compose.yml new file mode 100644 index 00000000..e8e4d2f8 --- /dev/null +++ b/jupyter/docker-compose.yml @@ -0,0 +1,12 @@ +version: "2.4" +services: + deepjavalibrary_container: + build: + context: . + dockerfile: Dockerfile + ports: + - 8888:8888 + volumes: + - ./:/home/jupyter + restart: always + diff --git a/jupyter/load_mxnet_model.ipynb b/jupyter/load_mxnet_model.ipynb new file mode 100644 index 00000000..f9e4d79f --- /dev/null +++ b/jupyter/load_mxnet_model.ipynb @@ -0,0 +1,190 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Load MXNet model\n", + "\n", + "In this tutorial, you learn how to load an existing MXNet model and use it to run a prediction task.\n", + "\n", + "\n", + "## Preparation\n", + "\n", + "This tutorial requires the installation of the Java Kernel. For more information on installing the Java Kernel, see the [README](https://docs.djl.ai/docs/demos/jupyter/index.html)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n", + "\n", + "%maven ai.djl:api:0.24.0\n", + "%maven ai.djl:model-zoo:0.24.0\n", + "%maven ai.djl.mxnet:mxnet-engine:0.24.0\n", + "%maven ai.djl.mxnet:mxnet-model-zoo:0.24.0\n", + "%maven org.slf4j:slf4j-simple:1.7.32" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import java.awt.image.*;\n", + "import java.nio.file.*;\n", + "import ai.djl.*;\n", + "import ai.djl.inference.*;\n", + "import ai.djl.ndarray.*;\n", + "import ai.djl.modality.*;\n", + "import ai.djl.modality.cv.*;\n", + "import ai.djl.modality.cv.util.*;\n", + "import ai.djl.modality.cv.transform.*;\n", + "import ai.djl.modality.cv.translator.*;\n", + "import ai.djl.translate.*;\n", + "import ai.djl.training.util.*;\n", + "import ai.djl.util.*;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Prepare your MXNet model\n", + "\n", + "This tutorial assumes that you have an MXNet model trained using Python. An MXNet symbolic model usually contains the following files:\n", + "* Symbol file: {MODEL_NAME}-symbol.json - a JSON file that contains network information about the model\n", + "* Parameters file: {MODEL_NAME}-{EPOCH}.params - a binary file that stores the parameter weight and bias\n", + "* Synset file: synset.txt - an optional text file that stores the classification class labels\n", + "\n", + "This tutorial uses a pre-trained MXNet `resnet18_v1` model." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We use `DownloadUtils` for downloading files from the internet." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "DownloadUtils.download(\"https://mlrepo.djl.ai/model/cv/image_classification/ai/djl/mxnet/resnet/0.0.1/resnet18_v1-symbol.json\", \"build/resnet/resnet18_v1-symbol.json\", new ProgressBar());\n", + "DownloadUtils.download(\"https://mlrepo.djl.ai/model/cv/image_classification/ai/djl/mxnet/resnet/0.0.1/resnet18_v1-0000.params.gz\", \"build/resnet/resnet18_v1-0000.params\", new ProgressBar());\n", + "DownloadUtils.download(\"https://mlrepo.djl.ai/model/cv/image_classification/ai/djl/mxnet/synset.txt\", \"build/resnet/synset.txt\", new ProgressBar());\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Load your model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Path modelDir = Paths.get(\"build/resnet\");\n", + "Model model = Model.newInstance(\"resnet\");\n", + "model.load(modelDir, \"resnet18_v1\");" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Create a `Translator`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Pipeline pipeline = new Pipeline();\n", + "pipeline.add(new CenterCrop()).add(new Resize(224, 224)).add(new ToTensor());\n", + "Translator<Image, Classifications> translator = ImageClassificationTranslator.builder()\n", + " .setPipeline(pipeline)\n", + " .optSynsetArtifactName(\"synset.txt\")\n", + " .optApplySoftmax(true)\n", + " .build();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Load image for classification" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {},
"outputs": [], + "source": [ + "var img = ImageFactory.getInstance().fromUrl(\"https://resources.djl.ai/images/kitten.jpg\");\n", + "img.getWrappedImage()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Run inference" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Predictor predictor = model.newPredictor(translator);\n", + "Classifications classifications = predictor.predict(img);\n", + "\n", + "classifications" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "Now, you can load any MXNet symbolic model and run inference.\n", + "\n", + "You might also want to check out [load_pytorch_model](https://docs.djl.ai/docs/demos/jupyter/load_pytorch_model.html) which demonstrates loading a local model using the ModelZoo API." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Java", + "language": "java", + "name": "java" + }, + "language_info": { + "codemirror_mode": "java", + "file_extension": ".jshell", + "mimetype": "text/x-java-source", + "name": "Java", + "pygments_lexer": "java", + "version": "14.0.2+12" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/jupyter/load_pytorch_model.ipynb b/jupyter/load_pytorch_model.ipynb new file mode 100644 index 00000000..64968a0e --- /dev/null +++ b/jupyter/load_pytorch_model.ipynb @@ -0,0 +1,232 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "# Load PyTorch model\n", + "\n", + "In this tutorial, you learn how to load an existing PyTorch model and use it to run a prediction task.\n", + "\n", + "We will run the inference in DJL way with [example](https://pytorch.org/hub/pytorch_vision_resnet/) on the pytorch official website.\n", + "\n", + "\n", + "## Preparation\n", + "\n", + "This tutorial requires the installation of Java Kernel. For more information on installing the Java Kernel, see the [README](https://docs.djl.ai/docs/demos/jupyter/index.html)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n", + "\n", + "%maven ai.djl:api:0.24.0\n", + "%maven ai.djl.pytorch:pytorch-engine:0.24.0\n", + "%maven org.slf4j:slf4j-simple:1.7.32" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import java.nio.file.*;\n", + "import java.awt.image.*;\n", + "import ai.djl.*;\n", + "import ai.djl.inference.*;\n", + "import ai.djl.modality.*;\n", + "import ai.djl.modality.cv.*;\n", + "import ai.djl.modality.cv.util.*;\n", + "import ai.djl.modality.cv.transform.*;\n", + "import ai.djl.modality.cv.translator.*;\n", + "import ai.djl.repository.zoo.*;\n", + "import ai.djl.translate.*;\n", + "import ai.djl.training.util.*;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Prepare your model\n", + "\n", + "This tutorial assumes that you have a TorchScript model.\n", + "DJL only supports the TorchScript format for loading models from PyTorch, so other models will need to be [converted](https://github.com/deepjavalibrary/djl/blob/master/docs/pytorch/how_to_convert_your_model_to_torchscript.md).\n", + "A TorchScript model includes the model structure and all of the parameters.\n", + "\n", + "We will be using a pre-trained `resnet18` model. 
First, use the `DownloadUtils` to download the model files and save them in the `build/pytorch_models` folder:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "DownloadUtils.download(\"https://djl-ai.s3.amazonaws.com/mlrepo/model/cv/image_classification/ai/djl/pytorch/resnet/0.0.1/traced_resnet18.pt.gz\", \"build/pytorch_models/resnet18/resnet18.pt\", new ProgressBar());" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In order to do image classification, you will also need `synset.txt`, which stores the classification class labels. We will need the synset containing the ImageNet labels with which resnet18 was originally trained." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "DownloadUtils.download(\"https://djl-ai.s3.amazonaws.com/mlrepo/model/cv/image_classification/ai/djl/pytorch/synset.txt\", \"build/pytorch_models/resnet18/synset.txt\", new ProgressBar());" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Create a Translator\n", + "\n", + "We will create a transformation pipeline which maps the transforms shown in the [PyTorch example](https://pytorch.org/hub/pytorch_vision_resnet/).\n", + "```python\n", + "...\n", + "preprocess = transforms.Compose([\n", + " transforms.Resize(256),\n", + " transforms.CenterCrop(224),\n", + " transforms.ToTensor(),\n", + " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n", + "])\n", + "...\n", + "```\n", + "\n", + "Then, we will use this pipeline to create the [`Translator`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/translate/Translator.html)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Translator<Image, Classifications> translator = ImageClassificationTranslator.builder()\n", + " .addTransform(new Resize(256))\n", + " .addTransform(new CenterCrop(224, 224))\n", + " .addTransform(new ToTensor())\n", + " .addTransform(new Normalize(\n", + " new float[] {0.485f, 0.456f, 0.406f},\n", + " new float[] {0.229f, 0.224f, 0.225f}))\n", + " .optApplySoftmax(true)\n", + " .build();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Load your model\n", + "\n", + "Next, we add some search criteria to find the resnet18 model and load it. In this case, we need to tell `Criteria` where to locate the model by calling the `.optModelPath()` API." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Criteria<Image, Classifications> criteria = Criteria.builder()\n", + " .setTypes(Image.class, Classifications.class)\n", + " .optModelPath(Paths.get(\"build/pytorch_models/resnet18\"))\n", + " .optOption(\"mapLocation\", \"true\") // this model requires mapLocation for GPU\n", + " .optTranslator(translator)\n", + " .optProgress(new ProgressBar()).build();\n", + "\n", + "ZooModel<Image, Classifications> model = criteria.loadModel();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Load image for classification\n", + "\n", + "We will use a sample dog image to run our prediction on."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var img = ImageFactory.getInstance().fromUrl(\"https://raw.githubusercontent.com/pytorch/hub/master/images/dog.jpg\");\n", + "img.getWrappedImage()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Run inference\n", + "\n", + "Lastly, we will need to create a predictor using our model and translator. Once we have a predictor, we simply need to call the predict method on our test image." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Predictor<Image, Classifications> predictor = model.newPredictor();\n", + "Classifications classifications = predictor.predict(img);\n", + "\n", + "classifications" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "Now, you can load any TorchScript model and run inference using it.\n", + "\n", + "You might also want to check out [load_mxnet_model](https://docs.djl.ai/docs/demos/jupyter/load_mxnet_model.html) which demonstrates loading a local model directly instead of through the Model Zoo API." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Java", + "language": "java", + "name": "java" + }, + "language_info": { + "codemirror_mode": "java", + "file_extension": ".jshell", + "mimetype": "text/x-java-source", + "name": "Java", + "pygments_lexer": "java", + "version": "14.0.2+12" + }, + "pycharm": { + "stem_cell": { + "cell_type": "raw", + "metadata": { + "collapsed": false + }, + "source": [] + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/jupyter/mxnet/load_your_own_mxnet_bert.ipynb b/jupyter/mxnet/load_your_own_mxnet_bert.ipynb new file mode 100644 index 00000000..7d5e1fba --- /dev/null +++ b/jupyter/mxnet/load_your_own_mxnet_bert.ipynb @@ -0,0 +1,485 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Load your own MXNet BERT model\n", + "\n", + "In the previous [example](../BERTQA.ipynb), you ran BERT inference with a model from the model zoo. You can also load your own pre-trained BERT model and use custom classes as the input and output.\n", + "\n", + "In general, the MXNet BERT model requires these three inputs:\n", + "\n", + "- word indices: The index of each word in a sentence\n", + "- word types: The type index of the word.\n", + "- valid length: The actual length of the question and resource document tokens\n", + "\n", + "We will dive deep into these details later." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Preparation\n", + "\n", + "This tutorial requires the installation of the Java Kernel. To install the Java Kernel, see the [README](https://docs.djl.ai/docs/demos/jupyter/index.html)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "These are the dependencies we will use."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n", + "\n", + "%maven ai.djl:api:0.24.0\n", + "%maven ai.djl.mxnet:mxnet-engine:0.24.0\n", + "%maven ai.djl.mxnet:mxnet-model-zoo:0.24.0\n", + "%maven org.slf4j:slf4j-simple:1.7.32" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Import java packages" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import java.io.*;\n", + "import java.nio.file.*;\n", + "import java.util.*;\n", + "import java.util.stream.*;\n", + "\n", + "import ai.djl.*;\n", + "import ai.djl.util.*;\n", + "import ai.djl.ndarray.*;\n", + "import ai.djl.ndarray.types.*;\n", + "import ai.djl.inference.*;\n", + "import ai.djl.translate.*;\n", + "import ai.djl.training.util.*;\n", + "import ai.djl.repository.zoo.*;\n", + "import ai.djl.modality.nlp.*;\n", + "import ai.djl.modality.nlp.qa.*;\n", + "import ai.djl.mxnet.zoo.nlp.qa.*;\n", + "import ai.djl.modality.nlp.bert.*;\n", + "\n", + "import com.google.gson.annotations.SerializedName;\n", + "import java.nio.charset.StandardCharsets;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Reuse the previous input**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var question = \"When did BBC Japan start broadcasting?\";\n", + "var resourceDocument = \"BBC Japan was a general entertainment Channel.\\n\" +\n", + " \"Which operated between December 2004 and April 2006.\\n\" +\n", + " \"It ceased operations after its Japanese distributor folded.\";\n", + "\n", + "QAInput input = new QAInput(question, resourceDocument);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dive deep into Translator\n", + "\n", + "Inference in deep learning is the process of predicting the output for a given input based on a pre-defined model.\n", + "DJL abstracts away the whole process for ease of use. It can load the model, perform inference on the input, and provide\n", + "output. DJL also allows you to provide user-defined inputs. The workflow looks like the following:\n", + "\n", + "![https://github.com/deepjavalibrary/djl/blob/master/examples/docs/img/workFlow.png?raw=true](https://github.com/deepjavalibrary/djl/blob/master/examples/docs/img/workFlow.png?raw=true)\n", + "\n", + "The red block (\"Images\") in the workflow is the input that DJL expects from you. The green block (\"Images\n", + "bounding box\") is the output that you expect. Because DJL does not know which input to expect and which output format you prefer, DJL provides the [`Translator`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/translate/Translator.html) interface so you can define your own\n", + "input and output.\n", + "\n", + "The `Translator` interface encompasses the two white blocks: Pre-processing and Post-processing. The pre-processing\n", + "component converts the user-defined input objects into an NDList, so that the [`Predictor`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/inference/Predictor.html) in DJL can understand the\n", + "input and make its prediction. Similarly, the post-processing block receives an NDList as the output from the\n", + "`Predictor`. The post-processing block allows you to convert the output from the `Predictor` to the desired output\n", + "format.\n",
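+ "\n", + "As a rough sketch (simplified here for illustration; see the Javadoc for the full interface, which also covers batching and resource preparation), the contract looks like this:\n", + "\n", + "```java\n", + "public interface Translator<I, O> {\n", + "    // convert your input object into the NDList the model consumes\n", + "    NDList processInput(TranslatorContext ctx, I input) throws Exception;\n", + "\n", + "    // convert the model's NDList output back into your output object\n", + "    O processOutput(TranslatorContext ctx, NDList list) throws Exception;\n", + "}\n", + "```"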
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Pre-processing\n", + "\n", + "Now, you need to convert the sentences into tokens. We provide a powerful tool [`BertTokenizer`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/modality/nlp/bert/BertTokenizer.html) that you can use to convert questions and answers into tokens, and batchify your sequence together. Once you have properly formatted tokens, you can use [`Vocabulary`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/modality/nlp/Vocabulary.html) to map your tokens to BERT indices.\n", + "\n", + "The following code block demonstrates tokenizing the question and answer defined earlier into BERT-formatted tokens." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var tokenizer = new BertTokenizer();\n", + "List<String> tokenQ = tokenizer.tokenize(question.toLowerCase());\n", + "List<String> tokenA = tokenizer.tokenize(resourceDocument.toLowerCase());\n", + "\n", + "System.out.println(\"Question Token: \" + tokenQ);\n", + "System.out.println(\"Answer Token: \" + tokenA);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`BertTokenizer` can also help you batchify questions and resource documents together by calling `encode()`.\n", + "The output contains information that BERT ingests.\n", + "\n", + "- getTokens: It returns a list of strings, including the question, resource document and special tokens that let the model tell which part is the question and which part is the resource document. Because MXNet BERT was trained with a fixed sequence length, you see the `[PAD]` in the tokens as well.\n", + "- getTokenTypes: It returns a list of type indices of the word to indicate the location of the resource document. All question tokens will be labelled with 0 and all resource document tokens will be labelled with 1.\n", + "\n", + " [Question tokens...DocResourceTokens...padding tokens] => [000000...11111....0000]\n", + " \n", + "\n", + "- getValidLength: It returns the actual length of the question and resource document tokens, which is required by MXNet BERT.\n", + "- getAttentionMask: It returns the mask for the model to indicate which part should be paid attention to and which part is the padding. It is required by PyTorch BERT.\n", + "\n", + " [Question tokens...DocResourceTokens...padding tokens] => [111111...11111....0000]\n", + " \n", + "MXNet BERT was trained with a fixed sequence length of 384, so we need to pass that in when we encode the question and resource doc. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "BertToken token = tokenizer.encode(question.toLowerCase(), resourceDocument.toLowerCase(), 384);\n", + "System.out.println(\"Encoded tokens: \" + token.getTokens());\n", + "System.out.println(\"Encoded token type: \" + token.getTokenTypes());\n", + "System.out.println(\"Valid length: \" + token.getValidLength());" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Normally, words and sentences are represented as indices instead of tokens for training. \n", + "They typically work like a vector in an n-dimensional space. 
In this case, you need to map them into indices.\n", + "DJL provides `Vocabulary` to take care of your vocabulary mapping.\n", + "\n", + "Assume your vocab.json is in the following format:\n", + "```\n", + "{'token_to_idx':{'\"slots\": 19832,...}, 'idx_to_token':[\"[UNK]\", \"[PAD]\", ...]}\n", + "```\n", + "We provide the `vocab.json` from our pre-trained BERT for demonstration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "DownloadUtils.download(\"https://djl-ai.s3.amazonaws.com/mlrepo/model/nlp/question_answer/ai/djl/mxnet/bertqa/vocab.json\", \"build/mxnet/bertqa/vocab.json\", new ProgressBar());" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class VocabParser {\n", + " @SerializedName(\"idx_to_token\")\n", + " List<String> idx2token;\n", + "\n", + " public static List<String> parseToken(URL file) {\n", + " try (InputStream is = file.openStream();\n", + " Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8)) {\n", + " return JsonUtils.GSON.fromJson(reader, VocabParser.class).idx2token;\n", + " } catch (IOException e) {\n", + " throw new IllegalArgumentException(\"Invalid url: \" + file, e);\n", + " }\n", + " }\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "URL url = Paths.get(\"build/mxnet/bertqa/vocab.json\").toUri().toURL();\n", + "var vocabulary = DefaultVocabulary.builder()\n", + " .optMinFrequency(1)\n", + " .addFromCustomizedFile(url, VocabParser::parseToken)\n", + " .optUnknownToken(\"[UNK]\")\n", + " .build();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can easily convert the token to the index using `vocabulary.getIndex(token)` and the other way around using `vocabulary.getToken(index)`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "long index = vocabulary.getIndex(\"car\");\n", + "String token = vocabulary.getToken(2482);\n", + "System.out.println(\"The index of the car is \" + index);\n", + "System.out.println(\"The token of the index 2482 is \" + token);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To properly convert them into `float[]` for `NDArray` creation, use the following helper function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "/**\n", + " * Convert a List of Number to float array.\n", + " *\n", + " * @param list the list to be converted\n", + " * @return float array\n", + " */\n", + "public static float[] toFloatArray(List<? extends Number> list) {\n", + " float[] ret = new float[list.size()];\n", + " int idx = 0;\n", + " for (Number n : list) {\n", + " ret[idx++] = n.floatValue();\n", + " }\n", + " return ret;\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that you have everything you need, you can create an NDList and populate all of the inputs you formatted earlier. You're done with pre-processing! \n", + "\n", + "#### Construct `Translator`\n", + "\n", + "You need to do this processing within an implementation of the `Translator` interface. `Translator` is designed to do pre-processing and post-processing. You must define the input and output objects. 
It contains the following two methods to override:\n", + "- `public NDList processInput(TranslatorContext ctx, I input)`\n", + "- `public O processOutput(TranslatorContext ctx, NDList list)`\n", + "\n", + "Every translator takes in input and returns output in the form of generic objects. In this case, the translator takes input in the form of `QAInput` (I) and returns output as a `String` (O). `QAInput` is just an object that holds the question and the resource document; we have prepared the input class for you." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Armed with the needed knowledge, you can write an implementation of the `Translator` interface. `BertTranslator` uses the code snippets explained previously to implement the `processInput` method. For more information, see [`NDManager`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/ndarray/NDManager.html).\n", + "\n", + "```\n", + "manager.create(Number[] data, Shape)\n", + "manager.create(Number[] data)\n", + "```\n", + "\n", + "The `Shape` for `data0` and `data1` is sequence_length. For `data2` the `Shape` is just 1." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "public class BertTranslator implements NoBatchifyTranslator<QAInput, String> {\n", + " private List<String> tokens;\n", + " private Vocabulary vocabulary;\n", + " private BertTokenizer tokenizer;\n", + " \n", + " @Override\n", + " public void prepare(TranslatorContext ctx) throws IOException {\n", + " URL path = Paths.get(\"build/mxnet/bertqa/vocab.json\").toUri().toURL();\n", + " vocabulary =\n", + " DefaultVocabulary.builder()\n", + " .optMinFrequency(1)\n", + " .addFromCustomizedFile(path, VocabParser::parseToken)\n", + " .optUnknownToken(\"[UNK]\")\n", + " .build();\n", + " tokenizer = new BertTokenizer();\n", + " }\n", + " \n", + " @Override\n", + " public NDList processInput(TranslatorContext ctx, QAInput input) {\n", + " BertToken token =\n", + " tokenizer.encode(\n", + " input.getQuestion().toLowerCase(),\n", + " input.getParagraph().toLowerCase(),\n", + " 384);\n", + " // get the encoded tokens that would be used in processOutput\n", + " tokens = token.getTokens();\n", + " // map the tokens(String) to indices(long)\n", + " List<Long> indices =\n", + " token.getTokens().stream().map(vocabulary::getIndex).collect(Collectors.toList());\n", + " float[] indexesFloat = toFloatArray(indices);\n", + " float[] types = toFloatArray(token.getTokenTypes());\n", + " int validLength = token.getValidLength();\n", + "\n", + " NDManager manager = ctx.getNDManager();\n", + " NDArray data0 = manager.create(indexesFloat);\n", + " data0.setName(\"data0\");\n", + " NDArray data1 = manager.create(types);\n", + " data1.setName(\"data1\");\n", + " NDArray data2 = manager.create(new float[] {validLength});\n", + " data2.setName(\"data2\");\n", + " return new NDList(data0, data1, data2);\n", + " }\n", + "\n", + " @Override\n", + " public String processOutput(TranslatorContext ctx, NDList list) {\n", + " NDArray array = list.singletonOrThrow();\n", + " NDList output = array.split(2, 2);\n", + " // Get the formatted logits result\n", + " NDArray startLogits = output.get(0).reshape(new Shape(1, -1));\n", + " NDArray endLogits = output.get(1).reshape(new Shape(1, -1));\n", + " int startIdx = (int) startLogits.argMax(1).getLong();\n", + " int endIdx = (int) endLogits.argMax(1).getLong();\n", + " return tokens.subList(startIdx, endIdx + 1).toString();\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Congrats! 
You have created your first Translator! We have pre-filled the `processOutput()` function to process the `NDList` and return it in a desired format. `processInput()` and `processOutput()` offer the flexibility to get the predictions from the model in any format you desire. \n", + "\n", + "With the Translator implemented, you need to bring up the predictor that uses your `Translator` to start making predictions. You can find the usage for `Predictor` in the [Predictor Javadoc](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/inference/Predictor.html). Create a translator and use the `question` and `resourceDocument` provided previously." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "DownloadUtils.download(\"https://djl-ai.s3.amazonaws.com/mlrepo/model/nlp/question_answer/ai/djl/mxnet/bertqa/0.0.1/static_bert_qa-symbol.json\", \"build/mxnet/bertqa/bertqa-symbol.json\", new ProgressBar());\n", + "DownloadUtils.download(\"https://djl-ai.s3.amazonaws.com/mlrepo/model/nlp/question_answer/ai/djl/mxnet/bertqa/0.0.1/static_bert_qa-0002.params.gz\", \"build/mxnet/bertqa/bertqa-0000.params\", new ProgressBar());" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "BertTranslator translator = new BertTranslator();\n", + "Criteria<QAInput, String> criteria = Criteria.builder()\n", + " .setTypes(QAInput.class, String.class)\n", + " .optModelPath(Paths.get(\"build/mxnet/bertqa/\")) // Search for models in the build/mxnet/bertqa folder\n", + " .optTranslator(translator)\n", + " .optProgress(new ProgressBar()).build();\n", + "\n", + "ZooModel<QAInput, String> model = criteria.loadModel();" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "String predictResult = null;\n", + "QAInput input = new QAInput(question, resourceDocument);\n", + "\n", + "// Create a Predictor and use it to predict the output\n", + "try (Predictor<QAInput, String> predictor = model.newPredictor(translator)) {\n", + " predictResult = predictor.predict(input);\n", + "}\n", + "\n", + "System.out.println(question);\n", + "System.out.println(predictResult);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Based on the input, the following result will be shown:\n", + "```\n", + "[december, 2004]\n", + "```\n", + "That's it! \n", + "\n", + "You can try with more questions and answers. Here are the samples:\n", + "\n", + "**Answer Material**\n", + "\n", + "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (\"Norman\" comes from \"Norseman\") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants would gradually merge with the Carolingian-based cultures of West Francia. 
The distinct cultural and ethnic identity of the Normans emerged initially in the first half of the 10th century, and it continued to evolve over the succeeding centuries.\n", + "\n", + "\n", + "**Question**\n", + "\n", + "Q: When were the Normans in Normandy?\n", + "A: 10th and 11th centuries\n", + "\n", + "Q: In what country is Normandy located?\n", + "A: france\n", + "\n", + "For the full source code, see the [DJL repo](https://github.com/deepjavalibrary/djl/blob/master/examples/src/main/java/ai/djl/examples/inference/BertQaInference.java) and translator implementation [MXNet](https://github.com/deepjavalibrary/djl/blob/master/engines/mxnet/mxnet-model-zoo/src/main/java/ai/djl/mxnet/zoo/nlp/qa/MxBertQATranslator.java) [PyTorch](https://github.com/deepjavalibrary/djl/blob/master/engines/pytorch/pytorch-model-zoo/src/main/java/ai/djl/pytorch/zoo/nlp/qa/PtBertQATranslator.java)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Java", + "language": "java", + "name": "java" + }, + "language_info": { + "codemirror_mode": "java", + "file_extension": ".jshell", + "mimetype": "text/x-java-source", + "name": "Java", + "pygments_lexer": "java", + "version": "14.0.2+12" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/jupyter/object_detection_with_model_zoo.ipynb b/jupyter/object_detection_with_model_zoo.ipynb new file mode 100644 index 00000000..3a998487 --- /dev/null +++ b/jupyter/object_detection_with_model_zoo.ipynb @@ -0,0 +1,159 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Object detection with model zoo model\n", + "\n", + "In this tutorial, you learn how to use a built-in model zoo model (SSD) to achieve an [object detection](https://en.wikipedia.org/wiki/Object_detection) task.\n", + "\n", + "## Preparation\n", + "\n", + "This tutorial requires the installation of the Java Kernel. To install the Java Kernel, see the [README](https://docs.djl.ai/docs/demos/jupyter/index.html)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n", + "\n", + "%maven ai.djl:api:0.24.0\n", + "%maven ai.djl.mxnet:mxnet-engine:0.24.0\n", + "%maven ai.djl.mxnet:mxnet-model-zoo:0.24.0\n", + "%maven org.slf4j:slf4j-simple:1.7.32" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ai.djl.modality.cv.*;\n", + "import ai.djl.modality.cv.output.*;\n", + "import ai.djl.modality.cv.util.*;\n", + "import ai.djl.mxnet.zoo.*;\n", + "import ai.djl.repository.zoo.*;\n", + "import ai.djl.training.util.*;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Load image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var img = ImageFactory.getInstance().fromUrl(\"https://resources.djl.ai/images/dog_bike_car.jpg\");\n", + "img.getWrappedImage()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Load model zoo model\n", + "\n", + "In this example, you load an SSD (Single Shot MultiBox Detector) model from the MXNet model zoo.\n", + "For more information about the model zoo, see the [Model Zoo Documentation](https://github.com/deepjavalibrary/djl/blob/master/docs/model-zoo.md) " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var criteria = Criteria.builder()\n", + " .setTypes(Image.class, DetectedObjects.class)\n", + " .optArtifactId(\"ssd\")\n", + " .optProgress(new ProgressBar())\n", + " .build();\n", + "var model = criteria.loadModel();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Create Predictor and detect an object in the image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var detections = model.newPredictor().predict(img);\n", + "\n", + "detections" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Check detected result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "img.drawBoundingBoxes(detections);\n", + "img.getWrappedImage()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "Using the model zoo model provided, you can run inference with just the following lines of code:\n", + "\n", + "```\n", + "var img = ImageFactory.getInstance().fromUrl(\"https://resources.djl.ai/images/dog_bike_car.jpg\");\n", + "var criteria = Criteria.builder()\n", + " .setTypes(Image.class, DetectedObjects.class)\n", + " .optArtifactId(\"ssd\")\n", + " .build();\n", + "var model = criteria.loadModel();\n", + "var detections = model.newPredictor().predict(img);\n", + "```\n", + "\n", + "You can find the full SsdExample source code [here](https://github.com/deepjavalibrary/djl/blob/master/examples/docs/object_detection.md).\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Java", + "language": "java", + "name": "java" + }, + "language_info": { + "codemirror_mode": "java", + "file_extension": ".jshell", + "mimetype": "text/x-java-source", + "name": "Java", + "pygments_lexer": "java", + "version": "14.0.2+12" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/jupyter/onnxruntime/machine_learning_with_ONNXRuntime.ipynb 
b/jupyter/onnxruntime/machine_learning_with_ONNXRuntime.ipynb new file mode 100644 index 00000000..13c9c160 --- /dev/null +++ b/jupyter/onnxruntime/machine_learning_with_ONNXRuntime.ipynb @@ -0,0 +1,224 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Classification on Iris dataset with sklearn and DJL\n", + "\n", + "In this notebook, you will use a pre-trained sklearn model on DJL for a general classification task. The model was trained with the [Iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set).\n", + "\n", + "## Background \n", + "\n", + "### Iris Dataset\n", + "\n", + "The dataset contains a set of 150 records under five attributes - sepal length, sepal width, petal length, petal width and species.\n", + "\n", + "Iris setosa | Iris versicolor | Iris virginica\n", + ":-------------------------:|:-------------------------:|:-------------------------:\n", + "![](https://upload.wikimedia.org/wikipedia/commons/5/56/Kosaciec_szczecinkowaty_Iris_setosa.jpg) | ![](https://upload.wikimedia.org/wikipedia/commons/4/41/Iris_versicolor_3.jpg) | ![](https://upload.wikimedia.org/wikipedia/commons/9/9f/Iris_virginica.jpg) \n", + "\n", + "The chart above shows three different kinds of Iris flowers. \n", + "\n", + "We will use sepal length, sepal width, petal length, petal width as the features and species as the label to train the model.\n", + "\n", + "### Sklearn Model\n", + "\n", + "You can find more information [here](http://onnx.ai/sklearn-onnx/). You can use the sklearn built-in iris dataset to load the data. Then we define a [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) to train the model. After that, we convert the model to the ONNX format for DJL to run inference. The following code is a sample classification setup using sklearn:\n", + "\n", + "```python\n", + "# Train a model.\n", + "from sklearn.datasets import load_iris\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.ensemble import RandomForestClassifier\n", + "iris = load_iris()\n", + "X, y = iris.data, iris.target\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y)\n", + "clr = RandomForestClassifier()\n", + "clr.fit(X_train, y_train)\n", + "```\n", + "\n", + "\n", + "## Preparation\n", + "\n", + "This tutorial requires the installation of the Java Kernel. To install the Java Kernel, see the [README](https://docs.djl.ai/docs/demos/jupyter/index.html).\n", + "\n", + "These are the dependencies we will use. To enhance the NDArray operation capability, we are importing ONNX Runtime and PyTorch Engine at the same time. Please find more information [here](https://github.com/deepjavalibrary/djl/blob/master/docs/hybrid_engine.md)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n", + "\n", + "%maven ai.djl:api:0.24.0\n", + "%maven ai.djl.onnxruntime:onnxruntime-engine:0.24.0\n", + "%maven org.slf4j:slf4j-simple:1.7.32" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ai.djl.inference.*;\n", + "import ai.djl.modality.*;\n", + "import ai.djl.ndarray.*;\n", + "import ai.djl.ndarray.types.*;\n", + "import ai.djl.repository.zoo.*;\n", + "import ai.djl.translate.*;\n", + "import java.util.*;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Create a Translator\n", + "\n", + "Inference in machine learning is the process of predicting the output for a given input based on a pre-defined model.\n", + "DJL abstracts away the whole process for ease of use. It can load the model, perform inference on the input, and provide\n", + "output. DJL also allows you to provide user-defined inputs. The workflow looks like the following:\n", + "\n", + "![https://github.com/deepjavalibrary/djl/blob/master/examples/docs/img/workFlow.png?raw=true](https://github.com/deepjavalibrary/djl/blob/master/examples/docs/img/workFlow.png?raw=true)\n", + "\n", + "The [`Translator`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/translate/Translator.html) interface encompasses the two white blocks: Pre-processing and Post-processing. The pre-processing\n", + "component converts the user-defined input objects into an NDList, so that the [`Predictor`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/inference/Predictor.html) in DJL can understand the\n", + "input and make its prediction. Similarly, the post-processing block receives an NDList as the output from the\n", + "`Predictor`. The post-processing block allows you to convert the output from the `Predictor` to the desired output\n", + "format.\n", + "\n", + "In our use case, we use a class named `IrisFlower` as our input class type. We will use [`Classifications`](https://javadoc.io/doc/ai.djl/api/0.24.0/ai/djl/modality/Classifications.html) as our output class type."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "public static class IrisFlower {\n", + "\n", + " public float sepalLength;\n", + " public float sepalWidth;\n", + " public float petalLength;\n", + " public float petalWidth;\n", + "\n", + " public IrisFlower(float sepalLength, float sepalWidth, float petalLength, float petalWidth) {\n", + " this.sepalLength = sepalLength;\n", + " this.sepalWidth = sepalWidth;\n", + " this.petalLength = petalLength;\n", + " this.petalWidth = petalWidth;\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's create a translator:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "public static class MyTranslator implements NoBatchifyTranslator<IrisFlower, Classifications> {\n", + "\n", + " private final List<String> synset;\n", + "\n", + " public MyTranslator() {\n", + " // species name\n", + " synset = Arrays.asList(\"setosa\", \"versicolor\", \"virginica\");\n", + " }\n", + "\n", + " @Override\n", + " public NDList processInput(TranslatorContext ctx, IrisFlower input) {\n", + " float[] data = {input.sepalLength, input.sepalWidth, input.petalLength, input.petalWidth};\n", + " NDArray array = ctx.getNDManager().create(data, new Shape(1, 4));\n", + " return new NDList(array);\n", + " }\n", + "\n", + " @Override\n", + " public Classifications processOutput(TranslatorContext ctx, NDList list) {\n", + " float[] data = list.get(1).toFloatArray();\n", + " List<Double> probabilities = new ArrayList<>(data.length);\n", + " for (float f : data) {\n", + " probabilities.add((double) f);\n", + " }\n", + " return new Classifications(synset, probabilities);\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Prepare your model\n", + "\n", + "We will load a pretrained sklearn model into DJL. We defined a [`ModelZoo`](https://javadoc.io/doc/ai.djl/api/0.24.0/ai/djl/repository/zoo/ModelZoo.html) concept to allow users to load models from a variety of locations, such as a remote URL, local files, or the DJL pretrained model zoo. We need to define a [`Criteria`](https://javadoc.io/doc/ai.djl/api/0.24.0/ai/djl/repository/zoo/Criteria.html) class to help the model zoo locate the model and attach the translator. In this example, we download a compressed ONNX model from S3." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "String modelUrl = \"https://mlrepo.djl.ai/model/tabular/softmax_regression/ai/djl/onnxruntime/iris_flowers/0.0.1/iris_flowers.zip\";\n", + "Criteria<IrisFlower, Classifications> criteria = Criteria.builder()\n", + " .setTypes(IrisFlower.class, Classifications.class)\n", + " .optModelUrls(modelUrl)\n", + " .optTranslator(new MyTranslator())\n", + " .optEngine(\"OnnxRuntime\") // use OnnxRuntime engine by default\n", + " .build();\n", + "ZooModel<IrisFlower, Classifications> model = criteria.loadModel();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Run inference\n", + "\n", + "You just need to create a `Predictor` from the model to run the inference.\n",
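+ "\n", + "`Predictor` and `ZooModel` are `AutoCloseable`, so outside of a notebook you would normally release their native resources when done. A minimal sketch of the same prediction with explicit cleanup:\n", + "\n", + "```java\n", + "// a sketch: the same prediction as below, but releasing native memory when done\n", + "try (Predictor<IrisFlower, Classifications> predictor = model.newPredictor()) {\n", + "    IrisFlower info = new IrisFlower(1.0f, 2.0f, 3.0f, 4.0f);\n", + "    System.out.println(predictor.predict(info));\n", + "}\n", + "```"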
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Predictor<IrisFlower, Classifications> predictor = model.newPredictor();\n", + "IrisFlower info = new IrisFlower(1.0f, 2.0f, 3.0f, 4.0f);\n", + "predictor.predict(info);" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Java", + "language": "java", + "name": "java" + }, + "language_info": { + "codemirror_mode": "java", + "file_extension": ".jshell", + "mimetype": "text/x-java-source", + "name": "Java", + "pygments_lexer": "java", + "version": "14.0.2+12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/jupyter/paddlepaddle/face_mask_detection_paddlepaddle.ipynb b/jupyter/paddlepaddle/face_mask_detection_paddlepaddle.ipynb new file mode 100644 index 00000000..f5176215 --- /dev/null +++ b/jupyter/paddlepaddle/face_mask_detection_paddlepaddle.ipynb @@ -0,0 +1,369 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Face Mask Detection using PaddlePaddle\n", + "\n", + "In this tutorial, we will be using a pretrained PaddlePaddle model from [PaddleHub](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.5/demo/mask_detection/cpp) to do mask detection on a sample image. To complete this procedure, two steps need to be done:\n", + "\n", + "- Recognize faces in the image (whether wearing a mask or not) using a face object detection model\n", + "- Classify whether each face is wearing a mask\n", + "\n", + "These two steps involve two Paddle models. We will implement the corresponding preprocessing and postprocessing logic for them.\n", + "\n", + "## Import dependencies and classes\n", + "\n", + "PaddlePaddle is one of the deep learning engines that require DJL hybrid mode to run inference. It does not contain NDArray operations itself and needs a supplemental DL framework to help with that, so we also import the PyTorch engine here to do the processing work." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n", + "\n", + "%maven ai.djl:api:0.24.0\n", + "%maven ai.djl.paddlepaddle:paddlepaddle-model-zoo:0.24.0\n", + "%maven org.slf4j:slf4j-simple:1.7.32\n", + "\n", + "// second engine to do preprocessing and postprocessing\n", + "%maven ai.djl.pytorch:pytorch-engine:0.24.0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ai.djl.*;\n", + "import ai.djl.inference.*;\n", + "import ai.djl.modality.*;\n", + "import ai.djl.modality.cv.*;\n", + "import ai.djl.modality.cv.output.*;\n", + "import ai.djl.modality.cv.transform.*;\n", + "import ai.djl.modality.cv.translator.*;\n", + "import ai.djl.modality.cv.util.*;\n", + "import ai.djl.ndarray.*;\n", + "import ai.djl.ndarray.types.Shape;\n", + "import ai.djl.repository.zoo.*;\n", + "import ai.djl.translate.*;\n", + "\n", + "import java.io.*;\n", + "import java.nio.file.*;\n", + "import java.util.*;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Face Detection model\n", + "\n", + "Now we can start working on the first model. 
The model can do face detection and require some additional processing before we feed into it:\n", + "\n", + "- Resize: Shrink the image with a certain ratio to feed in\n", + "- Normalize the image with a scale\n", + "\n", + "Fortunatly, DJL offers a [`Translator`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/translate/Translator.html) interface that can help you with these processing. The rough Translator architecture looks like below:\n", + "\n", + "![](https://github.com/deepjavalibrary/djl/blob/master/examples/docs/img/workFlow.png?raw=true)\n", + "\n", + "In the following sections, we will implement a `FaceTranslator` class to do the work.\n", + "\n", + "### Preprocessing\n", + "\n", + "In this stage, we will load an image and do some preprocessing work to it. Let's load the image first and take a look at it:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "String url = \"https://raw.githubusercontent.com/PaddlePaddle/PaddleHub/release/v1.5/demo/mask_detection/python/images/mask.jpg\";\n", + "Image img = ImageFactory.getInstance().fromUrl(url);\n", + "img.getWrappedImage();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then, let's try to apply some transformation to it:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "NDList processImageInput(NDManager manager, Image input, float shrink) {\n", + " NDArray array = input.toNDArray(manager);\n", + " Shape shape = array.getShape();\n", + " array = NDImageUtils.resize(\n", + " array, (int) (shape.get(1) * shrink), (int) (shape.get(0) * shrink));\n", + " array = array.transpose(2, 0, 1).flip(0); // HWC -> CHW BGR -> RGB\n", + " NDArray mean = manager.create(new float[] {104f, 117f, 123f}, new Shape(3, 1, 1));\n", + " array = array.sub(mean).mul(0.007843f); // normalization\n", + " array = array.expandDims(0); // make batch dimension\n", + " return new NDList(array);\n", + "}\n", + "\n", + "processImageInput(NDManager.newBaseManager(), img, 0.5f);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see above, we convert the image to a NDArray with shape following (number_of_batches, channel (RGB), height, width). This is the required input for the model to run object detection.\n", + "\n", + "### Postprocessing\n", + "\n", + "For postprocessing, The output is in shape of (number_of_boxes, (class_id, probability, xmin, ymin, xmax, ymax)). We can store them into the prebuilt DJL [`DetectedObjects`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/modality/cv/output/DetectedObjects.html) classes for further processing. Let's assume we have an inference output of ((1, 0.99, 0.2, 0.4, 0.5, 0.8)) and try to draw this box out." 
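+ "\n",
+ "To make the conversion below concrete: DJL's `Rectangle` takes `(x, y, width, height)`, so a model row `(class_id, probability, xmin, ymin, xmax, ymax)` becomes `Rectangle(xmin, ymin, xmax - xmin, ymax - ymin)`. The test row `(1, 0.99, 0.1, 0.1, 0.2, 0.2)` therefore maps to class \"Face\" with probability 0.99 and a box at `(0.1, 0.1)` of size `0.1 x 0.1` in fractional image coordinates."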
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "scrolled": false
+ },
+ "outputs": [],
+ "source": [
+ "DetectedObjects processImageOutput(NDList list, List<String> className, float threshold) {\n",
+ "    NDArray result = list.singletonOrThrow();\n",
+ "    float[] probabilities = result.get(\":,1\").toFloatArray();\n",
+ "    List<String> names = new ArrayList<>();\n",
+ "    List<Double> prob = new ArrayList<>();\n",
+ "    List<BoundingBox> boxes = new ArrayList<>();\n",
+ "    for (int i = 0; i < probabilities.length; i++) {\n",
+ "        if (probabilities[i] >= threshold) {\n",
+ "            float[] array = result.get(i).toFloatArray();\n",
+ "            names.add(className.get((int) array[0]));\n",
+ "            prob.add((double) probabilities[i]);\n",
+ "            boxes.add(\n",
+ "                    new Rectangle(\n",
+ "                            array[2], array[3], array[4] - array[2], array[5] - array[3]));\n",
+ "        }\n",
+ "    }\n",
+ "    return new DetectedObjects(names, prob, boxes);\n",
+ "}\n",
+ "\n",
+ "NDArray tempOutput = NDManager.newBaseManager().create(new float[]{1f, 0.99f, 0.1f, 0.1f, 0.2f, 0.2f}, new Shape(1, 6));\n",
+ "DetectedObjects testBox = processImageOutput(new NDList(tempOutput), Arrays.asList(\"Not Face\", \"Face\"), 0.7f);\n",
+ "Image newImage = img.duplicate();\n",
+ "newImage.drawBoundingBoxes(testBox);\n",
+ "newImage.getWrappedImage();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Create Translator and run inference\n",
+ "\n",
+ "After this step, you should understand how preprocessing and postprocessing work in DJL. Now, let's do something real and put them together in a single piece:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "class FaceTranslator implements NoBatchifyTranslator<Image, DetectedObjects> {\n",
+ "\n",
+ "    private float shrink;\n",
+ "    private float threshold;\n",
+ "    private List<String> className;\n",
+ "\n",
+ "    FaceTranslator(float shrink, float threshold) {\n",
+ "        this.shrink = shrink;\n",
+ "        this.threshold = threshold;\n",
+ "        className = Arrays.asList(\"Not Face\", \"Face\");\n",
+ "    }\n",
+ "\n",
+ "    @Override\n",
+ "    public DetectedObjects processOutput(TranslatorContext ctx, NDList list) {\n",
+ "        return processImageOutput(list, className, threshold);\n",
+ "    }\n",
+ "\n",
+ "    @Override\n",
+ "    public NDList processInput(TranslatorContext ctx, Image input) {\n",
+ "        return processImageInput(ctx.getNDManager(), input, shrink);\n",
+ "    }\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To run inference with this model, we need to load it from the Paddle model zoo. To load a model in DJL, you need to specify a [`Criteria`](https://javadoc.io/doc/ai.djl/api/0.24.0/ai/djl/repository/zoo/Criteria.html). `Criteria` is used to identify where to load the model from and which `Translator` should be applied to it. Then, all we need to do is get a [`Predictor`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/inference/Predictor.html) from the model and use it to do inference:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "Criteria<Image, DetectedObjects> criteria = Criteria.builder()\n",
+ "        .setTypes(Image.class, DetectedObjects.class)\n",
+ "        .optModelUrls(\"djl://ai.djl.paddlepaddle/face_detection/0.0.1/mask_detection\")\n",
+ "        .optFilter(\"flavor\", \"server\")\n",
+ "        .optTranslator(new FaceTranslator(0.5f, 0.7f))\n",
+ "        .build();\n",
+ " \n",
+ "var model = criteria.loadModel();\n",
+ "var predictor = model.newPredictor();\n",
+ "\n",
+ "DetectedObjects inferenceResult = predictor.predict(img);\n",
+ "newImage = img.duplicate();\n",
+ "newImage.drawBoundingBoxes(inferenceResult);\n",
+ "newImage.getWrappedImage();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As you can see above, it detects three faces.\n",
+ "\n",
+ "## Mask Classification model\n",
+ "\n",
+ "\n",
+ "Once we have the face locations ready, we can crop the image and feed it to the Mask Classification model for further processing.\n",
+ "\n",
+ "### Crop the image\n",
+ "\n",
+ "Each box coordinate is a value from 0 to 1 that can be mapped to the actual box pixel location by simply multiplying by the image width or height. For better accuracy on the cropped image, we extend the detection box into a square. Let's try to get a cropped image:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "int[] extendSquare(\n",
+ "        double xmin, double ymin, double width, double height, double percentage) {\n",
+ "    double centerx = xmin + width / 2;\n",
+ "    double centery = ymin + height / 2;\n",
+ "    double maxDist = Math.max(width / 2, height / 2) * (1 + percentage);\n",
+ "    return new int[] {\n",
+ "        (int) (centerx - maxDist), (int) (centery - maxDist), (int) (2 * maxDist)\n",
+ "    };\n",
+ "}\n",
+ "\n",
+ "Image getSubImage(Image img, BoundingBox box) {\n",
+ "    Rectangle rect = box.getBounds();\n",
+ "    int width = img.getWidth();\n",
+ "    int height = img.getHeight();\n",
+ "    int[] squareBox =\n",
+ "            extendSquare(\n",
+ "                    rect.getX() * width,\n",
+ "                    rect.getY() * height,\n",
+ "                    rect.getWidth() * width,\n",
+ "                    rect.getHeight() * height,\n",
+ "                    0.18);\n",
+ "    return img.getSubImage(squareBox[0], squareBox[1], squareBox[2], squareBox[2]);\n",
+ "}\n",
+ "\n",
+ "List<DetectedObjects.DetectedObject> faces = inferenceResult.items();\n",
+ "getSubImage(img, faces.get(2).getBoundingBox()).getWrappedImage();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Prepare Translator and load the model\n",
+ "\n",
+ "For the mask classification model, we can use the DJL prebuilt [`ImageClassificationTranslator`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/modality/cv/translator/ImageClassificationTranslator.html) with a few transformations. This translator provides a basic image translation process and can be extended with additional standard processing steps. So in our case, we don't have to create another `Translator` and can simply leverage this prebuilt one.\n",
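+ "\n",
+ "As a quick sanity check of the transforms used below: `ToTensor` converts HWC pixel values in [0, 255] to CHW values in [0, 1], and `Normalize` with mean 0.5 and standard deviation 1.0 then maps a value p to p - 0.5, i.e. roughly [-0.5, 0.5]. The final `flip(0)` swaps the channel order from RGB to BGR, which is what Paddle vision models typically expect."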
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "var criteria = Criteria.builder()\n",
+ "        .setTypes(Image.class, Classifications.class)\n",
+ "        .optModelUrls(\"djl://ai.djl.paddlepaddle/mask_classification/0.0.1/mask_classification\")\n",
+ "        .optFilter(\"flavor\", \"server\")\n",
+ "        .optTranslator(\n",
+ "                ImageClassificationTranslator.builder()\n",
+ "                        .addTransform(new Resize(128, 128))\n",
+ "                        .addTransform(new ToTensor()) // HWC -> CHW div(255)\n",
+ "                        .addTransform(\n",
+ "                                new Normalize(\n",
+ "                                        new float[] {0.5f, 0.5f, 0.5f},\n",
+ "                                        new float[] {1.0f, 1.0f, 1.0f}))\n",
+ "                        .addTransform(nd -> nd.flip(0)) // RGB -> BGR\n",
+ "                        .build())\n",
+ "        .build();\n",
+ "\n",
+ "var classifyModel = criteria.loadModel();\n",
+ "var classifier = classifyModel.newPredictor();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Run inference\n",
+ "\n",
+ "All we need to do now is put the previously implemented functions together: we first crop each face image and then run inference on it. After these steps, we create a new `DetectedObjects` with the new classification names:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "List<String> names = new ArrayList<>();\n",
+ "List<Double> prob = new ArrayList<>();\n",
+ "List<BoundingBox> rect = new ArrayList<>();\n",
+ "for (DetectedObjects.DetectedObject face : faces) {\n",
+ "    Image subImg = getSubImage(img, face.getBoundingBox());\n",
+ "    Classifications classifications = classifier.predict(subImg);\n",
+ "    names.add(classifications.best().getClassName());\n",
+ "    prob.add(face.getProbability());\n",
+ "    rect.add(face.getBoundingBox());\n",
+ "}\n",
+ "\n",
+ "newImage = img.duplicate();\n",
+ "newImage.drawBoundingBoxes(new DetectedObjects(names, prob, rect));\n",
+ "newImage.getWrappedImage();"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Java",
+ "language": "java",
+ "name": "java"
+ },
+ "language_info": {
+ "codemirror_mode": "java",
+ "file_extension": ".jshell",
+ "mimetype": "text/x-java-source",
+ "name": "Java",
+ "pygments_lexer": "java",
+ "version": "14.0.2+12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/jupyter/paddlepaddle/face_mask_detection_paddlepaddle_zh.ipynb b/jupyter/paddlepaddle/face_mask_detection_paddlepaddle_zh.ipynb
new file mode 100644
index 00000000..fb4074fe
--- /dev/null
+++ b/jupyter/paddlepaddle/face_mask_detection_paddlepaddle_zh.ipynb
@@ -0,0 +1,352 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# 用飛槳+ DJL 實作人臉口罩辨識\n",
+ "在這個教學中我們將會展示利用 PaddleHub 下載預訓練好的 PaddlePaddle 模型並針對範例照片做人臉口罩辨識。這個範例總共會分成兩個步驟:\n",
+ "\n",
+ "- 用臉部檢測模型識別圖片中的人臉(無論是否有戴口罩) \n",
+ "- 確認圖片中的臉是否有戴口罩\n",
+ "\n",
+ "這兩個步驟會包含使用兩個 Paddle 模型,我們會在接下來的內容介紹兩個模型對應需要做的前後處理邏輯\n",
+ "\n",
+ "## 導入相關環境依賴及子類別\n",
+ "在這個例子中的前處理飛槳深度學習引擎需要搭配 DJL 混合模式進行深度學習推理,原因是引擎本身沒有包含 NDArray 操作,因此需要藉用其他引擎的 NDArray 操作能力來完成。這邊我們導入 PyTorch 來做協同的前處理工作:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n",
+ "\n",
+ "%maven ai.djl:api:0.24.0\n",
+ "%maven ai.djl.paddlepaddle:paddlepaddle-model-zoo:0.24.0\n",
+ "%maven org.slf4j:slf4j-simple:1.7.32\n",
+ "\n",
+ "// second engine to do preprocessing and postprocessing\n",
+ "%maven ai.djl.pytorch:pytorch-engine:0.24.0"
+ ]
+ },
+ {
+ "cell_type": "code",
"execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ai.djl.*;\n", + "import ai.djl.inference.*;\n", + "import ai.djl.modality.*;\n", + "import ai.djl.modality.cv.*;\n", + "import ai.djl.modality.cv.output.*;\n", + "import ai.djl.modality.cv.transform.*;\n", + "import ai.djl.modality.cv.translator.*;\n", + "import ai.djl.modality.cv.util.*;\n", + "import ai.djl.ndarray.*;\n", + "import ai.djl.ndarray.types.Shape;\n", + "import ai.djl.repository.zoo.*;\n", + "import ai.djl.translate.*;\n", + "\n", + "import java.io.*;\n", + "import java.nio.file.*;\n", + "import java.util.*;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 臉部偵測模型\n", + "現在我們可以開始處理第一個模型,在將圖片輸入臉部檢測模型前我們必須先做一些預處理:\n", + "•\t調整圖片尺寸: 以特定比例縮小圖片\n", + "•\t用一個數值對縮小後圖片正規化\n", + "對開發者來說好消息是,DJL 提供了 Translator 介面來幫助開發做這樣的預處理. 一個比較粗略的 Translator 架構如下:\n", + "\n", + "![](https://github.com/deepjavalibrary/djl/blob/master/examples/docs/img/workFlow.png?raw=true)\n", + "\n", + "在接下來的段落,我們會利用一個 FaceTranslator 子類別實作來完成工作\n", + "### 預處理\n", + "在這個階段我們會讀取一張圖片並且對其做一些事先的預處理,讓我們先示範讀取一張圖片:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "String url = \"https://raw.githubusercontent.com/PaddlePaddle/PaddleHub/release/v1.5/demo/mask_detection/python/images/mask.jpg\";\n", + "Image img = ImageFactory.getInstance().fromUrl(url);\n", + "img.getWrappedImage();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "接著,讓我們試著對圖片做一些預處理的轉換:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "NDList processImageInput(NDManager manager, Image input, float shrink) {\n", + " NDArray array = input.toNDArray(manager);\n", + " Shape shape = array.getShape();\n", + " array = NDImageUtils.resize(\n", + " array, (int) (shape.get(1) * shrink), (int) (shape.get(0) * shrink));\n", + " array = array.transpose(2, 0, 1).flip(0); // HWC -> CHW BGR -> RGB\n", + " NDArray mean = manager.create(new float[] {104f, 117f, 123f}, new Shape(3, 1, 1));\n", + " array = array.sub(mean).mul(0.007843f); // normalization\n", + " array = array.expandDims(0); // make batch dimension\n", + " return new NDList(array);\n", + "}\n", + "\n", + "processImageInput(NDManager.newBaseManager(), img, 0.5f);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "如上述所見,我們已經把圖片轉成如下尺寸的 NDArray: (披量, 通道(RGB), 高度, 寬度). 這是物件檢測模型輸入的格式\n", + "### 後處理\n", + "當我們做後處理時, 模型輸出的格式是 (number_of_boxes, (class_id, probability, xmin, ymin, xmax, ymax)). 我們可以將其存入預先建立好的 DJL 子類別 DetectedObjects 以便做後續操作. 
我們假設有一組推論後的輸出是 ((1, 0.99, 0.2, 0.4, 0.5, 0.8)) 並且試著把人像框顯示在圖片上" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": false + }, + "outputs": [], + "source": [ + "DetectedObjects processImageOutput(NDList list, List className, float threshold) {\n", + " NDArray result = list.singletonOrThrow();\n", + " float[] probabilities = result.get(\":,1\").toFloatArray();\n", + " List names = new ArrayList<>();\n", + " List prob = new ArrayList<>();\n", + " List boxes = new ArrayList<>();\n", + " for (int i = 0; i < probabilities.length; i++) {\n", + " if (probabilities[i] >= threshold) {\n", + " float[] array = result.get(i).toFloatArray();\n", + " names.add(className.get((int) array[0]));\n", + " prob.add((double) probabilities[i]);\n", + " boxes.add(\n", + " new Rectangle(\n", + " array[2], array[3], array[4] - array[2], array[5] - array[3]));\n", + " }\n", + " }\n", + " return new DetectedObjects(names, prob, boxes);\n", + "}\n", + "\n", + "NDArray tempOutput = NDManager.newBaseManager().create(new float[]{1f, 0.99f, 0.1f, 0.1f, 0.2f, 0.2f}, new Shape(1, 6));\n", + "DetectedObjects testBox = processImageOutput(new NDList(tempOutput), Arrays.asList(\"Not Face\", \"Face\"), 0.7f);\n", + "Image newImage = img.duplicate();\n", + "newImage.drawBoundingBoxes(testBox);\n", + "newImage.getWrappedImage();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 生成一個翻譯器並執行推理任務\n", + "透過這個步驟,你會理解 DJL 中的前後處理如何運作,現在讓我們把前數的幾個步驟串在一起並對真實圖片進行操作:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class FaceTranslator implements NoBatchifyTranslator {\n", + "\n", + " private float shrink;\n", + " private float threshold;\n", + " private List className;\n", + "\n", + " FaceTranslator(float shrink, float threshold) {\n", + " this.shrink = shrink;\n", + " this.threshold = threshold;\n", + " className = Arrays.asList(\"Not Face\", \"Face\");\n", + " }\n", + "\n", + " @Override\n", + " public DetectedObjects processOutput(TranslatorContext ctx, NDList list) {\n", + " return processImageOutput(list, className, threshold);\n", + " }\n", + "\n", + " @Override\n", + " public NDList processInput(TranslatorContext ctx, Image input) {\n", + " return processImageInput(ctx.getNDManager(), input, shrink);\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "要執行這個人臉檢測推理,我們必須先從 DJL 的 Paddle Model Zoo 讀取模型,在讀取模型之前我們必須指定好 `Crieteria` . `Crieteria` 是用來確認要從哪邊讀取模型而後執行 `Translator` 來進行模型導入. 接著,我們只要利用 `Predictor` 就可以開始進行推論" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Criteria criteria = Criteria.builder()\n", + " .setTypes(Image.class, DetectedObjects.class)\n", + " .optModelUrls(\"djl://ai.djl.paddlepaddle/face_detection/0.0.1/mask_detection\")\n", + " .optFilter(\"flavor\", \"server\")\n", + " .optTranslator(new FaceTranslator(0.5f, 0.7f))\n", + " .build();\n", + " \n", + "var model = criteria.loadModel();\n", + "var predictor = model.newPredictor();\n", + "\n", + "DetectedObjects inferenceResult = predictor.predict(img);\n", + "newImage = img.duplicate();\n", + "newImage.drawBoundingBoxes(inferenceResult);\n", + "newImage.getWrappedImage();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "如圖片所示,這個推論服務已經可以正確的辨識出圖片中的三張人臉\n", + "## 口罩分類模型\n", + "一旦有了圖片的座標,我們就可以將圖片裁剪到適當大小並且將其傳給口罩分類模型做後續的推論\n", + "### 圖片裁剪\n", + "圖中方框位置的數值範圍從0到1, 只要將這個數值乘上圖片的長寬我們就可以將方框對應到圖片中的準確位置. 
為了使裁剪後的圖片有更好的精確度,我們將圖片裁剪成方形,讓我們示範一下:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "int[] extendSquare(\n", + " double xmin, double ymin, double width, double height, double percentage) {\n", + " double centerx = xmin + width / 2;\n", + " double centery = ymin + height / 2;\n", + " double maxDist = Math.max(width / 2, height / 2) * (1 + percentage);\n", + " return new int[] {\n", + " (int) (centerx - maxDist), (int) (centery - maxDist), (int) (2 * maxDist)\n", + " };\n", + "}\n", + "\n", + "Image getSubImage(Image img, BoundingBox box) {\n", + " Rectangle rect = box.getBounds();\n", + " int width = img.getWidth();\n", + " int height = img.getHeight();\n", + " int[] squareBox =\n", + " extendSquare(\n", + " rect.getX() * width,\n", + " rect.getY() * height,\n", + " rect.getWidth() * width,\n", + " rect.getHeight() * height,\n", + " 0.18);\n", + " return img.getSubImage(squareBox[0], squareBox[1], squareBox[2], squareBox[2]);\n", + "}\n", + "\n", + "List faces = inferenceResult.items();\n", + "getSubImage(img, faces.get(2).getBoundingBox()).getWrappedImage();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 事先準備 Translator 並讀取模型\n", + "在使用臉部檢測模型的時候,我們可以利用 DJL 預先建好的 `ImageClassificationTranslator` 並且加上一些轉換。這個 Translator 提供了一些基礎的圖片翻譯處理並且同時包含一些進階的標準化圖片處理。以這個例子來說, 我們不需要額外建立新的 `Translator` 而使用預先建立的就可以" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var criteria = Criteria.builder()\n", + " .setTypes(Image.class, Classifications.class)\n", + " .optModelUrls(\"djl://ai.djl.paddlepaddle/mask_classification/0.0.1/mask_classification\")\n", + " .optFilter(\"flavor\", \"server\")\n", + " .optTranslator(\n", + " ImageClassificationTranslator.builder()\n", + " .addTransform(new Resize(128, 128))\n", + " .addTransform(new ToTensor()) // HWC -> CHW div(255)\n", + " .addTransform(\n", + " new Normalize(\n", + " new float[] {0.5f, 0.5f, 0.5f},\n", + " new float[] {1.0f, 1.0f, 1.0f}))\n", + " .addTransform(nd -> nd.flip(0)) // RGB -> GBR\n", + " .build())\n", + " .build();\n", + "\n", + "var classifyModel = criteria.loadModel();\n", + "var classifier = classifyModel.newPredictor();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 執行推論任務\n", + "最後,要完成一個口罩識別的任務,我們只需要將上述的步驟合在一起即可。我們先將圖片做裁剪後並對其做上述的推論操作,結束之後再生成一個新的分類子類別 `DetectedObjects`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "List names = new ArrayList<>();\n", + "List prob = new ArrayList<>();\n", + "List rect = new ArrayList<>();\n", + "for (DetectedObjects.DetectedObject face : faces) {\n", + " Image subImg = getSubImage(img, face.getBoundingBox());\n", + " Classifications classifications = classifier.predict(subImg);\n", + " names.add(classifications.best().getClassName());\n", + " prob.add(face.getProbability());\n", + " rect.add(face.getBoundingBox());\n", + "}\n", + "\n", + "newImage = img.duplicate();\n", + "newImage.drawBoundingBoxes(new DetectedObjects(names, prob, rect));\n", + "newImage.getWrappedImage();" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Java", + "language": "java", + "name": "java" + }, + "language_info": { + "codemirror_mode": "java", + "file_extension": ".jshell", + "mimetype": "text/x-java-source", + "name": "Java", + "pygments_lexer": "java", + "version": "14.0.2+12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git 
a/jupyter/paddlepaddle/paddle_ocr_java.ipynb b/jupyter/paddlepaddle/paddle_ocr_java.ipynb
new file mode 100644
index 00000000..a7984acc
--- /dev/null
+++ b/jupyter/paddlepaddle/paddle_ocr_java.ipynb
@@ -0,0 +1,313 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# PaddleOCR DJL example\n",
+ "\n",
+ "In this tutorial, we will use pretrained PaddlePaddle models from [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) to do optical character recognition (OCR) on a given image. There are three models involved in this tutorial:\n",
+ "\n",
+ "- Word detection model: used to detect the word blocks in the image\n",
+ "- Word direction model: used to determine whether the text needs to be rotated\n",
+ "- Word recognition model: used to recognize the text inside a word block\n",
+ "\n",
+ "## Import dependencies and classes\n",
+ "\n",
+ "PaddlePaddle is one of the deep learning engines that require DJL's hybrid mode to run inference. It does not contain NDArray operations itself and needs a supplemental DL framework to help with that, so we also import the PyTorch engine here to do the processing work."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n",
+ "\n",
+ "%maven ai.djl:api:0.24.0\n",
+ "%maven ai.djl.paddlepaddle:paddlepaddle-model-zoo:0.24.0\n",
+ "%maven org.slf4j:slf4j-simple:1.7.32\n",
+ "\n",
+ "// second engine to do preprocessing and postprocessing\n",
+ "%maven ai.djl.pytorch:pytorch-engine:0.24.0"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import ai.djl.*;\n",
+ "import ai.djl.inference.Predictor;\n",
+ "import ai.djl.modality.Classifications;\n",
+ "import ai.djl.modality.cv.Image;\n",
+ "import ai.djl.modality.cv.ImageFactory;\n",
+ "import ai.djl.modality.cv.output.*;\n",
+ "import ai.djl.modality.cv.util.NDImageUtils;\n",
+ "import ai.djl.ndarray.*;\n",
+ "import ai.djl.ndarray.types.DataType;\n",
+ "import ai.djl.ndarray.types.Shape;\n",
+ "import ai.djl.repository.zoo.*;\n",
+ "import ai.djl.paddlepaddle.zoo.cv.objectdetection.PpWordDetectionTranslator;\n",
+ "import ai.djl.paddlepaddle.zoo.cv.imageclassification.PpWordRotateTranslator;\n",
+ "import ai.djl.paddlepaddle.zoo.cv.wordrecognition.PpWordRecognitionTranslator;\n",
+ "import ai.djl.translate.*;\n",
+ "import java.util.concurrent.ConcurrentHashMap;"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## The image\n",
+ "First, let's take a look at our sample image, a flight ticket:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "String url = \"https://resources.djl.ai/images/flight_ticket.jpg\";\n",
+ "Image img = ImageFactory.getInstance().fromUrl(url);\n",
+ "img.getWrappedImage();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Word detection model\n",
+ "\n",
+ "For word detection, we load a model exported from [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.0/doc/doc_en/inference_en.md#convert-detection-model-to-inference-model). After that, we can spawn a DJL `Predictor` from it, called `detector`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "var criteria1 = Criteria.builder()\n",
+ "        .optEngine(\"PaddlePaddle\")\n",
+ "        .setTypes(Image.class, DetectedObjects.class)\n",
+ "        .optModelUrls(\"https://resources.djl.ai/test-models/paddleOCR/mobile/det_db.zip\")\n",
+ "        .optTranslator(new PpWordDetectionTranslator(new ConcurrentHashMap<String, String>()))\n",
+ "        .build();\n",
+ "var detectionModel = criteria1.loadModel();\n",
+ "var detector = detectionModel.newPredictor();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Then, we can detect the word blocks in the image. The raw output from the model is a bitmap that marks all word regions. The `PpWordDetectionTranslator` converts the output bitmap into rectangular bounding boxes that we can use to crop the image."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "var detectedObj = detector.predict(img);\n",
+ "Image newImage = img.duplicate();\n",
+ "newImage.drawBoundingBoxes(detectedObj);\n",
+ "newImage.getWrappedImage();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As you can see above, the word blocks are very narrow and do not include the whole body of every word. Let's try to extend them a bit for a better result. `extendRect` extends the box height and width by a certain scale. `getSubImage` crops the image and extracts the word block."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "Image getSubImage(Image img, BoundingBox box) {\n",
+ "    Rectangle rect = box.getBounds();\n",
+ "    double[] extended = extendRect(rect.getX(), rect.getY(), rect.getWidth(), rect.getHeight());\n",
+ "    int width = img.getWidth();\n",
+ "    int height = img.getHeight();\n",
+ "    int[] recovered = {\n",
+ "        (int) (extended[0] * width),\n",
+ "        (int) (extended[1] * height),\n",
+ "        (int) (extended[2] * width),\n",
+ "        (int) (extended[3] * height)\n",
+ "    };\n",
+ "    return img.getSubImage(recovered[0], recovered[1], recovered[2], recovered[3]);\n",
+ "}\n",
+ "\n",
+ "double[] extendRect(double xmin, double ymin, double width, double height) {\n",
+ "    double centerx = xmin + width / 2;\n",
+ "    double centery = ymin + height / 2;\n",
+ "    if (width > height) {\n",
+ "        width += height * 2.0;\n",
+ "        height *= 3.0;\n",
+ "    } else {\n",
+ "        height += width * 2.0;\n",
+ "        width *= 3.0;\n",
+ "    }\n",
+ "    double newX = centerx - width / 2 < 0 ? 0 : centerx - width / 2;\n",
+ "    double newY = centery - height / 2 < 0 ? 0 : centery - height / 2;\n",
+ "    double newWidth = newX + width > 1 ? 1 - newX : width;\n",
+ "    double newHeight = newY + height > 1 ? 1 - newY : height;\n",
+ "    return new double[] {newX, newY, newWidth, newHeight};\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's try to extract one block out:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "List<DetectedObjects.DetectedObject> boxes = detectedObj.items();\n",
+ "var sample = getSubImage(img, boxes.get(5).getBoundingBox());\n",
+ "sample.getWrappedImage();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Word Direction model\n",
+ "\n",
+ "This model, exported from [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.0/doc/doc_en/inference_en.md#convert-angle-classification-model-to-inference-model), helps to identify whether the image needs to be rotated. The following code loads the model and creates a `rotateClassifier`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "var criteria2 = Criteria.builder()\n",
+ "        .optEngine(\"PaddlePaddle\")\n",
+ "        .setTypes(Image.class, Classifications.class)\n",
+ "        .optModelUrls(\"https://resources.djl.ai/test-models/paddleOCR/mobile/cls.zip\")\n",
+ "        .optTranslator(new PpWordRotateTranslator())\n",
+ "        .build();\n",
+ "var rotateModel = criteria2.loadModel();\n",
+ "var rotateClassifier = rotateModel.newPredictor();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Word Recognition model\n",
+ "\n",
+ "The word recognition model, exported from [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.0/doc/doc_en/inference_en.md#convert-recognition-model-to-inference-model), recognizes the text in the image. Let's load this model as well.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "var criteria3 = Criteria.builder()\n",
+ "        .optEngine(\"PaddlePaddle\")\n",
+ "        .setTypes(Image.class, String.class)\n",
+ "        .optModelUrls(\"https://resources.djl.ai/test-models/paddleOCR/mobile/rec_crnn.zip\")\n",
+ "        .optTranslator(new PpWordRecognitionTranslator())\n",
+ "        .build();\n",
+ "var recognitionModel = criteria3.loadModel();\n",
+ "var recognizer = recognitionModel.newPredictor();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Then we can try these two models on the previously cropped image:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "System.out.println(rotateClassifier.predict(sample));\n",
+ "recognizer.predict(sample);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Finally, let's run these models on the whole image and see the outcome. DJL offers a rich image toolkit that allows you to draw the text on the image and display it.\n",
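+ "\n",
+ "The loop below applies two rotation heuristics. If a cropped block is much taller than it is wide (aspect ratio above 1.5), we assume vertical text and rotate it 90 degrees before classification; then, if the direction classifier predicts \"Rotate\" with probability above 0.8, we rotate it once more. The probability of -1 passed to `DetectedObjects` is just a placeholder, since the recognizer does not return a confidence score."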
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Image rotateImg(Image image) {\n", + " try (NDManager manager = NDManager.newBaseManager()) {\n", + " NDArray rotated = NDImageUtils.rotate90(image.toNDArray(manager), 1);\n", + " return ImageFactory.getInstance().fromNDArray(rotated);\n", + " }\n", + "}\n", + "\n", + "List names = new ArrayList<>();\n", + "List prob = new ArrayList<>();\n", + "List rect = new ArrayList<>();\n", + "\n", + "for (int i = 0; i < boxes.size(); i++) {\n", + " Image subImg = getSubImage(img, boxes.get(i).getBoundingBox());\n", + " if (subImg.getHeight() * 1.0 / subImg.getWidth() > 1.5) {\n", + " subImg = rotateImg(subImg);\n", + " }\n", + " Classifications.Classification result = rotateClassifier.predict(subImg).best();\n", + " if (\"Rotate\".equals(result.getClassName()) && result.getProbability() > 0.8) {\n", + " subImg = rotateImg(subImg);\n", + " }\n", + " String name = recognizer.predict(subImg);\n", + " names.add(name);\n", + " prob.add(-1.0);\n", + " rect.add(boxes.get(i).getBoundingBox());\n", + "}\n", + "newImage.drawBoundingBoxes(new DetectedObjects(names, prob, rect));\n", + "newImage.getWrappedImage();" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Java", + "language": "java", + "name": "java" + }, + "language_info": { + "codemirror_mode": "java", + "file_extension": ".jshell", + "mimetype": "text/x-java-source", + "name": "Java", + "pygments_lexer": "java", + "version": "14.0.2+12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/jupyter/paddlepaddle/paddle_ocr_java_zh.ipynb b/jupyter/paddlepaddle/paddle_ocr_java_zh.ipynb new file mode 100644 index 00000000..1e60b733 --- /dev/null +++ b/jupyter/paddlepaddle/paddle_ocr_java_zh.ipynb @@ -0,0 +1,309 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# PaddleOCR在DJL 上的實現\n", + "在這個教程裡,我們會展示利用 PaddleOCR 下載預訓練好文字處理模型並對指定的照片進行文學文字檢測 (OCR)。這個教程總共會分成三個部分:\n", + "\n", + "- 文字區塊檢測: 從圖片檢測出文字區塊\n", + "- 文字角度檢測: 確認文字是否需要旋轉\n", + "- 文字識別: 確認區塊內的文字\n", + "\n", + "## 導入相關環境依賴及子類別\n", + "在這個例子中的前處理飛槳深度學習引擎需要搭配DJL混合模式進行深度學習推理,原因是引擎本身沒有包含ND數組操作,因此需要藉用其他引擎的數組操作能力來完成。這邊我們導入Pytorch來做協同的前處理工作:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n", + "\n", + "%maven ai.djl:api:0.24.0\n", + "%maven ai.djl.paddlepaddle:paddlepaddle-model-zoo:0.24.0\n", + "%maven org.slf4j:slf4j-simple:1.7.32\n", + "\n", + "// second engine to do preprocessing and postprocessing\n", + "%maven ai.djl.pytorch:pytorch-engine:0.24.0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ai.djl.*;\n", + "import ai.djl.inference.Predictor;\n", + "import ai.djl.modality.Classifications;\n", + "import ai.djl.modality.cv.Image;\n", + "import ai.djl.modality.cv.ImageFactory;\n", + "import ai.djl.modality.cv.output.*;\n", + "import ai.djl.modality.cv.util.NDImageUtils;\n", + "import ai.djl.ndarray.*;\n", + "import ai.djl.ndarray.types.DataType;\n", + "import ai.djl.ndarray.types.Shape;\n", + "import ai.djl.repository.zoo.*;\n", + "import ai.djl.paddlepaddle.zoo.cv.objectdetection.PpWordDetectionTranslator;\n", + "import ai.djl.paddlepaddle.zoo.cv.imageclassification.PpWordRotateTranslator;\n", + "import ai.djl.paddlepaddle.zoo.cv.wordrecognition.PpWordRecognitionTranslator;\n", + "import ai.djl.translate.*;\n", + 
"import java.util.concurrent.ConcurrentHashMap;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 圖片讀取\n", + "首先讓我們載入這次教程會用到的機票範例圖片:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "String url = \"https://resources.djl.ai/images/flight_ticket.jpg\";\n", + "Image img = ImageFactory.getInstance().fromUrl(url);\n", + "img.getWrappedImage();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 文字區塊檢測\n", + "我們首先從 [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.0/doc/doc_en/inference_en.md#convert-detection-model-to-inference-model) 開發套件中讀取文字檢測的模型,之後我們可以生成一個DJL `Predictor` 並將其命名為 `detector`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var criteria1 = Criteria.builder()\n", + " .optEngine(\"PaddlePaddle\")\n", + " .setTypes(Image.class, DetectedObjects.class)\n", + " .optModelUrls(\"https://resources.djl.ai/test-models/paddleOCR/mobile/det_db.zip\")\n", + " .optTranslator(new PpWordDetectionTranslator(new ConcurrentHashMap()))\n", + " .build();\n", + "var detectionModel = criteria1.loadModel();\n", + "var detector = detectionModel.newPredictor();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "接著我們檢測出圖片中的文字區塊,這個模型的原始輸出是含有標註所有文字區域的圖算法(Bitmap),我們可以利用`PpWordDetectionTranslator` 函式將圖算法的輸出轉成長方形的方框來裁剪圖片" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var detectedObj = detector.predict(img);\n", + "Image newImage = img.duplicate();\n", + "newImage.drawBoundingBoxes(detectedObj);\n", + "newImage.getWrappedImage();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "如上所示,所標註的文字區塊都非常窄,且沒有包住所有完整的文字區塊。讓我們嘗試使用`extendRect`函式來擴展文字框的長寬到需要的大小, 再利用 `getSubImage` 裁剪並擷取出文子區塊。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Image getSubImage(Image img, BoundingBox box) {\n", + " Rectangle rect = box.getBounds();\n", + " double[] extended = extendRect(rect.getX(), rect.getY(), rect.getWidth(), rect.getHeight());\n", + " int width = img.getWidth();\n", + " int height = img.getHeight();\n", + " int[] recovered = {\n", + " (int) (extended[0] * width),\n", + " (int) (extended[1] * height),\n", + " (int) (extended[2] * width),\n", + " (int) (extended[3] * height)\n", + " };\n", + " return img.getSubImage(recovered[0], recovered[1], recovered[2], recovered[3]);\n", + "}\n", + "\n", + "double[] extendRect(double xmin, double ymin, double width, double height) {\n", + " double centerx = xmin + width / 2;\n", + " double centery = ymin + height / 2;\n", + " if (width > height) {\n", + " width += height * 2.0;\n", + " height *= 3.0;\n", + " } else {\n", + " height += width * 2.0;\n", + " width *= 3.0;\n", + " }\n", + " double newX = centerx - width / 2 < 0 ? 0 : centerx - width / 2;\n", + " double newY = centery - height / 2 < 0 ? 0 : centery - height / 2;\n", + " double newWidth = newX + width > 1 ? 1 - newX : width;\n", + " double newHeight = newY + height > 1 ? 
1 - newY : height;\n", + " return new double[] {newX, newY, newWidth, newHeight};\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "讓我們輸出其中一個文字區塊" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "List boxes = detectedObj.items();\n", + "var sample = getSubImage(img, boxes.get(5).getBoundingBox());\n", + "sample.getWrappedImage();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 文字角度檢測\n", + "我們從 [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.0/doc/doc_en/inference_en.md#convert-angle-classification-model-to-inference-model) 輸出這個模型並確認圖片及文字是否需要旋轉。以下的代碼會讀入這個模型並生成a `rotateClassifier` 子類別" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var criteria2 = Criteria.builder()\n", + " .optEngine(\"PaddlePaddle\")\n", + " .setTypes(Image.class, Classifications.class)\n", + " .optModelUrls(\"https://resources.djl.ai/test-models/paddleOCR/mobile/cls.zip\")\n", + " .optTranslator(new PpWordRotateTranslator())\n", + " .build();\n", + "var rotateModel = criteria2.loadModel();\n", + "var rotateClassifier = rotateModel.newPredictor();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 文字識別\n", + "\n", + "我們從 [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.0/doc/doc_en/inference_en.md#convert-recognition-model-to-inference-model) 輸出這個模型並識別圖片中的文字, 我們一樣仿造上述的步驟讀取這個模型\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var criteria3 = Criteria.builder()\n", + " .optEngine(\"PaddlePaddle\")\n", + " .setTypes(Image.class, String.class)\n", + " .optModelUrls(\"https://resources.djl.ai/test-models/paddleOCR/mobile/rec_crnn.zip\")\n", + " .optTranslator(new PpWordRecognitionTranslator())\n", + " .build();\n", + "var recognitionModel = criteria3.loadModel();\n", + "var recognizer = recognitionModel.newPredictor();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "接著我們可以試著套用這兩個模型在先前剪裁好的文字區塊上" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "System.out.println(rotateClassifier.predict(sample));\n", + "recognizer.predict(sample);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "最後我們把這些模型串連在一起並套用在整張圖片上看看結果會如何。DJL提供了豐富的影像工具包讓你可以從圖片中擷取出文字並且完美呈現" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Image rotateImg(Image image) {\n", + " try (NDManager manager = NDManager.newBaseManager()) {\n", + " NDArray rotated = NDImageUtils.rotate90(image.toNDArray(manager), 1);\n", + " return ImageFactory.getInstance().fromNDArray(rotated);\n", + " }\n", + "}\n", + "\n", + "List names = new ArrayList<>();\n", + "List prob = new ArrayList<>();\n", + "List rect = new ArrayList<>();\n", + "\n", + "for (int i = 0; i < boxes.size(); i++) {\n", + " Image subImg = getSubImage(img, boxes.get(i).getBoundingBox());\n", + " if (subImg.getHeight() * 1.0 / subImg.getWidth() > 1.5) {\n", + " subImg = rotateImg(subImg);\n", + " }\n", + " Classifications.Classification result = rotateClassifier.predict(subImg).best();\n", + " if (\"Rotate\".equals(result.getClassName()) && result.getProbability() > 0.8) {\n", + " subImg = rotateImg(subImg);\n", + " }\n", + " String name = recognizer.predict(subImg);\n", + " names.add(name);\n", + " 
prob.add(-1.0);\n",
+ "    rect.add(boxes.get(i).getBoundingBox());\n",
+ "}\n",
+ "newImage.drawBoundingBoxes(new DetectedObjects(names, prob, rect));\n",
+ "newImage.getWrappedImage();"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Java",
+ "language": "java",
+ "name": "java"
+ },
+ "language_info": {
+ "codemirror_mode": "java",
+ "file_extension": ".jshell",
+ "mimetype": "text/x-java-source",
+ "name": "Java",
+ "pygments_lexer": "java",
+ "version": "14.0.2+12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/jupyter/pytorch/load_your_own_pytorch_bert.ipynb b/jupyter/pytorch/load_your_own_pytorch_bert.ipynb
new file mode 100644
index 00000000..bd0f281b
--- /dev/null
+++ b/jupyter/pytorch/load_your_own_pytorch_bert.ipynb
@@ -0,0 +1,441 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Load your own PyTorch BERT model\n",
+ "\n",
+ "In the previous [example](https://docs.djl.ai/docs/demos/jupyter/BERTQA.html), you ran BERT inference with a model from the model zoo. You can also load your own pre-trained BERT model and use custom classes as the input and output.\n",
+ "\n",
+ "In general, the PyTorch BERT model from [HuggingFace](https://github.com/huggingface/transformers) requires these three inputs:\n",
+ "\n",
+ "- word indices: the index of each word in a sentence\n",
+ "- word types: the type (segment) index of each word\n",
+ "- attention mask: a mask that indicates to the model which tokens should be attended to and which should not, after batching sequences together\n",
+ "\n",
+ "We will dive deep into these details later."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Preparation\n",
+ "\n",
+ "This tutorial requires the installation of the Java Kernel. To install the Java Kernel, see the [README](https://docs.djl.ai/docs/demos/jupyter/index.html)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "These are the dependencies we will use:"
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n", + "\n", + "%maven ai.djl:api:0.24.0\n", + "%maven ai.djl.pytorch:pytorch-engine:0.24.0\n", + "%maven ai.djl.pytorch:pytorch-model-zoo:0.24.0\n", + "%maven org.slf4j:slf4j-simple:1.7.32" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Import java packages" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import java.io.*;\n", + "import java.nio.file.*;\n", + "import java.util.*;\n", + "import java.util.stream.*;\n", + "\n", + "import ai.djl.*;\n", + "import ai.djl.ndarray.*;\n", + "import ai.djl.ndarray.types.*;\n", + "import ai.djl.inference.*;\n", + "import ai.djl.translate.*;\n", + "import ai.djl.training.util.*;\n", + "import ai.djl.repository.zoo.*;\n", + "import ai.djl.modality.nlp.*;\n", + "import ai.djl.modality.nlp.qa.*;\n", + "import ai.djl.modality.nlp.bert.*;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Reuse the previous input**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var question = \"When did BBC Japan start broadcasting?\";\n", + "var resourceDocument = \"BBC Japan was a general entertainment Channel.\\n\" +\n", + " \"Which operated between December 2004 and April 2006.\\n\" +\n", + " \"It ceased operations after its Japanese distributor folded.\";\n", + "\n", + "QAInput input = new QAInput(question, resourceDocument);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dive deep into Translator\n", + "\n", + "Inference in deep learning is the process of predicting the output for a given input based on a pre-defined model.\n", + "DJL abstracts away the whole process for ease of use. It can load the model, perform inference on the input, and provide\n", + "output. DJL also allows you to provide user-defined inputs. The workflow looks like the following:\n", + "\n", + "![https://github.com/deepjavalibrary/djl/blob/master/examples/docs/img/workFlow.png?raw=true](https://github.com/deepjavalibrary/djl/blob/master/examples/docs/img/workFlow.png?raw=true)\n", + "\n", + "The red block (\"Images\") in the workflow is the input that DJL expects from you. The green block (\"Images\n", + "bounding box\") is the output that you expect. Because DJL does not know which input to expect and which output format that you prefer, DJL provides the [`Translator`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/translate/Translator.html) interface so you can define your own\n", + "input and output.\n", + "\n", + "The `Translator` interface encompasses the two white blocks: Pre-processing and Post-processing. The pre-processing\n", + "component converts the user-defined input objects into an NDList, so that the [`Predictor`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/inference/Predictor.html) in DJL can understand the\n", + "input and make its prediction. Similarly, the post-processing block receives an NDList as the output from the\n", + "`Predictor`. The post-processing block allows you to convert the output from the `Predictor` to the desired output\n", + "format." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Pre-processing\n", + "\n", + "Now, you need to convert the sentences into tokens. 
We provide a powerful tool [`BertTokenizer`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/modality/nlp/bert/BertTokenizer.html) that you can use to convert questions and answers into tokens, and batchify your sequences together. Once you have properly formatted tokens, you can use [`Vocabulary`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/modality/nlp/Vocabulary.html) to map your tokens to BERT indices.\n",
+ "\n",
+ "The following code block demonstrates tokenizing the question and answer defined earlier into BERT-formatted tokens."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "var tokenizer = new BertTokenizer();\n",
+ "List<String> tokenQ = tokenizer.tokenize(question.toLowerCase());\n",
+ "List<String> tokenA = tokenizer.tokenize(resourceDocument.toLowerCase());\n",
+ "\n",
+ "System.out.println(\"Question Token: \" + tokenQ);\n",
+ "System.out.println(\"Answer Token: \" + tokenA);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "`BertTokenizer` can also help you batchify questions and resource documents together by calling `encode()`.\n",
+ "The output contains information that BERT ingests.\n",
+ "\n",
+ "- getTokens: It returns a list of strings including the question, the resource document, and special tokens that let the model tell which part is the question and which part is the resource document.\n",
+ "- getTokenTypes: It returns a list of type indices of the words to indicate the location of the resource document. All question tokens will be labelled with 0 and all resource document tokens will be labelled with 1.\n",
+ "\n",
+ "    [Question tokens...DocResourceTokens...padding tokens] => [000000...11111....0000]\n",
+ " \n",
+ "\n",
+ "- getValidLength: It returns the actual length of the question and resource document tokens, which is required by MXNet BERT.\n",
+ "- getAttentionMask: It returns the mask for the model to indicate which part should be paid attention to and which part is the padding. It is required by PyTorch BERT.\n",
+ "\n",
+ "    [Question tokens...DocResourceTokens...padding tokens] => [111111...11111....0000]\n",
+ " \n",
+ "PyTorch BERT was trained with various sequence lengths, so we don't need to pad the tokens."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "BertToken token = tokenizer.encode(question.toLowerCase(), resourceDocument.toLowerCase());\n",
+ "System.out.println(\"Encoded tokens: \" + token.getTokens());\n",
+ "System.out.println(\"Encoded token type: \" + token.getTokenTypes());\n",
+ "System.out.println(\"Valid length: \" + token.getValidLength());"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Normally, words and sentences are represented as indices instead of tokens for training. \n",
+ "They typically work like vectors in an n-dimensional space. In this case, you need to map the tokens to indices.\n",
+ "DJL provides `Vocabulary` to take care of your vocabulary mapping.\n",
+ "\n",
+ "The BERT vocab from HuggingFace has the following format.\n",
+ "```\n",
+ "[PAD]\n",
+ "[unused0]\n",
+ "[unused1]\n",
+ "[unused2]\n",
+ "[unused3]\n",
+ "[unused4]\n",
+ "[unused5]\n",
+ "[unused6]\n",
+ "[unused7]\n",
+ "[unused8]\n",
+ "...\n",
+ "```\n",
+ "We provide the `bert-base-uncased-vocab.txt` from our pre-trained BERT for demonstration.\n",
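+ "\n",
+ "A token's index is simply its zero-based line number in this file, so `[PAD]` maps to index 0, `[unused0]` to index 1, and so on."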
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "DownloadUtils.download(\"https://djl-ai.s3.amazonaws.com/mlrepo/model/nlp/question_answer/ai/djl/pytorch/bertqa/0.0.1/bert-base-uncased-vocab.txt.gz\", \"build/pytorch/bertqa/vocab.txt\", new ProgressBar());"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "var path = Paths.get(\"build/pytorch/bertqa/vocab.txt\");\n",
+ "var vocabulary = DefaultVocabulary.builder()\n",
+ "        .optMinFrequency(1)\n",
+ "        .addFromTextFile(path)\n",
+ "        .optUnknownToken(\"[UNK]\")\n",
+ "        .build();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You can easily convert a token to its index using `vocabulary.getIndex(token)` and the other way around using `vocabulary.getToken(index)`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "long index = vocabulary.getIndex(\"car\");\n",
+ "String token = vocabulary.getToken(2482);\n",
+ "System.out.println(\"The index of the car is \" + index);\n",
+ "System.out.println(\"The token of the index 2482 is \" + token);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To convert the token, token type, and attention mask lists into the primitive arrays needed for `NDArray` creation, the translator below simply maps them with Java streams."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now that you have everything you need, you can create an NDList and populate all of the inputs you formatted earlier. You're done with pre-processing! \n",
+ "\n",
+ "#### Construct `Translator`\n",
+ "\n",
+ "You need to do this processing within an implementation of the `Translator` interface. `Translator` is designed to do pre-processing and post-processing. You must define the input and output objects. It requires you to override the following two methods:\n",
+ "- `public NDList processInput(TranslatorContext ctx, I input)`\n",
+ "- `public O processOutput(TranslatorContext ctx, NDList list)`\n",
+ "\n",
+ "Every translator takes in input and returns output in the form of generic objects. In this case, the translator takes input in the form of `QAInput` (I) and returns output as a `String` (O). `QAInput` is just an object that holds the question and paragraph; we have prepared this input class for you."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Armed with the needed knowledge, you can write an implementation of the `Translator` interface. `BertTranslator` uses the code snippets explained previously to implement the `processInput` method. For more information, see [`NDManager`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/ndarray/NDManager.html).\n",
+ "\n",
+ "```\n",
+ "manager.create(Number[] data, Shape)\n",
+ "manager.create(Number[] data)\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "public class BertTranslator implements Translator<QAInput, String> {\n",
+ "    private List<String> tokens;\n",
+ "    private Vocabulary vocabulary;\n",
+ "    private BertTokenizer tokenizer;\n",
+ "    \n",
+ "    @Override\n",
+ "    public void prepare(TranslatorContext ctx) throws IOException {\n",
+ "        Path path = Paths.get(\"build/pytorch/bertqa/vocab.txt\");\n",
+ "        vocabulary = DefaultVocabulary.builder()\n",
+ "                .optMinFrequency(1)\n",
+ "                .addFromTextFile(path)\n",
+ "                .optUnknownToken(\"[UNK]\")\n",
+ "                .build();\n",
+ "        tokenizer = new BertTokenizer();\n",
+ "    }\n",
+ "    \n",
+ "    @Override\n",
+ "    public NDList processInput(TranslatorContext ctx, QAInput input) {\n",
+ "        BertToken token =\n",
+ "                tokenizer.encode(\n",
+ "                        input.getQuestion().toLowerCase(),\n",
+ "                        input.getParagraph().toLowerCase());\n",
+ "        // keep the encoded tokens; they will be used in processOutput\n",
+ "        tokens = token.getTokens();\n",
+ "        NDManager manager = ctx.getNDManager();\n",
+ "        // map the tokens (String) to indices (long)\n",
+ "        long[] indices = tokens.stream().mapToLong(vocabulary::getIndex).toArray();\n",
+ "        long[] attentionMask = token.getAttentionMask().stream().mapToLong(i -> i).toArray();\n",
+ "        long[] tokenType = token.getTokenTypes().stream().mapToLong(i -> i).toArray();\n",
+ "        NDArray indicesArray = manager.create(indices);\n",
+ "        NDArray attentionMaskArray =\n",
+ "                manager.create(attentionMask);\n",
+ "        NDArray tokenTypeArray = manager.create(tokenType);\n",
+ "        // The order matters\n",
+ "        return new NDList(indicesArray, attentionMaskArray, tokenTypeArray);\n",
+ "    }\n",
+ "    \n",
+ "    @Override\n",
+ "    public String processOutput(TranslatorContext ctx, NDList list) {\n",
+ "        NDArray startLogits = list.get(0);\n",
+ "        NDArray endLogits = list.get(1);\n",
+ "        int startIdx = (int) startLogits.argMax().getLong();\n",
+ "        int endIdx = (int) endLogits.argMax().getLong();\n",
+ "        return tokens.subList(startIdx, endIdx + 1).toString();\n",
+ "    }\n",
+ "    \n",
+ "    @Override\n",
+ "    public Batchifier getBatchifier() {\n",
+ "        return Batchifier.STACK;\n",
+ "    }\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Congrats! You have created your first Translator! We have pre-filled the `processOutput()` function to process the `NDList` and return it in the desired format. `processInput()` and `processOutput()` offer the flexibility to get the predictions from the model in any format you desire. \n",
+ "\n",
+ "With the Translator implemented, you need to bring up the predictor that uses your `Translator` to start making predictions. You can find the usage for `Predictor` in the [Predictor Javadoc](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/inference/Predictor.html). Create a translator and use the `question` and `resourceDocument` provided previously.\n",
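+ "\n",
+ "One caveat worth noting: the `processOutput` above takes the `argMax` of the start and end logits independently. That keeps the example short, but it implicitly assumes the best start index comes before the best end index; a production decoder would typically search for the highest-scoring valid span instead."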
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "DownloadUtils.download(\"https://djl-ai.s3.amazonaws.com/mlrepo/model/nlp/question_answer/ai/djl/pytorch/bertqa/0.0.1/trace_bertqa.pt.gz\", \"build/pytorch/bertqa/bertqa.pt\", new ProgressBar());" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "BertTranslator translator = new BertTranslator();\n", + "\n", + "Criteria criteria = Criteria.builder()\n", + " .setTypes(QAInput.class, String.class)\n", + " .optModelPath(Paths.get(\"build/pytorch/bertqa/\")) // search in local folder\n", + " .optTranslator(translator)\n", + " .optProgress(new ProgressBar()).build();\n", + "\n", + "ZooModel model = criteria.loadModel();" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "String predictResult = null;\n", + "QAInput input = new QAInput(question, resourceDocument);\n", + "\n", + "// Create a Predictor and use it to predict the output\n", + "try (Predictor predictor = model.newPredictor(translator)) {\n", + " predictResult = predictor.predict(input);\n", + "}\n", + "\n", + "System.out.println(question);\n", + "System.out.println(predictResult);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Based on the input, the following result will be shown:\n", + "```\n", + "[december, 2004]\n", + "```\n", + "That's it! \n", + "\n", + "You can try with more questions and answers. Here are the samples:\n", + "\n", + "**Answer Material**\n", + "\n", + "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (\"Norman\" comes from \"Norseman\") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants would gradually merge with the Carolingian-based cultures of West Francia. The distinct cultural and ethnic identity of the Normans emerged initially in the first half of the 10th century, and it continued to evolve over the succeeding centuries.\n", + "\n", + "\n", + "**Question**\n", + "\n", + "Q: When were the Normans in Normandy?\n", + "A: 10th and 11th centuries\n", + "\n", + "Q: In what country is Normandy located?\n", + "A: france\n", + "\n", + "For the full source code, see the [DJL repo](https://github.com/deepjavalibrary/djl/blob/master/examples/src/main/java/ai/djl/examples/inference/BertQaInference.java) and translator implementation [MXNet](https://github.com/deepjavalibrary/djl/blob/master/engines/mxnet/mxnet-model-zoo/src/main/java/ai/djl/mxnet/zoo/nlp/qa/MxBertQATranslator.java) [PyTorch](https://github.com/deepjavalibrary/djl/blob/master/engines/pytorch/pytorch-model-zoo/src/main/java/ai/djl/pytorch/zoo/nlp/qa/PtBertQATranslator.java)." 
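+ "\n",
+ "For instance, a quick (hypothetical) follow-up with one of the sample questions above could look like this, where `normanParagraph` is a `String` holding the answer material shown above:\n",
+ "\n",
+ "```java\n",
+ "QAInput sample = new QAInput(\"In what country is Normandy located?\", normanParagraph);\n",
+ "try (Predictor<QAInput, String> p = model.newPredictor(translator)) {\n",
+ "    System.out.println(p.predict(sample));\n",
+ "}\n",
+ "```"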
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Java",
+ "language": "java",
+ "name": "java"
+ },
+ "language_info": {
+ "codemirror_mode": "java",
+ "file_extension": ".jshell",
+ "mimetype": "text/x-java-source",
+ "name": "Java",
+ "pygments_lexer": "java",
+ "version": "14.0.2+12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/jupyter/rank_classification_using_BERT_on_Amazon_Review.ipynb b/jupyter/rank_classification_using_BERT_on_Amazon_Review.ipynb
new file mode 100644
index 00000000..5ddecd9f
--- /dev/null
+++ b/jupyter/rank_classification_using_BERT_on_Amazon_Review.ipynb
@@ -0,0 +1,473 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Rank Classification using BERT on Amazon Review dataset\n",
+ "\n",
+ "## Introduction\n",
+ "\n",
+ "In this tutorial, you learn how to train a rank classification model using [Transfer Learning](https://en.wikipedia.org/wiki/Transfer_learning). We will fine-tune a pretrained DistilBERT model on the Amazon review dataset.\n",
+ "\n",
+ "## About the dataset and model\n",
+ "\n",
+ "The [Amazon Customer Review dataset](https://s3.amazonaws.com/amazon-reviews-pds/readme.html) consists of valid reviews from amazon.com. We will use the \"Digital_software\" category, which consists of 102k valid reviews. As for the pre-trained model, we use the DistilBERT[[1]](https://arxiv.org/abs/1910.01108) model. It's a light-weight BERT model already trained on [Wikipedia text corpora](https://en.wikipedia.org/wiki/List_of_text_corpora), a much larger corpus consisting of millions of texts. DistilBERT serves as the base layer, and we will add classification layers on top that output a ranking (1 - 5).\n",
+ "\n",
+ "\n",
+ "*Amazon Review example*\n",
\n", + "\n", + "We will use review body as our data input and ranking as label.\n", + "\n", + "\n", + "## Pre-requisites\n", + "This tutorial assumes you have the following knowledge. Follow the READMEs and tutorials if you are not familiar with:\n", + "1. How to setup and run [Java Kernel in Jupyter Notebook](https://docs.djl.ai/docs/demos/jupyter/index.html)\n", + "2. Basic components of Deep Java Library, and how to [train your first model](https://docs.djl.ai/docs/demos/jupyter/tutorial/02_train_your_first_model.html).\n", + "\n", + "\n", + "## Getting started\n", + "Load the Deep Java Libarary and its dependencies from Maven. In here, you can choose between MXNet or PyTorch. MXNet is enabled by default. You can uncomment PyTorch dependencies and comment MXNet ones to switch to PyTorch." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n", + "\n", + "%maven ai.djl:api:0.24.0\n", + "%maven ai.djl:basicdataset:0.24.0\n", + "%maven org.slf4j:slf4j-simple:1.7.32\n", + "%maven ai.djl.mxnet:mxnet-model-zoo:0.24.0\n", + "\n", + "// PyTorch\n", + "// %maven ai.djl.pytorch:pytorch-model-zoo:0.24.0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's import the necessary modules:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ai.djl.*;\n", + "import ai.djl.basicdataset.tabular.*;\n", + "import ai.djl.basicdataset.tabular.utils.*;\n", + "import ai.djl.basicdataset.utils.*;\n", + "import ai.djl.engine.*;\n", + "import ai.djl.inference.*;\n", + "import ai.djl.metric.*;\n", + "import ai.djl.modality.*;\n", + "import ai.djl.modality.nlp.*;\n", + "import ai.djl.modality.nlp.bert.*;\n", + "import ai.djl.ndarray.*;\n", + "import ai.djl.ndarray.types.*;\n", + "import ai.djl.nn.*;\n", + "import ai.djl.nn.core.*;\n", + "import ai.djl.nn.norm.*;\n", + "import ai.djl.repository.zoo.*;\n", + "import ai.djl.training.*;\n", + "import ai.djl.training.dataset.*;\n", + "import ai.djl.training.evaluator.*;\n", + "import ai.djl.training.listener.*;\n", + "import ai.djl.training.loss.*;\n", + "import ai.djl.training.util.*;\n", + "import ai.djl.translate.*;\n", + "import java.io.*;\n", + "import java.nio.file.*;\n", + "import java.util.*;\n", + "import org.apache.commons.csv.*;\n", + "\n", + "System.out.println(\"You are using: \" + Engine.getInstance().getEngineName() + \" Engine\");" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prepare Dataset\n", + "\n", + "First step is to prepare the dataset for training. Since the original data was in TSV format, we can use CSVDataset to be the dataset container. We will also need to specify how do we want to preprocess the raw data. For BERT model, the input data are required to be tokenized and mapped into indices based on the inputs. In DJL, we defined an interface called Fearurizer, it is designed to allow user customize operation on each selected row/column of a dataset. In our case, we would like to clean and tokenize our sentencies. So let's try to implement it to deal with customer review sentencies." 
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "final class BertFeaturizer implements Featurizer {\n",
+ "\n",
+ "    private final BertFullTokenizer tokenizer;\n",
+ "    private final int maxLength; // the cut-off length\n",
+ "\n",
+ "    public BertFeaturizer(BertFullTokenizer tokenizer, int maxLength) {\n",
+ "        this.tokenizer = tokenizer;\n",
+ "        this.maxLength = maxLength;\n",
+ "    }\n",
+ "\n",
+ "    /** {@inheritDoc} */\n",
+ "    @Override\n",
+ "    public void featurize(DynamicBuffer buf, String input) {\n",
+ "        Vocabulary vocab = tokenizer.getVocabulary();\n",
+ "        // convert sentence to tokens (toLowerCase for uncased model)\n",
+ "        List tokens = tokenizer.tokenize(input.toLowerCase());\n",
+ "        // trim the tokens to maxLength\n",
+ "        tokens = tokens.size() > maxLength ? tokens.subList(0, maxLength) : tokens;\n",
+ "        // BERT embedding convention \"[CLS] Your Sentence [SEP]\"\n",
+ "        buf.put(vocab.getIndex(\"[CLS]\"));\n",
+ "        tokens.forEach(token -> buf.put(vocab.getIndex(token)));\n",
+ "        buf.put(vocab.getIndex(\"[SEP]\"));\n",
+ "    }\n",
+ "\n",
+ "    /** {@inheritDoc} */\n",
+ "    @Override\n",
+ "    public int dataRequired() {\n",
+ "        throw new IllegalStateException(\"BertFeaturizer only supports featurize, not deFeaturize\");\n",
+ "    }\n",
+ "\n",
+ "    /** {@inheritDoc} */\n",
+ "    @Override\n",
+ "    public Object deFeaturize(float[] data) {\n",
+ "        throw new IllegalStateException(\"BertFeaturizer only supports featurize, not deFeaturize\");\n",
+ "    }\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Once this part is done, we can apply the `BertFeaturizer` to our dataset. We take the `review_body` column and apply the `Featurizer` to it. We also pick `star_rating` as our label. Since we use batched input, we need to tell the dataset to pad our data if a row is shorter than the `maxLength` we defined. `PaddingStackBatchifier` will do that work for you."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "CsvDataset getDataset(int batchSize, BertFullTokenizer tokenizer, int maxLength, int limit) {\n",
+ "    String amazonReview =\n",
+ "            \"https://mlrepo.djl.ai/dataset/nlp/ai/djl/basicdataset/amazon_reviews/1.0/amazon_reviews_us_Digital_Software_v1_00.tsv.gz\";\n",
+ "    float paddingToken = tokenizer.getVocabulary().getIndex(\"[PAD]\");\n",
+ "    return CsvDataset.builder()\n",
+ "            .optCsvUrl(amazonReview) // load from URL\n",
+ "            .setCsvFormat(CSVFormat.TDF.withQuote(null).withHeader()) // set the TSV loading format\n",
+ "            .setSampling(batchSize, true) // set batch size and enable random sampling\n",
+ "            .optLimit(limit)\n",
+ "            .addFeature(\n",
+ "                    new Feature(\n",
+ "                            \"review_body\", new BertFeaturizer(tokenizer, maxLength)))\n",
+ "            .addLabel(\n",
+ "                    new Feature(\n",
+ "                            \"star_rating\", (buf, data) -> buf.put(Float.parseFloat(data) - 1.0f)))\n",
+ "            .optDataBatchifier(\n",
+ "                    PaddingStackBatchifier.builder()\n",
+ "                            .optIncludeValidLengths(false)\n",
+ "                            .addPad(0, 0, (m) -> m.ones(new Shape(1)).mul(paddingToken))\n",
+ "                            .build()) // define how to pad the dataset to a fixed length\n",
+ "            .build();\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Construct your model\n",
+ "\n",
+ "We will load our pretrained model and prepare the classification head. First construct the `criteria` to specify where to load the embedding (DistilBERT), then call `loadModel` to download that embedding with pre-trained weights. 
Since this model is built without a classification layer, we need to add one to the end of the model and train it. After you are done modifying the block, set it back to the model using `setBlock`.\n",
+ "\n",
+ "### Load the word embedding\n",
+ "\n",
+ "We will download our word embedding and load it into memory (this may take a while)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "// MXNet base model\n",
+ "String modelUrls = \"https://resources.djl.ai/test-models/distilbert.zip\";\n",
+ "if (\"PyTorch\".equals(Engine.getInstance().getEngineName())) {\n",
+ "    modelUrls = \"https://resources.djl.ai/test-models/traced_distilbert_wikipedia_uncased.zip\";\n",
+ "}\n",
+ "\n",
+ "Criteria criteria = Criteria.builder()\n",
+ "        .optApplication(Application.NLP.WORD_EMBEDDING)\n",
+ "        .setTypes(NDList.class, NDList.class)\n",
+ "        .optModelUrls(modelUrls)\n",
+ "        .optProgress(new ProgressBar())\n",
+ "        .build();\n",
+ "ZooModel embedding = criteria.loadModel();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Create classification layers\n",
+ "\n",
+ "Then let's build a simple MLP to classify the ranks. We set the output of the last FullyConnected (Linear) layer to 5 to get predictions for stars 1 through 5. Then all we need to do is load the block into the model. Before applying the classification layers, we also need to add the text embedding to the front. In our case, we create a Lambda function that does the following:\n",
+ "\n",
+ "1. batch_data (batch size, token indices) -> batch_data + max_length (size of the token indices)\n",
+ "2. generate embedding"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "Predictor embedder = embedding.newPredictor();\n",
+ "Block classifier = new SequentialBlock()\n",
+ "        // text embedding layer\n",
+ "        .add(\n",
+ "            ndList -> {\n",
+ "                NDArray data = ndList.singletonOrThrow();\n",
+ "                NDList inputs = new NDList();\n",
+ "                long batchSize = data.getShape().get(0);\n",
+ "                float maxLength = data.getShape().get(1);\n",
+ "\n",
+ "                if (\"PyTorch\".equals(Engine.getInstance().getEngineName())) {\n",
+ "                    inputs.add(data.toType(DataType.INT64, false));\n",
+ "                    inputs.add(data.getManager().full(data.getShape(), 1, DataType.INT64));\n",
+ "                    inputs.add(data.getManager().arange(maxLength)\n",
+ "                            .toType(DataType.INT64, false)\n",
+ "                            .broadcast(data.getShape()));\n",
+ "                } else {\n",
+ "                    inputs.add(data);\n",
+ "                    inputs.add(data.getManager().full(new Shape(batchSize), maxLength));\n",
+ "                }\n",
+ "                // run embedding\n",
+ "                try {\n",
+ "                    return embedder.predict(inputs);\n",
+ "                } catch (TranslateException e) {\n",
+ "                    throw new IllegalArgumentException(\"embedding error\", e);\n",
+ "                }\n",
+ "            })\n",
+ "        // classification layers\n",
+ "        .add(Linear.builder().setUnits(768).build()) // pre classifier\n",
+ "        .add(Activation::relu)\n",
+ "        .add(Dropout.builder().optRate(0.2f).build())\n",
+ "        .add(Linear.builder().setUnits(5).build()) // 5 star rating\n",
+ "        .addSingleton(nd -> nd.get(\":,0\")); // Take [CLS] as the head\n",
+ "Model model = Model.newInstance(\"AmazonReviewRatingClassification\");\n",
+ "model.setBlock(classifier);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Start Training\n",
+ "\n",
+ "Finally, we can start building our training pipeline to train the model.\n",
+ "\n",
+ "### Creating Training and Testing dataset\n",
+ "\n",
+ "First, we 
need to create a vocabulary that maps each token to an index, such as \"hello\" to 1121 (1121 being the index of \"hello\" in the dictionary). Then we feed the vocabulary to the tokenizer, which is used to tokenize the sentences. Finally, we split the dataset with a given ratio.\n",
+ "\n",
+ "Note: we set the cut-off length to 64, which means only the first 64 tokens from each review will be used. You can increase this value to achieve better accuracy."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "// Prepare the vocabulary\n",
+ "DefaultVocabulary vocabulary = DefaultVocabulary.builder()\n",
+ "        .addFromTextFile(embedding.getArtifact(\"vocab.txt\"))\n",
+ "        .optUnknownToken(\"[UNK]\")\n",
+ "        .build();\n",
+ "// Prepare dataset\n",
+ "int maxTokenLength = 64; // cutoff tokens length\n",
+ "int batchSize = 8;\n",
+ "int limit = Integer.MAX_VALUE;\n",
+ "// int limit = 512; // uncomment for quick testing\n",
+ "\n",
+ "BertFullTokenizer tokenizer = new BertFullTokenizer(vocabulary, true);\n",
+ "CsvDataset amazonReviewDataset = getDataset(batchSize, tokenizer, maxTokenLength, limit);\n",
+ "// split data with 7:3 train:valid ratio\n",
+ "RandomAccessDataset[] datasets = amazonReviewDataset.randomSplit(7, 3);\n",
+ "RandomAccessDataset trainingSet = datasets[0];\n",
+ "RandomAccessDataset validationSet = datasets[1];"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Setup Trainer and training config\n",
+ "\n",
+ "Then, we need to set up our trainer. We set up an accuracy evaluator and the loss function. The model training logs will be saved to `build/model`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "SaveModelTrainingListener listener = new SaveModelTrainingListener(\"build/model\");\n",
+ "listener.setSaveModelCallback(\n",
+ "        trainer -> {\n",
+ "            TrainingResult result = trainer.getTrainingResult();\n",
+ "            Model model = trainer.getModel();\n",
+ "            // track accuracy and loss\n",
+ "            float accuracy = result.getValidateEvaluation(\"Accuracy\");\n",
+ "            model.setProperty(\"Accuracy\", String.format(\"%.5f\", accuracy));\n",
+ "            model.setProperty(\"Loss\", String.format(\"%.5f\", result.getValidateLoss()));\n",
+ "        });\n",
+ "DefaultTrainingConfig config = new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss()) // loss type\n",
+ "        .addEvaluator(new Accuracy())\n",
+ "        .optDevices(Engine.getInstance().getDevices(1)) // train using a single GPU\n",
+ "        .addTrainingListeners(TrainingListener.Defaults.logging(\"build/model\"))\n",
+ "        .addTrainingListeners(listener);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Start training\n",
+ "\n",
+ "We will now start the training process. Training on a GPU takes approximately 10 minutes; on a CPU, it can take more than 2 hours to finish.\n",
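+ "\n",
+ "Before kicking off the full run, a quick way to sanity-check the data pipeline is to pull a single batch and inspect its shape (a sketch using the setup above; the second dimension is the padded token length, so it can be slightly above 64 once the [CLS]/[SEP] tokens are added):\n",
+ "\n",
+ "```\n",
+ "Batch sample = trainingSet.getData(NDManager.newBaseManager()).iterator().next();\n",
+ "System.out.println(sample.getData().head().getShape()); // e.g. (8, 66)\n",
+ "sample.close();\n",
+ "```"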
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "int epoch = 2;\n", + "\n", + "Trainer trainer = model.newTrainer(config);\n", + "trainer.setMetrics(new Metrics());\n", + "Shape encoderInputShape = new Shape(batchSize, maxTokenLength);\n", + "// initialize trainer with proper input shape\n", + "trainer.initialize(encoderInputShape);\n", + "EasyTrain.fit(trainer, epoch, trainingSet, validationSet);\n", + "System.out.println(trainer.getTrainingResult());" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Save the model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model.save(Paths.get(\"build/model\"), \"amazon-review.param\");" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Verify the model\n", + "\n", + "We can create a predictor from the model to run inference on our customized dataset. Firstly, we can create a `Translator` for the model to do preprocessing and post processing. Similar to what we have done before, we need to tokenize the input sentence and get the output ranking." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class MyTranslator implements Translator {\n", + "\n", + " private BertFullTokenizer tokenizer;\n", + " private Vocabulary vocab;\n", + " private List ranks;\n", + "\n", + " public MyTranslator(BertFullTokenizer tokenizer) {\n", + " this.tokenizer = tokenizer;\n", + " vocab = tokenizer.getVocabulary();\n", + " ranks = Arrays.asList(\"1\", \"2\", \"3\", \"4\", \"5\");\n", + " }\n", + "\n", + " @Override\n", + " public Batchifier getBatchifier() { return Batchifier.STACK; }\n", + "\n", + " @Override\n", + " public NDList processInput(TranslatorContext ctx, String input) {\n", + " List tokens = tokenizer.tokenize(input);\n", + " float[] indices = new float[tokens.size() + 2];\n", + " indices[0] = vocab.getIndex(\"[CLS]\");\n", + " for (int i = 0; i < tokens.size(); i++) {\n", + " indices[i+1] = vocab.getIndex(tokens.get(i));\n", + " }\n", + " indices[indices.length - 1] = vocab.getIndex(\"[SEP]\");\n", + " return new NDList(ctx.getNDManager().create(indices));\n", + " }\n", + "\n", + " @Override\n", + " public Classifications processOutput(TranslatorContext ctx, NDList list) {\n", + " return new Classifications(ranks, list.singletonOrThrow().softmax(0));\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, we can create a `Predictor` to run the inference. 
Let's try with a random customer review:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "String review = \"It works great, but it takes too long to update itself and slows the system\";\n",
+ "Predictor predictor = model.newPredictor(new MyTranslator(tokenizer));\n",
+ "\n",
+ "predictor.predict(review)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Java",
+ "language": "java",
+ "name": "java"
+ },
+ "language_info": {
+ "codemirror_mode": "java",
+ "file_extension": ".jshell",
+ "mimetype": "text/x-java-source",
+ "name": "Java",
+ "pygments_lexer": "java",
+ "version": "14.0.2+12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/jupyter/tensorflow/pneumonia_detection.ipynb b/jupyter/tensorflow/pneumonia_detection.ipynb
new file mode 100644
index 00000000..76bcc28f
--- /dev/null
+++ b/jupyter/tensorflow/pneumonia_detection.ipynb
@@ -0,0 +1,243 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Detecting Pneumonia from X-ray images using Deep Java Library"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*Disclaimer: this blog post is intended for educational purposes only. The application was developed using experimental code. The result should not be used for any medical diagnoses of pneumonia. This content has not been reviewed or approved by any scientists or medical professionals.*\n",
+ "\n",
+ "## Introduction\n",
+ "In this example, we demonstrate how deep learning (DL) can be used to detect pneumonia from chest X-ray images. This work is inspired by the [Chest X-ray Images Challenge](https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia) on Kaggle and a related [paper](https://www.cell.com/cell/fulltext/S0092-8674\\(18\\)30154-5). In this notebook, we illustrate how artificial intelligence can assist clinical decision making, with a focus on enterprise deployment. This work leverages a model trained using Keras and TensorFlow with [this Kaggle kernel](https://www.kaggle.com/aakashnain/beating-everything-with-depthwise-convolution). In this blog post, we will focus on generating predictions with this model using [Deep Java Library](https://djl.ai/) (DJL), an open source library to build and deploy DL in Java."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Preparation\n",
+ "\n",
+ "This tutorial requires the installation of the Java Kernel. 
To install the Java Kernel, see the [documentation](https://docs.djl.ai/docs/demos/jupyter/index.html).\n",
+ "\n",
+ "These are the dependencies we will use:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n",
+ "\n",
+ "%maven ai.djl:api:0.24.0\n",
+ "%maven ai.djl.tensorflow:tensorflow-api:0.24.0\n",
+ "%maven ai.djl.tensorflow:tensorflow-engine:0.24.0\n",
+ "%maven ai.djl.tensorflow:tensorflow-model-zoo:0.24.0\n",
+ "%maven org.slf4j:slf4j-simple:1.7.32"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%loadFromPOM\n",
+ "<dependency>\n",
+ "    <groupId>com.google.protobuf</groupId>\n",
+ "    <artifactId>protobuf-java</artifactId>\n",
+ "    <version>3.19.2</version>\n",
+ "</dependency>"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Import java packages"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import ai.djl.inference.*;\n",
+ "import ai.djl.modality.*;\n",
+ "import ai.djl.modality.cv.*;\n",
+ "import ai.djl.modality.cv.util.*;\n",
+ "import ai.djl.ndarray.*;\n",
+ "import ai.djl.repository.zoo.*;\n",
+ "import ai.djl.translate.*;\n",
+ "import ai.djl.training.util.*;\n",
+ "import ai.djl.util.*;\n",
+ "import java.net.*;\n",
+ "import java.nio.file.*;\n",
+ "import java.util.*;"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Set the model URL"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "var modelUrl = \"https://resources.djl.ai/demo/pneumonia-detection-model/saved_model.zip\";"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Dive deep into Translator\n",
+ "\n",
+ "To successfully run inference, we need to define some pre-processing and post-processing logic to achieve the best\n",
+ "prediction result and an understandable output."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "class MyTranslator implements Translator {\n",
+ "\n",
+ "    private static final List CLASSES = Arrays.asList(\"Normal\", \"Pneumonia\");\n",
+ "\n",
+ "    @Override\n",
+ "    public NDList processInput(TranslatorContext ctx, Image input) {\n",
+ "        NDManager manager = ctx.getNDManager();\n",
+ "        NDArray array = input.toNDArray(manager, Image.Flag.COLOR);\n",
+ "        array = NDImageUtils.resize(array, 224).div(255.0f);\n",
+ "        return new NDList(array);\n",
+ "    }\n",
+ "\n",
+ "    @Override\n",
+ "    public Classifications processOutput(TranslatorContext ctx, NDList list) {\n",
+ "        NDArray probabilities = list.singletonOrThrow();\n",
+ "        return new Classifications(CLASSES, probabilities);\n",
+ "    }\n",
+ "\n",
+ "    @Override\n",
+ "    public Batchifier getBatchifier() {\n",
+ "        return Batchifier.STACK;\n",
+ "    }\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As you can see above, the translator resizes the image to 224x224 and normalizes the image by dividing by 255 before feeding it into the model. When doing inference, you need to follow the same pre-processing procedure as was used during training. In this case, we need to match the Keras training code. After running prediction, the model outputs probabilities of each class as an [NDArray](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/ndarray/NDArray.html). 
We need to tell the predictor to translate it back to classes, namely \"Normal\" or \"Pneumonia\".\n",
+ "\n",
+ "At this point, all preparation work is done, and we can start working on the prediction logic."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Predict using DJL\n",
+ "\n",
+ "### Load the image\n",
+ "We are going to load an X-ray image of an infected lung from the internet:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "var imagePath = \"https://resources.djl.ai/images/chest_xray.jpg\";\n",
+ "var image = ImageFactory.getInstance().fromUrl(imagePath);\n",
+ "image.getWrappedImage();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Load your model\n",
+ "Next, we will download the model from `modelUrl`. This will download the model into the DJL cache location."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "Criteria criteria =\n",
+ "        Criteria.builder()\n",
+ "                .setTypes(Image.class, Classifications.class)\n",
+ "                .optModelUrls(modelUrl)\n",
+ "                .optTranslator(new MyTranslator())\n",
+ "                .optProgress(new ProgressBar())\n",
+ "                .build();\n",
+ "ZooModel model = criteria.loadModel();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Run inference\n",
+ "Lastly, we will need to create a predictor using our model and translator. Once we have a predictor, we simply need to call the predict method on our test image."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "Predictor predictor = model.newPredictor();\n",
+ "Classifications classifications = predictor.predict(image);\n",
+ "\n",
+ "classifications"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Java",
+ "language": "java",
+ "name": "java"
+ },
+ "language_info": {
+ "codemirror_mode": "java",
+ "file_extension": ".jshell",
+ "mimetype": "text/x-java-source",
+ "name": "Java",
+ "pygments_lexer": "java",
+ "version": "14.0.2+12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/jupyter/tensorflow/rank_classification_using_BERT_on_Amazon_Review.ipynb b/jupyter/tensorflow/rank_classification_using_BERT_on_Amazon_Review.ipynb
new file mode 100644
index 00000000..e4563535
--- /dev/null
+++ b/jupyter/tensorflow/rank_classification_using_BERT_on_Amazon_Review.ipynb
@@ -0,0 +1,267 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Rank Classification using BERT on Amazon Review\n",
+ "\n",
+ "## Introduction\n",
+ "\n",
+ "In this tutorial, you learn how to use a pre-trained TensorFlow model to classify the rank of an Amazon review. The model was fine-tuned on the Amazon review dataset starting from a pretrained DistilBERT model.\n",
+ "\n",
+ "### About the dataset and model\n",
+ "\n",
+ "The [Amazon Customer Review dataset](https://s3.amazonaws.com/amazon-reviews-pds/readme.html) consists of valid reviews from amazon.com. We will use the \"Digital_software\" category, which consists of 102k valid reviews. As for the pre-trained model, we use the DistilBERT[[1]](https://arxiv.org/abs/1910.01108) model. It's a light-weight BERT model already trained on [Wikipedia text corpora](https://en.wikipedia.org/wiki/List_of_text_corpora), a much larger corpus consisting of millions of texts. 
DistilBERT serves as the base layer, and classification layers were added on top that output a ranking (1 - 5).\n",
+ "\n",
+ "\n",
+ "*Amazon Review example*\n",
\n", + "\n", + "\n", + "## Pre-requisites\n", + "This tutorial assumes you have the following knowledge. Follow the READMEs and tutorials if you are not familiar with:\n", + "1. How to setup and run [Java Kernel in Jupyter Notebook](https://docs.djl.ai/docs/demos/jupyter/index.html)\n", + "2. Basic components of Deep Java Library, and how to [train your first model](https://docs.djl.ai/docs/demos/jupyter/tutorial/02_train_your_first_model.html).\n", + "\n", + "\n", + "## Getting started\n", + "Load the Deep Java Libarary and its dependencies from Maven. In here, you can choose between MXNet or PyTorch. MXNet is enabled by default. You can uncomment PyTorch dependencies and comment MXNet ones to switch to PyTorch." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n", + "\n", + "%maven ai.djl:api:0.24.0\n", + "%maven ai.djl.tensorflow:tensorflow-engine:0.24.0\n", + "%maven ai.djl.tensorflow:tensorflow-api:0.24.0\n", + "%maven org.slf4j:slf4j-simple:1.7.32" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%loadFromPOM\n", + "\n", + " com.google.protobuf\n", + " protobuf-java\n", + " 3.19.2\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's import the necessary modules:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ai.djl.*;\n", + "import ai.djl.engine.*;\n", + "import ai.djl.inference.*;\n", + "import ai.djl.modality.*;\n", + "import ai.djl.modality.nlp.*;\n", + "import ai.djl.modality.nlp.bert.*;\n", + "import ai.djl.ndarray.*;\n", + "import ai.djl.repository.zoo.*;\n", + "import ai.djl.translate.*;\n", + "import ai.djl.training.util.*;\n", + "import ai.djl.util.*;\n", + "\n", + "import java.io.*;\n", + "import java.nio.file.*;\n", + "import java.util.*;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prepare your model files\n", + "\n", + "You can download pre-trained Tensorflow model from: https://resources.djl.ai/demo/tensorflow/amazon_review_rank_classification.zip." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "String modelUrl = \"https://resources.djl.ai/demo/tensorflow/amazon_review_rank_classification.zip\";\n", + "DownloadUtils.download(modelUrl, \"build/amazon_review_rank_classification.zip\", new ProgressBar());\n", + "Path zipFile = Paths.get(\"build/amazon_review_rank_classification.zip\");\n", + "\n", + "Path modelDir = Paths.get(\"build/saved_model\");\n", + "if (Files.notExists(modelDir)) {\n", + " ZipUtils.unzip(Files.newInputStream(zipFile), modelDir); \n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Translator\n", + "\n", + "Inference in deep learning is the process of predicting the output for a given input based on a pre-defined model.\n", + "DJL abstracts away the whole process for ease of use. It can load the model, perform inference on the input, and provide output.\n", + "\n", + "The [`Translator`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/translate/Translator.html) interface is used to: Pre-processing and Post-processing. 
The pre-processing\n",
+ "component converts the user-defined input objects into an NDList, so that the [`Predictor`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/inference/Predictor.html) in DJL can understand the\n",
+ "input and make its prediction. Similarly, the post-processing block receives an NDList as the output from the\n",
+ "`Predictor`. The post-processing block allows you to convert the output from the `Predictor` to the desired output\n",
+ "format.\n",
+ "\n",
+ "### Pre-processing\n",
+ "\n",
+ "Now, you need to convert the sentences into tokens. We provide a powerful tool [`BertTokenizer`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/modality/nlp/bert/BertTokenizer.html) that you can use to convert sentences into tokens and batchify your sequence together. Once you have properly formatted tokens, you can use [`Vocabulary`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/modality/nlp/Vocabulary.html) to map each token to a BERT index.\n",
+ "\n",
+ "The following code block demonstrates tokenizing an input sentence into BERT-formatted tokens.\n",
+ "\n",
+ "In the zip file, we also bundled the BERT `vocab.txt` file."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "// Prepare the vocabulary\n",
+ "Path vocabFile = modelDir.resolve(\"vocab.txt\");\n",
+ "DefaultVocabulary vocabulary = DefaultVocabulary.builder()\n",
+ "        .optMinFrequency(1)\n",
+ "        .addFromTextFile(vocabFile)\n",
+ "        .optUnknownToken(\"[UNK]\")\n",
+ "        .build();\n",
+ "BertFullTokenizer tokenizer = new BertFullTokenizer(vocabulary, true);\n",
+ "int maxTokenLength = 64; // cutoff tokens length\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "class MyTranslator implements Translator {\n",
+ "\n",
+ "    private BertFullTokenizer tokenizer;\n",
+ "    private Vocabulary vocab;\n",
+ "    private List ranks;\n",
+ "    private int length;\n",
+ "\n",
+ "    public MyTranslator(BertFullTokenizer tokenizer, int length) {\n",
+ "        this.tokenizer = tokenizer;\n",
+ "        this.length = length;\n",
+ "        vocab = tokenizer.getVocabulary();\n",
+ "        ranks = Arrays.asList(\"1\", \"2\", \"3\", \"4\", \"5\");\n",
+ "    }\n",
+ "\n",
+ "    @Override\n",
+ "    public Batchifier getBatchifier() {\n",
+ "        return Batchifier.STACK;\n",
+ "    }\n",
+ "\n",
+ "    @Override\n",
+ "    public NDList processInput(TranslatorContext ctx, String input) {\n",
+ "        List tokens = tokenizer.tokenize(input);\n",
+ "        long[] indices = new long[length];\n",
+ "        long[] mask = new long[length];\n",
+ "        long[] segmentIds = new long[length];\n",
+ "        // token indices start at slot 1, so cap at length - 1 to stay inside the buffer\n",
+ "        int size = Math.min(length - 1, tokens.size());\n",
+ "        for (int i = 0; i < size; i++) {\n",
+ "            indices[i + 1] = vocab.getIndex(tokens.get(i));\n",
+ "        }\n",
+ "        Arrays.fill(mask, 0, size, 1);\n",
+ "        NDManager m = ctx.getNDManager();\n",
+ "        return new NDList(m.create(indices), m.create(mask), m.create(segmentIds));\n",
+ "    }\n",
+ "\n",
+ "    @Override\n",
+ "    public Classifications processOutput(TranslatorContext ctx, NDList list) {\n",
+ "        return new Classifications(ranks, list.singletonOrThrow().softmax(0));\n",
+ "    }\n",
+ "}\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Load your model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "MyTranslator translator = new MyTranslator(tokenizer, maxTokenLength);\n",
+ "\n",
+ "Criteria criteria = Criteria.builder()\n",
+ "    .setTypes(String.class, Classifications.class)\n",
+ "    .optModelPath(modelDir) // load the model from the model directory\n",
+ "    .optTranslator(translator) // use the custom translator\n",
+ "    .build();\n",
+ "\n",
+ "ZooModel model = criteria.loadModel();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Run inference\n",
+ "\n",
+ "Lastly, we will need to create a predictor using our model and translator. Once we have a predictor, we simply need to call the predict method on our test review."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "String review = \"It works great, but it takes too long to update itself and slows the system\";\n",
+ "\n",
+ "Predictor predictor = model.newPredictor();\n",
+ "Classifications classifications = predictor.predict(review);\n",
+ "\n",
+ "classifications"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Java",
+ "language": "java",
+ "name": "java"
+ },
+ "language_info": {
+ "codemirror_mode": "java",
+ "file_extension": ".jshell",
+ "mimetype": "text/x-java-source",
+ "name": "Java",
+ "pygments_lexer": "java",
+ "version": "14.0.2+12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/jupyter/tensorflow_lite/inference_with_tensorflow_lite.ipynb b/jupyter/tensorflow_lite/inference_with_tensorflow_lite.ipynb
new file mode 100644
index 00000000..0b42d247
--- /dev/null
+++ b/jupyter/tensorflow_lite/inference_with_tensorflow_lite.ipynb
@@ -0,0 +1,156 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Inference with TensorFlow Lite\n",
+ "\n",
+ "In this tutorial, you learn how to load an existing TensorFlow Lite model and use it to run a prediction task.\n",
+ "\n",
+ "\n",
+ "## Preparation\n",
+ "\n",
+ "This tutorial requires the installation of the Java Kernel. For more information on installing the Java Kernel, see the [README](https://docs.djl.ai/docs/demos/jupyter/index.html).\n",
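+ "\n",
+ "Once the dependency and import cells below have been run, a quick sanity check like the following (a hedged sketch; `Engine` lives in `ai.djl.engine`) confirms that the TFLite engine was picked up:\n",
+ "\n",
+ "```\n",
+ "import ai.djl.engine.Engine;\n",
+ "\n",
+ "// throws an exception if the TFLite engine is not on the classpath\n",
+ "System.out.println(Engine.getEngine(\"TFLite\").getEngineName());\n",
+ "```"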
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n",
+ "\n",
+ "%maven ai.djl:api:0.24.0\n",
+ "%maven ai.djl:model-zoo:0.24.0\n",
+ "%maven ai.djl.tflite:tflite-engine:0.24.0\n",
+ "%maven org.slf4j:slf4j-simple:1.7.32\n",
+ "\n",
+ "// Use secondary engine to help pre-processing and post-processing\n",
+ "%maven ai.djl.pytorch:pytorch-engine:0.24.0"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import java.awt.image.*;\n",
+ "import java.nio.file.*;\n",
+ "import ai.djl.*;\n",
+ "import ai.djl.inference.*;\n",
+ "import ai.djl.ndarray.*;\n",
+ "import ai.djl.modality.*;\n",
+ "import ai.djl.modality.cv.*;\n",
+ "import ai.djl.modality.cv.util.*;\n",
+ "import ai.djl.modality.cv.transform.*;\n",
+ "import ai.djl.modality.cv.translator.*;\n",
+ "import ai.djl.repository.zoo.*;\n",
+ "import ai.djl.translate.*;\n",
+ "import ai.djl.training.util.*;\n",
+ "import ai.djl.util.*;"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 1: Load your TensorFlow Lite model from the DJL model zoo"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "Criteria criteria = Criteria.builder()\n",
+ "        .setTypes(Image.class, Classifications.class)\n",
+ "        .optEngine(\"TFLite\")\n",
+ "        .optFilter(\"dataset\", \"aiyDish\")\n",
+ "        .build();\n",
+ "ZooModel model = criteria.loadModel();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 2: Create a Predictor"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "Predictor predictor = model.newPredictor();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 3: Load image for classification"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "var img = ImageFactory.getInstance().fromUrl(\"https://resources.djl.ai/images/sachertorte.jpg\");\n",
+ "\n",
+ "img.getWrappedImage()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 4: Run inference"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "Classifications classifications = predictor.predict(img);\n",
+ "\n",
+ "classifications"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Summary\n",
+ "\n",
+ "Now, you can load a TensorFlow Lite model and run inference with it.\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Java",
+ "language": "java",
+ "name": "java"
+ },
+ "language_info": {
+ "codemirror_mode": "java",
+ "file_extension": ".jshell",
+ "mimetype": "text/x-java-source",
+ "name": "Java",
+ "pygments_lexer": "java",
+ "version": "14.0.2+12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/jupyter/test_notebook.sh b/jupyter/test_notebook.sh
new file mode 100755
index 00000000..a4cd2166
--- /dev/null
+++ b/jupyter/test_notebook.sh
@@ -0,0 +1,26 @@
+#!/usr/bin/env bash
+
+# test_notebook.sh [filename]
+# If no filename is passed, it runs all files in current directory and subdirectories
+
+set -e
+
+function run_test {
+    base=$(basename "$1")
+    # Workaround on crashes
+    if [[ "$base" == transfer_learning_on_cifar10* || "$base" == rank_classification_using_BERT* ]]; then
+        
jupyter nbconvert --to notebook --inplace "$1"
+    else
+        jupyter nbconvert --to notebook --execute --ExecutePreprocessor.timeout=600 --inplace "$1"
+    fi
+}
+
+if [[ $# -eq 0 ]]; then
+    for f in {**,.}/*.ipynb
+    do
+        dir=$(dirname "$f")
+        run_test "$f"
+    done
+else
+    run_test "$1"
+fi
diff --git a/jupyter/transfer_learning_on_cifar10.ipynb b/jupyter/transfer_learning_on_cifar10.ipynb
new file mode 100644
index 00000000..7666566c
--- /dev/null
+++ b/jupyter/transfer_learning_on_cifar10.ipynb
@@ -0,0 +1,285 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Transfer Learning on CIFAR-10 Dataset\n",
+ "\n",
+ "\n",
+ "## Introduction\n",
+ "\n",
+ "In this tutorial, you learn how to train an image classification model using [Transfer Learning](https://en.wikipedia.org/wiki/Transfer_learning). Transfer learning is a popular machine learning technique that uses a model trained on one problem and applies it to a second related problem. Compared to training from scratch or designing a model for your specific problem, transfer learning can leverage the features already learned on a similar problem and produce a more robust model in a much shorter time.\n",
+ "\n",
+ "Train your model with the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset, which consists of 60,000 32x32 color images in 10 classes. As for the pre-trained model, use the ResNet50v1[1] model. It's a 50-layer deep model already trained on [ImageNet](http://www.image-net.org/), a much larger dataset consisting of over 1.2 million images in 1000 classes. Modify it to classify 10 classes from the CIFAR-10 dataset.\n",
+ "\n",
+ "![The CIFAR-10 Dataset](https://resources.djl.ai/images/cifar-10.png)\n",
+ "*the CIFAR-10 dataset*\n",
\n", + "\n", + "\n", + "## Pre-requisites\n", + "This tutorial assumes you have the following knowledge. Follow the READMEs and tutorials if you are not familiar with:\n", + "1. How to setup and run [Java Kernel in Jupyter Notebook](https://docs.djl.ai/docs/demos/jupyter/index.html)\n", + "2. Basic components of Deep Java Library, and how to [train your first model](https://docs.djl.ai/docs/demos/jupyter/tutorial/02_train_your_first_model.html).\n", + "\n", + "\n", + "## Getting started\n", + "Load the Deep Java Libarary and its dependencies from Maven:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n", + "\n", + "%maven ai.djl:api:0.24.0\n", + "%maven ai.djl:basicdataset:0.24.0\n", + "%maven ai.djl.mxnet:mxnet-engine:0.24.0\n", + "%maven ai.djl.mxnet:mxnet-model-zoo:0.24.0\n", + "%maven org.slf4j:slf4j-simple:1.7.32" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's import the necessary modules." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ai.djl.*;\n", + "import ai.djl.basicdataset.cv.classification.*;\n", + "import ai.djl.engine.*;\n", + "import ai.djl.modality.*;\n", + "import ai.djl.modality.cv.*;\n", + "import ai.djl.modality.cv.transform.*;\n", + "import ai.djl.ndarray.*;\n", + "import ai.djl.ndarray.types.*;\n", + "import ai.djl.nn.*;\n", + "import ai.djl.nn.core.*;\n", + "import ai.djl.repository.zoo.*;\n", + "import ai.djl.training.*;\n", + "import ai.djl.training.dataset.*;\n", + "import ai.djl.training.initializer.*;\n", + "import ai.djl.training.listener.*;\n", + "import ai.djl.training.loss.*;\n", + "import ai.djl.training.evaluator.*;\n", + "import ai.djl.training.optimizer.*;\n", + "import ai.djl.training.tracker.*;\n", + "import ai.djl.training.util.*;\n", + "import ai.djl.translate.*;\n", + "import java.nio.file.*;\n", + "import java.util.*;\n", + "import java.util.concurrent.*;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Construct your model\n", + "\n", + "Load the pre-trained ResNet50V1 model. You can find it in the [Model Zoo](https://github.com/deepjavalibrary/djl/blob/master/docs/model-zoo.md). First construct the `criteria` to specify which ResNet model to load, then call `loadModel` to get a ResNet50V1 model with pre-trained weights. Note this model was trained on ImageNet with 1000 classes; the last layer is a Linear layer with 1000 output channels. Because you are repurposing it on CIFAR10 with 10 classes, you need to remove the last layer and add a new Linear layer with 10 output channels. After you are done modifying the block, set it back to model using `setBlock`." 
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "// load model and change last layer\n",
+ "Criteria criteria = Criteria.builder()\n",
+ "    .setTypes(Image.class, Classifications.class)\n",
+ "    .optProgress(new ProgressBar())\n",
+ "    .optArtifactId(\"resnet\")\n",
+ "    .optFilter(\"layers\", \"50\")\n",
+ "    .optFilter(\"flavor\", \"v1\").build();\n",
+ "Model model = criteria.loadModel();\n",
+ "SequentialBlock newBlock = new SequentialBlock();\n",
+ "SymbolBlock block = (SymbolBlock) model.getBlock();\n",
+ "block.removeLastBlock();\n",
+ "newBlock.add(block);\n",
+ "newBlock.add(Blocks.batchFlattenBlock());\n",
+ "newBlock.add(Linear.builder().setUnits(10).build());\n",
+ "model.setBlock(newBlock);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Prepare Dataset\n",
+ "\n",
+ "After you have the model, the next step is to prepare the dataset for training. You can construct a CIFAR10 builder with your own specifications. You have the options to get the train or test dataset, specify the desired batch size, specify whether to shuffle your data during training, and most importantly, specify the pre-process pipeline. \n",
+ "\n",
+ "A pipeline consists of a series of transformations to apply on the input data before feeding it to the model. \n",
+ "\n",
+ "For example, `ToTensor` can be used to transform colored image NDArrays with shape (32, 32, 3) and values from 0 to 255 to NDArrays with shape (3, 32, 32) and values from 0 to 1. This operation transposes image data from channels-last to channels-first format, which is more suitable for GPU computation. \n",
+ "\n",
+ "The `Normalize` transformation can normalize input data according to their mean and standard deviation values. This will make different features have similar ranges and help our model perform better."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "int batchSize = 32;\n",
+ "int limit = Integer.MAX_VALUE; // change this to a small value for a dry run\n",
+ "// int limit = 160; // limit 160 records in the dataset for a dry run\n",
+ "Pipeline pipeline = new Pipeline(\n",
+ "    new ToTensor(),\n",
+ "    new Normalize(new float[] {0.4914f, 0.4822f, 0.4465f}, new float[] {0.2023f, 0.1994f, 0.2010f}));\n",
+ "Cifar10 trainDataset = \n",
+ "    Cifar10.builder()\n",
+ "    .setSampling(batchSize, true)\n",
+ "    .optUsage(Dataset.Usage.TRAIN)\n",
+ "    .optLimit(limit)\n",
+ "    .optPipeline(pipeline)\n",
+ "    .build();\n",
+ "trainDataset.prepare(new ProgressBar());"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Set up training configuration\n",
+ "\n",
+ "You are leveraging a pre-trained model, so you can expect the model to converge quickly. You will train for only ten epochs. As the model converges, you need to reduce the learning rate to get better results. You can use a `Tracker` to scale the learning rate by 0.1 after two, five, and eight epochs. \n",
+ "\n",
+ "Deep Java Library supports training on multiple GPUs. You can use `setDevices` and pass an array of devices you want the model to be trained on. For example, `new Device[]{Device.gpu(0), Device.gpu(1)}` for training on GPU0 and GPU1. You can also call `Engine.getInstance().getDevices(4)` and pass the number of GPUs you want to train with. It will start with GPU0, and use CPU if no GPU is available. To learn more about multi-GPU training, read our multi-GPU [documentation](https://github.com/deepjavalibrary/djl/tree/master/examples/docs).\n",
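+ "\n",
+ "As a quick illustration of device selection (a hedged sketch; `Device` and `Engine` are already imported above):\n",
+ "\n",
+ "```\n",
+ "// ask for up to two GPUs; falls back to CPU if none are available\n",
+ "Device[] devices = Engine.getInstance().getDevices(2);\n",
+ "System.out.println(Arrays.toString(devices));\n",
+ "```\n",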
+ "\n",
+ "To complete the training configuration setup, use the `Adam` optimizer, `SoftmaxCrossEntropyLoss`, and `Accuracy` for classification problems."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "DefaultTrainingConfig config = new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss())\n",
+ "    //softmaxCrossEntropyLoss is a standard loss for classification problems\n",
+ "    .addEvaluator(new Accuracy()) // Use accuracy so we humans can understand how accurate the model is\n",
+ "    .optDevices(Engine.getInstance().getDevices(1)) // Limit to a single GPU; using more GPUs can actually slow down convergence\n",
+ "    .addTrainingListeners(TrainingListener.Defaults.logging());\n",
+ "\n",
+ "// Now that we have our training configuration, we should create a new trainer for our model\n",
+ "Trainer trainer = model.newTrainer(config);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Train your model\n",
+ "Now you can start training. This procedure is similar to the one in [Train Your First Model](https://docs.djl.ai/docs/demos/jupyter/tutorial/02_train_your_first_model.html). Training requires the following steps:\n",
+ "1. Initialize a new trainer using the training config you just set up\n",
+ "2. Initialize the weights in the trainer\n",
+ "3. Use a `for` loop to iterate through the whole dataset 10 times (epochs), resetting the evaluators at the end of each epoch\n",
+ "4. During each epoch, use a `for` loop to iterate through the dataset in batches and train batch by batch while printing the training accuracy on the progress bar."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "int epoch = 10;\n",
+ "Shape inputShape = new Shape(1, 3, 32, 32);\n",
+ "trainer.initialize(inputShape);"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "for (int i = 0; i < epoch; ++i) {\n",
+ "    int index = 0;\n",
+ "    for (Batch batch : trainer.iterateDataset(trainDataset)) {\n",
+ "        EasyTrain.trainBatch(trainer, batch);\n",
+ "        trainer.step();\n",
+ "        batch.close();\n",
+ "    }\n",
+ "\n",
+ "    // reset training and validation evaluators at end of epoch\n",
+ "    trainer.notifyListeners(listener -> listener.onEpoch(trainer));\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Save your model\n",
+ "\n",
+ "Finally, you can save your model after training is done and use it for inference."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "Path modelDir = Paths.get(\"build/resnet\");\n",
+ "Files.createDirectories(modelDir);\n",
+ "\n",
+ "model.setProperty(\"Epoch\", String.valueOf(epoch));\n",
+ "model.save(modelDir, \"resnet\");"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## What's next\n",
+ "\n",
+ "1. Try inference using the model you just trained. You can find an airplane image in [test resources](https://github.com/deepjavalibrary/djl/blob/master/examples/src/test/resources/airplane1.png) and follow the inference tutorials in the [Jupyter module](https://docs.djl.ai/docs/demos/jupyter).\n",
+ "\n",
+ "2. 
Follow the complete example with multi-GPU support, a validation dataset, and the fit API in the [examples module](https://github.com/deepjavalibrary/djl/tree/master/examples/docs).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## References\n", + "[1] [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)\n", + "\n", + "[2] [Gluon CV model zoo](https://gluon-cv.mxnet.io/model_zoo/classification.html) for pre-trained ResNet50 models" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Java", + "language": "java", + "name": "java" + }, + "language_info": { + "codemirror_mode": "java", + "file_extension": ".jshell", + "mimetype": "text/x-java-source", + "name": "Java", + "pygments_lexer": "java", + "version": "14.0.2+12" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/jupyter/tutorial/01_create_your_first_network.ipynb b/jupyter/tutorial/01_create_your_first_network.ipynb new file mode 100644 index 00000000..aba216cc --- /dev/null +++ b/jupyter/tutorial/01_create_your_first_network.ipynb @@ -0,0 +1,206 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Create your first deep learning neural network\n", + "\n", + "## Introduction\n", + "\n", + "This is the first part of our [beginner tutorial series](https://docs.djl.ai/docs/demos/jupyter/tutorial) that will take you through creating, training, and running inference on a neural network. In this part, you will learn how to use the built-in `Block` to create your first neural network - a Multilayer Perceptron.\n", + "\n", + "## Step 1: Setup development environment\n", + "\n", + "### Installation\n", + "\n", + "This tutorial requires the installation of the Java Jupyter Kernel. To install the kernel, see the [Jupyter README](https://docs.djl.ai/docs/demos/jupyter/index.html)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "// Add the snapshot repository to get the DJL snapshot artifacts\n", + "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n", + "\n", + "// Add the maven dependencies\n", + "%maven ai.djl:api:0.24.0\n", + "%maven org.slf4j:slf4j-simple:1.7.32" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ai.djl.*;\n", + "import ai.djl.nn.*;\n", + "import ai.djl.nn.core.*;\n", + "import ai.djl.training.*;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Neural Network\n", + "\n", + "A neural network is a black box function. Instead of coding this function yourself, you provide many sample input/output pairs for this function. Then, we try to train the network to learn how to best approximate the observed behavior of the function given only these input/output pairs. A better model with more data can more accurately approximate the function.\n", + "\n", + "## Application\n", + "\n", + "The first thing to figure out when trying to build a neural network, like building most functions, is what your function signature is. What are your input types and output types? Because most models use relatively consistent signatures, we refer to them as [Applications](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/Application.html). 
Within the Applications interface, you can find a list of some of the more common model applications used in deep learning.\n", + "\n", + "In this tutorial, we will focus on the image classification application. It is one of the most common first applications and has a significant history with deep learning. In image classification, the input is a single image and it is classified based on the main subject of the image into a number of different possible classes. The classes for the image depend on the specific data you are training with." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Application application = Application.CV.IMAGE_CLASSIFICATION;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dataset\n", + "\n", + "Once you have figured out what application you want to learn, next you need to collect the data you are training with and form it into a dataset. Often, trying to collect and clean up the data is the most troublesome task in the deep learning process. \n", + "\n", + "Using a dataset can either involve collecting custom data from various sources or using one of the many datasets freely available online. The custom data may better suit your use case, but a free dataset is often faster and easier to use. You can read our [dataset guide](http://docs.djl.ai/docs/dataset.html) to learn more about datasets.\n", + "\n", + "### MNIST\n", + "\n", + "The dataset we will be using is [MNIST](https://en.wikipedia.org/wiki/MNIST_database), a database of handwritten digits. Each image contains a black and white digit from 0 to 9 in a 28x28 image. It is commonly used when getting started with deep learning because it is small and fast to train.\n", + "\n", + "![Mnist Image](https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png)\n", + "\n", + "Once you understand your dataset, you should create an implementation of the [Dataset class](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/training/dataset/Dataset.html). In this case, we provide the MNIST dataset built-in to make it easy for you to use it.\n", + "\n", + "## Multilayer Perceptron\n", + "\n", + "Now that we have our dataset, we can choose a model to train with it. For this tutorial, we will build one of the simplest and oldest deep learning networks: a Multilayer Perceptron (MLP).\n", + "\n", + "The MLP is organized into layers. The first layer is the input layer which contains your input data and the last layer is the output layer which produces the final result of the network. Between them are layers referred to as hidden layers. Having more hidden layers and larger hidden layers allows the MLP to represent more complex functions.\n", + "\n", + "The example below contains an input of size 3, a single hidden layer of size 3, and an output of size 2. The number and sizes of the hidden layers are usually determined through experimentation. Between each pair of layers is a linear operation (sometimes called a FullyConnected operation because each number in the input is connected to each number in the output by a matrix multiplication). Not pictured, there is also a non-linear activation function after each linear operation. 
For more information, see the [Multilayer Perceptron chapter of the D2L DJL book](https://d2l.djl.ai/chapter_multilayer-perceptrons/index.html).\n", + "\n", + "![MLP Image](https://upload.wikimedia.org/wikipedia/commons/c/c2/MultiLayerNeuralNetworkBigger_english.png)\n", + "\n", + "\n", + "## Step 2: Determine your input and output size\n", + "\n", + "The MLP model uses a one-dimensional vector as both the input and the output. You should determine the appropriate size of this vector based on your input data and what you will use the output of the model for.\n", + "\n", + "Our input vector will have size `28x28` because the MNIST input images have a height and width of 28 and it takes only a single number to represent each pixel. For a color image, you would need to further multiply this by `3` for the RGB channels.\n", + "\n", + "Our output vector has size `10` because there are `10` possible classes (0 to 9) for each image." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "long inputSize = 28*28;\n", + "long outputSize = 10;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Create a **SequentialBlock**\n", + "\n", + "### NDArray\n", + "\n", + "The core data type used for working with deep learning is the [NDArray](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/ndarray/NDArray.html). An NDArray represents a multidimensional, fixed-size homogeneous array. It behaves very similarly to the NumPy Python package, with the addition of efficient computing. We also have a helper class, the [NDList](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/ndarray/NDList.html), which is a list of NDArrays that can have different sizes and data types.\n", + "\n", + "### Block API\n", + "\n", + "In DJL, [Blocks](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/nn/Block.html) serve a purpose similar to functions that convert an input `NDList` to an output `NDList`. They can represent single operations, parts of a neural network, and even the whole neural network. What makes blocks special is that they contain a number of parameters that are used in their function and are trained during deep learning. As these parameters are trained, the function represented by the block gets more and more accurate.\n", + "\n", + "When building these block functions, the easiest way is to use composition. Similar to how functions are built by calling other functions, blocks can be built by combining other blocks. We refer to the containing block as the parent and the sub-blocks as the children.\n", + "\n", + "\n", + "We provide several helpers to make it easy to build common block composition structures. For the MLP we will use the [SequentialBlock](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/nn/SequentialBlock.html), a container block whose children form a chain of blocks, where each child block feeds its output to the next child block in the sequence." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "SequentialBlock block = new SequentialBlock();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Add blocks to SequentialBlock\n", + "\n", + "An MLP is organized into several layers. Each layer is composed of a [Linear Block](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/nn/core/Linear.html) and a non-linear activation function. 
If we just had two linear blocks in a row, it would be the same as a single combined linear block ($f(x) = W_2(W_1x) = (W_2W_1)x = W_{combined}x$). An activation function is interspersed between the linear blocks to allow them to represent non-linear functions. We will use the popular [ReLU](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/nn/Activation.html#reluBlock()) as our activation function.\n", + "\n", + "The first and last layers have fixed sizes depending on your desired input and output size. However, you are free to choose the number and sizes of the middle layers in the network. We will create a smaller MLP with two middle layers that gradually decrease in size. Typically, you would experiment with different values to see what works best on your dataset." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "block.add(Blocks.batchFlattenBlock(inputSize));\n", + "block.add(Linear.builder().setUnits(128).build());\n", + "block.add(Activation::relu);\n", + "block.add(Linear.builder().setUnits(64).build());\n", + "block.add(Activation::relu);\n", + "block.add(Linear.builder().setUnits(outputSize).build());\n", + "\n", + "block" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "Now that you've successfully created your first neural network, you can use this network to train your model.\n", + "\n", + "Next chapter: [Train your first model](02_train_your_first_model.ipynb)\n", + "\n", + "You can find the complete source code for this tutorial in the [model zoo](https://github.com/deepjavalibrary/djl/blob/master/model-zoo/src/main/java/ai/djl/basicmodelzoo/basic/Mlp.java)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Java", + "language": "java", + "name": "java" + }, + "language_info": { + "codemirror_mode": "java", + "file_extension": ".jshell", + "mimetype": "text/x-java-source", + "name": "Java", + "pygments_lexer": "java", + "version": "14.0.2+12" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/jupyter/tutorial/02_train_your_first_model.ipynb b/jupyter/tutorial/02_train_your_first_model.ipynb new file mode 100644 index 00000000..b5a6d895 --- /dev/null +++ b/jupyter/tutorial/02_train_your_first_model.ipynb @@ -0,0 +1,243 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Train your first model\n", + "\n", + "This is the second part of our [beginner tutorial series](https://docs.djl.ai/docs/demos/jupyter/tutorial) that will take you through creating, training, and running inference on a neural network. In this tutorial, you will learn how to train an image classification model that can recognize handwritten digits.\n", + "\n", + "## Preparation\n", + "\n", + "This tutorial requires the installation of the Java Jupyter Kernel. To install the kernel, see the [Jupyter README](https://docs.djl.ai/docs/demos/jupyter/index.html)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "// Add the snapshot repository to get the DJL snapshot artifacts\n", + "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n", + "\n", + "// Add the maven dependencies\n", + "%maven ai.djl:api:0.24.0\n", + "%maven ai.djl:basicdataset:0.24.0\n", + "%maven ai.djl:model-zoo:0.24.0\n", + "%maven ai.djl.mxnet:mxnet-engine:0.24.0\n", + "%maven org.slf4j:slf4j-simple:1.7.32" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import java.nio.file.*;\n", + "\n", + "import ai.djl.*;\n", + "import ai.djl.basicdataset.cv.classification.Mnist;\n", + "import ai.djl.ndarray.types.*;\n", + "import ai.djl.training.*;\n", + "import ai.djl.training.dataset.*;\n", + "import ai.djl.training.initializer.*;\n", + "import ai.djl.training.loss.*;\n", + "import ai.djl.training.listener.*;\n", + "import ai.djl.training.evaluator.*;\n", + "import ai.djl.training.optimizer.*;\n", + "import ai.djl.training.util.*;\n", + "import ai.djl.basicmodelzoo.cv.classification.*;\n", + "import ai.djl.basicmodelzoo.basic.*;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Prepare MNIST dataset for training\n", + "\n", + "In order to train, you must create a [Dataset class](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/training/dataset/Dataset.html) to contain your training data. A dataset is a collection of sample input/output pairs for the function represented by your neural network. Each single input/output pair is represented by a [Record](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/training/dataset/Record.html). Each record could have multiple arrays of inputs or outputs, such as in an image question-and-answer dataset, where the input is both an image and a question about the image, while the output is the answer to the question.\n", + "\n", + "Because deep learning is highly parallelizable, training is often done not with a single record at a time, but with a [Batch](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/training/dataset/Batch.html). This can lead to significant performance gains, especially when working with images.\n", + "\n", + "### Sampler\n", + "\n", + "Next, we must decide the parameters for loading data from the dataset. The only parameter we need for MNIST is the choice of [Sampler](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/training/dataset/Sampler.html). The sampler decides which and how many elements from the dataset are part of each batch when iterating through it. We will have it randomly shuffle the elements for the batch and use a batchSize of 32. The batchSize is usually the largest power of 2 that fits within memory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "int batchSize = 32;\n", + "Mnist mnist = Mnist.builder().setSampling(batchSize, true).build();\n", + "mnist.prepare(new ProgressBar());" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Create your Model\n", + "\n", + "Next, we will build a model. A [Model](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/Model.html) contains a neural network [Block](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/nn/Block.html) along with additional artifacts used for the training process. It possesses additional information about the inputs, outputs, shapes, and data types you will use. 
Generally, you will use the Model once you have fully completed your Block.\n", + "\n", + "In this part of the tutorial, we will use the built-in Multilayer Perceptron Block from the Model Zoo. To learn how to build it from scratch, see the previous tutorial: [Create Your First Network](01_create_your_first_network.ipynb).\n", + "\n", + "Because images in the MNIST dataset are 28x28 grayscale images, we will create an MLP block with a 28 x 28 input. The output size will be 10 because there are 10 possible classes (0 to 9) for each image. For the hidden layers, we have chosen `new int[] {128, 64}` by experimenting with different values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Model model = Model.newInstance(\"mlp\");\n", + "model.setBlock(new Mlp(28 * 28, 10, new int[] {128, 64}));" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Create a Trainer\n", + "\n", + "Now, you can create a [`Trainer`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/training/Trainer.html) to train your model. The trainer is the main class that orchestrates the training process. Usually, a trainer is opened using try-with-resources and closed after training is over.\n", + "\n", + "The trainer takes an existing model and attempts to optimize the parameters inside the model's Block to best match the dataset. Most optimization is based upon [Stochastic Gradient Descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) (SGD).\n", + "\n", + "### Step 3.1: Set up your training configuration\n", + "\n", + "Before you create your trainer, you will need a [training configuration](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/training/DefaultTrainingConfig.html) that describes how to train your model.\n", + "\n", + "The following are a few common items you may need to configure for your training:\n", + "\n", + "* **REQUIRED** [`Loss`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/training/loss/Loss.html) function: A loss function is used to measure how well our model matches the dataset. Because a lower value of the function is better, it's called the \"loss\" function. The Loss is the only required argument to the training configuration.\n", + "* [`Evaluator`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/training/evaluator/Evaluator.html) function: An evaluator function is also used to measure how well our model matches the dataset. Unlike the loss, evaluators are only there for people to look at and are not used for optimizing the model. Since many losses are not as intuitive, adding other evaluators such as Accuracy can help to understand how your model is doing. If you know of any useful evaluators, we recommend adding them.\n", + "* [`Training Listeners`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/training/listener/TrainingListener.html): The training listener adds additional functionality to the training process through a listener interface. This can include showing training progress, stopping early if training becomes undefined, or recording performance metrics. We offer several easy sets of [default listeners](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/training/listener/TrainingListener.Defaults.html).\n", + "\n", + "You can also configure other options such as the Device, Initializer, and Optimizer. See [more details](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/training/TrainingConfig.html)." 
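, + "\n", + "For example, here is a hedged sketch of overriding the default optimizer. The names `Optimizer.sgd()`, `Tracker.fixed`, and `optOptimizer` follow the DJL javadoc, but treat the exact builder methods as assumptions to verify against your DJL version; `Tracker` also requires `import ai.djl.training.tracker.*;`:\n", + "\n", + "```java\n", + "// Assumed API: build an SGD optimizer with a fixed learning rate tracker\n", + "Optimizer sgd = Optimizer.sgd().setLearningRateTracker(Tracker.fixed(0.1f)).build();\n", + "// Then pass it to the configuration with .optOptimizer(sgd) before creating the trainer\n", + "```"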
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "DefaultTrainingConfig config = new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss())\n", + " // softmaxCrossEntropyLoss is a standard loss for classification problems\n", + " .addEvaluator(new Accuracy()) // Use accuracy so we humans can understand how accurate the model is\n", + " .addTrainingListeners(TrainingListener.Defaults.logging());\n", + "\n", + "// Now that we have our training configuration, we should create a new trainer for our model\n", + "Trainer trainer = model.newTrainer(config);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Initialize Training\n", + "\n", + "Before training your model, you have to initialize all of the parameters with starting values. You can use the trainer for this initialization by passing in the input shape.\n", + "\n", + "* The first axis of the input shape is the batch size. This won't impact the parameter initialization, so you can use 1 here.\n", + "* The second axis of the input shape is the number of pixels in the input image (28 x 28)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "trainer.initialize(new Shape(1, 28 * 28));" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Train your model\n", + "\n", + "Now, we can train the model.\n", + "\n", + "Training is usually organized into epochs, where each epoch trains the model on each item in the dataset once. This is slightly faster than sampling items completely at random.\n", + "\n", + "We will use EasyTrain to, as the name promises, make the training easy. If you want to see more details about how the training loop works, see [the EasyTrain class](https://github.com/deepjavalibrary/djl/blob/master/api/src/main/java/ai/djl/training/EasyTrain.java) or [read our Dive into Deep Learning book](https://d2l.djl.ai)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "// Deep learning is typically trained in epochs where each epoch trains the model on each item in the dataset once.\n", + "int epoch = 2;\n", + "\n", + "// The last argument is an optional validation dataset, which we do not use here\n", + "EasyTrain.fit(trainer, epoch, mnist, null);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 6: Save your model\n", + "\n", + "Once your model is trained, you should save it so that it can be reloaded later. You can also add metadata to it, such as the training accuracy and the number of epochs trained, that can be used when loading the model or when examining it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Path modelDir = Paths.get(\"build/mlp\");\n", + "Files.createDirectories(modelDir);\n", + "\n", + "model.setProperty(\"Epoch\", String.valueOf(epoch));\n", + "\n", + "model.save(modelDir, \"mlp\");\n", + "\n", + "model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "Now, you've successfully trained a model that can recognize handwritten digits. You'll learn how to apply this model in the next chapter: [Run image classification with your model](03_image_classification_with_your_model.ipynb).\n", + "\n", + "You can find the complete source code for this tutorial in the [examples project](https://github.com/deepjavalibrary/djl/blob/master/examples/src/main/java/ai/djl/examples/training/TrainMnist.java)." 
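, + "\n", + "As a quick check, the metadata saved in Step 6 can be read back after reloading (a minimal sketch that assumes the `build/mlp` directory and the same `Mlp` block from above):\n", + "\n", + "```java\n", + "// Recreate the block, load the saved parameters, then read the stored property\n", + "Model loaded = Model.newInstance(\"mlp\");\n", + "loaded.setBlock(new Mlp(28 * 28, 10, new int[] {128, 64}));\n", + "loaded.load(Paths.get(\"build/mlp\"));\n", + "loaded.getProperty(\"Epoch\"); // returns \"2\", the value stored above\n", + "```"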
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Java", + "language": "java", + "name": "java" + }, + "language_info": { + "codemirror_mode": "java", + "file_extension": ".jshell", + "mimetype": "text/x-java-source", + "name": "Java", + "pygments_lexer": "java", + "version": "14.0.2+12" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/jupyter/tutorial/03_image_classification_with_your_model.ipynb b/jupyter/tutorial/03_image_classification_with_your_model.ipynb new file mode 100644 index 00000000..a7aeaa3e --- /dev/null +++ b/jupyter/tutorial/03_image_classification_with_your_model.ipynb @@ -0,0 +1,214 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Inference with your model\n", + "\n", + "This is the third and final part of our [beginner tutorial series](https://docs.djl.ai/docs/demos/jupyter/tutorial) that will take you through creating, training, and running inference on a neural network. In this tutorial, you will learn how to execute your image classification model as you would for a production system.\n", + "\n", + "In the [previous tutorial](02_train_your_first_model.ipynb), you successfully trained your model. Now, we will learn how to implement a `Translator` to convert between POJOs and `NDArray`s, as well as a `Predictor` to run inference.\n", + "\n", + "\n", + "## Preparation\n", + "\n", + "This tutorial requires the installation of the Java Jupyter Kernel. To install the kernel, see the [Jupyter README](https://docs.djl.ai/docs/demos/jupyter/index.html)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "// Add the snapshot repository to get the DJL snapshot artifacts\n", + "// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n", + "\n", + "// Add the maven dependencies\n", + "%maven ai.djl:api:0.24.0\n", + "%maven ai.djl:model-zoo:0.24.0\n", + "%maven ai.djl.mxnet:mxnet-engine:0.24.0\n", + "%maven ai.djl.mxnet:mxnet-model-zoo:0.24.0\n", + "%maven org.slf4j:slf4j-simple:1.7.32" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import java.awt.image.*;\n", + "import java.nio.file.*;\n", + "import java.util.*;\n", + "import java.util.stream.*;\n", + "import ai.djl.*;\n", + "import ai.djl.basicmodelzoo.basic.*;\n", + "import ai.djl.ndarray.*;\n", + "import ai.djl.modality.*;\n", + "import ai.djl.modality.cv.*;\n", + "import ai.djl.modality.cv.util.NDImageUtils;\n", + "import ai.djl.translate.*;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Load your handwritten digit image\n", + "\n", + "We will start by loading the image that we want our model to classify." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var img = ImageFactory.getInstance().fromUrl(\"https://resources.djl.ai/images/0.png\");\n", + "img.getWrappedImage();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Load your model\n", + "\n", + "Next, we need to load the model to run inference with. This model should have been saved to the `build/mlp` directory when running the [previous tutorial](02_train_your_first_model.ipynb)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Path modelDir = Paths.get(\"build/mlp\");\n", + "Model model = Model.newInstance(\"mlp\");\n", + "model.setBlock(new Mlp(28 * 28, 10, new int[] {128, 64}));\n", + "model.load(modelDir);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In addition to loading a local model, you can also find pretrained models within our [model zoo](http://docs.djl.ai/docs/model-zoo.html). See more options in our [model loading documentation](http://docs.djl.ai/docs/load_model.html).\n", + "\n", + "## Step 3: Create a `Translator`\n", + "\n", + "The [`Translator`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/translate/Translator.html) is used to encapsulate the pre-processing and post-processing functionality of your application. The inputs to `processInput` and `processOutput` should be single data items, not batches." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Translator<Image, Classifications> translator = new Translator<Image, Classifications>() {\n", + "\n", + " @Override\n", + " public NDList processInput(TranslatorContext ctx, Image input) {\n", + " // Convert Image to NDArray\n", + " NDArray array = input.toNDArray(ctx.getNDManager(), Image.Flag.GRAYSCALE);\n", + " return new NDList(NDImageUtils.toTensor(array));\n", + " }\n", + "\n", + " @Override\n", + " public Classifications processOutput(TranslatorContext ctx, NDList list) {\n", + " // Create a Classifications with the output probabilities\n", + " NDArray probabilities = list.singletonOrThrow().softmax(0);\n", + " List<String> classNames = IntStream.range(0, 10).mapToObj(String::valueOf).collect(Collectors.toList());\n", + " return new Classifications(classNames, probabilities);\n", + " }\n", + "\n", + " @Override\n", + " public Batchifier getBatchifier() {\n", + " // The Batchifier describes how to combine a batch together\n", + " // Stacking, the most common batchifier, takes N [X1, X2, ...] arrays to a single [N, X1, X2, ...] array\n", + " return Batchifier.STACK;\n", + " }\n", + "};" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Create a Predictor\n", + "\n", + "Using the translator, we will create a new [`Predictor`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/inference/Predictor.html). The predictor is the main class that orchestrates the inference process. During inference, a trained model is used to predict values, often for production use cases. The predictor is NOT thread-safe, so if you want to run predictions in parallel, you should call `newPredictor` multiple times to create a predictor object for each thread." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var predictor = model.newPredictor(translator);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Run inference\n", + "\n", + "With our predictor, we can simply call the [predict](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/inference/Predictor.html#predict(I)) method to run inference. For better performance, you can also call [batchPredict](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/inference/Predictor.html#batchPredict(java.util.List)) with a list of input items. Afterwards, the same predictor should be reused for further inference calls. 
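As an illustration, here is a minimal sketch of the batch call, reusing the same loaded image twice purely to demonstrate the API:\n", + "\n", + "```java\n", + "// batchPredict processes a list of inputs together, batched by the Translator's Batchifier\n", + "var batch = predictor.batchPredict(List.of(img, img));\n", + "```\n", + "Below, we run the single-item predict call on our image.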
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var classifications = predictor.predict(img);\n", + "\n", + "classifications" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "Now, you've successfully built a model, trained it, and run inference. Congratulations on finishing the [beginner tutorial series](https://docs.djl.ai/docs/demos/jupyter/tutorial). After this, you should read our other [examples](https://github.com/deepjavalibrary/djl/tree/master/examples) and [jupyter notebooks](https://docs.djl.ai/docs/demos/jupyter) to learn more about DJL.\n", + "\n", + "You can find the complete source code for this tutorial in the [examples project](https://github.com/deepjavalibrary/djl/blob/master/examples/src/main/java/ai/djl/examples/inference/ImageClassification.java)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Java", + "language": "java", + "name": "java" + }, + "language_info": { + "codemirror_mode": "java", + "file_extension": ".jshell", + "mimetype": "text/x-java-source", + "name": "Java", + "pygments_lexer": "java", + "version": "14.0.2+12" + }, + "pycharm": { + "stem_cell": { + "cell_type": "raw", + "metadata": { + "collapsed": false + }, + "source": [] + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/jupyter/tutorial/README.md b/jupyter/tutorial/README.md new file mode 100644 index 00000000..4c53b0f4 --- /dev/null +++ b/jupyter/tutorial/README.md @@ -0,0 +1,7 @@ +# DJL - Beginner Tutorial + +Our beginner tutorial takes you through creating your first network, training it, and using it in a real system. This is a good place to start if you are new to DJL or to deep learning. + +1. [Create your first neural network](01_create_your_first_network.ipynb) +2. [Train your first model](02_train_your_first_model.ipynb) +3. [Run image classification with your first model](03_image_classification_with_your_model.ipynb)