Image Object Detection ‐ Deep Forest

Workshop sessions are recorded and posted to the UA Datalab YouTube channel about one week after the live session.

DeepForest Summary

DeepForest is a Python library built for identifying tree crowns (and birds) in high-resolution imagery. It works well with drone imagery and high-resolution airplane imagery.

DeepForest provides a pre-trained model capable of identifying many types of trees. However, it may not work perfectly in your area of interest. It may produce false positives (i.e., detections of objects that are not actually trees) and false negatives (i.e., trees in the image that it fails to detect). To improve on the pre-trained model, you can fine-tune it with your own training data so that it better identifies your specific trees of interest.

The images above show a drone image of a golf course and the DeepForest pre-trained model applied to detect trees.
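False positives and false negatives can be counted by comparing predicted boxes against hand-labeled boxes. The sketch below is a minimal illustration in plain Python, not DeepForest's own evaluation code: the function names, the greedy matching strategy, the 0.5 overlap threshold, and the box coordinates are all assumptions chosen for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero when boxes do not touch
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_errors(predicted, labeled, iou_threshold=0.5):
    """Greedily match predictions to labels; return (tp, fp, fn)."""
    unmatched = list(labeled)
    true_pos = 0
    for pred in predicted:
        # Pick the still-unmatched label that overlaps this prediction the most
        best = max(unmatched, key=lambda lab: iou(pred, lab), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            unmatched.remove(best)
            true_pos += 1
    false_pos = len(predicted) - true_pos  # predictions with no matching label
    false_neg = len(unmatched)             # labeled trees the model missed
    return true_pos, false_pos, false_neg


# Hypothetical pixel coordinates: two labeled trees, two predictions
labeled = [(10, 10, 50, 50), (100, 100, 140, 140)]
predicted = [(12, 11, 52, 49), (200, 200, 230, 230)]

tp, fp, fn = count_errors(predicted, labeled)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn, precision, recall)  # 1 1 1 0.5 0.5
```

Here the first prediction overlaps a labeled tree, the second prediction is a false positive, and the second labeled tree is a false negative. Fine-tuning aims to reduce both kinds of error.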

How DeepForest Works

DeepForest uses a convolutional neural network, a type of deep learning model, to predict tree crowns and draw bounding boxes around them. The model is built with torchvision, the computer vision library of the PyTorch project.

A convolutional neural network can be trained to identify objects in an image much as humans do. For example, look at the aerial images of tree species above. The human brain can easily tell that these are probably three distinct tree species (from left to right: ponderosa pine, mesquite, willow).

However, traditional image analysis approaches, such as looking at the color values of individual pixels or grouping adjacent pixels with similar values, generally fail at the task of identifying trees in high-resolution images.

A convolutional neural network approach first tries to identify all of the small features that make up the visual appearance of a tree. It could be the shape of the leaves, how the branches spread out, or how the crown

https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
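To make the idea of detecting small visual features more concrete, the sketch below slides a 3x3 filter over a tiny image in plain Python. It is a toy illustration only: the image values and the hand-written vertical-edge filter are assumptions made for the example, whereas a trained network learns its filters from data and stacks many layers of them.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Weighted sum of the image window under the kernel
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 5x5 image: dark ground (0) on the left, bright crown (9) on the right
image = [
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
]

# Hand-written filter that responds to dark-to-bright vertical edges
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)  # each row prints [27, 27, 0]
```

The output is large where the window straddles the edge between ground and crown and zero where the window covers a uniform area. That is the sense in which a filter "detects" a feature.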

DeepForest uses RetinaNet, a one-stage object detection model. RetinaNet has two major components: (1) a pre-trained convolutional neural network called ResNet50 and (2) a feature pyramid network (FPN). For a deeper look at the model architecture, please see this article.
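As a rough illustration of what the feature pyramid provides, the sketch below computes the grid size of each pyramid level for a given input size. It assumes the pyramid levels P3 to P7 with strides 8 to 128 described in the original RetinaNet paper, uses a hypothetical 400 x 400 pixel input, and ignores any resizing an implementation may apply before the backbone, so treat the numbers as indicative rather than as DeepForest's exact internals.

```python
import math

# Assumption: pyramid levels and strides as described in the RetinaNet paper
PYRAMID_STRIDES = {"P3": 8, "P4": 16, "P5": 32, "P6": 64, "P7": 128}


def pyramid_grid_sizes(image_height, image_width):
    """Return the (rows, cols) of the feature map at each pyramid level."""
    return {
        level: (math.ceil(image_height / stride), math.ceil(image_width / stride))
        for level, stride in PYRAMID_STRIDES.items()
    }


# Hypothetical 400 x 400 pixel image patch
sizes = pyramid_grid_sizes(400, 400)
for level, (rows, cols) in sizes.items():
    print(f"{level}: {rows} x {cols} grid, stride {PYRAMID_STRIDES[level]} px")
```

Fine grids such as P3 keep enough spatial detail to locate small crowns, while coarse grids such as P7 summarize large areas and suit large objects.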

Label Imagery for Training

Labeling training data can be done on georeferenced or non-georeferenced imagery. For georeferenced imagery, we will use QGIS for the labeling.

With QGIS launched, add the georeferenced image to the map display: Layer >>> Add Layer >>> Add Raster Layer

Right-click somewhere on the toolbars and turn on the Shape Digitizing Toolbar

Create a New Shapefile Layer

Toggle the Editing button

Click Add Polygon Feature

On the Shape Digitizing Toolbar, select the tool called Rectangle From Extent

Digitize rectangles around trees

Save Layer Edits

Toggle the Editing button to stop editing of the shapefile.
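Rectangles digitized on georeferenced imagery are stored in map coordinates, while a detection model is trained on boxes in pixel coordinates. The sketch below shows the arithmetic of that conversion in plain Python. It assumes a north-up raster with square pixels; the function name, raster origin, pixel size, file name, and rectangle are hypothetical values for illustration. The CSV columns follow the annotation layout described in the DeepForest documentation (image_path, xmin, ymin, xmax, ymax, label), which is worth checking against the version you use.

```python
import csv
import io


def geo_box_to_pixel_box(geo_box, raster_left, raster_top, pixel_size):
    """Convert (xmin, ymin, xmax, ymax) in map units to pixel coordinates.

    Assumes a north-up raster with square pixels, where pixel row 0 is the
    top edge of the image.
    """
    geo_xmin, geo_ymin, geo_xmax, geo_ymax = geo_box
    px_xmin = (geo_xmin - raster_left) / pixel_size
    px_xmax = (geo_xmax - raster_left) / pixel_size
    # Map y increases northward but pixel y increases downward, so flip
    px_ymin = (raster_top - geo_ymax) / pixel_size
    px_ymax = (raster_top - geo_ymin) / pixel_size
    return tuple(round(v) for v in (px_xmin, px_ymin, px_xmax, px_ymax))


# Hypothetical raster: upper-left corner in projected meters, 5 cm pixels
raster_left, raster_top, pixel_size = 500000.0, 3570000.0, 0.05

# Hypothetical rectangle digitized around one tree, in the same map units
tree_rectangle = (500002.0, 3569994.0, 500006.0, 3569998.0)

pixel_box = geo_box_to_pixel_box(tree_rectangle, raster_left, raster_top, pixel_size)
print(pixel_box)  # (40, 40, 120, 120)

# Write one annotation row to an in-memory CSV
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["image_path", "xmin", "ymin", "xmax", "ymax", "label"])
writer.writerow(["golf_course.tif", *pixel_box, "Tree"])
csv_text = buffer.getvalue()
print(csv_text)
```

In practice, a geospatial library would read the raster origin and pixel size from the image and the rectangles from the shapefile; the values are hard-coded here so the example stays self-contained.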

Jupyter Notebook Example

Python code for running predictions and training a model on imagery with DeepForest is available as a Jupyter notebook on CyVerse. To access this notebook, users must have a CyVerse account. Sign up for an account here.