Semantic Segmentation with SegNet

The next deep learning capability we'll cover in this tutorial is semantic segmentation. Semantic segmentation is based on image recognition, except the classifications occur at the pixel level as opposed to the entire image. This is accomplished by convolutionalizing a pre-trained image recognition backbone, which transforms the model into a Fully Convolutional Network (FCN) capable of per-pixel labeling. Especially useful for environmental perception, segmentation yields dense per-pixel classifications of many different potential objects per scene, including scene foregrounds and backgrounds.
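To make the per-pixel idea concrete, here is a minimal pure-Python sketch of the final labeling step of a fully convolutional network. Given a score for every class at every pixel, each pixel takes the class with the highest score. The `scores[class][row][col]` layout and the `argmax_class_map` name are assumptions for illustration, not segNet's internal data structures.

```python
from typing import List

# Hypothetical layout: scores[c][y][x] is the raw score of class c at pixel (x, y).
Scores = List[List[List[float]]]


def argmax_class_map(scores: Scores) -> List[List[int]]:
    """Return a 2D map holding the highest-scoring class index for each pixel."""
    num_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    class_map = []
    for y in range(height):
        row = []
        for x in range(width):
            # pick the class whose score is largest at this pixel (first one wins ties)
            best = max(range(num_classes), key=lambda c: scores[c][y][x])
            row.append(best)
        class_map.append(row)
    return class_map


if __name__ == "__main__":
    # 3 classes over a 2x2 image
    scores = [
        [[0.9, 0.1], [0.2, 0.0]],   # class 0
        [[0.05, 0.8], [0.1, 0.3]],  # class 1
        [[0.05, 0.1], [0.7, 0.7]],  # class 2
    ]
    print(argmax_class_map(scores))  # [[0, 1], [2, 2]]
```

Image recognition would produce a single label for the whole image; here the same "highest score wins" decision is made independently at every pixel.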

segNet accepts a 2D image as input and outputs a second image with the per-pixel classification mask overlaid on it. Each pixel of the mask corresponds to the class of object that was classified at that location. segNet is available from both Python and C++.
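Each pixel of the mask is a class index looked up in a color table. The sketch below shows that lookup in pure Python. The palette values and the `colorize` helper are hypothetical; the colors segNet actually assigns to each class are not reproduced here.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette for illustration only.
PALETTE: Dict[int, RGB] = {
    0: (0, 0, 0),       # e.g. background
    1: (220, 20, 60),   # e.g. person
    2: (0, 0, 142),     # e.g. car
}

UNKNOWN: RGB = (128, 128, 128)  # gray for any class missing from the palette


def colorize(class_map: List[List[int]], palette: Dict[int, RGB]) -> List[List[RGB]]:
    """Turn a per-pixel class map into a solid-color mask image."""
    return [[palette.get(c, UNKNOWN) for c in row] for row in class_map]


if __name__ == "__main__":
    print(colorize([[0, 1], [2, 2]], PALETTE))
```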

As examples of using the segNet class, we provide sample programs in C++ and Python:

segnet.cpp (C++)
segnet.py (Python)

These samples are able to segment images, videos, and camera feeds. For more info about the various types of input/output streams supported, see the Camera Streaming and Multimedia page.

Below are various pre-trained segmentation models that use the FCN-ResNet18 network, most of which run in realtime on Jetson (see the performance columns below). Models are provided for a variety of environments and subject matter, including urban cities, off-road trails, and indoor office spaces and homes.

Pre-Trained Segmentation Models Available

Below is a table of the pre-trained semantic segmentation models available for download, and the associated --network argument to segnet used for loading them. They're based on the 21-class FCN-ResNet18 network and have been trained on various datasets and resolutions using PyTorch, and were exported to ONNX format to be loaded with TensorRT.

| Dataset | Resolution | CLI Argument (`--network`) | Accuracy | Jetson Nano | Jetson Xavier |
|---|---|---|---|---|---|
| Cityscapes | 512x256 | `fcn-resnet18-cityscapes-512x256` | 83.3% | 48 FPS | 480 FPS |
| Cityscapes | 1024x512 | `fcn-resnet18-cityscapes-1024x512` | 87.3% | 12 FPS | 175 FPS |
| Cityscapes | 2048x1024 | `fcn-resnet18-cityscapes-2048x1024` | 89.6% | 3 FPS | 47 FPS |
| DeepScene | 576x320 | `fcn-resnet18-deepscene-576x320` | 96.4% | 26 FPS | 360 FPS |
| DeepScene | 864x480 | `fcn-resnet18-deepscene-864x480` | 96.9% | 14 FPS | 190 FPS |
| Multi-Human | 512x320 | `fcn-resnet18-mhp-512x320` | 86.5% | 34 FPS | 370 FPS |
| Multi-Human | 640x360 | `fcn-resnet18-mhp-640x360` | 87.1% | 23 FPS | 325 FPS |
| Pascal VOC | 320x320 | `fcn-resnet18-voc-320x320` | 85.9% | 45 FPS | 508 FPS |
| Pascal VOC | 512x320 | `fcn-resnet18-voc-512x320` | 88.5% | 34 FPS | 375 FPS |
| SUN RGB-D | 512x400 | `fcn-resnet18-sun-512x400` | 64.3% | 28 FPS | 340 FPS |
| SUN RGB-D | 640x512 | `fcn-resnet18-sun-640x512` | 65.1% | 17 FPS | 224 FPS |

If the resolution is omitted from the CLI argument, the lowest-resolution model is loaded (for example, `fcn-resnet18-cityscapes` loads `fcn-resnet18-cityscapes-512x256`).
Accuracy indicates the pixel classification accuracy across the model's validation dataset.
Performance is measured in GPU FP16 mode with JetPack 4.2.1 and nvpmodel 0 (MAX-N).
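The first two notes can be expressed in a few lines of Python. This is a sketch, not jetson-inference's own model-loading or evaluation code: the hypothetical `resolve_network` picks the smallest WxH variant from the table when the suffix is omitted, and the hypothetical `pixel_accuracy` uses the plain, unweighted definition (correctly labeled pixels divided by total pixels). Whether the reported numbers exclude any ignored class is not stated here.

```python
import re
from typing import List

# Model names copied from the table above.
MODELS = [
    "fcn-resnet18-cityscapes-512x256",
    "fcn-resnet18-cityscapes-1024x512",
    "fcn-resnet18-cityscapes-2048x1024",
    "fcn-resnet18-deepscene-576x320",
    "fcn-resnet18-deepscene-864x480",
    "fcn-resnet18-mhp-512x320",
    "fcn-resnet18-mhp-640x360",
    "fcn-resnet18-voc-320x320",
    "fcn-resnet18-voc-512x320",
    "fcn-resnet18-sun-512x400",
    "fcn-resnet18-sun-640x512",
]

# Splits "<base>-<W>x<H>" into its base name and resolution.
_SUFFIX = re.compile(r"^(?P<base>.+)-(?P<w>\d+)x(?P<h>\d+)$")


def resolve_network(name: str) -> str:
    """Return the full model name; without a WxH suffix, pick the smallest variant."""
    if name in MODELS:
        return name
    candidates = []
    for model in MODELS:
        m = _SUFFIX.match(model)
        if m and m.group("base") == name:
            area = int(m.group("w")) * int(m.group("h"))
            candidates.append((area, model))
    if not candidates:
        raise ValueError(f"unknown network: {name}")
    return min(candidates)[1]


def pixel_accuracy(pred: List[List[int]], truth: List[List[int]]) -> float:
    """Correctly labeled pixels divided by total pixels (maps must be the same shape)."""
    pairs = [(p, t) for prow, trow in zip(pred, truth) for p, t in zip(prow, trow)]
    if not pairs:
        raise ValueError("empty class maps")
    return sum(p == t for p, t in pairs) / len(pairs)


if __name__ == "__main__":
    print(resolve_network("fcn-resnet18-cityscapes"))  # fcn-resnet18-cityscapes-512x256
    print(pixel_accuracy([[0, 1], [2, 2]], [[0, 1], [2, 0]]))  # 0.75
```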

note: to download additional networks, run the Model Downloader tool:
$ cd jetson-inference/tools
$ ./download-models.sh

Segmenting Images from the Command Line

First, let's try using the segnet program to segment static images. Besides the input/output paths, the program accepts some optional command-line flags:

optional --network flag changes the segmentation model being used (see above)
optional --visualize flag accepts mask and/or overlay modes (default is overlay)
optional --alpha flag sets the alpha blending value for overlay (default is 120)
optional --filter-mode flag accepts point or linear sampling (default is linear)
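The difference between the two `--filter-mode` settings can be sketched in pure Python. This assumes, beyond what is stated above, that the filter mode controls how the network's coarser output grid is scaled up to the output image: point sampling copies the nearest grid cell (hard, blocky edges), while linear sampling interpolates between neighboring cells (smoother edges). The function names are hypothetical, and this is not segNet's CUDA implementation.

```python
from typing import List


def upsample_point(grid: List[List[int]], out_w: int, out_h: int) -> List[List[int]]:
    """Nearest-neighbor ("point") upscaling: every output pixel copies one grid cell."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def upsample_linear(grid: List[List[float]], out_w: int, out_h: int) -> List[List[float]]:
    """Bilinear ("linear") upscaling of a single-channel grid, e.g. one color channel."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for y in range(out_h):
        # map the output pixel center back into grid coordinates, clamped to the edges
        gy = min(max((y + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1)
        y0 = int(gy)
        y1 = min(y0 + 1, in_h - 1)
        fy = gy - y0
        row = []
        for x in range(out_w):
            gx = min(max((x + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1)
            x0 = int(gx)
            x1 = min(x0 + 1, in_w - 1)
            fx = gx - x0
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


if __name__ == "__main__":
    print(upsample_point([[1, 2]], 4, 1))         # [[1, 1, 2, 2]]
    print(upsample_linear([[0.0, 100.0]], 4, 1))  # [[0.0, 25.0, 75.0, 100.0]]
```

Point sampling keeps every output pixel tied to exactly one class, which suits a solid mask; linear sampling blends neighboring values, which tends to look smoother in an overlay.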

Launch the application with the --help flag for more info, and refer to the Camera Streaming and Multimedia page for supported input/output protocols.

Here are some example usages of the program:

C++

$ ./segnet --network=<model> input.jpg output.jpg                  # overlay segmentation on original
$ ./segnet --network=<model> --alpha=200 input.jpg output.jpg      # make the overlay more opaque
$ ./segnet --network=<model> --visualize=mask input.jpg output.jpg # output the solid segmentation mask

Python

$ ./segnet.py --network=<model> input.jpg output.jpg                  # overlay segmentation on original
$ ./segnet.py --network=<model> --alpha=200 input.jpg output.jpg      # make the overlay more opaque
$ ./segnet.py --network=<model> --visualize=mask input.jpg output.jpg # output the solid segmentation mask
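Under the common alpha-blending rule out = (1 - a) * pixel + a * color, with a = alpha / 255, a larger `--alpha` gives more weight to the class color. The sketch below assumes segNet's overlay follows that rule; it is not taken from the library source, and `blend` is a hypothetical helper.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def blend(pixel: RGB, class_color: RGB, alpha: float) -> RGB:
    """Alpha-blend a class color over an image pixel; alpha is on a 0-255 scale."""
    a = alpha / 255.0
    return tuple(round((1.0 - a) * p + a * c) for p, c in zip(pixel, class_color))


if __name__ == "__main__":
    gray, red = (100, 100, 100), (255, 0, 0)
    print(blend(gray, red, 120))  # (173, 53, 53): default alpha
    print(blend(gray, red, 200))  # (222, 22, 22): closer to pure red
```

With alpha 0 the original pixel is unchanged, and with alpha 255 the class color replaces it entirely.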

Cityscapes

Let's look at some different scenarios. Here's an example of segmenting an urban street scene with the Cityscapes model:

# C++
$ ./segnet --network=fcn-resnet18-cityscapes images/city_0.jpg images/test/output.jpg

# Python
$ ./segnet.py --network=fcn-resnet18-cityscapes images/city_0.jpg images/test/output.jpg

There are more test images called city_*.jpg found under the images/ subdirectory for trying out the Cityscapes model.

DeepScene

The DeepScene dataset consists of off-road forest trails and vegetation, aiding in path-following for outdoor robots.
Here's an example of generating the segmentation overlay and mask by specifying the --visualize argument:

C++

$ ./segnet --network=fcn-resnet18-deepscene images/trail_0.jpg images/test/output_overlay.jpg                # overlay
$ ./segnet --network=fcn-resnet18-deepscene --visualize=mask images/trail_0.jpg images/test/output_mask.jpg  # mask

Python

$ ./segnet.py --network=fcn-resnet18-deepscene images/trail_0.jpg images/test/output_overlay.jpg               # overlay
$ ./segnet.py --network=fcn-resnet18-deepscene --visualize=mask images/trail_0.jpg images/test/output_mask.jpg # mask

There are more sample images called trail_*.jpg located under the images/ subdirectory.
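As one hypothetical illustration of how a robot might use the DeepScene output for path-following, the sketch below measures where trail pixels sit in the bottom rows of the class map. The trail class index, the `rows` window, and the -1..+1 steering convention are assumptions made for illustration; the segnet sample programs do not do this.

```python
from typing import List, Optional


def trail_offset(class_map: List[List[int]], trail_class: int, rows: int = 1) -> Optional[float]:
    """Horizontal position of trail pixels in the bottom `rows` rows, from -1.0
    (far left) to +1.0 (far right); None if no trail pixels are visible."""
    width = len(class_map[0])
    xs = [x for row in class_map[-rows:] for x, c in enumerate(row) if c == trail_class]
    if not xs:
        return None
    half = (width - 1) / 2
    if half == 0:
        return 0.0  # a single-column map has no left/right
    center = sum(xs) / len(xs)
    return (center - half) / half


if __name__ == "__main__":
    # hypothetical class 1 = trail; the trail drifts to the right near the bottom
    class_map = [
        [0, 0, 1, 0, 0],
        [0, 0, 0, 1, 1],
    ]
    print(trail_offset(class_map, trail_class=1))  # 0.75
```

Looking only at the bottom rows favors the part of the trail closest to the robot.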

Multi-Human Parsing (MHP)

Multi-Human Parsing provides dense labeling of body parts, like arms, legs, head, and different types of clothing.
See the handful of test images named humans_*.jpg found under images/ for trying out the MHP model:

# C++
$ ./segnet --network=fcn-resnet18-mhp images/humans_0.jpg images/test/output.jpg

# Python
$ ./segnet.py --network=fcn-resnet18-mhp images/humans_0.jpg images/test/output.jpg
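Because MHP labels many small body-part and clothing regions, a per-class pixel count is a quick way to summarize a result. The sketch below computes the fraction of the image covered by each class in a class map. The class names in the example are placeholders, not the MHP model's actual class list, and `class_fractions` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List


def class_fractions(class_map: List[List[int]], names: List[str]) -> Dict[str, float]:
    """Fraction of pixels covered by each class that appears in the map."""
    counts = Counter(c for row in class_map for c in row)
    total = sum(counts.values())
    return {names[c]: n / total for c, n in sorted(counts.items())}


if __name__ == "__main__":
    placeholder_names = ["background", "arm", "leg"]  # not the real MHP classes
    print(class_fractions([[0, 1], [1, 2]], placeholder_names))
```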

MHP Classes

Pascal VOC

Pascal VOC is one of the original datasets used for semantic segmentation, containing various people, animals, vehicles, and household objects. There are some sample images included named object_*.jpg for testing out the Pascal VOC model:

# C++
$ ./segnet --network=fcn-resnet18-voc images/object_0.jpg images/test/output.jpg

# Python
$ ./segnet.py --network=fcn-resnet18-voc images/object_0.jpg images/test/output.jpg

VOC Classes

SUN RGB-D

The SUN RGB-D dataset provides segmentation ground-truth for many indoor objects and scenes commonly found in office spaces and homes. See the images named room_*.jpg found under the images/ subdirectory for testing out the SUN models:

# C++
$ ./segnet --network=fcn-resnet18-sun images/room_0.jpg images/test/output.jpg

# Python
$ ./segnet.py --network=fcn-resnet18-sun images/room_0.jpg images/test/output.jpg

SUN Classes

Processing a Directory or Sequence of Images

If you want to process a directory or sequence of images, you can launch the program with the path to the directory that contains images or a wildcard sequence:

# C++
$ ./segnet --network=fcn-resnet18-sun "images/room_*.jpg" images/test/room_output_%i.jpg

# Python
$ ./segnet.py --network=fcn-resnet18-sun "images/room_*.jpg" images/test/room_output_%i.jpg
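The sketch below shows how a wildcard input and a `%i` output template can pair up. It assumes the matching inputs are processed in sorted order and that `%i` is replaced by a zero-based counter; neither detail is confirmed above, and `plan_outputs` is a hypothetical helper rather than part of the sample programs.

```python
import fnmatch
from typing import List, Tuple


def plan_outputs(files: List[str], pattern: str, out_template: str) -> List[Tuple[str, str]]:
    """Pair each file matching `pattern` with an output path, substituting %i
    with a zero-based index (an assumption about the naming scheme)."""
    matches = sorted(f for f in files if fnmatch.fnmatch(f, pattern))
    return [(f, out_template.replace("%i", str(i))) for i, f in enumerate(matches)]


if __name__ == "__main__":
    files = ["images/room_1.jpg", "images/room_0.jpg", "images/city_0.jpg"]
    for src, dst in plan_outputs(files, "images/room_*.jpg", "images/test/room_output_%i.jpg"):
        print(src, "->", dst)
```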

note: when using wildcards, always enclose them in quotes ("*.jpg"). Otherwise, the shell will expand the pattern before the program runs, changing the order of arguments on the command line, which may result in one of the input images being overwritten by the output.
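To see why the quotes matter, the sketch below simulates what an unquoted glob does to the argument list. It is a simplified model of shell behavior (bash-style sorted expansion, unmatched patterns left untouched), it assumes the program reads the first two positional arguments as input and output, and `shell_expand` is a hypothetical helper.

```python
import fnmatch
from typing import List


def shell_expand(argv: List[str], files: List[str]) -> List[str]:
    """Mimic an unquoted shell glob: each wildcard argument is replaced by the
    sorted list of matching files (left as-is if nothing matches)."""
    out: List[str] = []
    for arg in argv:
        if any(ch in arg for ch in "*?["):
            matches = sorted(f for f in files if fnmatch.fnmatch(f, arg))
            out.extend(matches or [arg])
        else:
            out.append(arg)
    return out


if __name__ == "__main__":
    files = ["images/room_0.jpg", "images/room_1.jpg"]
    argv = ["./segnet.py", "--network=fcn-resnet18-sun",
            "images/room_*.jpg", "images/test/room_output_%i.jpg"]
    # quoted: the program receives the pattern itself; unquoted: the shell expands it
    for label, args in (("quoted", argv), ("unquoted", shell_expand(argv, files))):
        positional = [a for a in args[1:] if not a.startswith("--")]
        print(label, "input:", positional[0], "output:", positional[1])
```

In the unquoted case the second matching image, images/room_1.jpg, lands in the output position and would be overwritten.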

For more info about loading/saving sequences of images, see the Camera Streaming and Multimedia page. Next, we'll run segmentation on a live camera or video stream.

Next | Running the Live Camera Segmentation Demo
Back | Coding Your Own Object Detection Program