Visual odometry (VO) methods based on classical 3D geometry have been around for years, relying on either indirect feature matching or direct visual error minimization. Recently, learning-based methods that combine matching and geometry estimation in a single network have achieved impressive results; DeepTAM is one such method. Furthermore, classical methods have been shown to benefit from the extended field of view provided by multiple cameras, yet such setups have so far been ignored by learning-based methods.
In this work, we extend the existing DeepTAM pipeline to leverage a multi-camera setup with known geometry. We demonstrate the generalizability of DeepTAM to other monocular setups and highlight the scenarios in which it performs poorly. Through simulation-based experiments, we show that our proposed multi-camera VO pipeline yields better pose estimates.
Contributors: Mayank Mittal, Rohit Suri, Fadhil Ginting, Parker Ewen
This code has been tested on a computer with the following specifications:
- OS Platform and Distribution: Linux Ubuntu 16.04 LTS
- CUDA/cuDNN version: CUDA 9.0.176, cuDNN 7.1.4
- GPU model and memory: NVidia GeForce GTX 1070-MaxQ, 8GB
- Python: 3.5.2
- TensorFlow: 1.9.0
For each camera, the directory should be organized as shown below:
data/
└── cam_1/
├── depth
├── depth.txt
├── groundtruth.txt
├── rgb
└── rgb.txt
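As a quick sanity check of this layout, here is a minimal sketch (a hypothetical helper, not part of this repo) that verifies each camera directory contains the expected entries:

```python
# Hypothetical layout checker: reports required entries missing from each
# camera directory under data/ (assumes the layout shown above).
import os

REQUIRED = ["depth", "depth.txt", "groundtruth.txt", "rgb", "rgb.txt"]

def missing_entries(cam_dir):
    """Return the required entries that are absent from cam_dir."""
    return [name for name in REQUIRED
            if not os.path.exists(os.path.join(cam_dir, name))]

for cam in sorted(os.listdir("data")):
    missing = missing_entries(os.path.join("data", cam))
    if missing:
        print("{}: missing {}".format(cam, missing))
```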
The text files should be similar to those in the TUM RGBD sequences, i.e., each line should contain the timestamp followed by the data:
- Images: the data is the file path relative to the sequence directory specified in the `config.yaml` file
- Groundtruth: the data is the Cartesian position and quaternion orientation of that particular camera (in the world/camera frame)
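For example (the values below are illustrative; the format follows the TUM RGBD convention):

```
# rgb.txt: timestamp followed by the image path
1305031452.791720 rgb/1305031452.791720.png

# groundtruth.txt: timestamp tx ty tz qx qy qz qw
1305031452.7916 1.3452 0.6273 1.6627 0.6583 0.6112 -0.2938 -0.3266
```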
An example YAML configuration file for the RGBD Freiburg1 Desk Sequence is available here.
NOTE: Please ensure that the sequence directory and camera intrinsics are correctly modified according to the dataset.
For a multi-camera setup, an additional YAML file needs to be written that contains the paths to the configuration files of all the cameras in the system. An example of such a file for the AirSim dataset is available here.
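A minimal sketch of how such a file could be structured and parsed; the `camera_configs` key and the per-camera file names are assumptions for illustration, not the repo's actual schema (check the AirSim example above for the real one):

```python
# Hypothetical multi-camera YAML: a list of per-camera config file paths.
import yaml  # provided by the PyYAML package

multicam_yaml = """
camera_configs:
  - ../resources/hyperparameters/suncg3cameras/cam_1_config.yaml
  - ../resources/hyperparameters/suncg3cameras/cam_2_config.yaml
  - ../resources/hyperparameters/suncg3cameras/cam_3_config.yaml
"""

config = yaml.safe_load(multicam_yaml)
for path in config["camera_configs"]:
    print("camera config:", path)
```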
To install the virtual environment and all required dependencies, run:
./install.sh
Activate the installed virtual environment:
workon deeptam_py
Download the pre-trained weights for the DeepTAM tracking network:
cd resources/weights
chmod +x download_weights.sh
./download_weights.sh
Input Arguments:
- `--config_file` or `-f`: path to the configuration YAML file
- `--output_dir` or `-o`: path to the directory where outputs are written (default: `''`)
- `--weights` or `-w`: path to the weights of the DeepTAM tracking network (without the `.index`, `.meta`, or `.data` extensions)
- `--tracking_network` or `-n`: path to the tracking network (default: path to the module `deeptam_tracker.models.networks`)
- `--disable_vis` or `-v`: disable the frame-by-frame visualization for speed-up
Example:
- Download the RGBD Freiburg1 Desk Sequence:
cd resources/data
chmod +x download_testdata_rgbd_freiburg1.sh
./download_testdata_rgbd_freiburg1.sh
- To run DeepTAM with a single camera setup, run:
cd scripts
# run the python script
python single_camera_tracking.py \
--config_file ../resources/hyperparameters/freiburg1_config.yaml \
--weights ../resources/weights/deeptam_tracker_weights/snapshot-300000
Input Arguments:
- `--config_file` or `-f`: path to the multi-camera configuration YAML file
- `--weights` or `-w`: path to the weights of the DeepTAM tracking network (without the `.index`, `.meta`, or `.data` extensions)
- `--tracking_network` or `-n`: path to the tracking network (default: path to the module `deeptam_tracker.models.networks`)
- `--disable_vis` or `-v`: disable the frame-by-frame visualization for speed-up
- `--method` or `-m`: type of pose fusion method to use (default: `"rejection"`; options: `"rejection"`, `"naive"`, `"sift"`); see the fusion sketch after the example below
Example:
- To download the SUNCG data with the camera rig, download the zip file from here, and copy the extracted data to the `resources/data` directory.
- To run DeepTAM with a multi-camera setup with inlier-based averaging, run:
cd scripts
# run the python script
python multi_camera_tracking.py \
--config_file ../resources/hyperparameters/suncg3cameras/suncg_config.yaml \
--weights ../resources/weights/deeptam_tracker_weights/snapshot-300000 \
--method rejection
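For intuition, below is a minimal sketch of what inlier-based ("rejection") fusion of per-camera pose estimates could look like. The function name, the median-distance inlier test, and the threshold are assumptions for illustration, not the repo's actual implementation; a "naive" fusion would simply average all estimates without the rejection step.

```python
# Hypothetical sketch of inlier-based ("rejection") pose fusion.
# Each camera yields a translation estimate for the rig; outliers are
# rejected by their distance to the per-axis median before averaging.
# (Orientation fusion, e.g. quaternion averaging, is omitted for brevity.)
import numpy as np

def fuse_translations(translations, inlier_thresh=0.05):
    """Average per-camera rig translations, discarding outliers.

    translations: (N, 3) array of rig translations, one per camera.
    inlier_thresh: max distance (meters) from the median to count as inlier.
    """
    t = np.asarray(translations)
    median = np.median(t, axis=0)
    dists = np.linalg.norm(t - median, axis=1)
    inliers = dists < inlier_thresh
    if not inliers.any():  # fall back to the median if everything is rejected
        return median
    return t[inliers].mean(axis=0)

# Example: the third camera disagrees and gets rejected.
estimates = [[0.10, 0.01, 0.00],
             [0.11, 0.00, 0.01],
             [0.45, 0.20, 0.10]]
print(fuse_translations(estimates))  # ~[0.105, 0.005, 0.005]
```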