Minor changes for Open Sourcing.
PiperOrigin-RevId: 568780473
The human_scene_transformer Authors committed Sep 28, 2023
1 parent 1ce78e5 commit aa7cc31
Showing 13 changed files with 324 additions and 25 deletions.
60 changes: 54 additions & 6 deletions README.md
@@ -1,6 +1,6 @@
# Human Scene Transformer

![Human Scene Transformer](./images/hero.png)
![Human Scene Transformer](./human_scene_transformer/images/hero.png)

Anticipating the motion of all humans in dynamic environments such as homes and offices is critical to enable safe and effective robot navigation. Such spaces remain challenging as humans do not follow strict rules of motion and there are often multiple occluded entry points such as corners and doors that create opportunities for sudden encounters. In this work, we present a Transformer based architecture to predict human future trajectories in human-centric environments from input features including human positions, head orientations, and 3D skeletal keypoints from onboard in-the-wild sensory information. The resulting model captures the inherent uncertainty for future human trajectory prediction and achieves state-of-the-art performance on common prediction benchmarks and a human tracking dataset captured from a mobile robot adapted for the prediction task. Furthermore, we identify new agents with limited historical data as a major contributor to error and demonstrate the complementary nature of 3D skeletal poses in reducing prediction error in such challenging scenarios.

@@ -24,6 +24,17 @@ If you use this work please cite our paper
## Data

### JRDB

We provide an extensive pre-processing pipeline to convert the JRDB dataset:
JRDB was created as a detection and tracking dataset rather than a prediction
dataset. To make the data suitable for a prediction task, we first extract the
robot motion from the raw sensor data so that the robot's own motion can be
accounted for. Further, on the JRDB training split we combine algorithmic
detections with the ground truth labels from the tracking dataset to create
authentic tracks as input and labels for HST.
Note that we do not purely use the hand-labeled ground truth tracks in the JRDB
train split, as we find them to be overly smoothed, which gives away the future
human movement.
To adapt the JRDB dataset for prediction please follow [this](/data) README.
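
For orientation, this commit adds `data/jrdb_preprocess_test.py` for the test
split; a minimal sketch of invoking it from Python after following the steps in
the data README (the two paths are placeholders for your local JRDB download
and output directory):

```python
# Sketch: convert the JRDB test split into the prediction format used by HST.
# This mirrors what the script's __main__ block does; adjust both paths first.
from human_scene_transformer.data import jrdb_preprocess_test

jrdb_preprocess_test.jrdb_preprocess_test(
    input_path='/path/to/jrdb',                      # placeholder dataset root
    output_path='/path/to/jrdb/processed_prediction',  # placeholder output dir
)
```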

Make sure to adapt `<data_path>` in `config/<jrdb/pedestrians>/dataset_params.gin` accordingly.
@@ -38,17 +49,14 @@ Please download the raw data [here](https://github.com/StanfordASL/Trajectron-pl

### JRDB
```
python train.py --model_base_dir=./model/jrdb --gin_files=.config/jrdb/training_params.gin --gin_files=.config/jrdb/model_params.gin --gin_files=.config/jrdb/dataset_params.gin --gin_files=.config/jrdb/metrics.gin --dataset=JRDB
python train.py --model_base_dir=./model/jrdb --gin_files=./config/jrdb/training_params.gin --gin_files=./config/jrdb/model_params.gin --gin_files=./config/jrdb/dataset_params.gin --gin_files=./config/jrdb/metrics.gin --dataset=JRDB
```

### Pedestrians ETH/UCY
```
python train.py --model_base_dir=./models/pedestrians_eth --gin_files=.config/pedestrians/training_params.gin --gin_files=.config/pedestrians/model_params.gin --gin_files=.config/pedestrians/dataset_params.gin --gin_files=.config/pedestrians/metrics.gin --dataset=PEDESTRIANS
python train.py --model_base_dir=./models/pedestrians_eth --gin_files=./config/pedestrians/training_params.gin --gin_files=./config/pedestrians/model_params.gin --gin_files=./config/pedestrians/dataset_params.gin --gin_files=./config/pedestrians/metrics.gin --dataset=PEDESTRIANS
```

## Checkpoints
Coming soon!

---

## Evaluation
@@ -58,7 +66,47 @@ Coming soon!
python jrdb/eval.py --model_path=./models/jrdb/ --checkpoint_path=./models/jrdb/ckpts/ckpt-30
```

#### Keypoints Impact Evaluation
```
python jrdb/eval_keypoints.py --model_path=./models/jrdb/ --checkpoint_path=./models/jrdb/ckpts/ckpt-30
```

vs

```
python jrdb/eval_keypoints.py --model_path=./models/jrdb_no_keypoints/ --checkpoint_path=./models/jrdb_no_keypoints/ckpts/ckpt-30
```

### Pedestrians ETH/UCY
```
python pedestrians/eval.py --model_path=./models/pedestrians_eth/ --checkpoint_path=./models/pedestrians_eth/ckpts/ckpt-20
```

---

## Results

Compared to the published paper, we improved our data processing and fixed
small bugs in this code release. If you compare against our method, please use
the following updated results.

On the JRDB dataset with dataset options as set [here](/config/jrdb/dataset_params.gin):

|        | AVG   | @ 1s  | @ 2s  | @ 3s  | @ 4s  |
|--------|-------|-------|-------|-------|-------|
| MinADE | 0.26  | 0.12  | 0.20  | 0.28  | 0.37  |
| MinFDE | 0.45  | 0.21  | 0.39  | 0.56  | 0.71  |
| NLL    | -0.59 | -0.90 | -0.65 | -0.08 | 0.32  |

On the ETH/UCY Pedestrians Dataset:

| | ETH | Hotel | Univ | Zara1 | Zara2 | Avg |
|--------|------|-------|------|-------|-------|-------|
| MinADE | 0.41 | 0.10 | 0.24 | 0.17 | 0.14 | 0.21 |
| MinFDE | 0.73 | 0.14 | 0.44 | 0.30 | 0.24 | 0.37 |
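
For reference, MinADE/MinFDE measure the displacement of the best of the
predicted future modes against the ground truth. A minimal NumPy sketch of the
standard definition (not the exact evaluation code in this repository):

```python
import numpy as np


def min_ade_fde(pred_modes, gt_future):
  """Standard MinADE / MinFDE over a set of predicted trajectory modes.

  pred_modes: [num_modes, num_future_steps, 2] predicted x/y positions.
  gt_future:  [num_future_steps, 2] ground-truth x/y positions.
  """
  # Euclidean displacement of every mode at every future timestep.
  dist = np.linalg.norm(pred_modes - gt_future[np.newaxis], axis=-1)
  min_ade = dist.mean(axis=-1).min()  # best mode by average displacement
  min_fde = dist[:, -1].min()         # best mode by final displacement
  return min_ade, min_fde
```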


### Checkpoints
You can download trained model checkpoints for both the `JRDB` and `Pedestrians (ETH/UCY)` datasets [here]() (Coming Soon).

To evaluate the pre-trained checkpoints, you will have to adjust the path to the dataset in the respective `params/operative_config.gin` file.
2 changes: 1 addition & 1 deletion human_scene_transformer/config/jrdb/dataset_params.gin
@@ -55,7 +55,7 @@ TEST_SCENES = ['clark-center-2019-02-28_1',
'tressider-2019-04-26_3_test']


JRDBDatasetParams.path = <dataset_path>
JRDBDatasetParams.path = '<dataset_path>'

JRDBDatasetParams.train_scenes = %TRAIN_SCENES
JRDBDatasetParams.eval_scenes = %TEST_SCENES
2 changes: 1 addition & 1 deletion human_scene_transformer/config/pedestrians/dataset_params.gin
@@ -1,4 +1,4 @@
PedestriansDatasetParams.path = <dataset_path>
PedestriansDatasetParams.path = '<dataset_path>'
PedestriansDatasetParams.dataset = 'eth'
PedestriansDatasetParams.train_config = 'train' # train, trainval
PedestriansDatasetParams.eval_config = 'val' # val, test
2 changes: 1 addition & 1 deletion human_scene_transformer/data/README.md
@@ -9,7 +9,7 @@
5. Download and extract `Train Detections` from the JRDB 2019 section to `<data_path>/detections`.

## Get the Leaderboard Test Set Tracks
3. Download and extract the best leaderboard [3D tracking result](https://jrdb.erc.monash.edu/leaderboards/download/1680) to `<data_path>/test_dataset/labels/raw_leaderboard/`.
Download and extract this leaderboard [3D tracking result](https://jrdb.erc.monash.edu/leaderboards/download/1605) to `<data_path>/test_dataset/labels/raw_leaderboard/`, such that you end up with `<data_path>/test_dataset/labels/raw_leaderboard/00XX.txt`. This was the best available leaderboard tracking result at the time the code was developed.

## Get the Robot Odometry Preprocessed Keypoints

235 changes: 235 additions & 0 deletions human_scene_transformer/data/jrdb_preprocess_test.py
@@ -0,0 +1,235 @@
# Copyright 2023 The human_scene_transformer Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Preprocesses the raw test split of JRDB.
"""

import os

from human_scene_transformer.data import utils
import numpy as np
import pandas as pd
import tensorflow as tf
import tqdm

INPUT_PATH = '<dataset_path>'
OUTPUT_PATH = '<output_path>'

POINTCLOUD = True
AGENT_KEYPOINTS = True
FROM_DETECTIONS = True


def list_test_scenes(input_path):
scenes = os.listdir(os.path.join(input_path, 'images', 'image_0'))
scenes.sort()
return scenes


def get_agents_features_df_with_box(
input_path, scene_id, max_distance_to_robot=10.0
):
"""Returns agents features with bounding box from raw leaderboard data."""
jrdb_header = [
'frame',
'track id',
'type',
'truncated',
'occluded',
'alpha',
'bb_left',
'bb_top',
'bb_width',
'bb_height',
'x',
'y',
'z',
'height',
'width',
'length',
'rotation_y',
'score',
]
scene_data_file = utils.get_file_handle(
os.path.join(
input_path, 'labels', 'raw_leaderboard', f'{scene_id:04}' + '.txt'
)
)
df = pd.read_csv(scene_data_file, sep=' ', names=jrdb_header)

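  # Convert points from the camera frame to the lower velodyne (LiDAR) frame:
  # an axis permutation plus a fixed vertical offset between the two sensors.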
def camera_to_lower_velodyne(p):
return np.stack(
[p[..., 2], -p[..., 0], -p[..., 1] + (0.742092 - 0.606982)], axis=-1
)

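  # Keep only detections with a minimal confidence score.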
df = df[df['score'] >= 0.01]

df['p'] = df[['x', 'y', 'z']].apply(
lambda s: camera_to_lower_velodyne(s.to_numpy()), axis=1
)
df['distance'] = df['p'].apply(lambda s: np.linalg.norm(s, axis=-1))
df['l'] = df['height']
df['h'] = df['width']
df['w'] = df['length']
df['yaw'] = df['rotation_y']

df['id'] = df['track id'].apply(lambda s: f'pedestrian:{s}')
df['timestep'] = df['frame']

df = df.set_index(['timestep', 'id'])

df = df[df['distance'] <= max_distance_to_robot]

return df[['p', 'yaw', 'l', 'h', 'w']]


def jrdb_preprocess_test(input_path, output_path):
scenes = list_test_scenes(os.path.join(input_path, 'test_dataset'))
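  # subsample = 1 keeps every timestep; larger values keep every n-th frame.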
subsample = 1
for scene in tqdm.tqdm(scenes):
scene_save_name = scene + '_test'
agents_df = get_agents_features_df_with_box(
os.path.join(input_path, 'test_dataset'),
scenes.index(scene),
max_distance_to_robot=15.0,
)

robot_odom = utils.get_robot(
os.path.join(input_path, 'processed', 'odometry_test'), scene
)

if AGENT_KEYPOINTS:
keypoints = utils.get_agents_keypoints(
os.path.join(
input_path, 'processed', 'labels', 'labels_3d_keypoints_test'
),
scene,
)
keypoints_df = pd.DataFrame.from_dict(
keypoints, orient='index'
).rename_axis(['timestep', 'id']) # pytype: disable=missing-parameter # pandas-drop-duplicates-overloads

agents_df = agents_df.join(keypoints_df)
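      # Agents without detected keypoints get a (33, 3) array of NaNs so that
      # every row has a consistent keypoint shape downstream.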
agents_df.keypoints.fillna(
dict(
zip(
agents_df.index[agents_df['keypoints'].isnull()],
[np.ones((33, 3)) * np.nan]
* len(
agents_df.loc[
agents_df['keypoints'].isnull(), 'keypoints'
]
),
)
),
inplace=True,
)

robot_df = pd.DataFrame.from_dict(robot_odom, orient='index').rename_axis( # pytype: disable=missing-parameter # pandas-drop-duplicates-overloads
['timestep']
)
    # Remove odometry datapoints for timesteps that have no agent data
robot_df = robot_df.iloc[agents_df.index.levels[0]]

assert (agents_df.index.levels[0] == robot_df.index).all()

# Subsample
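    # Timesteps must be contiguous and zero-based so that positional (iloc)
    # subsampling below stays aligned with the timestep index.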
assert len(agents_df.index.levels[0]) == agents_df.index.levels[0].max() + 1
agents_df_subsampled_index = agents_df.unstack('id').iloc[::subsample].index
agents_df = (
agents_df.unstack('id')
.iloc[::subsample]
.reset_index(drop=True)
.stack('id', dropna=True)
)

agents_in_odometry_df = utils.agents_to_odometry_frame(
agents_df, robot_df.iloc[::subsample].reset_index(drop=True)
)

agents_pos_ragged_tensor = utils.agents_pos_to_ragged_tensor(
agents_in_odometry_df
)
agents_yaw_ragged_tensor = utils.agents_yaw_to_ragged_tensor(
agents_in_odometry_df
)
assert (
agents_pos_ragged_tensor.shape[0] == agents_yaw_ragged_tensor.shape[0]
)

tf.data.Dataset.from_tensors(agents_pos_ragged_tensor).save(
os.path.join(output_path, scene_save_name, 'agents', 'position')
)
tf.data.Dataset.from_tensors(agents_yaw_ragged_tensor).save(
os.path.join(output_path, scene_save_name, 'agents', 'orientation')
)

if AGENT_KEYPOINTS:
agents_keypoints_ragged_tensor = utils.agents_keypoints_to_ragged_tensor(
agents_in_odometry_df
)
tf.data.Dataset.from_tensors(agents_keypoints_ragged_tensor).save(
os.path.join(output_path, scene_save_name, 'agents', 'keypoints')
)

robot_in_odometry_df = utils.robot_to_odometry_frame(robot_df)
robot_pos = tf.convert_to_tensor(
np.stack(robot_in_odometry_df.iloc[::subsample]['p'].values).astype(
np.float32
)
)
robot_orientation = tf.convert_to_tensor(
np.stack(robot_in_odometry_df.iloc[::subsample]['yaw'].values).astype(
np.float32
)
)[..., tf.newaxis]

tf.data.Dataset.from_tensors(robot_pos).save(
os.path.join(output_path, scene_save_name, 'robot', 'position')
)
tf.data.Dataset.from_tensors(robot_orientation).save(
os.path.join(output_path, scene_save_name, 'robot', 'orientation')
)

if POINTCLOUD:
scene_pointcloud_dict = utils.get_scene_poinclouds(
os.path.join(input_path, 'test_dataset'), scene, subsample=subsample
)
# Remove extra timesteps
scene_pointcloud_dict = {
ts: scene_pointcloud_dict[ts] for ts in agents_df_subsampled_index
}

scene_pc_odometry = utils.pc_to_odometry_frame(
scene_pointcloud_dict, robot_df
)

filtered_pc = utils.filter_agents_and_ground_from_point_cloud(
agents_in_odometry_df, scene_pc_odometry, robot_in_odometry_df
)

scene_pc_ragged_tensor = tf.ragged.stack(filtered_pc)

assert (
agents_pos_ragged_tensor.bounding_shape()[1]
== scene_pc_ragged_tensor.shape[0]
)

tf.data.Dataset.from_tensors(scene_pc_ragged_tensor).save(
os.path.join(output_path, scene_save_name, 'scene', 'pc'),
compression='GZIP',
)

if __name__ == '__main__':
jrdb_preprocess_test(INPUT_PATH, OUTPUT_PATH)
@@ -30,7 +30,7 @@

INPUT_PATH = '<data_path>'
OUTPUT_PATH = os.path.join(
input_path, '/processed/labels/labels_detections_3d')
INPUT_PATH, 'processed/labels/labels_detections_3d')


def get_agents_3d_bounding_box_dict(input_path, scene):
@@ -129,7 +129,6 @@ def jrdb_train_detections_to_tracks(input_path, output_path):
scenes = utils.list_scenes(
os.path.join(input_path, 'train_dataset'))
for scene in tqdm.tqdm(scenes):
print(f'Processing {scene}')
bb_dict = get_agents_3d_bounding_box_dict(
os.path.join(input_path, 'train_dataset'), scene)
bb_3d_df = pd.DataFrame.from_dict(
@@ -174,7 +173,7 @@ def jrdb_train_detections_to_tracks(input_path, output_path):

labels_dict = detections_to_dict(matched_df)

with os.Open(f"{output_path}/{scene}.json", 'w') as write_file:
with open(f"{output_path}/{scene}.json", 'w') as write_file:
json.dump(labels_dict, write_file, indent=2, ensure_ascii=True)

if __name__ == '__main__':
3 changes: 2 additions & 1 deletion human_scene_transformer/data/utils.py
@@ -37,12 +37,13 @@ def maybe_makedir(path):


def get_file_handle(path, mode='rt'):
file_handle = os.Open(path, mode=mode)
file_handle = open(path, mode)
return file_handle


def list_scenes(input_path):
scenes = os.listdir(os.path.join(input_path, 'labels', 'labels_3d'))
scenes.sort()
return [scene[:-5] for scene in scenes]

