diff --git a/README.md b/README.md index c8fcd87..fd732d4 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # Human Scene Transformer -![Human Scene Transformer](./images/hero.png) +![Human Scene Transformer](./human_scene_transformer/images/hero.png) Anticipating the motion of all humans in dynamic environments such as homes and offices is critical to enable safe and effective robot navigation. Such spaces remain challenging as humans do not follow strict rules of motion and there are often multiple occluded entry points such as corners and doors that create opportunities for sudden encounters. In this work, we present a Transformer based architecture to predict human future trajectories in human-centric environments from input features including human positions, head orientations, and 3D skeletal keypoints from onboard in-the-wild sensory information. The resulting model captures the inherent uncertainty for future human trajectory prediction and achieves state-of-the-art performance on common prediction benchmarks and a human tracking dataset captured from a mobile robot adapted for the prediction task. Furthermore, we identify new agents with limited historical data as a major contributor to error and demonstrate the complementary nature of 3D skeletal poses in reducing prediction error in such challenging scenarios. @@ -24,6 +24,17 @@ If you use this work please cite our paper ## Data ### JRDB
+
+We provide an extensive pre-processing pipeline to convert the JRDB dataset.
+JRDB was created as a detection and tracking dataset rather than a prediction
+dataset. To make the data suitable for a prediction task, we first extract the
+robot motion from the raw sensor data and use it to compensate for the robot's
+movement. Further, on the JRDB training split we combine algorithmic detections
+with the ground truth labels from the tracking dataset to create authentic
+tracks as input and labels for HST.
+Note that we do not purely use the hand-labeled ground truth tracks of the JRDB
+train dataset, as we find them to be overly smoothed, which gives away the
+future human movement.
To adapt the JRDB dataset for prediction please follow [this](/data) README. Make sure to adapt `` in `config//dataset_params.gin` accordingly. @@ -38,17 +49,14 @@ Please download the raw data [here](https://github.com/StanfordASL/Trajectron-pl ### JRDB ``` -python train.py --model_base_dir=./model/jrdb --gin_files=.config/jrdb/training_params.gin --gin_files=.config/jrdb/model_params.gin --gin_files=.config/jrdb/dataset_params.gin --gin_files=.config/jrdb/metrics.gin --dataset=JRDB +python train.py --model_base_dir=./model/jrdb --gin_files=./config/jrdb/training_params.gin --gin_files=./config/jrdb/model_params.gin --gin_files=./config/jrdb/dataset_params.gin --gin_files=./config/jrdb/metrics.gin --dataset=JRDB ``` ### Pedestrians ETH/UCY ``` -python train.py --model_base_dir=./models/pedestrians_eth --gin_files=.config/pedestrians/training_params.gin --gin_files=.config/pedestrians/model_params.gin --gin_files=.config/pedestrians/dataset_params.gin --gin_files=.config/pedestrians/metrics.gin --dataset=PEDESTRIANS +python train.py --model_base_dir=./models/pedestrians_eth --gin_files=./config/pedestrians/training_params.gin --gin_files=./config/pedestrians/model_params.gin --gin_files=./config/pedestrians/dataset_params.gin --gin_files=./config/pedestrians/metrics.gin --dataset=PEDESTRIANS ``` -## Checkpoints -Coming soon! - --- ## Evaluation @@ -58,7 +66,47 @@
python jrdb/eval.py --model_path=./models/jrdb/ --checkpoint_path=./models/jrdb/ckpts/ckpt-30 ``` +#### Keypoints Impact Evaluation +``` +python jrdb/eval_keypoints.py --model_path=./models/jrdb/ --checkpoint_path=./models/jrdb/ckpts/ckpt-30 +``` + +vs + +``` +python jrdb/eval_keypoints.py --model_path=./models/jrdb_no_keypoints/ --checkpoint_path=./models/jrdb_no_keypoints/ckpts/ckpt-30 +``` + ### Pedestrians ETH/UCY ``` python pedestrians/eval.py --model_path=./models/pedestrians_eth/ --checkpoint_path=./models/pedestrians_eth/ckpts/ckpt-20 ``` +
+---
+
+## Results
+
+Compared to the published paper, we improved our data processing and fixed
+small bugs in this code release. If you compare against our method, please use
+the following updated results.
+
+On the JRDB dataset with dataset options as set [here](/config/jrdb/dataset_params.gin):
+
+|        | AVG   | @ 1s  | @ 2s  | @ 3s  | @ 4s  |
+|--------|-------|-------|-------|-------|-------|
+| MinADE | 0.26  | 0.12  | 0.20  | 0.28  | 0.37  |
+| MinFDE | 0.45  | 0.21  | 0.39  | 0.56  | 0.71  |
+| NLL    | -0.59 | -0.90 | -0.65 | -0.08 | 0.32  |
+
+On the ETH/UCY Pedestrians Dataset:
+
+|        | ETH  | Hotel | Univ | Zara1 | Zara2 | Avg  |
+|--------|------|-------|------|-------|-------|------|
+| MinADE | 0.41 | 0.10  | 0.24 | 0.17  | 0.14  | 0.21 |
+| MinFDE | 0.73 | 0.14  | 0.44 | 0.30  | 0.24  | 0.37 |
+
+
+### Checkpoints
+You can download trained model checkpoints for both the `JRDB` and `Pedestrians (ETH/UCY)` datasets [here]() (coming soon).
+
+To evaluate the pre-trained checkpoints, you will have to adjust the path to the dataset in the respective `params/operative_config.gin` file. \ No newline at end of file diff --git a/human_scene_transformer/config/jrdb/dataset_params.gin b/human_scene_transformer/config/jrdb/dataset_params.gin index 40585dd..5fa3b12 100644 --- a/human_scene_transformer/config/jrdb/dataset_params.gin +++ b/human_scene_transformer/config/jrdb/dataset_params.gin @@ -55,7 +55,7 @@ TEST_SCENES = ['clark-center-2019-02-28_1', 'tressider-2019-04-26_3_test'] -JRDBDatasetParams.path = +JRDBDatasetParams.path = '' JRDBDatasetParams.train_scenes = %TRAIN_SCENES JRDBDatasetParams.eval_scenes = %TEST_SCENES diff --git a/human_scene_transformer/config/pedestrians/dataset_params.gin b/human_scene_transformer/config/pedestrians/dataset_params.gin index dd7b64a..6b54090 100644 --- a/human_scene_transformer/config/pedestrians/dataset_params.gin +++ b/human_scene_transformer/config/pedestrians/dataset_params.gin @@ -1,4 +1,4 @@ -PedestriansDatasetParams.path = +PedestriansDatasetParams.path = '' PedestriansDatasetParams.dataset = 'eth' PedestriansDatasetParams.train_config = 'train' # train, trainval PedestriansDatasetParams.eval_config = 'val' # val, test diff --git a/human_scene_transformer/data/README.md b/human_scene_transformer/data/README.md index 4fd2a83..23b3acc 100644 --- a/human_scene_transformer/data/README.md +++ b/human_scene_transformer/data/README.md @@ -9,7 +9,7 @@ 5. Download and extract `Train Detections` from the JRDB 2019 section to `/detections`. ## Get the Leaderboard Test Set Tracks -3. Download and extract the best leaderboard [3D tracking result](https://jrdb.erc.monash.edu/leaderboards/download/1680) to `/test_dataset/labels/raw_leaderboard/`. +Download and extract this leaderboard [3D tracking result](https://jrdb.erc.monash.edu/leaderboards/download/1605) to `/test_dataset/labels/raw_leaderboard/`.
Such that you have `/test_dataset/labels/raw_leaderboard/00XX.txt` This is the best available leaderboard tracker at the time the code was developed. ## Get the Robot Odometry Preprocessed Keypoints diff --git a/human_scene_transformer/data/jrdb_preprocess_test.py b/human_scene_transformer/data/jrdb_preprocess_test.py new file mode 100644 index 0000000..22a8ffd --- /dev/null +++ b/human_scene_transformer/data/jrdb_preprocess_test.py @@ -0,0 +1,235 @@ +# Copyright 2023 The human_scene_transformer Authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Preprocesses the raw test split of JRDB. +""" + +import os + +from human_scene_transformer.data import utils +import numpy as np +import pandas as pd +import tensorflow as tf +import tqdm + +INPUT_PATH = '' +OUTPUT_PATH = '' + +POINTCLOUD = True +AGENT_KEYPOINTS = True +FROM_DETECTIONS = True + + +def list_test_scenes(input_path): + scenes = os.listdir(os.path.join(input_path, 'images', 'image_0')) + scenes.sort() + return scenes + + +def get_agents_features_df_with_box( + input_path, scene_id, max_distance_to_robot=10.0 +): + """Returns agents features with bounding box from raw leaderboard data.""" + jrdb_header = [ + 'frame', + 'track id', + 'type', + 'truncated', + 'occluded', + 'alpha', + 'bb_left', + 'bb_top', + 'bb_width', + 'bb_height', + 'x', + 'y', + 'z', + 'height', + 'width', + 'length', + 'rotation_y', + 'score', + ] + scene_data_file = utils.get_file_handle( + os.path.join( + input_path, 'labels', 'raw_leaderboard', f'{scene_id:04}' + '.txt' + ) + ) + df = pd.read_csv(scene_data_file, sep=' ', names=jrdb_header) + + def camera_to_lower_velodyne(p): + return np.stack( + [p[..., 2], -p[..., 0], -p[..., 1] + (0.742092 - 0.606982)], axis=-1 + ) + + df = df[df['score'] >= 0.01] + + df['p'] = df[['x', 'y', 'z']].apply( + lambda s: camera_to_lower_velodyne(s.to_numpy()), axis=1 + ) + df['distance'] = df['p'].apply(lambda s: np.linalg.norm(s, axis=-1)) + df['l'] = df['height'] + df['h'] = df['width'] + df['w'] = df['length'] + df['yaw'] = df['rotation_y'] + + df['id'] = df['track id'].apply(lambda s: f'pedestrian:{s}') + df['timestep'] = df['frame'] + + df = df.set_index(['timestep', 'id']) + + df = df[df['distance'] <= max_distance_to_robot] + + return df[['p', 'yaw', 'l', 'h', 'w']] + + +def jrdb_preprocess_test(input_path, output_path): + scenes = list_test_scenes(os.path.join(input_path, 'test_dataset')) + subsample = 1 + for scene in tqdm.tqdm(scenes): + scene_save_name = scene + '_test' + agents_df = get_agents_features_df_with_box( + os.path.join(input_path, 'test_dataset'), + scenes.index(scene), + max_distance_to_robot=15.0, + ) + + robot_odom = utils.get_robot( + os.path.join(input_path, 'processed', 'odometry_test'), scene + ) + + if AGENT_KEYPOINTS: + keypoints = utils.get_agents_keypoints( + os.path.join( + input_path, 'processed', 'labels', 'labels_3d_keypoints_test' + ), + scene, + ) + keypoints_df = pd.DataFrame.from_dict( + keypoints, orient='index' + ).rename_axis(['timestep', 'id']) # pytype: 
disable=missing-parameter # pandas-drop-duplicates-overloads + + agents_df = agents_df.join(keypoints_df) + agents_df.keypoints.fillna( + dict( + zip( + agents_df.index[agents_df['keypoints'].isnull()], + [np.ones((33, 3)) * np.nan] + * len( + agents_df.loc[ + agents_df['keypoints'].isnull(), 'keypoints' + ] + ), + ) + ), + inplace=True, + ) + + robot_df = pd.DataFrame.from_dict(robot_odom, orient='index').rename_axis( # pytype: disable=missing-parameter # pandas-drop-duplicates-overloads + ['timestep'] + ) + # Remove extra data odometry datapoints + robot_df = robot_df.iloc[agents_df.index.levels[0]] + + assert (agents_df.index.levels[0] == robot_df.index).all() + + # Subsample + assert len(agents_df.index.levels[0]) == agents_df.index.levels[0].max() + 1 + agents_df_subsampled_index = agents_df.unstack('id').iloc[::subsample].index + agents_df = ( + agents_df.unstack('id') + .iloc[::subsample] + .reset_index(drop=True) + .stack('id', dropna=True) + ) + + agents_in_odometry_df = utils.agents_to_odometry_frame( + agents_df, robot_df.iloc[::subsample].reset_index(drop=True) + ) + + agents_pos_ragged_tensor = utils.agents_pos_to_ragged_tensor( + agents_in_odometry_df + ) + agents_yaw_ragged_tensor = utils.agents_yaw_to_ragged_tensor( + agents_in_odometry_df + ) + assert ( + agents_pos_ragged_tensor.shape[0] == agents_yaw_ragged_tensor.shape[0] + ) + + tf.data.Dataset.from_tensors(agents_pos_ragged_tensor).save( + os.path.join(output_path, scene_save_name, 'agents', 'position') + ) + tf.data.Dataset.from_tensors(agents_yaw_ragged_tensor).save( + os.path.join(output_path, scene_save_name, 'agents', 'orientation') + ) + + if AGENT_KEYPOINTS: + agents_keypoints_ragged_tensor = utils.agents_keypoints_to_ragged_tensor( + agents_in_odometry_df + ) + tf.data.Dataset.from_tensors(agents_keypoints_ragged_tensor).save( + os.path.join(output_path, scene_save_name, 'agents', 'keypoints') + ) + + robot_in_odometry_df = utils.robot_to_odometry_frame(robot_df) + robot_pos = tf.convert_to_tensor( + np.stack(robot_in_odometry_df.iloc[::subsample]['p'].values).astype( + np.float32 + ) + ) + robot_orientation = tf.convert_to_tensor( + np.stack(robot_in_odometry_df.iloc[::subsample]['yaw'].values).astype( + np.float32 + ) + )[..., tf.newaxis] + + tf.data.Dataset.from_tensors(robot_pos).save( + os.path.join(output_path, scene_save_name, 'robot', 'position') + ) + tf.data.Dataset.from_tensors(robot_orientation).save( + os.path.join(output_path, scene_save_name, 'robot', 'orientation') + ) + + if POINTCLOUD: + scene_pointcloud_dict = utils.get_scene_poinclouds( + os.path.join(input_path, 'test_dataset'), scene, subsample=subsample + ) + # Remove extra timesteps + scene_pointcloud_dict = { + ts: scene_pointcloud_dict[ts] for ts in agents_df_subsampled_index + } + + scene_pc_odometry = utils.pc_to_odometry_frame( + scene_pointcloud_dict, robot_df + ) + + filtered_pc = utils.filter_agents_and_ground_from_point_cloud( + agents_in_odometry_df, scene_pc_odometry, robot_in_odometry_df + ) + + scene_pc_ragged_tensor = tf.ragged.stack(filtered_pc) + + assert ( + agents_pos_ragged_tensor.bounding_shape()[1] + == scene_pc_ragged_tensor.shape[0] + ) + + tf.data.Dataset.from_tensors(scene_pc_ragged_tensor).save( + os.path.join(output_path, scene_save_name, 'scene', 'pc'), + compression='GZIP', + ) + +if __name__ == '__main__': + jrdb_preprocess_test(INPUT_PATH, OUTPUT_PATH) diff --git a/human_scene_transformer/data/jrdb_train_detections_to_tracks.py b/human_scene_transformer/data/jrdb_train_detections_to_tracks.py index 
612de6d..0b33a80 100644 --- a/human_scene_transformer/data/jrdb_train_detections_to_tracks.py +++ b/human_scene_transformer/data/jrdb_train_detections_to_tracks.py @@ -30,7 +30,7 @@ INPUT_PATH = '' OUTPUT_PATH = os.path.join( - input_path, '/processed/labels/labels_detections_3d') + INPUT_PATH, 'processed/labels/labels_detections_3d') def get_agents_3d_bounding_box_dict(input_path, scene): @@ -129,7 +129,6 @@ def jrdb_train_detections_to_tracks(input_path, output_path): scenes = utils.list_scenes( os.path.join(input_path, 'train_dataset')) for scene in tqdm.tqdm(scenes): - print(f'Processing {scene}') bb_dict = get_agents_3d_bounding_box_dict( os.path.join(input_path, 'train_dataset'), scene) bb_3d_df = pd.DataFrame.from_dict( @@ -174,7 +173,7 @@ def jrdb_train_detections_to_tracks(input_path, output_path): labels_dict = detections_to_dict(matched_df) - with os.Open(f"{output_path}/{scene}.json", 'w') as write_file: + with open(f"{output_path}/{scene}.json", 'w') as write_file: json.dump(labels_dict, write_file, indent=2, ensure_ascii=True) if __name__ == '__main__': diff --git a/human_scene_transformer/data/utils.py b/human_scene_transformer/data/utils.py index 8932815..55f6def 100644 --- a/human_scene_transformer/data/utils.py +++ b/human_scene_transformer/data/utils.py @@ -37,12 +37,13 @@ def maybe_makedir(path): def get_file_handle(path, mode='rt'): - file_handle = os.Open(path, mode=mode) + file_handle = open(path, mode) return file_handle def list_scenes(input_path): scenes = os.listdir(os.path.join(input_path, 'labels', 'labels_3d')) + scenes.sort() return [scene[:-5] for scene in scenes] diff --git a/human_scene_transformer/jrdb/eval.py b/human_scene_transformer/jrdb/eval.py index 8943075..9697e0b 100644 --- a/human_scene_transformer/jrdb/eval.py +++ b/human_scene_transformer/jrdb/eval.py @@ -156,10 +156,11 @@ def main(argv: Sequence[str]) -> None: [os.path.join(_MODEL_PATH.value, 'params', 'operative_config.gin')], None, skip_unknown=True) - logging.info('Actual gin config used:') - logging.info(gin.config_str()) + print('Actual gin config used:') + print(gin.config_str()) evaluation(_CKPT_PATH.value) if __name__ == '__main__': + logging.set_verbosity(logging.ERROR) app.run(main) diff --git a/human_scene_transformer/jrdb/eval_keypoints.py b/human_scene_transformer/jrdb/eval_keypoints.py index c54a602..a1c7f87 100644 --- a/human_scene_transformer/jrdb/eval_keypoints.py +++ b/human_scene_transformer/jrdb/eval_keypoints.py @@ -193,10 +193,11 @@ def main(argv: Sequence[str]) -> None: [os.path.join(_MODEL_PATH.value, 'params', 'operative_config.gin')], None, skip_unknown=True) - logging.info('Actual gin config used:') - logging.info(gin.config_str()) + print('Actual gin config used:') + print(gin.config_str()) evaluation(_CKPT_PATH.value) if __name__ == '__main__': + logging.set_verbosity(logging.ERROR) app.run(main) diff --git a/human_scene_transformer/pedestrians/eval.py b/human_scene_transformer/pedestrians/eval.py index 3128501..eb0f906 100644 --- a/human_scene_transformer/pedestrians/eval.py +++ b/human_scene_transformer/pedestrians/eval.py @@ -109,11 +109,12 @@ def main(argv: Sequence[str]) -> None: [os.path.join(_MODEL_PATH.value, 'params', 'operative_config.gin')], None, skip_unknown=True) - logging.info('Actual gin config used:') - logging.info(gin.config_str()) + print('Actual gin config used:') + print(gin.config_str()) evaluation(_CKPT_PATH.value) if __name__ == '__main__': + logging.set_verbosity(logging.ERROR) app.run(main) diff --git 
a/human_scene_transformer/train_model.py b/human_scene_transformer/train_model.py index f71b491..29cfe96 100644 --- a/human_scene_transformer/train_model.py +++ b/human_scene_transformer/train_model.py @@ -27,13 +27,14 @@ from human_scene_transformer.model import model_params as mp import tensorflow as tf -import tensorflow_models as tfm + +import official.modeling.optimization.lr_schedule as tfm_lr_schedule _LOGGING_INTERVAL = 1 def get_file_handle(path, mode='rt'): - file_handle = os.Open(path, mode=mode) + file_handle = open(path, mode) return file_handle @@ -78,8 +79,7 @@ def _get_learning_rate_schedule( decay_schedule = tf.keras.optimizers.schedules.CosineDecay( initial_learning_rate=learning_rate, decay_steps=total_steps, alpha=alpha) - return tfm.optimization.LinearWarmup( - decay_schedule, warmup_steps, 1e-10) + return tfm_lr_schedule.LinearWarmup(decay_schedule, warmup_steps, 1e-10) def train_model( diff --git a/pyproject.toml b/pyproject.toml index c4cfdb2..ba2aa35 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -27,8 +27,9 @@ dependencies = [ pandas, open3d, tensorflow, - tensorflow_models, - tensorflow_probability + tf-models-official, + tensorflow-probability, + tensorflow-graphics, gin-config, absl-py, tqdm diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 0000000..0d12751 --- /dev/null +++ b/requirements.txt @@ -0,0 +1,12 @@ +absl-py>=1.4 +gin-config>=0.5 +numpy>=1.24 +open3d>=0.17 +pandas>=2.1 +scipy>=1.11 +tensorflow>=2.13 +tensorflow-datasets>=4.9 +tensorflow-graphics>=2021.12 +tensorflow-probability>=0.21 +tf-models-official>=2.5 +tqdm>=4.66 \ No newline at end of file
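
A note on the new preprocessing output: `jrdb_preprocess_test.py` writes each scene as `tf.data` datasets under `<OUTPUT_PATH>/<scene>_test/`. The snippet below is a minimal sketch, not part of this change, for loading them back as a sanity check; the scene path is a placeholder and the tensor-layout comments are assumptions inferred from how the script saves its ragged tensors.

```
import tensorflow as tf

# Placeholder path: one scene directory written by jrdb_preprocess_test.py,
# i.e. <OUTPUT_PATH>/<scene>_test.
scene_dir = '/path/to/output/some-scene_test'

# Agent positions are stored as a single ragged tensor per scene
# (presumably one row per agent, ragged over timesteps).
positions = next(iter(tf.data.Dataset.load(f'{scene_dir}/agents/position')))
print(positions.bounding_shape())

# The robot trajectory is stored as a dense tensor.
robot_pos = next(iter(tf.data.Dataset.load(f'{scene_dir}/robot/position')))
print(robot_pos.shape)

# The scene point cloud was saved with GZIP compression, so it must be
# loaded with the matching compression argument.
pointcloud = next(iter(tf.data.Dataset.load(f'{scene_dir}/scene/pc',
                                            compression='GZIP')))
print(pointcloud.shape[0])  # number of timesteps in the scene
```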