This project is an oral presentation at CVPR 2020.
(Project Page) (PDF) (Slides) (Video)
When we humans look at a video of human-object interaction, we can not only infer what is happening but we can even extract actionable information and imitate those interactions. On the other hand, current recognition or geometric approaches lack the physicality of action representation. In this paper, we take a step towards a more physical understanding of actions. We address the problem of inferring contact points and the physical forces from videos of humans interacting with objects. One of the main challenges in tackling this problem is obtaining ground-truth labels for forces. We sidestep this problem by instead using a physics simulator for supervision. Specifically, we use a simulator to predict effects and enforce that estimated forces must lead to the same effect as depicted in the video.
Our quantitative and qualitative results show that:
- We can predict meaningful forces from videos, and the effects of those forces lead to accurate imitation of the observed motions.
- By jointly optimizing for contact point and force prediction, we can improve the performance on both tasks in comparison to independent training.
- We can learn a representation from this model that generalizes to novel objects using only a few examples (few-shot).
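The idea of supervising forces through their simulated effects can be illustrated with a toy sketch. This is not the repository's simulator, model, or loss: a 2D point mass stands in for the physics engine, and the names `simulate` and `trajectory_loss` are hypothetical. It only shows the principle that a candidate force sequence is scored by how closely its simulated motion matches the observed motion, so no ground-truth force labels are needed.

```python
# Toy illustration only: a 2D point mass stands in for the physics simulator.
# simulate() and trajectory_loss() are hypothetical names, not repository code.

def simulate(start_pos, start_vel, forces, mass=1.0, dt=0.1):
    """Integrate a point mass under a sequence of 2D forces (semi-implicit Euler)."""
    pos, vel = list(start_pos), list(start_vel)
    trajectory = []
    for fx, fy in forces:
        # Update velocity from the applied force, then position from velocity.
        vel[0] += fx / mass * dt
        vel[1] += fy / mass * dt
        pos[0] += vel[0] * dt
        pos[1] += vel[1] * dt
        trajectory.append((pos[0], pos[1]))
    return trajectory


def trajectory_loss(simulated, observed):
    """Mean squared distance between simulated and observed positions."""
    total = 0.0
    for (sx, sy), (ox, oy) in zip(simulated, observed):
        total += (sx - ox) ** 2 + (sy - oy) ** 2
    return total / len(observed)


# The "observed" motion is generated from a known force so the example is self-checking.
true_forces = [(1.0, 0.0)] * 5
observed = simulate((0.0, 0.0), (0.0, 0.0), true_forces)

good_guess = [(1.0, 0.0)] * 5  # matches the force that produced the motion
bad_guess = [(0.0, 1.0)] * 5   # pushes in the wrong direction

loss_good = trajectory_loss(simulate((0.0, 0.0), (0.0, 0.0), good_guess), observed)
loss_bad = trajectory_loss(simulate((0.0, 0.0), (0.0, 0.0), bad_guess), observed)
print("loss for matching forces:", loss_good)
print("loss for wrong forces:", loss_bad)
```

The force guess that reproduces the observed motion gets the lower loss, which is the signal a learned force predictor would be trained against in this simplified setting.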
- Clone the repository using the command:
git clone https://github.com/ehsanik/touchTorch
cd touchTorch
- Install requirements:
pip3 install -r requirements.txt
- Download the dataset from here and extract it to DatasetForce.
- Download the pretrained weights from here and extract them to "DatasetForce/trained_weights".
Dataset statistics are provided in the paper. The structure of the dataset is as follows:
DatasetForce
└── images
│   └── OBJECTNAME
│       └── image_*.jpeg
└── annotations
│   ├── [test/train/val]_time_to_clip_ind.json
│   ├── [test/train/val]_cleaned_start_states.json
│   ├── time_to_keypoint_fps_{FPS}.json
│   ├── time_to_obj_state_fps_{FPS}.json
│   └── clean_clip_to_contact_point.json
└── trained_weights
│   └── *.pytar
└── objects_16k                         ----- YCB objects modified for PyBullet
    └── OBJECTNAME
        └── google_16k
            ├── textured.urdf           ----- URDF to load in PyBullet
            ├── textured_big.obj        ----- Object mesh
            ├── textured_big.obj.mtl    ----- Corresponding texture
            ├── textured_bigconvex.obj  ----- Simplified mesh for collision
            │                                 (Calculated using [v-hacd])
            └── *_keypoints.json        ----- Keypoints on the object mesh
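After extracting the archives, a quick check of the folder layout can catch a misplaced download before training starts. The sketch below is not part of the repository; `missing_subdirs` and `urdf_path` are hypothetical helpers, and the folder and file names are taken only from the tree above.

```python
from pathlib import Path

# Top-level folders listed in the dataset tree above.
EXPECTED_SUBDIRS = ("images", "annotations", "trained_weights", "objects_16k")


def missing_subdirs(root):
    """Return the expected top-level folders that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


def urdf_path(root, object_name):
    """Build the path to an object's URDF, following the tree above."""
    return Path(root) / "objects_16k" / object_name / "google_16k" / "textured.urdf"


# Assumes the dataset was extracted to ./DatasetForce as described above.
missing = missing_subdirs("DatasetForce")
print("missing folders:", missing if missing else "none")
```

Here `OBJECTNAME` in the tree is a placeholder, so `urdf_path` takes the object name as an argument rather than assuming any particular object.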
To train your own model:
python3 main.py --title joint_training --sequence_length 10 --gpu-ids 0 --number_of_cp 5 \
--model SeparateTowerModel --dataset DatasetWAugmentation --loss KPProjectionCPPredictionLoss \
--object_list ALL --data DatasetForce
See scripts/train_scripts.sh for additional training scripts.
To test using the pretrained model and reproduce the results in the paper:
python3 main.py --title test_all_joint_training --sequence_length 10 --gpu-ids 0 \
--number_of_cp 5 --model SeparateTowerModel --dataset DatasetWAugmentation \
--loss KPProjectionCPPredictionLoss --object_list ALL --data DatasetForce \
--reload DatasetForce/trained_weights/all_obj_end2end.pytar test
See scripts/test_w_weights.sh for additional scripts.
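The training and testing commands above share most of their arguments; the test command adds `--reload` with a checkpoint path and a trailing `test` argument. As a rough sketch for scripting several runs, the hypothetical helper `build_command` below assembles both argument lists. It uses only the flags and values that appear in the commands above and is not part of the repository.

```python
import shlex

# Flags and values copied from the commands above.
COMMON_ARGS = [
    "--sequence_length", "10",
    "--gpu-ids", "0",
    "--number_of_cp", "5",
    "--model", "SeparateTowerModel",
    "--dataset", "DatasetWAugmentation",
    "--loss", "KPProjectionCPPredictionLoss",
    "--object_list", "ALL",
    "--data", "DatasetForce",
]


def build_command(title, weights=None):
    """Return the argument list for a training run, or a test run if weights is given."""
    cmd = ["python3", "main.py", "--title", title] + COMMON_ARGS
    if weights is not None:
        # Testing adds --reload with a checkpoint and the trailing "test" argument.
        cmd += ["--reload", weights, "test"]
    return cmd


train_cmd = build_command("joint_training")
test_cmd = build_command(
    "test_all_joint_training",
    weights="DatasetForce/trained_weights/all_obj_end2end.pytar",
)
print(shlex.join(train_cmd))
print(shlex.join(test_cmd))
```

Either list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run.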
If you find this project useful in your research, please consider citing:
@inproceedings{ehsani2020force,
title={Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects},
author={Ehsani, Kiana and Tulsiani, Shubham and Gupta, Saurabh and Farhadi, Ali and Gupta, Abhinav},
booktitle={CVPR},
year={2020}
}