This repo is an experimental combination of SVIP (https://github.com/svip-lab/SVIP-Sequence-VerIfication-for-Procedures-in-Videos) and VideoAlignment (https://github.com/hadjisma/VideoAlignment).
The main pipeline uses SVIP, so the setup and scripts are copied from there. The Smooth DTW loss is defined in `utils/smoothDTW.py`, and the training pipeline is modified to use only the Smooth DTW loss.

Some example figures are in `figs`:
- `dist_matrix_*.png` corresponds to the distance matrix from Smooth DTW.
- `dtw_matrix_*.png` corresponds to the DTW matrix computed through dynamic programming (DP).
- `frames_*.png` corresponds to the input frame pairing, including labels.
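To make the relation between the distance matrix and the DTW matrix concrete, here is a minimal, dependency-free sketch of the DP recursion. It is not the code in `utils/smoothDTW.py`: the squared Euclidean frame distance, the log-sum-exp soft minimum (the operator known from soft-DTW), and the names `pairwise_sq_dist`, `soft_min`, and `dtw_matrix` are all assumptions chosen for illustration, and the loss in this repo may use a different distance or smoothing operator.

```python
import math


def pairwise_sq_dist(x, y):
    """Squared Euclidean distance between every frame embedding in x and in y.

    Assumption: the repo may build its distance matrix differently.
    """
    return [[sum((a - b) ** 2 for a, b in zip(xi, yj)) for yj in y] for xi in x]


def soft_min(values, gamma):
    """Log-sum-exp soft minimum; equals min(values) when gamma == 0."""
    m = min(values)
    if gamma == 0 or math.isinf(m):
        return m
    # Shift by m for numerical stability before exponentiating.
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def dtw_matrix(dist, gamma=0.0):
    """Cumulative alignment cost matrix filled in by dynamic programming."""
    n, m = len(dist), len(dist[0])
    inf = float("inf")
    # Extra border row/column so that only (0, 0) is a valid starting point.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = soft_min(
                [acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1]], gamma
            )
            acc[i][j] = dist[i - 1][j - 1] + best_prev
    return [row[1:] for row in acc[1:]]


# Toy example: y repeats the middle frame of x, so a zero-cost alignment exists.
x = [[0.0], [1.0], [2.0]]
y = [[0.0], [1.0], [1.0], [2.0]]
dist = pairwise_sq_dist(x, y)
hard = dtw_matrix(dist, gamma=0.0)    # classic DTW; hard[-1][-1] is 0.0 here
smooth = dtw_matrix(dist, gamma=0.1)  # smoothed variant of the same recursion
```

With `gamma=0` the recursion reduces to classic DTW. A positive `gamma` makes every entry a differentiable function of the distances, which is what allows the bottom-right entry to be used as a training loss.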
- python 3.6
- pytorch 1.7.1
- cuda 10.2
- Clone the repo and install dependencies.

      git clone https://github.com/svip-lab/SVIP-Sequence-VerIfication-for-Procedures-in-Videos.git
      cd SVIP-Sequence-VerIfication-for-Procedures-in-Videos
      pip install -r requirements.txt
- Download the pretrained model.
  - Link: here
  - Extraction code: 2555
Please refer to here for detailed instructions.
We have provided the default configuration files for reproducing our results. Try these commands to play with this project.
- For training:
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --config configs/train_resnet_config.yml
- For evaluation:
Note that we use the L2 distance when evaluating on COIN-SV and NormL2 on the other datasets.
CUDA_VISIBLE_DEVICES=0 python eval.py --config configs/eval_resnet_config.yml --root_path [model&log folder] --dist [L2/NormL2] --log_name [xxx]
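The difference between the two `--dist` options can be sketched as follows. This is an illustration under assumptions, not the code in `eval.py`: it assumes that `L2` is the plain Euclidean distance between two video embeddings and that `NormL2` is the same distance taken after scaling each embedding to unit length. The helper names `l2_normalize`, `l2_distance`, and `norm_l2_distance` are hypothetical, and the actual script may use a squared distance or a different normalization.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps guards against zero vectors)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / max(norm, eps) for a in v]


def l2_distance(u, v):
    """Plain Euclidean distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def norm_l2_distance(u, v):
    """Euclidean distance after L2-normalizing both embeddings (assumed NormL2)."""
    return l2_distance(l2_normalize(u), l2_normalize(v))


# Two embeddings that point in the same direction but differ in magnitude.
u = [3.0, 4.0]
v = [6.0, 8.0]
plain = l2_distance(u, v)            # sensitive to the magnitude difference
normalized = norm_l2_distance(u, v)  # depends only on the direction
```

Under these assumptions, the normalized variant ignores embedding magnitude and is bounded by 2, so a verification threshold tuned for one option does not carry over to the other.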
We provide checkpoints for each dataset trained with this re-organized codebase.
Notice: The reproduced performance is occasionally higher or lower (within a reasonable range) than the results reported in the paper.
| Dataset | Split | Paper | Reproduce | ckpt |
|---|---|---|---|---|
| COIN-SV | val | 56.81, 0.4005 | 58.27, 0.4667 | here |
| | test | 51.13, 0.4098 | 51.55, 0.4658 | |
| Diving48-SV | val | 91.91, 1.0642 | 91.69, 1.0928 | here |
| | test | 83.11, 0.6009 | 84.28, 0.6193 | |
| CSV | test | 83.02, 0.4193 | 82.88, 0.4474 | here |
If you find this repo helpful, please cite our paper:
@inproceedings{qian2022svip,
title={SVIP: Sequence VerIfication for Procedures in Videos},
author={Qian, Yicheng and Luo, Weixin and Lian, Dongze and Tang, Xu and Zhao, Peilin and Gao, Shenghua},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={19890--19902},
year={2022}
}