You Only Watch Once (YOWO)

PyTorch implementation of the article "You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization". Code will be uploaded soon!

In this work, we present YOWO (You Only Watch Once), a unified CNN architecture for real-time spatiotemporal action localization in video streams. YOWO is a single-stage framework: the input is a clip of several successive video frames, and the output is the bounding box positions and corresponding class labels for the current frame. These frame-level detections can then be linked with a dedicated linking strategy to generate action tubes over the whole video.

Since we do not separate the human detection and action classification procedures, the whole network can be optimized with a joint loss in an end-to-end framework. We carried out a series of comparative evaluations on two challenging, representative datasets, UCF101-24 and J-HMDB-21. Our approach outperforms other state-of-the-art results while retaining real-time capability, running at 34 frames per second on 16-frame input clips and 62 frames per second on 8-frame input clips.
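
To make the data flow described above more concrete, here is a minimal sketch in plain Python. It is not the released YOWO code. It only illustrates (1) which frames could form the input clip for a given current frame and (2) one simple way per-frame boxes might be linked into tubes. The function names `clip_indices`, `iou` and `link_greedy`, the clamping of indices at the start of the video, the greedy IoU matching rule, and the `min_iou` threshold are all assumptions made for illustration. The actual linking strategy is not specified here and may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def clip_indices(current: int, clip_len: int = 16) -> List[int]:
    """Indices of the clip_len successive frames ending at `current`.

    Indices before the start of the video are clamped to 0 (an assumption).
    """
    start = current - clip_len + 1
    return [max(0, start + i) for i in range(clip_len)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_greedy(frames: List[List[Box]], min_iou: float = 0.5) -> List[List[Box]]:
    """Greedy linking (illustrative only, not the paper's strategy).

    Each tube that reached the previous frame is extended with the
    best-overlapping unused box in the current frame, provided the overlap
    is at least min_iou. Unmatched boxes start new tubes.
    """
    tubes: List[List[Box]] = []   # every tube created so far
    active: List[List[Box]] = []  # tubes that reached the previous frame
    for boxes in frames:
        unused = list(boxes)
        next_active: List[List[Box]] = []
        for tube in active:
            best = max(unused, key=lambda b: iou(tube[-1], b), default=None)
            if best is not None and iou(tube[-1], best) >= min_iou:
                tube.append(best)
                unused.remove(best)
                next_active.append(tube)
        for box in unused:
            new_tube = [box]
            tubes.append(new_tube)
            next_active.append(new_tube)
        active = next_active
    return tubes
```

For example, `clip_indices(20, 16)` selects frames 5 through 20, so the detections produced for that clip would belong to frame 20. Calling `link_greedy` on the per-frame box lists then groups overlapping boxes from consecutive frames into one tube each.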

Citation

If you use this code or pre-trained models, please cite the following:

@article{kopuklu2019yowo,
title={You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization},
author={K{\"o}p{\"u}kl{\"u}, Okan and Wei, Xiangyu and Rigoll, Gerhard},
journal={arXiv preprint arXiv:1911.06644},
year={2019}
}
