# Bibliography

Here is a reverse-chronological list of papers that use the How2 dataset. To have your work added to the list, please send it to us by email.

- R. Sharma, S. Palaskar, A. W. Black, and F. Metze. End-to-end speech summarization using restricted self-attention. In Proc. ICASSP, pages 8072–8076, 2022. IEEE. doi:10.1109/ICASSP43922.2022.9747320.
- Nils Holzenberger, Shruti Palaskar, Pranava Madhyastha, Florian Metze, and Raman Arora. Learning from multiview correlations in open-domain videos. In Proc. ICASSP, May 2019. IEEE. arXiv:1811.08890.
- Ozan Çaglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, and Florian Metze. Multimodal grounding for sequence-to-sequence speech recognition. In Proc. ICASSP, Brighton, UK, May 2019. IEEE. arXiv:1811.03865.
- Ramon Sanabria, Shruti Palaskar, and Florian Metze. CMU Sinbad's submission for the DSTC7 AVSD challenge. In Proc. 7th Dialog State Tracking Challenge Workshop, Honolulu, HI, USA, January 2019.
- Jindrich Libovicky, Shruti Palaskar, Spandana Gella, and Florian Metze. Multimodal abstractive summarization for open-domain videos. In Proc. Visually Grounded Interaction and Language (ViGIL), Montreal, Canada, December 2018. Neural Information Processing Systems (NeurIPS).
- Ramon Sanabria, Ozan Çaglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, and Florian Metze. How2: A large-scale dataset for multimodal language understanding. In Proc. Visually Grounded Interaction and Language (ViGIL), Montreal, Canada, December 2018. Neural Information Processing Systems (NeurIPS). arXiv:1811.00347.
- Yasufumi Moriya, Gareth J. F. Jones, Ramon Sanabria, and Florian Metze. Eyes and ears together: New task for multimodal spoken content analysis. In Proc. MediaEval, Cagnes-sur-Mer, France, October 2018. http://multimediaeval.org/mediaeval2018/.
- Abhinav Gupta, Yajie Miao, Leonardo Neves, and Florian Metze. Visual features for context-aware speech recognition. In Proc. ICASSP, New Orleans, LA, USA, March 2017. IEEE. Best student paper candidate. arXiv:1712.00489.
- Yajie Miao and Florian Metze. Open-domain audio-visual speech recognition: A deep learning approach. In Proc. INTERSPEECH, San Francisco, CA, USA, September 2016. ISCA.
- Shoou-I Yu, Lu Jiang, and Alexander Hauptmann. Instructional videos for unsupervised harvesting and learning of action examples. In Proc. 22nd ACM International Conference on Multimedia, Orlando, FL, USA, 2014. ACM. http://doi.acm.org/10.1145/2647868.2654997.