Challenge 2: Alignment

Sub-Challenge 2a: Connections

Definition: Identify connections between elements of multiple modalities

Supervised Approach
Unsupervised Approach
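The notes name the two families without going into detail. Dynamic time warping (DTW) is one classic unsupervised example. It is our choice here, not necessarily the method used in the lecture. DTW finds connections between two temporal sequences without any paired labels. It does this by computing the cheapest monotonic path through the matrix of pairwise distances, so one element may connect to several elements of the other modality. The sketch below assumes both modalities are already sequences of feature vectors in a shared space. `dtw_align` is a hypothetical helper name.

```python
import numpy as np


def dtw_align(x, y):
    """Align two feature sequences with dynamic time warping (DTW).

    x: (n, d) array, y: (m, d) array in a shared feature space.
    Returns (total_cost, path) where path lists (i, j) pairs connecting x[i] to y[j].
    """
    n, m = len(x), len(y)
    # Pairwise Euclidean distances between every element of x and every element of y.
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # acc[i, j] = cheapest cost of aligning the first i elements of x with the first j of y.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # both sequences advance
                acc[i - 1, j],      # x advances, y[j - 1] is reused
                acc[i, j - 1],      # y advances, x[i - 1] is reused
            )
    # Backtrack from (n, m) to recover the monotonic alignment path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): acc[i - 1, j - 1],
                 (i - 1, j): acc[i - 1, j],
                 (i, j - 1): acc[i, j - 1]}
        i, j = min(steps, key=steps.get)
    return float(acc[n, m]), path[::-1]
```

For example, aligning `x = [[0], [1], [2]]` with `y = [[0], [0], [1], [2]]` gives a cost of 0 and the path `[(0, 0), (0, 1), (1, 2), (2, 3)]`. The first element of `x` connects to both leading zeros of `y`.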

Sub-Challenge 2b: Aligned Representations

Definition: Model all cross-modal connections and interactions to learn better representations

Aligned Representations – A Popular Approach

Aligned Representations – Early Fusion

Li et al., VisualBERT: A Simple and Performant Baseline for Vision and Language, arXiv 2019
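Below is a minimal sketch of early fusion in the spirit of VisualBERT. It assumes that text token embeddings and visual region features have already been projected to the same width `d`. The sketch is simplified to a single single-head self-attention layer, with no position embeddings and no BERT stack. The segment embeddings `seg` and the function name are illustrative assumptions. Because both modalities sit in one sequence, the attention map itself holds every cross-modal connection.

```python
import numpy as np


def softmax(z, axis=-1):
    # Subtract the row max for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def early_fusion_attention(text, regions, w_q, w_k, w_v, seg):
    """One single-head self-attention layer over a joint text + image sequence.

    text: (n_t, d) token embeddings; regions: (n_v, d) region features already
    projected to width d; seg: (2, d) hypothetical segment embeddings marking
    which modality each element comes from; w_q, w_k, w_v: (d, d) projections.
    Returns the fused (n_t + n_v, d) outputs and the (N, N) attention map.
    """
    # Early fusion: tag each element with its modality, then concatenate.
    x = np.concatenate([text + seg[0], regions + seg[1]], axis=0)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Every element attends to every element, within and across modalities.
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return attn @ v, attn
```

The off-diagonal blocks `attn[:n_t, n_t:]` and `attn[n_t:, :n_t]` are the text-to-region and region-to-text connections. Nothing restricts which pairs may interact. That is both the appeal of early fusion and the source of its quadratic cost in sequence length.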

Aligned Representations – Two-Way Directional Alignment

Multimodal Transformer – Pairwise Cross-Modal

Cross-Modal Transformer Module (V -> L)
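The attention core of a cross-modal module can be sketched as follows. This is a simplification that assumes single-head attention and uses hypothetical weight names. In the V -> L direction, language is the target. Queries come from the language sequence, while keys and values come from the vision sequence. The output therefore has one vector per language step, and the two sequences need not share a length or be pre-aligned. The full module in the Multimodal Transformer also stacks such layers with residual connections, layer normalization and feed-forward sublayers. Those parts are omitted here.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(x_l, x_v, w_q, w_k, w_v):
    """Single-head cross-modal attention, V -> L (language is the target).

    x_l: (T_l, d_l) language sequence; x_v: (T_v, d_v) vision sequence.
    w_q: (d_l, d); w_k, w_v: (d_v, d). T_l and T_v may differ.
    """
    q = x_l @ w_q                 # queries come from the target modality (L)
    k, v = x_v @ w_k, x_v @ w_v   # keys and values come from the source modality (V)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (T_l, T_v) soft alignment
    return attn @ v, attn         # one output vector per language step
```

With `T_l = 5` language steps and `T_v = 7` video frames, `attn` has shape `(5, 7)`. Each row is a soft alignment of one language step over all frames.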

Example of Two-Way Directional Alignment

Lu, Jiasen, et al. "ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks." arXiv (August 6, 2019).
Tan, Hao, and Mohit Bansal. "LXMERT: Learning cross-modality encoder representations from transformers." arXiv (August 20, 2019).
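Two-way directional alignment runs both directions at once. The sketch below loosely follows the co-attention idea in ViLBERT, where keys and values from each stream are fed to the other stream's attention. LXMERT's cross-modality encoder uses a similar bidirectional cross-attention. The sketch is single-head and omits residuals, normalization and feed-forward sublayers. The projection dictionaries `p_l` and `p_v` are hypothetical names.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def co_attention(x_l, x_v, p_l, p_v):
    """Two-way cross-attention between a language and a vision stream.

    x_l: (n_l, d_l); x_v: (n_v, d_v). p_l and p_v map "q", "k", "v" to
    projection matrices of shape (d_l, d) and (d_v, d) respectively.
    """
    q_l, k_l, v_l = (x_l @ p_l[name] for name in "qkv")
    q_v, k_v, v_v = (x_v @ p_v[name] for name in "qkv")
    scale = np.sqrt(q_l.shape[-1])
    # Language queries attend over vision keys and values.
    a_lv = softmax(q_l @ k_v.T / scale)   # (n_l, n_v)
    # Vision queries attend over language keys and values.
    a_vl = softmax(q_v @ k_l.T / scale)   # (n_v, n_l)
    return a_lv @ v_v, a_vl @ v_l, a_lv, a_vl
```

Each stream keeps its own length. Language outputs stay aligned with words and vision outputs with regions, while each stream is now conditioned on the other modality.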

Aligned Representations with Graph Networks

First advantage: Does not require all elements to be connected
Second advantage: Allows different edge functions for modality and temporal connections
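A small message-passing sketch shows both advantages. Nodes are `(modality, timestamp)` pairs. The edge rule links nodes whose timestamps differ by at most `max_gap`, and there are two edge types. Both the rule and the edge types are illustrative assumptions, not the MTAG construction itself. Distant nodes stay unconnected, and temporal and cross-modal edges each get their own weight matrix.

```python
import numpy as np


def build_edges(nodes, max_gap=1.0):
    """Create typed, directed edges between (modality, timestamp) nodes.

    Hypothetical rule: connect two nodes only if their timestamps differ by at
    most max_gap. Same-modality edges are "temporal", others are "modal".
    Returns a list of (src, dst, edge_type).
    """
    edges = []
    for i, (mod_i, t_i) in enumerate(nodes):
        for j, (mod_j, t_j) in enumerate(nodes):
            if i != j and abs(t_i - t_j) <= max_gap:
                edges.append((j, i, "temporal" if mod_i == mod_j else "modal"))
    return edges


def message_pass(h, edges, w_self, w_edge):
    """One message-passing step with a separate weight matrix per edge type.

    h: (N, d_in) node features; w_self: (d_in, d_out);
    w_edge: dict mapping edge type -> (d_in, d_out) matrix.
    """
    out = h @ w_self
    msg = np.zeros_like(out)
    deg = np.zeros(len(h))
    for src, dst, etype in edges:
        msg[dst] += h[src] @ w_edge[etype]   # edge-type-specific transform
        deg[dst] += 1
    # Average incoming messages; nodes with no edges keep only their own term.
    out = out + msg / np.maximum(deg, 1)[:, None]
    return np.maximum(out, 0.0)  # ReLU
```

Take the nodes `[("L", 0.0), ("L", 1.0), ("V", 0.2), ("V", 0.4), ("V", 3.0)]` with `max_gap=1.0`. Only 12 of the 20 possible directed edges are created: 4 temporal and 8 cross-modal. The late vision node receives no messages at all.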

Modal-Temporal Attention Graph

Yang et al., MTAG: Modal-Temporal Attention Graph for Unaligned Human Multimodal Language Sequences, NAACL 2021

Sub-Challenge 2c: Segmentation

Definition: Handle ambiguity in segmentation and in element granularity during alignment

Alignment and Segmentation – A Simple Approach
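The notes leave the simple approach unspecified. A common simple baseline, assumed here, fixes the segmentation in advance. Given word-level time boundaries (for instance from a forced aligner), it averages the acoustic or visual frames inside each word's span, so every word gets exactly one vector per modality. The function name and the frame-rate convention are hypothetical.

```python
import numpy as np


def pool_by_segments(frames, frame_rate, segments):
    """Average frame-level features inside each (start_sec, end_sec) segment.

    frames: (T, d) features sampled at frame_rate frames per second.
    segments: list of (start, end) times in seconds, e.g. word boundaries.
    Returns a (len(segments), d) array: one pooled vector per segment.
    """
    T = len(frames)
    pooled = []
    for start, end in segments:
        lo = min(int(np.floor(start * frame_rate)), T - 1)
        # Keep at least one frame, and never run past the last frame.
        hi = min(max(lo + 1, int(np.ceil(end * frame_rate))), T)
        pooled.append(frames[lo:hi].mean(axis=0))
    return np.stack(pooled)
```

Consider ten frames at 10 frames per second, with words spanning 0.0 to 0.5 s and 0.5 to 1.0 s. Frames 0 to 4 and frames 5 to 9 are averaged separately. This hard-codes the segmentation at word granularity, which is the kind of ambiguity the following approaches aim to handle.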

Alignment and Segmentation – A Classification Approach

Graves et al., Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks, ICML 2006
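CTC treats the alignment as a latent variable. The network classifies every input frame into an output label or a special blank. The probability of a label sequence is then the sum over all frame-level paths that collapse to it, where collapsing means merging repeats and then dropping blanks. Below is a minimal NumPy sketch of the collapse rule and the forward (alpha) recursion. It works in probability space for readability; practical implementations use log space or rescaling to avoid underflow. Blank index 0 is an assumption.

```python
import itertools

import numpy as np

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_collapse(path):
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    return [k for k, _ in itertools.groupby(path) if k != BLANK]


def ctc_prob(probs, labels):
    """p(labels | x) via the CTC forward (alpha) recursion.

    probs: (T, K) per-frame output distributions; column BLANK is the blank.
    labels: target label sequence without blanks, e.g. [1, 2].
    """
    ext = [BLANK]
    for c in labels:
        ext += [c, BLANK]  # interleave blanks: _ l1 _ l2 _ ...
    T, S = len(probs), len(ext)
    alpha = np.zeros((T, S))
    # A path may start with the leading blank or with the first label.
    alpha[0, 0] = probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]              # stay on the same symbol
            if s >= 1:
                a += alpha[t - 1, s - 1]     # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]
    # Valid paths end on the last label or on the trailing blank.
    return alpha[-1, -1] + (alpha[-1, -2] if S > 1 else 0.0)
```

The recursion sums over paths in O(T * S) time instead of enumerating all K^T frame labelings. Maximizing this probability trains the per-frame classifier without any frame-level segment boundaries.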

Representation and Segmentation – Cluster-based Approaches

Hsu et al., HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units, arXiv 2021
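HuBERT derives discrete targets by clustering frame-level features with k-means. The first iteration clusters MFCC features, and later iterations cluster the model's own intermediate features. The model is then trained to predict the cluster of masked frames. The sketch below covers only the clustering step, plus a hypothetical run-length reading of the units as segments. It does not implement the masked-prediction training, and the function names are illustrative.

```python
import numpy as np


def nearest(x, centroids):
    # Index of the closest centroid (squared Euclidean distance) for each frame.
    return ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(axis=1)


def kmeans(x, k, iters=20, seed=0):
    """Plain k-means over frame features x of shape (T, d); returns (centroids, units)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        units = nearest(x, centroids)
        for c in range(k):
            if np.any(units == c):  # leave a centroid in place if its cluster is empty
                centroids[c] = x[units == c].mean(axis=0)
    return centroids, nearest(x, centroids)


def units_to_segments(units):
    """Run-length encode frame-level unit IDs into (unit, start, end) segments."""
    segments, start = [], 0
    for t in range(1, len(units) + 1):
        if t == len(units) or units[t] != units[start]:
            segments.append((int(units[start]), start, t))
            start = t
    return segments
```

Runs of identical units give a data-driven segmentation. Its granularity depends on `k` and on the features, not on fixed word or frame boundaries.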
