TL;DR: The authors present a novel Attention-over-Attention (AoA) model for Machine Comprehension. Given a document and a cloze-style question, the model predicts a single-word answer. The model:
- Embeds both context and query using a bidirectional GRU
- Computes a pairwise matching matrix between document and query words
- Computes query-to-document attention values
- Computes document-to-query attention values and averages them over document words to get one weight per query word
- Multiplies the two attention vectors to get final attention scores for words in the document
- Maps attention results back into the vocabulary space
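The attention steps above can be sketched with plain NumPy. This is a minimal illustration of the AoA computation (not the authors' implementation): the dimensions, random embeddings, and toy document are invented for the example, and in the real model the embeddings would come from the bidirectional GRU.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: 5 document words, 3 query words, hidden size 4
rng = np.random.default_rng(0)
D = rng.normal(size=(5, 4))  # contextual embeddings of document words
Q = rng.normal(size=(3, 4))  # contextual embeddings of query words

# Pairwise matching matrix: M[i, j] = dot(D[i], Q[j])
M = D @ Q.T  # shape (5, 3)

# Query-to-document attention: softmax over document words (per column)
alpha = softmax(M, axis=0)  # shape (5, 3), each column sums to 1

# Document-to-query attention: softmax over query words (per row),
# then averaged over document words to get one weight per query word
beta = softmax(M, axis=1).mean(axis=0)  # shape (3,)

# Attention-over-attention: weight the query-to-document columns by beta
s = alpha @ beta  # shape (5,), final attention over document words

# Map attention back into vocabulary space: sum the scores of
# repeated words (toy document, for illustration only)
doc_words = ["the", "cat", "sat", "the", "mat"]
vocab_scores = {}
for w, score in zip(doc_words, s):
    vocab_scores[w] = vocab_scores.get(w, 0.0) + score
```

Because each column of `alpha` and the vector `beta` are both normalized, the final scores `s` also sum to 1, so they can be read directly as a distribution over document positions.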
The authors evaluate the model on the CNN news and Children's Book Test (CBTest) question answering datasets, obtaining state-of-the-art results and beating other models such as EpiReader and ASReader.
- Very good model visualization in the paper
- I like that this model is much simpler than EpiReader while also performing better