Members: Brandon Kates, Zhao Shen, Arnav Ghosh
Objective: To create a system capable of detecting duplicate questions on Q&A platforms.
We expect this approach to consolidate the knowledge available on a given question or issue in one place, and to direct users whose questions have already been answered to the existing answer.
We will test a variety of duplicate question identification methods on the Quora question pairs dataset, and hope to eventually apply our findings to the classroom Q&A platform Piazza to improve the Cornell student experience.
Below is the data required to successfully train/run all of the models.
In the current directory ("DuQI"), create a folder named "data" and populate it with:
- Training and test data from Quora Question Pairs.
The final directory should look like:
- data
  - Quora training/test CSV files
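
Before training, it may help to confirm that the layout above is in place. The following is a minimal sketch, not part of the project code: the helper name `list_data_csvs` is hypothetical, and because this README does not state the CSV file names, the check only lists whatever CSV files it finds in "data".

```python
from pathlib import Path


def list_data_csvs(project_root="."):
    """Return the sorted names of the CSV files in <project_root>/data.

    Hypothetical helper for illustration. It assumes only the layout
    described above (a "data" folder inside the project directory) and
    makes no assumption about what the CSV files are called.
    """
    data_dir = Path(project_root) / "data"
    if not data_dir.is_dir():
        # The "data" folder has not been created yet.
        raise FileNotFoundError(f"expected a 'data' folder at {data_dir}")
    # Only look at the top level of "data"; other file types are ignored.
    return sorted(p.name for p in data_dir.glob("*.csv"))
```

Calling `list_data_csvs(".")` from the "DuQI" directory should return the names of the Quora training and test CSV files; an empty list means the folder exists but has not been populated yet.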