DuQI: Duplicate Question Identification

Members: Brandon Kates, Zhao Shen, Arnav Ghosh

Objective: To create a system capable of detecting duplicate questions on Q&A platforms.

We expect our approach to help centralize the available knowledge on a single question/issue and direct users with questions that have already been answered to the appropriate resource.

We will test a variety of duplicate question identification methods on the Quora question pairs dataset, and hope to eventually apply our findings to the classroom Q&A platform Piazza to improve the Cornell student experience.
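To make the task concrete, here is a minimal sketch of one very simple way to score a question pair: Jaccard overlap between the two questions' word sets. This is an illustrative baseline only, not one of the methods in this repository; the function names (`tokenize`, `jaccard`, `is_duplicate`) and the 0.5 threshold are assumptions chosen for the example.

```python
import re


def tokenize(question):
    """Lowercase a question and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", question.lower()))


def jaccard(q1, q2):
    """Jaccard similarity between the token sets of two questions."""
    a, b = tokenize(q1), tokenize(q2)
    if not a and not b:
        # Two empty questions carry no evidence of being duplicates.
        return 0.0
    return len(a & b) / len(a | b)


def is_duplicate(q1, q2, threshold=0.5):
    """Flag a pair as duplicate when overlap meets the (assumed) threshold."""
    return jaccard(q1, q2) >= threshold
```

A word-overlap score like this ignores word order and meaning, so it is best treated as a sanity-check baseline against which learned models can be compared.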

Data Requirements

Below is the data required to successfully train/run all of the models.

In the repository root ("DuQI"), create a folder named "data" and populate it with:

- the training and test data from Quora Question Pairs.

The final directory should look like:

data
- Quora training/test CSV files
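The layout above can be checked before training with a short script. This is a minimal sketch: the helper name `find_quora_csvs` is hypothetical, and because the exact CSV file names are not specified here, it simply looks for any `.csv` files inside `data`.

```python
from pathlib import Path


def find_quora_csvs(root="."):
    """Return the CSV files under <root>/data, or raise if the layout is wrong."""
    data_dir = Path(root) / "data"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"Expected a 'data' folder at {data_dir}")
    # File names are not assumed; any CSV placed in data/ is accepted.
    csv_files = sorted(data_dir.glob("*.csv"))
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found in {data_dir}")
    return csv_files
```

Calling `find_quora_csvs()` from the DuQI directory should list the Quora CSV files once the data is in place, and fail with a clear message otherwise.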

Repository: CornellDataScience/DuQI
