twitter-relatedness-analysis

This code uses the following libraries:

The preprocessed data is saved in dataF.csv. Punctuation, stopwords, and links have been removed from the data, and the words have been stemmed.
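The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the stopword list is a tiny hypothetical sample, `naive_stem` is a crude suffix stripper standing in for a real stemmer, and lowercasing is an added assumption.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "and", "or",
             "to", "of", "in", "on", "for", "this", "that"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def naive_stem(word):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Remove links, punctuation, and stopwords, then stem the remaining words."""
    text = URL_RE.sub(" ", text.lower())   # drop links before punctuation removal mangles them
    text = text.translate(PUNCT_TABLE)     # drop punctuation
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(naive_stem(t) for t in tokens)


print(preprocess("Flooding in the city! See https://example.com/news"))
```

Links are removed first because stripping punctuation beforehand would break the URL pattern and leave fragments of the address in the text.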

Seven models were trained on oversampled data (to balance the number of samples per class). The prediction results on the validation data were:

Model	Validation accuracy
RNN	97%
CNN	96%
Sequential NN model	94%
Linear Support Vector Machine	93%
Random Forest Classifier	90%
Multinomial Naive Bayes	88%
Logistic Regression	84%
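The README does not say which oversampling method was used. As one possible reading, the sketch below shows plain random oversampling, where minority-class samples are duplicated at random until every class matches the largest one. The function name `random_oversample` and its parameters are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are the same size."""
    rng = random.Random(seed)

    # Group samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_x, out_y = [], []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


X = ["a", "b", "c", "d", "e"]
y = [1, 1, 1, 1, 0]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling is usually applied to the training split only, so that duplicated samples do not also appear in the validation data and inflate the reported accuracy.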

More details about the models:

The random forest classifier has a maximum depth of 5, and the number of trees in the forest is 200.
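Assuming the model was built with scikit-learn (the README does not name the library), these settings would correspond to `max_depth=5` and `n_estimators=200`. The synthetic data from `make_classification` and the `random_state` value below are placeholders for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder data; the real features would come from the preprocessed tweets.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# 200 trees, each limited to a depth of 5, as described above.
clf = RandomForestClassifier(n_estimators=200, max_depth=5, random_state=0)
clf.fit(X, y)

print(len(clf.estimators_))  # number of fitted trees
```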

The summary of the sequential model is:
