- Data: Contains our own dummy data and provided datasets.
- Notebooks: All notebooks corresponding to the 4 problem statements.
- Models: Serialized models small enough to upload without exceeding GitHub's file size limits or using Git LFS.
- attendance.csv: Dummy data used for PS-3.
- exhib_fin.csv: Edited version of exhibitors (1).csv. Used in PS-1 & 2.
- exhibitors (1).csv: Provided dataset.
- reviews.csv: Gemma-generated reviews, used in PS-2.
- sponsor.csv: Dummy data used for PS-4.
- vis_fin.csv: Edited version of visitors.csv with an added about_me column generated from preexisting features. Used in PS-1 & 2.
- visitors.csv: Provided dataset.
- PS1_1.ipynb: Part 1 of PS-1, contains minor EDA and addition of about_me column.
- PS1_2.ipynb: Part 2 of PS-1, contains model creation and usage, powered by Gemma.
- PS2_1.ipynb: Part 1 of PS-2, generates reviews and ratings and extracts keywords from them.
- PS2_2.ipynb: Part 2 of PS-2, builds a recommendation system using TF-IDF.
- PS3.ipynb: Implementation of PS-3, contains EDA, Feature Engineering and Feature Selection, along with regression model creation.
- PS4.ipynb: Implementation of PS-4, contains EDA, Feature Engineering and Feature Selection, along with regression model creation.
- model2.pkl: Model for PS-2.
- model3.pkl: Model for PS-3.
- model4.pkl: Model for PS-4.
- vectorizer.pkl: Fitted TF-IDF csr-matrix, can be used for instant vectorization.
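
The `.pkl` files listed above can be inspected before use. The following is a minimal sketch, assuming the files were written with Python's standard `pickle` module and sit in a `Models` directory relative to the working directory; the helper name `load_artifact` is hypothetical and not part of the notebooks. Unpickling may also require the library that created the object to be installed, and pickle files should only be loaded from a trusted source.

```python
import pickle
from pathlib import Path


def load_artifact(path):
    """Unpickle one artifact and return it together with its type name."""
    with Path(path).open("rb") as fh:
        obj = pickle.load(fh)
    return obj, type(obj).__name__


if __name__ == "__main__":
    # Assumed layout: the .pkl files live in ./Models (adjust as needed).
    for name in ("model2.pkl", "model3.pkl", "model4.pkl", "vectorizer.pkl"):
        path = Path("Models") / name
        if path.exists():
            _, type_name = load_artifact(path)
            print(f"{name}: {type_name}")
        else:
            print(f"{name}: not found at {path}")
```

Printing the type first shows what each file actually contains (for example, an estimator or a sparse matrix) before it is passed to any downstream code.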