You can set up the environment using either venv or Conda.
-
Clone the repository and navigate to the project directory.
-
Create a virtual environment and activate it:
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
-
Install the required packages:
pip install pandas numpy matplotlib seaborn torch torchvision torchaudio scikit-learn transformers tqdm pyyaml flask
-
Clone the repository and navigate to the project directory.
-
Create a Conda environment and activate it:
conda create --name paper_classifier python=3.8
conda activate paper_classifier
-
Install the required packages using pip:
pip install -r requirements.txt
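After either installation route, a quick check that the packages can be found may save a failed run later. The following is a minimal sketch using only the standard library; the `missing_packages` helper is hypothetical and not part of this repository. The import names are derived from the pip command above (note that `scikit-learn` is imported as `sklearn` and `pyyaml` as `yaml`).

```python
from importlib.util import find_spec

# Import names for the packages in the pip command above.
# Some import names differ from the pip package names.
IMPORT_NAMES = [
    "pandas", "numpy", "matplotlib", "seaborn", "torch", "torchvision",
    "torchaudio", "sklearn", "transformers", "tqdm", "yaml", "flask",
]


def missing_packages(import_names):
    """Return the names from import_names that cannot be located for import."""
    return [name for name in import_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(IMPORT_NAMES)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Run it inside the activated environment; anything it reports as missing can be installed with pip.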
-
Ensure you have the following files in your project directory:
- config.yaml: Configuration file
- cc_data.parquet: Training data
- cc_test.parquet: Test data
- requirements.txt: List of required packages
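The file checklist above can also be verified with a short script. This is a sketch only, assuming the files sit directly in the project root; the `missing_files` helper is a hypothetical name, not something the repository provides.

```python
from pathlib import Path

# File names taken from the checklist above.
REQUIRED_FILES = [
    "config.yaml",
    "cc_data.parquet",
    "cc_test.parquet",
    "requirements.txt",
]


def missing_files(project_dir, required=REQUIRED_FILES):
    """Return the required file names that are not present in project_dir."""
    root = Path(project_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files are present.")
```

Run it from the project directory before training or starting the server.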
A pre-trained model is available for immediate use. You can download it from the following link:
After downloading, place the model file in the directory specified in your config.yaml file.
-
Run all the cells in the notebook.
-
The trained model will be saved as specified in the config.yaml file (bert_classifier.pth by default).
-
Start the Flask server:
python app.py
-
Open a web browser and go to http://127.0.0.1:5000 (you can Ctrl+click this link in most consoles).
-
Upload a .parquet file containing scientific paper data.
-
The server will process the file and return a predictions.parquet file with the classification results (saved directly to your downloads folder).
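If the page does not load, it can help to confirm that the Flask server is actually listening. The sketch below uses only the standard library and assumes the address given above; `server_is_up` is a hypothetical helper, and it checks only that something answers HTTP requests at that address, not that the classifier itself works.

```python
from urllib.error import HTTPError, URLError
from urllib.request import ProxyHandler, build_opener


def server_is_up(url="http://127.0.0.1:5000", timeout=2.0):
    """Return True if an HTTP server answers at url, False otherwise."""
    # Bypass any configured proxy, since the server runs on localhost.
    opener = build_opener(ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except HTTPError:
        # The server answered, just with an error status code.
        return True
    except (URLError, OSError):
        # Connection refused, timed out, or otherwise unreachable.
        return False


if __name__ == "__main__":
    print("Server reachable:", server_is_up())
```

Run it in a second terminal while `python app.py` is running.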