Scientific Paper Classifier

Setup

You can set up the environment using either venv or Conda.

Option 1: venv

Clone the repository and navigate to the project directory.

Create a virtual environment and activate it:

```
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
```

Install the required packages:

```
pip install pandas numpy matplotlib seaborn torch torchvision torchaudio scikit-learn transformers tqdm pyyaml flask
```
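
To confirm the install worked, a small check like the one below may help. It is only a sketch: the module list mirrors the `pip install` line above, and `missing_modules` is a hypothetical helper name, not part of this repository.

```python
from importlib.util import find_spec

# Import names for the packages listed above. Two differ from their pip
# names: scikit-learn imports as `sklearn`, and pyyaml imports as `yaml`.
REQUIRED_MODULES = [
    "pandas", "numpy", "matplotlib", "seaborn", "torch", "torchvision",
    "torchaudio", "sklearn", "transformers", "tqdm", "yaml", "flask",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Run it with the virtual environment activated; any name it reports can be installed with `pip install` using the corresponding package name.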

Option 2: Conda (it works on my machine)

Clone the repository and navigate to the project directory.

Create a Conda environment and activate it:

```
conda create --name paper_classifier python=3.8
conda activate paper_classifier
```

Install the required packages using pip:
```
pip install -r requirements.txt
```
Ensure you have the following files in your project directory:
- config.yaml: Configuration file
- cc_data.parquet: Training data
- cc_test.parquet: Test data
- requirements.txt: List of required packages
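
The file checklist above can be verified with a short script. This is a minimal sketch that assumes it is run from the project directory; `missing_files` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path

# File names taken from the checklist above.
REQUIRED_FILES = [
    "config.yaml",
    "cc_data.parquet",
    "cc_test.parquet",
    "requirements.txt",
]


def missing_files(project_dir, names=REQUIRED_FILES):
    """Return the required file names that are not present in project_dir."""
    root = Path(project_dir)
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    print("Missing files:", missing if missing else "none")
```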

Pre-trained Model

A pre-trained model is available for immediate use. You can download it from the following link:

Pre-trained Model

After downloading, place the model file in the appropriate directory as specified in your config.yaml file.

Training

Run all the cells in the notebook (main.ipynb).
The trained model will be saved to the path specified in the config.yaml file (bert_classifier.pth by default).

Running the Prediction Server

Start the Flask server:
```
python app.py
```
Open a web browser and go to http://127.0.0.1:5000 (you can Ctrl+click this link in most consoles).
Upload a .parquet file containing scientific paper data.
The server processes the file and returns a predictions.parquet file with the classification results, which the browser saves straight to its downloads folder.
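
The same upload could also be scripted instead of going through the browser. The sketch below uses only the standard library and rests on two assumptions this README does not confirm: that the upload endpoint is `/` and that the form field is named `file`. Check app.py for the actual route and field name before relying on it.

```python
import uuid
import urllib.request

# Assumptions (not confirmed by this README): the upload endpoint is "/"
# and the form field is named "file". Check app.py for the real values.
SERVER_URL = "http://127.0.0.1:5000/"
FORM_FIELD = "file"


def build_multipart(field, filename, payload, boundary=None):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload(in_path, out_path="predictions.parquet", url=SERVER_URL):
    """POST a .parquet file to the server and save the response body."""
    with open(in_path, "rb") as f:
        body, content_type = build_multipart(FORM_FIELD, "input.parquet", f.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

With the server running, calling `upload("cc_test.parquet")` would write the response to predictions.parquet in the current directory, provided the assumed endpoint and field name match the server.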
