This paper uses two datasets: medical abstracts (in data/medical_text) and radiology reports (in data/MiMic).
all_Data.csv contains all human-written and ChatGPT-generated data.
prompt*_seed*_train.csv, prompt*_seed*_val.csv, and prompt*_seed*_test.csv are the training, validation, and test sets for the different groups.
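
The split files can be grouped by prompt and seed before loading. The following is a minimal sketch, not part of the repository: it assumes each `*` wildcard stands for a single token without underscores, and the names `SPLIT_PATTERN` and `group_split_files` are hypothetical helpers introduced only for illustration.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: each `*` in prompt*_seed*_{train,val,test}.csv is one token
# that contains no underscore (for example "1" or "42").
SPLIT_PATTERN = re.compile(
    r"^prompt(?P<prompt>[^_]+)_seed(?P<seed>[^_]+)_(?P<split>train|val|test)\.csv$"
)


def group_split_files(filenames):
    """Map (prompt, seed) -> {"train": path, "val": path, "test": path}.

    Files that do not match the assumed naming pattern are skipped.
    """
    groups = defaultdict(dict)
    for name in filenames:
        match = SPLIT_PATTERN.match(Path(name).name)
        if match is None:
            continue  # not a split file under the assumed pattern
        key = (match["prompt"], match["seed"])
        groups[key][match["split"]] = name
    return dict(groups)
```

In practice the input could come from listing one of the data directories, for example `group_split_files(str(p) for p in Path("data/medical_text").glob("*.csv"))`. Each returned path can then be read with a CSV reader of your choice.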
pip install -r requirements.txt
- Vocabulary and sentence analysis:
python word_count.py
- Part-of-speech analysis:
python pos_analysis.py
- Dependency parsing:
python dependency_analysis.py
- Sentiment analysis:
python sentiment_analysis.py
- Text perplexity:
python PPL_distribution.py
- Perplexity-CLS:
python ppl_cls.py
- CART:
python CART_cls.py
- XGBoost:
python xgboost_cls.py
- BERT:
python BERT_cls.py
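
The commands above can also be run in sequence from a small driver script. This is only a sketch under stated assumptions: it assumes every script runs without command-line arguments from the repository root, as the commands above suggest, and the names `ANALYSIS_SCRIPTS`, `CLASSIFIER_SCRIPTS`, `build_commands`, and `run_all` are hypothetical and do not exist in the repository.

```python
import subprocess
import sys

# Script names are taken from the list above; the grouping is an assumption.
ANALYSIS_SCRIPTS = [
    "word_count.py",
    "pos_analysis.py",
    "dependency_analysis.py",
    "sentiment_analysis.py",
    "PPL_distribution.py",
]
CLASSIFIER_SCRIPTS = [
    "ppl_cls.py",
    "CART_cls.py",
    "xgboost_cls.py",
    "BERT_cls.py",
]


def build_commands(scripts, python=sys.executable):
    """Return one argument list per script, suitable for subprocess.run."""
    return [[python, script] for script in scripts]


def run_all(scripts, dry_run=False):
    """Run each script in order and stop at the first failure."""
    for cmd in build_commands(scripts):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)
```

Calling `run_all(ANALYSIS_SCRIPTS + CLASSIFIER_SCRIPTS, dry_run=True)` prints the commands without executing them, which is a cheap way to check the order before starting the longer classifier runs.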