From 105eee5df98ba452b4ef455e64d1eb037223b60e Mon Sep 17 00:00:00 2001
From: Jeff
Date: Wed, 29 Nov 2023 13:04:12 +0100
Subject: [PATCH] improve README.md

---
 README.md        | 74 +++++++++++++++++++++----------------------------
 requirements.txt |  7 +++++
 2 files changed, 39 insertions(+), 42 deletions(-)
 create mode 100644 requirements.txt

diff --git a/README.md b/README.md
index bf90e99..bbc1e80 100644
--- a/README.md
+++ b/README.md
@@ -1,60 +1,50 @@
-# fromtexttotables
-## Confusion Matrix Analysis Script
+# From Text to Tables: A Local Privacy Preserving Large Language Model for Structured Information Retrieval from Medical Documents
+**Note: This documentation is currently under construction. Some sections may be updated or changed as development progresses.**
 
-This Python script, `confusionmatrix.py`, generates confusion matrices for machine learning model predictions. It compares predictions against a ground truth dataset to visualize the performance of a classification model.
+## General Setup Instructions
 
-### Setup and Run
+Before running the scripts, please ensure the following setup steps are completed:
 
-1. Ensure you have Python installed.
-2. Install required packages: `pandas`, `numpy`, `matplotlib`, `seaborn`, `sklearn`.
-3. Place your ground truth data and prediction results in accessible paths.
+1. **Python Installation**: Make sure Python is installed on your system. The scripts are compatible with Python 3.8.
+2. **Dependency Installation**: Install the required Python packages. You can do this easily by using the `requirements.txt` file provided:
+   ```bash
+   pip install -r requirements.txt
+   ```
 
-### Usage
+## Data Preparation
 
-Run the script from the command line by specifying the path to your ground truth data and predictions:
-
-```bash
-python confusionmatrix.py path/to/ground_truth.csv path/to/predictions.jsonl
-
-```
-The script will generate confusion matrices for each classification label, helping you assess your model's performance.
+Place your dataset files in accessible paths on your system.
 
+## Script-Specific Instructions
 
-## Accuracy Comparison Script
-
-`accuracy_comparison.py` is a Python script designed to compare the accuracy of different machine learning models. It calculates and visualizes the accuracy of each model for various symptoms.
-
-### Setup and Run
-
-1. Ensure Python is installed on your system.
-2. Install necessary Python packages: `pandas`, `numpy`, `matplotlib`, `sklearn`.
-3. Place your ground truth dataset in an accessible location.
-
-### Usage
-
-To use the script, run it from the command line with the path to your ground truth data:
+### MIMIC Features Extraction Script (`extract_mimic_features_from_report.py`)
+This Python script extracts and analyzes specific medical features from patient reports using a predefined grammar and prompt.
+#### Usage
+Run the script from the command line by specifying the path to your MIMIC ground truth data:
+
 ```bash
-python accuracy_comparison.py path/to/ground_truth.csv
+python extract_mimic_features_from_report.py path/to/MIMIC_groundtruth.csv
 ```
 
-The script will calculate the accuracies of specified models for different symptoms and plot the results, aiding in the comparative analysis of model performance.
-
-## MIMIC Features Extraction Script
-`extract_mimic_features_from_report.py` is a Python script designed to extract and analyze specific medical features from patient reports using a predefined grammar and prompt.
+### Confusion Matrix Analysis Script (`confusionmatrix.py`)
 
-### Setup and Run
+This Python script generates confusion matrices for machine learning model predictions, comparing predictions against a ground truth dataset to visualize the performance of a classification model.
 
-1. Ensure Python is installed on your system.
-2. Install necessary Python packages: `pandas`, `requests`, `tqdm`.
-3. Place your MIMIC ground truth dataset in an accessible location.
 
+#### Usage
 
-### Usage
-
-Run the script from the command line by specifying the path to your MIMIC ground truth data:
+Run the script from the command line by specifying the path to your ground truth data and predictions:
 
 ```bash
-python extract_mimic_features_from_report.py path/to/MIMIC_groundtruth.csv
+python confusionmatrix.py path/to/ground_truth.csv path/to/predictions.jsonl
 ```
 
-The script processes each report in the dataset, extracting specific medical features using a specialized grammar and saves the results in a JSONL file, facilitating the analysis of medical data.
\ No newline at end of file
+### Accuracy Comparison Script (`accuracy_comparison.py`)
+This Python script compares the accuracy of different machine learning models, calculating and visualizing the accuracy of each model for various symptoms.
+
+#### Usage
+Run the script from the command line with the path to your ground truth data:
+
+```bash
+python accuracy_comparison.py path/to/ground_truth.csv
+ ```
\ No newline at end of file
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..a389627
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,7 @@
+pandas
+numpy
+matplotlib
+seaborn
+scikit-learn
+requests
+tqdm