improve README.md
Ultimate-Storm committed Nov 29, 2023
1 parent da20ad2 commit 105eee5
Showing 2 changed files with 39 additions and 42 deletions.
74 changes: 32 additions & 42 deletions README.md
@@ -1,60 +1,50 @@
# fromtexttotables
## Confusion Matrix Analysis Script
# From Text to Tables: A Local Privacy Preserving Large Language Model for Structured Information Retrieval from Medical Documents
**Note: This documentation is currently under construction. Some sections may be updated or changed as development progresses.**

This Python script, `confusionmatrix.py`, generates confusion matrices for machine learning model predictions. It compares predictions against a ground truth dataset to visualize the performance of a classification model.
## General Setup Instructions

### Setup and Run
Before running the scripts, please ensure the following setup steps are completed:

1. Ensure you have Python installed.
2. Install required packages: `pandas`, `numpy`, `matplotlib`, `seaborn`, `sklearn`.
3. Place your ground truth data and prediction results in accessible paths.
1. **Python Installation**: Make sure Python is installed on your system. The scripts are compatible with Python 3.8.
2. **Dependency Installation**: Install the required Python packages. You can do this easily by using the `requirements.txt` file provided:
```bash
pip install -r requirements.txt
```
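After installing, it can help to confirm that the packages are importable. The sketch below is not part of the repository; it uses only the standard library and assumes the import names match the package list in this commit. Two caveats: scikit-learn is imported as `sklearn` but is published on PyPI as `scikit-learn`, and the file added in this commit is named `requirement.txt`, while the command above refers to `requirements.txt`.

```python
import importlib.util

# Import names taken from the package list in this commit.
# Note: the module `sklearn` is provided by the PyPI package `scikit-learn`.
REQUIRED_MODULES = [
    "pandas",
    "numpy",
    "matplotlib",
    "seaborn",
    "sklearn",
    "requests",
    "tqdm",
]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```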

### Usage
## Data Preparation

Run the script from the command line by specifying the path to your ground truth data and predictions:

```bash
python confusionmatrix.py path/to/ground_truth.csv path/to/predictions.jsonl

```
The script will generate confusion matrices for each classification label, helping you assess your model's performance.
Place your dataset files in accessible paths on your system.

## Script-Specific Instructions

## Accuracy Comparison Script

`accuracy_comparison.py` is a Python script designed to compare the accuracy of different machine learning models. It calculates and visualizes the accuracy of each model for various symptoms.

### Setup and Run

1. Ensure Python is installed on your system.
2. Install necessary Python packages: `pandas`, `numpy`, `matplotlib`, `sklearn`.
3. Place your ground truth dataset in an accessible location.

### Usage

To use the script, run it from the command line with the path to your ground truth data:
### MIMIC Features Extraction Script (`extract_mimic_features_from_report.py`)
This Python script extracts and analyzes specific medical features from patient reports using a predefined grammar and prompt.

#### Usage
Run the script from the command line by specifying the path to your MIMIC ground truth data:

```bash
python accuracy_comparison.py path/to/ground_truth.csv
python extract_mimic_features_from_report.py path/to/MIMIC_groundtruth.csv
```
The script will calculate the accuracies of specified models for different symptoms and plot the results, aiding in the comparative analysis of model performance.

## MIMIC Features Extraction Script

`extract_mimic_features_from_report.py` is a Python script designed to extract and analyze specific medical features from patient reports using a predefined grammar and prompt.
### Confusion Matrix Analysis Script (`confusionmatrix.py`)

### Setup and Run
This Python script generates confusion matrices for machine learning model predictions, comparing predictions against a ground truth dataset to visualize the performance of a classification model.

1. Ensure Python is installed on your system.
2. Install necessary Python packages: `pandas`, `requests`, `tqdm`.
3. Place your MIMIC ground truth dataset in an accessible location.
#### Usage

### Usage

Run the script from the command line by specifying the path to your MIMIC ground truth data:
Run the script from the command line by specifying the path to your ground truth data and predictions:

```bash
python extract_mimic_features_from_report.py path/to/MIMIC_groundtruth.csv
python confusionmatrix.py path/to/ground_truth.csv path/to/predictions.jsonl
```
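To illustrate what a confusion matrix for a single label contains, here is a minimal standard-library sketch. It is not the implementation in `confusionmatrix.py`: the function name `confusion_counts` is hypothetical, the labels are assumed to be binary, and the data is held in memory rather than read from the CSV and JSONL files.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for two equal-length lists of boolean labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    # Count each (truth, prediction) pair once.
    pairs = Counter(zip(y_true, y_pred))
    return (
        pairs[(False, False)],  # true negatives
        pairs[(False, True)],   # false positives
        pairs[(True, False)],   # false negatives
        pairs[(True, True)],    # true positives
    )


# Illustrative data for one label (assumed values, not from the dataset).
truth = [True, True, False, False, True]
preds = [True, False, False, True, True]
tn, fp, fn, tp = confusion_counts(truth, preds)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

Repeating this for each classification label gives one 2x2 matrix per label.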

The script processes each report in the dataset, extracts specific medical features using a specialized grammar, and saves the results in a JSONL file for later analysis.
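The report-by-report flow described above can be sketched as follows. This is an illustration under stated assumptions, not the script itself: `extract_features` is a hypothetical keyword-based stand-in for the grammar-and-prompt extraction step, and the column name `report` and the feature names `fever` and `cough` are made up for the example.

```python
import csv
import io
import json


def extract_features(report_text):
    """Hypothetical stand-in for the real extraction step.

    The actual script uses a predefined grammar and prompt; a trivial
    keyword check is used here only to keep the sketch runnable.
    """
    lowered = report_text.lower()
    return {"fever": "fever" in lowered, "cough": "cough" in lowered}


def reports_to_jsonl(csv_file, jsonl_file, text_column="report"):
    """Read reports from a CSV stream and write one JSON object per line."""
    count = 0
    for row in csv.DictReader(csv_file):
        record = {"features": extract_features(row[text_column])}
        jsonl_file.write(json.dumps(record) + "\n")
        count += 1
    return count


# In-memory demo; real use would pass open file handles instead.
demo_csv = io.StringIO("report\nPatient has fever.\nDry cough noted.\n")
demo_out = io.StringIO()
n_written = reports_to_jsonl(demo_csv, demo_out)
print(n_written, "records written")
```

Writing one JSON object per line means the output can be appended to and read back incrementally.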
### Accuracy Comparison Script (`accuracy_comparison.py`)
This Python script compares the accuracy of different machine learning models, calculating and visualizing the accuracy of each model for various symptoms.

#### Usage
Run the script from the command line with the path to your ground truth data:

```bash
python accuracy_comparison.py path/to/ground_truth.csv
```
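As a rough illustration of the per-symptom comparison, the sketch below computes accuracy for each model and symptom from in-memory labels. The model names, symptom names, and data layout are assumptions for the example; the real script reads the ground truth from the CSV file and also plots the results, which is omitted here.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and ground truth agree."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def compare_models(ground_truth, predictions_by_model):
    """Return {model: {symptom: accuracy}} for every model and symptom."""
    return {
        model: {
            symptom: accuracy(ground_truth[symptom], preds[symptom])
            for symptom in ground_truth
        }
        for model, preds in predictions_by_model.items()
    }


# Hypothetical labels: 1 = symptom present, 0 = absent.
ground_truth = {"fever": [1, 0, 1, 1], "cough": [0, 0, 1, 0]}
predictions_by_model = {
    "model_a": {"fever": [1, 0, 0, 1], "cough": [0, 0, 1, 0]},
    "model_b": {"fever": [1, 1, 0, 1], "cough": [0, 1, 1, 0]},
}

results = compare_models(ground_truth, predictions_by_model)
for model, scores in results.items():
    print(model, scores)
```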
7 changes: 7 additions & 0 deletions requirement.txt
@@ -0,0 +1,7 @@
pandas
numpy
matplotlib
seaborn
sklearn
requests
tqdm
