# From Text to Tables: A Local Privacy Preserving Large Language Model for Structured Information Retrieval from Medical Documents

**Note: This documentation is under construction; some sections may change as development progresses.**

## General Setup Instructions

Before running the scripts, complete the following setup steps:

1. **Python Installation**: Make sure Python is installed on your system. The scripts are compatible with Python 3.8.
2. **Dependency Installation**: Install the required Python packages from the provided `requirements.txt` file:
   ```bash
   pip install -r requirements.txt
   ```
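
Optionally, you can confirm that the interpreter and packages are in place before running any script. The following is a minimal sketch and not part of the repository: it assumes Python 3.8 as the minimum version and checks the import names of the packages listed in `requirements.txt` (scikit-learn is imported as `sklearn`).

```python
import importlib.util
import sys

# Import names of the packages listed in requirements.txt.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn", "requests", "tqdm"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def check_environment(min_version=(3, 8)):
    """Return a list of problems; an empty list means every check passed.

    min_version is an assumption: the README only states compatibility with 3.8.
    """
    problems = []
    if sys.version_info < min_version:
        found = sys.version.split()[0]
        problems.append(f"Python {min_version[0]}.{min_version[1]} or newer expected, found {found}")
    problems.extend(f"missing package: {name}" for name in missing_modules())
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print(issue)
    if not issues:
        print("Environment looks OK")
```

Save it under any name (for example, a hypothetical `check_env.py`) and run it with `python check_env.py`; it prints each problem it finds.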

## Data Preparation

Place your dataset files in paths accessible from your system. The scripts take these paths as command-line arguments: a ground truth CSV file and, for the confusion matrix analysis, a predictions file in JSONL format.
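
To catch wrong paths early, you can check that the files exist and parse before launching a script. This is a standard-library sketch that assumes the ground truth is a CSV file with a header row and the predictions are JSON Lines (one JSON object per line), as the file extensions in the usage examples suggest. The helper names are hypothetical, and the repository's scripts may read the files differently (for example with pandas' `pd.read_csv(path)` and `pd.read_json(path, lines=True)`).

```python
import csv
import json
from pathlib import Path
from typing import Dict, List


def missing_paths(*paths: str) -> List[str]:
    """Return the given paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


def read_csv_rows(path: str) -> List[Dict[str, str]]:
    """Read a CSV file with a header row into a list of dicts (all values stay strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def read_jsonl(path: str) -> List[dict]:
    """Read a JSON Lines file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Note that CSV values come back as strings, while JSON values keep their types (numbers, booleans), which matters when comparing ground truth with predictions.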

## Script-Specific Instructions

### MIMIC Features Extraction Script (`extract_mimic_features_from_report.py`)

This Python script extracts and analyzes specific medical features from patient reports using a predefined grammar and prompt. It processes each report in the dataset and saves the extracted features in a JSONL file.

#### Usage

Run the script from the command line, specifying the path to your MIMIC ground truth data:

```bash
python extract_mimic_features_from_report.py path/to/MIMIC_groundtruth.csv
```
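
The per-report flow (read each report, extract features, write one JSON record per line) can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: the grammar-constrained LLM call is replaced by a hypothetical keyword stub, and the column names `id` and `report` as well as the feature names `fever` and `cough` are hypothetical.

```python
import json
from typing import Callable, Dict, Iterable


def extract_features_stub(report: str) -> Dict[str, bool]:
    """Hypothetical stand-in for the grammar-constrained LLM call.

    Naive keyword matching so the sketch runs offline; it even ignores
    negation such as "no fever". It is NOT how the repository extracts features.
    """
    text = report.lower()
    return {"fever": "fever" in text, "cough": "cough" in text}


def process_reports(
    rows: Iterable[Dict[str, str]],
    out_path: str,
    extract: Callable[[str], Dict[str, bool]] = extract_features_stub,
    id_column: str = "id",        # hypothetical column name
    text_column: str = "report",  # hypothetical column name
) -> int:
    """Write one JSON object per report to out_path (JSONL) and return the count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for row in rows:
            record = {"id": row[id_column], **extract(row[text_column])}
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

To feed it a CSV file, pass a `csv.DictReader` over the open file as `rows`, and replace `extract` with a function that performs the real extraction.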

### Confusion Matrix Analysis Script (`confusionmatrix.py`)

This Python script generates confusion matrices for machine learning model predictions. It compares the predictions against a ground truth dataset and produces one confusion matrix per classification label, visualizing the performance of a classification model.

#### Usage

Run the script from the command line, specifying the paths to your ground truth data and predictions:

```bash
python confusionmatrix.py path/to/ground_truth.csv path/to/predictions.jsonl
```
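
The comparison behind this analysis can be sketched without third-party dependencies. Assumptions: ground truth rows and prediction records are dicts joined on a hypothetical `id` key, each label is a field with the same name on both sides, and labels are binary (0/1). Rows are true classes and columns are predicted classes, the same convention as `sklearn.metrics.confusion_matrix`; plotting (for example as a seaborn heatmap) is left out.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Sequence


def confusion_matrix(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes, both ordered as `labels`."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_label_matrices(ground_truth: Iterable[dict], predictions: Iterable[dict],
                       label_names: Sequence[str], classes: Sequence[Hashable] = (0, 1),
                       id_key: str = "id") -> Dict[str, List[List[int]]]:
    """Build one confusion matrix per label, matching records on the hypothetical id_key."""
    pred_by_id = {rec[id_key]: rec for rec in predictions}
    ground_truth = list(ground_truth)
    matrices = {}
    for name in label_names:
        y_true, y_pred = [], []
        for row in ground_truth:
            pred = pred_by_id.get(row[id_key])
            if pred is None:
                continue  # skip reports that have no prediction
            y_true.append(row[name])
            y_pred.append(pred[name])
        matrices[name] = confusion_matrix(y_true, y_pred, classes)
    return matrices
```

Values read from a CSV file arrive as strings while JSON values keep their types, so normalize both sides (for example with `int(...)`) before building the matrices.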

### Accuracy Comparison Script (`accuracy_comparison.py`)

This Python script compares the accuracy of different machine learning models, calculating and plotting the accuracy of each model for various symptoms.

#### Usage

Run the script from the command line with the path to your ground truth data:

```bash
python accuracy_comparison.py path/to/ground_truth.csv
```
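
Per-model, per-symptom accuracy is the fraction of reports where a model's prediction matches the ground truth. The sketch below computes that table from plain Python sequences; the model and symptom names are hypothetical, and the plotting step is omitted. For label sequences, `sklearn.metrics.accuracy_score` returns the same value.

```python
from typing import Dict, Mapping, Sequence


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of positions where the prediction equals the ground truth."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_table(ground_truth: Mapping[str, Sequence],
                   model_predictions: Mapping[str, Mapping[str, Sequence]]
                   ) -> Dict[str, Dict[str, float]]:
    """Return {model: {symptom: accuracy}} for every symptom in ground_truth."""
    return {
        model: {symptom: accuracy(truth, preds[symptom])
                for symptom, truth in ground_truth.items()}
        for model, preds in model_predictions.items()
    }
```

For example, `accuracy_table({"fever": [1, 0]}, {"model_a": {"fever": [1, 1]}})` returns `{"model_a": {"fever": 0.5}}`.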

## Requirements (`requirements.txt`)

The `requirements.txt` file lists the following packages:

- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn (imported as `sklearn`; the `sklearn` name on PyPI is a deprecated placeholder that no longer installs)
- requests
- tqdm