improve README.md

I2C9W · Nov 29, 2023 · commit 105eee5
1 parent da20ad2

Showing 2 changed files with 39 additions and 42 deletions.
diff --git a/README.md b/README.md
@@ -1,60 +1,50 @@
-# fromtexttotables
-## Confusion Matrix Analysis Script
+# From Text to Tables: A Local Privacy Preserving Large Language Model for Structured Information Retrieval from Medical Documents
+**Note: This documentation is currently under construction. Some sections may be updated or changed as development progresses.**
 
-This Python script, `confusionmatrix.py`, generates confusion matrices for machine learning model predictions. It compares predictions against a ground truth dataset to visualize the performance of a classification model.
+## General Setup Instructions
 
-### Setup and Run
+Before running the scripts, please ensure the following setup steps are completed:
 
-1. Ensure you have Python installed.
-2. Install required packages: `pandas`, `numpy`, `matplotlib`, `seaborn`, `sklearn`.
-3. Place your ground truth data and prediction results in accessible paths.
+1. **Python Installation**: Make sure Python is installed on your system. The scripts are compatible with Python 3.8.
+2. **Dependency Installation**: Install the required Python packages. You can do this easily by using the `requirements.txt` file provided:
+   ```bash
+   pip install -r requirements.txt
+   ```
 
-### Usage
+## Data Preparation
 
-Run the script from the command line by specifying the path to your ground truth data and predictions:
-
-```bash
-python confusionmatrix.py path/to/ground_truth.csv path/to/predictions.jsonl
-
-```
-The script will generate confusion matrices for each classification label, helping you assess your model's performance.
+Place your dataset files in accessible paths on your system.
 
+## Script-Specific Instructions
 
-## Accuracy Comparison Script
-
-`accuracy_comparison.py` is a Python script designed to compare the accuracy of different machine learning models. It calculates and visualizes the accuracy of each model for various symptoms.
-
-### Setup and Run
-
-1. Ensure Python is installed on your system.
-2. Install necessary Python packages: `pandas`, `numpy`, `matplotlib`, `sklearn`.
-3. Place your ground truth dataset in an accessible location.
-
-### Usage
-
-To use the script, run it from the command line with the path to your ground truth data:
+### MIMIC Features Extraction Script (`extract_mimic_features_from_report.py`)
+This Python script extracts and analyzes specific medical features from patient reports using a predefined grammar and prompt.
 
+#### Usage
+Run the script from the command line by specifying the path to your MIMIC ground truth data:
+
 ```bash
-python accuracy_comparison.py path/to/ground_truth.csv
+python extract_mimic_features_from_report.py path/to/MIMIC_groundtruth.csv
 ```
-The script will calculate the accuracies of specified models for different symptoms and plot the results, aiding in the comparative analysis of model performance.
-
-## MIMIC Features Extraction Script
 
-`extract_mimic_features_from_report.py` is a Python script designed to extract and analyze specific medical features from patient reports using a predefined grammar and prompt.
+### Confusion Matrix Analysis Script (`confusionmatrix.py`)
 
-### Setup and Run
+This Python script generates confusion matrices for machine learning model predictions, comparing predictions against a ground truth dataset to visualize the performance of a classification model.
 
-1. Ensure Python is installed on your system.
-2. Install necessary Python packages: `pandas`, `requests`, `tqdm`.
-3. Place your MIMIC ground truth dataset in an accessible location.
+#### Usage
 
-### Usage
-
-Run the script from the command line by specifying the path to your MIMIC ground truth data:
+Run the script from the command line by specifying the path to your ground truth data and predictions:
 
 ```bash
-python extract_mimic_features_from_report.py path/to/MIMIC_groundtruth.csv
+python confusionmatrix.py path/to/ground_truth.csv path/to/predictions.jsonl
 ```
 
-The script processes each report in the dataset, extracting specific medical features using a specialized grammar and saves the results in a JSONL file, facilitating the analysis of medical data.
+### Accuracy Comparison Script (`accuracy_comparison.py`)
+This Python script compares the accuracy of different machine learning models, calculating and visualizing the accuracy of each model for various symptoms.
+
+#### Usage
+Run the script from the command line with the path to your ground truth data:
+
+```bash
+python accuracy_comparison.py path/to/ground_truth.csv
+```
diff --git a/requirements.txt b/requirements.txt
@@ -0,0 +1,7 @@
+pandas
+numpy
+matplotlib
+seaborn
+scikit-learn
+requests
+tqdm
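
Reviewer note: the new README describes `confusionmatrix.py` as comparing a predictions JSONL file against a ground truth CSV, but the diff does not show how the two are joined. The sketch below is one possible reading of that comparison, not the script's actual implementation. It uses only the standard library, and the `report_id` key, the `symptom_a` label, and all three helper functions are hypothetical names chosen for illustration.

```python
import csv
import io
import json
from collections import Counter


def load_ground_truth(csv_text, id_column="report_id"):
    """Map each report id to its CSV row (the id column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column]: row for row in reader}


def load_predictions(jsonl_text, id_key="report_id"):
    """Parse one JSON object per non-empty line, keyed by report id as a string."""
    records = (json.loads(line) for line in jsonl_text.splitlines() if line.strip())
    return {str(rec[id_key]): rec for rec in records}


def confusion_counts(truth, preds, label):
    """Count (true, predicted) value pairs for one label over ids present in both."""
    counts = Counter()
    for report_id, row in truth.items():
        if report_id in preds:
            # Values are compared as strings because CSV fields are always text.
            counts[(str(row[label]), str(preds[report_id][label]))] += 1
    return counts


# Tiny in-memory stand-ins for ground_truth.csv and predictions.jsonl.
GROUND_TRUTH_CSV = "report_id,symptom_a\n1,1\n2,0\n3,1\n"
PREDICTIONS_JSONL = "\n".join([
    '{"report_id": 1, "symptom_a": 1}',
    '{"report_id": 2, "symptom_a": 1}',
    '{"report_id": 3, "symptom_a": 1}',
])

truth = load_ground_truth(GROUND_TRUTH_CSV)
preds = load_predictions(PREDICTIONS_JSONL)
counts = confusion_counts(truth, preds, "symptom_a")

# Each key is a (true value, predicted value) pair.
for pair, n in sorted(counts.items()):
    print(pair, n)
```

If the real script keys reports differently or stores labels under nested JSON fields, only the two loader functions would need to change; the pair counting stays the same.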