From 105eee5df98ba452b4ef455e64d1eb037223b60e Mon Sep 17 00:00:00 2001
From: Jeff <jiefu.zhu@tu-dresden.de>
Date: Wed, 29 Nov 2023 13:04:12 +0100
Subject: [PATCH] improve README.md

---
 README.md       | 74 +++++++++++++++++++++----------------------------
 requirement.txt |  7 +++++
 2 files changed, 39 insertions(+), 42 deletions(-)
 create mode 100644 requirement.txt
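Note on the README text for `confusionmatrix.py`: it says the script compares predictions against a ground truth dataset. The sketch below is only an illustration of that comparison, not the script's actual code. It assumes binary 0/1 labels, a hypothetical `id` field shared by the CSV and the JSONL records, and a hypothetical `symptom` label column; none of these names come from the repository. It uses only the standard library.

```python
import csv
import io
import json


def confusion_counts(truth_csv, predictions_jsonl, label, key="id"):
    """Return (tn, fp, fn, tp) for one binary label.

    `key` and `label` are hypothetical field names, not taken from the repo.
    """
    # Map each report id to its ground-truth value for this label.
    truth = {row[key]: int(row[label]) for row in csv.DictReader(truth_csv)}

    # Keys are (ground truth, prediction) pairs.
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for line in predictions_jsonl:
        if not line.strip():
            continue  # skip blank lines in the JSONL stream
        record = json.loads(line)
        report_id = str(record[key])
        if report_id not in truth:
            continue  # no ground truth for this prediction
        counts[(truth[report_id], int(record[label]))] += 1

    return counts[(0, 0)], counts[(0, 1)], counts[(1, 0)], counts[(1, 1)]


# Tiny in-memory example instead of real files.
truth_file = io.StringIO("id,symptom\n1,1\n2,0\n3,1\n4,0\n")
pred_file = io.StringIO(
    '{"id": 1, "symptom": 1}\n'
    '{"id": 2, "symptom": 1}\n'
    '{"id": 3, "symptom": 0}\n'
    '{"id": 4, "symptom": 0}\n'
)
tn, fp, fn, tp = confusion_counts(truth_file, pred_file, "symptom")
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

With real data, the two `io.StringIO` objects would be replaced by open file handles for the ground truth CSV and the predictions JSONL file.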

diff --git a/README.md b/README.md
index bf90e99..bbc1e80 100644
--- a/README.md
+++ b/README.md
@@ -1,60 +1,50 @@
-# fromtexttotables
-## Confusion Matrix Analysis Script
+# From Text to Tables: A Local Privacy Preserving Large Language Model for Structured Information Retrieval from Medical Documents
+**Note: This documentation is currently under construction. Some sections may be updated or changed as development progresses.**
 
-This Python script, `confusionmatrix.py`, generates confusion matrices for machine learning model predictions. It compares predictions against a ground truth dataset to visualize the performance of a classification model.
+## General Setup Instructions
 
-### Setup and Run
+Before running the scripts, please ensure the following setup steps are completed:
 
-1. Ensure you have Python installed.
-2. Install required packages: `pandas`, `numpy`, `matplotlib`, `seaborn`, `sklearn`.
-3. Place your ground truth data and prediction results in accessible paths.
+1. **Python Installation**: Make sure Python is installed on your system. The scripts are compatible with Python 3.8.
+2. **Dependency Installation**: Install the required Python packages from the provided `requirement.txt` file:
+   ```bash
+   pip install -r requirement.txt
+   ```
 
-### Usage
+## Data Preparation
 
-Run the script from the command line by specifying the path to your ground truth data and predictions:
-
-```bash
-python confusionmatrix.py path/to/ground_truth.csv path/to/predictions.jsonl
-
-```
-The script will generate confusion matrices for each classification label, helping you assess your model's performance.
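+Place your dataset files (the ground truth CSV and, for the confusion matrix script, the predictions JSONL file) where the scripts can read them. Each script takes these paths as command-line arguments.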
 
+## Script-Specific Instructions
 
-## Accuracy Comparison Script
-
-`accuracy_comparison.py` is a Python script designed to compare the accuracy of different machine learning models. It calculates and visualizes the accuracy of each model for various symptoms.
-
-### Setup and Run
-
-1. Ensure Python is installed on your system.
-2. Install necessary Python packages: `pandas`, `numpy`, `matplotlib`, `sklearn`.
-3. Place your ground truth dataset in an accessible location.
-
-### Usage
-
-To use the script, run it from the command line with the path to your ground truth data:
+### MIMIC Features Extraction Script (`extract_mimic_features_from_report.py`)
+This Python script extracts specific medical features from each patient report in the dataset using a predefined grammar and prompt, and saves the results in a JSONL file.
 
+#### Usage
+Run the script from the command line by specifying the path to your MIMIC ground truth data:
+
 ```bash
-python accuracy_comparison.py path/to/ground_truth.csv
+python extract_mimic_features_from_report.py path/to/MIMIC_groundtruth.csv
 ```
-The script will calculate the accuracies of specified models for different symptoms and plot the results, aiding in the comparative analysis of model performance.
-
-## MIMIC Features Extraction Script
 
-`extract_mimic_features_from_report.py` is a Python script designed to extract and analyze specific medical features from patient reports using a predefined grammar and prompt.
+### Confusion Matrix Analysis Script (`confusionmatrix.py`)
 
-### Setup and Run
+This Python script generates confusion matrices for machine learning model predictions, comparing predictions against a ground truth dataset to visualize the performance of a classification model.
 
-1. Ensure Python is installed on your system.
-2. Install necessary Python packages: `pandas`, `requests`, `tqdm`.
-3. Place your MIMIC ground truth dataset in an accessible location.
+#### Usage
 
-### Usage
-
-Run the script from the command line by specifying the path to your MIMIC ground truth data:
+Run the script from the command line by specifying the path to your ground truth data and predictions:
 
 ```bash
-python extract_mimic_features_from_report.py path/to/MIMIC_groundtruth.csv
+python confusionmatrix.py path/to/ground_truth.csv path/to/predictions.jsonl
 ```
 
-The script processes each report in the dataset, extracting specific medical features using a specialized grammar and saves the results in a JSONL file, facilitating the analysis of medical data.
\ No newline at end of file
+### Accuracy Comparison Script (`accuracy_comparison.py`)
+This Python script compares the accuracy of different machine learning models, calculating and visualizing the accuracy of each model for various symptoms.
+
+#### Usage
+Run the script from the command line with the path to your ground truth data:
+
+```bash
+python accuracy_comparison.py path/to/ground_truth.csv
+```
\ No newline at end of file
diff --git a/requirement.txt b/requirement.txt
new file mode 100644
index 0000000..a389627
--- /dev/null
+++ b/requirement.txt
@@ -0,0 +1,7 @@
+pandas
+numpy
+matplotlib
+seaborn
+scikit-learn
+requests
+tqdm