AI Annotator

AI Annotator is primarily designed for prototyping and testing annotation or classification tasks using Large Language Models (LLMs). While it doesn't offer extensive functionality or customizability, it provides a streamlined solution for quick experimentation. Additionally, it serves as a wrapper for a Vector Database (currently ChromaDB), making it adaptable for any task that leverages retrieval-augmented generation (RAG).

It supports a range of models, both LLMs and embedding models: locally run backends such as Hugging Face Transformers and Ollama, with the flexibility to integrate custom models, as well as lighter-weight API-based options like OpenAI and Mistral, which offer a simpler way to get started without local deployment. Standardized task/instruction formats and automated parsing of model outputs are also included.
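The RAG component boils down to embedding documents and retrieving the nearest stored examples at query time. As a rough illustration of that retrieval step (this is plain Python for clarity, not the package's internal API; in practice ChromaDB and the configured embedding model handle it):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored embeddings most similar to the query."""
    ranked = sorted(store, key=lambda doc_id: cosine(query, store[doc_id]), reverse=True)
    return ranked[:k]

# Toy 3-dimensional "embeddings" standing in for real model output
store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.05, 0.0], store, k=2))  # ['doc_a', 'doc_b']
```

The retrieved documents are then injected into the prompt as context or demonstrations, which is what the `number_demonstrations` parameter shown later controls.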

Installation

  1. Clone the Repository
    Clone the repository to your local machine:

    git clone https://github.com/nsschw/ai_annotator.git
  2. Install the Package
    Install the package using pip:

    pip install -e ai_annotator

    or, if you want to use locally hosted models (Hugging Face Transformers):

    pip install -e "ai_annotator[local]"

How to Use

  1. Import Necessary Modules
    Import the relevant classes and functions from ai_annotator or other libraries:

    from ai_annotator import AnnotationProject, OllamaModel, HuggingFaceEmbeddingModel, AnnotationConfig
  2. Define Your Task
    Create a task description to define what you're annotating or classifying. For example:

    task = """
    You will be given an abstract of a study. Your task is to determine whether the study is valid based on the following criteria:
    1. The study must be a meta-analysis.
    2. The study must examine the association between life satisfaction, well-being, or subjective well-being and any other variable.
    
    Structure your feedback as follows:
    
    Feedback:
    Evaluation (Your reasoning whether this is a valid article or not)
    Valid: (0 if not valid, 1 if valid)
    """
  3. Configure Models
    Set up the LLM and embedding models. This example shows how to use both Ollama and Hugging Face models:

    model = OllamaModel(host="http://ollama:11434", model="llama3.1:8b")
    emb_model = HuggingFaceEmbeddingModel("Alibaba-NLP/gte-Qwen2-1.5B-instruct")
  4. Set Project Configuration
    Define the configuration for the annotation project using the AnnotationConfig class. Specify the data path, task description, and models to use:

    project_config = AnnotationConfig(
        db_path="SecondOrderMetaStudy",
        task_description=task,
        embedding_model=emb_model,
        model=model
    )
  5. Create Annotation Project
    Initialize the AnnotationProject with your configuration and add data from a CSV file:

    ap = AnnotationProject(config=project_config)
    ap.add_data_from_csv("abstracts.csv", column_mapping={"input": "notes_abstract", "output": "valid_abstract"})
  6. Generate Reasoning
    Use a reasoning prompt to generate reasoning for each data point:

    ap.generate_reasoning(reasoning_prompt="What are the clues that lead to: [{output}] being correct in the document: [{input}] with the task being: [{task_description}].")
  7. Run Predictions
    Finally, run predictions on the test dataset:

    test_cases = ap.predict(["Test_Case_1", "Test_Case_2", ...], number_demonstrations=3, use_reasoning=True)
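The predictions come back as free text following the Feedback format defined in the task description. A minimal parser for that format could look like the sketch below (illustrative only; the package's own automated output parsing may work differently):

```python
import re

def parse_feedback(raw: str) -> dict:
    """Extract the evaluation text and the 0/1 validity flag from a model
    response that follows the task's Feedback format."""
    evaluation = re.search(r"Evaluation\s*[:(]?\s*(.+)", raw)
    valid = re.search(r"Valid:\s*([01])", raw)
    return {
        "evaluation": evaluation.group(1).strip() if evaluation else None,
        "valid": int(valid.group(1)) if valid else None,
    }

response = """Feedback:
Evaluation: The abstract describes a meta-analysis on life satisfaction.
Valid: 1"""
print(parse_feedback(response))
```

Returning `None` for missing fields makes it easy to spot responses where the model ignored the requested structure, which is common enough with smaller local models to be worth checking.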

ToDo

  • Figure out a way to use structured output (similar to OpenAI's) for task definition, model output, and evaluation. See also: jsonformer, instructor, outlines

  • Train a simple PEFT model

  • Change to lazy loading of models

  • Explore deploying a jury of models for the annotation project

  • Add a hyperparameter tuner for comparing different models

  • Parse JSON output

  • Enable model offloading

  • Enable passing a bnb_config to the Hugging Face models