AI Annotator

AI Annotator is primarily designed for prototyping and testing annotation or classification tasks using Large Language Models (LLMs). While it doesn't offer extensive functionality or customizability, it provides a streamlined solution for quick experimentation. Additionally, it serves as a wrapper for a Vector Database (currently ChromaDB), making it adaptable for any task that leverages retrieval-augmented generation (RAG).

It supports a range of models, both LLMs and embedding models: locally run backends such as Hugging Face Transformers and Ollama, with the flexibility to integrate custom models, as well as lighter-weight API-based options like OpenAI and Mistral, which offer a simpler way to get started without local deployment. Standardized task/instruction formats and automated parsing of model outputs are also included.
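The RAG component boils down to embedding documents and retrieving the nearest stored examples at query time. As a rough illustration of that retrieval step (this is plain Python for clarity, not the package's internal API; in practice ChromaDB and the configured embedding model handle it):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored embeddings most similar to the query."""
    ranked = sorted(store, key=lambda doc_id: cosine(query, store[doc_id]), reverse=True)
    return ranked[:k]

# Toy 3-dimensional "embeddings" standing in for real model output
store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.05, 0.0], store, k=2))  # ['doc_a', 'doc_b']
```

The retrieved documents are then injected into the prompt as context or demonstrations, which is what the `number_demonstrations` parameter shown later controls.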

Installation

  1. Clone the Repository
    Clone the repository to your local machine:

    git clone https://github.com/nsschw/ai_annotator.git
  2. Install the Package
    Install the package using pip:

    pip install -e ai_annotator

    or, if you want to use locally hosted models (Hugging Face Transformers):

    pip install -e "ai_annotator[local]"

How to Use

  1. Import Necessary Modules
    Import the relevant classes and functions from ai_annotator or other libraries:

    from ai_annotator import AnnotationProject, OllamaModel, HuggingFaceEmbeddingModel, AnnotationConfig
  2. Define Your Task
    Create a task description to define what you're annotating or classifying. For example:

    task = """
    You will be given an abstract of a study. Your task is to determine whether the study is valid based on the following criteria:
    1. The study must be a meta-analysis.
    2. The study must examine the association between life satisfaction, well-being, or subjective well-being and any other variable.
    
    Structure your feedback as follows:
    
    Feedback:
    Evaluation (Your reasoning whether this is a valid article or not)
    Valid: (0 if not valid, 1 if valid)
    """
  3. Configure Models
    Set up the LLM and embedding models. This example shows how to use both Ollama and Hugging Face models:

    model = OllamaModel(host="http://ollama:11434", model="llama3.1:8b")
    emb_model = HuggingFaceEmbeddingModel("Alibaba-NLP/gte-Qwen2-1.5B-instruct")
  4. Set Project Configuration
    Define the configuration for the annotation project using the AnnotationConfig class. Specify the data path, task description, and models to use:

    project_config = AnnotationConfig(
        db_path="SecondOrderMetaStudy",
        task_description=task,
        embedding_model=emb_model,
        model=model
    )
  5. Create Annotation Project
    Initialize the AnnotationProject with your configuration and add data from a CSV file:

    ap = AnnotationProject(config=project_config)
    ap.add_data_from_csv("abstracts.csv", column_mapping={"input": "notes_abstract", "output": "valid_abstract"})
  6. Generate Reasoning
    Use a reasoning prompt to generate reasoning for each data point:

    ap.generate_reasoning(reasoning_prompt="What are the clues that lead to: [{output}] being correct in the document: [{input}] with the task being: [{task_description}].")
  7. Run Predictions
    Finally, run predictions on the test dataset:

    test_cases = ap.predict(["Test_Case_1", "Test_Case_2", ...], number_demonstrations=3, use_reasoning=True)
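The predictions come back as free text following the Feedback format defined in the task description. A minimal parser for that format could look like the sketch below (illustrative only; the package's own automated output parsing may work differently):

```python
import re

def parse_feedback(raw: str) -> dict:
    """Extract the evaluation text and the 0/1 validity flag from a model
    response that follows the task's Feedback format."""
    evaluation = re.search(r"Evaluation\s*[:(]?\s*(.+)", raw)
    valid = re.search(r"Valid:\s*([01])", raw)
    return {
        "evaluation": evaluation.group(1).strip() if evaluation else None,
        "valid": int(valid.group(1)) if valid else None,
    }

response = """Feedback:
Evaluation: The abstract describes a meta-analysis on life satisfaction.
Valid: 1"""
print(parse_feedback(response))
```

Returning `None` for missing fields makes it easy to spot responses where the model ignored the requested structure, which is common enough with smaller local models to be worth checking.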

ToDo

  • Figure out a way to use structured output (similar to OpenAI's) for task definition, model output, and evaluation. See also: jsonformer, instructor, outlines

  • Train a simple PEFT model

  • Change to lazy loading of models

  • Explore deploying a jury of models for the annotation project

  • Add a hyperparameter tuner for comparing different models

  • Parse JSON output

  • Enable model offloading

  • Enable passing a bnb_config to the Hugging Face models