This repository contains the codebase for the LLM-AGI Evaluation Platform. The platform generates, summarizes, and evaluates results from large language models such as GPT-3.5, GPT-4, and BabyAGI. The evaluations are performed by human experts via a web interface.
Clone the repo:

```bash
git clone https://github.com/KatherLab/llm-agent.git && cd llm-agent
```
Follow the instructions in `setup_env.md` to set up the environment.
If you are a project maintainer:

Create and check out your own dev branch:

```bash
git checkout -b <dev_branchname>
```
If you are a human expert:

Check out the working branch:

```bash
git checkout <exp_branchname>
```
If you are a project maintainer, go through all the steps below.
If you are a human expert, you only need to follow steps 4 and 5.
1. Put the OpenAI API key in the `generator/API_KEY` file. Make sure not to push this file to the remote; it has already been added to `.gitignore`.
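   For reference, a minimal sketch of how such a key file can be consumed (this assumes `generate.py` reads `generator/API_KEY` as plain text and uses the classic `openai` client; check the script for the actual mechanism):

   ```python
   from pathlib import Path
   import openai

   # Assumption: generator/API_KEY holds the raw key on a single line.
   openai.api_key = Path("generator/API_KEY").read_text().strip()
   ```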
2. Run `generate.py` to generate results from the language models:

   ```bash
   python generator/generate.py
   ```

   The generated text files can be found in the `generator/results` directory.
3. Generate summarized markdown files:

   ```bash
   python generator/summarize.py
   ```
4. Load the summaries into the local database:

   ```bash
   python webui/gpt4_summaries_db.py
   ```
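   If you are curious what this step does, a rough sketch follows. It assumes the database is SQLite and uses illustrative table and column names; the real schema lives in `webui/gpt4_summaries_db.py`:

   ```python
   import sqlite3
   from pathlib import Path

   # Hypothetical sketch: one row per markdown summary, keyed by filename.
   conn = sqlite3.connect("webui/scores.db")
   conn.execute(
       "CREATE TABLE IF NOT EXISTS summaries (id TEXT PRIMARY KEY, body TEXT)"
   )
   for md_file in Path("generator/results").glob("*.md"):
       conn.execute(
           "INSERT OR REPLACE INTO summaries VALUES (?, ?)",
           (md_file.stem, md_file.read_text()),
       )
   conn.commit()
   conn.close()
   ```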
5. Run the web application:

   ```bash
   python webui/app.py
   ```

   Then access the application in your favourite browser by visiting http://127.0.0.1:5000.
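   Since the app serves on Flask's default `http://127.0.0.1:5000`, it is presumably a Flask app. A hypothetical sketch of a score-submission route (route name, form fields, and table are illustrative, not the actual API of `webui/app.py`):

   ```python
   import sqlite3
   from flask import Flask, request

   app = Flask(__name__)

   @app.route("/score", methods=["POST"])
   def submit_score():
       # Hypothetical route: store an expert's rating against a summary ID.
       # Assumes the scores table was created beforehand.
       conn = sqlite3.connect("webui/scores.db")
       conn.execute(
           "INSERT INTO scores (summary_id, score) VALUES (?, ?)",
           (request.form["summary_id"], int(request.form["score"])),
       )
       conn.commit()
       conn.close()
       return "", 204

   if __name__ == "__main__":
       app.run()  # defaults to http://127.0.0.1:5000
   ```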
6. Commit and push updates to the database:

   ```bash
   git add webui/scores.db
   git commit -m "Update database"
   git push
   ```
Thanks everyone for your contributions!
Results from the human experts will be stored in the `visualization` directory. Check `analysis_summary.md` for the visualizations.
The web-based evaluation application allows expert users to provide ratings. It streamlines collating the results generated by llm-agi and the evaluations from GPT-4 into a comprehensive summary database. The scoring system behind the web interface builds a score database in which each summary ID is linked to the main summary database, so the database is updated dynamically whenever human experts submit their scores via the web interface.
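A minimal sketch of what such a linked schema could look like, assuming SQLite for `webui/scores.db` (table and column names are illustrative, not taken from the codebase):

```python
import sqlite3

conn = sqlite3.connect("webui/scores.db")
# Each score row references a summary in the main summary table, so a
# score submitted through the web UI updates the database in place.
conn.executescript("""
CREATE TABLE IF NOT EXISTS summaries (
    id   TEXT PRIMARY KEY,
    body TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS scores (
    summary_id TEXT NOT NULL REFERENCES summaries(id),
    expert     TEXT NOT NULL,
    score      INTEGER NOT NULL,
    PRIMARY KEY (summary_id, expert)
);
""")
conn.commit()
conn.close()
```

Keying scores by `(summary_id, expert)` would let a resubmission replace an expert's earlier rating rather than duplicating rows.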