Replit Code Instruct inference using CPU

Run inference on the Replit Code Instruct model using your CPU. The inference code uses a ggml-quantized model, loaded through ctransformers, a Python library that provides bindings to ggml.
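For orientation, here is a minimal sketch of loading a ggml model through ctransformers. It assumes the upstream ctransformers API (`AutoModelForCausalLM.from_pretrained` with a `model_type` argument); the patched submodule used by this repository may differ. The `load_model` helper and the weights path in the comment are hypothetical and are not taken from inference.py.

```python
from pathlib import Path


def load_model(model_path: str, model_type: str = "replit"):
    """Load a ggml model via ctransformers (assumed upstream API)."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"No ggml weights found at {path}")
    # Deferred import: this file can be imported without ctransformers installed.
    from ctransformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(str(path), model_type=model_type)


# Example usage (hypothetical path; use whatever download_model.py writes):
# llm = load_model("models/replit-code-instruct-ggml.bin")
# print(llm("def fibonacci(n):", max_new_tokens=64))
```

Checking for the file first gives a clearer error than letting the native library fail on a missing path.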

Demo:

2023-06-27.14-46-07.mp4

Requirements

Using Docker should make all of this easier. Minimum specs: a system with 8 GB of RAM. Python 3.10 is recommended.
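A quick way to check a machine against these requirements is sketched below. The helper names are hypothetical, and the RAM probe relies on `os.sysconf`, which is available on Linux and macOS but not on Windows (there the RAM check is simply skipped).

```python
import os
import sys

MIN_RAM_GB = 8                 # minimum from the requirements above
RECOMMENDED_PYTHON = (3, 10)   # recommended, not strictly required


def total_ram_bytes():
    """Return physical RAM in bytes, or None if the platform does not expose it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def check_requirements(ram_bytes, python_version):
    """Return a list of warning strings (empty if everything looks fine)."""
    warnings = []
    if ram_bytes is not None:
        ram_gb = ram_bytes / 1024**3
        # Round, because an "8 GB" machine usually reports slightly less.
        if round(ram_gb) < MIN_RAM_GB:
            warnings.append(f"{ram_gb:.1f} GB RAM detected; {MIN_RAM_GB} GB is the minimum.")
    if tuple(python_version[:2]) != RECOMMENDED_PYTHON:
        found = ".".join(str(part) for part in python_version[:2])
        warnings.append(f"Python {found} detected; 3.10 is recommended.")
    return warnings


for warning in check_requirements(total_ram_bytes(), sys.version_info):
    print("warning:", warning)
```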

Tested working on

Will post some numbers for these two later.

AMD EPYC 7003 series CPU
AMD Ryzen 9 5950X CPU
Mac M1

Setup

First, create a virtual environment and activate it.

python -m venv env && source env/bin/activate

Next, initialize the submodule that contains the patched ctransformers.

git submodule update --init --recursive

Next, install the dependencies.

pip install -r requirements.txt

Next, download the quantized model weights (about 1.5 GB).

python download_model.py
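Since the download is large, it can be worth confirming that the file arrived intact before running inference. The sketch below checks only that a file exists and that its size is close to the expected figure. The `looks_complete` helper, the 25% tolerance, and the path in the usage comment are assumptions for illustration; download_model.py may already perform its own checks.

```python
from pathlib import Path

EXPECTED_SIZE_BYTES = 1.5 * 1024**3  # "about 1.5 GB" per the step above
TOLERANCE = 0.25                     # accept +/-25%, since the size is approximate


def looks_complete(path, expected=EXPECTED_SIZE_BYTES, tolerance=TOLERANCE):
    """Return True if the file exists and its size is within tolerance of expected."""
    file_path = Path(path)
    if not file_path.is_file():
        return False
    return abs(file_path.stat().st_size - expected) <= tolerance * expected


# Example usage (hypothetical path):
# if not looks_complete("models/replit-code-instruct-ggml.bin"):
#     print("Weights look missing or truncated; rerun download_model.py")
```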

Ready to rock. Run inference.

python inference.py

Next, modify the prompt and generation parameters in the inference script (inference.py).
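As a rough idea of what such a change could look like, the sketch below builds an instruction prompt and a set of generation parameters. The Alpaca-style template is an assumption about the prompt format, so check inference.py for the format actually used. The parameter names follow upstream ctransformers generation arguments, the values are illustrative only, and `build_prompt` is a hypothetical helper.

```python
# Assumed Alpaca-style template; confirm the real format in inference.py.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"

# Illustrative values; names follow upstream ctransformers generation arguments.
GENERATION_PARAMS = {
    "max_new_tokens": 256,      # upper bound on generated tokens
    "temperature": 0.2,         # lower values give more deterministic output
    "top_p": 0.95,              # nucleus sampling threshold
    "top_k": 40,                # sample only from the 40 most likely tokens
    "repetition_penalty": 1.1,  # discourage repeated tokens
}


def build_prompt(instruction: str) -> str:
    """Wrap a plain instruction in the assumed prompt template."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


prompt = build_prompt("Write a Python function that reverses a string.")
print(prompt)

# With a loaded ctransformers model `llm` (not created here), generation would be:
# text = llm(prompt, **GENERATION_PARAMS)
```

Keeping the parameters in one dictionary makes it easy to compare runs: change one value, rerun, and see how the output shifts.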

pixilated730/Replit_LLM
