What is SEMIKONG?

SEMIKONG - The Open Source Foundation Model for Semiconductor Manufacturing Process

🤗 Hugging Face Dataset • 🤖 Hugging Face Model

👩‍🚀 Ask questions or discuss ideas on GitHub

📝 Check out SEMIKONG Tech Report

📕 Table of Contents

What is SEMIKONG?
How to use SEMIKONG?
- Quick start
  - Choose your path
  - pip
  - docker
  - Web demo
- Fine-tuning
- Quantization
- Deployment
- FAQ
- Learning hub
Why SEMIKONG?
Who can use SEMIKONG?
Misc.

What is SEMIKONG?

Introduction

🤖 SEMIKONG is an open-source, industry-specific large language model (LLM) tailored to the semiconductor domain. It aims to address the unique challenges faced by the semiconductor industry, such as the physics and chemistry of semiconductor devices and processes, by incorporating domain-specific knowledge into the model.
🙌 Targeted as a bilingual language model and trained on 3T multilingual corpus, the SEMIKONG series models become one of the strongest LLM worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. For example,
- SEMIKONG-8B / 70B-Instruct model .
- SEMIKONG-8B / 70B model .
- 🙏 (Credits to Llama) Thanks to the Transformer and Llama open-source communities, as they reduce the efforts required to build from scratch and enable the utilization of the same tools within the AI ecosystem.

[ Back to top ⬆️ ]

News

[ Back to top ⬆️ ]

Key Features

First industry-specific LLM for the semiconductor domain
Trained on a comprehensive semiconductor-related text corpus
Novel pre-training approach leveraging domain-specific knowledge
Superior performance compared to general-purpose LLMs on industry-relevant benchmarks
Serves as a valuable foundation for companies to build proprietary models tailored to their needs

Models

SEMIKONG models come in multiple sizes and cater to different use cases. You can also fine-tune SEMIKONG models to meet your specific requirements.

If you want to deploy SEMIKONG models, make sure you meet the software and hardware requirements.

Instruct models

Model	Download
SEMIKONG-70B-Instruct	• 🤗 Hugging Face
SEMIKONG-8B-Instruct	• 🤗 Hugging Face

Base models

Model	Download
SEMIKONG-70B	• 🤗 Hugging Face
SEMIKONG-8B	• 🤗 Hugging Face

Model info

For chat and base models

Model	Intro	Default context window	Pretrained tokens
70B series models	A powerful version of SEMIKONG that suitable more complex task	48k	25T
8B series models	An economical version of SEMIKONG that able to perform general instruction and chat in semiconductor manufacturing process	48k	25T

For chat models
For chat model limitations, see the explanations below. ⬇️
- Hallucination: This refers to the model generating factually incorrect or nonsensical information. With the model's responses being more varied, there's a higher chance of hallucination that are not based on accurate data or logical reasoning.
- Non-determinism in re-generation: When attempting to regenerate or sample responses, inconsistencies in the outcomes may occur. The increased diversity can lead to varying results even under similar input conditions.
- Cumulative Error: This occurs when errors in the model's responses compound over time. As the model generates more diverse responses, the likelihood of small inaccuracies building up into larger errors increases, especially in complex tasks like extended reasoning, mathematical problem-solving, etc.
- To achieve more coherent and consistent responses, it is advisable to adjust generation configuration parameters such as temperature, top_p, or top_k. These adjustments can help in the balance between creativity and coherence in the model's outputs.

[ Back to top ⬆️ ]

How to use SEMIKONG?

Quick start
- Choose your path
- pip
- docker
- Web demo
Fine-tuning
Quantization
Deployment
FAQ
Learning hub

Quick start

Getting up and running with SEMIKONG models is simple with multiple choices available.

Choose your path

Select one of the following paths to begin your journey with SEMIKONG!

🎯 Deploy SEMIKONG locally

If you prefer to deploy SEMIKONG models locally,

🙋‍♀️ and you have sufficient resources (for example, NVIDIA A100 40GB), you can choose one of the following methods:
- pip
- Docker

🎯 Not to deploy SEMIKONG locally

If you prefer not to deploy SEMIKONG models locally, you can explore SEMIKONG's capabilities using any of the following options.

🙋‍♀️ Chat with SEMIKONG

If you want to chat with SEMIKONG, you can use one of these online services, which offer a similar user experience:

SEMIKONG-70B-Instruct (SEMIKONG official on Hugging Face)

[ Back to top ⬆️ ]

Quick start - pip

This tutorial guides you through every step of running SEMIKONG-8B-Instruct locally on an A100 (40G) and then performing inference.

Step 0: Prerequisites

Make sure Python 3.10 or a later version is installed.
If you want to run other SEMIKONG models, see software and hardware requirements.

Step 1: Prepare your environment

To set up the environment and install the required packages, execute the following command.

git clone https://github.com/aitomatic/semikong.git
cd semikong
pip install -r requirements.txt

Step 2: Download the SEMIKONG model

You can download the weights and tokenizer of SEMIKONG models from the following sources:

Hugging Face

Step 3: Perform inference

You can perform inference with SEMIKONG chat or base models as below.

Perform inference with SEMIKONG chat model

Create a file named quick_start.py and copy the following content to it.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = '<your-model-path>'

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# Since transformers 4.35.0, the GPT-Q/AWQ model can be loaded using AutoModelForCausalLM.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype='auto'
).eval()

# Prompt content: "hi"
messages = [
    {"role": "user", "content": "hi"}
]

input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')
output_ids = model.generate(input_ids.to('cuda'))
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

# Model response: "Hello! How can I assist you today?"
print(response)

Run quick_start.py.
```
python quick_start.py
```
Then you can see an output similar to the one below. 🥳
```
Hello! How can I assist you today?
```

Perform inference with SEMIKONG base model

SEMIKONG-8B

Input

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "pentagoniac/SEMIKONG-8B"
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, use_fast=False)

input_text = "what is semiconductor ?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Output

Semiconductor is a ....

[ Back to top ⬆️ ]

Quick start - Docker

TBA

[ Back to top ⬆️ ]

Web demo

You can build a web UI demo for SEMIKONG chat models (note that SEMIKONG base models are not supported in this senario).

Step 1: Prepare your environment.

Step 2: Download the SEMIKONG model.

Step 3. To start a web service locally, run the following command.

python demo/web_demo.py -c <your-model-path>

You can access the web UI by entering the address provided in the console into your browser.

[ Back to top ⬆️ ]

Fine-tuning

Finetune code for SEMIKONG 8B and 70B

Hardware Setup

For the SEMIKONG-8B model, a node with 1 GPUs, each with GPU memory larger than 16GB, is recommended.

For the SEMIKONG-70B model, because the usage of the zero-offload technique consumes a lot of CPU memory, please be careful to limit the number of GPUs in the 34B finetune training. Please use CUDA_VISIBLE_DEVICES to limit the number of GPUs (as shown in scripts/run_sft_Yi_34b.sh).

A typical hardware setup for finetuning the 70B model is a node with 8 GPUs (limited to 4 in running by CUDA_VISIBLE_DEVICES=0,1,2,3), each with GPU memory larger than 80GB, and total CPU memory larger than 900GB.

Quick Start

Deployment

If you want to deploy SEMIKONG models, make sure you meet the software and hardware requirements.

Software requirements

Before using SEMIKONG quantized models, make sure you've installed the correct software listed below.

Model	Software
SEMIKONG 4-bit quantized models	AWQ and CUDA
SEMIKONG 8-bit quantized models	GPTQ and CUDA

Hardware requirements

Before deploying SEMIKONG in your environment, make sure your hardware meets the following requirements.

Instruction models

Model	Minimum VRAM	Recommended GPU Example
SEMIKONG-70B-Instruct	170 GB	3 x A100 80GB 5 x A100 40GB
SEMIKONG-8B-Instruct	16 GB	1 x RTX 3060 (12 GB) 1 x RTX 4060 (8 GB)

Base models

Model	Minimum VRAM	Recommended GPU Example
SEMIKONG-8B	15 GB	1 x RTX 3090 (24 GB) 1 x RTX 4090 (24 GB) 1 x A10 (24 GB) 1 x A30 (24 GB)
SEMIKONG-70B	200 GB	4 x A800 (80 GB)

[ Back to top ⬆️ ]

Why SEMIKONG?

Ecosystem
- Upstream
- Downstream
  - Serving
  - Quantization
  - Fine-tuning
  - API
Benchmarks
- Chat model performance
- Base model performance
  - SEMIKONG-34B and SEMIKONG-34B-200K
  - SEMIKONG-9B

Ecosystem

SEMIKONG has a comprehensive ecosystem, offering a range of tools, services, and models to enrich your experiences and maximize productivity.

Upstream
Downstream
- Serving
- Quantization
- Fine-tuning
- API

Upstream

The SEMIKONG series models follow the same model architecture as Llama. By choosing SEMIKONG, you can leverage existing tools, libraries, and resources within the Llama ecosystem, eliminating the need to create new tools and enhancing development efficiency.

For example, the SEMIKONG series models are saved in the format of the Llama model. You can directly use LlamaForCausalLM and LlamaTokenizer to load the model. For more information, see Use the chat model.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pentagoniac/SEMIKONG-8B-Instruct", use_fast=False)

model = AutoModelForCausalLM.from_pretrained("pentagoniac/SEMIKONG-8B-Instruct", device_map="auto")

[ Back to top ⬆️ ]

Downstream

💡 Tip

Feel free to create a PR and share the fantastic work you've built using the SEMIKONG series models.

To help others quickly understand your work, it is recommended to use the format of <model-name>: <model-intro> + <model-highlights>.

Serving

If you want to get up with SEMIKONG in a few minutes, you can use the following services built upon SEMIKONG.

SEMIKONG-70B-Instruct: you can chat with SEMIKONG using one of the following platforms:
- SEMIKONG-70B-Instruct | Hugging Face
- SEMIKONG-70B-Instruct | SEMIKONG Platform:

[ Back to top ⬆️ ]

Tech report

For detailed capabilities of the SEMIKONG series model, see SEMIKONG: Technical Report.

Citation

@article{semikong2024,
  title={SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model},
  author={Christopher Nguyen et al.},
  journal={arXiv preprint arXiv:2024.xxxxx},
  year={2024}
}

Chat model performance

SEMIKONG-70B-Chat model demonstrates exceptional performance, ranking first among all existing open-source models in the benchmarks including MMLU, CMMLU, BBH, GSM8k, and more.

Evaluation methods and challenges. ⬇️

Base model performance

SEMIKONG-9B

[ Back to top ⬆️ ]

Who can use SEMIKONG?

Everyone! 🙌 ✅

The code and weights of the SEMIKONG series models are distributed under the Apache 2.0 license, which means the SEMIKONG series models are free for personal usage, academic purposes, and commercial use.

[ Back to top ⬆️ ]

Misc.

Contributions

This project is the result of a collaborative effort involving multiple companies and individuals:

Tokyo Electron: Atsushi Suzuki, Daisuke Oku
FPT Software AIC: Huy Vo, Thang Nguyen, Lan Nguyen
Aitomatic: Daniel Guttierez, Vinh Luong, Christopher Nguyen.
AI Alliance members and researchers

We would like to express our gratitude to the AI Alliance (https://thealliance.ai) for providing the impetus, resources, and platform for this work, and for collaboration in open science. We also extend our thanks to the member organizations of the AI Alliance, their researchers and engineers for their valuable contributions to this study, including:

Noritaka Yokomori (Tokyo Electron)
Anthony Annunziata (IBM Research)
Sean Hughes (ServiceNow)
Phong Nguyen (FPT Software, AI Center)

Their expertise, insights, and collaborative spirit have been instrumental in advancing our research.

[ Back to top ⬆️ ]

Disclaimer

We use data compliance checking algorithms during the training process, to ensure the compliance of the trained model to the best of our ability. Due to complex data and the diversity of language model usage scenarios, we cannot guarantee that the model will generate correct, and reasonable output in all scenarios. Please be aware that there is still a risk of the model producing problematic outputs. We will not be responsible for any risks and issues resulting from misuse, misguidance, illegal usage, and related misinformation, as well as any associated data security concerns.

[ Back to top ⬆️ ]

License

The code and weights of the SEMIKONG series models are distributed under the Apache 2.0 license.

If you create derivative works based on this model, please include the following attribution in your derivative works:

This work is a derivative of [The SEMIKONG Series Model You Base On] by AI Alliance, used under the Apache 2.0 License.

[ Back to top ⬆️ ]

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
INSTALL.md		INSTALL.md
LICENSE		LICENSE
README.md		README.md
inference_vllm_openai.py		inference_vllm_openai.py
inference_vllm_server.py		inference_vllm_server.py
logging_setup.py		logging_setup.py
raw_inference.py		raw_inference.py
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt
training.py		training.py

License

aitomatic/semikong

Folders and files

Latest commit

History

Repository files navigation

SEMIKONG - The Open Source Foundation Model for Semiconductor Manufacturing Process

What is SEMIKONG?

Introduction

News

Key Features

Models

Instruct models

Base models

Model info

How to use SEMIKONG?

Quick start

Choose your path

🎯 Deploy SEMIKONG locally

🎯 Not to deploy SEMIKONG locally

🙋‍♀️ Chat with SEMIKONG

Quick start - pip

Step 0: Prerequisites

Step 1: Prepare your environment

Step 2: Download the SEMIKONG model

Step 3: Perform inference

Perform inference with SEMIKONG chat model

Perform inference with SEMIKONG base model

Quick start - Docker

Web demo

Fine-tuning

Finetune code for SEMIKONG 8B and 70B

Hardware Setup

Quick Start

Deployment

Software requirements

Hardware requirements

Instruction models

Base models

Why SEMIKONG?

Ecosystem

Upstream

Downstream

Serving

Tech report

Citation

Benchmarks

Chat model performance

Base model performance

SEMIKONG-9B

Who can use SEMIKONG?

Misc.

Contributions

Disclaimer

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages