## Config files

The table below lists the performance you can expect from the provided config files. Note that you can achieve lower memory consumption by lowering the micro batch size as needed. In addition, you can lower the rank (`lora_r`) in the LoRA configuration files and disable LoRA for certain layers (for example, setting `lora_projection` and other LoRA layer-specific parameters to `false`). For more information on lowering the memory requirements, see the Dealing with out-of-memory (OOM) errors guide. The "Cost" column refers to the on-demand compute cost on Lightning AI Studios, where these benchmarks were executed. All experiments were conducted using bfloat-16 precision on the Alpaca2k dataset. The "Multitask score" refers to MMLU.
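
For example, the command below sketches how these memory-saving options could be combined. The flag names mirror the corresponding fields in the LoRA config files (`lora_r`, `lora_projection`, and the `train` section's `micro_batch_size`); treat them as assumptions and check which flags your litgpt version exposes:

```bash
# Hedged sketch: lower the LoRA rank, keep LoRA disabled for the projection
# layers, and shrink the micro batch size to reduce peak memory.
# Flag names are assumed to match the config-file fields.
litgpt finetune lora \
  --config config_hub/finetune/phi-2/lora.yaml \
  --lora_r 4 \
  --lora_projection false \
  --train.micro_batch_size 1
```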

 

| Config | Model | Epochs | Max seq length | Micro batch size | Machine | Training runtime | Cost | Peak memory | Validation loss | Validation perplexity | Multitask score (MMLU) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| falcon-7b/lora.yaml | falcon-7b | 4 | 512 | 1 | 1xA10G | 24.84 min | $0.7 | 16.69 GB | 0.945 | 2.573 | 26.2% |
| falcon-7b/lora.yaml | falcon-7b | 4 | 512 | 1 | 4xA10G | 24.94 min | $2.0 | 16.69 GB | 0.945 | 2.573 | 26.4% |
| falcon-7b/qlora.yaml | falcon-7b | 4 | 512 | 1 | 1xA10G | 50.85 min | $1.5 | 9.44 GB | 0.993 | 2.699 | 26.3% |
| gemma-2b/full.yaml | gemma-2b | 1 | 512 | 1 | 4xA10G | 14.06 min | $1.1 | 17.43 GB | 1.021 | 2.777 | 32.4% |
| gemma-2b/lora.yaml | gemma-2b | 2 | 512 | 2 | 1xA10G | 9.41 min | $0.3 | 12.62 GB | 0.981 | 2.666 | 34.4% |
| gemma-2b/lora.yaml | gemma-2b | 2 | 512 | 2 | 4xA10G | 9.41 min | $0.8 | 12.62 GB | 0.981 | 2.667 | 34.0% |
| gemma-2b/qlora.yaml | gemma-2b | 2 | 512 | 2 | 1xA10G | 12.91 min | $0.4 | 11.58 GB | 1.085 | 2.959 | 36.4% |
| gemma-7b/lora.yaml | gemma-7b | 2 | 512 | 1 | 1xA10G | OOM | OOM | OOM | OOM | OOM | |
| gemma-7b/lora.yaml | gemma-7b | 2 | 512 | 1 | 4xA10G | OOM | OOM | OOM | OOM | OOM | |
| gemma-7b/qlora.yaml | gemma-7b | 2 | 512 | 1 | 1xA10G | 43.58 min | $1.3 | 17.18 GB | 0.973 | 2.646 | 62.45% |
| gemma2-2b/lora.yaml | gemma-2-2b | 2 | 512 | 2 | 1xA10G | 11.96 min | $0.4 | 14.31 GB | 0.951 | 2.589 | 23.84% |
| gemma2-2b/qlora.yaml | gemma-2-2b | 2 | 512 | 2 | 1xA10G | 16.06 min | $0.5 | 13.52 GB | 0.983 | 2.673 | 24.12% |
| gemma2-9b/lora.yaml | gemma-2-9b | 2 | 512 | 1 | 1xA10G | OOM | OOM | OOM | OOM | OOM | |
| gemma2-9b/lora.yaml | gemma-2-9b | 2 | 512 | 1 | 4xA10G | OOM | OOM | OOM | OOM | OOM | |
| gemma2-9b/qlora.yaml | gemma-2-9b | 2 | 512 | 1 | 1xA10G | 50.01 min | $4.0 | 20.92 GB | 0.852 | 2.345 | 24.2% |
| llama-2-7b/full.yaml | llama-2-7b | 1 | 512 | 4 | 4xA10G | OOM | OOM | OOM | OOM | OOM | |
| llama-2-7b/lora.yaml | llama-2-7b | 4 | 512 | 2 | 1xA10G | 32.82 min | $1.0 | 19.77 GB | 0.802 | 2.230 | 40.3% |
| llama-2-7b/lora.yaml | llama-2-7b | 4 | 512 | 2 | 4xA10G | 32.83 min | $2.6 | 19.77 GB | 0.802 | 2.229 | 40.2% |
| llama-2-7b/qlora.yaml | llama-2-7b | 4 | 512 | 2 | 1xA10G | 45.67 min | $1.4 | 13.68 GB | 0.814 | 2.258 | 38.6% |
| llama-3-8b/full.yaml | llama-3-8b | 1 | 512 | 4 | 4xA10G | OOM | OOM | OOM | OOM | OOM | |
| llama-3-8b/lora.yaml | llama-3-8b | 2 | 512 | 1 | 1xA10G | 14.79 min | $0.4 | 19.73 GB | 0.888 | 2.431 | 62.4% |
| llama-3-8b/lora.yaml | llama-3-8b | 2 | 512 | 1 | 4xA10G | 14.88 min | $1.2 | 19.73 GB | 0.889 | 2.432 | 62.5% |
| llama-3-8b/qlora.yaml | llama-3-8b | 2 | 512 | 2 | 1xA10G | 22.24 min | $0.7 | 17.41 GB | 0.939 | 2.558 | 62.2% |
| llama-3.1-8b/full.yaml | llama-3.1-8b | 1 | 512 | 4 | 1xA10G | OOM | OOM | OOM | OOM | OOM | |
| llama-3.1-8b/lora.yaml | llama-3.1-8b | 2 | 512 | 1 | 1xA10G | 13.36 min | $1.1 | 19.73 GB | 0.878 | 2.406 | xx.xx |
| llama-3.1-8b/qlora.yaml | llama-3.1-8b | 2 | 512 | 2 | 1xA10G | 21.81 min | $0.7 | 17.41 GB | 0.928 | 2.529 | xx.xx |
| llama-3.2-1b/full.yaml | llama-3.2-1b | 1 | 512 | 4 | 1xA10G | 2.01 min | $0.1 | 8.70 GB | 1.442 | 4.229 | 38.21% |
| llama-3.2-1b/lora.yaml | llama-3.2-1b | 2 | 512 | 1 | 1xA10G | 4.17 min | $0.4 | 4.49 GB | 1.114 | 3.046 | 36.87% |
| llama-3.2-1b/qlora.yaml | llama-3.2-1b | 2 | 512 | 2 | 1xA10G | 6.20 min | $0.6 | 5.53 GB | 1.201 | 3.322 | 36.49% |
| llama-3.2-3b/full.yaml | llama-3.2-3b | 1 | 512 | 4 | 1xA10G | 4.71 min | $0.4 | 16.51 GB | 1.255 | 3.509 | 54.69% |
| llama-3.2-3b/lora.yaml | llama-3.2-3b | 2 | 512 | 1 | 1xA10G | 8.31 min | $0.8 | 9.67 GB | 0.973 | 2.647 | 54.77% |
| llama-3.2-3b/qlora.yaml | llama-3.2-3b | 2 | 512 | 2 | 1xA10G | 14.89 min | $1.4 | 10.30 GB | 1.031 | 2.804 | 55.08% |
| mistral-7b-v0.2/lora.yaml | mistral-7b-v0.2 | 4 | 512 | 2 | 1xA10G | 31.00 min | $0.9 | 20.66 GB | 0.801 | 2.228 | 55.7% |
| mistral-7b-v0.2/lora.yaml | mistral-7b-v0.2 | 4 | 512 | 2 | 4xA10G | 31.00 min | $2.5 | 20.66 GB | 0.802 | 2.229 | 55.5% |
| mistral-7b-v0.2/qlora.yaml | mistral-7b-v0.2 | 4 | 512 | 2 | 1xA10G | 44.75 min | $1.3 | 14.29 GB | 0.813 | 2.255 | 56.5% |
| mistral-7b/lora.yaml | mistral-7b | 4 | 512 | 2 | 1xA10G | 31.01 min | $0.9 | 20.66 GB | 0.794 | 2.211 | 57.9% |
| mistral-7b/lora.yaml | mistral-7b | 4 | 512 | 2 | 4xA10G | 31.03 min | $2.5 | 20.66 GB | 0.796 | 2.218 | 57.9% |
| mistral-7b/qlora.yaml | mistral-7b | 4 | 512 | 2 | 1xA10G | 44.75 min | $1.3 | 14.29 GB | 0.803 | 2.231 | 57.9% |
| phi-2/full.yaml | phi-2 | 1 | 512 | 4 | 4xA10G | 11.87 min | $1.0 | 14.44 GB | 1.305 | 3.688 | 38.4% |
| phi-2/lora.yaml | phi-2 | 1 | 512 | 4 | 1xA10G | 3.78 min | $0.1 | 13.98 GB | 0.819 | 2.269 | 53.0% |
| phi-2/lora.yaml | phi-2 | 1 | 512 | 4 | 4xA10G | 3.78 min | $0.3 | 13.98 GB | 0.820 | 2.271 | 52.4% |
| phi-2/qlora.yaml | phi-2 | 1 | 512 | 4 | 1xA10G | 4.51 min | $0.1 | 14.27 GB | 0.837 | 2.310 | 52.3% |
| phi-3/full.yaml | Phi-3-mini-4k-instruct | 1 | 512 | 4 | 1xA10G | 6.93 min | $0.2 | 17.01 GB | 0.714 | 2.043 | 69.81% |
| phi-3/lora.yaml | Phi-3-mini-4k-instruct | 1 | 512 | 4 | 1xA10G | 6.46 min | $0.2 | 19.75 GB | 0.707 | 2.028 | 69.70% |
| phi-3/qlora.yaml | Phi-3-mini-4k-instruct | 1 | 512 | 4 | 1xA10G | 7.47 min | $0.2 | 19.13 GB | 0.729 | 2.074 | 68.96% |
| stablelm-base-alpha-3b/full.yaml | stablelm-base-alpha-3b | 1 | 512 | 1 | 4xA10G | 70.13 min | $5.6 | 21.23 GB | 1.513 | 4.540 | 23.2% |
| stablelm-base-alpha-3b/lora.yaml | stablelm-base-alpha-3b | 4 | 512 | 1 | 1xA10G | 13.07 min | $0.4 | 8.58 GB | 1.361 | 3.900 | 25.9% |
| stablelm-base-alpha-3b/lora.yaml | stablelm-base-alpha-3b | 4 | 512 | 1 | 4xA10G | 13.16 min | $1.1 | 8.58 GB | 1.362 | 3.906 | 25.9% |
| stablelm-base-alpha-3b/qlora.yaml | stablelm-base-alpha-3b | 4 | 512 | 1 | 1xA10G | 25.86 min | $0.8 | 5.24 GB | 1.388 | 4.009 | 26.1% |
| tiny-llama/full.yaml | tiny-llama | 1 | 512 | 4 | 1xA10G | 2.58 min | $0.1 | 14.10 GB | 1.088 | 2.968 | 24.6% |
| tiny-llama/full.yaml | tiny-llama | 1 | 512 | 4 | 4xA10G | 2.57 min | $0.2 | 14.10 GB | 1.088 | 2.968 | 24.5% |
| tiny-llama/lora.yaml | tiny-llama | 3 | 512 | 8 | 1xA10G | 8.09 min | $0.2 | 13.50 GB | 1.039 | 2.826 | 25.5% |
| tiny-llama/qlora.yaml | tiny-llama | 3 | 512 | 8 | 1xA10G | 8.70 min | $0.3 | 16.24 GB | 1.056 | 2.874 | 25.3% |

*OOM = Out of memory
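
Each entry in the "Config" column is a file you can pass directly to the finetuning command used throughout this document (the `config_hub/finetune/` prefix follows the pattern of the examples below), for example:

```bash
# Reproduce one of the LoRA rows above by pointing the finetuning
# command at the corresponding config file.
litgpt finetune lora \
  --config config_hub/finetune/mistral-7b/lora.yaml
```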

 

## Extending the context length

If you require a longer sequence length than the one used in a given config file, you can either edit the `max_seq_length` value in the config file or pass an additional argument when running the finetuning command, for example, `--max_seq_length 4096`, to override the value provided in the config file.
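
For instance, to raise the sequence length of the phi-2 LoRA config used in the examples below:

```bash
# Override the max_seq_length set in the config file from the command line
litgpt finetune lora \
  --config config_hub/finetune/phi-2/lora.yaml \
  --max_seq_length 4096
```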

 

## Training on GPUs without bfloat16 support

If you are training on GPUs without bfloat-16 support, you need to change the precision setting to `16-true` (16-bit floating-point precision) or `16-mixed` (16/32-bit mixed precision):

```bash
litgpt finetune lora \
  --config config_hub/finetune/phi-2/lora.yaml \
  --precision 16-true
```

or

```bash
litgpt finetune lora \
  --config config_hub/finetune/phi-2/lora.yaml \
  --precision 16-mixed
```

Note that `16-true` is more compute- and memory-efficient, but it can sometimes lead to training convergence issues. In that case, it's recommended to use `16-mixed`.

 

## Multi-GPU experiments

All runs are single-GPU experiments; use `--devices 4` to utilize more than one GPU:

```bash
litgpt finetune lora \
  --config config_hub/finetune/phi-2/lora.yaml \
  --devices 4
```