Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

202306_OneFlow_libai_Report_On_master #85

Open
github-actions bot opened this issue Jun 5, 2023 · 5 comments
Open

202306_OneFlow_libai_Report_On_master #85

github-actions bot opened this issue Jun 5, 2023 · 5 comments

Comments

@github-actions
Copy link

github-actions bot commented Jun 5, 2023

202306_OneFlow_libai_Report_On_master

@github-actions
Copy link
Author

github-actions bot commented Jun 5, 2023

NVIDIA GeForce RTX 3090 oneflow@4ac3692d + libai@5dcbc0e36b983616b3124443b8e078e996e00080
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp1_pp1_zerofalse_stage0_mbs16_gbs128_acc8_1n1g 12137 MiB / 15.38 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp1_pp1_zerofalse_stage0_mbs32_gbs32_acc1_1n1g 12321 MiB / 19.32 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp8_pp1_zerofalse_stage0_mbs32_gbs256_acc8_1n8g 9260 MiB / 80.64 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp2_mp1_pp4_zerotrue_stage2_mbs32_gbs512_acc8_1n8g 12106 MiB / 666.24 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp2_mp2_pp2_zerotrue_stage2_mbs32_gbs512_acc8_1n8g 10740 MiB / 233.44 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs32_gbs128_acc1_1n4g 10538 MiB / 182.08 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs32_gbs1024_acc8_1n8g 12218 MiB / 428.08 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs32_gbs128_acc1_1n8g 9664 MiB / 272.32 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp8_mp1_pp1_zerotrue_stage2_mbs32_gbs2048_acc8_1n8g 8172 MiB / 922.64 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp8_mp1_pp1_zerotrue_stage2_mbs32_gbs256_acc1_1n8g 6832 MiB / 467.52 samples/s
libai_bert_large_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp1_pp4_zerofalse_stage0_mbs16_gbs128_acc8_1n4g 12866 MiB / 95.4 samples/s
libai_bert_large_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp1_pp8_zerofalse_stage0_mbs24_gbs384_acc16_1n8g 9976 MiB / 299.84 samples/s
libai_bert_large_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp4_pp1_zerofalse_stage0_mbs32_gbs256_acc8_1n4g 13129 MiB / 18.96 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp1_pp1_zerofalse_stage0_mbs8_gbs64_acc8_1n1g 16284 MiB / 6.27 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp1_pp1_zerofalse_stage0_mbs8_gbs8_acc1_1n1g 14316 MiB / 6.38 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp1_pp4_zerofalse_stage0_mbs12_gbs96_acc8_1n4g 13230 MiB / 78.2 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp8_pp1_zerofalse_stage0_mbs8_gbs64_acc8_1n8g 8374 MiB / 29.84 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp2_mp1_pp4_zerotrue_stage2_mbs8_gbs128_acc8_1n8g 11716 MiB / 266.56 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp2_mp2_pp2_zerotrue_stage2_mbs8_gbs128_acc8_1n8g 9962 MiB / 98.24 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs8_gbs256_acc8_1n4g 11420 MiB / 84.92 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs8_gbs32_acc1_1n4g 9932 MiB / 63.56 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs8_gbs256_acc8_1n8g 7936 MiB / 143.92 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs8_gbs32_acc1_1n8g 8264 MiB / 110.08 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp8_mp1_pp1_zerotrue_stage2_mbs8_gbs512_acc8_1n8g 7850 MiB / 285.52 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp8_mp1_pp1_zerotrue_stage2_mbs8_gbs64_acc1_1n8g 9496 MiB / 167.36 samples/s
libai_gpt2_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp1_pp8_zerofalse_stage0_mbs6_gbs96_acc16_1n8g 11592 MiB / 158.16 samples/s
libai_gpt2_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp4_pp1_zerofalse_stage0_mbs8_gbs64_acc8_1n4g 0 MiB / 0 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp1_zerotrue_stage2_mbs128_gbs1024_acc8_1n1g 11238 MiB / 227.4 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp1_zerotrue_stage2_mbs256_gbs256_acc1_1n1g 11462 MiB / 338.49 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp4_zerotrue_stage2_mbs128_gbs1024_acc8_1n4g 17760 MiB / 773.92 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp4_zerotrue_stage2_mbs256_gbs256_acc1_1n4g 10908 MiB / 613.8 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp2_pp2_zerotrue_stage2_mbs128_gbs1024_acc8_1n4g 14862 MiB / 754.16 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp2_pp2_zerotrue_stage2_mbs256_gbs256_acc1_1n4g 10404 MiB / 603.4 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp4_pp1_zerotrue_stage2_mbs128_gbs1024_acc8_1n4g 10812 MiB / 427.56 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp4_pp1_zerotrue_stage2_mbs256_gbs256_acc1_1n4g 10256 MiB / 623.92 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp1_pp2_zerotrue_stage2_mbs128_gbs256_acc1_1n4g 8696 MiB / 1590.32 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp1_pp2_zerotrue_stage2_mbs64_gbs1024_acc8_1n4g 11356 MiB / 1734.72 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp1_zerotrue_stage2_mbs128_gbs256_acc1_1n4g 8512 MiB / 914.72 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp1_zerotrue_stage2_mbs64_gbs1024_acc8_1n4g 9070 MiB / 948.36 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp2_zerotrue_stage2_mbs128_gbs2048_acc8_1n8g 11690 MiB / 1710.24 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp2_zerotrue_stage2_mbs256_gbs512_acc1_1n8g 14156 MiB / 1744.56 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs32_gbs1024_acc8_1n4g 7830 MiB / 2676.24 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs64_gbs256_acc1_1n4g 7636 MiB / 1701.56 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp1_pp2_zerotrue_stage2_mbs128_gbs512_acc1_1n8g 4531 MiB / 4367.76 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp1_pp2_zerotrue_stage2_mbs64_gbs2048_acc8_1n8g 14358 MiB / 3461.28 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs128_gbs512_acc1_1n8g 12216 MiB / 3356.0 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs64_gbs2048_acc8_1n8g 14308 MiB / 3196.96 samples/s

@Tendo33
Copy link

Tendo33 commented Jun 5, 2023

  1. 吞吐量为总吞吐,乘了卡数
  2. 显存不准,28号机周天有 zhouhongjun 的任务

@github-actions
Copy link
Author

github-actions bot commented Jun 18, 2023

NVIDIA GeForce RTX 3090 oneflow@f72ebf6 + libai@5dcbc0e36b983616b3124443b8e078e996e00080
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp1_pp1_zerofalse_stage0_mbs16_gbs128_acc8_1n1g 6863 MiB / 31.39 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp1_pp1_zerofalse_stage0_mbs32_gbs32_acc1_1n1g 7047 MiB / 32.51 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp8_pp1_zerofalse_stage0_mbs32_gbs256_acc8_1n8g 2994 MiB / 15.37 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp2_mp1_pp4_zerotrue_stage2_mbs32_gbs512_acc8_1n8g 6928 MiB / 172.04 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp2_mp2_pp2_zerotrue_stage2_mbs32_gbs512_acc8_1n8g 5560 MiB / 58.37 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs32_gbs128_acc1_1n4g 4272 MiB / 97.44 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs32_gbs1024_acc8_1n8g 5254 MiB / 71.56 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs32_gbs128_acc1_1n8g 4486 MiB / 64.57 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp8_mp1_pp1_zerotrue_stage2_mbs32_gbs2048_acc8_1n8g 5176 MiB / 225.59 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp8_mp1_pp1_zerotrue_stage2_mbs32_gbs256_acc1_1n8g 3836 MiB / 185.76 samples/s
libai_bert_large_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp1_pp4_zerofalse_stage0_mbs16_gbs128_acc8_1n4g 6600 MiB / 47.55 samples/s
libai_bert_large_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp1_pp8_zerofalse_stage0_mbs24_gbs384_acc16_1n8g 6980 MiB / 71.35 samples/s
libai_bert_large_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp4_pp1_zerofalse_stage0_mbs32_gbs256_acc8_1n4g 6264 MiB / 8.35 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp1_pp1_zerofalse_stage0_mbs8_gbs64_acc8_1n1g 8143 MiB / 13.13 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp1_pp1_zerofalse_stage0_mbs8_gbs8_acc1_1n1g 7159 MiB / 12.82 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp1_pp4_zerofalse_stage0_mbs12_gbs96_acc8_1n4g 7520 MiB / 39.76 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp8_pp1_zerofalse_stage0_mbs8_gbs64_acc8_1n8g 2664 MiB / 7.55 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp2_mp1_pp4_zerotrue_stage2_mbs8_gbs128_acc8_1n8g 6530 MiB / 67.35 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp2_mp2_pp2_zerotrue_stage2_mbs8_gbs128_acc8_1n8g 4778 MiB / 25.81 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs8_gbs256_acc8_1n4g 5710 MiB / 42.57 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs8_gbs32_acc1_1n4g 4222 MiB / 32.92 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs8_gbs256_acc8_1n8g 5274 MiB / 32.19 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs8_gbs32_acc1_1n8g 4478 MiB / 27.3 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp8_mp1_pp1_zerotrue_stage2_mbs8_gbs512_acc8_1n8g 5186 MiB / 82.77 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp8_mp1_pp1_zerotrue_stage2_mbs8_gbs64_acc1_1n8g 3786 MiB / 63.35 samples/s
libai_gpt2_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp1_pp8_zerofalse_stage0_mbs6_gbs96_acc16_1n8g 5882 MiB / 32.63 samples/s
libai_gpt2_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp4_pp1_zerofalse_stage0_mbs8_gbs64_acc8_1n4g 5710 MiB / 4.04 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp1_zerotrue_stage2_mbs128_gbs1024_acc8_1n1g 5998 MiB / 328.21 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp1_zerotrue_stage2_mbs256_gbs256_acc1_1n1g 5460 MiB / 340.34 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp4_zerotrue_stage2_mbs128_gbs1024_acc8_1n4g 10830 MiB / 267.14 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp4_zerotrue_stage2_mbs256_gbs256_acc1_1n4g 5338 MiB / 289.5 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp2_pp2_zerotrue_stage2_mbs128_gbs1024_acc8_1n4g 9298 MiB / 268.22 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp2_pp2_zerotrue_stage2_mbs256_gbs256_acc1_1n4g 4832 MiB / 249.16 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp4_pp1_zerotrue_stage2_mbs128_gbs1024_acc8_1n4g 5574 MiB / 191.08 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp4_pp1_zerotrue_stage2_mbs256_gbs256_acc1_1n4g 4256 MiB / 226.35 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp1_pp2_zerotrue_stage2_mbs128_gbs256_acc1_1n4g 3124 MiB / 576.92 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp1_pp2_zerotrue_stage2_mbs64_gbs1024_acc8_1n4g 5354 MiB / 556.5 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp1_zerotrue_stage2_mbs128_gbs256_acc1_1n4g 2940 MiB / 488.72 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp1_zerotrue_stage2_mbs64_gbs1024_acc8_1n4g 3508 MiB / 407.77 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp2_zerotrue_stage2_mbs128_gbs2048_acc8_1n8g 9338 MiB / 314.47 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp2_zerotrue_stage2_mbs256_gbs512_acc1_1n8g 4858 MiB / 324.05 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs32_gbs1024_acc8_1n4g 2356 MiB / 972.04 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs64_gbs256_acc1_1n4g 2066 MiB / 957.93 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp1_pp2_zerotrue_stage2_mbs128_gbs512_acc1_1n8g 3046 MiB / 707.28 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp1_pp2_zerotrue_stage2_mbs64_gbs2048_acc8_1n8g 5058 MiB / 606.97 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs128_gbs512_acc1_1n8g 21359 MiB / 626.34 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs64_gbs2048_acc8_1n8g 3480 MiB / 582.67 samples/s

@Tendo33
Copy link

Tendo33 commented Jun 29, 2023

NVIDIA GeForce RTX 3090 oneflow@fb423fa + libai@60404d090c97902b04373fdf1f273bd2774db515
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp1_pp1_zerofalse_stage0_mbs16_gbs128_acc8_1n1g 6863 MiB / 31.33 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp1_pp1_zerofalse_stage0_mbs32_gbs32_acc1_1n1g 7047 MiB / 32.49 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp8_pp1_zerofalse_stage0_mbs32_gbs256_acc8_1n8g 2994 MiB / 15.45 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp2_mp1_pp4_zerotrue_stage2_mbs32_gbs512_acc8_1n8g 6928 MiB / 171.78 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp2_mp2_pp2_zerotrue_stage2_mbs32_gbs512_acc8_1n8g 5560 MiB / 57.92samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs32_gbs128_acc1_1n4g 4272 MiB / 97.43 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs32_gbs1024_acc8_1n8g 5254 MiB / 71.67 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs32_gbs128_acc1_1n8g 4486 MiB / 64.58 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp8_mp1_pp1_zerotrue_stage2_mbs32_gbs2048_acc8_1n8g 5176 MiB / 225.73 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp8_mp1_pp1_zerotrue_stage2_mbs32_gbs256_acc1_1n8g 3836 MiB / 186.02 samples/s
libai_bert_large_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp1_pp4_zerofalse_stage0_mbs16_gbs128_acc8_1n4g 6600 MiB / 47.53 samples/s
libai_bert_large_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp1_pp8_zerofalse_stage0_mbs24_gbs384_acc16_1n8g 6980 MiB / 70.84 samples/s
libai_bert_large_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp4_pp1_zerofalse_stage0_mbs32_gbs256_acc8_1n4g 6264 MiB / 8.33 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp8_pp1_zerofalse_stage0_mbs8_gbs64_acc8_1n8g 2664 MiB / 7.56 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp2_mp1_pp4_zerotrue_stage2_mbs8_gbs128_acc8_1n8g 6530 MiB / 67.26 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp2_mp2_pp2_zerotrue_stage2_mbs8_gbs128_acc8_1n8g 4778 MiB / 25.79 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs8_gbs256_acc8_1n8g 5274 MiB / 32.2 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs8_gbs32_acc1_1n8g 4478 MiB / 27.34 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp8_mp1_pp1_zerotrue_stage2_mbs8_gbs512_acc8_1n8g 5186 MiB / 82.8 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp8_mp1_pp1_zerotrue_stage2_mbs8_gbs64_acc1_1n8g 3786 MiB / 63.22 samples/s
libai_gpt2_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp1_pp8_zerofalse_stage0_mbs6_gbs96_acc16_1n8g 5882 MiB / 32.62 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp1_zerotrue_stage2_mbs128_gbs1024_acc8_1n1g 5998 MiB / 333.52 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp1_zerotrue_stage2_mbs256_gbs256_acc1_1n1g 5460 MiB / 344.1 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp4_zerotrue_stage2_mbs128_gbs1024_acc8_1n4g 10830 MiB / 251.93 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp4_zerotrue_stage2_mbs256_gbs256_acc1_1n4g 5338 MiB / 253.93 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp2_pp2_zerotrue_stage2_mbs128_gbs1024_acc8_1n4g 9298 MiB / 276.79 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp2_pp2_zerotrue_stage2_mbs256_gbs256_acc1_1n4g 4832 MiB / 256.39 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp4_pp1_zerotrue_stage2_mbs128_gbs1024_acc8_1n4g 5574 MiB / 194.23 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp4_pp1_zerotrue_stage2_mbs256_gbs256_acc1_1n4g 4256 MiB / 226.01 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp1_pp2_zerotrue_stage2_mbs128_gbs256_acc1_1n4g 3124 MiB / 584.21 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp1_pp2_zerotrue_stage2_mbs64_gbs1024_acc8_1n4g 5354 MiB / 559.01 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp1_zerotrue_stage2_mbs128_gbs256_acc1_1n4g 2940 MiB / 486.14 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp1_zerotrue_stage2_mbs64_gbs1024_acc8_1n4g 3508 MiB / 418.05 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp2_zerotrue_stage2_mbs128_gbs2048_acc8_1n8g 9338 MiB / 299.56 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp2_zerotrue_stage2_mbs256_gbs512_acc1_1n8g 4858 MiB / 338.16 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs32_gbs1024_acc8_1n4g 2356 MiB / 992.9 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs64_gbs256_acc1_1n4g 2066 MiB / 1012.73 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp1_pp2_zerotrue_stage2_mbs128_gbs512_acc1_1n8g 3046 MiB / 705.76 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp1_pp2_zerotrue_stage2_mbs64_gbs2048_acc8_1n8g 5058 MiB / 663.3 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs128_gbs512_acc1_1n8g 2918 MiB / 618.24 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs64_gbs2048_acc8_1n8g 3480 MiB / 589.07 samples/s

@Oneflow-Inc Oneflow-Inc deleted a comment from github-actions bot Jul 1, 2023
@Tendo33
Copy link

Tendo33 commented Jul 6, 2023

NVIDIA GeForce RTX 3090 oneflow@ae52678 + libai@60404d090c97902b04373fdf1f273bd2774db515
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp1_pp1_zerofalse_stage0_mbs16_gbs128_acc8_1n1g 6863 MiB / 31.27 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp1_pp1_zerofalse_stage0_mbs32_gbs32_acc1_1n1g 7047 MiB / 32.48 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp8_pp1_zerofalse_stage0_mbs32_gbs256_acc8_1n8g 2994 MiB / 15.45 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp2_mp1_pp4_zerotrue_stage2_mbs32_gbs512_acc8_1n8g 6928 MiB / 172.2 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp2_mp2_pp2_zerotrue_stage2_mbs32_gbs512_acc8_1n8g 5576 MiB / 57.88 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs32_gbs128_acc1_1n4g 4272 MiB / 96.93 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs32_gbs1024_acc8_1n8g 5254 MiB / 71.63 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs32_gbs128_acc1_1n8g 4486 MiB / 64.46 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp8_mp1_pp1_zerotrue_stage2_mbs32_gbs2048_acc8_1n8g 5176 MiB / 225.52 samples/s
libai_bert_large_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp8_mp1_pp1_zerotrue_stage2_mbs32_gbs256_acc1_1n8g 3836 MiB / 186.07 samples/s
libai_bert_large_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp1_pp4_zerofalse_stage0_mbs16_gbs128_acc8_1n4g 6600 MiB / 47.98 samples/s
libai_bert_large_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp1_pp8_zerofalse_stage0_mbs24_gbs384_acc16_1n8g 6980 MiB / 74.12 samples/s
libai_bert_large_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp4_pp1_zerofalse_stage0_mbs32_gbs256_acc8_1n4g 6264 MiB / 8.34 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp1_pp1_zerofalse_stage0_mbs8_gbs64_acc8_1n1g 8143 MiB / 13.11 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp1_pp1_zerofalse_stage0_mbs8_gbs8_acc1_1n1g 7159 MiB / 12.85 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp1_pp4_zerofalse_stage0_mbs12_gbs96_acc8_1n4g 7520 MiB / 40.35 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp1_mp8_pp1_zerofalse_stage0_mbs8_gbs64_acc8_1n8g 2664 MiB / 7.54 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp2_mp1_pp4_zerotrue_stage2_mbs8_gbs128_acc8_1n8g 6530 MiB / 67.79 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp2_mp2_pp2_zerotrue_stage2_mbs8_gbs128_acc8_1n8g 4778 MiB / 25.26 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs8_gbs256_acc8_1n4g 5710 MiB / 42.6 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs8_gbs32_acc1_1n4g 4222 MiB / 33.07 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs8_gbs256_acc8_1n8g 6999 MiB / 32.25 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs8_gbs32_acc1_1n8g 6203 MiB / 27.37 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp8_mp1_pp1_zerotrue_stage2_mbs8_gbs512_acc8_1n8g 5186 MiB / 83.01 samples/s
libai_gpt2_pretrain_graph_nl24_nah16_hs1024_fp16_actrue_dp8_mp1_pp1_zerotrue_stage2_mbs8_gbs64_acc1_1n8g 3786 MiB / 63.25 samples/s
libai_gpt2_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp1_pp8_zerofalse_stage0_mbs6_gbs96_acc16_1n8g 5882 MiB / 34.24 samples/s
libai_gpt2_pretrain_graph_nl48_nah16_hs1024_fp16_actrue_dp1_mp4_pp1_zerofalse_stage0_mbs8_gbs64_acc8_1n4g 5710 MiB / 4.06 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp1_zerotrue_stage2_mbs128_gbs1024_acc8_1n1g 5998 MiB / 336.53 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp1_zerotrue_stage2_mbs256_gbs256_acc1_1n1g 5460 MiB / 350.41 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp4_zerotrue_stage2_mbs128_gbs1024_acc8_1n4g 10830 MiB / 250.29 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp4_zerotrue_stage2_mbs256_gbs256_acc1_1n4g 5338 MiB / 268.82 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp2_pp2_zerotrue_stage2_mbs128_gbs1024_acc8_1n4g 9298 MiB / 227.69 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp2_pp2_zerotrue_stage2_mbs256_gbs256_acc1_1n4g 4832 MiB / 250.33 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp4_pp1_zerotrue_stage2_mbs128_gbs1024_acc8_1n4g 5574 MiB / 192.65 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp4_pp1_zerotrue_stage2_mbs256_gbs256_acc1_1n4g 4256 MiB / 225.54 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp1_pp2_zerotrue_stage2_mbs128_gbs256_acc1_1n4g 3124 MiB / 475.87 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp1_pp2_zerotrue_stage2_mbs64_gbs1024_acc8_1n4g 5354 MiB / 493.47 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp1_zerotrue_stage2_mbs128_gbs256_acc1_1n4g 2940 MiB / 484.22 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp1_zerotrue_stage2_mbs64_gbs1024_acc8_1n4g 3508 MiB / 414.16 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp2_zerotrue_stage2_mbs128_gbs2048_acc8_1n8g 9338 MiB / 285.32 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp2_zerotrue_stage2_mbs256_gbs512_acc1_1n8g 4858 MiB / 312.68 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs32_gbs1024_acc8_1n4g 2356 MiB / 863.62 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs64_gbs256_acc1_1n4g 2066 MiB / 1021.74 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp1_pp2_zerotrue_stage2_mbs128_gbs512_acc1_1n8g 3046 MiB / 657.89 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp1_pp2_zerotrue_stage2_mbs64_gbs2048_acc8_1n8g 5058 MiB / 570.16 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs128_gbs512_acc1_1n8g 2918 MiB / 536.49 samples/s
libai_swin_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp2_pp1_zerotrue_stage2_mbs64_gbs2048_acc8_1n8g 3480 MiB / 515.8 samples/s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant