Compared with your pretrained ms1mv3-resnet18.pt, my loss only converges to around 7–9, while yours gets down to about 3. Verification accuracy on LFW, CFP-FP, and AgeDB-30 is also lower than yours. My training log is below:

Training: 2024-08-23 18:05:28,434-rank_id: 0
Training: 2024-08-23 18:05:35,756-: margin_list (1.0, 0.5, 0.0)
Training: 2024-08-23 18:05:35,757-: network vit_t_dp005_mask0
Training: 2024-08-23 18:05:35,757-: resume False
Training: 2024-08-23 18:05:35,757-: save_all_states False
Training: 2024-08-23 18:05:35,757-: output ./output
Training: 2024-08-23 18:05:35,757-: embedding_size 512
Training: 2024-08-23 18:05:35,757-: sample_rate 1.0
Training: 2024-08-23 18:05:35,757-: interclass_filtering_threshold 0
Training: 2024-08-23 18:05:35,757-: fp16 True
Training: 2024-08-23 18:05:35,757-: batch_size 128
Training: 2024-08-23 18:05:35,757-: optimizer adamw
Training: 2024-08-23 18:05:35,757-: lr 0.001
Training: 2024-08-23 18:05:35,757-: momentum 0.9
Training: 2024-08-23 18:05:35,757-: weight_decay 0.1
Training: 2024-08-23 18:05:35,757-: verbose 2000
Training: 2024-08-23 18:05:35,757-: frequent 10
Training: 2024-08-23 18:05:35,757-: dali False
Training: 2024-08-23 18:05:35,757-: dali_aug False
Training: 2024-08-23 18:05:35,757-: gradient_acc 12
Training: 2024-08-23 18:05:35,757-: seed 2048
Training: 2024-08-23 18:05:35,757-: num_workers 2
Training: 2024-08-23 18:05:35,757-: wandb_key XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Training: 2024-08-23 18:05:35,757-: suffix_run_name None
Training: 2024-08-23 18:05:35,757-: using_wandb False
Training: 2024-08-23 18:05:35,757-: wandb_entity entity
Training: 2024-08-23 18:05:35,757-: wandb_project project
Training: 2024-08-23 18:05:35,757-: wandb_log_all True
Training: 2024-08-23 18:05:35,757-: save_artifacts False
Training: 2024-08-23 18:05:35,757-: wandb_resume False
Training: 2024-08-23 18:05:35,757-: rec /home/diwu/ms1m-retinaface-t1
Training: 2024-08-23 18:05:35,757-: num_classes 93431
Training: 2024-08-23 18:05:35,757-: num_image 5179510
Training: 2024-08-23 18:05:35,757-: num_epoch 40
Training: 2024-08-23 18:05:35,757-: warmup_epoch 4
Training: 2024-08-23 18:05:35,757-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2024-08-23 18:05:35,757-: total_batch_size 256
Training: 2024-08-23 18:05:35,757-: warmup_step 80928
Training: 2024-08-23 18:05:35,757-: total_step 809280
Training: 2024-08-23 18:05:58,809-Reducer buckets have been rebuilt in this iteration.
Training: 2024-08-23 18:06:01,313-Speed 1832.48 samples/sec Loss 41.0577 LearningRate 0.000000 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2024-08-23 18:06:02,709-Speed 1834.00 samples/sec Loss 40.9355 LearningRate 0.000000 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2024-08-23 18:06:04,107-Speed 1831.92 samples/sec Loss 40.8581 LearningRate 0.000000 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2024-08-23 18:06:05,504-Speed 1831.81 samples/sec Loss 41.0100 LearningRate 0.000001 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 65536 Required: 44 hours
....
....
....
Training: 2024-08-26 18:54:07,722-[lfw][202000]XNorm: 22.565637
Training: 2024-08-26 18:54:07,722-[lfw][202000]Accuracy-Flip: 0.99450+-0.00299
Training: 2024-08-26 18:54:07,722-[lfw][202000]Accuracy-Highest: 0.99650
Training: 2024-08-26 18:54:18,676-[cfp_fp][202000]XNorm: 18.734116
Training: 2024-08-26 18:54:18,676-[cfp_fp][202000]Accuracy-Flip: 0.95271+-0.01478
Training: 2024-08-26 18:54:18,677-[cfp_fp][202000]Accuracy-Highest: 0.95671
Training: 2024-08-26 18:54:27,894-[agedb_30][202000]XNorm: 21.819971
Training: 2024-08-26 18:54:27,894-[agedb_30][202000]Accuracy-Flip: 0.96633+-0.00605
Training: 2024-08-26 18:54:27,894-[agedb_30][202000]Accuracy-Highest: 0.96700
Training: 2024-08-26 18:54:29,187-Speed 82.41 samples/sec Loss 8.5546 LearningRate 0.010015 Epoch: 19 Global Step: 202010 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:54:30,479-Speed 1981.47 samples/sec Loss 8.6193 LearningRate 0.010015 Epoch: 19 Global Step: 202020 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:54:31,774-Speed 1976.17 samples/sec Loss 8.1831 LearningRate 0.010014 Epoch: 19 Global Step: 202030 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:54:33,071-Speed 1975.02 samples/sec Loss 7.3021 LearningRate 0.010014 Epoch: 19 Global Step: 202040 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:54:34,368-Speed 1973.67 samples/sec Loss 8.6017 LearningRate 0.010013 Epoch: 19 Global Step: 202050 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:35,665-Speed 1974.46 samples/sec Loss 9.4399 LearningRate 0.010013 Epoch: 19 Global Step: 202060 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:36,963-Speed 1972.38 samples/sec Loss 7.9415 LearningRate 0.010012 Epoch: 19 Global Step: 202070 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:38,261-Speed 1972.60 samples/sec Loss 9.0044 LearningRate 0.010012 Epoch: 19 Global Step: 202080 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:39,561-Speed 1969.95 samples/sec Loss 7.3248 LearningRate 0.010011 Epoch: 19 Global Step: 202090 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:40,861-Speed 1968.29 samples/sec Loss 7.8766 LearningRate 0.010011 Epoch: 19 Global Step: 202100 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:42,162-Speed 1968.74 samples/sec Loss 7.9212 LearningRate 0.010010 Epoch: 19 Global Step: 202110 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:43,461-Speed 1971.19 samples/sec Loss 8.5024 LearningRate 0.010010 Epoch: 19 Global Step: 202120 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:44,762-Speed 1967.31 samples/sec Loss 7.9085 LearningRate 0.010009 Epoch: 19 Global Step: 202130 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:46,064-Speed 1967.08 samples/sec Loss 8.9264 LearningRate 0.010009 Epoch: 19 Global Step: 202140 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:47,362-Speed 1972.59 samples/sec Loss 8.8305 LearningRate 0.010008 Epoch: 19 Global Step: 202150 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:48,663-Speed 1968.19 samples/sec Loss 8.6759 LearningRate 0.010008 Epoch: 19 Global Step: 202160 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:49,961-Speed 1972.03 samples/sec Loss 7.8577 LearningRate 0.010007 Epoch: 19 Global Step: 202170 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:51,262-Speed 1968.46 samples/sec Loss 9.6192 LearningRate 0.010007 Epoch: 19 Global Step: 202180 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:52,562-Speed 1968.97 samples/sec Loss 8.2040 LearningRate 0.010006 Epoch: 19 Global Step: 202190 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:53,857-Speed 1977.23 samples/sec Loss 8.2249 LearningRate 0.010006 Epoch: 19 Global Step: 202200 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:54:55,160-Speed 1964.82 samples/sec Loss 7.8687 LearningRate 0.010005 Epoch: 19 Global Step: 202210 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:54:56,459-Speed 1971.58 samples/sec Loss 7.7311 LearningRate 0.010005 Epoch: 19 Global Step: 202220 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:54:57,760-Speed 1968.06 samples/sec Loss 7.2900 LearningRate 0.010004 Epoch: 19 Global Step: 202230 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:54:59,056-Speed 1975.46 samples/sec Loss 8.7321 LearningRate 0.010004 Epoch: 19 Global Step: 202240 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:55:00,354-Speed 1971.76 samples/sec Loss 7.9745 LearningRate 0.010003 Epoch: 19 Global Step: 202250 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:55:01,653-Speed 1972.12 samples/sec Loss 9.1854 LearningRate 0.010003 Epoch: 19 Global Step: 202260 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:55:02,950-Speed 1973.24 samples/sec Loss 7.3085 LearningRate 0.010002 Epoch: 19 Global Step: 202270 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:55:04,247-Speed 1974.55 samples/sec Loss 8.3732 LearningRate 0.010002 Epoch: 19 Global Step: 202280 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:55:05,548-Speed 1967.94 samples/sec Loss 8.4167 LearningRate 0.010001 Epoch: 19 Global Step: 202290 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:55:06,844-Speed 1975.90 samples/sec Loss 8.4624 LearningRate 0.010001 Epoch: 19 Global Step: 202300 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:55:08,155-Speed 1952.32 samples/sec Loss 7.9820 LearningRate 0.010000 Epoch: 19 Global Step: 202310 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:55:09,452-Speed 1975.00 samples/sec Loss 9.1936 LearningRate 0.010000 Epoch: 19 Global Step: 202320 Fp16 Grad Scale: 65536 Required: 8 hours
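For reference, the step counts reported in this log are self-consistent with the config values; a quick sanity check (a sketch, assuming steps per epoch are floored, which the numbers above suggest):

```python
# Sanity-check the step counts printed in the training log.
num_image = 5_179_510
total_batch_size = 256  # batch_size 128 per GPU, so this implies 2 GPUs
num_epoch, warmup_epoch = 40, 4

steps_per_epoch = num_image // total_batch_size  # 20232
assert steps_per_epoch * num_epoch == 809_280    # matches total_step
assert steps_per_epoch * warmup_epoch == 80_928  # matches warmup_step
```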
Enlarge the total batch size.
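For context, in the arcface_torch-style configs the total batch size is the per-GPU batch_size times the number of GPUs, and gradient_acc further multiplies the effective batch per optimizer step. A minimal sketch of how one might enlarge it (illustrative values, assuming the repo's edict-based config files; not an official recipe):

```python
from easydict import EasyDict as edict

config = edict()
# Raise the per-GPU batch if memory allows; total_batch_size = batch_size * num_gpus.
config.batch_size = 256
# Gradient accumulation multiplies the effective batch per optimizer step.
config.gradient_acc = 12
# Launching on more GPUs (e.g. torchrun --nproc_per_node=8) also raises
# total_batch_size without increasing per-GPU memory use.
config.lr = 0.001  # large-batch ViT/AdamW recipes often scale lr with batch size
```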