
Questions about reproducing resnet18 training result with ms1mv3 dataset #2664

Open
FengMu1995 opened this issue Oct 15, 2024 · 1 comment

@FengMu1995

Compared with your pretrained ms1mv3-resnet18.pt, my loss only converges to around 7 to 9, while yours drops to about 3.
My verification accuracy on LFW, CFP-FP, and AgeDB-30 is also lower than yours.
My training log is below:
Training: 2024-08-23 18:05:28,434-rank_id: 0
Training: 2024-08-23 18:05:35,756-: margin_list (1.0, 0.5, 0.0)
Training: 2024-08-23 18:05:35,757-: network vit_t_dp005_mask0
Training: 2024-08-23 18:05:35,757-: resume False
Training: 2024-08-23 18:05:35,757-: save_all_states False
Training: 2024-08-23 18:05:35,757-: output ./output
Training: 2024-08-23 18:05:35,757-: embedding_size 512
Training: 2024-08-23 18:05:35,757-: sample_rate 1.0
Training: 2024-08-23 18:05:35,757-: interclass_filtering_threshold 0
Training: 2024-08-23 18:05:35,757-: fp16 True
Training: 2024-08-23 18:05:35,757-: batch_size 128
Training: 2024-08-23 18:05:35,757-: optimizer adamw
Training: 2024-08-23 18:05:35,757-: lr 0.001
Training: 2024-08-23 18:05:35,757-: momentum 0.9
Training: 2024-08-23 18:05:35,757-: weight_decay 0.1
Training: 2024-08-23 18:05:35,757-: verbose 2000
Training: 2024-08-23 18:05:35,757-: frequent 10
Training: 2024-08-23 18:05:35,757-: dali False
Training: 2024-08-23 18:05:35,757-: dali_aug False
Training: 2024-08-23 18:05:35,757-: gradient_acc 12
Training: 2024-08-23 18:05:35,757-: seed 2048
Training: 2024-08-23 18:05:35,757-: num_workers 2
Training: 2024-08-23 18:05:35,757-: wandb_key XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Training: 2024-08-23 18:05:35,757-: suffix_run_name None
Training: 2024-08-23 18:05:35,757-: using_wandb False
Training: 2024-08-23 18:05:35,757-: wandb_entity entity
Training: 2024-08-23 18:05:35,757-: wandb_project project
Training: 2024-08-23 18:05:35,757-: wandb_log_all True
Training: 2024-08-23 18:05:35,757-: save_artifacts False
Training: 2024-08-23 18:05:35,757-: wandb_resume False
Training: 2024-08-23 18:05:35,757-: rec /home/diwu/ms1m-retinaface-t1
Training: 2024-08-23 18:05:35,757-: num_classes 93431
Training: 2024-08-23 18:05:35,757-: num_image 5179510
Training: 2024-08-23 18:05:35,757-: num_epoch 40
Training: 2024-08-23 18:05:35,757-: warmup_epoch 4
Training: 2024-08-23 18:05:35,757-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2024-08-23 18:05:35,757-: total_batch_size 256
Training: 2024-08-23 18:05:35,757-: warmup_step 80928
Training: 2024-08-23 18:05:35,757-: total_step 809280
Training: 2024-08-23 18:05:58,809-Reducer buckets have been rebuilt in this iteration.
Training: 2024-08-23 18:06:01,313-Speed 1832.48 samples/sec Loss 41.0577 LearningRate 0.000000 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2024-08-23 18:06:02,709-Speed 1834.00 samples/sec Loss 40.9355 LearningRate 0.000000 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2024-08-23 18:06:04,107-Speed 1831.92 samples/sec Loss 40.8581 LearningRate 0.000000 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2024-08-23 18:06:05,504-Speed 1831.81 samples/sec Loss 41.0100 LearningRate 0.000001 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 65536 Required: 44 hours
....
....
....
Training: 2024-08-26 18:54:07,722-[lfw][202000]XNorm: 22.565637
Training: 2024-08-26 18:54:07,722-[lfw][202000]Accuracy-Flip: 0.99450+-0.00299
Training: 2024-08-26 18:54:07,722-[lfw][202000]Accuracy-Highest: 0.99650
Training: 2024-08-26 18:54:18,676-[cfp_fp][202000]XNorm: 18.734116
Training: 2024-08-26 18:54:18,676-[cfp_fp][202000]Accuracy-Flip: 0.95271+-0.01478
Training: 2024-08-26 18:54:18,677-[cfp_fp][202000]Accuracy-Highest: 0.95671
Training: 2024-08-26 18:54:27,894-[agedb_30][202000]XNorm: 21.819971
Training: 2024-08-26 18:54:27,894-[agedb_30][202000]Accuracy-Flip: 0.96633+-0.00605
Training: 2024-08-26 18:54:27,894-[agedb_30][202000]Accuracy-Highest: 0.96700
Training: 2024-08-26 18:54:29,187-Speed 82.41 samples/sec Loss 8.5546 LearningRate 0.010015 Epoch: 19 Global Step: 202010 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:54:30,479-Speed 1981.47 samples/sec Loss 8.6193 LearningRate 0.010015 Epoch: 19 Global Step: 202020 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:54:31,774-Speed 1976.17 samples/sec Loss 8.1831 LearningRate 0.010014 Epoch: 19 Global Step: 202030 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:54:33,071-Speed 1975.02 samples/sec Loss 7.3021 LearningRate 0.010014 Epoch: 19 Global Step: 202040 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:54:34,368-Speed 1973.67 samples/sec Loss 8.6017 LearningRate 0.010013 Epoch: 19 Global Step: 202050 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:35,665-Speed 1974.46 samples/sec Loss 9.4399 LearningRate 0.010013 Epoch: 19 Global Step: 202060 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:36,963-Speed 1972.38 samples/sec Loss 7.9415 LearningRate 0.010012 Epoch: 19 Global Step: 202070 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:38,261-Speed 1972.60 samples/sec Loss 9.0044 LearningRate 0.010012 Epoch: 19 Global Step: 202080 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:39,561-Speed 1969.95 samples/sec Loss 7.3248 LearningRate 0.010011 Epoch: 19 Global Step: 202090 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:40,861-Speed 1968.29 samples/sec Loss 7.8766 LearningRate 0.010011 Epoch: 19 Global Step: 202100 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:42,162-Speed 1968.74 samples/sec Loss 7.9212 LearningRate 0.010010 Epoch: 19 Global Step: 202110 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:43,461-Speed 1971.19 samples/sec Loss 8.5024 LearningRate 0.010010 Epoch: 19 Global Step: 202120 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:44,762-Speed 1967.31 samples/sec Loss 7.9085 LearningRate 0.010009 Epoch: 19 Global Step: 202130 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:46,064-Speed 1967.08 samples/sec Loss 8.9264 LearningRate 0.010009 Epoch: 19 Global Step: 202140 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:47,362-Speed 1972.59 samples/sec Loss 8.8305 LearningRate 0.010008 Epoch: 19 Global Step: 202150 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:48,663-Speed 1968.19 samples/sec Loss 8.6759 LearningRate 0.010008 Epoch: 19 Global Step: 202160 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:49,961-Speed 1972.03 samples/sec Loss 7.8577 LearningRate 0.010007 Epoch: 19 Global Step: 202170 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:51,262-Speed 1968.46 samples/sec Loss 9.6192 LearningRate 0.010007 Epoch: 19 Global Step: 202180 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:52,562-Speed 1968.97 samples/sec Loss 8.2040 LearningRate 0.010006 Epoch: 19 Global Step: 202190 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:54:53,857-Speed 1977.23 samples/sec Loss 8.2249 LearningRate 0.010006 Epoch: 19 Global Step: 202200 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:54:55,160-Speed 1964.82 samples/sec Loss 7.8687 LearningRate 0.010005 Epoch: 19 Global Step: 202210 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:54:56,459-Speed 1971.58 samples/sec Loss 7.7311 LearningRate 0.010005 Epoch: 19 Global Step: 202220 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:54:57,760-Speed 1968.06 samples/sec Loss 7.2900 LearningRate 0.010004 Epoch: 19 Global Step: 202230 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:54:59,056-Speed 1975.46 samples/sec Loss 8.7321 LearningRate 0.010004 Epoch: 19 Global Step: 202240 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:55:00,354-Speed 1971.76 samples/sec Loss 7.9745 LearningRate 0.010003 Epoch: 19 Global Step: 202250 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:55:01,653-Speed 1972.12 samples/sec Loss 9.1854 LearningRate 0.010003 Epoch: 19 Global Step: 202260 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:55:02,950-Speed 1973.24 samples/sec Loss 7.3085 LearningRate 0.010002 Epoch: 19 Global Step: 202270 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:55:04,247-Speed 1974.55 samples/sec Loss 8.3732 LearningRate 0.010002 Epoch: 19 Global Step: 202280 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:55:05,548-Speed 1967.94 samples/sec Loss 8.4167 LearningRate 0.010001 Epoch: 19 Global Step: 202290 Fp16 Grad Scale: 32768 Required: 8 hours
Training: 2024-08-26 18:55:06,844-Speed 1975.90 samples/sec Loss 8.4624 LearningRate 0.010001 Epoch: 19 Global Step: 202300 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:55:08,155-Speed 1952.32 samples/sec Loss 7.9820 LearningRate 0.010000 Epoch: 19 Global Step: 202310 Fp16 Grad Scale: 65536 Required: 8 hours
Training: 2024-08-26 18:55:09,452-Speed 1975.00 samples/sec Loss 9.1936 LearningRate 0.010000 Epoch: 19 Global Step: 202320 Fp16 Grad Scale: 65536 Required: 8 hours
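For reference, the step counts printed at the top of the log follow directly from the dataset size and the total batch size. A minimal sketch, assuming the logged total_batch_size of 256 comes from 128 per GPU on 2 GPUs (the GPU count is an assumption):

```python
# Sketch: reproduce the step counts reported in the log above.
num_image = 5_179_510        # logged num_image
total_batch_size = 256       # logged total_batch_size (128 per GPU x 2 GPUs, assumed)
num_epoch = 40
warmup_epoch = 4

steps_per_epoch = num_image // total_batch_size   # 20232
warmup_step = steps_per_epoch * warmup_epoch      # 80928, matches the log
total_step = steps_per_epoch * num_epoch          # 809280, matches the log

print(steps_per_epoch, warmup_step, total_step)
```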

@nttstar
Collaborator

nttstar commented Oct 25, 2024

Enlarge the total batch size.
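For context, the total batch size here is the per-GPU batch_size times the number of GPUs (128 x 2 = 256 in the log). A hedged sketch of how it could be enlarged, assuming an arcface_torch-style easydict config with the option names shown in the log (the exact config file and surrounding options are assumptions):

```python
from easydict import EasyDict as edict

# Hypothetical config fragment, not the repo's actual file; only the options
# relevant to the total batch size are shown.
config = edict()
config.network = "r18"     # ResNet-18 backbone, per the issue title
config.batch_size = 256    # per-GPU batch size, doubled from the 128 in the log

# total_batch_size = batch_size x number of GPUs, so it also grows by adding GPUs,
# e.g. launching with torchrun --nproc_per_node=4 would give 256 x 4 = 1024.
```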
