
Inconsistent experiment reproduction results #7

Open
wljcode opened this issue Dec 14, 2021 · 17 comments

@wljcode

wljcode commented Dec 14, 2021

Hello author, we downloaded your code and re-ran the proposed VAC for 50 epochs (without BN), and the best result was only a 35.1% word error rate. In addition, we adjusted the loss weights in the code and ran the baseline (without BN), and found those results also differ considerably from the ones in the paper. Could this be a code version mismatch, or is our training time too short?

@wljcode
Author

wljcode commented Dec 14, 2021

The first file is the log from training with the VAC algorithm:
log.txt
The second is the log from the baseline:
log.txt

@ycmin95
Collaborator

ycmin95 commented Dec 14, 2021

@wljcode
Thanks for your attention to our work. It seems you used batch size = 1, which may affect the robustness of the model. Also, the learning rate does not appear to decay during training; this may be caused by a bug in checkpoint loading, which I will check later.

Relevant logs are uploaded for comparison.

baseline.txt
baseline_bn.txt
baseline_VAC.txt

@ycmin95
Collaborator

ycmin95 commented Jan 6, 2022

@wljcode
Have you successfully reproduced the experimental results? I checked the relevant logs and found that you used load_weights to resume training rather than load_checkpoints; the former only loads the model weights, while the latter restores all training-related state as well. You should use load_checkpoints to resume training.
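
For anyone hitting the same issue, here is a minimal PyTorch sketch of the difference, assuming standard state_dict-based checkpointing (illustrative only; the repo's load_weights / load_checkpoints helpers may differ in detail):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40, 60])

# A full checkpoint stores the weights plus all training-related state.
torch.save({
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scheduler": scheduler.state_dict(),  # the LR-decay progress lives here
    "epoch": 10,
}, "checkpoint.pt")

state = torch.load("checkpoint.pt")

# load_weights-style resume: parameters only. The optimizer and LR scheduler
# restart from scratch, so the learning rate may never reach its decay
# milestones within the remaining epochs.
model.load_state_dict(state["model"])

# load_checkpoints-style resume: restore everything, so training continues
# exactly where it stopped.
optimizer.load_state_dict(state["optimizer"])
scheduler.load_state_dict(state["scheduler"])
start_epoch = state["epoch"] + 1
```

This also matches the symptom above: resuming with weights only resets the scheduler, which would make the learning rate appear to never decay.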

@wljcode
Author

wljcode commented Jan 6, 2022

Thank you for your reply. Due to GPU memory limitations, we have not continued the reproduction experiments recently. We will resume the work once the equipment is ready!

@sunke123

Hi @ycmin95, thanks for your great work.
I tried to reproduce the work recently. My final result is 0.4% worse than yours.
Here is my training log.
log.txt
After the 70th epoch, the performance stops improving the way yours does.
Besides, I found "label_smoothing = 0.1" in your log, but it is not in the released code.
Could you provide some advice?

@ycmin95
Collaborator

ycmin95 commented Apr 24, 2022

Hi, @sunke123, thanks for your attention to our work.
We will explain this performance gap in our next update, expected in about two weeks, which achieves better performance (about 20% WER) with fewer training epochs. You can continue your experiments on this codebase; the update will not change the network structure or the training process.

The label_smoothing parameter was adopted in an early experiment on iterative training, and I forgot to delete it; I will correct this in the next update.
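
For context, label smoothing softens the one-hot training targets so the model is not pushed toward overconfident predictions. A generic PyTorch sketch of the standard technique (an illustration only, not the removed parameter's implementation in this repo):

```python
import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits, targets, smoothing=0.1):
    """Cross-entropy with label smoothing: the target distribution puts
    1 - smoothing on the true class and spreads the rest uniformly."""
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(dim=-1, index=targets.unsqueeze(-1)).squeeze(-1)
    uniform = -log_probs.mean(dim=-1)  # -(1/K) * sum_k log p_k
    return ((1.0 - smoothing) * nll + smoothing * uniform).mean()

logits = torch.randn(8, 1000)            # toy batch of logits over an arbitrary vocabulary
targets = torch.randint(0, 1000, (8,))
loss = smoothed_cross_entropy(logits, targets, smoothing=0.1)
```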

@sunke123

@ycmin95
Cooooool!
Thanks for your reply.
Looking forward to that~

@ycmin95
Collaborator

ycmin95 commented May 14, 2022

Hi, @sunke123,
the code has been updated~

@herochen7372

Hello, I downloaded the code and retrained it, but after several epochs the DEV WER is still 100%. I set the batch size to 1 and the learning rate to 0.000010. Could you give me some advice? Thanks.

@ycmin95
Collaborator

ycmin95 commented Jul 13, 2022

@herochen7372
You can first check whether the evaluation script runs as expected with the provided pretrained model, and then check whether the loss decreases as training progresses.
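
As a concrete illustration of the second check, a generic overfit-one-batch sanity test for a CTC-trained recognizer (all names and shapes below are toy assumptions, not taken from this codebase):

```python
import torch
import torch.nn as nn

# Toy stand-in for the recognizer: verify the CTC loss can decrease at all
# on a single synthetic batch before debugging full training runs.
torch.manual_seed(0)
model = nn.LSTM(input_size=64, hidden_size=128)   # input shape (T, N, feat)
head = nn.Linear(128, 100)                        # 100-class toy vocabulary, blank = 0
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(head.parameters()), lr=1e-3)
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

frames = torch.randn(50, 2, 64)                   # T=50 frames, batch of 2
targets = torch.randint(1, 100, (2, 10))          # labels only, no blanks
in_lens = torch.full((2,), 50, dtype=torch.long)
tgt_lens = torch.full((2,), 10, dtype=torch.long)

for step in range(200):
    out, _ = model(frames)
    log_probs = head(out).log_softmax(-1)         # (T, N, C) for CTCLoss
    loss = ctc(log_probs, targets, in_lens, tgt_lens)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 50 == 0:
        print(step, loss.item())                  # should fall steadily
```

If the loss falls on this toy batch but stays flat in the real run, the problem is more likely in the data pipeline or configuration than in the model itself.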

@herochen7372

@ycmin95
Thanks for your reply.

@kido1412y2y

@wljcode Have you successfully reproduced the experimental results? I checked the relevant logs and found that you used load_weights to resume training rather than load_checkpoints; the former only loads the model weights, while the latter restores all training-related state as well. You should use load_checkpoints to resume training.
@ycmin95
Hello author, I have encountered the same problem. Can you provide more detailed information on how to solve it? Sorry, I didn't understand the method described here. Thank you very much. I only have a 3060 GPU, so my batch size = 1.

I noticed in your log that there are Dev WER and Test WER for each training epoch, but mine only shows Dev WER.

Looking forward to your help.
dev.txt
log.txt

@ycmin95
Collaborator

ycmin95 commented Aug 3, 2023

It seems that you trained only the baseline, without the proposed VAC or SMKD; please follow the Readme.md to set up the configuration file. We removed the evaluation on the test set during training for efficiency; you can modify this behavior in main.py.
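
Structurally, restoring per-epoch Test WER reporting amounts to something like the sketch below; every name in it (train_one_epoch, evaluate, the loaders) is a placeholder, not the actual API in main.py:

```python
# All names below are placeholders standing in for whatever main.py calls.
def train_one_epoch(model, loader, optimizer):
    pass  # forward/backward over the training split

def evaluate(model, loader):
    return 100.0  # decode the given split and return its WER in percent

model = optimizer = train_loader = dev_loader = test_loader = None

for epoch in range(2):
    train_one_epoch(model, train_loader, optimizer)
    dev_wer = evaluate(model, dev_loader)    # kept in the released code
    test_wer = evaluate(model, test_loader)  # the call removed for efficiency
    print(f"Epoch {epoch}: Dev WER {dev_wer:.2f}%, Test WER {test_wer:.2f}%")
```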

@kido1412y2y

kido1412y2y commented Aug 27, 2023

Hello, @ycmin95.
After configuring the settings according to the readme, I ran 40 epochs, but the best result achieved was only 32.3%. Could you please let me know if I might have missed any settings?
log_SMKD_no_ConvCTC.txt
dev_SMKD_no_ConvCTC.txt

I've noticed "# ConvCTC: 1.0" in the baseline.yaml file. I added ConvCTC: 1.0 and retrained for 80 epochs, but the results were even worse.
log_SMKD_ConvCTC.txt
dev_SMKD_ConvCTC.txt

@ycmin95
Collaborator

ycmin95 commented Aug 28, 2023

Hi, @kido1412y2y
Can you report the evaluation results with a batch size larger than 1? I have never run experiments with a batch size of 1, and I am not sure how it affects the results.

@lzy910

lzy910 commented Oct 4, 2024

/home/pickleball/公共的/oldli/VAC_CSLR-main
preprocess.sh ./work_dir/baseline_res18/output-hypothesis-test-conv.ctm ./work_dir/baseline_res18/tmp.ctm ./work_dir/baseline_res18/tmp2.ctm
2024年 10月 04日 星期五 17:27:11 CST
Preprocess Finished.
/home/pickleball/公共的/oldli/VAC_CSLR-main
preprocess.sh ./work_dir/baseline_res18/output-hypothesis-test.ctm ./work_dir/baseline_res18/tmp.ctm ./work_dir/baseline_res18/tmp2.ctm
2024年 10月 04日 星期五 17:27:13 CST
Preprocess Finished.
[ Fri Oct 4 17:27:14 2024 ] Epoch 6667, test 73.10%
[ Fri Oct 4 17:27:14 2024 ] Evaluation Done.
After evaluating with the resnet18_slr_pretrained_distill25.pt file, why is the test WER so high (73.10%)?
