
The best performance of pretrained model #68

Closed
xwhkkk opened this issue Mar 6, 2023 · 9 comments

Comments

@xwhkkk

xwhkkk commented Mar 6, 2023

Thanks for sharing the released pretrained model. I wonder whether the default model XMem.pth was trained with stage 03? Are the reported results on the val and test sets only from the 107K checkpoint?
Many thanks in advance !

@hkchengrex
Owner

The default is s03.
It is a 107K model if I recall correctly.

@xwhkkk
Author

xwhkkk commented Mar 6, 2023

Thanks for your kind reply.
Have you trained the model with stage 02? What is the best model from stage 02?
I tested the 160K model from stage 02 and got 86.2, but with the 107K model from stage 03 it decreases to 85.8. Is that normal? How should we choose which stage to use (2 or 3)?

Thanks for your patience.

@hkchengrex
Owner

hkchengrex commented Mar 6, 2023

I have tried s02 but it has basically the same performance as in s03 so I opted for shorter training instead. If s02 works better in your case then go for it. I have not observed any overfitting in training stage 2, and the only caveat is that it takes longer to train.

@xwhkkk
Author

xwhkkk commented Mar 8, 2023

Thanks. I evaluated my base training (stage 03) results on the val set, training with 2 A100 GPUs and with 4 A100 GPUs (keeping batch size = 8), but the results are only 85.8 and 84.9. Could you give some suggestions on what might have caused that?

@hkchengrex
Owner

I have only trained it on the few machines that I have access to and have not seen significantly worse results, so I have little idea. Have you tried looking at the last few network weights (105K-110K) to see if any of them is better?
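
A minimal sketch of such a checkpoint sweep is below. It assumes the checkpoints follow an `XMem_s03_<iteration>.pth` naming pattern in a `saves` directory, and `evaluate_on_davis` is a hypothetical placeholder for whatever evaluation pipeline you already use (it is not part of the XMem repository):

```python
# Hypothetical checkpoint sweep: score the last few saved weights and keep
# the best one. `evaluate_on_davis` is a placeholder for your existing
# evaluation pipeline; the directory and naming pattern are assumptions.
from pathlib import Path

def evaluate_on_davis(checkpoint: Path) -> float:
    """Placeholder: run inference with `checkpoint` and return the J&F score."""
    raise NotImplementedError

ckpt_dir = Path("saves")                                    # assumed output dir
candidates = sorted(ckpt_dir.glob("XMem_s03_*.pth"))[-5:]   # e.g. the 105K-110K range

scores = {ckpt.name: evaluate_on_davis(ckpt) for ckpt in candidates}
best = max(scores, key=scores.get)
print(f"Best checkpoint: {best} (J&F = {scores[best]:.1f})")
```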

Another possible reason is PIL 8 vs. PIL 9 -- they use different JPEG decoding algorithms. I recently ran into problems with this in one of my recent projects, but I am not sure whether it affects XMem.
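
One quick way to check whether JPEG decoding actually differs between two environments is to hash the decoded pixels of the same training image under each Pillow version. This is just a sanity-check sketch, not something from the XMem codebase; the filename is an example:

```python
# Sanity check: decode the same JPEG and hash the pixel data. Run this in
# both environments (Pillow 8 vs. Pillow 9); differing hashes mean the two
# versions decode the file differently.
import hashlib
import numpy as np
import PIL
from PIL import Image

print("Pillow version:", PIL.__version__)

img = np.asarray(Image.open("example.jpg").convert("RGB"))  # any training JPEG
print("decoded pixel hash:", hashlib.md5(img.tobytes()).hexdigest())
```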

If longer training works in your case it is probably the easiest thing to do.

@hkchengrex
Owner

There seems to be a similar issue in #60. This is probably not an isolated case but it is very hard for me to debug...

@xwhkkk
Author

xwhkkk commented Mar 9, 2023

Thanks for your kind reply. The only difference between stage 02 and stage 03 is the number of iterations, right? So I would expect the stage 02 and stage 03 results at 100K iterations to be nearly the same, but when I tested them on the DAVIS val set I got 84.0 and 84.7, respectively. Could you give me some suggestions?

@hkchengrex
Owner

We adjust the maximum skip between frames (curriculum learning) using the training progress in terms of the percentage of total iterations. So they are not the same when the total number of iterations is different.
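
As a rough illustration of why the schedules differ, here is a sketch of a progress-based skip schedule. The breakpoints, skip values, and total iteration counts below are made-up placeholders, not the actual numbers used in the XMem training code:

```python
# Illustrative curriculum schedule: the maximum frame skip is chosen from the
# *fraction* of total iterations completed, so the same absolute iteration
# (e.g. 100K) falls at a different point of the curriculum when the total
# iteration count differs (s02 vs. s03). All numbers here are placeholders.
def max_skip(cur_iter: int, total_iter: int,
             breakpoints=(0.1, 0.3, 0.8, 1.0),
             skips=(10, 15, 20, 5)) -> int:
    progress = cur_iter / total_iter
    for frac, skip in zip(breakpoints, skips):
        if progress <= frac:
            return skip
    return skips[-1]

# 100K iterations sits in different curriculum phases under different totals:
print(max_skip(100_000, 110_000))   # progress ~0.91 -> last phase, skip 5
print(max_skip(100_000, 160_000))   # progress ~0.63 -> earlier phase, skip 20
```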

I will try to investigate the training issue.

@hkchengrex
Owner

Continue in #71.
