Training from scratch #55
Did you check the paper's appendix? I guess you could find more intuition there, especially with respect to linear probing. I was having a similar issue in which the loss wasn't decreasing, then I realized that I was initializing the optimizer with the model parameters before adding the linear head to it, so the parameters related to classification weren't being passed to the optimizer. With that solved, I'm still struggling to train it in a supervised fashion. I would suggest we get together on a communication channel such as Discord or something to share progress on this stuff.
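For reference, a minimal sketch of the pitfall described above, assuming a plain PyTorch setup; the modules are stand-ins, not names from this repo. The key point is that the head must be attached before the optimizer is constructed, otherwise its parameters never receive updates.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for the pre-trained encoder and the linear head
# (not taken from this repository).
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 384))  # stand-in encoder
head = nn.Linear(384, 10)                                           # classification head
model = nn.Sequential(encoder, head)

# Wrong: torch.optim.AdamW(encoder.parameters(), lr=1e-3)  -- built before the head exists,
# so the classification parameters are never optimized.
# Right: build the optimizer from the full model so the head is included.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
```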
Hey @FalsoMoralista, thanks for the comment! Sure, let's take this to Discord. My handle is johnweak15. Do add me there!
Were you all able to solve this? - Brett
@bdytx5 yes, we did. @lazarosgogos was also able to conduct some insightful experiments with it. What did you want to know specifically?
Well, I tried pre-training I-JEPA on CIFAR-10 and then fine-tuning the pre-trained model on CIFAR-10, using the labels during fine-tuning (with just the target encoder). I compared the fine-tuned model to a randomly initialized one, and the results seemed to be the same. I averaged the output embeddings of the last layer. Does this seem strange? Note that I just used the tiny_vit.
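For context, a rough sketch of the fine-tuning setup described above: averaging the last-layer embeddings of a ViT-style target encoder and feeding them to a linear classifier. The encoder here is a generic stand-in, not the actual tiny_vit from the repo.

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained ViT-tiny target encoder operating on patch tokens.
embed_dim, num_classes = 192, 10
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True),
    num_layers=2,
)
classifier = nn.Linear(embed_dim, num_classes)

tokens = torch.randn(8, 64, embed_dim)    # (batch, patches, dim) from the patch-embedding stage
features = encoder(tokens)                # last-layer patch embeddings
pooled = features.mean(dim=1)             # average over the patch dimension
logits = classifier(pooled)               # CIFAR-10 logits
loss = nn.functional.cross_entropy(logits, torch.randint(0, num_classes, (8,)))
```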
Curious! For how many epochs did you pre-train on CIFAR-10?
Around 10 or so, as after that the train and validation loss began to rise.
If you've left the config file untouched, there is most likely a warmup period of e.g. 40 epochs out of the 300 in total. The loss going up after some epochs (depending on your configuration and total number of epochs) is normal behavior, as mentioned in #41. Try letting your model train for at least 50-60 epochs (with appropriate changes in the configuration, see the sketch below) and then try a downstream task. In the early epochs the model doesn't learn semantic representations of the data, even though the loss seems to go down (I've tested this personally). Once I get to test the ViT-tiny and ViT-small models, I will get back with the differences.
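As a hedged illustration of that kind of configuration change, the snippet below scales the schedule down by rewriting the YAML config from Python. The file name and field names follow the general layout of the configs shipped with this repo, but they should be checked against the config actually being used.

```python
import yaml

# Load an existing config (path is an example; adjust to your setup).
with open("configs/in1k_vith14_ep300.yaml") as f:
    cfg = yaml.safe_load(f)

# Assumed field names under the 'optimization' section; verify against your file.
cfg["optimization"]["epochs"] = 100   # total epochs, scaled down from 300
cfg["optimization"]["warmup"] = 10    # warmup epochs, scaled down from ~40

with open("configs/my_small_run.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```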
Ah, I overlooked this. Good catch
@lazarosgogos Did you get to test ViT-small models? It'd be really helpful if you could share the working configuration for those. Thanks.
@akshayneema It heavily depends on what type of resources (e.g. GPUs) you have at hand. The more VRAM you have, the bigger the model you can load, and the bigger the images you use, the more VRAM you'll need. For example, to train on ImageNet images with a ViT-small model on 16 GB of VRAM, I was able to load at most a batch of 60 images per iteration (the rest of the config was untouched).
Thanks for the reply @lazarosgogos. Can you also share what the results were like for you using ViT-small? Were they competitive with ViT-H or ViT-G? Did you also change the predictor model architecture to suit the ViT-small architecture? I am currently training on 1 GeForce RTX 3090 GPU with a batch size of 32. I am using UMAP to visualise the embeddings generated by the target encoder, and it does not look that great.
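For reference, a rough sketch of that kind of UMAP check; `embeddings` and `labels` are placeholders for the mean-pooled target-encoder outputs and the corresponding class ids, which would come from your own forward passes.

```python
import numpy as np
import umap                      # from the umap-learn package
import matplotlib.pyplot as plt

# Placeholder data standing in for mean-pooled target-encoder embeddings and labels.
embeddings = np.random.randn(1000, 384)
labels = np.random.randint(0, 10, size=1000)

# Project to 2D and colour points by class.
proj = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1).fit_transform(embeddings)
plt.scatter(proj[:, 0], proj[:, 1], c=labels, s=3, cmap="tab10")
plt.title("Target-encoder embeddings (UMAP)")
plt.show()
```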
@akshayneema The results using ViT-small were not competitive with ViT-huge or ViT-Giant, not even close. The difference in some linear probing tasks was immense (>30%). The point of using ViT-small or ViT-base is mostly, in my opinion, to run tests and see how the model performs, in order to then train a ViT-Huge for final results. I did not touch the architecture of the predictor when testing how ViT-small behaves. Batch size plays a role in training as well, keep that in mind.
Hey everyone!
First off, thanks for the great work. I implemented my own version of I-JEPA (https://github.com/Ugenteraan/I-JEPA) by referencing this repository.
I used the Doges 77 Breeds (https://www.kaggle.com/datasets/madibokishev/doges-77-breeds) dataset for the training. The loss goes down in a convincing manner during the SSL training. However, during the downstream task, when I load the pre-trained encoder weights and use probing, the accuracy is no better than with randomly initialized encoder weights.
Does anyone have a clue what might be the cause of this?
Thanks in advance! Cheers.
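For anyone debugging a similar setup, here is a loose sketch of a linear-probe sanity check under the assumptions described above (frozen pre-trained encoder, averaged last-layer embeddings, trainable linear head); every name is a placeholder, not something taken from this repository or the linked one. Running the same probe on a randomly initialised encoder gives the baseline to compare against.

```python
import torch
import torch.nn as nn

def build_probe(encoder: nn.Module, embed_dim: int, num_classes: int):
    # Freeze the pre-trained encoder so only the linear head is trained.
    for p in encoder.parameters():
        p.requires_grad = False
    head = nn.Linear(embed_dim, num_classes)
    optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
    return head, optimizer

def probe_logits(encoder: nn.Module, head: nn.Module, tokens: torch.Tensor) -> torch.Tensor:
    # Mean-pool the last-layer patch embeddings, then classify with the head.
    with torch.no_grad():
        feats = encoder(tokens).mean(dim=1)
    return head(feats)
```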