While the volume of data collected for vision-based tasks has grown exponentially in recent years, annotating all of this unstructured data is practically impossible.
DINO, which is based on self-supervised learning, does not require large amounts of labelled data to achieve state-of-the-art results on segmentation tasks, unlike traditional supervised methods.
To be specific, DINO stands for self-DIstillation with NO labels, wherein two models, a teacher and a student, are used. While they share the same architecture, the teacher is not trained by backpropagation; instead, its parameters are an exponentially weighted average of the student model's parameters. This technique was introduced in the Facebook AI research paper "Emerging Properties in Self-Supervised Vision Transformers".
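To make the teacher-student setup concrete, here is a minimal sketch of the teacher update and distillation loss in TensorFlow 2. The function names, the momentum value, and the temperature and centering parameters are illustrative assumptions for this write-up, not the exact API or hyperparameters of the linked repository.

import tensorflow as tf

# Minimal sketch of DINO-style self-distillation in TensorFlow 2.
# `student` and `teacher` are assumed to be two tf.keras.Model instances
# with identical architectures; only the student is trained by gradient descent.

def update_teacher(student: tf.keras.Model,
                   teacher: tf.keras.Model,
                   momentum: float = 0.996) -> None:
    """Set each teacher weight to an exponential moving average of the
    corresponding student weight (no gradients flow to the teacher)."""
    for s_var, t_var in zip(student.weights, teacher.weights):
        t_var.assign(momentum * t_var + (1.0 - momentum) * s_var)

def dino_loss(teacher_logits: tf.Tensor,
              student_logits: tf.Tensor,
              center: tf.Tensor,
              teacher_temp: float = 0.04,
              student_temp: float = 0.1) -> tf.Tensor:
    """Cross-entropy between the centred, sharpened teacher distribution
    and the student distribution over the projection-head outputs."""
    teacher_probs = tf.nn.softmax((teacher_logits - center) / teacher_temp, axis=-1)
    student_log_probs = tf.nn.log_softmax(student_logits / student_temp, axis=-1)
    return -tf.reduce_mean(tf.reduce_sum(teacher_probs * student_log_probs, axis=-1))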
Visualizing the generated attention maps highlights that DINO learns class-specific features automatically, which helps us generate accurate segmentation maps without the need for labelled data in vision-based tasks.
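As one way to inspect these maps, the sketch below plots the [CLS] token's attention over the image patches for each head. It assumes you have already extracted the last block's self-attention weights as a NumPy array of shape (num_heads, num_tokens, num_tokens), with the [CLS] token first and the remaining tokens forming a square patch grid; these names and shapes are assumptions, not the repository's API.

import numpy as np
import matplotlib.pyplot as plt

# Sketch: visualize the [CLS] token's attention over image patches, per head.
# `attn` is assumed to have shape (num_heads, num_tokens, num_tokens),
# where token 0 is the [CLS] token and the rest form a grid_size x grid_size grid.
def plot_cls_attention(attn: np.ndarray, grid_size: int) -> None:
    cls_attn = attn[:, 0, 1:]  # attention from [CLS] to every patch token
    num_heads = cls_attn.shape[0]
    fig, axes = plt.subplots(1, num_heads, figsize=(3 * num_heads, 3))
    for head, ax in enumerate(np.atleast_1d(axes)):
        ax.imshow(cls_attn[head].reshape(grid_size, grid_size), cmap="viridis")
        ax.set_title(f"head {head}")
        ax.axis("off")
    plt.show()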
# Download and extract the PASCAL VOC 2007 dataset
gdown --id 1Lw_XPTbkoHUtWpG4U9ByYIwwmLlufvyj
unzip PASCALVOC2007.zip

# Clone the repository and install its dependencies
git clone https://github.com/TanyaChutani/DINO_Tf2.x.git
cd DINO_Tf2.x
pip install -r requirements.txt

# Start training
python main.py