This repository implements a Vision Transformer (ViT) model for food type detection. The project aims to reproduce the original ViT study on this task and to compare a model trained from scratch against a transfer-learning model, with EfficientDet as an additional point of comparison.
Food type detection is a challenging computer vision task: given an image, classify the type of food it shows. ViT models have shown promising results on a range of image classification tasks, including food type detection. This project implements ViT and evaluates it on a food type detection dataset.
The project uses a custom food type detection dataset. Due to copyright and licensing restrictions, we cannot share it publicly. You can instead use your own dataset or a suitable openly available food image dataset.
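If you bring your own dataset, one common way to organize it is one subdirectory per food class. The sketch below assumes that layout; it is not confirmed to be the layout this repository expects. The function name `index_image_folder` and the `IMAGE_EXTENSIONS` set are hypothetical helpers introduced for illustration only. It uses just the Python standard library to build a list of `(image_path, label_index)` pairs.

```python
from pathlib import Path

# Assumption: these are the image file types in your dataset; adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def index_image_folder(root):
    """Index a dataset laid out as root/<class_name>/<image files>.

    Returns (samples, class_names), where samples is a list of
    (path, label_index) pairs and class_names is sorted alphabetically,
    so label_index is the position of the class in class_names.
    """
    root = Path(root)
    # Every immediate subdirectory is treated as one food class.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_index = {name: i for i, name in enumerate(class_names)}

    samples = []
    for name in class_names:
        # Walk the class directory recursively and keep only image files.
        for path in sorted((root / name).rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((path, class_to_index[name]))
    return samples, class_names
```

For example, with a directory `data/train` containing `pizza/` and `sushi/` subfolders, `index_image_folder("data/train")` would give `class_names == ["pizza", "sushi"]` and one sample per image file. The resulting pairs can then be fed into whatever data loading code you use for training.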
To set up the environment and install the required dependencies, follow these steps:
- Clone the repository:
git clone https://github.com/PG-9-9/Food-Net.git
cd Food-Net