StatisticalLearningProject

Object Detection with CNNs, ViT, and YOLO

Project Description: Object Detection Algorithm Comparison on Pascal VOC 2007

Overview

This project implements and compares several object detection approaches — Faster R-CNN with a convolutional backbone, Faster R-CNN with a Vision Transformer (ViT) backbone, and You Only Look Once (YOLO) — on the Pascal VOC 2007 dataset. The goal is to evaluate these models on accuracy, efficiency, and speed when detecting objects across the dataset's categories.

Dataset: Pascal VOC 2007

Source: The Pascal Visual Object Classes (VOC) 2007 dataset.

Content: The dataset contains images spanning 20 object categories, annotated with class labels and bounding boxes.

Task: Object detection, requiring the model to predict both the classes and the locations of objects in images.
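For reference, the dataset can be pulled directly through torchvision; the root directory and split choice below are illustrative:

```python
from torchvision.datasets import VOCDetection

voc_train = VOCDetection(
    root="data",           # hypothetical download/cache directory
    year="2007",
    image_set="trainval",  # "train", "val", "trainval", or "test" for VOC 2007
    download=True,
)

image, target = voc_train[0]
# `image` is a PIL image; `target` mirrors the VOC XML annotation as nested
# dicts, e.g. target["annotation"]["object"] holds names and bounding boxes.
```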

Data Processing

Preprocessing:

Resize images to a uniform size (e.g., 800x800 pixels) for model input.

Normalize pixel values to a range suitable for neural network inputs.

Convert annotations to a format compatible with the models, including class labels and bounding box coordinates (a sketch of these steps follows below).
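A minimal sketch of the three preprocessing steps above, assuming VOC-style XML targets as returned by torchvision's VOCDetection; the 800x800 size and class list follow the VOC 2007 spec, while the ImageNet normalization statistics are an assumption:

```python
import torch
import torchvision.transforms.functional as TF

VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
CLASS_TO_IDX = {name: i + 1 for i, name in enumerate(VOC_CLASSES)}  # 0 = background

def preprocess(image, target, size=800):
    """Resize, normalize, and convert one VOC sample to model-ready tensors."""
    w, h = image.size                                # original PIL dimensions
    image = TF.to_tensor(image)                      # PIL -> CHW float in [0, 1]
    image = TF.resize(image, [size, size])           # uniform input size
    image = TF.normalize(image, mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])  # assumed ImageNet statistics

    objs = target["annotation"]["object"]
    if isinstance(objs, dict):                       # a single object parses as a dict
        objs = [objs]
    boxes, labels = [], []
    for obj in objs:
        b = obj["bndbox"]
        # rescale box coordinates to match the resized image
        boxes.append([float(b["xmin"]) * size / w, float(b["ymin"]) * size / h,
                      float(b["xmax"]) * size / w, float(b["ymax"]) * size / h])
        labels.append(CLASS_TO_IDX[obj["name"]])
    return image, {"boxes": torch.tensor(boxes, dtype=torch.float32),
                   "labels": torch.tensor(labels, dtype=torch.int64)}
```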

Augmentation (Optional): Apply data augmentation techniques such as flipping, rotation, and scaling to increase the diversity of training data and improve model robustness. Note that geometric augmentations must also transform the bounding boxes; a flip sketch follows below.
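For example, a horizontal flip must mirror the box x-coordinates along with the pixels. A minimal, self-contained sketch operating on the tensors produced by preprocess() above (the function name and probability are illustrative):

```python
import random
import torch

def random_hflip(image, target, p=0.5):
    """Horizontally flip an image tensor and mirror its boxes with probability p."""
    if random.random() < p:
        _, _, width = image.shape                 # image: Tensor[C, H, W]
        image = torch.flip(image, dims=[2])       # mirror left-right
        boxes = target["boxes"].clone()           # boxes: Tensor[N, 4], (xmin, ymin, xmax, ymax)
        boxes[:, [0, 2]] = width - target["boxes"][:, [2, 0]]
        target = {**target, "boxes": boxes}
    return image, target
```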

Data Loader Setup: Implement data loaders with efficient batching, shuffling, and parallel processing to streamline training and evaluation.
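A sketch of such a loader, reusing the preprocess transform from above. Detection batches hold a variable number of boxes per image, so a custom collate_fn keeps samples as tuples rather than stacking them into one tensor:

```python
from torch.utils.data import DataLoader
from torchvision.datasets import VOCDetection

def collate_fn(batch):
    # Targets vary in the number of boxes per image, so return tuples of
    # images and targets instead of stacking into a single tensor.
    return tuple(zip(*batch))

# `transforms` receives (image, target) pairs, matching preprocess() above.
dataset = VOCDetection("data", year="2007", image_set="trainval",
                       download=True, transforms=preprocess)
train_loader = DataLoader(dataset, batch_size=4, shuffle=True,
                          num_workers=2, collate_fn=collate_fn)
```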

Algorithms

Faster R-CNN: a two-stage detector with a convolutional backbone.

YOLO: a single-stage detector optimized for speed.

ViT: a Vision Transformer used as the detection backbone (e.g., for Faster R-CNN).
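As a concrete starting point, torchvision ships a reference Faster R-CNN. The sketch below loads a COCO-pretrained model and swaps its box head for VOC's 20 classes plus background; the ViT-backbone and YOLO variants are set up separately (e.g., via a custom backbone or an external YOLO implementation), so only the CNN baseline is shown here:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Start from a COCO-pretrained Faster R-CNN and replace the box head with
# one sized for VOC's 20 classes plus background (21 outputs).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=21)
```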

Training

Environment: Python with PyTorch on Google Colab, leveraging CUDA for GPU acceleration.

Loss Functions: Use loss functions appropriate for object detection, combining classification and bounding-box regression losses.

Optimization: Apply optimizers such as SGD or Adam, with learning-rate schedules and regularization techniques to improve training outcomes. A combined sketch of both points follows.
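A sketch of a training loop, assuming a torchvision-style detection model (which returns a dict of classification and box-regression losses in training mode) and the train_loader above; the hyperparameters are illustrative:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# SGD with momentum and step decay, a common recipe for detection training.
optimizer = torch.optim.SGD(model.parameters(), lr=0.005,
                            momentum=0.9, weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

model.train()
for epoch in range(10):                               # illustrative epoch count
    for images, targets in train_loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # In training mode, torchvision detection models return a dict of
        # classification and box-regression losses given images and targets.
        loss_dict = model(images, targets)
        loss = sum(loss_dict.values())

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```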

Evaluation and Comparison

Metrics:

Evaluate model performance using metrics such as mean Average Precision (mAP), precision, recall, IoU, and F1 score. Measure inference speed and computational efficiency to compare the models' practicality in real-world applications.
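One way to compute VOC-style mAP is the torchmetrics detection metric, paired with a simple timer for inference speed; this assumes a torchvision-style model and a hypothetical val_loader built like train_loader above:

```python
import time
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

metric = MeanAveragePrecision(iou_thresholds=[0.5])   # VOC-style mAP@0.5

model.eval()
total_time, n_images = 0.0, 0
with torch.no_grad():
    for images, targets in val_loader:                # hypothetical validation loader
        images = [img.to(device) for img in images]
        start = time.perf_counter()
        preds = model(images)                         # list of dicts: boxes, scores, labels
        if device.type == "cuda":
            torch.cuda.synchronize()                  # wait for async GPU work
        total_time += time.perf_counter() - start
        n_images += len(images)
        preds = [{k: v.cpu() for k, v in p.items()} for p in preds]
        metric.update(preds, list(targets))

results = metric.compute()
print(results["map"])                                 # mAP at the chosen IoU threshold
print(f"{n_images / total_time:.1f} images/sec")      # rough inference throughput
```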

Analysis:

Conduct a thorough analysis of the models' performance across different object categories and under varying conditions (e.g., object sizes, occlusion). Compare the models qualitatively by visualizing detection results (see the sketch below), highlighting strengths and weaknesses.
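For the qualitative comparison, torchvision's draw_bounding_boxes can overlay predictions on an image. A sketch, assuming an image tensor in [0, 1] (denormalize first if ImageNet normalization was applied) and the VOC_CLASSES list from the preprocessing sketch:

```python
import torch
from torchvision.utils import draw_bounding_boxes
import torchvision.transforms.functional as TF

def visualize(image, pred, score_thresh=0.5):
    """Overlay predicted boxes above a confidence threshold on the image."""
    keep = pred["scores"] > score_thresh
    img_uint8 = (image * 255).to(torch.uint8)         # draw_bounding_boxes wants uint8
    names = [VOC_CLASSES[int(i) - 1] for i in pred["labels"][keep]]  # 0 = background
    drawn = draw_bounding_boxes(img_uint8, pred["boxes"][keep],
                                labels=names, colors="red", width=2)
    return TF.to_pil_image(drawn)
```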

Reporting and Documentation

Results: Compile detailed results, including quantitative metrics and qualitative assessments, in a structured report or presentation.

Insights: Discuss insights gained from the comparison, including the impact of architectural choices on performance and potential areas for further research or application.

Code Documentation: Ensure the project code is well-documented, with clear explanations of the implementation details and usage instructions.

Note: README generated with GPT.

Conclusion and Future Work

Summarize the key findings from the comparison, emphasizing practical implications for object detection tasks. Suggest avenues for future research, such as exploring hybrid models, applying the algorithms to other datasets, or integrating additional enhancements to improve accuracy and efficiency.
