This project implements an image captioning system that pairs a convolutional neural network (CNN) encoder with a gated recurrent unit (GRU) decoder and multi-head attention to generate descriptive captions for images. The design extends the standard CNN-LSTM captioning model: multi-head attention lets the decoder focus on different image regions, and the GRU decoder tracks the running context of the caption.
- Encoder-Decoder Architecture: A ResNet-50 backbone encodes each image; a GRU-based decoder generates the caption token by token.
- Multi-Head Attention: A custom multi-head attention mechanism lets the decoder focus on different parts of the image at each step (see the sketch after this list).
- Dataset: Trained on the Flickr8k dataset, which consists of 8,000 images, each paired with five different captions.
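The overall model can be sketched as follows. This is a minimal illustration rather than the project's exact code: the layer sizes, the use of PyTorch's built-in `nn.MultiheadAttention` (the project implements a custom variant), and the teacher-forced forward pass are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class EncoderCNN(nn.Module):
    """ResNet-50 backbone that returns a grid of region features for attention."""

    def __init__(self, feature_dim=512):
        super().__init__()
        resnet = models.resnet50(pretrained=True)  # newer torchvision: weights=...
        # Drop the average pool and FC head to keep the 7x7 spatial grid.
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.project = nn.Linear(2048, feature_dim)

    def forward(self, images):                      # images: (B, 3, 224, 224)
        feats = self.backbone(images)               # (B, 2048, 7, 7)
        feats = feats.flatten(2).permute(0, 2, 1)   # (B, 49, 2048) region features
        return self.project(feats)                  # (B, 49, feature_dim)


class DecoderGRU(nn.Module):
    """GRU decoder that attends over image regions with multi-head attention."""

    def __init__(self, vocab_size, embed_dim=512, hidden_dim=512, num_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attention = nn.MultiheadAttention(embed_dim, num_heads)
        self.gru = nn.GRU(embed_dim * 2, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, features, captions):
        # Teacher forcing: embed the ground-truth caption tokens. (B, T, E)
        emb = self.embed(captions)
        # nn.MultiheadAttention expects (T, B, E); word embeddings query regions.
        q, kv = emb.transpose(0, 1), features.transpose(0, 1)
        context, _ = self.attention(q, kv, kv)
        context = context.transpose(0, 1)           # back to (B, T, E)
        # Feed word embedding + attended image context to the GRU.
        out, _ = self.gru(torch.cat([emb, context], dim=-1))
        return self.fc(out)                         # (B, T, vocab_size)


# Shape check with a hypothetical 5,000-token vocabulary.
encoder, decoder = EncoderCNN(), DecoderGRU(vocab_size=5000)
images = torch.randn(2, 3, 224, 224)
captions = torch.randint(0, 5000, (2, 20))
logits = decoder(encoder(images), captions)         # (2, 20, 5000)
```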
To run this project, you need the following environment and libraries:
- Python 3.8+
- PyTorch 1.7+
- torchvision
- nltk
- Pillow (PIL)
- matplotlib
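The packages can be installed with pip, for example (the exact PyTorch build depends on your CUDA setup, and the NLTK data download is only needed if captions are tokenized with punkt):

```bash
# Typical install; pick the torch build matching your CUDA version.
pip install torch torchvision nltk pillow matplotlib

# Tokenizer data for NLTK, if needed:
python -c "import nltk; nltk.download('punkt')"
```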
The image captioning model is trained on the Flickr8k dataset: 8,000 images, each paired with five different captions. The five reference captions per image make it well suited to training and evaluating captioning models.
You can download the images and annotations from the following links:
- Images: Download Flickr8k_Dataset
- Annotations: Download Flickr8k_Text
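After downloading, the annotations can be parsed along these lines. This is a minimal sketch assuming the standard `Flickr8k.token.txt` format, where each tab-separated line reads `<image>.jpg#<n>\t<caption>`; the file name and layout may differ in your download.

```python
from collections import defaultdict


def load_captions(token_path="Flickr8k.token.txt"):
    """Parse the Flickr8k annotation file into {image_name: [captions]}."""
    captions = defaultdict(list)
    with open(token_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            image_id, caption = line.split("\t", 1)
            image_name = image_id.split("#")[0]  # drop the '#0'..'#4' suffix
            captions[image_name].append(caption)
    return captions


# Usage: each image should map to its five reference captions.
# caps = load_captions()
# print(len(caps), next(iter(caps.values())))
```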