Overview

This repository was created as part of the final project for Fundamentals of Machine Learning (EEL5840) under Prof. Alina Zare at the University of Florida, for the Master's in Computer Science program. It is an implementation of a deep convolutional neural network, inspired by the famous "LeNet" (http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf) architecture, written in PyTorch to recognize handwritten characters.

Requirement

The detailed requirements of this project can be found in the file project1.pdf.

DataSet

The dataset consists of custom handwritten characters provided by Prof. Alina Zare and created by her students in the Electrical and Computer Engineering Department of the University of Florida.

Easy DataSet

The “easy” test set is composed of handwritten ‘a’ and ‘b’ characters. The code should produce the label ‘1’ for ‘a’ and ‘2’ for ‘b’.

Hard DataSet

The goal is to train the system to distinguish between handwritten characters. The “hard” data set consists of the characters ‘a’, ‘b’, ‘c’, ‘d’, ‘h’, ‘i’, ‘j’ and ‘k’, plus an ‘unknown’ class. The test data will contain points from classes that do not appear in the training data, so the system has to identify when the class of a test point is “unknown”, that is, not present in the training data. The label returned in this case is -1. Otherwise the code outputs a class label that matches the class value in the provided training data. The labels are therefore 1, 2, 3, 4, 5, 6, 7, 8 and -1 respectively.
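One common way to produce the -1 label is to reject predictions whose confidence is low. The sketch below illustrates that idea only; the softmax-threshold rule, the function names and the threshold value of 0.9 are assumptions for illustration and are not taken from this repository's code.

```python
import math

# Label values from the text above: 'a'..'k' map to 1..8, unknown maps to -1.
CLASS_LABELS = {"a": 1, "b": 2, "c": 3, "d": 4, "h": 5, "i": 6, "j": 7, "k": 8}
UNKNOWN_LABEL = -1


def softmax(scores):
    """Turn raw class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(scores, threshold=0.9):
    """Return a label in 1..8, or -1 when the top probability is below threshold.

    `scores` holds one raw score per known class, ordered a, b, c, d, h, i, j, k.
    The threshold is a hypothetical value and would need tuning on validation data.
    """
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return UNKNOWN_LABEL
    return best + 1  # class indices are 0-based, labels start at 1
```

A confident score vector such as `[10, 0, 0, 0, 0, 0, 0, 0]` maps to label 1, while a flat score vector is rejected as unknown.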

Models

We came up with three types of models:

with Batch Normalization (https://arxiv.org/pdf/1502.03167.pdf)
without Batch Normalization
with Dropout (https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf)
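To make the three variants concrete, here is a minimal PyTorch sketch of a LeNet-style network in which a single argument switches between them. The layer sizes, the single-channel input, the class name `LeNetStyle` and the use of adaptive pooling (so that the image size does not have to be fixed) are all assumptions; the actual architecture used in this project is described in the project report and code.

```python
import torch
from torch import nn


class LeNetStyle(nn.Module):
    """LeNet-inspired CNN; `variant` is 'plain', 'batchnorm' or 'dropout'."""

    def __init__(self, num_classes=8, variant="plain", dropout_p=0.5):
        super().__init__()
        if variant not in ("plain", "batchnorm", "dropout"):
            raise ValueError("unknown variant: " + variant)

        def block(in_ch, out_ch):
            # Conv -> (optional BatchNorm) -> ReLU -> 2x2 max pooling
            layers = [nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)]
            if variant == "batchnorm":
                layers.append(nn.BatchNorm2d(out_ch))
            layers += [nn.ReLU(), nn.MaxPool2d(2)]
            return layers

        self.features = nn.Sequential(
            *block(1, 6),
            *block(6, 16),
            nn.AdaptiveAvgPool2d((5, 5)),  # fixed 5x5 map for any input size
        )
        head = [nn.Flatten(), nn.Linear(16 * 5 * 5, 120), nn.ReLU()]
        if variant == "dropout":
            head.append(nn.Dropout(dropout_p))
        head += [nn.Linear(120, 84), nn.ReLU(), nn.Linear(84, num_classes)]
        self.classifier = nn.Sequential(*head)

    def forward(self, x):
        # x: (batch, 1, height, width) grayscale images
        return self.classifier(self.features(x))
```

For example, `LeNetStyle(num_classes=2)` would fit the easy dataset and `LeNetStyle(num_classes=8)` the eight known classes of the hard dataset.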

We first split the dataset into training, validation and test sets. We then trained the network with different parameters (learning rates, epochs, batch sizes). Based on the results, the model without batch normalization produced the highest accuracy (97.469%), so we selected it as our final deliverable model.
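The procedure described above (a three-way split followed by a search over learning rates, epochs and batch sizes) can be sketched in plain Python as follows. The split fractions, the seed and the helper names `split_indices` and `parameter_grid` are hypothetical; the project itself lists skorch as its cross-validation framework.

```python
import itertools
import random


def split_indices(n_samples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle sample indices and split them into train, validation and test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)
    n_val = int(n_samples * val_fraction)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


def parameter_grid(learning_rates, epochs, batch_sizes):
    """Yield every combination of the three hyperparameters as a dict."""
    for lr, n_epochs, batch_size in itertools.product(
        learning_rates, epochs, batch_sizes
    ):
        yield {"lr": lr, "epochs": n_epochs, "batch_size": batch_size}
```

Each dictionary yielded by `parameter_grid` would correspond to one training run, and the configuration with the best validation accuracy would then be evaluated once on the held-out test split.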

Parameters of the Convolution Network

Cross-validation framework

skorch (0.7)

PyTorch version

1.3.1

Easy Dataset

Training

Training “easy” blind data set

For the “easy” blind test data set, all parameters (e.g. epochs, learning rate) are listed in the ./Handwritten-Character-Recognition/train.py file.
To specify the paths of the dataset and label set files, use the variables data_set_path and label_set_path.
Place the dataset and label set files in the Handwritten-Character-Recognition folder.
The model will be generated in the ./Handwritten-Character-Recognition/model folder.
All details of the models recorded during training will be generated in the ./Handwritten-Character-Recognition/metrics folder.

Testing

Testing “easy” blind data set

For the “easy” blind test data set, all parameters (e.g. epochs, learning rate) are listed in the ./Handwritten-Character-Recognition/test.py file.
To specify the path of the dataset file, use the variable data_set_path.

Hard Dataset

Training

Training Hard Dataset

For the “hard” data set, all parameters (e.g. epochs, learning rate) are listed in the ./Handwritten-Character-Recognition/train_extra_credit.py file.
To specify the paths of the dataset and label set files, use the variables data_set_path and label_set_path.
Place the dataset and label set files in the Handwritten-Character-Recognition folder.
The model will be generated in the ./Handwritten-Character-Recognition/model folder.
All details of the models recorded during training will be generated in the ./Handwritten-Character-Recognition/metrics folder.

Testing

Testing Hard Dataset

For the “hard” data set, all parameters (e.g. epochs, learning rate) are listed in the ./Handwritten-Character-Recognition/test_extra_credit.py file.
To specify the path of the dataset file, use the variable data_set_path.

Caution

To generate a new model with new parameters, run train.py or train_extra_credit.py first. Existing models are located in Handwritten-Character-Recognition/model.

How to run

Easy Dataset

Training:
cd Handwritten-Character-Recognition
python train.py
Testing:
python test.py

Hard Dataset

Training:
cd Handwritten-Character-Recognition
python train_extra_credit.py
Testing:
python test_extra_credit.py

Final Output

After running ./Handwritten-Character-Recognition/test.py and ./Handwritten-Character-Recognition/test_extra_credit.py, the final output is written to the files easy_file.npy and hard_file.npy respectively. These two files store the predicted labels as NumPy arrays.
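Since the predictions are saved as NumPy arrays, they can be inspected with `np.load`. The helper below is a small sketch for checking how many samples were assigned to each label; the function name `summarize_predictions` is hypothetical and not part of the repository.

```python
import numpy as np


def summarize_predictions(path):
    """Load a saved label array and count how often each label occurs."""
    labels = np.load(path)
    values, counts = np.unique(labels, return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}
```

For instance, `summarize_predictions("hard_file.npy")` would return a dictionary keyed by the labels 1 to 8 and -1, which makes it easy to see how many test points were classified as unknown.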

Project report

The project report is included in FML_final.pdf.

Final Result on real test data

According to the teaching assistants, when they ran the model on the real test data, it achieved an accuracy of 97.3% on the easy test dataset and 86.5% on the hard test dataset.
