
# Learn the American Manual Alphabet (AMA)

## Introduction

This repository contains scripts to identify the letters of the American Manual Alphabet (AMA).

You can either train and run it locally or head directly to this repository's GitHub Page to see a demonstration using your webcam.

## Requirements

This project was developed with Python 3.8. The following packages were used:

## Dataset acquisition

### Dataset sources

Two datasets published on Kaggle by the SigNN Team were used.

The first one contains only images of the alphabet, excluding J and Z. The second dataset contains video files of the letters J and Z, because these signs involve movement.

### Extraction

To extract the landmarks, the MediaPipe Hands solution is used. Passing an image to MediaPipe yields a list of hand landmarks.

*(Figure: the 21 hand landmarks)*

The figure above shows the resulting hand landmarks (MediaPipe Hands).
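
The snippet below is a minimal sketch of this step, not the repository's actual extraction script; the image path is a placeholder and the hyperparameters are assumptions.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

# Placeholder path; any image from the image dataset works here.
image = cv2.imread("dataset/A/A_0001.jpg")

with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
    # MediaPipe expects RGB input, OpenCV loads images as BGR.
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.multi_hand_landmarks:
    landmarks = results.multi_hand_landmarks[0].landmark  # 21 landmarks
    row = [coord for lm in landmarks for coord in (lm.x, lm.y, lm.z)]
    print(len(row))  # 63 values -> one row of a per-letter CSV file
```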

This project includes two scripts to extract landmarks from either image or video files. You can set the number of workers to accelerate the extraction. Each worker processes one letter of the dataset and yields a CSV file.

If the extraction encounters an image or video with a left hand, it mirrors the x-axis of the landmarks so that they behave like those of a right hand.
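
A minimal sketch of that mirroring idea, assuming the normalized [0, 1] coordinates MediaPipe returns (the actual scripts may implement it differently; the handedness of a detection is available via `results.multi_handedness`):

```python
def mirror_to_right_hand(landmarks):
    """Mirror left-hand landmarks along the x-axis so they resemble a right hand.

    landmarks: list of (x, y, z) tuples with x normalized to [0, 1].
    """
    return [(1.0 - x, y, z) for (x, y, z) in landmarks]
```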

The resulting 26 files (A.csv, B.csv, ..., Z.csv) can then be merged into a single CSV file and used to train a model.
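
Merging the per-letter files is a simple concatenation; the sketch below assumes the CSVs live in a `landmarks/` folder and share the same columns.

```python
import glob

import pandas as pd

# A.csv, B.csv, ..., Z.csv -> one training dataset
parts = [pd.read_csv(path) for path in sorted(glob.glob("landmarks/*.csv"))]
pd.concat(parts, ignore_index=True).to_csv("dataset.csv", index=False)
```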

## Training

This project includes Jupyter Notebooks to train two different models. Both notebooks take the same extracted dataset CSV file as input.

The CatBoostClassifier converges quickly and yields high accuracy. However, while developing this project, the idea came up to embed a model in a single webpage, ideally without a Python backend. So I also trained a multilayer perceptron with TensorFlow. The trained model can then be converted for the TensorFlow.js library and used directly in JavaScript, without the need for a Python backend server.
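
The following sketch outlines both training routes. It is not the notebooks' exact code: the column layout (a `label` column plus 63 landmark columns), the hyperparameters, and the output paths are assumptions. The export for the browser uses the `tensorflowjs` Python package.

```python
import pandas as pd
import tensorflow as tf
import tensorflowjs as tfjs
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("dataset.csv")
X = df.drop(columns=["label"])
y = df["label"]                            # letter labels "A" ... "Z"
y_codes = y.astype("category").cat.codes   # integer labels for the MLP

X_train, X_test, y_train, y_test, c_train, c_test = train_test_split(
    X, y, y_codes, test_size=0.2, stratify=y, random_state=42
)

# Route 1: gradient-boosted trees
cat_model = CatBoostClassifier(iterations=500, verbose=False)
cat_model.fit(X_train, y_train)
print("CatBoost accuracy:", cat_model.score(X_test, y_test))

# Route 2: multilayer perceptron, convertible to TensorFlow.js
num_classes = y.nunique()
mlp = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X.shape[1],)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
mlp.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
mlp.fit(X_train.to_numpy(), c_train.to_numpy(), epochs=30, validation_split=0.1)
print("MLP accuracy:", mlp.evaluate(X_test.to_numpy(), c_test.to_numpy())[1])

# Export the Keras model so TensorFlow.js can load it in the browser.
tfjs.converters.save_keras_model(mlp, "web_model")
```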

## Local inference

You can run your trained models by running either run_asl_catboost.py or run_asl_neuralnetwork.py.
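
As a rough idea of what such an inference loop looks like, here is a hedged sketch for the CatBoost variant; it is not the repository's script, and the model path and drawing details are assumptions.

```python
import cv2
import mediapipe as mp
from catboost import CatBoostClassifier

model = CatBoostClassifier()
model.load_model("catboost_asl.cbm")          # assumed path of the trained model

cap = cv2.VideoCapture(0)
with mp.solutions.hands.Hands(max_num_hands=1) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            lms = results.multi_hand_landmarks[0].landmark
            row = [c for lm in lms for c in (lm.x, lm.y, lm.z)]
            letter = str(model.predict([row]).flatten()[0])
            cv2.putText(frame, letter, (10, 40),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
        cv2.imshow("AMA demo", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```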

## Web Demo

To demonstrate and play with the trained model, you can head to this repository's GitHub Page.

It loads the trained model and uses the JavaScript capabilities of MediaPipe. The landmarks extracted from your webcam are passed to the multilayer perceptron, and the prediction is displayed on the screen.

### Dependencies

The following dependencies are used for the web demo:

The modules are bundled with webpack; the source files can be found here.