jet-c-21/ASL_Translator

ASL Translator

ASL Schematic Diagram

Dataset

Original Data on Kaggle

In this project, we build our own dataset, Dataset-A, by combining the following data from Kaggle.com:

Dataset-A

We built this hybrid dataset to test whether our model is general and robust. Download links:

Structure of Dataset-A

Both the training data and the test data contain 26 class folders, one for each letter from A to Z, holding right-hand image instances.

Training Data

  • All data is a subset of dataset1
  • We removed the images in dataset1 that could not pass our image pipeline
  • Each letter has approximately 2220 images.

Here's the chart of our image count distribution in Dataset-A training set:

image count distribution in Dataset-A training set

Testing Data

  • For each letter, we select 555 image instances (2220 * 0.25 = 555, which is 20% of the combined 2775 images per letter) from dataset2, dataset3 and dataset4
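The split sizes can be checked with a few lines of arithmetic. This is only a sanity check based on the counts quoted in this README (26 letters, about 2220 training images and 555 test images per letter); the constant names are made up for illustration.

```python
# Counts quoted in this README; the names are illustrative only.
NUM_CLASSES = 26
TRAIN_PER_CLASS = 2220  # approximate, per the training data section
TEST_PER_CLASS = 555

# Total number of test images across all letters.
test_total = NUM_CLASSES * TEST_PER_CLASS

# Test images relative to training images, and relative to train + test.
ratio_to_train = TEST_PER_CLASS / TRAIN_PER_CLASS
share_of_combined = TEST_PER_CLASS / (TRAIN_PER_CLASS + TEST_PER_CLASS)

print(test_total)         # 14430
print(ratio_to_train)     # 0.25
print(share_of_combined)  # 0.2
```

In other words, 555 is one quarter of 2220, which makes the test set 20% of the combined data for each letter.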

Here's the chart of our image count distribution in Dataset-A testing set:

image count distribution in Dataset-A testing set

Methodology

a. Data Preprocessing

The demo images are in the folder pipeline-demo; the prefix of each image file name indicates the pipeline function type.

The image pipeline comes in two types:

I. General Pipeline:

This pipeline can be used in any preprocessing stage.

// work flow
1. roi normalization (by mediapipe)
2. background normalization (by rembg)
3. skin normalization
4. channel normalization
5. resolution normalization

II. Training Pipeline:

This pipeline can only be used in the training preprocessing stage.

// work flow
1. background normalization (by rembg)
2. roi normalization (by mediapipe)
3. skin normalization
4. channel normalization
5. resolution normalization
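The two workflows differ only in the order of their first two steps. Below is a minimal sketch of that idea, not the repository's implementation: every function name is hypothetical, the ROI, background, and skin steps are identity placeholders (the real ones depend on mediapipe, rembg, and logic not shown here), channel normalization is assumed to mean conversion to a single grayscale channel, and resolution normalization is assumed to mean resizing to 28 x 28, matching the model input shape.

```python
import numpy as np

def placeholder_step(image):
    # Stand-in for ROI, background, and skin normalization (not reproduced here).
    return image

def channel_normalization(image):
    # Assumption: collapse RGB to one grayscale channel with standard luma weights.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = image[..., :3].astype(np.float32) @ weights
    return gray[..., np.newaxis]

def resolution_normalization(image, size=28):
    # Assumption: nearest-neighbor resize to size x size.
    height, width = image.shape[:2]
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    return image[rows][:, cols]

# Step order follows the two workflow listings above.
GENERAL_PIPELINE = [
    ("roi", placeholder_step),
    ("background", placeholder_step),
    ("skin", placeholder_step),
    ("channel", channel_normalization),
    ("resolution", resolution_normalization),
]
TRAINING_PIPELINE = [
    ("background", placeholder_step),
    ("roi", placeholder_step),
    ("skin", placeholder_step),
    ("channel", channel_normalization),
    ("resolution", resolution_normalization),
]

def run_pipeline(image, steps):
    # Apply each step in order and return the final image.
    for _name, step in steps:
        image = step(image)
    return image

demo = np.zeros((120, 90, 3), dtype=np.uint8)
print(run_pipeline(demo, GENERAL_PIPELINE).shape)  # (28, 28, 1)
```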

Data Augmentation

  1. Implemented with keras.ImageDataGenerator, using zoom_range=0.1, width_shift_range=0.1, height_shift_range=0.1, shear_range=0.1
  2. Implemented with keras.model.Engine: we create our own Spatial Transformer Layer, stn().
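To give a feel for what a spatial transformer does to an image, here is a rough NumPy sketch of its sampling step. It is not the repository's stn() layer: in a real spatial transformer the 2 x 3 affine matrix is predicted by a small localization network and the sampling is bilinear so that gradients can flow, whereas this sketch takes the matrix as an argument and uses nearest-neighbor sampling. The function name affine_sample is made up for illustration.

```python
import numpy as np

def affine_sample(image, theta):
    """Warp a 2-D image with a 2x3 affine matrix over normalized [-1, 1] coordinates."""
    height, width = image.shape
    ys, xs = np.meshgrid(
        np.linspace(-1.0, 1.0, height),
        np.linspace(-1.0, 1.0, width),
        indexing="ij",
    )
    # Homogeneous target coordinates, shape (3, height * width).
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(height * width)])
    # Source coordinates that each output pixel reads from.
    src = np.asarray(theta, dtype=np.float64) @ grid
    src_x = np.rint((src[0] + 1.0) * (width - 1) / 2.0).astype(int)
    src_y = np.rint((src[1] + 1.0) * (height - 1) / 2.0).astype(int)
    # Pixels that map outside the source image stay zero.
    valid = (src_x >= 0) & (src_x < width) & (src_y >= 0) & (src_y < height)
    out = np.zeros(height * width, dtype=image.dtype)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out.reshape(height, width)

image = np.arange(25, dtype=np.float32).reshape(5, 5)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
shift = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]]  # reads one pixel to the right on a 5-wide image

same = affine_sample(image, identity)
shifted = affine_sample(image, shift)
```

With the identity matrix the image comes back unchanged; with the shift matrix the content moves one column to the left and the last column is filled with zeros.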

b. Model Building

You can download our models here: saved_models_v1

Normal Model - Pure CNN Structure without Spatial Transform Layers:

The implementation is in asl_model/models.py, function get_model_1()

from tensorflow.keras import layers, models

model = models.Sequential()

model.add(layers.Conv2D(32, (3, 3), strides=(1, 1), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.BatchNormalization())

model.add(layers.MaxPool2D((2, 2)))

model.add(layers.Conv2D(64, (3, 3), strides=(1, 1), activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.BatchNormalization())

model.add(layers.MaxPool2D((2, 2)))

model.add(layers.Conv2D(128, (3, 3), strides=(1, 1), activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.MaxPool2D((2, 2)))

# finish feature extraction
model.add(layers.Flatten())

model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dropout(0.25))

model.add(layers.Dense(26, activation='softmax'))
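As a worked check of the layer arithmetic, the sketch below traces the spatial size of the feature map through the three Conv2D and MaxPool2D pairs listed above. It assumes the Keras defaults of "valid" padding for both layer types and a pooling stride equal to the pool size; the helper names are illustrative.

```python
def conv_out(n, kernel=3, stride=1):
    # Output size of a "valid" convolution along one dimension.
    return (n - kernel) // stride + 1

def pool_out(n, pool=2):
    # Output size of "valid" max pooling with stride equal to the pool size.
    return n // pool

size = 28
trace = [size]
for _ in range(3):  # three Conv2D + MaxPool2D pairs
    size = conv_out(size)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)

# The last convolution has 128 filters.
flattened = size * size * 128

print(trace)      # [28, 26, 13, 11, 5, 3, 1]
print(flattened)  # 128
```

Under these assumptions the Flatten layer hands a 128-element vector to the Dense(512) layer.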

STL Model - Spatial Transform Layer with CNN Structure

The implementation is in asl_model/stl/struct_a/model.py, function get_stn_a_model_8()

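import tensorflow as tf
from tensorflow.keras import initializers, layers

# stn, size, and STN_SEED are not defined in this excerpt
input_layers = layers.Input((size, size, 1))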
x = stn(input_layers)

x = layers.Conv2D(32, (3, 3), strides=(1, 1), activation='relu',
                  kernel_initializer=initializers.glorot_uniform(seed=STN_SEED))(x)
x = layers.BatchNormalization()(x)

x = layers.Conv2D(64, (3, 3), strides=(1, 1), activation='relu',
                  kernel_initializer=initializers.glorot_uniform(seed=STN_SEED))(x)
x = layers.BatchNormalization()(x)

x = layers.Conv2D(128, (3, 3), strides=(1, 1), activation='relu',
                  kernel_initializer=initializers.glorot_uniform(seed=STN_SEED))(x)
x = layers.BatchNormalization()(x)

x = layers.Flatten()(x)

x = layers.Dense(512, activation='relu')(x)
x = layers.Dropout(0.5)(x)

x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.25)(x)

output_layers = layers.Dense(26, activation="softmax")(x)

model = tf.keras.Model(input_layers, output_layers)
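Because this structure has no pooling layers, the feature map stays large until Flatten. The sketch below estimates the flattened size and the parameter count of the first Dense layer. It rests on two assumptions that the excerpt does not confirm: size is 28 (as in the normal model), and stn() returns a tensor with the same shape as its input. The function name is made up for illustration.

```python
def stl_flatten_and_dense_params(size=28, kernel=3, filters=128, dense_units=512):
    # Three "valid" 3x3 convolutions, each shrinking the side by kernel - 1.
    side = size
    for _ in range(3):
        side = side - kernel + 1
    flattened = side * side * filters
    # Dense layer parameters: one weight per input per unit, plus one bias per unit.
    dense_params = flattened * dense_units + dense_units
    return side, flattened, dense_params

side, flattened, dense_params = stl_flatten_and_dense_params()
print(side)          # 22
print(flattened)     # 61952
print(dense_params)  # 31719936
```

Under these assumptions, nearly all of the model's weights sit in the first Dense layer, which is one reason the heavier Dropout(0.5) after it is a sensible choice.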

c. Model Training

Basic

  • 57717 training images, 20% of which are used as validation data
  • 14430 test images, 555 for each letter

First, data-structure-selection

Select the best structure for the normal model and the STL model, using the following hyperparameters:

lr = 0.001
epoch = 10
batch_size = 128

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
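For a sense of scale, the following sketch works out how many images land in each split and how many batches one epoch takes with batch_size = 128. The exact rounding of the 20% split depends on how the split is performed, which this README does not state, so integer division is used here as an assumption.

```python
import math

TOTAL_TRAIN_IMAGES = 57717
VALIDATION_PERCENT = 20
BATCH_SIZE = 128

# Assumption: the validation share is rounded down to a whole image.
val_count = TOTAL_TRAIN_IMAGES * VALIDATION_PERCENT // 100
train_count = TOTAL_TRAIN_IMAGES - val_count

# The last batch of an epoch may be smaller than BATCH_SIZE.
train_steps = math.ceil(train_count / BATCH_SIZE)
val_steps = math.ceil(val_count / BATCH_SIZE)

print(train_count, val_count)  # 46174 11543
print(train_steps, val_steps)  # 361 91
```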

Second, train the best model of each type with callback functions, using the following settings:

BATCH = 128
EPOCH = 100 # max epoch

# callback functions
es_callback = tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
reduce_lr_callback = tf.keras.callbacks.ReduceLROnPlateau()

model.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy"],
)
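The effect of EarlyStopping(patience=5, restore_best_weights=True) can be illustrated with a small simulation. This is a simplified sketch of the stopping rule, not Keras code: it monitors a list of validation losses, ignores options such as min_delta and baseline, and the loss values are invented for the example.

```python
def simulate_early_stopping(val_losses, patience=5):
    """Return (stopped_epoch, best_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember this epoch and reset the counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Training ran to the end without triggering the rule.
    return len(val_losses) - 1, best_epoch

# Invented validation losses, one per epoch (0-indexed).
history = [0.90, 0.50, 0.40, 0.41, 0.42, 0.40, 0.43, 0.44, 0.60]
stopped_epoch, best_epoch = simulate_early_stopping(history, patience=5)
print(stopped_epoch, best_epoch)  # 7 2
```

Here training stops after five epochs without improvement, and with restore_best_weights=True the model would keep the weights from the best epoch rather than the last one.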

d. Model Evaluation

Normal Model

Validation Data - Epoch Accuracy

  • Train : 0.9917 (Orange)
  • Valid : 0.9864 (Blue)

    normal-acc-s normal-acc-l

Validation Data - Epoch Loss

  • Train : 0.02555 (Orange)
  • Valid : 0.05084 (Blue)

    normal-loss-s normal-loss-l

Testing Data - Total Accuracy : 89.4%

Testing Data - F1-Score Report:

STL Model

Validation Data - Epoch Accuracy

  • Train : 0.9871 (Orange)
  • Valid : 0.9883 (Blue)

    stl-acc-s stl-acc-l

Validation Data - Epoch Loss

  • Train : 0.05062 (Orange)
  • Valid : 0.04649 (Blue)

    stl-loss-s stl-loss-l

Testing Data - Total Accuracy : 90.6%

Testing Data - F1-Score Report:
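For readers unfamiliar with the metric, a per-class F1 score can be computed as sketched below. The labels are toy values invented for the example and are not results from this project; the function name is also illustrative.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: F1} where F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        scores[label] = 2 * tp / denominator if denominator else 0.0
    return scores

# Toy labels, not taken from the project's test set.
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "C"]
scores = per_class_f1(y_true, y_pred)
print(scores)  # {'A': 0.8, 'B': 0.8, 'C': 1.0}
```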

More evaluation charts are available in the folder charts.

Getting Started

Environment

conda create --name aslt python=3.8 -y

(Optional) If you want to use Jupyter, run these commands:

// activate the environment first
conda activate aslt
pip install ipykernel
python -m ipykernel install --user --name aslt --display-name "ASLT"

Install PyTorch for the rembg package

  1. Get the PyTorch install instructions on pytorch.org. For example:
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
