Skip to content

Latest commit

 

History

History
101 lines (64 loc) · 5.05 KB

README.md

File metadata and controls

101 lines (64 loc) · 5.05 KB

FaceCTRL: control your media player with your face

After being interrupted dozens of times a day while coding with my headphones on, I decided to find a solution that eliminates the stress of pausing and re-playing the song I was listening to.

The idea is trivial:

  • When you're in front of your PC with your headphones on: the music plays.
  • Someone interrupts you, and you have to remove your headphones: the music pause.
  • You walk away from your PC: the music pause.
  • You come back to your PC, and you put the headphones on: the music plays again.

However, the manual control of your player is still possible. If you decide to pause the music while you're still in front of your PC with your headphones on, the control of the media player is transferred to the player itself. To give back the control to playerctrl, just walk away from your PC (or hide the image captured from your webcam using a finger, for some second).

FaceCTRL takes control of your webcam as soon as you open your media player, and releases it when you close the player.

Demonstration video: https://www.youtube.com/watch?v=48N4IU5XB6c

Requirements

  • A webcam
  • Playerctl installed (pacman -S playerctl on Archlinux)
  • Python >= 3.7
  • OpenCV is not required to be installed system-wise, but it is recommended. The python package of OpenCV doesn't contain the pre-trained models for face localization (XML files) and you have to download them from the OpenCV repository. OpenCV installed system-wise, instead, usually ships them in the /usr/share/opencv4/haarcascades/ folder.

Installation

If you just want to use this tool without making any change, you use pip:

pip install --upgrade facectrl

Please note that this software is still alpha software.

For development: clone the repository and just pip install -e .

Usage

The project does not ship a pre-trained model; you have to train a model by yourself and use it.

Why? Because I don't have enough data of people with and without headphones to train a model able to generalize well. If you're interested in contributing by sharing your dataset (to remove the training phase and give to the user a ready to use model), please open an issue.

Dataset creation

The process of dataset creation is entirely automatic. Just execute:

python -m facectrl.dataset \
           --dataset-path ~/face \
           --classifier-params /usr/share/opencv4/haarcascades/haarcascade_frontalface_alt2.xml

where:

  • --dataset-path is the destination folder of your dataset. It contains 2 folders (on and off) with the captured images with headphones on and off.
  • --classifier-params is the path of the XML file containing your face detector parameters (Viola-Jones Haar classifier)

Follow the instructions displayed in the terminal.

Hint: move in front of the camera until you see your face in a window with an increasing number on the bottom right corner. Your face is now being tracked, thus try to acquire as many images as possible with different appearances. Acquire at least 1000 images with headphones on and 1000 images with headphones off.

Hint²: you can execute this script as many times you want, the new images are added to the dataset. Thus, repeat the dataset creation phase with natural light, artificial light, and different light conditions if possible.

If you want to share your dataset, please, open an issue! In this way, we can reach the goal of shipping a well-trained model together with FaceCTRL.

Training

You can train the 3 models available with this simple bash script:

for model in ae vae classifier; do
    python -m facectrl.ml.train --dataset-path ~/face/ --logdir ~/log_$model --epochs 100  --model $model
done

where:

  • --dataset-path is the path of your training dataset (see Dataset creation).
  • --logdir is the path of your trained model. This folder contains the logs (use tensorboard to see the training progress/result tensorboard --logdir $path), and the model that reached the highest validation performance converted in SavedModel file format.

Execution

The execution is straightforward, and I highly recommend to put this script in the startup script of your system (it's easy with systemd).

NOTE: you must execute this script before starting your media player.

python -m facectrl.ctrl \
           --logdir ~/log/ \
           --player spotify \
           --classifier-params /usr/share/opencv4/haarcascades/haarcascade_frontalface_alt2.xml \
           --metric binary_accuracy

where:

  • --logdir is the log directory specified during the training.
  • --player is the media player to control.
  • --classifier-params is the path of the XML file containing your face detector parameters (Viola-Jones Haar classifier). Use the same parameters using during the Dataset creation.
  • --metric is the metric used during the training for the model selection. For the classifier model is the binary_accuracy, for the vae and ae model is the AEAccuracy.