
Implementation of a novel 'helicality' algorithm that quantifies the octave equivalence of frequency sub-bands in an audio dataset.


sripathisridhar/sridhar2020ismir


Helicality: An Isomap-based Measure of Octave Equivalence in Audio Data

This repository accompanies the Late-Breaking Demo of the same title, presented at ISMIR 2020.
In this paper, we introduce a novel algorithm to measure the octave equivalence of audio datasets. Octave equivalence serves as domain knowledge in several MIR systems, including the chromagram, spiral convolutional networks, and the harmonic CQT. Prior work has applied the Isomap manifold learning algorithm to unlabeled audio data to embed frequency sub-bands in 3-D space, where the Euclidean distances between sub-bands are inversely proportional to the strength of their Pearson correlations. However, discovering octave equivalence via Isomap requires visual inspection and does not scale. To address this problem, we define "helicality" as the goodness of fit of the 3-D Isomap embedding to a Shepard-Risset helix. Our method is unsupervised and uses a custom Frank-Wolfe algorithm to minimize a least-squares objective inside a convex hull. Numerical experiments indicate that isolated musical notes have the highest helicality, followed by speech and then drum hits.
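The pipeline in the abstract can be sketched in NumPy as two hypothetical helpers, under stated assumptions. isomap_subbands embeds the sub-bands (the columns of a frames-by-bands feature matrix) in 3-D. It converts Pearson correlations r into distances via sqrt(2(1 - r)), which is the Euclidean distance between z-scored, unit-length sub-bands and shrinks as the correlation grows. This mapping, the neighbor count, and the requirement that the neighbor graph be connected are illustrative choices, not taken from the repository. frank_wolfe_hull_lsq is a generic Frank-Wolfe solver with exact line search for the least-squares problem of finding the point of a convex hull closest to a target. It illustrates the kind of optimization the abstract names, but it does not reproduce the authors' helix-fitting objective or their custom variant.

```python
import numpy as np


def isomap_subbands(X, n_neighbors=5, n_components=3):
    """Embed the columns (sub-bands) of X, shape (n_frames, n_bands), via Isomap.

    Hypothetical helper. Assumes the symmetric k-NN graph is connected and
    that no sub-band is constant (otherwise correlations are undefined).
    """
    C = np.corrcoef(X, rowvar=False)  # (n_bands, n_bands) Pearson correlations
    # Assumed distance: Euclidean distance between z-scored, unit-length
    # sub-bands; it decreases as the correlation increases.
    D = np.sqrt(np.clip(2.0 * (1.0 - C), 0.0, None))
    n = D.shape[0]

    # Symmetric k-nearest-neighbor graph (inf means "no edge").
    D_nn = D.copy()
    np.fill_diagonal(D_nn, np.inf)  # a band is not its own neighbor
    neighbors = np.argsort(D_nn, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = neighbors.ravel()
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    G[rows, cols] = D[rows, cols]
    G = np.minimum(G, G.T)

    # Geodesic (shortest-path) distances via Floyd-Warshall.
    for k in range(n):
        G = np.minimum(G, G[:, [k]] + G[[k], :])

    # Classical MDS on the geodesic distances.
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


def frank_wolfe_hull_lsq(P, y, n_iter=1000, tol=1e-10):
    """Return the point x of conv(rows of P) minimizing ||x - y||^2, plus weights w.

    Generic Frank-Wolfe over the probability simplex; x = w @ P.
    """
    m = P.shape[0]
    w = np.full(m, 1.0 / m)  # start at the centroid of the vertices
    x = w @ P
    for _ in range(n_iter):
        r = x - y                      # residual; the gradient w.r.t. x is 2 * r
        s = int(np.argmin(P @ r))      # linear minimization oracle: best vertex
        d = P[s] - x                   # Frank-Wolfe direction
        gap = -(r @ d)                 # half the Frank-Wolfe duality gap
        if gap <= tol:
            break
        gamma = min(1.0, gap / (d @ d))  # exact line search for a quadratic
        w *= 1.0 - gamma
        w[s] += gamma
        x = x + gamma * d
    return x, w
```

Usage: Y = isomap_subbands(X) gives one 3-D point per sub-band. In practice, sklearn.manifold.Isomap(n_components=3, metric="precomputed") applied to the distance matrix D could replace the hand-written Floyd-Warshall and classical MDS steps; they are spelled out here only to make each stage visible.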

Dependencies

mir-data
sklearn, scipy, numpy (core numerical computation)
librosa (audio feature extraction)
matplotlib, colorcet (plotting)
h5py, json (data handling)

Download and run

Dataset features are pre-computed and stored in the corresponding .h5 files in the root directory.
To run the analysis, execute main.py from a terminal and pass the name of the dataset to test with the -d flag:

python3 main.py -d tinysol

Plots are saved to the ./convexHull sub-directory by default.
Numerical results are saved as <dataset>_helicality.json in the root directory.
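To inspect the numerical results from Python rather than opening the file by hand, a small loader could look like the following. load_helicality is a hypothetical helper, not part of the repository. Because the README does not document the JSON schema, the parsed object is returned unchanged, and root is assumed to point at the repository root.

```python
import json
from pathlib import Path


def load_helicality(dataset, root="."):
    """Load <dataset>_helicality.json from root (hypothetical helper).

    The internal structure of the JSON file is not assumed; the parsed
    object is returned as-is.
    """
    path = Path(root) / f"{dataset}_helicality.json"
    with path.open() as f:
        return json.load(f)
```

For example, after python3 main.py -d tinysol, load_helicality("tinysol") would return the contents of tinysol_helicality.json.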

Datasets

TinySOL (isolated notes played on 14 different instruments)
ENST-drums (dry_mix subset, containing isolated drum hits)
NTVOW (North Texas Vowel Dataset, containing 12 vowel utterances from 50 speakers)

Links

Pre-print
ISMIR Presentation Video
