WIP data augmentations for audio dataset #83

BirgerMoell · 2022-12-08T10:20:58Z

Here is a first attempt at adding data augmentations to the Whisper training script. A nice improvement would be to put the augmentations behind a flag instead of running them on all data.
There is a run.sh that seems to work for trying it out.
Since the data is loaded in streaming mode, it will probably take a while to load everything and try it out.
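
A minimal sketch of the flag idea, assuming Gaussian noise as the only augmentation and a plain boolean switch. The names `maybe_augment`, `add_gaussian_noise`, `do_augment`, and `noise_std` are hypothetical and not taken from the PR's script; the example layout only mirrors the `audio` and `sentence` fields of the dataset.

```python
import random
from typing import List, Optional


def add_gaussian_noise(samples: List[float], noise_std: float, rng: random.Random) -> List[float]:
    """Return a copy of `samples` with zero-mean Gaussian noise added."""
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def maybe_augment(
    example: dict,
    do_augment: bool,
    noise_std: float = 0.005,
    rng: Optional[random.Random] = None,
) -> dict:
    """Augment one example only when the (hypothetical) flag is set."""
    if not do_augment:
        # Flag off: hand the example back untouched.
        return example
    rng = rng or random.Random()
    # Copy the nested audio dict so the input example is not mutated.
    audio = dict(example["audio"])
    audio["array"] = add_gaussian_noise(audio["array"], noise_std, rng)
    return {**example, "audio": audio}


# Toy example with the same audio/sentence layout as the dataset.
example = {
    "audio": {"array": [0.0, 0.1, -0.1, 0.05], "path": None, "sampling_rate": 48000},
    "sentence": "hello world",
}

unchanged = maybe_augment(example, do_augment=False)
augmented = maybe_augment(example, do_augment=True, rng=random.Random(0))
```

In a training script the boolean would typically come from a command-line argument, so the same code path could run with or without augmentation.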

BirgerMoell · 2022-12-08T10:44:56Z

I got an error running the code, so this is currently NOT WORKING.

  File "/home/bmoell/community-events/whisper-fine-tuning-event/stream_with_augmentations.py", line 661, in <module>
    main()
  File "/home/bmoell/community-events/whisper-fine-tuning-event/stream_with_augmentations.py", line 398, in main
    raw_datasets["train"] = augment_dataset(raw_datasets["train"])
  File "/home/bmoell/community-events/whisper-fine-tuning-event/stream_with_augmentations.py", line 299, in augment_dataset
    dataset_name = interleave_datasets([dataset_name, augmented_noise, augmented_pitch, augmented_time_stretch])
  File "/home/bmoell/miniconda3/envs/fine-tune/lib/python3.9/site-packages/datasets/combine.py", line 128, in interleave_datasets
    return _interleave_iterable_datasets(
  File "/home/bmoell/miniconda3/envs/fine-tune/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 1478, in _interleave_iterable_datasets
    _check_if_features_can_be_aligned([dset.features for dset in datasets])
  File "/home/bmoell/miniconda3/envs/fine-tune/lib/python3.9/site-packages/datasets/features/features.py", line 2000, in _check_if_features_can_be_aligned
    raise ValueError(
ValueError: The features can't be aligned because the key audio of features {'client_id': Value(dtype='string', id=None), 'path': Value(dtype='string', id=None), 'audio': {'array': Sequence(feature=Value(dtype='float32', id=None), length=-1, id=None), 'path': Value(dtype='string', id=None), 'sampling_rate': Value(dtype='int64', id=None)}, 'sentence': Value(dtype='string', id=None), 'up_votes': Value(dtype='int64', id=None), 'down_votes': Value(dtype='int64', id=None), 'age': Value(dtype='string', id=None), 'gender': Value(dtype='string', id=None), 'accent': Value(dtype='string', id=None), 'locale': Value(dtype='string', id=None), 'segment': Value(dtype='string', id=None)} has unexpected type - {'array': Sequence(feature=Value(dtype='float32', id=None), length=-1, id=None), 'path': Value(dtype='string', id=None), 'sampling_rate': Value(dtype='int64', id=None)} (expected either Audio(sampling_rate=48000, mono=True, decode=True, id=None) or Value("null").

sanchit-gandhi · 2022-12-08T15:10:58Z

Hey @BirgerMoell! Super cool PR! Would love to see how data aug impacts Whisper training. Could you try updating datasets to main and seeing if that fixes the issue?

pip install git+https://github.com/huggingface/datasets

Vaibhavs10 · 2022-12-20T17:28:37Z

Hi @BirgerMoell - This is a really wonderful PR. Just wondering if you have double-checked @sanchit-gandhi's suggestion? We'd love to merge this!

BirgerMoell added 2 commits December 8, 2022 11:09

Added augmentation code

a66b0a8

Working augmentations

259c430

Vaibhavs10 requested a review from sanchit-gandhi December 8, 2022 10:38

asr-lord mentioned this pull request Dec 16, 2022

augmentation part parambharat/whisper-finetuning#1

Open
