WIP data augmentations for audio dataset #83

Open
wants to merge 2 commits into
base: main

BirgerMoell

Here is a first attempt at adding data augmentations to the Whisper training script. A nice improvement would be to put the augmentations behind a flag instead of applying them to all of the data.
There is a run.sh that seems to work for trying it out.
Since the data is loaded in streaming mode, it will probably take a while to load everything and test it.
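
A minimal sketch of what such a flag could look like, in plain Python. The flag name `--apply_augmentations`, the helper names, and the list-of-floats audio representation are assumptions made for illustration; they are not taken from the PR's script, and only additive Gaussian noise is shown.

```python
import argparse
import random


def add_gaussian_noise(samples, noise_std=0.005, rng=None):
    """Return a new list of samples with zero-mean Gaussian noise added."""
    rng = rng or random.Random()
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def maybe_augment(examples, enabled, prob=0.5, seed=0):
    """Lazily yield examples, augmenting each with probability `prob`.

    Works on any iterable, so it also fits a streamed dataset.
    When `enabled` is False every example passes through unchanged.
    """
    rng = random.Random(seed)
    for example in examples:
        if enabled and rng.random() < prob:
            example = dict(example)  # shallow copy, leave the input untouched
            example["audio"] = add_gaussian_noise(example["audio"], rng=rng)
        yield example


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flag name: augmentations are off unless it is passed.
    parser.add_argument("--apply_augmentations", action="store_true")
    return parser.parse_args(argv)
```

Because the augmentation is applied per example inside one stream, this variant does not need to combine several augmented copies of the dataset afterwards.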

@BirgerMoell
Author

I got an error running the code. So this is currently NOT WORKING.

  File "/home/bmoell/community-events/whisper-fine-tuning-event/stream_with_augmentations.py", line 661, in <module>
    main()
  File "/home/bmoell/community-events/whisper-fine-tuning-event/stream_with_augmentations.py", line 398, in main
    raw_datasets["train"] = augment_dataset(raw_datasets["train"])
  File "/home/bmoell/community-events/whisper-fine-tuning-event/stream_with_augmentations.py", line 299, in augment_dataset
    dataset_name = interleave_datasets([dataset_name, augmented_noise, augmented_pitch, augmented_time_stretch])
  File "/home/bmoell/miniconda3/envs/fine-tune/lib/python3.9/site-packages/datasets/combine.py", line 128, in interleave_datasets
    return _interleave_iterable_datasets(
  File "/home/bmoell/miniconda3/envs/fine-tune/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 1478, in _interleave_iterable_datasets
    _check_if_features_can_be_aligned([dset.features for dset in datasets])
  File "/home/bmoell/miniconda3/envs/fine-tune/lib/python3.9/site-packages/datasets/features/features.py", line 2000, in _check_if_features_can_be_aligned
    raise ValueError(
ValueError: The features can't be aligned because the key audio of features {'client_id': Value(dtype='string', id=None), 'path': Value(dtype='string', id=None), 'audio': {'array': Sequence(feature=Value(dtype='float32', id=None), length=-1, id=None), 'path': Value(dtype='string', id=None), 'sampling_rate': Value(dtype='int64', id=None)}, 'sentence': Value(dtype='string', id=None), 'up_votes': Value(dtype='int64', id=None), 'down_votes': Value(dtype='int64', id=None), 'age': Value(dtype='string', id=None), 'gender': Value(dtype='string', id=None), 'accent': Value(dtype='string', id=None), 'locale': Value(dtype='string', id=None), 'segment': Value(dtype='string', id=None)} has unexpected type - {'array': Sequence(feature=Value(dtype='float32', id=None), length=-1, id=None), 'path': Value(dtype='string', id=None), 'sampling_rate': Value(dtype='int64', id=None)} (expected either Audio(sampling_rate=48000, mono=True, decode=True, id=None) or Value("null").
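
Reading the message: the `audio` column of one of the streams passed to `interleave_datasets` is typed as a plain dict of `array`, `path` and `sampling_rate`, while another stream still has it typed as `Audio(sampling_rate=48000, ...)`, so the schemas cannot be merged. The toy model below illustrates that kind of check in plain Python. It is not the library's implementation; the function names and the string stand-ins for feature types are made up for illustration.

```python
def check_schemas_match(schemas):
    """Raise ValueError if a shared column has a different type in any schema."""
    reference = schemas[0]
    for schema in schemas[1:]:
        for key, expected in reference.items():
            found = schema.get(key)
            if found is not None and found != expected:
                raise ValueError(
                    f"key {key!r}: expected {expected!r}, found {found!r}"
                )


def interleave(streams, schemas):
    """Round-robin over the streams, stopping when the shortest one ends."""
    check_schemas_match(schemas)
    for group in zip(*streams):
        yield from group


# String stand-ins for the two column types seen in the traceback.
original = {"audio": "Audio(sampling_rate=48000)", "sentence": "string"}
augmented = {"audio": "dict(array, path, sampling_rate)", "sentence": "string"}
```

In this model, interleaving `original` with `augmented` fails on the `audio` key, whereas interleaving streams that share one schema succeeds.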

@sanchit-gandhi
Contributor

Hey @BirgerMoell! Super cool PR! Would love to see how data aug impacts Whisper training. Could you try updating datasets to main and seeing if that fixes the issue?

pip install git+https://github.com/huggingface/datasets

@Vaibhavs10
Member

Hi @BirgerMoell - This is a really wonderful PR. Just wondering if you've had a chance to try @sanchit-gandhi's suggestion? We'd love to merge this!
