Add balance_weights to weight balanced batches #1588
Draft
FrenchKrab wants to merge 6 commits into pyannote:develop from FrenchKrab:balance_weights
But before caching training metadata was introduced

Squashed commit of the following:

commit d41ce0a
Author: Hervé BREDIN <[email protected]>
Date: Thu Jan 11 13:04:18 2024 +0100

    doc: fix typo in README

commit 8f477fa
Author: Hervé BREDIN <[email protected]>
Date: Tue Jan 9 13:06:09 2024 +0100

    fix(task): fix random generators (pyannote#1594)

    Before this change, each worker would select the same files, resulting in less randomness than expected.

commit eda0c51
Author: Hervé BREDIN <[email protected]>
Date: Mon Jan 8 17:05:05 2024 +0100

    Delete .github/ISSUE_TEMPLATE/feature_request.md

commit eb2e813
Author: Hervé BREDIN <[email protected]>
Date: Mon Jan 8 17:04:22 2024 +0100

    github: update config.yml (pyannote#1607)

commit 27cd91f
Author: Hervé BREDIN <[email protected]>
Date: Mon Jan 8 17:02:40 2024 +0100

    github: create config.yml

commit 42ef141
Author: Hervé BREDIN <[email protected]>
Date: Mon Jan 8 16:53:52 2024 +0100

    github: add bug_report.yml template

commit 808b170
Author: Hervé BREDIN <[email protected]>
Date: Mon Jan 8 16:36:24 2024 +0100

    feat: add MRE template

commit e21e7bb
Author: Hervé BREDIN <[email protected]>
Date: Mon Jan 8 09:52:19 2024 +0100

    ci: deactivate FAQtory

commit 80634c9
Author: Clément Pagés <[email protected]>
Date: Fri Dec 22 09:16:12 2023 +0100

    fix: update `isort` version to 5.12.0 in pre-commit-config (pyannote#1596)

    Co-authored-by: clement-pages <[email protected]>

commit 7bd88d5
Author: Hervé BREDIN <[email protected]>
Date: Wed Dec 20 21:26:42 2023 +0100

    feat(pipeline): add Waveform and SampleRate preprocessors (pyannote#1593)

commit 4d2d16b
Author: Hervé BREDIN <[email protected]>
Date: Wed Dec 20 16:03:13 2023 +0100

    doc: update benchmark section (pyannote#1592)

commit 66dd72b
Author: Hervé BREDIN <[email protected]>
Date: Fri Dec 15 16:10:51 2023 +0100

    feat(model): add `num_frames` and `receptive_field` to segmentation models

    Co-authored-by: Bilal Rahou <[email protected]>
(not tested in this branch)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
The `balance` option of the segmentation tasks accepts a list of `ProtocolFile` fields, e.g. `['database', 'foo']`. When batches are sampled, it looks at all existing combinations of values for these fields in the task protocol.

For example, if files come from databases `aishell` and `ami`, and their `foo` field is either `a` or `b`, we compute the cartesian product `[('aishell', 'a'), ('aishell', 'b'), ('ami', 'a'), ('ami', 'b')]`, and batches are created by randomly selecting one of these tuples and picking a sample from a matching file.

This PR allows weighting the random choice from the cartesian product. Given a `balance_weights` mapping, we sample from the cartesian product using `random.choices`, with weights determined as follows: for each tuple of the cartesian product, we find the longest matching (tuple) prefix in `balance_weights` and use its weight.

I'm not sure this approach is flexible/clean enough to be PR-ready, and it's hard to make the docstring concise, but I think it could be really useful :)
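
To make the longest-prefix rule concrete, here is a minimal standalone sketch. It is not the code from this PR: the helper `longest_prefix_weight`, the weight values, the tuple-keyed dict format, and the fallback weight of 1.0 for tuples with no matching prefix are all assumptions made for illustration. Only the field values and the use of `random.choices` come from the description above.

```python
import itertools
import random


def longest_prefix_weight(key, balance_weights, default=1.0):
    """Return the weight of the longest prefix of `key` present in `balance_weights`.

    Hypothetical helper: falls back to `default` (an assumed value) when no
    prefix of `key` is listed.
    """
    for length in range(len(key), 0, -1):
        prefix = key[:length]
        if prefix in balance_weights:
            return balance_weights[prefix]
    return default


# Values of the balanced fields, taken from the example in the description.
field_values = {"database": ["aishell", "ami"], "foo": ["a", "b"]}
combinations = list(itertools.product(*field_values.values()))

# Hypothetical weights keyed by tuple prefixes (assumed format).
balance_weights = {
    ("aishell",): 2.0,       # applies to every ('aishell', *) tuple ...
    ("aishell", "b"): 0.5,   # ... unless a longer prefix matches
}

# One weight per tuple of the cartesian product.
weights = [longest_prefix_weight(c, balance_weights) for c in combinations]

# Weighted random choice of the tuple from which the next sample is drawn.
rng = random.Random(0)
chosen = rng.choices(combinations, weights=weights, k=1)[0]
```

With these assumed weights, `('aishell', 'a')` inherits 2.0 from the shorter prefix `('aishell',)`, `('aishell', 'b')` uses its own entry 0.5, and both `ami` tuples fall back to the default.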