Hello, I am currently analyzing your code in an attempt to reproduce the performance reported in the paper for bs_roformer. While examining the code, I came across a few points I would like to ask about.
Upon reviewing the settings related to bs_roformer, I noticed that the configuration predominantly uses only vocals and other in the instruments setup. Generally, MSS datasets like MUSDB18 are composed of a 4-stem setup: [vocals, drums, bass, other], and the mixture is a combination of these four stems. However, your default setting is [vocals, other], and using this setting results in a mixture composed only of vocals + other. I am curious whether this configuration is an error or if there was a specific task intended for this particular setup.
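To make the concern concrete, here is a minimal sketch of what I mean. It assumes the mixture is built by summing only the stems listed in the instruments setting; the sample values and the build_mixture helper are hypothetical and are not taken from the repository.

```python
# Hypothetical toy stems: short lists of samples, not real audio.
stems = {
    "vocals": [0.10, 0.20, 0.30],
    "drums":  [0.05, 0.05, 0.05],
    "bass":   [0.02, 0.02, 0.02],
    "other":  [0.01, 0.03, 0.05],
}

def build_mixture(stems, instruments):
    """Sum only the stems named in `instruments`, sample by sample.

    Assumption: this mirrors a loader that forms the mixture from the
    configured instruments only.
    """
    length = len(next(iter(stems.values())))
    return [sum(stems[name][i] for name in instruments) for i in range(length)]

four_stem_mix = build_mixture(stems, ["vocals", "drums", "bass", "other"])
two_stem_mix = build_mixture(stems, ["vocals", "other"])

# Whatever the two-stem mixture leaves out is exactly drums + bass.
missing = [a - b for a, b in zip(four_stem_mix, two_stem_mix)]
print(missing)
```

Under that assumption, the two-stem mixture differs from the 4-stem mixture by drums + bass, unless the "other" stem in the dataset already contains everything except vocals.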
Additionally, while examining the code, I noticed that loudness augmentation is applied once during the load_random_mix function when performing mixup, and again in getitem. I would like to clarify whether applying loudness augmentation twice is by design, a misunderstanding on my part, or a coding error.
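As a rough illustration of why this matters, the sketch below applies a random gain twice to the same stem. The gain range of 0.5 to 1.5 and the helper names are assumptions for illustration only; they are not the values or functions used in the repository.

```python
import random

def random_gain(rng, lo=0.5, hi=1.5):
    """Draw one loudness scale factor (hypothetical range)."""
    return rng.uniform(lo, hi)

def apply_gain(samples, gain):
    """Scale every sample by the same factor."""
    return [s * gain for s in samples]

rng = random.Random(0)
stem = [0.1, -0.2, 0.3]

g1 = random_gain(rng)  # stands in for the scaling inside load_random_mix
g2 = random_gain(rng)  # stands in for the second scaling in getitem

twice = apply_gain(apply_gain(stem, g1), g2)

# Two successive gains are equivalent to a single gain of g1 * g2,
# so the effective range widens from [0.5, 1.5] to [0.25, 2.25].
effective_gain = g1 * g2
print(effective_gain)
```

If both steps draw from the same range, the combined gain covers a wider range than either step alone and is no longer uniformly distributed, which may or may not be intended.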
I would appreciate your response to these two questions.
Thank you.
Additionally, I would like to reproduce the performance presented in the paper using only the MUSDB18-HQ dataset. Have you trained the model using only the MUSDB18-HQ dataset? Also, did the performance results align closely with those reported in the paper?
Hello. I didn't try to reproduce the paper results using only the MUSDB18 dataset. I trained only vocals and bass models, and I used large datasets for them. Also, as far as I remember, the paper's authors trained an independent model for each MUSDB stem.