questions about parameters #122

Open
miniTsl opened this issue Oct 14, 2023 · 11 comments

@miniTsl

miniTsl commented Oct 14, 2023

I am new to time series forecasting and have some questions about the dataset parameters. What is the meaning of d_input, d_output, l_output, and L in src/dataloaders/xxx.py, and what is __l_max in configs/datasets/xxx.yaml?

Thank you so much~

@albertfgu
Contributor

These refer to the dimensionality of the target inputs and outputs. __l_max specifies the maximum sequence length for that dataset; it's a convenience flag that other parts of the config can use via interpolation. For example, some sequence model layers require knowing the maximum sequence length, and they can set an internal length parameter to ${dataset.__l_max} to grab this value in the config.

@longdvt

longdvt commented Jan 11, 2024

I'm attempting to apply S4 to my experiments, but I'm confused about the maximum sequence length (l_max). If my dataset consists of sequences that all have length l_max, am I allowed to sample a subsequence with a length (seq_len) less than l_max during training, or must I use full-length sequences exclusively?
In addition, if I only sample sequences with a length (seq_len) less than l_max during training, will that affect the results at inference time, when I use the step_function to process a sequence whose length reaches l_max?

@albertfgu
Contributor

The sequences should be padded to the max length if they're shorter. If you need to deal with generation and also have varying length sequences, I'd strongly recommend using the diagonal (S4D) versions.
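
A minimal sketch of padding shorter sequences to the max length, assuming a batch-first `(batch, length, d_input)` layout and zero padding on the right. The helper name `pad_to_length`, the mask, and the toy sizes are illustrative assumptions rather than code from the repository.

```python
from typing import Tuple

import torch
import torch.nn.functional as F


def pad_to_length(seq: torch.Tensor, l_max: int) -> Tuple[torch.Tensor, torch.Tensor]:
    """Right-pad a (length, d_input) sequence with zeros up to l_max.

    Returns the padded sequence and a boolean mask marking the real timesteps.
    """
    length = seq.shape[0]
    if length > l_max:
        raise ValueError(f"sequence length {length} exceeds l_max={l_max}")
    # F.pad reads pairs starting from the last dimension:
    # (feature_left, feature_right, time_left, time_right).
    padded = F.pad(seq, (0, 0, 0, l_max - length))
    mask = torch.arange(l_max) < length
    return padded, mask


# Toy values for illustration only.
l_max = 8
seqs = [torch.ones(5, 1), torch.ones(8, 1), torch.ones(3, 1)]
padded, masks = zip(*(pad_to_length(s, l_max) for s in seqs))
batch = torch.stack(padded)      # (batch, l_max, d_input)
batch_mask = torch.stack(masks)  # (batch, l_max)
print(batch.shape, batch_mask.sum(dim=1).tolist())
```

The mask is one way to exclude padded timesteps from the loss; whether that is needed depends on the task.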

@hai-h-nguyen

Hi @albertfgu, I am trying to use S4 in POMDP RL. During training, I sample sub-episodes with a fixed length of 64 to train S4. A whole episode, however, can last up to 1000 timesteps, so I currently set l_max = 1000. I sample sub-episodes because training on whole episodes (padded to a common length of 1000) is very slow. Do you have any suggestions in this case? Thank you!

@albertfgu
Contributor

I'm not sure I understand the question. If during training you only see sequences of length 64, why not set l_max=64? Or if using the diagonal version, you don't need to set l_max at all.

@hai-h-nguyen

Hi, I forgot to mention that setting l_max=64 gave bad performance.

@albertfgu
Contributor

I can't help diagnose this without much more information about the training setup and model. As a first step, I would use the diagonal variant and not set l_max.

@vahuynh12

vahuynh12 commented Apr 9, 2024

Hello, I am very new to working with audio data and had some confusion about the parameters.

I am trying to train Sashimi for music generation. In sashimi.py, there is a comment that d_model is 64 for all the Sashimi experiments. Looking at the file and trying the example code in Sashimi's README, it seems that d_input == d_model == d_output (i.e., all are 64)? What does it mean for d_input and d_output to be 64 for audio data (e.g., the beethoven dataset)? When I load my audio dataset using torchaudio, an audio sample has a shape of, for example, 1 x 128000, so I assumed that meant the audio data has one dimension.

My apologies for the basic questions. Thank you for your help!

@albertfgu
Contributor

The audio signal has a dimension of $1$, but the first step in the usual sequence model pipeline is that it gets projected by an encoder (in this case, you'd want to use a simple nn.Linear projection) to the model dimension (d_model=64 in this case).
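
To make the shapes concrete, here is a minimal sketch of that projection. It assumes the waveform arrives in a `(channels, time)` layout, as in the 1 x 128000 example above, and that the model expects a batch-first `(batch, length, features)` layout; the layout expected by a particular module may differ. A random tensor stands in for real audio.

```python
import torch
import torch.nn as nn

d_model = 64  # model dimension mentioned in this thread

# Stand-in for a loaded waveform of shape (channels, time) = (1, 128000).
waveform = torch.randn(1, 128000)

# Rearrange to (batch, length, d_input) with d_input = 1.
x = waveform.transpose(0, 1).unsqueeze(0)  # (1, 128000, 1)

# nn.Linear acts on the last dimension, so each timestep is projected
# independently from 1 feature to d_model features.
encoder = nn.Linear(1, d_model)
h = encoder(x)  # (1, 128000, 64)
print(x.shape, h.shape)
```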

@vahuynh12

vahuynh12 commented Apr 9, 2024

Thank you, that makes sense! So for a custom sequence model, I would have an encoder to project my audio data to d_model dimensions, then feed that to one or more Sashimi layers, then use a decoder (e.g., another nn.Linear) to project the Sashimi output to 1 dimension as the final output audio sequence?

@albertfgu
Contributor

That's right.
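
A minimal sketch of the encoder, backbone, decoder arrangement described above. The class name `AudioSequenceModel` is hypothetical, and `nn.Identity` stands in for the Sashimi backbone so that the snippet runs without the repository. If the real backbone returns extra values alongside its output (for example a recurrent state), the forward pass would need to unpack them.

```python
import torch
import torch.nn as nn


class AudioSequenceModel(nn.Module):
    """Encoder -> backbone -> decoder wrapper (hypothetical name)."""

    def __init__(self, backbone: nn.Module, d_input: int = 1,
                 d_model: int = 64, d_output: int = 1):
        super().__init__()
        self.encoder = nn.Linear(d_input, d_model)   # 1 -> d_model
        self.backbone = backbone                     # maps d_model -> d_model
        self.decoder = nn.Linear(d_model, d_output)  # d_model -> 1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x)      # (batch, length, d_model)
        h = self.backbone(h)     # (batch, length, d_model)
        return self.decoder(h)   # (batch, length, d_output)


# nn.Identity is only a placeholder for the actual sequence model.
model = AudioSequenceModel(backbone=nn.Identity())
x = torch.randn(2, 1024, 1)  # (batch, length, d_input), toy sizes
y = model(x)
print(y.shape)
```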
