
Learning loss problem & predict procedure #27

Open
techzzt opened this issue Nov 1, 2022 · 6 comments

Comments

techzzt commented Nov 1, 2022

Hi George, when I "train models from scratch" with my dataset, the RMSE loss turns to NaN.
I would like to check the normalizer for my dataset; how can I do that?
If normalization is performed, I wonder whether the model predicts normalized values even in the masked sections.

Also, I want to inspect the fine-tuning model structure, but I cannot find it. Can you tell me how?
I also wonder whether the mask is applied to the test set during fine-tuning.

Thank you

gzerveas (Owner) commented Nov 2, 2022

Hi, normalization is done before applying any masks, and it can be done either on a "dataset-wide" basis (default) or a "per-sample" basis (by choosing --normalization per_sample_std or per_sample_minmax). You can check what the extracted normalizing values are within the Normalizer object, and what effect the normalizer has on your dataset dataframe, right after line 130 in main.py. If you think that standardization (subtracting the mean and dividing by the standard deviation) is the problem, you can use minmax normalization or per-sample normalization. However, first check that your initial dataframe does not contain NaNs or other problematic values; if need be, you can exclude the problematic data points or use interpolation.
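For a quick sanity check, here is a minimal sketch (not the repository's code; it assumes your data is already loaded as a numeric pandas DataFrame `df`, and `my_dataset.csv` is a hypothetical path) of how you could look for problematic values and inspect dataset-wide standardization statistics:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("my_dataset.csv")  # hypothetical path to your raw data

# 1. Look for NaN/inf values that would propagate into the RMSE loss.
print("NaNs per column:\n", df.isna().sum())
print("All values finite:", bool(np.isfinite(df.to_numpy()).all()))

# 2. Inspect the dataset-wide standardization statistics (what a
#    standardizing normalizer extracts). A zero std anywhere yields
#    NaNs after division.
mean, std = df.mean(), df.std()
print("Columns with zero std:", list(std[std == 0].index))

# 3. Standardize and verify that the result is well-behaved.
df_norm = (df - mean) / std
print("NaNs after standardization:", int(df_norm.isna().sum().sum()))
```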

I am not sure what you want to find out about fine-tuning. There is no mask applied when fine-tuning (for regression and classification), either on training or test data. Masking is done by the ImputationDataset or TransductionDataset (see here) only during self-supervised pre-training, i.e. when --task is imputation or transduction, respectively.
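For intuition, here is a rough sketch of the kind of stochastic geometric masking the paper describes for the imputation task. This is a simplified stand-in, not the repository's exact noise_mask implementation; the defaults (masking ratio 0.15, mean masked segment length 3) follow the paper:

```python
import numpy as np

def geometric_mask(seq_len, masking_ratio=0.15, mean_mask_length=3):
    """Boolean mask for a single variable: True = keep, False = masked.
    Alternates between masked and kept segments with geometrically
    distributed lengths, so that on average a `masking_ratio` fraction
    of the sequence is masked."""
    p_end_mask = 1.0 / mean_mask_length                            # masked segment ends
    p_end_keep = p_end_mask * masking_ratio / (1 - masking_ratio)  # kept segment ends
    keep = np.random.rand() > masking_ratio                        # initial state
    mask = np.empty(seq_len, dtype=bool)
    for t in range(seq_len):
        mask[t] = keep
        if np.random.rand() < (p_end_keep if keep else p_end_mask):
            keep = not keep
    return mask

# Masks are drawn independently for each variable of a (seq_len, num_vars) sample:
sample_mask = np.stack([geometric_mask(seq_len=100) for _ in range(7)], axis=1)
```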

techzzt (Author) commented Nov 3, 2022

Hi George, thanks for your reply. It resolved my doubts.

I have a few additional questions; please comment.

When imputation is performed on normalized data, will the predicted values for the masked regions also be on the normalized scale?

Also, DummyTSTransformerEncoder is referenced in the code but not defined in ts_transformer.py; does it serve any separate function?

Thank you very much for your reply.

gzerveas (Owner) commented Nov 3, 2022

The DummyTSTransformerEncoder was a linear model, using the same interface as the TSTransformerEncoder, meant to test that end-to-end results make sense. I removed it from ts_transformer.py, but I haven't removed it from the model selection code -- partially because I initially forgot about it, and partially because I thought it may serve as an example of how to add another model. You can completely ignore it.

Regarding your first question: if I correctly understand what you are asking, then yes, the predicted values will belong to the normalized distribution. During training and validation, they are compared with the normalized unmasked values available from the original data, so everything works. However, during inference, this means that you would have to de-normalize the predictions.

So, imagine that you want to use this code specifically for missing values imputation. You would first train it following the normal pre-training process, which means that the code will take care of normalization for you (we assume standardization here, which is the default). After this, you will have a trained model (and stored normalization values, i.e. mean and std). During an actual inference application (i.e. using --test_only mode), the data samples would be normalized by these stored values. However, you have to make sure that:

  1. You create the boolean masks marking the actual missing values yourself (this can be done e.g. by finding the indices where values are NaN, or whatever unambiguously marks values as missing) and provide those masks in line 35 of the ImputationDataset, instead of using noise_mask to stochastically generate masks which emulate missingness. You can, for example, modify ImputationDataset with an extra argument to switch to an "inference" mode, which would use your own masks instead of noise_mask.
  2. The values you consider missing (e.g. NaN) in the data should be replaced by the same value used during training for masking (by default, this is 0). Right now, this is done inside the collate function. I am mentioning this in case you optionally want to experiment with replacement values other than 0.

Finally, the model's predictions at the indices marked by the masks would have to be de-standardized (multiply by the stored std, and then add the mean). So, if you want to merge the newly predicted values with your original data, you can do so by replacing the missing values (using the boolean mask indices) with the predictions, and then de-standardizing the entire test dataframe (which has been previously normalized here). Or, separately de-standardize the predicted values, and merge them with the original test data.
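To make this concrete, here is a minimal sketch of that inference flow. All names are hypothetical stand-ins rather than the repository's API: `test_df` is a DataFrame whose NaNs mark genuinely missing values, `mean` and `std` are the per-variable statistics stored at training time, and `model_predict` stands in for running the trained imputation model:

```python
import numpy as np

X = test_df.to_numpy()        # (seq_len, num_vars), NaN where values are missing
missing = np.isnan(X)         # boolean mask of genuinely missing values (point 1)

# Normalize with the training-time statistics, then replace missing entries
# with the mask value used during training, 0 by default (point 2).
X_norm = (X - mean) / std
X_norm[missing] = 0.0

preds_norm = model_predict(X_norm, missing)  # predictions in normalized space

# De-standardize the predictions and merge them back into the original data.
preds = preds_norm * std + mean
X_imputed = np.where(missing, preds, X)
```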

I hope this helps!

techzzt (Author) commented Nov 6, 2022

Your reply was very helpful. I am currently studying time series representation learning, and your answer helped me understand it.

Lastly, I have an additional question from reading the paper. In Figure 5 you visualize the masked input and the prediction; is this the prediction for genuinely missing values, or the prediction for the masked sections?

There are similar questions in other issues, but I am asking because I still find it difficult to understand. I also want to compare the predicted values with the actual values; in Figure 5, were the values of some regions extracted separately and used for this? Is this defined in the Analyzer class in analysis.py?

Thank you for writing a good paper.

gzerveas (Owner) commented
Assuming I understand correctly what you are asking: the dataset I am using does not naturally have missing values. To emulate "missing" values, I use the standard stochastic masking described in the paper (the same one used for the imputation task). After training the model on imputation, it is evaluated on the same task on a validation/test set. As usual, the input is masked at various locations and variables, and the orange dots are the model's predictions for the masked locations. You can compare the predicted values with the original values at the masked locations using something like MSE. The Analyzer class is useful for classification tasks.
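As a small sketch of that comparison (assuming `targets` and `preds` are arrays of shape (seq_len, num_vars) and `masked` is a boolean array that is True at the positions hidden from the model):

```python
import numpy as np

masked_mse = np.mean((preds[masked] - targets[masked]) ** 2)
masked_rmse = np.sqrt(masked_mse)
print(f"RMSE at masked positions: {masked_rmse:.4f}")
```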

techzzt (Author) commented Nov 29, 2022

Thank you for your reply. Thanks to the well-organized code on GitHub, I was able to understand a lot about your paper. I am very interested in representation learning; thank you for open-sourcing this code.
