Handling Hyper-Param Optimization for multiple unique_ids in Auto_Neural_Forecast #1103
masadshoaib asked this question in Q&A
When I have train_data for two unique_ids, how do I specify that to AutoDilatedRNN?
My train_data, as shared above, has three columns: unique_id, ds, and y. But now, instead of a single unique_id (test-0), I have two (test-0 and test-1).
Do I need to specify the unique_ids somehow when calling the Auto model?
If I run hyperparameter optimization with AutoDilatedRNN on train_data containing a single unique_id, I get a lower error than with the non-optimized model (i.e., the regular DilatedRNN). This is expected, and it is the whole point of hyperparameter optimization.
However, when I call the Auto model (i.e., AutoDilatedRNN) on the train dataset containing two unique_ids, I end up with a higher error than the non-optimized model (i.e., the regular DilatedRNN) for one unique_id, and the opposite for the other.
Can you please guide me on this? Many thanks!
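To make the comparison concrete, here is a rough sketch (not the code I actually ran, just my illustration) of what I mean by tuning per series: one AutoDilatedRNN is fit on each unique_id separately by filtering the panel dataframe (train_data2 is defined further down), as opposed to the single global fit shown in the snippet below.
# Sketch only: tune one AutoDilatedRNN per series instead of one global model.
per_series_fcsts = []
for uid in train_data2['unique_id'].unique():
    single_series = train_data2[train_data2['unique_id'] == uid]
    nf_single = NeuralForecast(
        models=[AutoDilatedRNN(h=12, num_samples=25, backend='optuna')],
        freq='MS',
    )
    nf_single.fit(df=single_series, val_size=12)
    per_series_fcsts.append(nf_single.predict())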
Here's the code snippet (should you need to run it):
Imports and Installs
!pip install nixtla
!pip install NeuralForecast
from neuralforecast import NeuralForecast
from neuralforecast.models import DilatedRNN
from neuralforecast.auto import AutoDilatedRNN
from neuralforecast.losses.pytorch import RMSE, MAE
import pandas as pd
from typing import List, Tuple, Dict
from sklearn.metrics import mean_absolute_error, mean_squared_error
Setting up the dataframes
train_data2 = pd.DataFrame({
'unique_id': ['test-0'] * 60 + ['test-1'] * 60, # 60 rows per series (120 total)
'ds': pd.date_range(start='2018-01-01', periods=60, freq='MS').tolist() + pd.date_range(start='2018-01-01', periods=60, freq='MS').tolist(), # 60 elements each for 'test-0' and 'test-1'
'y': [63202.0, 103224.5, 111117.98, 114329.0, 108873.25, 54203.02, 86218.1202, 93635.24, 128048.0, 124754.46, 131659.00999999998, 121901.49, 89554.36009999999, 138511.0, 170329.0, 143534.47999999998, 77849.5001, 76367.70999999998, 88739.4099, 122639.92, 139260.0, 158036.0001, 159479.5, 53163.895500000006, 104816.511, 161734.01, 229404.3072, 219401.8639, 74706.5978, 120536.01999999996, 108064.7785, 107818.415, 153914.315, 129599.932, 117563.8949, 62912.5835, 174501.56300000002, 191323.105, 226644.8306, 149521.4569, 39708.3959, 82081.96659999999, 51392.1871, 107311.5811, 110006.954, 121564.97430000002, 112380.19569999998, 89952.5831, 142210.4084, 172362.2267, 163108.3762, 118614.05010000002, 51305.5352, 84403.1251, 110113.65610000002, 128331.4126, 123344.89500000002, 145871.16280000002, 118498.74099999998, 81374.17670000001, 48710.0, 50648.33, 62477.0, 63663.0, 53089.0, 32879.0, 29453.0, 27483.0, 57825.0, 71060.0, 59058.0, 55709.0, 41270.79, 66417.0, 81619.0, 108195.0001, 20043.0, 50103.0, 34913.0, 50942.0, 62446.0, 53002.0, 93837.0, 28171.0, 54429.0, 81507.9167, 100153.833, 122369.0, 41400.0, 54521.583, 36117.2503, 45895.25, 78801.25, 60997.25, 56257.5834, 47694.8333, 68566.12520000001, 78821.209, 125403.4159, 54204.4999, 10475.6249, 75008.7077, 41501.1667, 41225.0, 35159.75, 77580.9584, 55342.500100000005, 42327.5417, 88843.9168, 104285.0417, 50964.0001, 42008.3333, 72844.169, 73522.5421, 69892.83350000001, 82048.9583, 106956.3334, 105103.0416, 100294.2511, 60981.7502]
})
actual_data = pd.DataFrame({
'unique_id': ['test-0'] * 12 + ['test-1']*12,
'ds': pd.date_range(start='2023-01-01', periods=12, freq='MS').tolist() + pd.date_range(start='2023-01-01', periods=12, freq='MS').tolist(),
'Actual_Qty': [177578.9375, 263270.37490000005, 130901.2323, 46545.7235, 79886.51950000001, 57278.882, 81313.90830000001, 114359.66060000002, 66363.0761, 151170.48500000002, 107519.225, 114884.1, 214928.6671, 24430.334000000003, 77216.0906, 59655.919, 58646.174900000005, 66150.0943, 56996.2968, 85112.5022, 67122.20880000001, 65644.544, 85356.415, 84726.014]
})
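As a quick sanity check (my own addition, plain pandas, nothing NeuralForecast-specific), I confirm that each series has 60 monthly training observations and 12 actuals:
# Sanity check: 60 training rows and 12 actual rows per unique_id.
print(train_data2.groupby('unique_id')['ds'].agg(['count', 'min', 'max']))
print(actual_data.groupby('unique_id')['ds'].agg(['count', 'min', 'max']))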
Default DilatedRNN without hyperparameter optimization
horizon = 12
max_steps = 1000
models = [
DilatedRNN(h=horizon, max_steps=max_steps),
]
nf = NeuralForecast(models=models, freq='MS')
nf.fit(df=train_data2, val_size=12)
fcst_df = nf.predict()
fcst_df = fcst_df.reset_index()
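As a side note, a rolling check of validation error on the training data can also be done with NeuralForecast's cross_validation; the keyword arguments below (n_windows, step_size) reflect the version I'm using and may differ in yours.
# Hedged sketch: rolling evaluation of the default model on the training tail.
nf_cv = NeuralForecast(models=[DilatedRNN(h=horizon, max_steps=max_steps)], freq='MS')
cv_df = nf_cv.cross_validation(df=train_data2, n_windows=2, step_size=horizon)
print(cv_df.head())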
AutoDilatedRNN (hyperparameter optimization) with Optuna backend
nf4 = NeuralForecast(
models=[
AutoDilatedRNN(h=horizon, num_samples=25, backend='optuna'),
],
freq='MS'
)
nf4.fit(df=train_data2, val_size=12)
fcst_df_auto = nf4.predict()
fcst_df_auto = fcst_df_auto.reset_index()
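To compare the tuned hyperparameters against the DilatedRNN defaults, I also look at the search results. I'm assuming here that, with backend='optuna', the fitted Auto model exposes the Optuna study on its results attribute; that attribute name may differ across NeuralForecast versions.
# Assumption: the Optuna study is stored on the fitted Auto model as `results`.
study = nf4.models[0].results
print(study.best_trial.params)          # best hyperparameter configuration found
print(study.trials_dataframe().head())  # per-trial validation losses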
Merging the two forecasts as well as the actual data
final_df3 = fcst_df_auto.merge(fcst_df, on=['unique_id', 'ds'])
final_df3 = final_df3.merge(actual_data, on=['unique_id', 'ds'])
Result
print('DilatedRNN RMSE')
for id in final_df3['unique_id'].unique():
    subset = final_df3[final_df3['unique_id'] == id]
    print(id, mean_squared_error(subset['Actual_Qty'], subset['DilatedRNN'])**0.5)

print('AutoDilatedRNN RMSE')
for id in final_df3['unique_id'].unique():
    subset = final_df3[final_df3['unique_id'] == id]
    print(id, mean_squared_error(subset['Actual_Qty'], subset['AutoDilatedRNN'])**0.5)
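For completeness, the same per-series RMSE can be computed without the explicit loop (plain pandas/sklearn, nothing NeuralForecast-specific):
# Per-series RMSE for both models in one pass.
rmse_by_series = final_df3.groupby('unique_id').apply(
    lambda g: pd.Series({
        'DilatedRNN_RMSE': mean_squared_error(g['Actual_Qty'], g['DilatedRNN']) ** 0.5,
        'AutoDilatedRNN_RMSE': mean_squared_error(g['Actual_Qty'], g['AutoDilatedRNN']) ** 0.5,
    })
)
print(rmse_by_series)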