Handling Hyper-Param Optimization for multiple unique_ids in Auto_Neural_Forecast #1103
masadshoaib asked this question in Q&A
When I have train_data for two unique_ids, how do I specify that to AutoDilatedRNN?
My train_data, as shared above, has three columns: unique_id, ds, and y. But now, instead of a single unique_id (test-0), I have two (test-0 and test-1).
Do I need to specify the unique_ids somehow when calling the Auto model?
If I run hyperparameter optimization with AutoDilatedRNN on train_data containing a single unique_id, I get a lower error than with the non-optimized model (i.e., the regular DilatedRNN). This is expected, and it is the whole point of hyperparameter optimization.
However, when I call the Auto model (i.e., AutoDilatedRNN) on the train dataset containing two unique_ids, I end up with a higher error than the non-optimized model (i.e., the regular DilatedRNN) for one unique_id, and the opposite for the other.
Can you please guide me on this? Many thanks!
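To make the comparison concrete, here is a rough sketch (not the code I actually ran, just my illustration) of what I mean by tuning per series: one AutoDilatedRNN is fit on each unique_id separately by filtering the panel dataframe (train_data2 is defined further down), as opposed to the single global fit shown in the snippet below.
# Sketch only: tune one AutoDilatedRNN per series instead of one global model.
per_series_fcsts = []
for uid in train_data2['unique_id'].unique():
    single_series = train_data2[train_data2['unique_id'] == uid]
    nf_single = NeuralForecast(
        models=[AutoDilatedRNN(h=12, num_samples=25, backend='optuna')],
        freq='MS',
    )
    nf_single.fit(df=single_series, val_size=12)
    per_series_fcsts.append(nf_single.predict())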
Here's the code snippet (should you need to run it):
Imports and Installs
!pip install nixtla
!pip install NeuralForecast
from neuralforecast import NeuralForecast
from neuralforecast.models import DilatedRNN
from neuralforecast.auto import AutoDilatedRNN
from neuralforecast.losses.pytorch import RMSE, MAE
import pandas as pd
from typing import List, Tuple, Dict
from sklearn.metrics import mean_absolute_error, mean_squared_error
Setting up the dataframes
train_data2 = pd.DataFrame({
'unique_id': ['test-0'] * 60 + ['test-1'] * 60, # 60 rows per series (120 total)
'ds': pd.date_range(start='2018-01-01', periods=60, freq='MS').tolist() + pd.date_range(start='2018-01-01', periods=60, freq='MS').tolist(), # 60 elements each for 'test-0' and 'test-1'
'y': [63202.0, 103224.5, 111117.98, 114329.0, 108873.25, 54203.02, 86218.1202, 93635.24, 128048.0, 124754.46, 131659.00999999998, 121901.49, 89554.36009999999, 138511.0, 170329.0, 143534.47999999998, 77849.5001, 76367.70999999998, 88739.4099, 122639.92, 139260.0, 158036.0001, 159479.5, 53163.895500000006, 104816.511, 161734.01, 229404.3072, 219401.8639, 74706.5978, 120536.01999999996, 108064.7785, 107818.415, 153914.315, 129599.932, 117563.8949, 62912.5835, 174501.56300000002, 191323.105, 226644.8306, 149521.4569, 39708.3959, 82081.96659999999, 51392.1871, 107311.5811, 110006.954, 121564.97430000002, 112380.19569999998, 89952.5831, 142210.4084, 172362.2267, 163108.3762, 118614.05010000002, 51305.5352, 84403.1251, 110113.65610000002, 128331.4126, 123344.89500000002, 145871.16280000002, 118498.74099999998, 81374.17670000001, 48710.0, 50648.33, 62477.0, 63663.0, 53089.0, 32879.0, 29453.0, 27483.0, 57825.0, 71060.0, 59058.0, 55709.0, 41270.79, 66417.0, 81619.0, 108195.0001, 20043.0, 50103.0, 34913.0, 50942.0, 62446.0, 53002.0, 93837.0, 28171.0, 54429.0, 81507.9167, 100153.833, 122369.0, 41400.0, 54521.583, 36117.2503, 45895.25, 78801.25, 60997.25, 56257.5834, 47694.8333, 68566.12520000001, 78821.209, 125403.4159, 54204.4999, 10475.6249, 75008.7077, 41501.1667, 41225.0, 35159.75, 77580.9584, 55342.500100000005, 42327.5417, 88843.9168, 104285.0417, 50964.0001, 42008.3333, 72844.169, 73522.5421, 69892.83350000001, 82048.9583, 106956.3334, 105103.0416, 100294.2511, 60981.7502]
})
actual_data = pd.DataFrame({
'unique_id': ['test-0'] * 12 + ['test-1']*12,
'ds': pd.date_range(start='2023-01-01', periods=12, freq='MS').tolist() + pd.date_range(start='2023-01-01', periods=12, freq='MS').tolist(),
'Actual_Qty': [177578.9375, 263270.37490000005, 130901.2323, 46545.7235, 79886.51950000001, 57278.882, 81313.90830000001, 114359.66060000002, 66363.0761, 151170.48500000002, 107519.225, 114884.1, 214928.6671, 24430.334000000003, 77216.0906, 59655.919, 58646.174900000005, 66150.0943, 56996.2968, 85112.5022, 67122.20880000001, 65644.544, 85356.415, 84726.014]
})
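As a quick sanity check (my own addition, plain pandas, nothing NeuralForecast-specific), I confirm that each series has 60 monthly training observations and 12 actuals:
# Sanity check: 60 training rows and 12 actual rows per unique_id.
print(train_data2.groupby('unique_id')['ds'].agg(['count', 'min', 'max']))
print(actual_data.groupby('unique_id')['ds'].agg(['count', 'min', 'max']))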
Default DilatedRNN without hyperparameter optimization
horizon = 12
max_steps = 1000
models = [
DilatedRNN(h=horizon, max_steps=max_steps),
]
nf = NeuralForecast(models=models, freq='MS')
nf.fit(df=train_data2, val_size=12)
fcst_df = nf.predict()
fcst_df = fcst_df.reset_index()
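As a side note, a rolling check of validation error on the training data can also be done with NeuralForecast's cross_validation; the keyword arguments below (n_windows, step_size) reflect the version I'm using and may differ in yours.
# Hedged sketch: rolling evaluation of the default model on the training tail.
nf_cv = NeuralForecast(models=[DilatedRNN(h=horizon, max_steps=max_steps)], freq='MS')
cv_df = nf_cv.cross_validation(df=train_data2, n_windows=2, step_size=horizon)
print(cv_df.head())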
AutoDilatedRNN (hyperparameter optimization) with Optuna backend
nf4 = NeuralForecast(
models=[
AutoDilatedRNN(h=horizon, num_samples=25, backend='optuna'),
],
freq='MS'
)
nf4.fit(df=train_data2, val_size=12)
fcst_df_auto = nf4.predict()
fcst_df_auto = fcst_df_auto.reset_index()
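To compare the tuned hyperparameters against the DilatedRNN defaults, I also look at the search results. I'm assuming here that, with backend='optuna', the fitted Auto model exposes the Optuna study on its results attribute; that attribute name may differ across NeuralForecast versions.
# Assumption: the Optuna study is stored on the fitted Auto model as `results`.
study = nf4.models[0].results
print(study.best_trial.params)          # best hyperparameter configuration found
print(study.trials_dataframe().head())  # per-trial validation losses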
Merging the two forecasts as well as the actual data
final_df3 = fcst_df_auto.merge(fcst_df, on=['unique_id', 'ds'])
final_df3 = final_df3.merge(actual_data, on=['unique_id', 'ds'])
Result
print('DilatedRNN RMSE')
for id in final_df3['unique_id'].unique():
    subset = final_df3[final_df3['unique_id'] == id]
    print(id, mean_squared_error(subset['Actual_Qty'], subset['DilatedRNN'])**0.5)

print('AutoDilatedRNN RMSE')
for id in final_df3['unique_id'].unique():
    subset = final_df3[final_df3['unique_id'] == id]
    print(id, mean_squared_error(subset['Actual_Qty'], subset['AutoDilatedRNN'])**0.5)
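For completeness, the same per-series RMSE can be computed without the explicit loop (plain pandas/sklearn, nothing NeuralForecast-specific):
# Per-series RMSE for both models in one pass.
rmse_by_series = final_df3.groupby('unique_id').apply(
    lambda g: pd.Series({
        'DilatedRNN_RMSE': mean_squared_error(g['Actual_Qty'], g['DilatedRNN']) ** 0.5,
        'AutoDilatedRNN_RMSE': mean_squared_error(g['Actual_Qty'], g['AutoDilatedRNN']) ** 0.5,
    })
)
print(rmse_by_series)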