UnboundLocalError: local variable "conditional_cols" referenced before assignment when model trained with argument missing_imputation=False and there are seasonalities #1619

Open
frlm opened this issue Jul 29, 2024 · 0 comments
Labels
type:bug Something isn't working


frlm commented Jul 29, 2024

Affected function: `_handle_missing_data`

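We observed a bug when predicting on a future dataframe with a model that was trained with `missing_imputation=False` and has seasonalities configured. The variable `conditional_cols` is read inside the `if dropped_trailing_y and predicting:` block, but it is assigned only inside the `if config_missing.impute_missing:` block. When imputation is disabled, that assignment never runs, so evaluating `len(conditional_cols)` raises `UnboundLocalError`. If no seasonality is configured, `config_seasonality is not None` short-circuits the condition and the error does not occur.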
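The failure follows from Python's scoping rules: because `conditional_cols` is assigned somewhere in the function, it is local throughout that function, and reading it on a path where no assignment ran raises `UnboundLocalError`. The sketch below reproduces the same control flow without NeuralProphet; `handle_missing` and the `is_weekend` column are hypothetical stand-ins, not library code.

```python
def handle_missing(impute_missing: bool, has_seasonality: bool, predicting: bool) -> list:
    """Hypothetical stand-in mirroring the control flow of _handle_missing_data."""
    if impute_missing:
        # conditional_cols is only ever bound inside this branch.
        conditional_cols = []
        if has_seasonality:
            conditional_cols = ["is_weekend"]  # hypothetical condition column
    if predicting:
        # Reads conditional_cols even when the branch above did not run.
        if has_seasonality and len(conditional_cols) > 0:
            return conditional_cols
    return []


print(handle_missing(impute_missing=True, has_seasonality=True, predicting=True))
# ['is_weekend']

try:
    handle_missing(impute_missing=False, has_seasonality=True, predicting=True)
except UnboundLocalError as err:
    print(type(err).__name__)
# UnboundLocalError
```

With `has_seasonality=False` the `and` short-circuits before `conditional_cols` is read, so no exception is raised, which matches the "and there are seasonalities" condition in the title. Python 3.11 and later word the message as "cannot access local variable 'conditional_cols' where it is not associated with a value", but the exception type is the same.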

    if config_missing.impute_missing:
        # impute missing values
        data_columns = []
        if n_lags > 0:
            data_columns.append("y")
        if config_lagged_regressors is not None:
            data_columns.extend(config_lagged_regressors.keys())
        if config_regressors is not None and config_regressors.regressors is not None:
            data_columns.extend(config_regressors.regressors.keys())
        if config_events is not None:
            data_columns.extend(config_events.keys())
        conditional_cols = []
        if config_seasonality is not None:
            conditional_cols = list(
                set(
                    [
                        value.condition_name
                        for key, value in config_seasonality.periods.items()
                        if value.condition_name is not None
                    ]
                )
            )
            data_columns.extend(conditional_cols)
        for column in data_columns:
            sum_na = df[column].isna().sum()
            if sum_na > 0:
                log.warning(f"{sum_na} missing values in column {column} were detected in total. ")
                # use 0 substitution for holidays and events missing values
                if config_events is not None and column in config_events.keys():
                    df[column].fillna(0, inplace=True)
                    remaining_na = 0
                else:
                    df.loc[:, column], remaining_na = df_utils.fill_linear_then_rolling_avg(
                        df[column],
                        limit_linear=config_missing.impute_linear,
                        rolling=config_missing.impute_rolling,
                    )
                log.info(f"{sum_na - remaining_na} NaN values in column {column} were auto-imputed.")
                if remaining_na > 0:
                    log.warning(
                        f"More than {2 * config_missing.impute_linear + config_missing.impute_rolling} consecutive \
                            missing values encountered in column {column}. "
                        f"{remaining_na} NA remain after auto-imputation. "
                    )
    if dropped_trailing_y and predicting:
        # add trailing y values again if in predict mode
        df = pd.concat([df, df_to_add])
        # conditional_cols is unbound here when config_missing.impute_missing is False
        if config_seasonality is not None and len(conditional_cols) > 0:
            df[conditional_cols] = df[conditional_cols].ffill()  # type: ignore
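One possible fix, sketched here under assumptions, is to compute `conditional_cols` before the `if config_missing.impute_missing:` branch so that it is always bound. `Season`, `conditional_columns` and `restore_trailing_rows` are hypothetical simplifications of the library's seasonality config and of `_handle_missing_data`; only the part relevant to this bug is modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

import pandas as pd


@dataclass
class Season:
    # Hypothetical stand-in for a seasonality period config; only condition_name matters here.
    condition_name: Optional[str] = None


def conditional_columns(periods: Optional[Dict[str, Season]]) -> List[str]:
    """Collect the unique condition columns referenced by seasonality periods."""
    if periods is None:
        return []
    return sorted({p.condition_name for p in periods.values() if p.condition_name is not None})


def restore_trailing_rows(
    df: pd.DataFrame,
    df_to_add: pd.DataFrame,
    periods: Optional[Dict[str, Season]],
    impute_missing: bool,
) -> pd.DataFrame:
    # Fix: bind conditional_cols unconditionally, before any imputation branch.
    conditional_cols = conditional_columns(periods)
    if impute_missing:
        pass  # imputation (omitted in this sketch) may also use conditional_cols
    # Re-append the trailing rows; ignore_index keeps row labels unique in this sketch.
    df = pd.concat([df, df_to_add], ignore_index=True)
    if conditional_cols:
        # Forward-fill condition columns into the re-appended rows.
        df[conditional_cols] = df[conditional_cols].ffill()
    return df


history = pd.DataFrame({"y": [1.0, 2.0], "is_weekend": [0.0, 1.0]})
trailing = pd.DataFrame({"y": [float("nan")], "is_weekend": [float("nan")]})
out = restore_trailing_rows(history, trailing, {"weekly": Season("is_weekend")}, impute_missing=False)
print(out["is_weekend"].tolist())  # [0.0, 1.0, 1.0]
```

Using `sorted` over a set, instead of `list(set(...))` as in the excerpt, keeps the column order deterministic; either form fixes the `UnboundLocalError`, since the fix only depends on the assignment happening outside the imputation branch.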

ourownstory added the type:bug (Something isn't working) label on Aug 23, 2024
Projects
None yet
Development

No branches or pull requests

2 participants