fix: correctly call predict for OLS in CUPAC #12

Open
wants to merge 1 commit into
base: main
from

Conversation


@mnicky mnicky commented Jun 5, 2024

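With statsmodels OLS, predict() should be called on the fitted results object returned by fit(), not on the unfitted model. When it is called on the model, the first positional argument is treated as params, so the new feature matrix ends up in np.dot(exog, params) against the training exog and the shapes do not line up.

At least in my environment, the original version fails with the following exception: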

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[59], line 1
----> 1 ab_test_cupac = ABTest(abtest, ab_params).cupac()

File /opt/conda/envs/py311/lib/python3.11/site-packages/abacus/auto_ab/abtest.py:668, in ABTest.cupac(self)
    666 self.__check_required_metric_type("cupac")
    667 self.__check_required_columns(self.__dataset, "cupac")
--> 668 result_df = VarianceReduction.cupac(
    669     self.__dataset,
    670     target_prev_col=self.params.data_params.target_prev,
    671     target_now_col=self.params.data_params.target,
    672     factors_prev_cols=self.params.data_params.predictors_prev,
    673     factors_now_cols=self.params.data_params.predictors_now,
    674     groups_col=self.params.data_params.group_col,
    675 )
    677 params_new = copy.deepcopy(self.params)
    678 params_new.data_params.control = self.__get_group(
    679     self.params.data_params.control_name, result_df
    680 )

File /opt/conda/envs/py311/lib/python3.11/site-packages/abacus/auto_ab/variance_reduction.py:98, in VarianceReduction.cupac(cls, x, target_prev_col, target_now_col, factors_prev_cols, factors_now_cols, groups_col)
     79 """Perform CUPED on target variable with covariate calculated
     80 as a prediction from a linear regression model.
     81 
   (...)
     93     pandas.DataFrame: Pandas DataFrame with additional columns: target_pred and target_now_cuped
     94 """
     95 x = cls._target_encoding(
     96     x, list(set(factors_prev_cols + factors_now_cols)), target_prev_col
     97 )
---> 98 x.loc[:, "target_pred"] = cls._predict_target(
     99     x, target_prev_col, factors_prev_cols, factors_now_cols
    100 )
    101 x_new = cls.cuped(x, target_now_col, groups_col, "target_pred")
    102 return x_new

Cell In[58], line 30, in predict_target(x, target_prev_col, factors_prev_cols, factors_now_cols)
     27 print(results.summary())
     28 x_predict = x[factors_now_cols]
---> 30 return model.predict(x_predict)

File /opt/conda/envs/py311/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:411, in RegressionModel.predict(self, params, exog)
    408 if exog is None:
    409     exog = self.exog
--> 411 return np.dot(exog, params)

File <__array_function__ internals>:200, in dot(*args, **kwargs)

ValueError: shapes (2400372,5) and (2400372,5) not aligned: 5 (dim 1) != 2400372 (dim 0)
