
[QUESTION] XGBModel fit method in Darts raises "sample_weight_eval_set's length does not equal eval_set's length" error when using val_sample_weight #2579

Open
YutakaJX opened this issue Nov 3, 2024 · 2 comments
Labels
bug (Something isn't working) · good first issue (Good for newcomers)

Comments

YutakaJX commented Nov 3, 2024

Describe the issue linked to the documentation

I'm encountering an issue with Darts' XGBModel when trying to fit it with val_sample_weight provided.

Specifically, when using val_sample_weight, I receive a ValueError indicating that "sample_weight_eval_set's length does not equal eval_set's length", even though the shapes and lengths of the training and validation datasets appear to match.

Based on the Darts documentation for XGBModel, the fit method accepts series, past_covariates, sample_weight, val_series, val_past_covariates, and val_sample_weight, among others. While sample_weight works fine, adding val_sample_weight consistently raises an error.

Here’s a minimal code snippet that reproduces the issue (X is a pd.DataFrame containing covariates, targets, and weights, so we first split it into X_cov, y, and w):

from sklearn.model_selection import train_test_split
from darts import TimeSeries
from darts.models import XGBModel

y = X.filter(regex='^Targets_')
w = X.filter(regex='^Weights_')
X_cov = X.filter(regex='^Covariates_')
y_train, y_val = train_test_split(y, test_size=0.4, shuffle=False)
w_train, w_val = train_test_split(w, test_size=0.4, shuffle=False)
X_cov_train, X_cov_val = train_test_split(X_cov, test_size=0.4, shuffle=False)

y_train_ts = TimeSeries.from_times_and_values(y_train.index, y_train.values, columns=y_train.columns)
w_train_ts = TimeSeries.from_times_and_values(w_train.index, w_train.values, columns=w_train.columns)
X_cov_train_ts = TimeSeries.from_times_and_values(X_cov_train.index, X_cov_train.values, columns=X_cov_train.columns)
y_val_ts = TimeSeries.from_times_and_values(y_val.index, y_val.values, columns=y_val.columns)
w_val_ts = TimeSeries.from_times_and_values(w_val.index, w_val.values, columns=w_val.columns)
X_cov_val_ts = TimeSeries.from_times_and_values(X_cov_val.index, X_cov_val.values, columns=X_cov_val.columns)

xgb_model = XGBModel(
    lags=16,
    lags_past_covariates=16,
    output_chunk_length=1,
    objective="binary:logistic",
    booster="gbtree",
)

xgb_model.fit(
    y_train_ts,
    past_covariates=X_cov_train_ts,
    val_series=y_val_ts, 
    val_past_covariates=X_cov_val_ts,
    sample_weight=w_train_ts,
    val_sample_weight=w_val_ts,
)

The above code produces the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[10], line 15
     12 w_val_ts = TimeSeries.from_times_and_values(w_val.index, w_val.values, columns=w_val.columns)
     13 X_cov_val_ts = TimeSeries.from_times_and_values(X_cov_val.index, X_cov_val.values, columns=X_cov_val.columns)
---> 15 xgb_model.fit(
     16     y_train_ts,
     17     past_covariates=X_cov_train_ts,
     18     val_series=y_val_ts, 
     19     val_past_covariates=X_cov_val_ts,
     20     sample_weight=w_train_ts,
     21     val_sample_weight=w_val_ts,
     22 )

(... lines omitted for brevity ...)

File ~/miniforge3/envs/testdev/lib/python3.9/site-packages/xgboost/sklearn.py:548, in _wrap_evaluation_matrices(missing, X, y, group, qid, sample_weight, base_margin, feature_weights, eval_set, sample_weight_eval_set, base_margin_eval_set, eval_group, eval_qid, create_dmatrix, enable_categorical, feature_types)
    545     return meta
    547 if eval_set is not None:
--> 548     sample_weight_eval_set = validate_or_none(
    549         sample_weight_eval_set, "sample_weight_eval_set"
    550     )
    551     base_margin_eval_set = validate_or_none(
    552         base_margin_eval_set, "base_margin_eval_set"
    553     )
    554     eval_group = validate_or_none(eval_group, "eval_group")

File ~/miniforge3/envs/testdev/lib/python3.9/site-packages/xgboost/sklearn.py:541, in _wrap_evaluation_matrices.<locals>.validate_or_none(meta, name)
    539     return [None] * n_validation
    540 if len(meta) != n_validation:
--> 541     raise ValueError(
    542         f"{name}'s length does not equal `eval_set`'s length, "
    543         + f"expecting {n_validation}, got {len(meta)}"
    544     )
    545 return meta

ValueError: sample_weight_eval_set's length does not equal `eval_set`'s length, expecting 1, got 175310

Additional context
If I pass None for the validation datasets, fit runs without a problem.

xgb_model.fit(
    y_train_ts,
    past_covariates=X_cov_train_ts,
    val_series=None, 
    val_past_covariates=None,
    sample_weight=w_train_ts,
    val_sample_weight=None
)

If I only set val_sample_weight to None, fit also runs without a problem. So, the error seems to occur only when val_sample_weight is a TimeSeries.

xgb_model.fit(
    y_train_ts,
    past_covariates=X_cov_train_ts,
    val_series=y_val_ts, 
    val_past_covariates=X_cov_val_ts,
    sample_weight=w_train_ts,
    val_sample_weight=None,
)

In addition, the same issue does not occur when using LightGBMModel in a similar setup, suggesting a difference in how val_sample_weight is handled for XGBModel versus LightGBMModel.

Am I missing something in the code? Any help to resolve this issue would be greatly appreciated.

@YutakaJX YutakaJX added question Further information is requested triage Issue waiting for triaging labels Nov 3, 2024
yunakkano commented Nov 4, 2024

I think XGBModel should override _add_val_set_to_kwargs.

XGBModel currently inherits _add_val_set_to_kwargs from RegressionModel. However, in this method, unlike the override for LightGBMModel, the else block assigns val_weight to val_weights directly, without wrapping it in a list.

def _add_val_set_to_kwargs(
        self,
        kwargs: Dict,
        val_series: Sequence[TimeSeries],
        val_past_covariates: Optional[Sequence[TimeSeries]],
        val_future_covariates: Optional[Sequence[TimeSeries]],
        val_sample_weight: Optional[Union[Sequence[TimeSeries], str]],
        max_samples_per_ts: int,
    ) -> dict:

        # ...(omitted)...

        else:
            val_sets = [(val_samples, val_labels)]
            val_weights = val_weight  # <-- here: the bare weight, not wrapped in a list


        val_set_name, val_weight_name = self.val_set_params
        return dict(kwargs, **{val_set_name: val_sets, val_weight_name: val_weights})

This means that if val_weight holds per-sample weights, val_weights will be that object itself, not a list containing it, so its length is the number of samples rather than the number of eval sets.
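To make the failure mode concrete, here is a self-contained sketch (plain Python plus NumPy, no Darts or XGBoost required) of the length check from xgboost/sklearn.py shown in the traceback above. The helper is a simplified restatement, and the toy data is made up; only the shapes matter.

```python
import numpy as np

def validate_or_none(meta, name, n_validation):
    # Simplified restatement of the check in xgboost/sklearn.py
    # (see the traceback above); n_validation is the number of
    # (X, y) pairs in eval_set.
    if meta is None:
        return [None] * n_validation
    if len(meta) != n_validation:
        raise ValueError(
            f"{name}'s length does not equal `eval_set`'s length, "
            f"expecting {n_validation}, got {len(meta)}"
        )
    return meta

n_validation = 1              # one (X, y) pair in eval_set
val_weight = np.ones(175310)  # per-sample validation weights, as a bare array

# What effectively reaches XGBoost today: the bare array. Its len() is the
# sample count, not the number of eval sets, so the check raises.
err = None
try:
    validate_or_none(val_weight, "sample_weight_eval_set", n_validation)
except ValueError as e:
    err = str(e)
print(err)

# Wrapping the array in a list, as the proposed fix does, passes the check.
ok = validate_or_none([val_weight], "sample_weight_eval_set", n_validation)
print(len(ok))  # 1
```

This reproduces the exact "expecting 1, got 175310" mismatch from the traceback: 175310 is the number of validation samples, while XGBoost expects a list with one weight array per eval set.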

So, adding an _add_val_set_to_kwargs override like the one below to the XGBModel class might resolve this issue. (At least on my side, this worked fine, but I need someone to confirm the workaround does not affect other parts of the class.)

def _add_val_set_to_kwargs(
        self,
        kwargs: Dict,
        val_series: Sequence[TimeSeries],
        val_past_covariates: Optional[Sequence[TimeSeries]],
        val_future_covariates: Optional[Sequence[TimeSeries]],
        val_sample_weight: Optional[Union[Sequence[TimeSeries], str]],
        max_samples_per_ts: int,
    ) -> dict:
        """Creates a validation set and returns a new set of kwargs passed to `self.model.fit()` including the
        validation set. This method can be overridden if the model requires a different logic to add the eval set."""
        val_samples, val_labels, val_weight = self._create_lagged_data(
            series=val_series,
            past_covariates=val_past_covariates,
            future_covariates=val_future_covariates,
            max_samples_per_ts=max_samples_per_ts,
            sample_weight=val_sample_weight,
            last_static_covariates_shape=self._static_covariates_shape,
        )
        # create validation sets for MultiOutputRegressor
        if val_labels.ndim == 2 and isinstance(self.model, MultiOutputRegressor):
            val_sets, val_weights = [], []
            for i in range(val_labels.shape[1]):
                val_sets.append((val_samples, val_labels[:, i]))
                if val_weight is not None:
                    val_weights.append(val_weight[:, i])
            val_weights = val_weights or None
        else:
            val_sets = [(val_samples, val_labels)]
            val_weights = [val_weight]  # <-- the fix: wrap the weight in a list


        val_set_name, val_weight_name = self.val_set_params
        return dict(kwargs, **{val_set_name: val_sets, val_weight_name: val_weights})

The above code is actually the same one used for LightGBMModel.
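As a toy illustration of why the one-line change matters (NumPy only; the variable names echo the method above, and the shapes are made up), compare the lengths XGBoost's check would see in the single-output else branch before and after the fix:

```python
import numpy as np

# Stand-ins for the outputs of _create_lagged_data in the single-output case.
val_samples = np.zeros((5, 3))  # 5 validation samples, 3 lagged features
val_labels = np.zeros(5)
val_weight = np.ones(5)         # one weight per validation sample

val_sets = [(val_samples, val_labels)]  # one eval set

# RegressionModel's current else branch: the bare array.
val_weights_buggy = val_weight
assert len(val_weights_buggy) == 5  # sample count, not eval-set count

# The proposed override (same as LightGBMModel): wrapped in a list.
val_weights_fixed = [val_weight]
assert len(val_weights_fixed) == len(val_sets) == 1
```

With the wrapped version, len(val_weights) matches len(val_sets), which is exactly what XGBoost's sample_weight_eval_set check requires.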

@dennisbader dennisbader added bug Something isn't working good first issue Good for newcomers and removed question Further information is requested triage Issue waiting for triaging labels Nov 7, 2024
@dennisbader dennisbader added this to darts Nov 7, 2024
@github-project-automation github-project-automation bot moved this to To do in darts Nov 7, 2024
@dennisbader
Collaborator

Thanks for reporting the issue @YutakaJX, and for the investigation @yunakkano. It does indeed look like a bug, and I added it to our backlog. A PR would of course be welcome :)
