Replies: 3 comments 1 reply
-
Following up on this: does anyone have any suggestions on resources?
-
Hi @robchadil. At first glance, this looks reasonable to me. Do you have very sparse data with T > 50? If so, this is probably the best the model can do, and a reasonable solution. The model is saying: "yup, people die, but then I don't really see them die after 50, so they must be cured. I'll fit a good Weibull model pre T=50, and then cure everyone after that." If you have a prior on what you think the parameters should be, you can tweak the bounds. I guess I'm pushing back: is there a reason you think this feels wrong?
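A quick, untested sketch for eyeballing how sparse things are past T = 50; the `durations` / `event_observed` arrays here are placeholders for your own data:

```python
import numpy as np

# Placeholder data: swap in your own duration and event-indicator arrays.
rng = np.random.default_rng(0)
durations = rng.weibull(1.5, size=200) * 30
event_observed = (rng.uniform(size=200) < 0.7).astype(int)

late = durations > 50  # the T=50 cutoff discussed above
print("subjects with T > 50:         ", int(late.sum()))
print("observed deaths with T > 50:  ", int(event_observed[late].sum()))
print("censored subjects with T > 50:", int((late & (event_observed == 0)).sum()))
```

If nearly everyone past 50 is censored, the fitted cure fraction is going to be driven by that tail.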
-
The calculus is going to give the model with the highest log-likelihood, which isn't always the correct model. Not unlike programming, statistics requires precision (and little to no ambiguity) in the inference, else we may get wrong results. Since you have a prior in mind, we should specify it. Here's one way: we'll override the `_negative_log_likelihood_right_censoring` method and add a penalty term that pulls the cure fraction `p_` toward the prior value:

```python
from autograd import numpy as np
from lifelines.fitters import ParametricUnivariateFitter


class CureFitter(ParametricUnivariateFitter):

    # strength of the penalty that keeps p_ near the prior guess
    PENALIZER = 1.0

    _fitted_parameter_names = ["p_", "lambda_", "rho_"]
    _bounds = ((0, 1), (0, None), (0, None))

    def _cumulative_hazard(self, params, T):
        p, lambda_, rho_ = params
        sf = np.exp(-(T / lambda_) ** rho_)
        return -np.log(p + (1 - p) * sf)

    def _negative_log_likelihood_right_censoring(self, params, Ts, E, entry, weights) -> float:
        T = Ts[0]
        non_zero_entries = entry > 0

        log_hz = self._log_hazard(params, T[E])
        cum_haz = self._cumulative_hazard(params, T)

        ll = (weights[E] * log_hz).sum() - (weights * cum_haz).sum()
        ll = ll + (weights[non_zero_entries] * self._cumulative_hazard(params, entry[non_zero_entries])).sum()
        # penalize cure fractions far from the prior guess of 0.2
        return -ll / weights.sum() + self.PENALIZER * (params[0] - 0.2) ** 2
```

I've added the `PENALIZER` term, which penalizes estimates of `p_` far from 0.2 (adjust the target value and the strength to match your prior). I haven't tested this, but I'm curious if the results are more expected or not.
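For completeness, a minimal sketch of how the class above might be fit and inspected; it uses the Waltons dataset bundled with lifelines purely as stand-in data, so your own duration/event arrays would go in its place:

```python
from lifelines.datasets import load_waltons

# Stand-in data; replace with your own durations and event indicators.
df = load_waltons()
T = df["T"].values
E = df["E"].values

cf = CureFitter()
cf.fit(T, event_observed=E, label="cure model")

cf.print_summary()  # fitted p_, lambda_, rho_ with standard errors / CIs
print(cf.p_)        # the estimated cure fraction, i.e. the asymptote of S(t)

ax = cf.plot_survival_function()
```

How far the estimate of `p_` moves toward 0.2 then tells you how strongly the data disagree with the prior, and `PENALIZER` can be dialed up or down accordingly.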
-
I am trying to understand how p is determined in the cure model example. For the data I am using, it seems to be very slightly larger than 1 minus the proportion of subjects that have experienced the event to date. This results in a survival curve that almost immediately approaches its asymptote (p), despite the curve being a continually decreasing function just before that point and a significant proportion of the data still being censored. In other words, the censored data is assigned a large probability of cure. If there are some resources you could point me to, it would be much appreciated.
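For reference, and assuming the same parameterization as the `_cumulative_hazard` code above, the mixture cure model's survival function is

$$
S(t) = p + (1 - p)\, e^{-(t/\lambda)^{\rho}}, \qquad \lim_{t \to \infty} S(t) = p,
$$

so the asymptote of the fitted curve is exactly the estimated cure fraction p. With a heavily censored tail and few late events, the maximum-likelihood estimate of p tends to land near the plateau of the empirical survival curve, which is consistent with the behaviour described above.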