Replies: 3 comments 1 reply
-
Following up on this: does anyone have any suggestions on resources?
-
Hi @robchadil. At first glance, this looks reasonable to me. Do you have very sparse data with T > 50? If so, this is probably the best the model can do, and a reasonable solution. The model is saying: "yup, people die, but then I don't really see them die after 50, so they must be cured. I'll fit a good Weibull model pre T=50, and then cure everyone after that." If you have a prior on what you think the parameters should be, you can tweak the bounds. I guess I'm pushing back: is there a reason you think this feels wrong?
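A quick, untested sketch for eyeballing how sparse things are past T = 50; the `durations` / `event_observed` arrays here are placeholders for your own data:

```python
import numpy as np

# Placeholder data: swap in your own duration and event-indicator arrays.
rng = np.random.default_rng(0)
durations = rng.weibull(1.5, size=200) * 30
event_observed = (rng.uniform(size=200) < 0.7).astype(int)

late = durations > 50  # the T=50 cutoff discussed above
print("subjects with T > 50:         ", int(late.sum()))
print("observed deaths with T > 50:  ", int(event_observed[late].sum()))
print("censored subjects with T > 50:", int((late & (event_observed == 0)).sum()))
```

If nearly everyone past 50 is censored, the fitted cure fraction is going to be driven by that tail.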
-
The calculus is going to give the model with the highest log-likelihood, which isn't always the correct model. Not unlike programming, statistics requires precision (and little to no ambiguity) in the inference, else we may get wrong results. Since you have a prior in mind, we should specify it. Here's one way: we'll override the `_negative_log_likelihood_right_censoring` method and add a penalty term that pulls the cure fraction `p_` toward the prior value:

```python
from autograd import numpy as np
from lifelines.fitters import ParametricUnivariateFitter


class CureFitter(ParametricUnivariateFitter):

    # strength of the penalty that keeps p_ near the prior guess
    PENALIZER = 1.0

    _fitted_parameter_names = ["p_", "lambda_", "rho_"]
    _bounds = ((0, 1), (0, None), (0, None))

    def _cumulative_hazard(self, params, T):
        p, lambda_, rho_ = params
        sf = np.exp(-(T / lambda_) ** rho_)
        return -np.log(p + (1 - p) * sf)

    def _negative_log_likelihood_right_censoring(self, params, Ts, E, entry, weights) -> float:
        T = Ts[0]
        non_zero_entries = entry > 0

        log_hz = self._log_hazard(params, T[E])
        cum_haz = self._cumulative_hazard(params, T)

        ll = (weights[E] * log_hz).sum() - (weights * cum_haz).sum()
        ll = ll + (weights[non_zero_entries] * self._cumulative_hazard(params, entry[non_zero_entries])).sum()
        # penalize cure fractions far from the prior guess of 0.2
        return -ll / weights.sum() + self.PENALIZER * (params[0] - 0.2) ** 2
```

I've added the `PENALIZER` term, which penalizes estimates of `p_` far from 0.2 (adjust the target value and the strength to match your prior). I haven't tested this, but I'm curious if the results are more expected or not.
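For completeness, a minimal sketch of how the class above might be fit and inspected; it uses the Waltons dataset bundled with lifelines purely as stand-in data, so your own duration/event arrays would go in its place:

```python
from lifelines.datasets import load_waltons

# Stand-in data; replace with your own durations and event indicators.
df = load_waltons()
T = df["T"].values
E = df["E"].values

cf = CureFitter()
cf.fit(T, event_observed=E, label="cure model")

cf.print_summary()  # fitted p_, lambda_, rho_ with standard errors / CIs
print(cf.p_)        # the estimated cure fraction, i.e. the asymptote of S(t)

ax = cf.plot_survival_function()
```

How far the estimate of `p_` moves toward 0.2 then tells you how strongly the data disagree with the prior, and `PENALIZER` can be dialed up or down accordingly.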
-
I am trying to understand how p is determined in the cure model example. For the data I am using, it seems to be very slightly larger than 1 minus the proportion of subjects that have experienced the event to date. This results in a survival curve that almost immediately approaches its asymptote (p), despite the curve being a continually decreasing function just before that point and a significant proportion of the data still being censored. In other words, the censored data is assigned a large probability of cure. If there are some resources you could point me to, it would be much appreciated.
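For reference, and assuming the same parameterization as the `_cumulative_hazard` code above, the mixture cure model's survival function is

$$
S(t) = p + (1 - p)\, e^{-(t/\lambda)^{\rho}}, \qquad \lim_{t \to \infty} S(t) = p,
$$

so the asymptote of the fitted curve is exactly the estimated cure fraction p. With a heavily censored tail and few late events, the maximum-likelihood estimate of p tends to land near the plateau of the empirical survival curve, which is consistent with the behaviour described above.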