The Random Survival Forest package provides a Python implementation of the survival prediction method originally published by Ishwaran et al. (2008).
Reference: Ishwaran, H., Kogalur, U. B., Blackstone, E. H., & Lauer, M. S. (2008). Random survival forests. The Annals of Applied Statistics, 2(3), 841-860.
$ pip install random-survival-forest
This implementation is not optimized for performance; it is written in pure Python. If you have large datasets (large sample size) or use a very high number of trees, I suggest using the scikit-survival package instead.
import time
from lifelines import datasets
from sklearn.model_selection import train_test_split
from random_survival_forest.models import RandomSurvivalForest
from random_survival_forest.scoring import concordance_index
rossi = datasets.load_rossi()
# Attention: in y, the event column must be at index 0 and the duration column (time until the event occurs) at index 1
y = rossi.loc[:, ["arrest", "week"]]
X = rossi.drop(["arrest", "week"], axis=1)
X, X_test, y, y_test = train_test_split(X, y, test_size=0.33, random_state=10)
print("Start training...")
start_time = time.time()
rsf = RandomSurvivalForest(n_estimators=10, n_jobs=-1, random_state=10)
rsf = rsf.fit(X, y)
print(f'--- {round(time.time() - start_time, 3)} seconds ---')
y_pred = rsf.predict(X_test)
c_val = concordance_index(y_time=y_test["week"], y_pred=y_pred, y_event=y_test["arrest"])
print(f'C-index {round(c_val, 3)}')
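The C-index printed above summarizes how well the predicted ordering of subjects agrees with the observed ordering of event times. As a rough illustration of the idea, here is a minimal sketch of Harrell's concordance index in plain Python. It is not the package's `concordance_index` implementation: the function name `harrell_c_index` and the toy data are hypothetical, it assumes that a higher score means a higher risk (an earlier expected event), and it simply skips pairs with tied times. Whether `rsf.predict` returns values on that scale is not stated here, so check the package before comparing numbers.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index (sketch). Assumes a higher risk score means an earlier expected event."""
    concordant = 0.0
    comparable = 0
    samples = list(zip(times, events, risk_scores))
    for (t_a, e_a, r_a), (t_b, e_b, r_b) in combinations(samples, 2):
        if t_a == t_b:
            continue  # pairs with tied times are skipped in this simplified sketch
        # Order the pair so that "first" is the subject with the shorter observed time.
        if t_a < t_b:
            first_event, first_risk, second_risk = e_a, r_a, r_b
        else:
            first_event, first_risk, second_risk = e_b, r_b, r_a
        if not first_event:
            continue  # the shorter time is censored, so the pair is not comparable
        comparable += 1
        if first_risk > second_risk:
            concordant += 1.0  # the subject with the earlier event was ranked as riskier
        elif first_risk == second_risk:
            concordant += 0.5  # tied predictions count as half concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: observed weeks, event indicator (1 = arrest), predicted risk.
weeks = [5, 10, 20, 30]
arrested = [1, 0, 1, 0]
risk = [0.9, 0.2, 0.6, 0.1]
print(harrell_c_index(weeks, arrested, risk))
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 corresponds to random ordering, and 0.0 means every comparable pair is ordered the wrong way round.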
If you run into issues or have feedback, please let me know. I am happy to fix bugs or implement feature requests.
This package is open source. If it helped you, or if you even use it commercially, I would be happy about a little support:
MIT