
[FEA] Isolation Forest Training support in cuML #6096

Open
singhmanas1 opened this issue Oct 3, 2024 · 0 comments
Labels: ? - Needs Triage, feature request


singhmanas1 commented Oct 3, 2024

Is your feature request related to a problem? Please describe.
Isolation Forest (IF) is a popular unsupervised anomaly-detection method that is commonly used for fraud detection. For example, banks and retail companies use IF to detect zero-day threats, i.e., new threat patterns that supervised models such as XGBoost and GNNs fail to catch because of class imbalance or other issues.

While cuML already supports inference on scikit-learn IF models through the Forest Inference Library (FIL, an experimental feature; see issue #3838), it would be great to have IF training implemented natively in cuML, similar to scikit-learn's IsolationForest.

Describe the solution you'd like
Something like the following:

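```python
# Proposed API: cuml.ensemble.IsolationForest is not in cuML at the time of this request;
# it would mirror sklearn.ensemble.IsolationForest.
from cuml.ensemble import IsolationForest
X = [[-1.1], [0.3], [0.5], [100]]
clf = IsolationForest(random_state=0).fit(X)
clf.predict([[0.1], [0], [90]])  # +1 = inlier, -1 = outlier (scikit-learn convention)
```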

Implementation Details
The following need to be implemented and tested in cuML to enable IF:

  1. Random node splitting (a randomly chosen feature and a random threshold) while building the trees, via NodeSplitKernel
  2. Path-length calculation for scoring anomalies, similar to the scikit-learn implementation HERE
  3. Testing whether quantizing the data into bins affects IsolationForest's detection performance.
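For reference, items 1 and 2 can be sketched on the CPU with NumPy, following the original Isolation Forest formulation (Liu, Ting and Zhou, 2008): each tree is grown on a random subsample of size psi by splitting on a random feature at a uniformly random threshold, up to a height limit of ceil(log2(psi)); a point's path length h(x) is its leaf depth plus c(leaf size), where c(n) = 2H(n-1) - 2(n-1)/n; and the anomaly score is s(x) = 2^(-E[h(x)] / c(psi)). This is only a minimal sketch under those assumptions: `TinyIsolationForest`, `build_tree`, `path_length` and `c` are hypothetical names, and nothing here models NodeSplitKernel or cuML's GPU tree builder.

```python
import math

import numpy as np

EULER_GAMMA = 0.5772156649015329


def c(n):
    """Average path length of an unsuccessful BST search over n points,
    used to normalize path lengths and to credit unsplit leaves."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    # H(n - 1) is approximated by ln(n - 1) + Euler's constant.
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(X, rng, depth, max_depth):
    """Grow one isolation tree with purely random splits (item 1)."""
    n = X.shape[0]
    if depth >= max_depth or n <= 1:
        return {"size": n}
    lo, hi = X.min(axis=0), X.max(axis=0)
    candidates = np.flatnonzero(hi > lo)  # features that can still be split
    if candidates.size == 0:
        return {"size": n}  # all remaining rows are identical
    feature = rng.choice(candidates)  # random feature ...
    threshold = rng.uniform(lo[feature], hi[feature])  # ... random threshold
    mask = X[:, feature] < threshold
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(X[mask], rng, depth + 1, max_depth),
        "right": build_tree(X[~mask], rng, depth + 1, max_depth),
    }


def path_length(x, node):
    """Edges from the root to x's leaf, plus c(leaf size) (item 2)."""
    depth = 0
    while "size" not in node:
        go_left = x[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
        depth += 1
    return depth + c(node["size"])


class TinyIsolationForest:
    """Hypothetical CPU reference; not a cuML or scikit-learn class."""

    def __init__(self, n_estimators=100, max_samples=256, random_state=None):
        self.n_estimators = n_estimators
        self.max_samples = max_samples
        self.random_state = random_state

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        if X.shape[0] < 2:
            raise ValueError("need at least 2 samples")
        rng = np.random.default_rng(self.random_state)
        self.psi_ = min(self.max_samples, X.shape[0])  # subsample size per tree
        max_depth = math.ceil(math.log2(self.psi_))  # height limit
        self.trees_ = []
        for _ in range(self.n_estimators):
            idx = rng.choice(X.shape[0], size=self.psi_, replace=False)
            self.trees_.append(build_tree(X[idx], rng, 0, max_depth))
        return self

    def anomaly_score(self, X):
        """s(x) = 2 ** (-E[h(x)] / c(psi)); values near 1 are anomalous."""
        X = np.asarray(X, dtype=float)
        mean_h = np.array(
            [np.mean([path_length(x, t) for t in self.trees_]) for x in X]
        )
        return 2.0 ** (-mean_h / c(self.psi_))

    def predict(self, X):
        # Same label convention as scikit-learn: -1 for outliers, +1 for inliers.
        return np.where(self.anomaly_score(X) > 0.5, -1, 1)


clf = TinyIsolationForest(random_state=0).fit([[-1.1], [0.3], [0.5], [100]])
print(clf.predict([[0.1], [0], [90]]))
```

In this sketch the threshold is drawn from a continuous range. One way to run the experiment in item 3 would be to precompute per-feature quantile bin edges once and draw thresholds only from those edges, then compare the resulting scores against the continuous version; that variant is an assumption about how such a test could be set up, not a description of cuML's binning.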

@vinaydes @dantegd @beckernick @hcho3

singhmanas1 added the ? - Needs Triage and feature request labels Oct 3, 2024
dantegd changed the title from "[FEA] Isolation Forest Support in cuML" to "[FEA] Isolation Forest Training support in cuML" Oct 3, 2024