This package provides a scikit-learn transformer for feature selection using a quantum-classical hybrid solver.
This plugin makes use of a Leap™ quantum-classical hybrid solver. Developers can get started by signing up for the Leap quantum cloud service for free. Those seeking a more collaborative approach and assistance with building a production application can reach out to D-Wave directly and also explore the feature selection offering in AWS Marketplace.
The package's main class, SelectFromQuadraticModel, can be used in any existing sklearn pipeline.
For an introduction to hybrid methods for feature selection, see the Feature Selection for CQM example.
A minimal example of using the plugin to select 20 of 30 features of an sklearn dataset:
>>> from sklearn.datasets import load_breast_cancer
>>> from dwave.plugins.sklearn import SelectFromQuadraticModel
...
>>> X, y = load_breast_cancer(return_X_y=True)
>>> X.shape
(569, 30)
>>> X_new = SelectFromQuadraticModel(num_features=20).fit_transform(X, y)
>>> X_new.shape
(569, 20)
For large problems, the default runtime may be insufficient. You can use the CQM solver's min_time_limit method to find the minimum accepted runtime for your problem; alternatively, simply submit as above and check the returned error message for the required runtime.
The feature selector can then be re-instantiated with a longer time limit:
>>> X_new = SelectFromQuadraticModel(num_features=20, time_limit=200).fit_transform(X, y)
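The resubmission step can also be automated. The following is a minimal sketch and not part of the plugin: `fit_with_time_limits` is a hypothetical helper, and it assumes that a too-short time limit surfaces as a `ValueError` from `fit_transform`. Adjust the exception type to whatever your installed version actually raises.

```python
def fit_with_time_limits(make_selector, X, y, time_limits):
    """Try each time limit in order and return the first successful result.

    make_selector is a callable that takes a time limit and returns an
    unfitted selector, for example:
        lambda t: SelectFromQuadraticModel(num_features=20, time_limit=t)
    Returns a tuple (transformed_X, accepted_time_limit).
    """
    last_error = None
    for time_limit in time_limits:
        selector = make_selector(time_limit)
        try:
            return selector.fit_transform(X, y), time_limit
        except ValueError as err:  # assumption: a too-short limit raises ValueError
            last_error = err
    # Every candidate was rejected; surface the last solver error as the cause.
    raise RuntimeError(
        f"no time limit in {list(time_limits)} was accepted"
    ) from last_error
```

Passing the selector as a factory keeps the helper independent of the plugin, so it can be exercised with a stand-in class before spending solver time. Each attempt is a separate submission.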
You can use SelectFromQuadraticModel with scikit-learn's hyper-parameter optimizers.
For example, the number of features can be tuned using a grid search. Please note that this will submit many problems to the hybrid solver.
>>> import numpy as np
...
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.model_selection import GridSearchCV
>>> from sklearn.pipeline import Pipeline
>>> from dwave.plugins.sklearn import SelectFromQuadraticModel
...
>>> X, y = load_breast_cancer(return_X_y=True)
...
>>> num_features = X.shape[1]
>>> searchspace = np.linspace(1, num_features, num=5, dtype=int, endpoint=True)
...
>>> pipe = Pipeline([
...     ('feature_selection', SelectFromQuadraticModel()),
...     ('classification', RandomForestClassifier())
... ])
...
>>> clf = GridSearchCV(pipe, param_grid=dict(feature_selection__num_features=searchspace))
>>> search = clf.fit(X, y)
>>> print(search.best_params_)
{'feature_selection__num_features': 22}
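To get a feel for how many problems the grid search above submits, the count can be estimated ahead of time. This sketch assumes scikit-learn's defaults for GridSearchCV (5-fold cross-validation and a final refit with the best parameters) and that each fit of the pipeline submits one problem. `candidate_values` and `estimated_submissions` are illustrative helpers, not plugin functions; the first reproduces the integer linspace grid in plain Python.

```python
import math


def candidate_values(start, stop, num):
    """Evenly spaced integers from start to stop (inclusive), rounded down.

    Mirrors np.linspace(start, stop, num=num, dtype=int) for positive values.
    """
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    # Pin the last value to stop so float rounding cannot shift the endpoint.
    return [math.floor(start + i * step) for i in range(num - 1)] + [stop]


def estimated_submissions(num_candidates, cv_folds=5, refit=True):
    """One fit per candidate per fold, plus one final refit (assumed defaults)."""
    return num_candidates * cv_folds + (1 if refit else 0)


grid = candidate_values(1, 30, 5)
print(grid)                              # [1, 8, 15, 22, 30]
print(estimated_submissions(len(grid)))  # 26
```

Treat the result as a rough estimate: a coarser grid or fewer folds reduces the number of submissions proportionally.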
To install the core package:
pip install dwave-scikit-learn-plugin
Released under the Apache License 2.0.
Ocean's contributing guide has guidelines for contributing to Ocean packages.
dwave-scikit-learn-plugin makes use of reno to manage its release notes.
When making a contribution to dwave-scikit-learn-plugin that will affect users, create a new release note file by running:
reno new your-short-descriptor-here
You can then edit the file created under releasenotes/notes/.
Remove any sections not relevant to your changes.
Commit the file along with your changes.