Scikitlearn clustering methods cleanup #209
Comments
Hmm, I think this requires a bit more thinking. For instance, what would you suggest the transformation of DBSCAN should correspond to? DBSCAN doesn't have centroids etc.; you run the algorithm and you get an "answer" in the form of labels. So the only meaningful thing I can see here is to map transform to predict, which just returns those labels. But I think this is confusing and would prefer to keep the distinction. What do you think?
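To make the "you only get labels" point concrete, here is a minimal sketch of a DBSCAN-style pass over 1-D points. It is a simplified illustration, not scikit-learn's implementation; the function name `dbscan_labels` and the parameter names `eps` and `min_pts` are assumptions chosen for this example. The result is one label per training point (with -1 for noise), and nothing like a centroid comes out that could be reused to place an unseen point.

```python
def dbscan_labels(points, eps, min_pts):
    """Label 1-D points with cluster ids 0, 1, ... and -1 for noise."""
    n = len(points)
    # Each point's eps-neighbourhood (a point counts as its own neighbour).
    neighbours = [
        [j for j in range(n) if abs(points[i] - points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbours[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                frontier.extend(neighbours[j])  # only core points expand the cluster
    return labels


points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 9.0]
labels = dbscan_labels(points, eps=0.15, min_pts=2)
```

The only output is `labels`, aligned with `points`; there is no learned object here that would say where a new point such as 2.6 belongs.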
Yes, so what I did for all sklearn models is to dump in fitted params the output of the I'm not very happy with this separation Anyway back on topic: (1) I need your thoughts first (2) the refactoring is doable but will be annoying and I'm wondering whether adding |
@tlienart You are right. DBSCAN is not like KMeans clustering. I stand corrected. However, I do wonder whether the scikit-learn way of conceptualising this class of clustering problems is the most useful. Might it not be better to view DBSCAN as a static transformer that transforms features
There was quite a bit of discussion in deciding upon this design where good points were made for separating the two interface points, in my view. I think there would need to be a pretty strong argument for changing the design now. I do think the name |
I think these methods could do with a review. Here are a few things that look like issues to me:
Some models do not implement a transform method (presumably because Python's scikit-learn does not) but could, no? Example: DBSCAN
Recall that MLJ makes a distinction between report and fitted_params: the latter is for the learned parameters (in this case what is needed to assign a new observation to a class), and everything else goes in report. It seems that in the scikitlearn clustering wraps everything is just lumped into fitted_params. In particular this has led to inconsistency with the Clustering.jl models KMeans and KMedoids (which separate things correctly, as far as I can tell).
cc: @tlienart
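As a rough illustration of the split described above, here is a small pure-Python sketch of a 1-D k-means fit that returns two separate results. This is not MLJ or scikit-learn code: the helper names (`nearest`, `fit_kmeans_1d`, `predict`) and the dictionary keys are hypothetical, and only the names `fitted_params` and `report` are borrowed from the discussion. The idea is that `fitted_params` holds just what is needed to assign a new observation (the centroids), while by-products of training go in `report`.

```python
def nearest(x, centroids):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))


def fit_kmeans_1d(xs, init_centroids, n_iter=10):
    """Toy 1-D k-means returning (fitted_params, report)."""
    centroids = list(init_centroids)
    for _ in range(n_iter):
        assignments = [nearest(x, centroids) for x in xs]
        for k in range(len(centroids)):
            members = [x for x, a in zip(xs, assignments) if a == k]
            if members:  # leave an empty cluster's centroid where it is
                centroids[k] = sum(members) / len(members)
    assignments = [nearest(x, centroids) for x in xs]

    # Learned parameters: sufficient to assign a new observation to a cluster.
    fitted_params = {"centroids": centroids}
    # Everything else: by-products that describe the training data only.
    report = {
        "assignments": assignments,
        "cluster_sizes": [assignments.count(k) for k in range(len(centroids))],
    }
    return fitted_params, report


def predict(fitted_params, x_new):
    """Assign a new observation using the learned parameters alone."""
    return nearest(x_new, fitted_params["centroids"])


xs = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
fitted_params, report = fit_kmeans_1d(xs, init_centroids=[0.0, 12.0])
new_label = predict(fitted_params, 3.0)
```

Note that `predict` never touches `report`, which is the property that separates the two containers in this sketch.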