Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort-means appears to have a bug #14

Open
ghamerly opened this issue Jul 10, 2018 · 3 comments
Open

Sort-means appears to have a bug #14

ghamerly opened this issue Jul 10, 2018 · 3 comments
Labels
bug Something isn't working

Comments

@ghamerly
Copy link
Owner

Sort-means is giving slightly different results in some of my testing, compared to all the other algorithms. I have a test case where every other algorithm converges in 45 iterations, but it takes 42, and gives slightly different results (centers).

Steps to reproduce:

  • run naive k-means and sort-means on the attached input, with k=10, using the library's k-means++ initialization, with a max iterations of 1000
  • expected result: same number of iterations and centers
  • actual result: different centers, different iterations

The initial centers chosen by k-means++ on this dataset are:

    -0.379863       1.80735      0.658205      0.236417       1.52649 
    -0.456454      -0.44091      0.932621        1.4306      -0.77737 
      1.10611      -1.30305     -0.202388      -1.29549      0.150969 
     0.951702      0.885737    -0.0140388     -0.295967      -2.10619 
      2.52547      0.803819       1.57157      0.481766      0.168094 
     -0.35617      0.423922       0.72442     -0.557642      0.466705 
     0.619142      -1.08521      -1.85976      -1.19759     -0.763672 
    -0.929235      0.901898     -0.717973      -2.60481     0.0617718 
    -0.391341      0.258408     -0.368092       0.53643      -1.35026 
     0.242555       -1.5091   -0.00501431       0.30317         1.244 

bad_input.txt.gz

@ghamerly ghamerly added the bug Something isn't working label Jul 10, 2018
@cur33
Copy link
Contributor

cur33 commented Aug 19, 2018

I've started looking at this; so far, I've replicated these results, but nothing strikes me as obviously wrong about the implementation. I'll run some more detailed tests to see if I can find anything.

@ghamerly
Copy link
Owner Author

I would probably tackle this by printing out detailed information about each iteration (both for sort-means and for the naive algorithm), identifying the earliest iteration where something goes wrong (some assignment is incorrect), and drilling into why that happened.

@cur33
Copy link
Contributor

cur33 commented Aug 19, 2018

Will do!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants