Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k-measn++ uniqueness check #3

Open
ghamerly opened this issue May 10, 2018 · 1 comment
Open

k-measn++ uniqueness check #3

ghamerly opened this issue May 10, 2018 · 1 comment

Comments

@ghamerly
Copy link
Owner

ghamerly commented May 10, 2018

Migrated from BaylorCS/baylorml#3 (@BenjaminHorn)

I guess this loop's https://github.com/ghamerly/baylorml/blob/master/fast_kmeans/general_functions.cpp#L217 function is to prevent a point to be chosen more then once as center.

Let's say we have 10 points in our dataset. We init the centers with with init_centers_kmeanspp_v2.

  • The first center is picked uniformly randomly: chosen_pts = {2}.
  • The 2nd point is picked in the while loop, let's say: 5. chosen_pts = {2, 5}.
  • The 3rd draw results: chosen_pts[2] = 2;

https://github.com/ghamerly/baylorml/blob/master/fast_kmeans/general_functions.cpp#L218 results a unique = FALSE, we have to do the 3rd draw again.

// unique is instantiated as TRUE, outside(!) the loop
unique = unique && chosen_pts[2] != chosen_pts[0]  // TRUE && FALSE (2 != 2) = FALSE
unique = unique && chosen_pts[2] != chosen_pts[1] //  FALSE && TRUE (2 !=5) = FALSE
// !FALSE -> new loop
  • 3rd draw (2nd do while loop): chosen_pts[2] = 8;
// unique is still FALSE from the previous loop
unique = unique && chosen_pts[2] != chosen_pts[0] // FALSE && TRUE(8 != 2) = FALSE
unique = unique && chosen_pts[2] != chosen_pts[1] // FALSE && TRUE(8 !=5) = FALSE
// !FALSE -> new loop

Maybe I'm getting it wrong, but with some mini datasets (where is the chance to pick the same center again is bigger) I often run into an endless loop.

@ghamerly
Copy link
Owner Author

I think you're right, we should set unique = true inside the do / while loop as the first thing. Do you want to make this change and send a pull request?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant