running kmeans with --vec takes longer than without #898

Open
philipportner opened this issue Nov 5, 2024 · 0 comments
Labels
performance label for PRs of perf++ and issues of perf--

Comments

@philipportner (Collaborator)

When running the following kmeans script, adding the --vec flag increases the wall-clock runtime by a bit more than 50% (about 22.6 s without the flag, about 34.5 s with it).

kmeans.daphne

The same script can be found in test/api/cli/algorithms/kmeans.daphne

// K-means clustering.

// Arguments:
// - r ... number of records
// - c ... number of centroids
// - f ... number of features
// - i ... number of iterations

// Data generation.
X = rand($r, $f, 0.0, 1.0, 1, -1);
C = rand($c, $f, 0.0, 1.0, 1, -1);

// K-means clustering (decisive part).
for(i in 1:$i) {
    D = (X @ t(C)) * -2 + t(sum(C ^ 2, 0));
    minD = aggMin(D, 0);
    P = D <= minD;
    P = P / sum(P, 0);
    P_denom = sum(P, 1);
    C = (t(P) @ X) / t(P_denom);
}

// Result output.
print(C);
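For readers unfamiliar with DaphneDSL, here is a rough pure-Python sketch of what one pass through the loop body appears to compute. It is not taken from the DAPHNE code base. It assumes that `sum(..., 0)` and `aggMin(..., 0)` aggregate row-wise and `sum(..., 1)` column-wise, which is what the shapes in the script imply. The function name `kmeans_step` is hypothetical.

```python
def kmeans_step(X, C):
    """One k-means iteration on lists of lists (rows are records / centroids).

    Hypothetical helper mirroring the DaphneDSL loop body; not DAPHNE code.
    """
    # D[i][j] = -2 * <x_i, c_j> + ||c_j||^2.
    # The ||x_i||^2 term is omitted because it is constant within a row
    # and therefore does not change which centroid is nearest.
    c_sq = [sum(v * v for v in c) for c in C]
    D = [[-2.0 * sum(a * b for a, b in zip(x, c)) + c_sq[j]
          for j, c in enumerate(C)]
         for x in X]

    # P[i][j] is 1 where centroid j is nearest to record i (ties included),
    # divided by the number of ties in that row.
    P = []
    for row in D:
        nearest = min(row)
        hits = [1.0 if d <= nearest else 0.0 for d in row]
        ties = sum(hits)
        P.append([h / ties for h in hits])

    # New centroid j is the weighted mean of the records assigned to it.
    # A centroid with no records raises ZeroDivisionError in this sketch.
    n_records, n_features = len(X), len(X[0])
    new_C = []
    for j in range(len(C)):
        weight = sum(P[i][j] for i in range(n_records))
        new_C.append([
            sum(P[i][j] * X[i][k] for i in range(n_records)) / weight
            for k in range(n_features)
        ])
    return new_C
```

The dominant costs per iteration are the two matrix products (`X @ t(C)` and `t(P) @ X`), so these are plausible places to look first when comparing the vectorized and non-vectorized plans.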
CLI commands to run kmeans.daphne

time bin/daphne test/api/cli/algorithms/kmeans.daphne r=1000000 f=100 c=50 i=10

time bin/daphne --vec test/api/cli/algorithms/kmeans.daphne r=1000000 f=100 c=50 i=10
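Because a single `time` invocation can be noisy, it may help to repeat each variant and compare medians. The following is a minimal sketch of such a harness, assuming it is launched from the repository root so that `bin/daphne` resolves. The helper names (`build_command`, `wall_time`, `median_wall_time`) are hypothetical, and only the flag and arguments shown in the commands above are used.

```python
import statistics
import subprocess
import time

SCRIPT = "test/api/cli/algorithms/kmeans.daphne"
ARGS = ["r=1000000", "f=100", "c=50", "i=10"]


def build_command(vec, binary="bin/daphne"):
    """Return the argv list for one run, with or without --vec."""
    flags = ["--vec"] if vec else []
    return [binary, *flags, SCRIPT, *ARGS]


def wall_time(cmd):
    """Run cmd once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def median_wall_time(cmd, repeats=5):
    """Run cmd `repeats` times and return the median wall-clock time."""
    return statistics.median(wall_time(cmd) for _ in range(repeats))


# Example usage (not executed here, since it needs a built bin/daphne):
#   base = median_wall_time(build_command(vec=False))
#   vec = median_wall_time(build_command(vec=True))
#   print(f"slowdown with --vec: {vec / base:.2f}x")
```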

time output

bin/daphne test/api/cli/algorithms/kmeans.daphne r=1000000 f=100 c=50 i=10 115.16s user 136.03s system 1109% cpu 22.634 total

bin/daphne --vec test/api/cli/algorithms/kmeans.daphne r=1000000 f=100 c=50 461.03s user 821.06s system 3718% cpu 34.477 total
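To make the comparison explicit, the short calculation below derives the slowdown from the figures in the `time` output above. It only restates the reported numbers. The variable names are illustrative.

```python
# Figures copied from the `time` output above (seconds).
runs = {
    "default": {"user": 115.16, "system": 136.03, "wall": 22.634},
    "--vec": {"user": 461.03, "system": 821.06, "wall": 34.477},
}


def cpu_seconds(run):
    """Total CPU time consumed across all threads (user + system)."""
    return run["user"] + run["system"]


wall_ratio = runs["--vec"]["wall"] / runs["default"]["wall"]
cpu_ratio = cpu_seconds(runs["--vec"]) / cpu_seconds(runs["default"])

print(f"wall-clock slowdown: {wall_ratio:.2f}x")  # 1.52x
print(f"CPU-time increase:   {cpu_ratio:.2f}x")   # 5.10x
```

The wall-clock time grows by roughly 52%, but total CPU time grows about fivefold, and most of that increase is system time (136.03 s to 821.06 s).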

@philipportner added the performance label (label for PRs of perf++ and issues of perf--) on Nov 5, 2024