Parallel Online K-Nearest Neighbour Search using a Progressive k-d Tree

Authors

Kaushal Bhogale
Rahul John Roy

Introduction and Motivation

The k-nearest neighbour (KNN) algorithm is a fundamental classification and regression method in machine learning. KNN works by finding the k points closest to a given query point in the feature space.
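As a baseline, the definition above can be implemented directly by brute force. The sketch below is illustrative only (the names `knn_indices` and `knn_classify` are hypothetical, not taken from the repository); it assumes Euclidean distance and classifies by majority vote among the k nearest labels. It scans every point, so each query costs O(n), which is the cost a k-d tree aims to reduce.

```python
import heapq
import math
from collections import Counter


def knn_indices(points, query, k):
    """Indices of the k points closest to `query` (Euclidean), nearest first."""
    return heapq.nsmallest(k, range(len(points)),
                           key=lambda i: math.dist(points[i], query))


def knn_classify(points, labels, query, k):
    """Predict a label by majority vote among the k nearest neighbours."""
    votes = Counter(labels[i] for i in knn_indices(points, query, k))
    return votes.most_common(1)[0][0]


points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(points, labels, (0.5, 0.5), 3))  # → a
```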

With the advent of sophisticated data collection mechanisms, machine learning on large datasets has become very important. An existing parallel KNN algorithm is PANDA [1], which uses k-d trees. This work assumes that the data is already available, so the entire k-d tree is built in a single shot.
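For reference, a one-shot k-d tree build can be sketched as follows. This is a serial, single-process illustration of the general technique (recursive median split, cycling through the axes), not PANDA's distributed implementation; `Node`, `build` and `knn` are hypothetical names. The query descends to the side of the splitting plane containing the query first, and visits the other side only if the plane is closer than the current k-th best distance.

```python
import heapq
import math


class Node:
    __slots__ = ("point", "axis", "left", "right")

    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points, depth=0):
    """Build a balanced k-d tree in one shot by splitting at the median."""
    if not points:
        return None
    axis = depth % len(points[0])            # cycle through the dimensions
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))


def knn(root, query, k):
    """Return the k points nearest to `query`, closest first."""
    heap = []  # max-heap of the best k so far, stored as (-distance, point)

    def visit(node):
        if node is None:
            return
        d = math.dist(node.point, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, node.point))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, node.point))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Cross the splitting plane only if it could hide a closer point.
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(root)
    return [p for _, p in sorted(heap, reverse=True)]


root = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(knn(root, (9, 2), 2))  # → [(8, 1), (7, 2)]
```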

In contrast to the one-shot algorithm, an online algorithm allows new points to be added to the tree even after it has been built. This benefits interactive systems, since analysts do not have to wait until all the data is loaded; instead, the data is split into mini-batches that are inserted into the k-d tree incrementally. However, as more points are inserted, the tree can become unbalanced, which increases query time. We therefore propose a tree-balancing method to address this issue.
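The imbalance problem is easy to reproduce: plain k-d tree insertion never rebalances, so a stream that arrives in sorted order degenerates into a chain. The project's own balancing method is not described here, so the sketch below uses an assumed, deliberately simple policy: after each mini-batch, if the tree height exceeds `alpha * log2(n + 1)`, rebuild the whole tree with a median split. `ProgressiveKDTree`, `insert_batch` and `alpha` are hypothetical names chosen for illustration.

```python
import math


class Node:
    __slots__ = ("point", "axis", "left", "right")

    def __init__(self, point, axis):
        self.point, self.axis = point, axis
        self.left = self.right = None


def build(points, depth=0):
    """One-shot median-split build; returns a balanced subtree."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    node = Node(pts[mid], axis)
    node.left = build(pts[:mid], depth + 1)
    node.right = build(pts[mid + 1:], depth + 1)
    return node


def insert(root, point):
    """Plain k-d tree insertion: walk down to an empty slot, never rebalance."""
    if root is None:
        return Node(point, 0)
    node, dims = root, len(point)
    while True:
        side = "left" if point[node.axis] < node.point[node.axis] else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(point, (node.axis + 1) % dims))
            return root
        node = child


def walk(root):
    """Iteratively yield (node, depth); avoids deep recursion on chains."""
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        yield node, depth
        stack.extend((c, depth + 1) for c in (node.left, node.right) if c is not None)


def height(root):
    return max((d for _, d in walk(root)), default=0)


class ProgressiveKDTree:
    """Hypothetical online wrapper: insert mini-batches, rebuild when too deep."""

    def __init__(self, alpha=2.0):
        self.root, self.size, self.alpha, self.rebuilds = None, 0, alpha, 0

    def insert_batch(self, batch):
        for p in batch:
            self.root = insert(self.root, p)
            self.size += 1
        # Assumed balance criterion: height <= alpha * log2(n + 1).
        if height(self.root) > self.alpha * math.log2(self.size + 1):
            self.root = build([n.point for n, _ in walk(self.root)])
            self.rebuilds += 1


tree = ProgressiveKDTree()
stream = [(i, i) for i in range(200)]  # sorted arrivals: worst case for plain inserts
for start in range(0, len(stream), 20):
    tree.insert_batch(stream[start:start + 20])
print(tree.size, tree.rebuilds, height(tree.root))
```

Rebuilding the entire tree costs O(n log n) each time it triggers; a finer-grained scheme (for example, rebuilding only the deepest unbalanced subtree, as scapegoat trees do) would touch fewer points, which matters when the tree is distributed across processes.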

Problem Definition

The aim of the project is to develop a parallel online k-nearest neighbour search algorithm based on progressively growing k-d trees. The algorithm includes a balancing scheme for k-d trees that minimizes the communication latency between processes.
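One common way to keep communication low in a parallel KNN query is to let each process search only its own partition and send back just its k best candidates, which are then merged. The sketch below assumes the points are already partitioned across workers; threads stand in for distributed processes, and `local_knn` and `parallel_knn` are hypothetical names, not the repository's API.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def local_knn(partition, query, k):
    """Runs on one worker: the k best (distance, point) pairs in its partition."""
    return heapq.nsmallest(k, ((math.dist(p, query), p) for p in partition))


def parallel_knn(partitions, query, k):
    # Threads stand in for distributed processes in this sketch.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        local = list(pool.map(lambda part: local_knn(part, query, k), partitions))
    # Each worker sends at most k candidates, so the merge handles
    # O(P * k) items regardless of how large each partition is.
    merged = heapq.nsmallest(k, (c for cands in local for c in cands))
    return [p for _, p in merged]


parts = [[(0, 0), (10, 10)], [(1, 1), (9, 9)], [(2, 2)]]
print(parallel_knn(parts, (0, 0), 2))  # → [(0, 0), (1, 1)]
```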

Project Goals

Implement the existing parallel KNN algorithm.
Design the online version of the same.
Experiment with different input distributions.
Test for various data arrival frequencies and bandwidths.
Provide empirical justification for scenarios in which the online algorithm will be useful.

Useful Links

References

[1] M. M. A. Patwary et al., “PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures,” in IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016, pp. 494–503.
