Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support other distance metrics #11

Open
eigenhombre opened this issue Oct 24, 2015 · 4 comments
Open

Support other distance metrics #11

eigenhombre opened this issue Oct 24, 2015 · 4 comments

Comments

@eigenhombre
Copy link
Contributor

Currently the (square of the) Euclidean distance is the only metric allowed, with the simplest (i.e., non-wrapping) topology. I'd be interested in getting this to work on e.g. a sphere with the Haversine (great circle) distance. Use cases for this include distances on the Earth (GIS). @abscondment would you welcome a PR for that?

@eigenhombre
Copy link
Contributor Author

Followup -- I started working on supporting additional distance functions. API question -- the current API exports a :dist-squared keyword on the points returned by nearest-neighbor, which obviously implies Euclidean distance (squared). In my current approach I am just renaming this to :dist-result, but this will break the API for anyone who relies on the old keyword. Alternatively, we could return :dist-squared when the default, Euclidiean distance is used, and :dist-result otherwise, though that seems a bit hacky to me. I checked the couple of projects that use clj-kdtree on crossclj.info and didn't find anyone using this keyword. Thoughts?

@nbeloglazov
Copy link
Collaborator

:dist-squared used both internally and exported as API. It would make sense to have just :dist without squared, but having squared optimizes things a little bit as you don't need to calculate sqrt at all. It's probably makes sense to have additional step before you give result back to user which replaces all :dist-squared with :dist by calculating sqrt. This way internally we can still use dist-squared as optimization and also have easier to understand API.

BTW, will yourchanges to introduce additional metrics support this 2 notions of distances for same metric (if needed), like normal dist for public API and "squared" internally?

@abscondment
Copy link
Owner

@eigenhombre Yes, I'd absolutely welcome a PR!

I like the notion of having the API expose :dist for when you need to actually know that value -- i.e. maybe even compute it lazily -- but allow for internal comparison via a potentially faster implementation.

I'm not sure how widespread use of this library is, but it might be friendlier to deprecate :dist-squared for one release and provide both fields. Thoughts on that? I've no strong opinion either way.

@eigenhombre
Copy link
Contributor Author

I like the more general :dist value, possibly computed lazily when needed from :dist-squared (or other internal metric value in the tree). However, I have a bigger worry. For spherical geometry the algorithm needs to "wrap" continuously at 0 and 2π (or -180 & 180 degrees longitude) and I haven't had the time to figure out how to accommodate that (see my question on cs.stackexchange, which is what led me to this library in the first place). This, along with the Haversine metric for calculating great circle distance, is what's needed to do nearest neighbor correctly on the sphere. I wonder if the API should accommodate both geometric (metric) and topological (connectedness) concerns. For example, one straightforward application could be a flat (Euclidean) metric in an n-dimensional torus (think the old school video game Asteroids as a use case). Any thoughts on whether it would be worth supporting connected edges in the API (aside from my use case, which I definitely need it for)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants