Interest in a ruby implementation? #7

dgollahon · 2024-05-05T23:27:22Z

Hi,

I am interested in using rapidfuzz-rs through magnus in Ruby. I have no problem doing this just for myself (it's very straightforward), but I was wondering if it would make sense to open-source a project for others. I am happy to release it under my own GitHub account or "donate" it to this organization if that is desirable/helpful. I don't want to squat the rapidfuzz gem name if this group or someone else would like to own it.

Thanks!
Daniel


maxbachmann · 2024-05-07T15:13:21Z

I think placing it in the rapidfuzz organisation would make sense, so that people can find it more easily. For the gem, it would probably make sense to use a trusted publishing system via GitHub Actions, similar to what is done for the Python version of the library.

There are a couple of things that I did differently in the Python version compared to the C++/Rust version to make it more useful for Python users:

- There is a pure Python fallback implementation for platforms on which the faster C++/Rust-based solution can't be compiled (e.g. because no compiler is present). The preferred implementation is the compiled one.
- Performing individual comparisons from Python is relatively slow. To speed this up, I provide the rapidfuzz.process module, which allows the user to perform comparisons for complete datasets, e.g. process.extractOne to find the best match in a 1 x many comparison. This is generally faster, since it avoids interpreter overhead and, in Python, the global interpreter lock.
- The cached scorer structs are not available from Python. Their speedup is simply too small in comparison to the function call overhead. Instead, they are used under the hood by the process functions. This is done by tagging each scorer with an attribute that gives access to these lowered functions.

I have never used Ruby myself, so I can't help with any Ruby-specific questions, but I would be more than happy to help with any questions regarding the library.

dgollahon · 2024-05-07T22:21:01Z

Ok, that makes sense.

I probably don't have time to implement a native Ruby fallback, but I think a relatively "dumb" port using the magnus tooling I mentioned above would not be a heavy lift. I'm not sure exactly when I'll get to this, but I plan on putting up a draft repo at some point, possibly reserving the relevant gem name, and then figuring out the publishing lifecycle later on.

I think the overhead for functions bound via magnus (indirectly, the C APIs) should be reasonable for most use cases. Using the osa_distance function, I found some minor test workloads to be 5-150 times as fast as a similar C-based gem in the ecosystem.
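For readers unfamiliar with the metric, the following is a plain reference sketch of the optimal string alignment (OSA) distance, written in Python for readability. It is the textbook dynamic-programming formulation, not the algorithm rapidfuzz-rs uses internally, and the name `osa_reference` is made up for this example. A function like this could serve as a slow oracle when testing a binding.

```python
def osa_reference(a: str, b: str) -> int:
    """Textbook OSA distance: Levenshtein plus adjacent transpositions,
    where no substring may be edited more than once."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Swap of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

For example, `osa_reference("ab", "ba")` is 1 because of the single transposition, while `osa_reference("CA", "ABC")` is 3, since OSA does not allow editing a transposed pair again.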

maxbachmann · 2024-05-08T00:07:40Z

Yes, I started out without all of these things in the Python version as well, and added them as I had the time and the need for them.

Wrapping the API using something like magnus is probably not too much work, since most of the functions share a similar interface.
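The thread does not spell out what the shared interface looks like. As a rough illustration, assume each metric offers a distance function plus a similarity and two normalized variants derived from it. Under that assumption a wrapper can be written once and reused per metric, as in this Python sketch. The `Metric` class, its `maximum` field, and `hamming_distance` are hypothetical names for this example, not part of any rapidfuzz API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Metric:
    """Hypothetical wrapper: derive the common functions from a distance."""
    distance: Callable[[str, str], int]
    maximum: Callable[[str, str], int]  # largest possible distance for a pair

    def similarity(self, a: str, b: str) -> int:
        return self.maximum(a, b) - self.distance(a, b)

    def normalized_distance(self, a: str, b: str) -> float:
        m = self.maximum(a, b)
        # Two empty strings are treated as identical.
        return self.distance(a, b) / m if m else 0.0

    def normalized_similarity(self, a: str, b: str) -> float:
        return 1.0 - self.normalized_distance(a, b)


def hamming_distance(a: str, b: str) -> int:
    """Number of differing positions; this sketch requires equal lengths."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


hamming = Metric(distance=hamming_distance, maximum=lambda a, b: len(a))
```

With this, `hamming.distance("karolin", "kathrin")` is 3 and `hamming.similarity("karolin", "kathrin")` is 4. Only the distance and its maximum change from one metric to the next.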
