Coverage alignment heuristic #76
def max_pair_coverage(array1: np.array, array2: np.array) -> float:
    def hash_pair(el1, el2):
        return el1 * 17 - el2
This is around 100% faster than hash(el1) someOp hash(el2), and it is fine for this column-wise use case.
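To make the trade-off concrete, here is a minimal sketch comparing the arithmetic combiner from the diff with a builtin-hash variant. The name `hash_pair_builtin` and the choice of XOR for `someOp` are assumptions made for illustration; the thread does not say which operator was benchmarked, and no timings are claimed here.

```python
import numpy as np


def hash_pair(el1, el2):
    # Combiner from the diff: plain arithmetic, so it also works
    # element-wise on whole NumPy arrays.
    return el1 * 17 - el2


def hash_pair_builtin(el1, el2):
    # Hypothetical alternative: "someOp" is assumed to be XOR here.
    return hash(el1) ^ hash(el2)


# int64 inputs avoid overflow in el1 * 17 for large int32 values.
a = np.array([1, 1, 2, 0], dtype=np.int64)
b = np.array([5, 5, 6, 0], dtype=np.int64)

# One vectorised call produces a key per (a[i], b[i]) pair.
keys = hash_pair(a, b)

# The arithmetic combiner is not injective: distinct pairs can collide.
collision = hash_pair(1, 17) == hash_pair(0, 0)

# XOR is symmetric, so the assumed builtin variant ignores pair order.
order_lost = hash_pair_builtin(1, 5) == hash_pair_builtin(5, 1)
```

Collisions such as `(1, 17)` and `(0, 0)` are the price of the cheaper combiner, which is presumably what "fine for this use case" is accepting.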
def max_pair_coverage(array1: npt.NDArray[np.int32], array2: npt.NDArray[np.int32]) -> float:
    def hash_pair(el1: np.int32, el2: np.int32):
Missing output type hint :)
Will add it together with the improved pre-commit hooks, since those should have caught it.
import numpy.typing as npt

np.random.seed(123)
max_size = 10**6
Quick question: can max_size be estimated from the input vector?
It can, but max_size is directly tied to batch sizes, where 1 million is a very safe upper bound (many other things go wrong before it is reached).
Adds a simple heuristic for computing the proportion of aligned values, which makes pairwise comparisons straightforward.
https://jira.outbrain.com/browse/REF-51623
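
The body of `max_pair_coverage` is not shown in this thread, so the following is only one plausible reading of "proportion of aligned values", not the PR's implementation. It assumes that coverage means the share of positions occupied by the single most frequent `(array1[i], array2[i])` pair. The name `max_pair_coverage_sketch` is hypothetical, and so are the empty-input and shape checks.

```python
import numpy as np
import numpy.typing as npt


def max_pair_coverage_sketch(
    array1: npt.NDArray[np.int32], array2: npt.NDArray[np.int32]
) -> float:
    """Share of positions covered by the most common (el1, el2) pair.

    Assumed semantics; the PR may define coverage differently.
    """
    if array1.shape != array2.shape:
        raise ValueError("arrays must have the same shape")
    if array1.size == 0:
        return 0.0

    # Same combiner as hash_pair in the diff (el1 * 17 - el2), applied
    # column-wise. Widening to int64 keeps el1 * 17 from overflowing int32.
    keys = array1.astype(np.int64) * 17 - array2.astype(np.int64)

    # Count how often each key occurs and keep the largest count.
    _, counts = np.unique(keys, return_counts=True)
    return float(counts.max() / keys.size)


a = np.array([1, 1, 2, 3], dtype=np.int32)
b = np.array([5, 5, 6, 7], dtype=np.int32)
coverage = max_pair_coverage_sketch(a, b)
```

Because the combiner can collide, a value computed this way is an upper estimate of the true pair frequency, which fits the "heuristic" framing of the PR.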