feat: Add vector_search_by_key method to sync and async clients vec-330 #53

Open
wants to merge 26 commits into
base: dev
Commits
be35723
ci: split extensive vector search tests into another file
dwelch-spike Oct 1, 2024
4650019
ci: trigger extensive vector search tests on push to dev and main only
dwelch-spike Oct 1, 2024
f0b3428
trigger tests
dwelch-spike Oct 1, 2024
17bf383
ci: remove extensive vector tests from normal integration test workflow
dwelch-spike Oct 1, 2024
fc2d7ea
feat: add vector_search_by_key() client method
dwelch-spike Oct 2, 2024
143c7fe
feat: add vector_search_by_key async client method
dwelch-spike Oct 2, 2024
a6c9cfe
fix vector search by key test case bins
dwelch-spike Oct 2, 2024
8ae04d8
use test case set in vecto serach by key
dwelch-spike Oct 2, 2024
9ff9e05
set limit correctly in vector searh by key tet case
dwelch-spike Oct 2, 2024
db89cf0
chore: define __repr__ for Neighbor and Key types
dwelch-spike Oct 3, 2024
f816744
remove breakpoint
dwelch-spike Oct 3, 2024
67f6ed2
feat: define lt, le, gt, ge for Neighbor type
dwelch-spike Oct 3, 2024
2f3eae3
add missing set_name arg to vector search by key tests
dwelch-spike Oct 3, 2024
88f5003
add key_namespace to vector_search_by_key
dwelch-spike Oct 3, 2024
1ed18c5
merge dev into vec-330
dwelch-spike Oct 10, 2024
17cf07f
remove incorrect field_name docstring
dwelch-spike Oct 10, 2024
b413ea2
ci: only run test_vector_search_with_set_same_as_index when extensive…
dwelch-spike Oct 10, 2024
2381d9a
check key when sorting Neighbors if distance is equal
dwelch-spike Oct 11, 2024
9fec62f
rename vector_search_by_key argument from set_name to key_set and ind…
dwelch-spike Oct 11, 2024
6a30658
try a test run without sorting vector search results
dwelch-spike Oct 11, 2024
a6468f8
change neighbor comparison methods to check str representation of key…
dwelch-spike Oct 11, 2024
3cbce0c
add test case for search by key where search space and record are in …
dwelch-spike Oct 11, 2024
9c0d290
add a test for search by key with data and search records in differen…
dwelch-spike Oct 11, 2024
13ea21b
don't sort test results, fix incorrect result count in test cases
dwelch-spike Oct 14, 2024
03f6987
add set to record cleanup in async search by key test
dwelch-spike Oct 14, 2024
854e6ab
remove neighbor comparison magic methods
dwelch-spike Oct 14, 2024
89 changes: 89 additions & 0 deletions src/aerospike_vector_search/aio/client.py
@@ -489,6 +489,95 @@ async def is_indexed(
            raise types.AVSServerError(rpc_error=e)
        return self._respond_is_indexed(response)

    async def vector_search_by_key(
        self,
        *,
        search_namespace: str,
        index_name: str,
        key: Union[int, str, bytes, bytearray],
        key_namespace: str,
        vector_field: str,
        limit: int,
        key_set: Optional[str] = None,
        search_params: Optional[types.HnswSearchParams] = None,
        include_fields: Optional[list[str]] = None,
        exclude_fields: Optional[list[str]] = None,
        timeout: Optional[int] = None,
    ) -> list[types.Neighbor]:
        """
        Perform a Hierarchical Navigable Small World (HNSW) vector search in Aerospike Vector Search by primary record key.

        :param search_namespace: The namespace that stores the records to be searched.
        :type search_namespace: str

        :param index_name: The name of the index to use in the search.
        :type index_name: str

        :param key: The primary key of the record that stores the vector to use in the search.
        :type key: Union[int, str, bytes, bytearray]

        :param key_namespace: The namespace that stores the record identified by key.
        :type key_namespace: str

        :param vector_field: The name of the field containing vector data.
        :type vector_field: str

        :param limit: The maximum number of neighbors to return (the K value).
        :type limit: int

        :param key_set: The name of the set from which to read the record to search by. Defaults to None.
        :type key_set: Optional[str]

        :param search_params: Parameters for the HNSW algorithm.
            If None, the default parameters for the index are used. Defaults to None.
        :type search_params: Optional[types.HnswSearchParams]

        :param include_fields: A list of field names to retrieve from the results.
            When used, fields that are not included are not sent by the server,
            saving on network traffic.
            If a field is listed in both include_fields and exclude_fields,
            exclude_fields takes priority, and the field is not returned.
            If None, all fields are retrieved. Defaults to None.
        :type include_fields: Optional[list[str]]

        :param exclude_fields: A list of field names to exclude from the results.
            When used, the excluded fields are not sent by the server,
            saving on network traffic.
            If None, all fields are retrieved. Defaults to None.
        :type exclude_fields: Optional[list[str]]

        :param timeout: Time in seconds this operation will wait before raising an :class:`AVSServerError <aerospike_vector_search.types.AVSServerError>`. Defaults to None.
        :type timeout: Optional[int]

        Returns:
            list[types.Neighbor]: A list of neighbor records found by the search.

        Raises:
            AVSServerError: Raised if an error occurs during the RPC communication with the server while attempting the vector search.
            This error could occur due to various reasons such as network issues, server-side failures, or invalid request parameters.
        """
        rec_and_key = await self.get(
            namespace=key_namespace,
            key=key,
            set_name=key_set,
            timeout=timeout,
        )

        vector = rec_and_key.fields[vector_field]

        neighbors = await self.vector_search(
            namespace=search_namespace,
            index_name=index_name,
            query=vector,
            limit=limit,
            search_params=search_params,
            include_fields=include_fields,
            exclude_fields=exclude_fields,
            timeout=timeout,
        )

        return neighbors

    async def vector_search(
        self,
        *,
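The async method in this hunk composes two existing calls: it awaits `get` to read the stored vector, then awaits `vector_search` with that vector as the query. The following is a minimal sketch of that flow, not the library's implementation. `FakeClient`, `FakeRecord`, `indexed_field`, and `run_demo` are hypothetical names, no server is contacted, and a brute-force squared-Euclidean ranking stands in for the HNSW index.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeRecord:
    """Hypothetical stand-in for the object returned by get()."""
    fields: dict[str, Any]


class FakeClient:
    """Hypothetical in-memory client that only mimics the call shapes in the diff."""

    def __init__(self, records: dict, indexed_field: str) -> None:
        self.records = records              # (namespace, set_name, key) -> fields
        self.indexed_field = indexed_field  # field the pretend index covers
        self.calls: list[str] = []

    async def get(self, *, namespace, key, set_name=None, timeout=None) -> FakeRecord:
        self.calls.append("get")
        return FakeRecord(fields=self.records[(namespace, set_name, key)])

    async def vector_search(self, *, namespace, index_name, query, limit, timeout=None):
        self.calls.append("vector_search")
        # Brute force: score every record in the namespace, nearest first.
        scored = [
            (sum((a - b) ** 2 for a, b in zip(query, fields[self.indexed_field])), key)
            for (ns, _set, key), fields in self.records.items()
            if ns == namespace
        ]
        return sorted(scored)[:limit]

    async def vector_search_by_key(self, *, search_namespace, index_name, key,
                                   key_namespace, vector_field, limit,
                                   key_set: Optional[str] = None, timeout=None):
        # Step 1: read the record that holds the query vector.
        rec = await self.get(namespace=key_namespace, key=key,
                             set_name=key_set, timeout=timeout)
        # Step 2: search with that vector; the same timeout is passed again.
        return await self.vector_search(namespace=search_namespace,
                                        index_name=index_name,
                                        query=rec.fields[vector_field],
                                        limit=limit, timeout=timeout)


def run_demo() -> tuple[list[str], list[str]]:
    """Search namespace 'test' using a vector stored in namespace 'queries'."""
    client = FakeClient(
        {
            ("test", None, "a"): {"vec": [0.0, 0.0]},
            ("test", None, "b"): {"vec": [1.0, 0.0]},
            ("test", None, "c"): {"vec": [5.0, 5.0]},
            ("queries", "q", "q1"): {"vec": [0.9, 0.0]},
        },
        indexed_field="vec",
    )
    result = asyncio.run(client.vector_search_by_key(
        search_namespace="test", index_name="demo", key="q1",
        key_namespace="queries", key_set="q", vector_field="vec", limit=2,
    ))
    return [key for _, key in result], client.calls
```

Because the diff forwards the same `timeout` to both calls, the end-to-end wait for one `vector_search_by_key` call can approach twice that value.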
94 changes: 94 additions & 0 deletions src/aerospike_vector_search/client.py
@@ -468,6 +468,100 @@ def is_indexed(
            raise types.AVSServerError(rpc_error=e)
        return self._respond_is_indexed(response)

    def vector_search_by_key(
        self,
        *,
        search_namespace: str,
        index_name: str,
        key: Union[int, str, bytes, bytearray],
        key_namespace: str,
        vector_field: str,
        limit: int,
        key_set: Optional[str] = None,
        search_params: Optional[types.HnswSearchParams] = None,
        include_fields: Optional[list[str]] = None,
        exclude_fields: Optional[list[str]] = None,
        timeout: Optional[int] = None,
    ) -> list[types.Neighbor]:
        """
        Perform a Hierarchical Navigable Small World (HNSW) vector search in Aerospike Vector Search by primary record key.

        :param search_namespace: The namespace that stores the records to be searched.
        :type search_namespace: str

        :param index_name: The name of the index to use in the search.
        :type index_name: str

        :param key: The primary key of the record that stores the vector to use in the search.
        :type key: Union[int, str, bytes, bytearray]

        :param key_namespace: The namespace that stores the record identified by key.
        :type key_namespace: str

        :param vector_field: The name of the field containing vector data.
        :type vector_field: str

        :param limit: The maximum number of neighbors to return (the K value).
        :type limit: int

        :param key_set: The name of the set from which to read the record to search by. Defaults to None.
        :type key_set: Optional[str]

        :param search_params: Parameters for the HNSW algorithm.
            If None, the default parameters for the index are used. Defaults to None.
        :type search_params: Optional[types.HnswSearchParams]

        :param include_fields: A list of field names to retrieve from the results.
            When used, fields that are not included are not sent by the server,
            saving on network traffic.
            If a field is listed in both include_fields and exclude_fields,
            exclude_fields takes priority, and the field is not returned.
            If None, all fields are retrieved. Defaults to None.
        :type include_fields: Optional[list[str]]

        :param exclude_fields: A list of field names to exclude from the results.
            When used, the excluded fields are not sent by the server,
            saving on network traffic.
            If None, all fields are retrieved. Defaults to None.
        :type exclude_fields: Optional[list[str]]

        :param timeout: Time in seconds this operation will wait before raising an :class:`AVSServerError <aerospike_vector_search.types.AVSServerError>`. Defaults to None.
        :type timeout: Optional[int]

        Returns:
            list[types.Neighbor]: A list of neighbor records found by the search.

        Raises:
            AVSServerError: Raised if an error occurs during the RPC communication with the server while attempting the vector search.
            This error could occur due to various reasons such as network issues, server-side failures, or invalid request parameters.
        """
        rec_and_key = self.get(
            namespace=key_namespace,
            key=key,
            set_name=key_set,
            timeout=timeout,
        )

        vector = rec_and_key.fields[vector_field]

        neighbors = self.vector_search(
            namespace=search_namespace,
            index_name=index_name,
            query=vector,
            limit=limit,
            search_params=search_params,
            include_fields=include_fields,
            exclude_fields=exclude_fields,
            timeout=timeout,
        )

        return neighbors



    def vector_search(
        self,
        *,
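The method reads the query vector with `rec_and_key.fields[vector_field]`. Assuming `fields` is a plain dict (as it is typed for `Neighbor` in types.py), a record that lacks that field would surface as a `KeyError` rather than the documented `AVSServerError`. One way a caller might guard against this is sketched below. `extract_query_vector` is a hypothetical helper, not part of the client, and raising `ValueError` is an arbitrary choice made for the illustration.

```python
from typing import Any, Mapping


def extract_query_vector(fields: Mapping[str, Any], vector_field: str) -> Any:
    """Return fields[vector_field], or raise a descriptive ValueError.

    Hypothetical helper for illustration only; it is not part of
    aerospike_vector_search.
    """
    if vector_field not in fields:
        # List what the record does contain to make the mistake easy to spot.
        available = ", ".join(sorted(fields)) or "<none>"
        raise ValueError(
            f"record has no field {vector_field!r}; available fields: {available}"
        )
    return fields[vector_field]
```

A caller could fetch the record with `get`, pass its `fields` through this helper, and then call `vector_search` directly, which mirrors what `vector_search_by_key` does internally.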
16 changes: 14 additions & 2 deletions src/aerospike_vector_search/types.py
@@ -43,6 +43,13 @@ def __init__(self, *, namespace: str, set: str, key: Any) -> None:
        self.set = set
        self.key = key

    def __repr__(self) -> str:
        return (
            f"Key(namespace={self.namespace}, "
            f"set={self.set}, "
            f"key={self.key})"
        )

    def __str__(self):
        """
        Returns a string representation of the key.
@@ -128,6 +135,13 @@ def __init__(self, *, key: Key, fields: dict[str, Any], distance: float) -> None
        self.fields = fields
        self.distance = distance

    def __repr__(self) -> str:
        return (
            f"Neighbor(key={self.key}, "
            f"fields={self.fields}, "
            f"distance={self.distance})"
        )

    def __str__(self):
        """
        Returns a string representation of the neighboring record.
@@ -153,15 +167,13 @@ def __str__(self):
    def __eq__(self, other) -> bool:
        if not isinstance(other, Neighbor):
            return NotImplemented

        return (
            self.distance == other.distance
            and self.key == other.key
            and self.fields == other.fields
        )



class VectorDistanceMetric(enum.Enum):
    """
    Enumeration of vector distance metrics used for comparing vectors.
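The `__repr__` methods added in this file interpolate attributes with plain f-string formatting, which applies `str()` to each value, so string attributes are printed without quotes. The sketch below uses simplified stand-in classes to show the resulting output and how `__eq__` returning `NotImplemented` behaves against an unrelated type. These are not the library's `Key` and `Neighbor`: the real classes also define `__str__`, which would change how a key is rendered inside a `Neighbor` repr, and the `Key.__eq__` here is an assumption added so the comparison works by value.

```python
from typing import Any


class Key:
    """Simplified stand-in for types.Key (illustration only)."""

    def __init__(self, *, namespace: str, set: str, key: Any) -> None:
        self.namespace = namespace
        self.set = set
        self.key = key

    def __repr__(self) -> str:
        return (
            f"Key(namespace={self.namespace}, "
            f"set={self.set}, "
            f"key={self.key})"
        )

    def __eq__(self, other) -> bool:
        # Assumed for this sketch so Neighbor equality compares keys by value.
        if not isinstance(other, Key):
            return NotImplemented
        return (self.namespace, self.set, self.key) == (
            other.namespace, other.set, other.key
        )


class Neighbor:
    """Simplified stand-in for types.Neighbor (illustration only)."""

    def __init__(self, *, key: Key, fields: dict[str, Any], distance: float) -> None:
        self.key = key
        self.fields = fields
        self.distance = distance

    def __repr__(self) -> str:
        # {self.key} goes through str(); with no __str__ defined on this
        # stand-in Key, that falls back to Key.__repr__.
        return (
            f"Neighbor(key={self.key}, "
            f"fields={self.fields}, "
            f"distance={self.distance})"
        )

    def __eq__(self, other) -> bool:
        if not isinstance(other, Neighbor):
            # Lets Python try the other operand, then fall back to identity.
            return NotImplemented
        return (
            self.distance == other.distance
            and self.key == other.key
            and self.fields == other.fields
        )


key = Key(namespace="test", set="demo", key="rec1")
neighbor = Neighbor(key=key, fields={"vec": [1.0]}, distance=0.5)
same = Neighbor(key=Key(namespace="test", set="demo", key="rec1"),
                fields={"vec": [1.0]}, distance=0.5)
```

With these stand-ins, `repr(key)` is `Key(namespace=test, set=demo, key=rec1)`, `neighbor == same` is `True`, and `neighbor == "rec1"` is `False` rather than an error. If quoted values were wanted in the repr, `{self.key!r}` would be one option.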
2 changes: 1 addition & 1 deletion tests/aerospike.conf
@@ -57,7 +57,7 @@ network {
  namespace avs-meta {
      replication-factor 2
      storage-engine memory {
-         data-size 2G
+         data-size 1G
      }
      nsup-period 100
  }
2 changes: 1 addition & 1 deletion tests/assets/aerospike.conf
@@ -57,7 +57,7 @@ network {
  namespace avs-meta {
      replication-factor 2
      storage-engine memory {
-         data-size 2G
+         data-size 1G
      }
      nsup-period 100
  }
4 changes: 4 additions & 0 deletions tests/standard/aio/test_extensive_vector_search.py
@@ -205,8 +205,12 @@ async def test_vector_search_with_set_same_as_index(
    query_numpy,
    session_vector_client,
    session_admin_client,
    extensive_vector_search,
):

    if not extensive_vector_search:
        pytest.skip("Extensive vector tests disabled")

    await session_admin_client.index_create(
        namespace="test",
        name="demo2",