feat: sharded read rows #766
Conversation
Raises:
- ShardedReadRowsExceptionGroup: if any of the queries failed
- ValueError: if the query_list is empty
"""
I don't think we need an error. Also, the rows will be de-duplicated on the server side
google/cloud/bigtable/client.py
Outdated
operation_timeout: int | float | None = 60,
per_sample_timeout: int | float | None = 10,
per_request_timeout: int | float | None = None,
operation_timeout: int | float | None = None,
Why no attempt timeout?
I didn't implement retries since I noticed it wasn't on go/cbt-client-debugging-steps#client-retry-settings
I added retries, using the same retryable exceptions as mutate_rows. Let me know if that works
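To make the retry approach discussed here concrete, below is a minimal, hypothetical sketch of retrying an operation against a fixed set of retryable exception types, in the same spirit as mutate_rows. The exception classes, helper name, and attempt count are stand-ins for illustration, not the client's actual retry configuration.

```python
# Stand-in set of exception types considered safe to retry.
# The real client derives these from its retryable gRPC error codes.
RETRYABLE = (ConnectionError, TimeoutError)

def call_with_retries(fn, attempts=3):
    """Call fn, retrying on RETRYABLE exceptions up to `attempts` times."""
    last_exc = None
    for _ in range(attempts):
        try:
            return fn()
        except RETRYABLE as e:
            last_exc = e  # retryable: remember the error and try again
    # all attempts exhausted: surface the last retryable error
    raise last_exc
```

A real implementation would also apply exponential backoff and an overall operation deadline between attempts.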
go/cbt-client-debugging-steps#client-retry-settings needs to be updated :P
# then add start_segment back to get the correct index
cropped_pts = split_points[start_segment:]
end_segment = (
    bisect_left(cropped_pts, this_range.end.key) + start_segment
do we need to check this_range.end.is_inclusive here also? If it's exclusive we should do bisect_left, but if it's inclusive I think it should use bisect_right?
Not for the end segment, since the split_points mark the inclusive ends of each segment. Whether the range includes or excludes the split point, the results will be the same (left side). Do you think I should add a comment to call this out?
google/cloud/bigtable/client.py
Outdated
]
if exception_list:
    # if any sub-request failed, raise an exception instead of returning results
    raise ShardedReadRowsExceptionGroup(exception_list, len(query_list))
Since you are taking the effort to let all of the shards finish despite the error, you might as well add the partial results in the exception
Good point, I added a successful_rows field to the exception
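The pattern agreed on above (let every shard finish, then attach the partial results to the raised exception group) can be sketched as follows. This is an illustration with stand-in signatures, not the library's actual implementation, and the shards run sequentially here for clarity where the real code runs them concurrently.

```python
class ShardedReadRowsExceptionGroup(Exception):
    """Illustrative stand-in for the PR's exception group."""

    def __init__(self, excs, total_query_count, successful_rows):
        self.exceptions = excs
        self.total_query_count = total_query_count
        # partial results are still usable by the caller
        self.successful_rows = successful_rows
        super().__init__(
            f"{len(excs)} of {total_query_count} sharded queries failed"
        )

def read_rows_sharded(query_list, run_query):
    """Run every shard to completion, collecting rows and failures."""
    results, exception_list = [], []
    for query in query_list:
        try:
            results.extend(run_query(query))
        except Exception as e:
            exception_list.append(e)  # keep going; other shards may succeed
    if exception_list:
        # surface the failures, but don't throw away the rows that came back
        raise ShardedReadRowsExceptionGroup(
            exception_list, len(query_list), successful_rows=results
        )
    return results
```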
# use binary search to find the segment the end key belongs to.
# optimization: remove keys up to start_segment from searched list,
# then add start_segment back to get the correct index
why not just pass lo=start_segment to bisect_left?
good catch
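For reference, the suggested simplification is equivalent to the slice-and-offset version and avoids copying the list; the `lo` parameter restricts bisect_left's search to indices at or above `start_segment`. The sample keys below are made up for illustration.

```python
from bisect import bisect_left

split_points = [b"b", b"f", b"k", b"p"]
start_segment = 2

# original approach: slice off the searched prefix, then add the offset back
sliced = bisect_left(split_points[start_segment:], b"m") + start_segment

# suggested approach: same result, no intermediate list copy
direct = bisect_left(split_points, b"m", lo=start_segment)

print(sliced == direct)  # True
```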
* feat: add new v3.0.0 API skeleton (#745)
* feat: improve rows filters (#751)
* feat: read rows query model class (#752)
* feat: implement row and cell model classes (#753)
* feat: add pooled grpc transport (#748)
* feat: implement read_rows (#762)
* feat: implement mutate rows (#769)
* feat: literal value filter (#767)
* feat: row_exists and read_row (#778)
* feat: read_modify_write and check_and_mutate_row (#780)
* feat: sharded read rows (#766)
* feat: ping and warm with metadata (#810)
* feat: mutate rows batching (#770)
* chore: restructure module paths (#816)
* feat: improve timeout structure (#819)
* fix: api errors apply to all bulk mutations
* chore: reduce public api surface (#820)
* feat: improve error group tracebacks on < py11 (#825)
* feat: optimize read_rows (#852)
* chore: add user agent suffix (#842)
* feat: optimize retries (#854)
* feat: add test proxy (#836)
* chore(tests): add conformance tests to CI for v3 (#870)
* chore(tests): turn off fast fail for conformance tests (#882)
* feat: add TABLE_DEFAULTS enum for table method arguments (#880)
* fix: pass None for retry in gapic calls (#881)
* feat: replace internal dictionaries with protos in gapic calls (#875)
* chore: optimize gapic calls (#863)
* feat: expose retryable error codes to users (#879)
* chore: update api_core submodule (#897)
* chore: merge main into experimental_v3 (#900)
* chore: pin conformance tests to v0.0.2 (#903)
* fix: bulk mutation eventual success (#909)

---------

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Blocked on Read Rows PR: #762
This PR adds query sharding:
- query.shard(), to split one large query into multiple smaller ones
- client.sample_keys, to get a list of "sample points" for a table, that are used to efficiently shard
- client.read_rows_sharded, to execute a set of sharded queries in parallel, and return the results as a single list
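The three pieces described above might fit together roughly as sketched below. The method names come from this description (query.shard, client.sample_keys, client.read_rows_sharded), but the exact signatures and the synchronous style here are assumptions for illustration, not the final public API.

```python
def read_table_sharded(client, query):
    """Illustrative orchestration of the three new sharding APIs."""
    # 1. get sample points describing how the table's keys are distributed
    sample_keys = client.sample_keys()
    # 2. split the one large query into per-shard queries along those points
    shard_queries = query.shard(sample_keys)
    # 3. execute the shards in parallel; results come back as a single list
    return client.read_rows_sharded(shard_queries)
```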