
feat: sharded read rows #766

Merged 405 commits into googleapis:v3 on Jun 23, 2023
Conversation

@daniel-sanche (Contributor) commented Apr 17, 2023

Blocked on Read Rows PR: #762

This PR adds query sharding.

  • Adds query.shard(), to split one large query into multiple smaller ones
  • Adds client.sample_keys, to get a list of "sample points" for a table, that are used to efficiently shard
  • Adds client.read_rows_sharded, to execute a set of sharded queries in parallel, and return the results as a single list
  • Makes some usability changes to the RowRange and Query classes
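Taken together, the pieces above can be sketched in miniature. This is a hypothetical illustration of the flow, not the client's actual code: the real API is async, and `shard()` here is a stand-in for `query.shard()` with `sample_keys` standing in for the result of `client.sample_keys`.

```python
def shard(query_range, sample_keys):
    """Split a (start, end) key range into sub-ranges along sample keys."""
    start, end = query_range
    interior = [k for k in sample_keys if start < k < end]
    edges = [start, *interior, end]
    return list(zip(edges, edges[1:]))

sample_keys = [b"g", b"n", b"t"]           # "sample points" for the table
shards = shard((b"a", b"z"), sample_keys)  # one large query -> smaller ones
assert shards == [(b"a", b"g"), (b"g", b"n"), (b"n", b"t"), (b"t", b"z")]
# read_rows_sharded would then run each shard in parallel and merge results
```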

google/cloud/bigtable/read_rows_query.py (review threads, resolved):
Raises:
- ShardedReadRowsExceptionGroup: if any of the queries failed
- ValueError: if the query_list is empty
"""
Contributor:

I don't think we need an error. Also, the rows will be de-duplicated on the server side.

google/cloud/bigtable/client.py (review threads, resolved):
operation_timeout: int | float | None = 60,
per_sample_timeout: int | float | None = 10,
per_request_timeout: int | float | None = None,
operation_timeout: int | float | None = None,
Contributor:

Why no attempt timeout?

Contributor (Author):

I didn't implement retries since I noticed it wasn't on go/cbt-client-debugging-steps#client-retry-settings

I added retries, using the same retryable exceptions as mutate_rows. Let me know if that works
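As a rough illustration of that approach (a minimal sketch under assumptions, not the client's implementation): the retryable codes shared with mutate_rows would typically be transient gRPC statuses such as DEADLINE_EXCEEDED and UNAVAILABLE, stood in for here by stdlib exceptions.

```python
import time

# Stand-ins for the transient gRPC error codes treated as retryable
RETRYABLE = (TimeoutError, ConnectionError)

def call_with_retries(fn, attempts=3, backoff=0.1):
    """Retry fn on retryable errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except RETRYABLE:
            if attempt == attempts - 1:
                raise  # retries exhausted; surface the error
            time.sleep(backoff * (2 ** attempt))

calls = []
def flaky():
    calls.append(1)
    if len(calls) < 2:
        raise ConnectionError("transient")  # fails once, then succeeds
    return "ok"

assert call_with_retries(flaky) == "ok"
assert len(calls) == 2
```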

Contributor:

go/cbt-client-debugging-steps#client-retry-settings needs to be updated :P

google/cloud/bigtable/client.py (review threads, resolved):
# then add start_segment back to get the correct index
cropped_pts = split_points[start_segment:]
end_segment = (
bisect_left(cropped_pts, this_range.end.key) + start_segment
Contributor:

Do we need to check this_range.end.is_inclusive here also? If it's exclusive we should do bisect_left, but if it's inclusive I think it should use bisect_right?

Contributor (Author):

Not for the end segment, since the split_points mark the inclusive ends of each segment. Whether the range includes or excludes the split point, the results will be the same (left side)

Do you think I should add a comment to call this out?
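A small demonstration of why the left side is always correct when split points mark inclusive segment ends (the keys here are illustrative, not from the PR):

```python
from bisect import bisect_left

# Split points mark the *inclusive* end of each segment:
# segment 0: keys <= b"d"; segment 1: b"d" < keys <= b"m"; segment 2: keys > b"m"
split_points = [b"d", b"m"]

# A range ending at b"d" belongs to segment 0 whether the end is inclusive
# (b"d" itself is in segment 0) or exclusive (everything below b"d" is too),
# so bisect_left gives the right answer in both cases:
assert bisect_left(split_points, b"d") == 0
assert bisect_left(split_points, b"e") == 1  # past the split point -> next segment
```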

]
if exception_list:
# if any sub-request failed, raise an exception instead of returning results
raise ShardedReadRowsExceptionGroup(exception_list, len(query_list))
Contributor:

Since you are taking the effort to let all of the shards finish despite the error, you might as well add the partial results in the exception

Contributor (Author):

Good point, I added a successful_rows field to the exception
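The resulting shape might look like this. A minimal sketch with illustrative field names, not the library's actual class definition:

```python
class ShardedReadRowsExceptionGroup(Exception):
    """Raised when one or more sharded sub-requests fail."""

    def __init__(self, exceptions, total_queries, successful_rows):
        self.exceptions = exceptions            # failures from individual shards
        self.successful_rows = successful_rows  # partial results already read
        super().__init__(
            f"{len(exceptions)} of {total_queries} sharded queries failed"
        )

# Callers can still recover the partial data from the raised exception:
try:
    raise ShardedReadRowsExceptionGroup(
        [ValueError("shard 2 failed")], 3, ["row-a", "row-b"]
    )
except ShardedReadRowsExceptionGroup as e:
    partial = e.successful_rows
assert partial == ["row-a", "row-b"]
```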

Comment on lines 328 to 330
# use binary search to find the segment the end key belongs to.
# optimization: remove keys up to start_segment from searched list,
# then add start_segment back to get the correct index
Contributor:

why not just pass lo=start_segment to bisect_left?

Contributor (Author):

good catch
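The two forms are equivalent: bisect_left's lo parameter restricts the search to the suffix directly, with no cropping or index arithmetic (values below are illustrative):

```python
from bisect import bisect_left

split_points = [b"c", b"f", b"k", b"p"]
start_segment = 2
end_key = b"m"

# original approach: crop the searched list, then add the offset back
cropped_pts = split_points[start_segment:]
end_a = bisect_left(cropped_pts, end_key) + start_segment

# simpler: let bisect_left skip the prefix itself
end_b = bisect_left(split_points, end_key, lo=start_segment)

assert end_a == end_b == 3
```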

@daniel-sanche daniel-sanche merged commit ec2b983 into googleapis:v3 Jun 23, 2023
daniel-sanche added a commit that referenced this pull request Feb 5, 2024
* feat: add new v3.0.0 API skeleton (#745)

* feat: improve rows filters (#751)

* feat: read rows query model class (#752)

* feat: implement row and cell model classes (#753)

* feat: add pooled grpc transport (#748)

* feat: implement read_rows (#762)

* feat: implement mutate rows (#769)

* feat: literal value filter (#767)

* feat: row_exists and read_row (#778)

* feat: read_modify_write and check_and_mutate_row (#780)

* feat: sharded read rows (#766)

* feat: ping and warm with metadata (#810)

* feat: mutate rows batching (#770)

* chore: restructure module paths (#816)

* feat: improve timeout structure (#819)

* fix: api errors apply to all bulk mutations

* chore: reduce public api surface (#820)

* feat: improve error group tracebacks on < py11 (#825)

* feat: optimize read_rows (#852)

* chore: add user agent suffix (#842)

* feat: optimize retries (#854)

* feat: add test proxy (#836)

* chore(tests): add conformance tests to CI for v3 (#870)

* chore(tests): turn off fast fail for conformance tests (#882)

* feat: add TABLE_DEFAULTS enum for table method arguments (#880)

* fix: pass None for retry in gapic calls (#881)

* feat: replace internal dictionaries with protos in gapic calls (#875)

* chore: optimize gapic calls (#863)

* feat: expose retryable error codes to users (#879)

* chore: update api_core submodule (#897)

* chore: merge main into experimental_v3 (#900)

* chore: pin conformance tests to v0.0.2 (#903)

* fix: bulk mutation eventual success (#909)

---------

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Labels
api: bigtable Issues related to the googleapis/python-bigtable API. size: xl Pull request size is extra large.