Stateful partitioners #758

Merged: 14 commits from the stateful-partitioners branch into scylladb:main on Aug 22, 2023

Conversation

@wprzytula (Collaborator) commented on Jul 7, 2023

Motivation

Currently, partitioners require the whole input to be passed in a single contiguous slice, which often forces an allocation that could otherwise be avoided.

What's done

  1. The partitioners' API is refactored so that it resembles the std::hash API (see the sketch after this list):
    • Partitioner corresponds to BuildHasher, as it becomes a kind of factory,
    • PartitionerHasher is introduced and corresponds to Hasher. It holds the hashing state, which enables feeding it with multiple chunks of data separately.
      A test is added that ensures a consistent hashing result no matter how the partition key is split into chunks.
  2. After the partitioners gain the ability to consume the partition key in chunks, the codebase is modified so that all needless allocations of the partition key are avoided, i.e., allocations that only served the purpose of calculating the token.
    To this end, the PartitionKey struct is introduced, whose motivation is as follows: the algorithm for computing the partition key for a given prepared statement and bound values can be divided into steps. The first of them involves extracting the values that constitute the partition key and putting them in proper partition key order. PartitionKey performs this step on construction and serves as an entry point for further steps, token calculation among others. For those further steps, an iterator over partition key values is accessible, which yields (value, spec) pairs for each column that constitutes the partition key, in partition key order (see the second sketch after this list).
  3. As there was one more place where a materialised partition key was used - decoding it into the produced trace - an adaptor is introduced that operates on PartitionKey's iterator and provides values to be deserialised by the mechanism recently added in "session: include partition key in RequestSpan in human readable form" (#766). An allocation is saved there, too.
  4. An API cleanup is done regarding partition key and token computation. calculate_token_for_partition_key() is moved to partitioner.rs, calculate_token() to prepared_statement.rs, and a number of helper functions are deleted as no longer needed.
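A minimal sketch of the trait shape and intended usage described in point 1 (the Token placeholder, the default body of hash_one(), and the helper token_of() are illustrative assumptions, not the driver's exact API):

```rust
// Sketch only: trait and method names follow the PR description and commit
// messages; the Token struct and the default hash_one() body are assumptions.
pub struct Token {
    pub value: i64,
}

/// Factory for stateful hashers, analogous to `std::hash::BuildHasher`.
pub trait Partitioner {
    type Hasher: PartitionerHasher;

    /// Creates a fresh hasher holding the hashing state.
    fn build_hasher(&self) -> Self::Hasher;

    /// Convenience for hashing a single contiguous slice,
    /// analogous to `BuildHasher::hash_one`.
    fn hash_one(&self, data: &[u8]) -> Token {
        let mut hasher = self.build_hasher();
        hasher.write(data);
        hasher.finish()
    }
}

/// Stateful hasher, analogous to `std::hash::Hasher`: data can be fed
/// in chunks, and `finish()` yields the token computed from what has
/// been fed so far.
pub trait PartitionerHasher {
    fn write(&mut self, pk_part: &[u8]);
    fn finish(&self) -> Token;
}

/// Intended usage: feed the serialized partition key chunk by chunk,
/// without materialising it in a single buffer first.
pub fn token_of<P: Partitioner>(partitioner: &P, pk_chunks: &[&[u8]]) -> Token {
    let mut hasher = partitioner.build_hasher();
    for chunk in pk_chunks {
        hasher.write(chunk);
    }
    hasher.finish()
}
```

And a hypothetical shape for the PartitionKey introduced in point 2 (only the (value, spec) item type of the iterator comes from this PR; the field, the stand-in ColumnSpec, and the method body are assumptions):

```rust
// Hypothetical sketch of PartitionKey (point 2); only the (value, spec)
// item shape of the iterator is taken from the PR, the rest is assumed.
pub struct ColumnSpec; // stand-in for the driver's column metadata type

pub struct PartitionKey<'ps> {
    // Values constituting the partition key, reordered into partition key
    // order on construction.
    values_in_pk_order: Vec<(&'ps [u8], &'ps ColumnSpec)>,
}

impl<'ps> PartitionKey<'ps> {
    /// Yields a (value, spec) pair per partition key column, in partition
    /// key order; token calculation and trace decoding both consume this.
    pub fn iter(&self) -> impl Iterator<Item = (&'ps [u8], &'ps ColumnSpec)> + '_ {
        self.values_in_pk_order.iter().copied()
    }
}
```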

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • [ ] I have adjusted the documentation in ./docs/source/.
  • [ ] I added appropriate Fixes: annotations to PR description.

@wprzytula requested review from piodul and cvybhu on July 7, 2023 15:58
@mykaul (Contributor) commented on Jul 9, 2023

Any measurable performance impact?

@avelanarius left a comment

As part of this change, you should also tackle the issues with the public API for calculating tokens.

Currently, there are some problems with that: there are 3 functions to calculate a Token: Session::calculate_token, Session::calculate_token_for_partition_key and ClusterData::compute_token. It doesn't make sense to put them inside Session or ClusterData - wrong level of abstraction. The same applies to the functions that calculate the partition key.

More context: #757 (comment)
My initial exploration on fixing it: https://github.com/avelanarius/scylla-rust-driver/tree/token_calc_cleanup

@wprzytula (Collaborator, Author)

Any measurable performance impact?

Number of allocations has dropped (6 -> 5 per insert, 19 -> 18 per select).

Before:

Args { mode: All, node: "127.0.0.1:9042", parallelism: 256, requests: 100000 }
Connecting to 127.0.0.1:9042 ...
Sending 100000 inserts, hold tight ..........
----------
Inserts:
----------
allocs/req:                 6.08
reallocs/req:               6.00
frees/req:                  6.07
bytes allocated/req:      272.75
bytes reallocated/req:     73.06
bytes freed/req:          267.77
----------
Sending 100000 selects, hold tight ..........
----------
Selects:
----------
allocs/req:                19.00
reallocs/req:               6.00
frees/req:                 19.00
bytes allocated/req:     1123.09
bytes reallocated/req:     73.00
bytes freed/req:         1123.03
----------

After:

Args { mode: All, node: "127.0.0.1:9042", parallelism: 256, requests: 100000 }
Connecting to 127.0.0.1:9042 ...
Sending 100000 inserts, hold tight ..........
----------
Inserts:
----------
allocs/req:                 5.08
reallocs/req:               5.00
frees/req:                  5.07
bytes allocated/req:      265.09
bytes reallocated/req:     77.06
bytes freed/req:          260.06
----------
Sending 100000 selects, hold tight ..........
----------
Selects:
----------
allocs/req:                18.00
reallocs/req:               5.00
frees/req:                 18.00
bytes allocated/req:     1115.19
bytes reallocated/req:     77.00
bytes freed/req:         1115.05
----------

@piodul (Collaborator) commented on Jul 24, 2023

Any measurable performance impact?

Number of allocations has dropped (6 -> 5 per insert, 19 -> 18 per select).

Please prepare a microbenchmark for this. While an allocation is now avoided, the new implementation has more branches, so it will be interesting to see whether the tradeoff was worth it.

We are basically interested in the PreparedStatement::compute_partition_key function. Interesting cases are single-column pks with different lengths and multi-column pks with different lengths.

We already have a small amount of benchmarks in scylla and scylla-cql, you can look at them for reference (and at the external criterion crate that those benchmarks use).
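
Such a microbenchmark might look roughly like the sketch below (criterion's API is real; the compute_token_for placeholder and the benchmark names are made up and merely stand in for the partition key extraction and token computation under test):

```rust
// Hypothetical criterion microbenchmark skeleton; everything except
// criterion's own API (bench_function, criterion_group!, criterion_main!)
// is a placeholder standing in for the routine under test.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Placeholder for partition key extraction + token computation.
fn compute_token_for(pk_columns: &[&[u8]]) -> i64 {
    pk_columns.iter().map(|c| c.len() as i64).sum()
}

fn partitioner_benches(c: &mut Criterion) {
    let long = vec![0u8; 1024];

    // Single-column partition keys of different lengths.
    c.bench_function("single-column pk, short", |b| {
        b.iter(|| compute_token_for(black_box(&[&b"abc"[..]])))
    });
    c.bench_function("single-column pk, long", |b| {
        b.iter(|| compute_token_for(black_box(&[long.as_slice()])))
    });
    // A multi-column partition key.
    c.bench_function("multi-column pk", |b| {
        b.iter(|| compute_token_for(black_box(&[&b"abc"[..], &b"defgh"[..], &b"ij"[..]])))
    });
}

criterion_group!(benches, partitioner_benches);
criterion_main!(benches);
```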

@wprzytula (Collaborator, Author)

Please prepare a microbenchmark for this. While an allocation is now avoided, the new implementation has more branches, so it will be interesting to see whether the tradeoff was worth it.

After benchmarks and some changes to the algorithm, the new way seems to outperform the old way by 15% to 40% in various cases. I believe this suffices to merge this.

@mykaul (Contributor) commented on Jul 25, 2023

Please prepare a microbenchmark for this. While an allocation is now avoided, the new implementation has more branches, so it will be interesting to see whether the tradeoff was worth it.

After benchmarks and some changes to the algorithm, the new way seems to outperform the old way by 15% to 40% in various cases. I believe this suffices to merge this.

NICE! Well done! Would be happy to see the details of these tests.

@wprzytula (Collaborator, Author)

Please prepare a microbenchmark for this. While an allocation is now avoided, the new implementation has more branches, so it will be interesting to see whether the tradeoff was worth it.

After benchmarks and some changes to the algorithm, the new way seems to outperform the old way by 15% to 40% in various cases. I believe this suffices to merge this.

NICE! Well done! Would be happy to see the details of these tests.

I'll clean the benchmarks up and include them in our codebase, so they can guard against unintended performance regressions in the future.

@wprzytula (Collaborator, Author)

These are the results of the benchmarks: the old algorithm on the right (percentage not relevant), the new algorithm on the left (percentage shows improvement relative to the old one).
[Screenshot: benchmark results, from 2023-07-25 16-40-08]

@wprzytula (Collaborator, Author)

CI fails due to use of Option::unzip(), which is stable since Rust 1.66, whereas our MSRV is 1.65. Can we bump it @piodul?
Apart from that, ready for review.

@wprzytula added this to the 0.10.0 milestone on Jul 28, 2023
@wprzytula added the "performance" label (Improves performance of existing features) on Jul 30, 2023
@wprzytula force-pushed the stateful-partitioners branch 2 times, most recently from 0aa8953 to 286044f on July 31, 2023 12:59
@wprzytula (Collaborator, Author)

I reimplemented Option::unzip() temporarily (technically, copied its code from the standard library), so as not to force an MSRV bump yet.
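
A sketch of that temporary helper, assuming the signature mirrors Option::unzip() (the commit messages below name it unzip_option()):

```rust
// Sketch of the temporary helper; its body mirrors what std's
// Option::unzip() does, which is stable only since Rust 1.66.
pub fn unzip_option<A, B>(opt: Option<(A, B)>) -> (Option<A>, Option<B>) {
    match opt {
        Some((a, b)) => (Some(a), Some(b)),
        None => (None, None),
    }
}
```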

@wprzytula (Collaborator, Author)

Rebased on main.

frame::types,
frame::value::ValueList,
transport::partitioner::{calculate_token_for_partition_key, Murmur3Partitioner},
};
A Collaborator commented on the hunk above:
Nit: putting the commit with the benchmark at the beginning of the PR should make it easier to use it to compare performance before and after the changes.

@piodul (Collaborator) left a comment

Another batch of comments - hopefully the last one this time.

/// Instances of this trait are created by a `Partitioner` and are stateful.
/// At any point, one can call `finish()` and a `Token` will be computed
/// based on values that have been fed so far.
pub trait PartitionerHasher {
A Collaborator commented:

Nit: The first thing that comes to my mind when I see PartitionerHasher is that it hashes partitioners. I don't have a good alternative suggestion, though.

prepared_metadata: &PreparedMetadata,
partition_key: Option<&Bytes>,
pub(crate) fn new_prepared<'ps>(
partition_key: Option<impl Iterator<Item = (&'ps [u8], &'ps ColumnSpec)> + Clone>,
A Collaborator commented:

Does this need to be this generic? Does it really make sense to pass anything here that isn't an Option<PartitionKey>? Perhaps doing so would make it easier to fix the lifetime issue in partition_key_displayer.

@wprzytula (Collaborator, Author) replied:

I tried it and gave up. Apparently, the Iterator returned by PartitionKey::iter() is less problematic in terms of lifetime issues than PartitionKey itself.

@wprzytula requested a review from piodul on August 21, 2023 08:54
An `else` branch is introduced to make code flow clearer, and a typo is
fixed.
The Partitioner API is changed into a stateful one, resembling the
std::hash traits. The motivation is simple: having Partitioners hold
state enables feeding them with data in chunks instead of one contiguous
slice, which saves us an allocation.
From now on, Partitioners' users are encouraged to first build a hasher
using `Partitioner::build_hasher()` and then feed it in chunks with
`write()`. Finally, `finish()` is to be called and a token is returned.
The `Partitioner::hash()` method was renamed to `hash_one()` for closer
correspondence to the `std::hash::BuildHasher` trait.
This is an umbrella enum for all available PartitionerHashers. It is
analogous to PartitionerName in this regard, as well as when it comes to
its usage.
In the next few commits, the logic concerning partition key and token
computation will be split into two steps:
1) extracting the partition key from serialized values
2) calculating the token based on the extracted partition key

Therefore, to gain more granularity over errors, `PartitionKeyError` is
divided into `PartitionKeyExtractionError` and `TokenCalculationError`.
This will be needed for clean error handling in the new API.
The recently added `calculate_token_for_partition_key()` duplicates the
logic that `ClusterData::compute_token()` uses. To deduplicate,
`compute_token()` now calls `calculate_token_for_partition_key()`.
It is clear that the algorithm for computing the partition key for a
given prepared statement and bound values can be divided into steps. The
first of them involves extracting the values that constitute the
partition key and putting them in proper partition key order. The
PartitionKey struct performs this step on construction and serves as an
entry point for further steps, token calculation among others. For those
further steps, an iterator over partition key values is accessible,
which yields (value, spec) pairs for each column that constitutes the
partition key, in partition key order. It will be used to avoid
materialising the partition key in a buffer.
The tests require PartialEq, Eq on TableSpec and ColumnSpec, so these
traits were derived unconditionally.
Traces generated upon requests are expected to contain the partition key
if available. In order to avoid an allocation for that, the `PartitionKey`
struct is used, whose iterator yields the required values one by one;
they are then deserialised.
The partition key is extracted only once, and the token is calculated
only once, too.

To avoid raising our MSRV, `Option::unzip()` is temporarily reimplemented
as a utility function, `unzip_option()`. It is to be replaced with
`Option::unzip()` once the MSRV is raised to at least 1.66.
It's no longer useful, as we now operate directly on serialized values
before they get encoded in the partition key format.
`Session::calculate_token()` is modified so that it takes advantage of
the new Partitioner API and hence avoids allocating the whole partition
key. Additionally, `calculate_token()` is now called directly for
queries, instead of `calculate_partition_key()` being called first.

As a bonus, the mess around various flavours of
`[calculate|compute]_partition_key` functions is finally gone: two of
them could be completely removed.
It has no usages left, as `build_hasher()` is to be used now.
It should be made clear that calculating a token without materialising
the partition key is now feasible, and this approach should be promoted.
In an effort to avoid overloading Session with unrelated functionality,
token computation functions are moved out of session.rs. Specifically,
`calculate_token()` is moved to prepared.rs and
`calculate_token_for_partition_key()` is moved to partitioner.rs.
The benchmark's results confirm that the new algorithm for stateful
partitioners is more performant than the old one, by about 15% to 40%.

In the future, the benchmark can be run to prevent unintended
performance regressions in partitioners.
@piodul merged commit 83f4050 into scylladb:main on Aug 22, 2023
8 checks passed
@wprzytula deleted the stateful-partitioners branch on March 5, 2024 13:05