lbp: rack awareness #195

muzarski · 2024-10-17T13:21:45Z

Changes

Implemented cass_[cluster/execution_profile]_set_load_balance_rack_aware[_n]. Adjusted the documentation and extended the unit tests for the LBP-related API.

Important note about documentation and default behaviour of rack aware policy

I removed this part of the docstring:

 * With empty local_rack and local_dc,  default local_dc and local_rack
 * is chosen from the first connected contact point,
 * and no remote hosts are considered in query plans.
 * If relying on this mechanism, be sure to use only contact
 * points from the local rack.

There are multiple reasons for this:

- this behaviour does not make much sense, and we should not mimic it IMO
- rust-driver does not behave like this
- this is not even true for cpp-driver

Why it's not true for cpp-driver:
If you carefully study the changes introduced to cpp-driver in the aforementioned
commit, you will notice that it's not possible for the driver to use the rack-aware
policy with empty strings. This is because the API functions reject
empty strings, so a RackAwarePolicy object is never constructed in that case.

CassError cass_cluster_set_load_balance_rack_aware_n(CassCluster* cluster, const char* local_dc,
                                                   size_t local_dc_length,
                                                   const char* local_rack,
                                                   size_t local_rack_length) {
  if (local_dc == NULL || local_dc_length == 0 || local_rack == NULL || local_rack_length == 0) {
    return CASS_ERROR_LIB_BAD_PARAMS;
  }
  cluster->config().set_load_balancing_policy(new RackAwarePolicy(
      String(local_dc, local_dc_length), String(local_rack, local_rack_length)));
  return CASS_OK;
}
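The same guard can be sketched in Rust. This is only an illustration of the check above, not the wrapper's actual code: `CassError` and `RackAwareConfig` here are hypothetical stand-ins, and `None` models a NULL pointer.

```rust
/// Hypothetical stand-in for the relevant C API error code.
#[derive(Debug, PartialEq, Eq)]
enum CassError {
    LibBadParams,
}

/// Hypothetical stand-in for a rack-aware policy configuration.
#[derive(Debug, PartialEq, Eq)]
struct RackAwareConfig {
    local_dc: String,
    local_rack: String,
}

/// Mirrors the guard in `cass_cluster_set_load_balance_rack_aware_n`:
/// a missing (`None`, i.e. NULL) or empty DC or rack is rejected up front,
/// so a rack-aware configuration with empty names can never be built.
fn set_rack_aware(
    local_dc: Option<&str>,
    local_rack: Option<&str>,
) -> Result<RackAwareConfig, CassError> {
    match (local_dc, local_rack) {
        (Some(dc), Some(rack)) if !dc.is_empty() && !rack.is_empty() => Ok(RackAwareConfig {
            local_dc: dc.to_string(),
            local_rack: rack.to_string(),
        }),
        _ => Err(CassError::LibBadParams),
    }
}
```

With this shape, the "empty local_rack and local_dc" case from the removed docstring is unreachable, which is the point made above.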

Why is this part of the docstring included in cpp-driver then? No idea. Maybe
because cass_cluster_set_load_balance_dc_aware mentions something similar
for an empty (non-specified) dc. However, in that case it's true, since dc awareness is enabled
by default in cpp-driver. See the docstring:

 * Configures the cluster to use DC-aware load balancing.
 * For each query, all live nodes in a primary 'local' DC are tried first,
 * followed by any node from other DCs.
 *
 * <b>Note:</b> This is the default, and does not need to be called unless
 * switching an existing from another policy or changing settings.
 * Without further configuration, a default local_dc is chosen from the
 * first connected contact point, and no remote hosts are considered in
 * query plans. If relying on this mechanism, be sure to use only contact
 * points from the local DC.

Default node location preference

cpp-driver is dc-aware by default. This is not true for rust-driver, and probably never will be. By default, rust-driver has no node location preferences. I adjusted the documentation of cass_cluster_set_load_balance_dc_aware by removing the mention of dc awareness being enabled by default.

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
PR description sums up the changes and reasons why they should be introduced.
I have implemented Rust unit tests for the features/changes introduced.
~~[ ] I have enabled appropriate tests in .github/workflows/build.yml in gtest_filter.~~
~~[ ] I have enabled appropriate tests in .github/workflows/cassandra.yml in gtest_filter.~~

Lorak-mmk · 2024-10-17T13:48:58Z

include/cassandra.h

- * first connected contact point, and no remote hosts are considered in
- * query plans. If relying on this mechanism, be sure to use only contact
- * points from the local DC.
- *
 * @deprecated The remote DC settings for DC-aware are not suitable for most
 * scenarios that require DC failover. There is also unhandled gap between
 * replication factor number of nodes failing and the full cluster failing. Only


I generally dislike the mechanism of using the first contact point's DC as the local DC (/rack), but I wonder if this won't cause compatibility issues.

If the DC name is empty, we could probably implement it with the following mechanism:

1. After the session is connected, check the DC of the first contact point.
2. Then create a new LBP with this DC set as local.
3. Swap it in the session using something like this:

let handle = session.get_default_execution_profile_handle().clone();
let new_profile = handle.pointee_to_builder().load_balancing_policy(...).build();
handle.map_to_another_profile(new_profile);

I'm not sure preserving this mechanism is worth introducing such trickery - probably not.

Wow! Is it going to be the first time when profile mapping feature is actually useful (and used by anyone!) ?

As I said, I'm not convinced we should do this - actually, I'm more and more convinced that we shouldn't. So unfortunately it won't :(
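For reference, the first step of the mechanism discussed in this thread (finding the DC of the first contact point) could look roughly like the sketch below. It is self-contained and does not use the driver's types: `Node` and `infer_local_dc` are hypothetical names, and "connected" is approximated by "present in the discovered topology with a known DC".

```rust
/// Hypothetical, simplified view of a node as seen after connecting.
struct Node {
    address: &'static str,
    datacenter: Option<&'static str>,
}

/// Walks the contact points in the order the user gave them and returns
/// the DC of the first one that is present in the topology and reports a DC.
/// Returns `None` if no contact point matches.
fn infer_local_dc(contact_points: &[&str], topology: &[Node]) -> Option<String> {
    contact_points.iter().find_map(|cp| {
        topology
            .iter()
            .find(|node| node.address == *cp)
            .and_then(|node| node.datacenter)
            .map(|dc| dc.to_string())
    })
}
```

The result would then be fed into a new load balancing policy and swapped in through the profile handle, which is the part that makes the approach feel like trickery.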

muzarski · 2024-10-17T14:26:01Z

v2: Introduced NodeLocationPreferences enum. Removed DcAwareness and RackAwareness structs.
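A minimal sketch of what such an enum might look like, assuming three cases (no preference, DC only, DC plus rack). The variant and method names are illustrative and may differ from the actual code in this PR.

```rust
/// Hypothetical shape of a node location preference: a single enum
/// instead of separate DC-awareness and rack-awareness structs.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
enum NodeLocationPreference {
    /// No preference; assumed here to be the default.
    #[default]
    Any,
    Datacenter {
        local_dc: String,
    },
    DatacenterAndRack {
        local_dc: String,
        local_rack: String,
    },
}

impl NodeLocationPreference {
    /// The preferred DC, if any.
    fn local_dc(&self) -> Option<&str> {
        match self {
            Self::Any => None,
            Self::Datacenter { local_dc } | Self::DatacenterAndRack { local_dc, .. } => {
                Some(local_dc.as_str())
            }
        }
    }

    /// The preferred rack, if any. A rack never appears without a DC.
    fn local_rack(&self) -> Option<&str> {
        match self {
            Self::DatacenterAndRack { local_rack, .. } => Some(local_rack.as_str()),
            _ => None,
        }
    }
}
```

One enum makes the invalid state "rack set, DC unset" unrepresentable, which two independent structs would not.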

wprzytula · 2024-10-17T14:23:18Z

include/cassandra.h

+ * <b>Note:</b> Profile-based load balancing policy is disabled by default.
+ * cluster load balancing policy is used when profile does not contain a policy.


How can we make a profile contain a load balancing policy? This is unclear to me from this docstring; is it clear from some other documentation?

cpp-rust-driver's session wrapper additionally holds a map of execution profiles, indexed by their custom names. You can first construct a profile and provide custom options to it, and then pass it to CassCluster via cass_cluster_set_execution_profile with a name chosen for this profile.

Then, having multiple execution profiles defined and stored (each with unique name), you can "bind" one of your choice to some statement (see cass_statement_set_execution_profile), identifying the chosen profile by its name.

I know the mechanism you described, it was me who implemented it :)
The thing is, what does it mean that a profile contains a policy?

Ok, now I understand your question. So I studied the logic of cpp-driver, and it seems that this statement is true for cpp-driver. See: https://github.com/scylladb/cpp-driver/blob/master/src/request_processor.cpp#L196-L209

If the user does not define an LBP for an execution profile (i.e. the exec profile does not CONTAIN an LBP), then the session-wide LBP is used for this execution profile. This means we are inconsistent with cpp-driver. Consider the following scenario:

1. The user sets the session-wide LBP to dc aware: cass_cluster_set_load_balance_dc_aware(...)
2. The user creates an execution profile, but does not define an LBP for it (let's call this exec profile "foo").
3. Now, when the user executes a statement with the "foo" exec profile:
   - in cpp-driver, the session-wide DC-aware LBP is chosen for this statement execution
   - in cpp-rust-driver/rust-driver, DefaultPolicy::default() is always chosen, no matter what the session-wide LBP is set to

I believe that if we wanted to mimic this behaviour, then it's relatively easy to implement - no changes to rust-driver are required. The question is, however, whether we want to mimic such behaviour?
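The difference between the two behaviours in the scenario above can be sketched as follows. The types are hypothetical simplifications (`Lbp`, `ExecProfile`), not the drivers' real ones; the two functions only model the resolution rules as described in this thread.

```rust
/// Hypothetical, simplified load balancing policy description.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Lbp {
    /// Stands in for `DefaultPolicy::default()`.
    Default,
    DcAware(String),
}

/// Hypothetical execution profile: the LBP is optional.
struct ExecProfile {
    lbp: Option<Lbp>,
}

/// cpp-driver-style resolution: a profile without an LBP falls back
/// to the session-wide LBP.
fn resolve_with_fallback(session_lbp: &Lbp, profile: Option<&ExecProfile>) -> Lbp {
    profile
        .and_then(|p| p.lbp.clone())
        .unwrap_or_else(|| session_lbp.clone())
}

/// Resolution as described for cpp-rust-driver in this thread: a profile
/// without an LBP gets the default policy, ignoring the session-wide LBP.
fn resolve_without_fallback(session_lbp: &Lbp, profile: Option<&ExecProfile>) -> Lbp {
    match profile {
        Some(p) => p.lbp.clone().unwrap_or(Lbp::Default),
        None => session_lbp.clone(),
    }
}
```

For a session-wide `DcAware("dc1")` and a profile "foo" with no LBP, the first function yields `DcAware("dc1")` and the second yields `Default`.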

cc: @Lorak-mmk

I don't see anything wrong with such behavior. It actually seems quite useful - if you want to overwrite some things (e.g. CL and timeout) for some statements but don't want to overwrite everything. As this behavior is not buggy / footgun etc I see no reason to change it.

Actually I didn't even know that Rust Driver worked this way, I thought it behaved like cpp-driver in this regard :D

Opened a PR: #197

> Actually I didn't even know that Rust Driver worked this way, I thought it behaved like cpp-driver in this regard :D

I can vaguely remember that we decided that overriding the global per-session execution profile only partly with a per-statement exec profile is too complex (for a user). A profile should be considered a full configuration. If only some options are to be changed for a statement compared to the session globally, it's advised to derive an execution profile, either by cloning the builder beforehand or by calling to_builder() on an existing profile.
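A rough model of the "derive a profile" approach, using hypothetical types rather than the driver's own: the derived profile starts as a full copy of the base one, and only the overridden option changes.

```rust
/// Hypothetical, simplified execution profile holding a full configuration.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Profile {
    consistency: String,
    request_timeout_ms: u64,
    load_balancing: String,
}

/// Hypothetical builder seeded from an existing profile.
#[derive(Debug, Clone)]
struct ProfileBuilder {
    inner: Profile,
}

impl Profile {
    /// Starts a builder pre-filled with every option of this profile.
    fn to_builder(&self) -> ProfileBuilder {
        ProfileBuilder { inner: self.clone() }
    }
}

impl ProfileBuilder {
    fn consistency(mut self, consistency: &str) -> Self {
        self.inner.consistency = consistency.to_string();
        self
    }

    fn request_timeout_ms(mut self, timeout_ms: u64) -> Self {
        self.inner.request_timeout_ms = timeout_ms;
        self
    }

    fn build(self) -> Profile {
        self.inner
    }
}
```

Because every profile is complete, there is no partial-override rule for the user to reason about at statement execution time.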

wprzytula · 2024-10-17T14:30:21Z

include/cassandra.h

- * first connected contact point, and no remote hosts are considered in
- * query plans. If relying on this mechanism, be sure to use only contact
- * points from the local DC.
- *
 * @deprecated The remote DC settings for DC-aware are not suitable for most
 * scenarios that require DC failover. There is also unhandled gap between
 * replication factor number of nodes failing and the full cluster failing. Only


Wow! Is it going to be the first time when profile mapping feature is actually useful (and used by anyone!) ?

wprzytula · 2024-10-17T14:31:45Z

scylla-rust-wrapper/src/cluster.rs

+                            "eu-east\0".as_ptr() as *const i8,
+                            "rack1\0".as_ptr() as *const i8,


Let's use C str literals, as they've been stabilised: https://doc.rust-lang.org/nightly/edition-guide/rust-2021/c-string-literals.html

Done. BTW, I cannot replace let empty_str = "\0".as_ptr() as *const i8; with let empty_str = c""; It results in the following ntest::timeout proc macro panic:

error: custom attribute panicked
   --> src/cluster.rs:884:5
    |
884 |     #[ntest::timeout(100)]
    |     ^^^^^^^^^^^^^^^^^^^^^^
    |
    = help: message: Unrecognized literal: `c""`

It happens for any c str literal constructed outside of some other macro (e.g. assert_cass_error_eq).

Oh, maybe you should open an issue to ntest?
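For context, a small sketch of the two styles compared in this thread. It assumes a toolchain with C string literals (Rust 1.77 or newer); the function names are made up for the example.

```rust
use std::ffi::{c_char, CStr};

/// New style: a C string literal. The compiler appends the NUL terminator
/// and rejects interior NUL bytes at compile time.
fn local_dc() -> &'static CStr {
    c"eu-east"
}

/// Old style made explicit: a byte string with a hand-written terminator,
/// validated at run time.
fn local_dc_manual() -> &'static CStr {
    CStr::from_bytes_with_nul(b"eu-east\0").expect("literal is NUL-terminated")
}

/// Either form yields the `*const c_char` that a C API parameter expects,
/// without an `as *const i8` cast.
fn as_c_arg(s: &CStr) -> *const c_char {
    s.as_ptr()
}
```

Both functions return equal `CStr` values; the literal form simply removes the manual `\0` and the pointer cast.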


…e[_n]

This is an extension introduced by Scylla's fork of cpp-driver. See: scylladb/cpp-driver@9691ec0

Note: I removed this part of the docstring:
```
 * With empty local_rack and local_dc, default local_dc and local_rack
 * is chosen from the first connected contact point,
 * and no remote hosts are considered in query plans.
 * If relying on this mechanism, be sure to use only contact
 * points from the local rack.
```

There are multiple reasons for this:
- this behaviour does not make much sense, and we should not mimic it IMO
- rust-driver does not behave like this
- this is not even true for cpp-driver

Why it's not true for cpp-driver:
If you carefully study the changes introduced to cpp-driver in the aforementioned
commit, you will notice that it's not possible for the driver to use the rack-aware
policy with empty strings. This is because the API functions reject
empty strings, so a RackAwarePolicy object is never constructed in that case.
```
CassError cass_cluster_set_load_balance_rack_aware_n(CassCluster* cluster, const char* local_dc,
                                                   size_t local_dc_length,
                                                   const char* local_rack,
                                                   size_t local_rack_length) {
  if (local_dc == NULL || local_dc_length == 0 || local_rack == NULL || local_rack_length == 0) {
    return CASS_ERROR_LIB_BAD_PARAMS;
  }
  cluster->config().set_load_balancing_policy(new RackAwarePolicy(
      String(local_dc, local_dc_length), String(local_rack, local_rack_length)));
  return CASS_OK;
}
```

Why is this part of the docstring included in cpp-driver then? No idea. Maybe
because `cass_cluster_set_load_balance_dc_aware` mentions something similar
for an empty (non-specified) dc. However, in that case it's true, since dc awareness is enabled
by default in cpp-driver. See the docstring:
```
 * Configures the cluster to use DC-aware load balancing.
 * For each query, all live nodes in a primary 'local' DC are tried first,
 * followed by any node from other DCs.
 *
 * <b>Note:</b> This is the default, and does not need to be called unless
 * switching an existing from another policy or changing settings.
 * Without further configuration, a default local_dc is chosen from the
 * first connected contact point, and no remote hosts are considered in
 * query plans. If relying on this mechanism, be sure to use only contact
 * points from the local DC.
```


muzarski · 2024-10-24T10:49:38Z

v2.1: rebased on master (now there is only one cassandra.h header).

Still waiting for: #197

muzarski requested review from dkropachev, Lorak-mmk and wprzytula October 17, 2024 13:23

muzarski force-pushed the rack-awareness branch from 6b33262 to ce7b5e6 Compare October 17, 2024 13:27

muzarski self-assigned this Oct 17, 2024

Lorak-mmk requested changes Oct 17, 2024

muzarski force-pushed the rack-awareness branch from ce7b5e6 to ea3e77e Compare October 17, 2024 14:24

muzarski requested a review from Lorak-mmk October 17, 2024 14:26

wprzytula requested changes Oct 17, 2024

muzarski force-pushed the rack-awareness branch from ea3e77e to 84a0560 Compare October 17, 2024 15:30

muzarski marked this pull request as draft October 22, 2024 09:39

muzarski added 6 commits October 24, 2024 12:47

lbp_config: remove needless string allocation

7b8572a

`self` is consumed by this method, thus there is no need to match dc_awareness by reference and to clone a local_dc string.

lbp_config: remove DcAwareness and introduce NodeLocationPreference enum

8c10e77

This is because in the following commits we will be introducing rack awareness.

exec_profile: define and implement cass_execution_profile_set_load_ba…

25ee1a8

…lance_rack_aware[_n] This is an extension to the extension. cpp-driver does not implement it for some reason.

cassandra.h: remove mention about dc awareness being enabled by default

26bb720

This is not true (and I doubt it will ever be) for cpp-rust-driver.

tests: additional unit test cases for lbp config

b1bcbcd

Added test cases for rack-awareness, and extended dc-awareness tests by empty and nullptr parameters checks.

muzarski force-pushed the rack-awareness branch from 84a0560 to b1bcbcd Compare October 24, 2024 10:48
