Decouple table and column specs #956

wprzytula · 2024-03-13T10:23:07Z

Motivation

Every column in a Response::Result comes from the same table (and therefore from the same keyspace). It is thus redundant to store a copy of TableSpec (table and keyspace names) in each ColumnSpec (name and type of a column), as was done until now.

What's done

This PR moves TableSpec out of ColumnSpec and allocates TableSpec only once per query response, saving 2 * (C-1) string allocations, where C denotes the number of columns returned in the response.
TableSpec is now stored in ResultMetadata and PreparedMetadata.
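To make the layout change concrete, here is a minimal sketch with simplified, hypothetical structs (the real `TableSpec`, `ColumnSpec` and `ResultMetadata` in scylla-cql have more fields and lifetime parameters). It also touches on the question raised under "Notes to reviewers": since all columns share one table, the table spec of any column can still be looked up, just from the metadata instead of from the column. `table_of` and the `name_allocs_*` helpers are illustrative names, not driver API.

```rust
// Hypothetical, simplified shapes: the real scylla-cql structs also carry
// column types and lifetime parameters.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TableSpec {
    pub ks_name: String,
    pub table_name: String,
}

/// Before: every column carried its own copy of the table spec.
#[derive(Debug, Clone)]
pub struct ColumnSpecBefore {
    pub table_spec: TableSpec,
    pub name: String,
}

/// After: a column only knows its own name (type omitted here).
#[derive(Debug, Clone)]
pub struct ColumnSpecAfter {
    pub name: String,
}

/// After: the table spec is stored once per response, next to the columns.
#[derive(Debug, Clone)]
pub struct ResultMetadataAfter {
    pub table_spec: TableSpec,
    pub col_specs: Vec<ColumnSpecAfter>,
}

impl ResultMetadataAfter {
    /// Hypothetical helper: the table of any existing column is the shared one.
    pub fn table_of(&self, col_idx: usize) -> Option<&TableSpec> {
        self.col_specs.get(col_idx).map(|_| &self.table_spec)
    }
}

/// String allocations for keyspace/table names with `c` columns, before.
pub fn name_allocs_before(c: usize) -> usize {
    2 * c // one ks_name + one table_name per column
}

/// Same count after the change, assuming at least one column.
pub fn name_allocs_after() -> usize {
    2 // a single TableSpec per response
}
```

For C = 10 columns this is 20 allocations before versus 2 after, i.e. 2 * (10 - 1) = 18 saved, matching the formula above.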

As the table spec is no longer available in column specs, a public field is added to QueryResult so that users can still retrieve this information from QueryResult. Keep in mind that this is a temporary measure: QueryResult in its current form will soon be deprecated as part of the upcoming deserialization refactor (#462).

Notes to reviewers

Please pay special attention to how the user experience changes after this API change. Do users lose access to any information?

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
~~[ ] I added relevant tests for new features and bug fixes.~~
All commits compile, pass static checks and pass test.
PR description sums up the changes and reasons why they should be introduced.
I have provided docstrings for the public items that I want to introduce.
~~[ ] I have adjusted the documentation in ./docs/source/.~~
~~[ ] I added appropriate Fixes: annotations to PR description.~~

github-actions · 2024-03-13T10:39:45Z

cargo semver-checks detected some API incompatibilities in this PR.
Checked commit: 0409cbe

See the following report for details:

cargo semver-checks output

./scripts/semver-checks.sh --baseline-rev 64b4afcdb4286b21f6cc1acb55266d6607f250e0
+ cargo semver-checks -p scylla -p scylla-cql --baseline-rev 64b4afcdb4286b21f6cc1acb55266d6607f250e0
     Cloning 64b4afcdb4286b21f6cc1acb55266d6607f250e0
     Parsing scylla v0.14.0 (current)
error: running cargo-doc on crate scylla failed with output:
-----
   Compiling proc-macro2 v1.0.89
   Compiling unicode-ident v1.0.13
   Compiling autocfg v1.4.0
   Compiling libc v0.2.161
    Checking cfg-if v1.0.0
   Compiling shlex v1.3.0
    Checking byteorder v1.5.0
   Compiling num-traits v0.2.19
   Compiling cc v1.1.34
    Checking pin-project-lite v0.2.15
   Compiling pkg-config v0.3.31
   Compiling quote v1.0.37
   Compiling vcpkg v0.2.15
   Compiling syn v2.0.87
    Checking once_cell v1.20.2
    Checking getrandom v0.2.15
   Compiling slab v0.4.9
   Compiling openssl-sys v0.9.104
   Compiling ident_case v1.0.1
   Compiling version_check v0.9.5
   Compiling strsim v0.11.1
   Compiling fnv v1.0.7
   Compiling ahash v0.8.11
    Checking num-integer v0.1.46
   Compiling libm v0.2.11
    Checking tinyvec_macros v0.1.1
   Compiling serde v1.0.214
    Checking futures-sink v0.3.31
    Checking futures-core v0.3.31
    Checking tinyvec v1.8.0
    Checking futures-channel v0.3.31
    Checking rand_core v0.6.4
    Checking socket2 v0.5.7
    Checking mio v1.0.2
   Compiling lock_api v0.4.12
   Compiling bigdecimal v0.4.6
   Compiling num-bigint v0.3.3
   Compiling snap v1.1.1
   Compiling thiserror v1.0.67
    Checking powerfmt v0.2.0
    Checking futures-task v0.3.31
    Checking foreign-types-shared v0.1.1
    Checking pin-utils v0.1.0
    Checking futures-io v0.3.31
    Checking bytes v1.8.0
    Checking static_assertions v1.1.0
    Checking memchr v2.7.4
   Compiling parking_lot_core v0.9.10
   Compiling openssl v0.10.68
    Checking twox-hash v1.6.3
    Checking foreign-types v0.3.2
    Checking deranged v0.3.11
    Checking unicode-normalization v0.1.24
   Compiling synstructure v0.13.1
   Compiling darling_core v0.20.10
    Checking num-bigint v0.4.6
    Checking zeroize v1.8.1
    Checking scopeguard v1.2.0
    Checking smallvec v1.13.2
    Checking bitflags v2.6.0
   Compiling tokio-openssl v0.6.5
    Checking time-core v0.1.2
    Checking hashbrown v0.15.0
    Checking unicode-bidi v0.3.17
    Checking percent-encoding v2.3.1
    Checking iana-time-zone v0.1.61
    Checking stable_deref_trait v1.2.0
    Checking allocator-api2 v0.2.18
    Checking equivalent v1.0.1
    Checking num-conv v0.1.0
    Checking time v0.3.36
   Compiling zerocopy-derive v0.7.35
   Compiling tokio-macros v2.4.0
   Compiling futures-macro v0.3.31
   Compiling darling_macro v0.20.10
   Compiling zerofrom-derive v0.1.4
    Checking zerocopy v0.7.35
   Compiling serde_derive v1.0.214
    Checking futures-util v0.3.31
    Checking ppv-lite86 v0.2.20
   Compiling darling v0.20.10
    Checking tokio v1.41.0
    Checking zerofrom v0.1.4
   Compiling thiserror-impl v1.0.67
   Compiling openssl-macros v0.1.1
   Compiling yoke-derive v0.7.4
    Checking futures-executor v0.3.31
   Compiling scylla-macros v0.6.0 (/home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-macros)
    Checking yoke v0.7.4
    Checking hashbrown v0.14.5
    Checking rand_chacha v0.3.1
   Compiling async-trait v0.1.83
   Compiling tracing-attributes v0.1.27
    Checking indexmap v2.6.0
    Checking chrono v0.4.38
    Checking form_urlencoded v1.2.1
    Checking idna v0.5.0
    Checking secrecy v0.8.0
    Checking lz4_flex v0.11.3
    Checking uuid v1.11.0
    Checking tracing-core v0.1.32
    Checking either v1.13.0
    Checking ryu v1.0.18
    Checking itoa v1.0.11
    Checking unsafe-libyaml v0.2.11
    Checking itertools v0.13.0
    Checking tracing v0.1.40
    Checking dashmap v5.5.3
    Checking scylla-cql v0.3.0 (/home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-cql)
    Checking url v2.5.2
    Checking serde_yaml v0.9.34+deprecated
    Checking rand v0.8.5
error[E0063]: missing field `table_spec` in initializer of `response::result::ResultMetadata<'_>`
    --> /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-cql/src/frame/response/result.rs:1202:29
     |
1202 | ...                   ResultMetadata {
     |                       ^^^^^^^^^^^^^^ missing `table_spec`

    Checking futures v0.3.31
    Checking rand_pcg v0.3.1
    Checking base64 v0.22.1
    Checking lazy_static v1.5.0
    Checking arc-swap v1.7.1
    Checking histogram v0.6.9
error[E0308]: mismatched types
    --> /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-cql/src/frame/response/result.rs:1058:5
     |
1047 | ) -> StdResult<(TableSpec<'static>, Vec<ColumnSpec<'static>>), ColumnSpecParseError> {
     |      ------------------------------------------------------------------------------- expected `std::result::Result<(response::result::TableSpec<'static>, Vec<response::result::ColumnSpec<'static>>), ColumnSpecParseError>` because of return type
...
1058 |     result
     |     ^^^^^^ expected `Result<(TableSpec<'_>, ...), ...>`, found `Result<Vec<ColumnSpec<'_>>, ...>`
     |
     = note: expected enum `std::result::Result<(response::result::TableSpec<'static>, Vec<response::result::ColumnSpec<'static>>), _>`
                found enum `std::result::Result<Vec<response::result::ColumnSpec<'static>>, _>`

error[E0599]: `response::result::TableSpec<'_>` is not an iterator
    --> /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-cql/src/frame/response/result.rs:1087:37
     |
45   | pub struct TableSpec<'a> {
     | ------------------------ method `map` not found for this struct because it doesn't satisfy `response::result::TableSpec<'_>: Iterator`
...
1087 |         let table_spec = table_spec.map(TableSpec::into_owned);
     |                                     ^^^ `response::result::TableSpec<'_>` is not an iterator
     |
     = note: the following trait bounds were not satisfied:
             `response::result::TableSpec<'_>: Iterator`
             which is required by `&mut response::result::TableSpec<'_>: Iterator`
note: the trait `Iterator` must be implemented
    --> /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/iter/traits/iterator.rs:39:1
     = help: items from traits can only be used if the trait is implemented and in scope
     = note: the following trait defines an item `map`, perhaps you need to implement it:
             candidate #1: `Iterator`

error[E0599]: `response::result::TableSpec<'_>` is not an iterator
    --> /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-cql/src/frame/response/result.rs:1276:32
     |
45   | pub struct TableSpec<'a> {
     | ------------------------ method `map` not found for this struct because it doesn't satisfy `response::result::TableSpec<'_>: Iterator`
...
1276 |         table_spec: table_spec.map(TableSpec::into_owned),
     |                                ^^^ `response::result::TableSpec<'_>` is not an iterator
     |
     = note: the following trait bounds were not satisfied:
             `response::result::TableSpec<'_>: Iterator`
             which is required by `&mut response::result::TableSpec<'_>: Iterator`
note: the trait `Iterator` must be implemented
    --> /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/iter/traits/iterator.rs:39:1
     = help: items from traits can only be used if the trait is implemented and in scope
     = note: the following trait defines an item `map`, perhaps you need to implement it:
             candidate #1: `Iterator`

Some errors have detailed explanations: E0063, E0308, E0599.
For more information about an error, try `rustc --explain E0063`.
error: could not compile `scylla-cql` (lib) due to 4 previous errors

-----

error: failed to build rustdoc for crate scylla v0.14.0
note: this is usually due to a compilation error in the crate,
      and is unlikely to be a bug in cargo-semver-checks
note: the following command can be used to reproduce the compilation error:
      cargo new --lib example &&
          cd example &&
          echo '[workspace]' >> Cargo.toml &&
          cargo add --path /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla --no-default-features --features bigdecimal-04,chrono-04,cloud,default,full-serialization,num-bigint-03,num-bigint-04,secrecy-08,ssl,time-03 &&
          cargo check

     Parsing scylla-cql v0.3.0 (current)
      Parsed [  11.496s] (current)
     Parsing scylla-cql v0.3.0 (baseline)
      Parsed [  10.134s] (baseline)
    Checking scylla-cql v0.3.0 -> v0.3.0 (no change)
     Checked [   0.103s] 94 checks: 92 pass, 2 fail, 0 warn, 0 skip

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.36.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field PreparedMetadata.table_spec in /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-cql/src/frame/response/result.rs:591

--- failure struct_missing: pub struct removed or renamed ---

Description:
A publicly-visible struct cannot be imported by its prior path. A `pub use` may have been removed, or the struct itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.36.0/src/lints/struct_missing.ron

Failed in:
  struct scylla_cql::frame::response::result::Rows, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-64b4afcdb4286b21f6cc1acb55266d6607f250e0/7cd63be660774e034f17246d4d786d0cc0c76c91/scylla-cql/src/frame/response/result.rs:588

     Summary semver requires new major version: 2 major and 0 minor checks failed
    Finished [  21.786s] scylla-cql
error: aborting due to failure to build rustdoc for crate scylla v0.14.0

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
   1: anyhow::__private::format_err
   2: cargo_semver_checks::rustdoc_gen::generate_rustdoc
   3: <cargo_semver_checks::rustdoc_gen::RustdocFromProjectRoot as cargo_semver_checks::rustdoc_gen::RustdocGenerator>::load_rustdoc
   4: cargo_semver_checks::generate_versioned_crates
   5: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
   6: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
   7: cargo_semver_checks::Check::check_release
   8: cargo_semver_checks::exit_on_error
   9: cargo_semver_checks::main
  10: std::sys::backtrace::__rust_begin_short_backtrace
  11: std::rt::lang_start::{{closure}}
  12: std::rt::lang_start_internal
  13: main
  14: <unknown>
  15: __libc_start_main
  16: _start
make: *** [Makefile:61: semver-rev] Error 1

Lorak-mmk · 2024-03-13T11:13:20Z

scylla-cql/src/frame/response/result.rs

+// Only allocates a new `TableSpec` if one is not yet given.
+// TODO: consider equality check between known and deserialized spec.
+fn deser_table_spec(
+    buf: &mut &[u8],
+    known_spec: Option<TableSpec>,
+) -> StdResult<TableSpec, ParseError> {
+    let ks_name = types::read_string(buf)?;
+    let table_name = types::read_string(buf)?;
+
+    Ok(known_spec.unwrap_or_else(|| TableSpec {
+        ks_name: ks_name.to_owned(),
+        table_name: table_name.to_owned(),
+    }))


I'd like those equality checks to be a part of this PR so that we have more chances of catching any issues caused by this change.

As you know, the CQL protocol allows the table spec to be either global or per-column.
Do you know why that is? I assume there is some reason for the server not to always send the global one, and right now this PR assumes there is no such reason and that we can just use the table spec of the last column everywhere.

If there really is no reason for those per-column specs to exist, then that should be explained in the comment, and there should still be panicking checks in case this assumption turns out to be false.

Right, let's mind the Chesterton’s Fence: don’t ever take down a fence until you know why it was put up.
I'll investigate those per-column table specs.

The Python driver makes the same assumption, i.e. it uses the table spec of the first column for all columns.

This may be a good question for the #engineering channel on Slack. I'd like to know when Scylla / Cassandra will send the global spec and when the per-column spec, and whether it's possible for column specs to differ. If it's not possible, then learning a bit of history would be beneficial here imo: was it possible in the past? What was the case for it?

@nyh This is what I've talked to you about.

@Lorak-mmk I'm willing to merge this. This seems to be a worthy optimisation.

To sum up, at least the Python driver uses the first column's table and keyspace names as the names for all columns, and no one has ever complained about that.
Based on our research and scylladb/scylladb#17788 (comment), all columns are going to have the same keyspace and table names, so we can represent them only once.

As discussed, I'm going to change the code so that it checks that those names are indeed the same and returns ParseError if this assumption is violated, thus avoiding silently passing in such a case.
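A minimal sketch of that check, assuming the per-column (keyspace, table) names have already been read from the frame as `&str`. `SpecError`, `check_table_spec` and `common_table_spec` are hypothetical names; the driver itself would return its own ParseError instead. Like `deser_table_spec` above, it allocates only for the first spec it sees.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TableSpec {
    pub ks_name: String,
    pub table_name: String,
}

// Hypothetical error type standing in for the driver's ParseError.
#[derive(Debug, PartialEq, Eq)]
pub enum SpecError {
    TableSpecMismatch { first: TableSpec, found: TableSpec },
}

/// Keeps the first spec seen; errors out if a later per-column spec differs.
pub fn check_table_spec(
    known: Option<TableSpec>,
    ks_name: &str,
    table_name: &str,
) -> Result<TableSpec, SpecError> {
    match known {
        // First column: the only allocation.
        None => Ok(TableSpec {
            ks_name: ks_name.to_owned(),
            table_name: table_name.to_owned(),
        }),
        // Later columns: compare without allocating.
        Some(spec) if spec.ks_name == ks_name && spec.table_name == table_name => Ok(spec),
        // Assumption violated: report instead of passing silently.
        Some(spec) => Err(SpecError::TableSpecMismatch {
            found: TableSpec {
                ks_name: ks_name.to_owned(),
                table_name: table_name.to_owned(),
            },
            first: spec,
        }),
    }
}

/// Folds over all per-column specs and returns the single common one, if any.
pub fn common_table_spec<'a>(
    per_column: impl IntoIterator<Item = (&'a str, &'a str)>,
) -> Result<Option<TableSpec>, SpecError> {
    let mut known = None;
    for (ks, table) in per_column {
        known = Some(check_table_spec(known, ks, table)?);
    }
    Ok(known)
}
```

Returning an error rather than panicking keeps a malformed (or future, multi-table) response from crashing the client while still making the violated assumption visible.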

One more thing to consider: how likely is it that in some future version Cassandra / Scylla adds some form of joins / multi table queries?
AFAIK Cassandra already added ACID transactions (using their ACCORD algorithm), it doesn't seem so improbable for them to add something that queries more than 1 table in the future.
As those structs you modify are public, supporting this would require a breaking change.

Do you think we could use Cow / Arc or something else to make this future-proof? That way we could have global table spec in ResultMetadata, but also have per-column spec if necessary.

Adding joins to CQL would be such a giant change in ScyllaDB - wide-column DBs aren't made for joins - that I strongly doubt it will ever happen.
Arcs and Cows have considerable overhead that I don't deem worth incurring here just because of an extremely improbable scenario.

wprzytula · 2024-03-26T10:43:11Z

v2:

the assumption that every column has the same table spec is now checked; otherwise, ParseError is returned.

This is a tiny refactor, which makes further commits easier on eyes.

When a lifetime is not given explicitly in a value returned from a method, the rules of lifetime elision bind the value to the lifetime of Self (the implied 's lifetime in `&'s self`). In the case of `ResultMetadata::col_specs`, `ColumnSpec`'s lifetime parameter was unnecessarily bound to the lifetime of `ResultMetadata`. Also, the commit lifts the requirement of `ResultMetadata::new_for_test` for the `ColumnSpec` given as an argument to have `'static` lifetime.
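The elision issue can be illustrated with simplified, hypothetical versions of the two types (only the lifetimes matter here; the real structs have more fields). With the elided signature, the returned specs are `ColumnSpec<'s>` where `'s` is the borrow of `self`; with the explicit one they keep `'frame`, so data read out of them may outlive the metadata. `first_col_name` is an illustrative helper.

```rust
// Hypothetical, simplified types; only the lifetimes matter here.
pub struct ColumnSpec<'frame> {
    pub name: &'frame str,
}

pub struct ResultMetadata<'frame> {
    col_specs: Vec<ColumnSpec<'frame>>,
}

impl<'frame> ResultMetadata<'frame> {
    pub fn new(col_specs: Vec<ColumnSpec<'frame>>) -> Self {
        Self { col_specs }
    }

    /// Elided: `'_` is the lifetime of `&self`, i.e. `&'s [ColumnSpec<'s>]`,
    /// so anything read out of the specs cannot outlive the metadata.
    pub fn col_specs_elided(&self) -> &[ColumnSpec<'_>] {
        &self.col_specs
    }

    /// Explicit: the specs keep their own `'frame` lifetime.
    pub fn col_specs(&self) -> &[ColumnSpec<'frame>] {
        &self.col_specs
    }
}

/// Compiles only because `col_specs` keeps `'frame`: the returned name
/// borrows from the frame, not from `meta`. Using `col_specs_elided` here
/// would be rejected by the borrow checker.
pub fn first_col_name<'frame>(meta: &ResultMetadata<'frame>) -> Option<&'frame str> {
    meta.col_specs().first().map(|spec| spec.name)
}
```

In usage, the name obtained this way stays valid after the `ResultMetadata` is dropped, as long as the frame bytes are alive.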

Introduces new structs for lazy deserialization of RESULT:Rows frames.

Introduces `ResultMetadataHolder`, which is, unsurprisingly, a versatile holder for ResultMetadata. It allows 3 types of ownership:
1) borrowing it from somewhere, be it the RESULT:Rows frame or the cached metadata in PreparedStatement;
2) owning it after deserializing from RESULT:Rows;
3) sharing ownership of metadata cached in PreparedStatement.

Introduces new structs representing the RESULT:Rows CQL frame body, in various phases of deserialization:
- `RawMetadataAndRawRows` only deserializes (flags and) paging size in order to pass it directly to the user; keeps metadata and rows in a serialized, unparsed form;
- `DeserializedMetadataAndRawRows` deserializes metadata, but keeps rows serialized.

`DeserializedMetadataAndRawRows` is lifetime-generic and can be deserialized from `RawMetadataAndRawRows` to borrowed or owned form by corresponding methods on `RawMetadataAndRawRows`.

`DeserializedMetadataAndRawRows` must abstract over two different ways of storing the frame:
- shared ownership (Bytes),
- borrowing (FrameSlice).

The problem arises with the `rows_iter` method.
- In case of `DeserializedMetadataAndRawRows<RawRowsOwned>`, the struct itself owns the frame. Therefore, the reference to `self` should have the `'frame` lifetime (and this way bound the lifetime of deserialized items).
- In case of `DeserializedMetadataAndRawRows<RawRowsBorrowed>`, the struct borrows the frame with some lifetime 'frame. Therefore, the reference to `self` should only have the `'metadata` lifetime, as the frame is owned independently of Self's lifetime.

This discrepancy is not expressible by enums. Therefore, an entirely separate `rows_iter` must be defined for both cases, and thus both cases must be separate types. This is guaranteed by having a different type parameter (because they are distinct instantiations of a generic type).

To restrict the type parameter of `DeserializedMetadataAndRawRows` to the two valid variants (borrowed and owned), a trait `RawRowsKind` is introduced.

Credits to Karol Baryła for replacing a macro (the first approach) with a generic function.

Co-authored-by: Wojciech Przytuła <[email protected]>
Co-authored-by: Karol Baryła <[email protected]>
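The three ownership modes of `ResultMetadataHolder` map naturally onto an enum. This is a minimal sketch under simplifying assumptions: the variant names and the `inner()` accessor are hypothetical, and `ResultMetadata` is reduced to a single field.

```rust
use std::sync::Arc;

// Hypothetical, heavily simplified metadata.
#[derive(Debug, PartialEq)]
pub struct ResultMetadata {
    pub col_count: usize,
}

/// Hypothetical sketch of a holder supporting three ownership modes.
#[derive(Debug)]
pub enum ResultMetadataHolder<'a> {
    /// 1) Borrowed from the frame or from a PreparedStatement's cache.
    Borrowed(&'a ResultMetadata),
    /// 2) Owned after deserializing from RESULT:Rows.
    Owned(ResultMetadata),
    /// 3) Shared ownership of metadata cached in a PreparedStatement.
    SharedCached(Arc<ResultMetadata>),
}

impl<'a> ResultMetadataHolder<'a> {
    /// Uniform read access, whatever the ownership mode.
    pub fn inner(&self) -> &ResultMetadata {
        match self {
            ResultMetadataHolder::Borrowed(m) => *m,
            ResultMetadataHolder::Owned(m) => m,
            ResultMetadataHolder::SharedCached(m) => m.as_ref(),
        }
    }
}
```

The shared variant only bumps a reference count, which is what lets a prepared statement's cached metadata be reused without copying the column specs.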

The iterator is analogous to RowIterator, but instead of borrowing from external frame bytes and metadata with 'frame lifetime, it owns them and lends them from itself. Thus, it cannot implement Iterator trait. It does not, however, prevent us from exposing a `next()` method on it. The iterator is going to be used in new iterator API for queries (i.e., the one connected to `{query,execute}_iter`), where borrowing is not suitable (or even possible) due to the design of that API. Tests of RowIterator are shared with the new RawRowsLendingIterator, by introducing a new LendingIterator trait using GATs. Due to a bug/limitation in the compiler, 'static lifetimes are needed in tests. I'd like to use the recently stabilised (1.80) std::sync::LazyLock, but our MSRV is too old. Instead, I've employed lazy_static.
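A lending iterator trait built on GATs can be sketched as follows. The trait shape is the standard GAT pattern; `OwnedChunks` is a hypothetical stand-in for an owning rows iterator that lends out fixed-size byte slices instead of real rows.

```rust
/// A lending iterator: each item may borrow from the iterator itself,
/// which `std::iter::Iterator` cannot express. Requires GATs (Rust 1.65+).
pub trait LendingIterator {
    type Item<'borrow>
    where
        Self: 'borrow;

    fn next(&mut self) -> Option<Self::Item<'_>>;
}

/// Hypothetical stand-in for an owning rows iterator: it owns the raw
/// bytes and lends out one fixed-size "row" slice at a time.
pub struct OwnedChunks {
    data: Vec<u8>,
    chunk: usize,
    pos: usize,
}

impl OwnedChunks {
    pub fn new(data: Vec<u8>, chunk: usize) -> Self {
        assert!(chunk > 0, "chunk size must be positive");
        Self { data, chunk, pos: 0 }
    }
}

impl LendingIterator for OwnedChunks {
    type Item<'borrow> = &'borrow [u8] where Self: 'borrow;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        if self.pos >= self.data.len() {
            return None;
        }
        let start = self.pos;
        let end = (start + self.chunk).min(self.data.len());
        self.pos = end;
        // The slice borrows from `self`, so it must be dropped before
        // `next` can be called again.
        Some(&self.data[start..end])
    }
}
```

Because items borrow from the iterator, consumers use `while let Some(item) = it.next()` rather than a `for` loop or `Iterator` adapters.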

Soon, a new QueryResult will be introduced with a slightly different API. The old one will be preserved as LegacyQueryResult, in order to make migration easier. This commit renames the existing QueryResult to LegacyQueryResult, as well as the query_result module to legacy_query_result. The new QueryResult will be introduced in later commits.

(Re)-introduces QueryResult. It is quite similar to the old (Legacy)QueryResult, but it keeps rows and metadata in an unparsed state (via RawRows) and has methods that allow parsing the contents using the new API. Helper method names are similar to what the old QueryResult had, just with the `_typed` suffix dropped: now that it is always required to provide the type of rows when parsing them, this suffix sounded redundant.

There is one crucial change to the API. Motivation is as follows:
1) `QueryResult` can represent a non-Rows response, so every rows-related operation on `QueryResult` may return "NonRowsResponse" error, which is inconvenient;
2) `QueryResult` is an owned type, so it cannot deserialize metadata in the borrowed flavour (i.e., using strings from the frame bytes directly) and it must allocate metadata (mainly ColumnSpecs) on the heap.

The solution for both is to extract a new struct, `RowsDeserializer`, which is parametrized by a lifetime and hence can borrow metadata from the frame. Moreover, one has to handle "NonRowsResponse" error only once, upon `RowsDeserializer` creation. All further methods (`rows()`, `single_row()`, etc.) may no longer fail with that error, which provides a clean step of conversion from any Result frame to Result:Rows frame.

The drawback is that now there is a new call required in the call chain to deserialize a result, namely `.row_deserializer()`.

RowsDeserializer is parametrized by the representation of raw rows (Owned or Borrowed), analogously to how DeserializedMetadataAndRawRows are.

Co-authored-by: Wojciech Przytuła <[email protected]>
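The "handle NonRowsResponse once" idea can be sketched with a fallible constructor that returns a borrowing deserializer, whose own methods then cannot fail with that error. Everything here is a hypothetical simplification: rows are plain strings, and `first_row` stands in for the real helper methods.

```rust
/// Hypothetical error returned when the response is not RESULT:Rows.
#[derive(Debug)]
pub struct NonRowsResponse;

/// Hypothetical owned result: `None` models a non-Rows response
/// (e.g. RESULT:Void); `Some` holds rows, simplified to strings.
pub struct QueryResult {
    raw_rows: Option<Vec<String>>,
}

/// Borrows from the QueryResult, so nothing is reallocated.
pub struct RowsDeserializer<'a> {
    rows: &'a [String],
}

impl QueryResult {
    /// The only place where the non-Rows case has to be handled.
    pub fn row_deserializer(&self) -> Result<RowsDeserializer<'_>, NonRowsResponse> {
        self.raw_rows
            .as_deref()
            .map(|rows| RowsDeserializer { rows })
            .ok_or(NonRowsResponse)
    }
}

impl<'a> RowsDeserializer<'a> {
    /// Cannot fail with NonRowsResponse any more.
    pub fn rows(&self) -> impl Iterator<Item = &'a str> + 'a {
        self.rows.iter().map(String::as_str)
    }

    pub fn first_row(&self) -> Option<&'a str> {
        self.rows.first().map(String::as_str)
    }
}
```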

New QueryResult tests are going to require result metadata serialization capabilities, as RawRows keep result metadata in a serialized form.

After the deserialization refactor with lazy result metadata deserialization, we will no longer have access to rows serialized size and rows count in Session and in RowIteratorWorker. Therefore, traces can no longer record that information. A further commit in this PR brings back serialized size to the span, but there is an important change: now the size covers both raw metadata and raw rows; before, metadata was not accounted for.

Now, `result::deser_rows` returns RawRows instead of Rows, postponing any actual deserialization of response contents to a later time. The RawRows are pushed down the call stack, converted to the new QueryResult at some point and only converted to LegacyQueryResult where the API requires it. Co-authored-by: Wojciech Przytuła <[email protected]>

Now, worker-related code comes strictly before iterator/stream-related code. This locality aids readability.

This commit makes the RowIteratorWorker pass raw rows to the main tokio task, instead of the eagerly deserialized Rows. The equivalent of the old RowIterator is now RawIterator (notice a letter change). Despite the name, it cannot actually be conveniently iterated on, as it does not have any information about the column types. It exposes a `next()` method for deserializing consecutive `ColumnIterator`s. Users can manually perform deserialization using this method directly, but the preferred (typed) API will be added in the next commit. The legacy iterators are preserved by wrapping around RawIterator. Co-authored-by: Wojciech Przytuła <[email protected]>

This commit finishes the work related to adjusting the iterators module to the new deserialization framework.

The previous commit brought RawIterator, which can deserialize ColumnIterators. This commit introduces new TypedRowIterator, which type-checks once and then deserializes from ColumnIterators into rows. RawIterator can be converted to TypedRowIterator by calling the `into_typed()` method.

Unfortunately, due to the limitations of the Stream trait (no support for lending streams, analogous to lending iterators in case of RawRowsLendingIterator), a Stream cannot be used to deserialize borrowed types (i.e. those that borrow from the frame serialized contents). In order to give users both capabilities:
1) deserializing borrowed types (for efficiency),
2) deserializing using Stream (for convenience),
two distinct types are used: TypedRowIterator and TypedRowStream. The first supports borrowed types and the second implements Stream.

To sum up, instead of `RowIterator` (returning `Row`s) and `TypedRowIterator` (returning instances of the target type) both implementing `Stream`, now we have the following:
- `RawIterator`
  - cannot implement `Stream`, because it returns `ColumnIterator`s that borrow from it,
  - provides `type_check()` and `next()` methods that can be used for low-level, manual deserialization (not recommended for ordinary users),
  - supports deserializing manually borrowed types (such as `&str`).
- `TypedRowIterator`
  - created by calling `into_typed::<TargetType>()` on `RawIterator`,
  - type checks upon creation,
  - supports deserializing borrowed types (such as `&str`),
  - does not implement `Stream` in order to support borrowed types,
  - provides basic Stream-like methods (`next()`, `try_next()`).
- `TypedRowStream`
  - created by calling `into_stream()` on `TypedRowIterator`,
  - implements `Stream` and hence does not support borrowed types.

Co-authored-by: Piotr Dulikowski <[email protected]>
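The "type-check once, then deserialize every row" pattern behind `into_typed()` can be sketched like this. All names below (`ColType`, `Value`, `DeserializeRow`) are hypothetical simplifications, and to stay self-contained the sketch yields owned values and therefore implements `std::iter::Iterator`; the real `TypedRowIterator` deliberately avoids such a trait so it can hand out borrowed types.

```rust
use std::marker::PhantomData;

// Hypothetical, simplified column types and cell values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColType { Int, Text }

#[derive(Debug, Clone)]
pub enum Value { Int(i32), Text(String) }

pub trait DeserializeRow: Sized {
    /// Checked once, against the column types from the result metadata.
    fn type_check(col_types: &[ColType]) -> Result<(), String>;
    /// Called per row; may assume `type_check` succeeded.
    fn deserialize(row: &[Value]) -> Self;
}

pub struct RawIterator {
    col_types: Vec<ColType>,
    rows: std::vec::IntoIter<Vec<Value>>,
}

pub struct TypedRowIterator<T> {
    raw: RawIterator,
    _marker: PhantomData<T>,
}

impl RawIterator {
    pub fn new(col_types: Vec<ColType>, rows: Vec<Vec<Value>>) -> Self {
        Self { col_types, rows: rows.into_iter() }
    }

    pub fn into_typed<T: DeserializeRow>(self) -> Result<TypedRowIterator<T>, String> {
        T::type_check(&self.col_types)?; // type check exactly once
        Ok(TypedRowIterator { raw: self, _marker: PhantomData })
    }
}

impl<T: DeserializeRow> Iterator for TypedRowIterator<T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        self.raw.rows.next().map(|row| T::deserialize(&row))
    }
}

impl DeserializeRow for (i32, String) {
    fn type_check(col_types: &[ColType]) -> Result<(), String> {
        if matches!(col_types, [ColType::Int, ColType::Text]) {
            Ok(())
        } else {
            Err(format!("unexpected column types: {:?}", col_types))
        }
    }
    fn deserialize(row: &[Value]) -> Self {
        match row {
            [Value::Int(i), Value::Text(s)] => (*i, s.clone()),
            _ => unreachable!("guaranteed by type_check"),
        }
    }
}
```

A mismatch between the requested Rust type and the column types is reported by `into_typed()` before any row is touched.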

It is no longer needed. For compatibility with LegacyQueryResult and LegacyRowIterator, higher-layer conversions suffice.

Even though we can no longer record the serialized size of the rows alone (without metadata), we can record their serialized size together.

This is analogous to why TableSpec::borrowed is const. This simplifies tests, because those methods can be used in `static` and `const` contexts.
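Why `const` matters for tests: a `const fn` constructor can initialize `static` and `const` items. A minimal sketch with a simplified, hypothetical `TableSpec` that only borrows its names (the real struct may differ):

```rust
// Hypothetical simplified TableSpec that borrows its names.
#[derive(Debug, PartialEq, Eq)]
pub struct TableSpec<'a> {
    ks_name: &'a str,
    table_name: &'a str,
}

impl<'a> TableSpec<'a> {
    pub const fn borrowed(ks_name: &'a str, table_name: &'a str) -> Self {
        Self { ks_name, table_name }
    }

    pub fn ks_name(&self) -> &'a str {
        self.ks_name
    }

    pub fn table_name(&self) -> &'a str {
        self.table_name
    }
}

// Allowed only because `borrowed` is a `const fn`.
static TEST_TABLE: TableSpec<'static> = TableSpec::borrowed("ks", "tbl");
const OTHER_TABLE: TableSpec<'static> = TableSpec::borrowed("ks", "other");
```

A non-`const` constructor would force each test to build its specs at runtime, or to reach for a lazy-initialization helper.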

This is a preparation for the API change of the Session: current implementation is renamed to LegacySession, a new one will be introduced later and everything will be gradually switched to the new implementation. Co-authored-by: Wojciech Przytuła <[email protected]>

The LegacySession and the upcoming Session will differ on a small number of methods, but otherwise will share remaining ones. In order to reduce boilerplate the (Legacy)Session is converted into a generic, with a type parameter indicating the kind of the API it supports (legacy or the current one). The common methods will be implemented for GenericSession<K> for any K, and methods specific to the particular kind will only be implemented for GenericSession<K> for that particular K. Co-authored-by: Wojciech Przytuła <[email protected]>

Both Session and LegacySession will support methods that allow sending queries/prepared statements/batches and will share most of the implementation; it's just that the return types will be slightly different. This commit moves the core of those methods to private methods `do_xyz` for every `xyz` method from the API. This will allow implementing the public methods for both API kinds with minimal boilerplate. Co-authored-by: Wojciech Przytuła <[email protected]>

Adds Session as an alias over GenericSession<CurrentDeserializationApi>. No methods (apart from the common ones) are added to it yet. Co-authored-by: Wojciech Przytuła <[email protected]>
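The generic-session design from the last three commits can be sketched with marker types: common methods (including a shared private `do_query`) live in `impl<K> GenericSession<K>`, and each API kind gets its own `impl` block with a same-named public method returning a different type. The marker names follow the commit text; the fields, `keyspace()` and the byte/string "results" are hypothetical stand-ins.

```rust
use std::marker::PhantomData;

// Marker types for the two API kinds.
pub struct LegacyDeserializationApi;
pub struct CurrentDeserializationApi;

pub struct GenericSession<K> {
    keyspace: String,
    _kind: PhantomData<K>,
}

pub type Session = GenericSession<CurrentDeserializationApi>;
pub type LegacySession = GenericSession<LegacyDeserializationApi>;

// Common methods: available for every kind K.
impl<K> GenericSession<K> {
    pub fn new(keyspace: &str) -> Self {
        Self { keyspace: keyspace.to_owned(), _kind: PhantomData }
    }

    pub fn keyspace(&self) -> &str {
        &self.keyspace
    }

    // Shared core, returning the "new" (raw, undeserialized) result type.
    fn do_query(&self, q: &str) -> Vec<u8> {
        format!("{}:{}", self.keyspace, q).into_bytes()
    }
}

// Kind-specific methods: same name, different return types.
impl GenericSession<LegacyDeserializationApi> {
    pub fn query(&self, q: &str) -> String {
        // Legacy API: convert the new result into the legacy one.
        String::from_utf8(self.do_query(q)).expect("valid UTF-8")
    }
}

impl GenericSession<CurrentDeserializationApi> {
    pub fn query(&self, q: &str) -> Vec<u8> {
        self.do_query(q)
    }
}
```

The two `query` methods do not conflict because inherent impls on distinct instantiations of a generic type never overlap.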

This commit renames the SessionBuilder::build method to build_legacy, and then reintroduces the build method so that it returns the new Session (not LegacySession). All the examples, tests, documentation will gradually be migrated to use SessionBuilder::build again in following commits. Co-authored-by: Wojciech Przytuła <[email protected]>

This is a temporary measure. The tests are going to be modernised in parts, which is why for some time we are going to need both functions: one for LegacySession and another for modern Session.

The query/execute/batch methods are generic over the statement type. They start by converting the statement to the corresponding type (query/execute/batch) and then continue without the need for generics. However, those functions used to be non-trivial and would have to be monomorphised for every type of the arguments passed to the method, increasing compilation time more than necessary. Now that most of the implementation was moved to do_query etc. methods, we can restrict the generic part to the public query/execute/batch methods, which convert the input statement to the required type and then call the non-generic do_query etc. methods. This commit does just that: it de-genericises do_query and friends, while leaving query and friends generic as they used to be. Co-authored-by: Wojciech Przytuła <[email protected]>
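The pattern is a thin generic shell around a non-generic core. A minimal sketch, with hypothetical `Query` and `Session` types and a trivial body standing in for the real request path:

```rust
// Hypothetical statement type standing in for the driver's.
pub struct Query {
    pub contents: String,
}

impl From<&str> for Query {
    fn from(s: &str) -> Self {
        Query { contents: s.to_owned() }
    }
}

impl From<String> for Query {
    fn from(s: String) -> Self {
        Query { contents: s }
    }
}

pub struct Session;

impl Session {
    /// Thin generic wrapper: monomorphised per argument type, but tiny.
    pub fn query(&self, query: impl Into<Query>) -> usize {
        self.do_query(&query.into())
    }

    /// Non-generic core: compiled once, however many argument types
    /// `query` is called with.
    fn do_query(&self, query: &Query) -> usize {
        // Stand-in for the real (large) request path.
        query.contents.len()
    }
}
```

Only the one-line conversion is duplicated per argument type; the large body is compiled once.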

QueryResult can be converted to LegacyQueryResult, but not the other way around. In order to support both APIs, internal methods (do_query, do_execute, etc.) need to be changed so that they return the new QueryResult. Co-authored-by: Wojciech Przytuła <[email protected]>

Implements methods related to sending queries for the new Session. Co-authored-by: Wojciech Przytuła <[email protected]>

Adjusts the methods of Connection, apart from query_iter, to use the new deserialization API. Connection is meant to be an internal API, so we don't introduce a LegacyConnection for this. Co-authored-by: Wojciech Przytuła <[email protected]>

In a similar fashion to Session, CachingSession was also made generic over the session kind. Co-authored-by: Wojciech Przytuła <[email protected]>

Adjusts the CachingSession tests to use the new deserialization interface. Co-authored-by: Wojciech Przytuła <[email protected]>

The Connection::query_iter method is changed to use the new deserialization framework. All the internal uses of it in topology.rs are adjusted. Co-authored-by: Piotr Dulikowski <[email protected]>

Adjusts the Session::try_getting_tracing_info method to use the new deserialization framework. Co-authored-by: Wojciech Przytuła <[email protected]>

This is a large commit which goes over all existing tests that haven't been migrated in previous commits and adjusts them to use the new deserialization framework. There were lots of changes to be made, but they are mostly independent from each other and very simple.

This commit goes over all unadjusted examples and changes them to use the new deserialization framework. Again, it contains a lot of changes, but they are quite simple. Co-authored-by: Wojciech Przytuła <[email protected]>

ScyllaDB does not distinguish empty collections from nulls. That is, INSERTing an empty collection is equivalent to nullifying the corresponding column. As pointed out in [scylladb#1001](scylladb#1001), it's a nice QOL feature to be able to deserialize empty CQL collections to empty Rust collections instead of `None::<RustCollection>`. A test is added that checks it.
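The behaviour can be sketched as "null maps to an empty collection". Here `None` models a null cell and `Some(elems)` an already-decoded collection; `collection_or_empty` is a hypothetical helper, not the driver's deserialization trait.

```rust
// Explicit import so the sketch also builds on pre-2021 editions.
use std::iter::FromIterator;

/// Hypothetical helper: null becomes an empty collection instead of an
/// error, since ScyllaDB stores an empty collection as null.
pub fn collection_or_empty<T, C>(cell: Option<Vec<T>>) -> C
where
    C: Default + FromIterator<T>,
{
    match cell {
        Some(elems) => elems.into_iter().collect(),
        None => C::default(), // null -> empty collection
    }
}
```

Users who do want to tell null apart can still deserialize into `Option<Vec<T>>`.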

There were plenty of places where using a stack-allocated slice &[] suffices. To avoid promoting the bad practice of redundant heap allocation, and to possibly speed up our tests, vec![x, ...] was replaced with &[x, ...] where possible.

wprzytula self-assigned this Mar 13, 2024

wprzytula requested review from Lorak-mmk and piodul March 13, 2024 10:23

github-actions bot added the semver-checks-breaking cargo-semver-checks reports that this PR introduces breaking API changes label Mar 13, 2024

Lorak-mmk requested changes Mar 13, 2024

wprzytula added the performance label (Improves performance of existing features) Mar 13, 2024

wprzytula mentioned this pull request Mar 13, 2024

Don't repeat keyspace and table name for every column in the results scylladb/scylladb#17788

Open

wprzytula force-pushed the decouple-table-and-col-specs branch from 5dfc972 to c37eb55 March 26, 2024 10:41

wprzytula requested a review from Lorak-mmk March 26, 2024 13:52

muzarski mentioned this pull request Apr 16, 2024

Introduce support for Tablets #937

Merged

18 tasks

wprzytula added this to the 0.14.0 milestone Aug 5, 2024

wprzytula force-pushed the decouple-table-and-col-specs branch 2 times, most recently from cbda7f4 to abe53f6 August 12, 2024 14:35

wprzytula mentioned this pull request Aug 14, 2024

Introduce new deserialization API #1057

Draft

8 tasks

wprzytula modified the milestones: 0.14.0, 0.15.0 Aug 20, 2024

muzarski mentioned this pull request Oct 3, 2024

make ResultMetadata lifetime-generic #1082

Merged

4 tasks

wprzytula mentioned this pull request Oct 16, 2024

Introduce new deserialization framework upper layer abstractions #1093

Open

8 tasks

wprzytula and others added 10 commits October 30, 2024 17:35

result: reposition and comment some code

4fbb315

This is a tiny refactor, which makes further commits easier on the eyes.

result: metadata serialization utils for tests

d7ce383

New QueryResult tests are going to require result metadata serialization capabilities, as RawRows keep result metadata in a serialized form.
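For illustration, the kind of helper such tests need can be sketched as follows. The function names are hypothetical and are not the driver's actual test utilities; the encoding follows the CQL native protocol, where a `[string]` is a 2-byte big-endian length followed by UTF-8 bytes, and a global table spec in Rows metadata is the keyspace name followed by the table name.

```rust
/// Hypothetical test helper: appends a CQL `[string]` (a 2-byte big-endian
/// length followed by the UTF-8 bytes) to `buf`.
fn write_cql_string(buf: &mut Vec<u8>, s: &str) {
    let len = u16::try_from(s.len()).expect("a CQL [string] must fit in a u16 length");
    buf.extend_from_slice(&len.to_be_bytes());
    buf.extend_from_slice(s.as_bytes());
}

/// Hypothetical helper: serializes a global table spec, i.e. the keyspace
/// name followed by the table name, both as `[string]`s.
fn write_global_table_spec(buf: &mut Vec<u8>, keyspace: &str, table: &str) {
    write_cql_string(buf, keyspace);
    write_cql_string(buf, table);
}

fn main() {
    let mut buf = Vec::new();
    write_global_table_spec(&mut buf, "ks", "t");
    // 0x0002 "ks" followed by 0x0001 "t"
    assert_eq!(buf, vec![0, 2, b'k', b's', 0, 1, b't']);
}
```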

transport: add tests for new QueryResult

0af797f

wprzytula and others added 28 commits October 30, 2024 18:30

iterator: reorder code for better grouping

920244e

Now, worker-related code comes strictly before iterator/stream-related code. This locality aids readability.

result: delete legacy Rows type

7ec3c8e

It is no longer needed. For compatibility with LegacyQueryResult and LegacyRowIterator, higher-layer conversions suffice.

session,iterator: record raw metadata&rows size

1692a67

Even though we can no longer record the serialized size of the rows without also counting the metadata, we can still record their combined serialized size.

result: make ColumnSpec::borrowed const

778c392

This mirrors TableSpec::borrowed, which is const for the same reason: it simplifies tests, because such methods can be used in `static` and `const` contexts.
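A minimal sketch of why a `const fn` constructor helps tests, assuming a heavily simplified, hypothetical `ColumnSpec` that only borrows a name and a type name (the driver's real type carries a proper column type):

```rust
/// Hypothetical, simplified column spec that borrows its data.
#[derive(Debug, PartialEq)]
struct ColumnSpec<'a> {
    name: &'a str,
    type_name: &'a str, // stands in for the real column type
}

impl<'a> ColumnSpec<'a> {
    /// A `const fn` constructor can be evaluated at compile time.
    const fn borrowed(name: &'a str, type_name: &'a str) -> Self {
        ColumnSpec { name, type_name }
    }
}

// Because `borrowed` is `const`, fixtures can live in `static` and `const`
// items instead of being rebuilt inside every test.
static ID_COLUMN: ColumnSpec<'static> = ColumnSpec::borrowed("id", "int");
const COLUMNS: [ColumnSpec<'static>; 2] = [
    ColumnSpec::borrowed("id", "int"),
    ColumnSpec::borrowed("name", "text"),
];

fn main() {
    assert_eq!(COLUMNS[0], ID_COLUMN);
    assert_eq!(COLUMNS[1].name, "name");
    assert_eq!(COLUMNS[1].type_name, "text");
}
```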

WIP: yoked version

d59d593

FIX: QueryResult

10c7b98

session: re-introduce the Session type as an alias

f6f29d9

Adds Session as an alias over GenericSession<CurrentDeserializationApi>. No methods (apart from the common ones) are added to it yet. Co-authored-by: Wojciech Przytuła <[email protected]>
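The alias pattern can be sketched roughly as below. This is an illustration under assumptions, not the driver's code: `GenericSession`, `CurrentDeserializationApi`, `Session` and `LegacySession` are names taken from the commits, while `LegacyDeserializationApi`, the `keyspace` field and the methods are hypothetical.

```rust
use std::marker::PhantomData;

// Marker types selecting which deserialization API a session uses
// (`LegacyDeserializationApi` is a hypothetical name).
struct CurrentDeserializationApi;
struct LegacyDeserializationApi;

// Hypothetical, heavily simplified session that is generic over the API kind.
struct GenericSession<DeserializationApi> {
    keyspace: Option<String>,
    _api: PhantomData<DeserializationApi>,
}

// Methods common to every session kind live in a single generic impl.
impl<DeserializationApi> GenericSession<DeserializationApi> {
    fn new() -> Self {
        GenericSession { keyspace: None, _api: PhantomData }
    }

    fn use_keyspace(&mut self, ks: &str) {
        self.keyspace = Some(ks.to_owned());
    }
}

// Aliases give each session kind a short name.
type Session = GenericSession<CurrentDeserializationApi>;
type LegacySession = GenericSession<LegacyDeserializationApi>;

// Kind-specific methods go into impls on the concrete types.
impl Session {
    fn api_name(&self) -> &'static str {
        "current"
    }
}

impl LegacySession {
    fn api_name(&self) -> &'static str {
        "legacy"
    }
}

fn main() {
    let mut session = Session::new();
    session.use_keyspace("ks");
    let legacy = LegacySession::new();

    assert_eq!(session.keyspace.as_deref(), Some("ks"));
    assert_eq!(session.api_name(), "current");
    assert_eq!(legacy.api_name(), "legacy");
}
```

Keeping the shared methods in one generic impl means both kinds get them for free, while each alias can grow its own query methods independently.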

tests: scylla_supports_tablets[_legacy] suffix

1cc4a9a

This is a temporary measure. The tests are going to be modernised in parts, which is why for some time we are going to need both functions: one for LegacySession and another for modern Session.

session: add interface methods for the new deser API

02f5237

Implements methods related to sending queries for the new Session. Co-authored-by: Wojciech Przytuła <[email protected]>

caching_session: make generic over session APIs

dd813de

In a similar fashion to Session, CachingSession was also made generic over the session kind. Co-authored-by: Wojciech Przytuła <[email protected]>

caching_session: modernize tests

d3686d0

Adjusts the CachingSession tests to use the new deserialization interface. Co-authored-by: Wojciech Przytuła <[email protected]>

connection: migrate query_iter to new deserialization framework

f5237ad

The Connection::query_iter method is changed to use the new deserialization framework. All the internal uses of it in topology.rs are adjusted. Co-authored-by: Piotr Dulikowski <[email protected]>

{session,tracing}: switch to the new deser framework for tracing info

eb8b720

Adjusts the Session::try_getting_tracing_info method to use the new deserialization framework. Co-authored-by: Wojciech Przytuła <[email protected]>

examples: adjust to use the new interface

4fb4922

This commit goes over all unadjusted examples and changes them to use the new deserialization framework. Again, it contains a lot of changes, but they are quite simple. Co-authored-by: Wojciech Przytuła <[email protected]>

codewide: migrate doctests to new deser API

a82fa6d

treewide tests: remove needless vec![] allocations

5561b17

There were plenty of places where using a stack-allocated slice &[] suffices. To avoid promoting the bad practice of redundant heap allocation, and possibly to speed up our tests, vec![x, ...] was replaced with &[x, ...] where possible.
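A minimal sketch of the pattern, using a hypothetical function that takes `&[i32]` rather than the driver's actual value-binding API:

```rust
/// Hypothetical stand-in for an API that only needs to read a sequence of
/// values, so it accepts a borrowed slice.
fn count_values(values: &[i32]) -> usize {
    values.len()
}

fn main() {
    // Heap-allocates a Vec only to borrow it immediately.
    let with_vec = count_values(&vec![1, 2, 3]);
    // Borrows an array instead; no heap allocation is needed.
    let with_slice = count_values(&[1, 2, 3]);
    assert_eq!(with_vec, with_slice);
}
```

Both calls behave identically; the second simply skips the allocation.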

partial migration to TableSpec out of ColumnSpec

0409cbe
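The layout this PR moves towards, with TableSpec stored once in the result metadata instead of being copied into every ColumnSpec, can be sketched as follows. The types are simplified and hypothetical (for example, `type_name` stands in for a real column type), and `table_spec_strings` only does the allocation arithmetic from the PR description: two table-spec strings per column before, two in total after, so 2 * (C-1) fewer for C columns.

```rust
/// Keyspace and table names, stored once per result (simplified).
#[derive(Debug, PartialEq)]
struct TableSpec {
    ks_name: String,
    table_name: String,
}

/// Per-column data, no longer carrying its own copy of the table spec.
#[derive(Debug, PartialEq)]
struct ColumnSpec {
    name: String,
    type_name: String, // stands in for the real column type
}

/// Result metadata owns the single table spec shared by all columns.
struct ResultMetadata {
    table_spec: TableSpec,
    col_specs: Vec<ColumnSpec>,
}

/// Hypothetical helper: table-spec strings allocated for `columns` columns.
fn table_spec_strings(columns: usize, decoupled: bool) -> usize {
    if decoupled { 2 } else { 2 * columns }
}

fn main() {
    let meta = ResultMetadata {
        table_spec: TableSpec { ks_name: "ks".into(), table_name: "t".into() },
        col_specs: vec![
            ColumnSpec { name: "id".into(), type_name: "int".into() },
            ColumnSpec { name: "v".into(), type_name: "text".into() },
        ],
    };
    assert_eq!(meta.table_spec.ks_name, "ks");
    assert_eq!(meta.table_spec.table_name, "t");
    assert_eq!(meta.col_specs[1].name, "v");
    assert_eq!(meta.col_specs[1].type_name, "text");

    // With C = 5 columns, decoupling saves 2 * (C - 1) = 8 allocations.
    let c = 5;
    assert_eq!(table_spec_strings(c, false) - table_spec_strings(c, true), 2 * (c - 1));
}
```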

wprzytula marked this pull request as draft November 4, 2024 08:06

wprzytula force-pushed the decouple-table-and-col-specs branch from abe53f6 to 0409cbe November 4, 2024 08:06

