Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parametrize Deserialize{Value,Row} with two lifetimes: 'frame and 'metadata separately #1101

Merged

Conversation

wprzytula
Copy link
Collaborator

Refs: #462

Tl;dr

A second lifetime parameter is introduced to entities comprising the bottom part of the new deserialization framework: 'metadata.

Deserialization traits - recap

DeserializeValue requires two types of data in order to perform deserialization:

  1. a reference to the CQL frame (a FrameSlice),
  2. the type of the column being deserialized, being part of the result metadata.

Similarly, DeserializeRow requires two types of data in order to perform deserialization:

  1. a reference to the CQL frame (a FrameSlice),
  2. a slice of specifications of all columns in the row, being part of the result metadata.

Observation

When deserializing owned types, both the frame and the metadata can have any lifetime and it's not important. When deserializing borrowed types, however, they borrow from the frame, so their lifetime must necessarily be bound by the lifetime of the frame. Metadata is only needed for the deserialization, so its lifetime does not abstractly bound the deserialized value.

Problem

Up to this PR, Deserialize{Value, Row} were only parametrized by one lifetime: 'frame, which bounded both the frame slice and the metadata. This was done that way due to an assumption that both the metadata and the frame (shared using Bytes) will be owned by the same entity. However, the new idea of deserializing result metadata to a borrowed form (to save allocations) makes result metadata's lifetime shorter than the frame's lifetime.

Implemented solution

Not to unnecessarily shorten the deserialized values' lifetime, a separate lifetime parameter is introduced for result metadata: 'metadata.

The PR is large, but the changes are mostly mechanical, by adding
the second lifetime parameter.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • [ ] I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • [ ] I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/. <-- not yet, will be done at the end of the deserialization refactor
  • [ ] I added appropriate Fixes: annotations to PR description.

Copy link

github-actions bot commented Oct 21, 2024

cargo semver-checks found no API-breaking changes in this PR! 🎉🥳
Checked commit: 2394d10

@wprzytula
Copy link
Collaborator Author

cargo semver-checks found no API-breaking changes in this PR! 🎉🥳 Checked commit: 4406db5

???

This looks like a crude bug... Adding a new lifetime parameter to a trait necessarily breaks the API, in the simplest possible way: syntactically.

@wprzytula wprzytula added the API-breaking This might introduce incompatible API changes label Oct 21, 2024
Copy link
Contributor

@muzarski muzarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Great explanation of why we need to introduce second lifetime parameter, so the review went smoothly

scylla-cql/src/types/deserialize/value.rs Show resolved Hide resolved
Comment on lines +441 to +443
impl_emptiable_strict_type!(chrono_04::NaiveDate, Date, |typ: &'metadata ColumnType<
'metadata,
>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatter going crazy again

@Lorak-mmk
Copy link
Collaborator

Refs: #462

Tl;dr

A second lifetime parameter is introduced to entities comprising the bottom part of the new deserialization framework: 'metadata.

Deserialization traits - recap

DeserializeValue requires two types of data in order to perform deserialization:

1. a reference to the CQL frame (a `FrameSlice`),

2. the type of the column being deserialized, being part of the result metadata.

Similarly, DeserializeRow requires two types of data in order to perform deserialization:

1. a reference to the CQL frame (a `FrameSlice`),

2. a slice of specifications of all columns in the row, being part of the result metadata.

Observation

When deserializing owned types, both the frame and the metadata can have any lifetime and it's not important. When deserializing borrowed types, however, they borrow from the frame, so their lifetime must necessarily be bound by the lifetime of the frame. Metadata is only needed for the deserialization, so its lifetime does not abstractly bound the deserialized value.

If that were the case then there would be no need to introduce second lifetime to those traits, you could just use anonymous lifetime / have a lifetime-generic method:

    fn deserialize(
        typ: &ColumnType,
        v: Option<FrameSlice<'frame>>,
    ) -> Result<Self, DeserializationError>;

The reason that seems not possible it precisely that deserialized values are bound by metadata lifetime, for example a ListlikeIterator.

This sentence confused me so much that I had to go and try to do this change (and learn why it is not possible) in order to understand why this PR introduces the second lifetime.
Please fix the sentence.

Problem

Up to this PR, Deserialize{Value, Row} were only parametrized by one lifetime: 'frame, which bounded both the frame slice and the metadata. This was done that way due to an assumption that both the metadata and the frame (shared using Bytes) will be owned by the same entity. However, the new idea of deserializing result metadata to a borrowed form (to save allocations) makes result metadata's lifetime shorter than the frame's lifetime.

Implemented solution

Not to unnecessarily shorten the deserialized values' lifetime, a separate lifetime parameter is introduced for result metadata: 'metadata.

The PR is large, but the changes are mostly mechanical, by adding the second lifetime parameter.

Copy link
Collaborator

@Lorak-mmk Lorak-mmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: such change should have new tests, right? Those should showcase usage with 2 different lifetimes.

scylla-macros/src/deserialize/mod.rs Outdated Show resolved Hide resolved
scylla-cql/src/types/deserialize/mod.rs Outdated Show resolved Hide resolved
@wprzytula
Copy link
Collaborator Author

wprzytula commented Oct 24, 2024

If that were the case then there would be no need to introduce second lifetime to those traits, you could just use anonymous lifetime / have a lifetime-generic method:

    fn deserialize(
        typ: &ColumnType,
        v: Option<FrameSlice<'frame>>,
    ) -> Result<Self, DeserializationError>;

The reason that seems not possible it precisely that deserialized values are bound by metadata lifetime, for example a ListlikeIterator.

This sentence confused me so much that I had to go and try to do this change (and learn why it is not possible) in order to understand why this PR introduces the second lifetime. Please fix the sentence.

What I had in mind was precisely about deserialized values. ListlikeIterator is not a value; it's a value deserializer. Indeed, value deserializers (MapIterator, UdtIterator, ColumnIterator, RowIterator etc.) are bound by the 'metadata lifetime. But what I wrote was intended to be about values.

Is this explanation clear enough? If so, I'll introduce the distinction between values and value deserializers and explain that value deserializers need to be parametrised by some lifetime(s) and if it were just 'frame', then values deserialized using them would be bound to the same lifetime as the metadata (because of unification of metadata's and frame's lifetimes). Hence, we need the second lifetime.

Actually, one of reasons of this misunderstanding is a quirk that DeserializeValue does not necessarily deserialize value - sometimes it creates another value deserializer.

@wprzytula wprzytula force-pushed the new-deserialization-with-two-lifetimes branch from 4406db5 to c4ba720 Compare October 25, 2024 09:51
As DeserializeValue and DeserializeRow are going to be parametrized by
two lifetimes instead of one, the common code for generating unique
lifetimes in derive macros is adjusted to generate two lifetimes.

For now, only the first generated lifetime is used (the second is
ignored), because DeserializeValue and DeserializeRow still have only
one lifetime.
`DeserializeValue` requires two types of data in order to perform
deserialization:
1) a reference to the CQL frame (a FrameSlice),
2) the type of the column being deserialized, being part of the
   ResultMetadata.

It's important to understand what is a _deserialized value_. It's not
just an implementor of DeserializeValue; there are some implementors of
`DeserializeValue` who are not yet final values, but **partially**
deserialized types that support further deserialization - _value
deserializers_, such as `ListlikeIterator` or `UdtIterator`.
_Value deserializers_, as they still need to deserialize some value, are
naturally bound by 'metadata lifetime. However, _values_ are completely
deserialized, so they should not be bound by 'metadata - only by 'frame.

When deserializing owned values, both the frame and the metadata can
have any lifetime and it's not important. When deserializing borrowing
values, however, they borrow from the frame, so their lifetime must
necessarily be bound by the lifetime of the frame. Metadata is only
needed for the deserialization, so its lifetime does not abstractly
bound the deserialized value.

Up to this commit, DeserializeValue was only parametrized by one
lifetime: 'frame, which bounded both the frame slice and the metadata.
(why? see next paragraph about value deserializers)
This was done that way due to an assumption that both the metadata and
the frame (shared using Bytes) will be owned by the same entity.
However, the new idea of deserializing result metadata to a borrowed
form (to save allocations) makes result metadata's lifetime shorter than
the frame's lifetime. Not to unnecessarily shorten the deserialized
values' lifetime, a separate lifetime parameter is introduced for
result metadata: 'metadata.

Up to this commit, value deserializers could only be parametrized by
'frame lifetime. Therefore, borrowed metadata held by value
deserializers limits the 'frame lifetime to its lifetime. Hence, values
deserialized by value deserializers must have been bounded by the
lifetime of the metadata.

The commit is large, but the changes are mostly mechanical, by adding
the second lifetime parameter.

DeserializeRow & friends are going to get the second lifetime parameter,
too, but for now they pass 'frame as both lifetime parameters of
DeserializeValue.
`DeserializeRow` requires two types of data in order to perform
deserialization:
1) a reference to the CQL frame (a FrameSlice),
2) a slice of specifications of all columns in the row, being part of
   the ResultMetadata.

It's important to understand what is a _deserialized row_. It's not
just an implementor of DeserializeRow; there are some implementors of
`DeserializeRow` who are not yet final rows, but **partially**
deserialized types that support further deserialization - _row
deserializers_, such as `ColumnIterator`.
_Row deserializers_, as they still need to deserialize some row, are
naturally bound by 'metadata lifetime. However, _rows_ are completely
deserialized, so they should not be bound by 'metadata - only by 'frame.

When deserializing owned rows, both the frame and the metadata can have
any lifetime and it's not important. When deserializing borrowing rows,
however, they borrow from the frame, so their lifetime must necessarily
be bound by the lifetime of the frame. Metadata is only needed for the
deserialization, so its lifetime does not abstractly bound the
deserialized row.

Up to this commit, DeserializeRow was only parametrized by one lifetime:
'frame, which bounded both the frame slice and the metadata. (why?
the reason is the same as in DeserializeValue - see the previous commit)
This was done that way due to an assumption that both the metadata and
the frame (shared using Bytes) will be owned by the same entity.
However, the new idea of deserializing result metadata to a borrowed
form (to save allocations) makes result metadata's lifetime shorter than
the frame's lifetime. Not to unnecessarily shorten the deserialized
values' lifetime, a separate lifetime parameter is introduced for
result metadata: 'metadata.

The commit is large, but the changes are mostly mechanical, by adding
the second lifetime parameter.

RowIterator and TypedRowIterator are going to get the second lifetime
parameter, too, but for now they pass 'frame as both lifetime parameters
of DeserializeRow.
As DeserializeValue and DeserializeRow now take two lifetime parameters:
'frame and 'metadata, RowIterator and TypedRowIterator get the second
lifetime parameter, too.
This commit introduced a test, whose goal is to assert in compile time
(by satisfying the borrow checker) that deserialized values are not
bound by the lifetime of the metadata (the column type).
This commit introduced a test, whose goal is to assert in compile time
(by satisfying the borrow checker) that deserialized rows are not
bound by the lifetime of the metadata (the column specs).
@wprzytula wprzytula force-pushed the new-deserialization-with-two-lifetimes branch from c4ba720 to 2394d10 Compare October 25, 2024 10:02
@wprzytula
Copy link
Collaborator Author

v1.1: addressed comments.

@wprzytula wprzytula merged commit 6d14015 into scylladb:main Oct 26, 2024
11 checks passed
@wprzytula wprzytula deleted the new-deserialization-with-two-lifetimes branch October 26, 2024 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API-breaking This might introduce incompatible API changes area/deserialization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants