-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Capacity for serialization is suboptimal #579
Comments
BTW, also recommend preallocating more space in
Here's the diff for that:
At minimum I'd suggest preallocating to 16, but I think 32 is probably a sweet spot. |
Possibly related? #385 |
Definitely related, my fix is more of a stop gap. The reality is that the entire write path could be optimized to just use a single buffer to write to, and reuse that buffer such that it generally allocates for one path and never again. |
Currently the driver is preallocating buffers for serialization using a much lower bound than it should. In most cases we know exactly how much to allocate, or we know a more accurate lower bound.
In particular, right now for types like
&[i32]
of length N only N bytes are preallocated. Instead it should besize_of::<i32i>() * N
for the bytes plussize_of::<i32>
for the tag.This is at least true for all integer types. Where it's a bit trickier is if we're in a generic context, such as with Vec, &[T], or tuples of T and if T contains data on the heap.
size_of::` is 24, but the string itself may be smaller than 24 characters (or larger, but that's less of an issue).
I would propose two things.
Value
.,size_hint
.size_hint
would return the lower bound for the size of the serialized value (including its tag).size_hint
would returnusize
, with a default implementation returning size_of::(), as that is the lower bound for all values. The default implementation is purely to preserve compatibility with any external implementations of Value.This also lets types that are relatively expensive to calculate the size of return a lower bound. For example,
HashMap<String, T>
could just return (2 * ::size_hint()) + (2 * T::size_hint())I've already implemented this in a fork.
Here are the results of benchmarking
If this is of interest I can cut a PR.
Here's the code. Very minimal changes and, again, nothing breaking.
https://github.com/scylladb/scylla-rust-driver/compare/main...colin-grapl:fix-serialize-capacity?expand=1#diff-3472ebb06a92a0acc29c561335d5af1e24b27ea23f6835fccf4f24518089a0f2
The text was updated successfully, but these errors were encountered: