Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't copy in From<String> for InternedString. #14674

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jacobbramley
Copy link

If we have an owned String already, then don't clone it just to internalise it.

What does this PR try to resolve?

Before, InternedString::from(String::new("blahblah")) would take ownership of the String but then make and leak a copy (str.to_string().leak()) for use in the interned_storage(). This change just avoids that copy, which I think was the intention behind the interface and the interning mechanism.

How should we test and review this PR?

Several existing tests hit this implementation, so I'd expect cargo test to suffice for functional correctness. The implementation of InternedString::new() (a few lines down) is useful as a reference.

Additional information

Whilst it looks like an optimisation, I've no particular performance objective here. This was just something I noticed whilst looking around the code.

If we have an owned `String` already, then don't clone it just to
internalise it.
@rustbot
Copy link
Collaborator

rustbot commented Oct 11, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @epage (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 11, 2024
Comment on lines +49 to +56
let mut cache = interned_storage();
let s = cache.get(item.as_str()).copied().unwrap_or_else(|| {
let s = item.leak();
cache.insert(s);
s
});

InternedString { inner: s }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without performance data, this falls to being a refactor. My main concern is that we're duplicating the logic for creating an InternedString.

Unsure whether thats sufficient to block this or not. Will think more on this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to handle this is having a Cow variant of new(), and then both new() and from() call it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it just looked like an attempted optimisation — the impl for an already-owned String — that wasn't quite complete. I couldn't work out a way to avoid the duplication without allocating on a common path.

Cow would work but turns a statically known property into a runtime check, which might be reasonable, but requires benchmarking to check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uncertainty about the additional complexity is why this was not done originally.

If we want to make it part of the type system without duplicating the core logic we could do something like:

fn new_inner<S: AsRef<str>>(str: S, make_stat: impl Fn(S) -> &'static str) -> InternedString {
    let mut cache = interned_storage();
    let s = cache.get(str.as_ref()).copied().unwrap_or_else(|| {
        let s = make_stat(str);
        cache.insert(s);
        s
    });
    InternedString { inner: s }
}

I don't know if it's worth the additional complexity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants