Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added PyRefMap and PyRefMapMut for borrowing data nested within a PyRef/PyRefMut #4203

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

JRRudy1
Copy link
Contributor

@JRRudy1 JRRudy1 commented May 24, 2024

This draft PR proposes a possible solution for issue #2300 and PR #4195, which involves adding a Ref::map-like mechanism for borrowing data nested within a PyRef or PyRefMut.

Unlike some of the approaches proposed in #2300 which attempt to create a PyRef<U> to some type U nested within a PyRef<T: PyClass>, this implementation instead defines a wrapper type that encapsulates the original PyRef alongside a raw pointer to U which it then dereferences to. This has the benefit of not relying on any of the pyo3 internals that may be in flux, and makes the safety relative easy to reason about since there is nothing magic going on.

Simplified version

In the simplest case of supporting only immutable borrows from immutable PyRef's, the approach boils down to this:

pub struct PyRefMap<'py, T: PyClass, U> {
    owner: PyRef<'py, T>,
    reference: *const U,
}

impl<'py, T: PyClass> PyRef<'py, T> {
    pub fn into_map<F, U>(self, f: F) -> PyRefMap<'py, T, U>
        where F: FnOnce(&T) -> &U 
    {
        PyRefMap {reference: f(&*self), owner: self}
    }
}

impl<'py, T: PyClass, U> Deref for PyRefMap<'py, T, U> {
    type Target = U;

    fn deref(&self) -> &U {
        // we own the `PyRef` that is guarding our (shared) access to `T`
        unsafe { &*self.reference }
    }
}

This can then be used like so:

let bound = Bound::new(py, MyClass{data: [0i32; 100]};
let data = bound.try_borrow()?.into_map(|c| &c.data);
assert_eq!(data[0], 0);

Full version

The above method should extend seamlessly to mutable borrows through PyRefMut, however this requires adding another wrapper type (e.g. PyRefMutMap) and all the associated impl's. Then another wrapper may be needed if you want to support shared borrows from a PyRefMut... this could be somewhat be avoided by making a single wrapper with some extra generic parameters, but then naming the types gets tedious. So, I ultimately decided to use a single base wrapper type PyRefMapBase<'py, U, Mut> where the contained PyRef<'py, T> or PyRefMut<'py, T> is stored as an opaque trait object Box<dyn OpaquePyRef<'py>> and the pointer is stored as NonNull<U> (which can be derived from either &U or &mut U). The mutability is then handled with the generic parameter Mut: Boolean which is used in type bounds to statically enforce that mutable dereferences only occur when derived from PyRefMut<T> and &mut U. Finally, two type aliases PyRefMap<'py, U> and PyRefMapMut<'py, U> are defined for the immutable and mutable cases, which are the actual names that users will reference (instead of PyRefMapBase directly).

This approach of Boxing the PyRef/PyRefMut has the added benefit of allowing the pyclass type T to be erased from the type name so that only the lifetime and the target type need to be included in signatures (i.e. instead of PyRefMap<'py, MyClass, [i32; 100]> you simply have PyRefMap<'py, [i32; 100]>. This comes at the cost of a pointer-sized Box allocation when the PyRef/PyRefMut is converted to a PyRefMap/PyRefMapMut, but I would imagine that is negligible in the context of interfacing with Python.

ToDo

  • Make sure it is sound when concurrency is involved.
  • Make sure it is sound when interior mutability is involved.
  • Settle on type/method names.
  • Decide where to put everything. I put it all in a new module for now just out of simplicity, but most/all of it should probably be moved to the pycell module.
  • Decide whether the PyRef->PyRefMap conversions should be associated methods; I know they probably should due to deref conflicts, but I am in denial because standard methods are just so much nicer.
  • Improve the docstrings and add examples.
  • Add more tests, including compile_fail tests to make sure it can't be abused.
  • Add AsRef/AsMut impls

JRRudy1 and others added 7 commits May 23, 2024 13:24
…ts are now opaque, w.r.t. the owner, including both the `PyRef` flavor and the underlying pyclass.
…p` struct is now opaque w.r.t. only the `PyRef` flavor and the `Frozen`-ness of the pyclass so that the struct can use either `PyRef` or `PyRefMut` as the owner. `PyRefMapMut` is no longer opaque since it only needs to support `PyRefMut` where `T` is `Frozen`.
…are now aliases for a single struct `PyRefMapBase` that has a `Mut: Boolean` generic parameter indicating whether it supports `DerefMut`. The underlying `PyRef`/`PyRefMut` flavor is opaque, as well as the pyclass `Frozen`-ness, but the pyclass type is visible.
…ar to the 4th draft, with a single base struct being reused for both `PyRefMap` and `PyRefMapMut`, but the pyclass type is now opaque as well.
@JRRudy1 JRRudy1 changed the title Added PyRefMap and PyRefMapMut for borrowing data nested PyRef/PyRefMut Added PyRefMap and PyRefMapMut for borrowing data nested within a PyRef/PyRefMut May 24, 2024
Copy link
Member

@mejrs mejrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think this could be a bit more simpified ;)

Comment on lines +24 to +38
pub struct PyRefMapBase<'py, U: 'py, Mut: Boolean> {
// Either `PyRef` or `PyRefMut` guarding the opaque pyclass from which
// the target is borrowed.
owner: Box<dyn OpaquePyRef<'py>>,
// Pointer to the `Deref` target which is (probably) borrowed from `owner`.
// This pointer is derived from either `&U` or `&mut U` as indicated by
// the `Mut` parameter.
target: NonNull<U>,
// Marks whether mutable methods are supported. If `Mut` is `True` then
// `owner` is a `PyRefMut` and `target` was derived from `&mut U`, so
// the pointer may be mutably dereferenced safely; if `False`, then
// `owner` may be either `PyRef` or `PyRefMut` and `target` was derived
// from `&U` so mutable dereferencing is forbidden.
_mut: PhantomData<Mut>
}
Copy link
Member

@mejrs mejrs May 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this could be a lot simpler, something like:

pub struct PyRefMap<'py, U: 'py> {
    value: NonNull<U>,
    borrow: &'py dyn PyClassBorrowChecker,
}

similar for a mutable variant. I haven't tried if that works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, I'll play around with it a bit to see if it can be simplified.

However, I'm not sure that your method would work unless I am missing something. My motivation for storing the original PyRef/PyRefMut within the PyRefMap struct was to hold the borrow on the T: PyClass that the NonNull<U> is pointing into so that the pointer is guaranteed to be safely dereferenceable. If I allow the PyRef to be dropped, its Drop impl will call release_borrow and stop protecting T from being mutably aliased. The inner Bound<T> would also get dropped, which would decrement the reference count and potentially let T get garbage collected. The next time PyRefMap get's dereferenced would then be UB.

So to prevent all this I would have to do something messy, perhaps ManuallyDrop the PyRef and then add a Drop impl for PyRefMap that releases the borrow; and then I'd still have to find a way to decrement the T's reference count, which wouldn't be possible if I only had access to &dyn PyClassBorrowChecker. I'm also not sure how I would even get a reference to the borrow checker with the 'py lifetime; it seems like the only way to get it is by borrowing it from a &'a Bound<'py, T> which could only give you &'a dyn PyClassBorrowChecker. But even if there is a way to get around all this, it seems a lot more complicated/unsafe to me than just storing the original PyRef and letting it handle all the protection and cleanup with its existing mechanisms.

Copy link
Member

@mejrs mejrs May 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I allow the PyRef to be dropped, its Drop impl will call release_borrow and stop protecting...

Of course. I was just sketching out an approach. I would really prefer to not allocate in PyRefMap::new, this is another approach to erase the pyclass type without boxing things.

A solution would be to refactor that into its own struct:

pub struct BorrowRef<'b>{
     borrow:  &'b ???,
}
impl Drop for BorrowRef {
    ...
}

pub struct PyRef<'p, T: PyClass> {
    value: NonNull<T>
    borrow: BorrowRef<'p>
}

pub struct PyRefMap<'b, U>{
    value: NonNull<U>
    borrow: BorrowRef<'p>
}

Another solution is your simplified solution that stores the PyRef inline but does not erase the type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really prefer to not allocate in PyRefMap::new

Yes I agree allocating wouldn't be ideal, and I had pretty mixed feeling about taking that approach.

Also, the current implementation of pyclass and pyref/pyrefmut is fairly complicated. We'd be open to changes to it if that would make this simpler or more efficient.

I took you up on this and was able to simplify PyRef/PyRefMut quite a bit without breaking anything. I will submit a PR for it soon to see what you guys think.

Another solution is your simplified solution that stores the PyRef inline but does not erase the type.

I personally do prefer storing the PyRef inline, and keeping the type un-erased is probably no big deal. Users can just implement their own wrapper if they want to erase the type, so I agree that it isn't worth allocating a box internally.

I have a new implementation you may like based on the reworks to PyRef/PyRefMut that I'll be proposing soon, so let's circle back after that.

@mejrs
Copy link
Member

mejrs commented May 26, 2024

Also, the current implementation of pyclass and pyref/pyrefmut is fairly complicated. We'd be open to changes to it if that would make this simpler or more efficient.

@davidhewitt
Copy link
Member

Thank you for working on this! I am 👍 on this feature although just wanted to give a heads up it might take me a few days to contribute to any review given it's a tricky one from soundness angle and it's not something I'd actively had in my head until you pushed the PR! 😂

@JRRudy1
Copy link
Contributor Author

JRRudy1 commented May 29, 2024

No problem at all David! I know I've been spamming you with PR's this week 😉

However, I'd actually like to put this PR on a brief hold until the following PR's get resolved:

  1. PR Added as_super methods to PyRef and PyRefMut. #4219
  2. A new PR that I will submit after 4219 which involves a rework to PyRef and PyRefMut that should simplify things quite a bit, including the PyRefMap implementation proposed in this PR. It should also play nicely with a post-gil-refs transition to PyRef wrapping Borrowed<T> instead of Bound<T>, which I would be interested in working on when the time comes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants