Self-referential structs

Pin is closely tied to self-referential structs, so before introducing Pin we first need to understand them.

What is a self-referential struct?

It is a struct in which one field is referenced by another field of the same struct. A below is a self-referential struct.

#[allow(dead_code)]
struct A<'a> {
    b: B,
    refb: &'a B,
}

#[allow(dead_code)]
struct B;

But it seems that such a struct cannot be constructed?

#[test]
fn r1(){
    let b1 = B;
    let a1 = A{ b: b1, refb: &b1 };
}

error[E0382]: borrow of moved value: `b1`
  --> src\self_ref.rs:13:30
   |
12 |     let b1 = B;
   |         -- move occurs because `b1` has type `B`, which does not implement the `Copy` trait
13 |     let a1 = A{ b: b1, refb: &b1 };
   |                    --        ^^^ value borrowed here after move
   |                    |
   |                    value moved here

This fails because we cannot move a value and borrow it at the same time. What if we wrap the reference to B in an Option?

#[allow(dead_code)]
struct A<'a> {
    refb: Option<&'a B>,
    b: B,
}

#[allow(dead_code)]
struct B;

#[test]
fn r1(){
    let b1 = B;
    let mut a1 = A{ b: b1, refb: None };
    a1.refb = Some(&a1.b);
}

Inside a single function this seems to work without any problem. However, such a struct cannot be returned from a function.

#[allow(dead_code)]
struct A<'a> {
    refb: Option<&'a B>,
    b: B,
}

#[allow(dead_code)]
struct B;

#[test]
fn r1() {
    let b1 = B;
    let mut a1 = A { b: b1, refb: None };
    a1.refb = Some(&a1.b);
}

fn create_b<'a>() -> A<'a> {
    let b1 = B;
    let mut a1 = A { b: b1, refb: None };
    a1.refb = Some(&a1.b);
    a1
}
-----------------
   |
68 | fn create_b<'a>() -> A<'a> {
   |             -- lifetime `'a` defined here
...
71 |     a1.refb = Some(&a1.b);
   |                    ----- borrow of `a1.b` occurs here
72 |     a1
   |     ^^
   |     |
   |     move out of `a1` occurs here
   |     returning this value requires that `a1.b` is borrowed for `'a`

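What goes wrong here? Every function return moves the returned value in memory. Moving an ordinary (non-self-referential) struct is fine, but consider what happens when we move a self-referential struct: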

The pointer still points to the original address, and that address is no longer valid.
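To make the stale pointer concrete, here is a minimal sketch. The Holder type and the function name are made up for illustration. It stores a raw pointer to its own field, moves the struct into a Box, and compares the stored pointer with the field's new address. The stale pointer is only compared, never dereferenced.

```rust
struct Holder {
    value: u64,
    // Raw pointer that is meant to point at `value` of the same struct.
    ptr: *const u64,
}

/// Returns (address stored before the move, address of `value` after the move).
fn stored_and_actual_address() -> (usize, usize) {
    let mut on_stack = Holder { value: 7, ptr: std::ptr::null() };
    let self_ptr: *const u64 = &on_stack.value;
    on_stack.ptr = self_ptr;

    // Moving into a Box copies the bytes to the heap; `ptr` is copied verbatim
    // and therefore still holds the old stack address.
    let on_heap = Box::new(on_stack);

    let stored = on_heap.ptr as usize;
    let actual = &on_heap.value as *const u64 as usize;
    (stored, actual)
}

fn main() {
    let (stored, actual) = stored_and_actual_address();
    println!("stored pointer: {:#x}", stored);
    println!("actual address: {:#x}", actual);
    println!("still self-referential: {}", stored == actual);
}
```

The two addresses differ because the move relocated value but nothing updated ptr.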

Another example: swap

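Here we use a raw pointer instead of an Option. We create two structs holding the strings t1 and t2 and then swap them. What we expect is that both a and b of the original t1 now show t2, and both a and b of the original t2 show t1.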

#[derive(Debug)]
struct Test {
    a: String,
    b: *const String,
}

#[allow(dead_code)]
impl Test {
    fn new(txt: &str) -> Self {
        Test {
            a: String::from(txt),
            b: std::ptr::null(),
        }
    }

    fn init(&mut self) {
        let self_ref: *const String = &self.a;
        self.b = self_ref;
    }

    fn a(&self) -> &str {
        &self.a
    }

    fn b(&self) -> &String {
        assert!(
            !self.b.is_null(),
            "Test::b called without Test::init being called first"
        );
        unsafe { &*(self.b) }
    }
}

#[test]
fn t1() {
    let mut t1 = Test::new("t1");
    t1.init();
    let mut t2 = Test::new("t2");
    t2.init();


    println!("{:?} {} {}", &t1, t1.a(),t1.b());
    println!("{:?} {} {}", &t2, t2.a(),t2.b());
    println!("---------------");
    std::mem::swap(&mut t1,&mut t2);

    println!("{:?} {} {}", &t1, t1.a(),t1.b());
    println!("{:?} {} {}", &t2, t2.a(),t2.b());
}
---------------
output:
Test { a: "t1", b: 0x16d27e620 } t1 t1
Test { a: "t2", b: 0x16d27e640 } t2 t2
---------------
Test { a: "t2", b: 0x16d27e640 } t2 t1
Test { a: "t1", b: 0x16d27e620 } t1 t2

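We expected to see t2 t2 on the first line and t1 t1 on the second, but b yields the other struct's string instead.
The cause is the same as in the previous example: the structs were moved, while the raw pointers inside them still hold the old addresses.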

Moving a self-referential struct causes problems.

When does a move happen?

Popping a stack frame

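When a stack frame is popped, the return value is moved to the top of the caller's stack. If the struct contains no references, this is harmless.
If the return value is a pointer or reference to an object allocated in the popped frame, it would become a dangling pointer once the frame is gone. The Rust compiler rejects this for references, so the user has to put the object on the heap first and return that instead.
In short, a newly created reference must point either to an address in the same stack frame or to the heap.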
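The "put it on the heap first" approach can be sketched as follows. Node, make_node and pointer_still_valid are hypothetical names. The value is boxed before the self-pointer is taken, so returning the Box only moves the pointer to the allocation, not the Node. The sketch only compares addresses and does not dereference the raw pointer.

```rust
struct Node {
    data: String,
    // Points at `data` inside the same heap allocation.
    data_ptr: *const String,
}

fn make_node(txt: &str) -> Box<Node> {
    let mut node = Box::new(Node {
        data: String::from(txt),
        data_ptr: std::ptr::null(),
    });
    // Take the address only after the value already lives on the heap.
    let self_ptr: *const String = &node.data;
    node.data_ptr = self_ptr;
    node // moves the Box (a pointer), the Node stays where it is
}

/// True if the stored pointer still equals the address of `data`.
fn pointer_still_valid(node: &Node) -> bool {
    std::ptr::eq(node.data_ptr, &node.data)
}

fn main() {
    let node = make_node("hello");
    println!("data = {}", node.data);
    println!("pointer still valid after return: {}", pointer_still_valid(&node));
}
```

Boxing alone is not a full solution: anyone with a mutable reference to the Node can still swap its contents, which is the next case.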

swap/replace

Anyone holding a mutable reference can call std::mem::swap (or std::mem::replace) to move the value behind it.
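As a small illustration (the steal function is a made-up name), a mutable reference is all it takes to move a value out of its location with std::mem::replace:

```rust
use std::mem;

/// Moves the current String out of `slot` and leaves a placeholder behind.
fn steal(slot: &mut String) -> String {
    mem::replace(slot, String::from("placeholder"))
}

fn main() {
    let mut owner = String::from("original");
    let taken = steal(&mut owner);
    println!("taken = {taken}, left behind = {owner}");
}
```

Any API that hands out a &mut T to a self-referential value therefore also hands out the ability to move it.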

Finally we get to Pin. What problem does Pin solve? The documentation describes it like this:

Types that pin data to its location in memory.

In other words, it keeps data from being moved in memory.

How does Rust achieve this? The principle is quite simple:

Similarly, Pin<&mut T> is a lot like &mut T. 
However, Pin<P> does not let clients actually obtain a Box<T> or &mut T to pinned data, 
which implies that you cannot use operations such as mem::swap

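When a Pin is dropped, whatever it holds is released with it, and in the meantime it does not hand out mutable references to its contents. In short, once you give a value that is not Unpin to a Pin, you cannot take it back out in safe code until the Pin is destroyed together with what it holds. You cannot even obtain a mutable reference, so you never get a chance to swap. There is no black magic: mutability rules and API design alone rule out moving a self-referential struct.
After something has been pinned, the Pin pointer itself can still be moved.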
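A minimal sketch of that last point, using Box::pin from the standard library. The Pinned type and the helper names are made up. The Pin<Box<T>> handle is moved twice, while the address of the value it points to stays the same.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Pinned {
    value: u32,
    _pin: PhantomPinned, // opts the type out of Unpin
}

/// Address of the pinned value (not of the Pin handle).
fn address_of(p: &Pin<Box<Pinned>>) -> usize {
    &**p as *const Pinned as usize
}

/// Returns the pointee address before and after moving the Pin handle.
fn move_pin_around() -> (usize, usize) {
    let pinned = Box::pin(Pinned { value: 1, _pin: PhantomPinned });
    let before = address_of(&pinned);

    let moved = pinned;       // move the Pin<Box<_>> handle
    let in_vec = vec![moved]; // and move it again, into a Vec

    let after = address_of(&in_vec[0]);
    println!("value is still {}", in_vec[0].value);
    (before, after)
}

fn main() {
    let (before, after) = move_pin_around();
    println!("{:#x} -> {:#x}, unchanged: {}", before, after, before == after);
}
```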

So how is Pin used?

Note that Pin can only wrap pointer types.
First, meet the Unpin trait.

pub auto trait Unpin {}

This is an auto trait, meaning it is implemented automatically for every type, unless:

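the struct explicitly implements !Unpin (negative impls currently require nightly), or
the struct contains a field of type PhantomPinned (or any other field that is not Unpin).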

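The vast majority of other types are Unpin. In plain words, they are the types mentioned earlier that can be moved freely. Put differently, if we want Pin to have any effect, we must deliberately mark our struct as !Unpin; otherwise it is Unpin by default. For Unpin types, Pin provides a separate set of impls: since such values can be moved at will, Pin's restrictions do not need to apply to them.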

impl<P: Deref<Target: Unpin>> Pin<P> {
    /// Construct a new `Pin<P>` around a pointer to some data of a type that
    /// implements [`Unpin`].
    ///
    /// Unlike `Pin::new_unchecked`, this method is safe because the pointer
    /// `P` dereferences to an [`Unpin`] type, which cancels the pinning guarantees.
    pub const fn new(pointer: P) -> Pin<P> {
        // SAFETY: the value pointed to is `Unpin`, and so has no requirements
        // around pinning.
        unsafe { Pin::new_unchecked(pointer) }
    }

    /// Unwraps this `Pin<P>` returning the underlying pointer.
    ///
    /// This requires that the data inside this `Pin` is [`Unpin`] so that we
    /// can ignore the pinning invariants when unwrapping it.
    pub const fn into_inner(pin: Pin<P>) -> P {
        pin.pointer
    }
}

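These look like thin wrappers around unsafe operations. Because an Unpin value can be moved freely, the user is allowed to take it back out of the Pin in safe code.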
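As a quick illustration of how little Pin restricts an Unpin type, this sketch (the function name is made up) pins two Strings with the safe Pin::new, gets plain mutable references back with Pin::get_mut, and swaps them:

```rust
use std::pin::Pin;

/// Pins two Strings, then swaps them through the pins using only safe code.
fn swap_through_pin() -> (String, String) {
    let mut a = String::from("a");
    let mut b = String::from("b");

    // Pin::new is safe because String: Unpin.
    let pa: Pin<&mut String> = Pin::new(&mut a);
    let pb: Pin<&mut String> = Pin::new(&mut b);

    // get_mut is only available when the target type is Unpin.
    std::mem::swap(pa.get_mut(), pb.get_mut());
    (a, b)
}

fn main() {
    let (a, b) = swap_through_pin();
    println!("a = {a}, b = {b}");

    // into_inner hands the original pointer back, again only for Unpin targets.
    let boxed: Box<u32> = Pin::into_inner(Pin::new(Box::new(5)));
    println!("unwrapped box holds {boxed}");
}
```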

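For a !Unpin struct, on the other hand, we need unsafe code to construct the Pin (unless we go through a safe helper such as Box::pin) and to get mutable access to the fields inside.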

// Immutable dereference
impl<P: Deref> Pin<P> {
   pub const unsafe fn new_unchecked(pointer: P) -> Pin<P> {
        Pin { pointer }
    }

   
    pub fn as_ref(&self) -> Pin<&P::Target> {
        // SAFETY: see documentation on this function
        unsafe { Pin::new_unchecked(&*self.pointer) }
    }

    pub const unsafe fn into_inner_unchecked(pin: Pin<P>) -> P {
        pin.pointer
    }
}

// Mutable dereference
impl<P: DerefMut> Pin<P> {

    pub fn as_mut(&mut self) -> Pin<&mut P::Target> {
        // SAFETY: see documentation on this function
        unsafe { Pin::new_unchecked(&mut *self.pointer) }
    }

    pub fn set(&mut self, value: P::Target)
    where
        P::Target: Sized,
    {
        *(self.pointer) = value;
    }
}

...

For types that are not Unpin, nearly every way of obtaining a mutable reference (such as get_unchecked_mut) is unsafe.

In practice it looks like this:


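use std::marker::PhantomPinned;
use std::pin::Pin;

struct SS {
    s: String,
    _a: PhantomPinned, // makes SS !Unpin
}

#[test]
fn t1() {
    let mut s = SS {
        _a: PhantomPinned,
        s: String::from("123"),
    };
    // SAFETY: `s` is not moved again after being pinned in this function.
    let p = unsafe { Pin::new_unchecked(&mut s) };
    // SAFETY: the reference is only used to read a field; nothing is moved out.
    let j = unsafe { p.get_unchecked_mut() };
    dbg!(&j.s);
}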
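Putting the pieces together, the swap example from earlier can be rewritten with Pin. This is a sketch along the lines of the well-known pattern for pinning to the heap. The PinnedTest name is made up, and the unsafe blocks rely on the assumption that nothing moves the value once it is inside the Pin<Box<_>>.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct PinnedTest {
    a: String,
    b: *const String,
    _marker: PhantomPinned, // makes the type !Unpin
}

impl PinnedTest {
    /// Builds the value directly inside a Pin<Box<_>> and then wires up `b`.
    fn new(txt: &str) -> Pin<Box<Self>> {
        let t = PinnedTest {
            a: String::from(txt),
            b: std::ptr::null(),
            _marker: PhantomPinned,
        };
        let mut boxed = Box::pin(t);
        let self_ptr: *const String = &boxed.a;
        // SAFETY: we only overwrite the pointer field; the value is not moved.
        unsafe { boxed.as_mut().get_unchecked_mut().b = self_ptr };
        boxed
    }

    fn a(self: Pin<&Self>) -> &str {
        &self.get_ref().a
    }

    fn b(self: Pin<&Self>) -> &String {
        // SAFETY: `b` was set in `new`, and the pinned value has not moved since.
        unsafe { &*self.b }
    }
}

fn main() {
    let t1 = PinnedTest::new("t1");
    let t2 = PinnedTest::new("t2");
    println!("{} {}", t1.as_ref().a(), t1.as_ref().b());
    println!("{} {}", t2.as_ref().a(), t2.as_ref().b());
    // std::mem::swap(&mut *t1, &mut *t2);
    // ^ does not compile: Pin<Box<T>> has no DerefMut when T is !Unpin.
}
```

Moving t1 or t2 themselves is still allowed, since that only moves the Box handle and leaves the heap value in place.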

There is a crate that takes care of this unsafe code for us: pin_project

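use std::marker::PhantomPinned;
use std::pin::Pin;

#[pin_project::pin_project]
struct SS {
    s: String,
    // Marked #[pin] so that the Unpin impl generated by the macro takes this field into account.
    #[pin]
    _a: PhantomPinned,
}

#[test]
fn t1() {
    let mut s = SS {
        _a: PhantomPinned,
        s: String::from("123"),
    };
    // SAFETY: `s` is not moved again after being pinned in this function.
    let p = unsafe { Pin::new_unchecked(&mut s) };

    // `s` is not marked #[pin], so project() exposes it as a plain &mut String.
    let mut this = p.project();
    this.s.push_str("456");
}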

The project method that the procedural macro generates for us gives access to the fields.

There is also a lightweight version of this crate:
pin-project-lite = "0.2.8"
It offers similar functionality.

use pin_project_lite::pin_project;
use std::marker::PhantomPinned;
use std::pin::Pin;

pin_project! {
    struct SS {
        #[pin]
        s: String,
        #[pin]
        _a: PhantomPinned,
    }
}

#[test]
fn t1() {
    let mut s = SS {
        _a: PhantomPinned,
        s: String::from("123"),
    };
    let mut p = unsafe { Pin::new_unchecked(&mut s) };

    let pp = p.project();
    let s = pp.s;
}

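The only difference is that this one is a declarative macro. Both are used through project, which also makes nested pinning convenient.
Just add #[pin] to the relevant fields.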

What does Future have to do with Pin?

After all this talk about Pin, how does it relate to Future? Why must Future use Pin?

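In fact, the Rust team introduced Pin precisely to solve a problem with Future. async/await is implemented on top of generators: an async block is compiled into an anonymous state-machine struct, and each await is a point where it can suspend. If a reference is held across an await, the generated anonymous struct becomes a self-referential struct. Take the following code, for example: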

let mut fut = async {
    let to_borrow = String::from("Hello");
    let borrowed = &to_borrow;
    SomeResource::some_task().await;
    println!("{} world!", borrowed);
};

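Since Pin only takes effect for !Unpin types, the generated anonymous struct is !Unpin as well. Below is the standard library source (from an older Rust release) that wraps a generator into a Future: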

pub const fn from_generator<T>(gen: T) -> impl Future<Output = T::Return>
where
    T: Generator<ResumeTy, Yield = ()>,
{
    #[rustc_diagnostic_item = "gen_future"]
    struct GenFuture<T: Generator<ResumeTy, Yield = ()>>(T);

    // We rely on the fact that async/await futures are immovable in order to create
    // self-referential borrows in the underlying generator.
    impl<T: Generator<ResumeTy, Yield = ()>> !Unpin for GenFuture<T> {}

    impl<T: Generator<ResumeTy, Yield = ()>> Future for GenFuture<T> {
        type Output = T::Return;
        fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
            // SAFETY: Safe because we're !Unpin + !Drop, and this is just a field projection.
            let gen = unsafe { Pin::map_unchecked_mut(self, |s| &mut s.0) };

            // Resume the generator, turning the `&mut Context` into a `NonNull` raw pointer. The
            // `.await` lowering will safely cast that back to a `&mut Context`.
            match gen.resume(ResumeTy(NonNull::from(cx).cast::<Context<'static>>())) {
                GeneratorState::Yielded(()) => Poll::Pending,
                GeneratorState::Complete(x) => Poll::Ready(x),
            }
        }
    }

    GenFuture(gen)
}
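From the caller's side, this is why poll takes Pin<&mut Self>. The sketch below polls an async block by hand, with a borrow held across an await. YieldOnce and NoopWaker are hypothetical helpers written for this example, standing in for the SomeResource task and for a real executor's waker.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical helper: returns Pending on the first poll, Ready on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Hypothetical waker that does nothing, enough for polling by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls the future twice and reports what each poll returned.
fn poll_twice() -> (bool, Option<String>) {
    // Box::pin fixes the state machine's address before the first poll.
    let mut fut = Box::pin(async {
        let to_borrow = String::from("Hello");
        let borrowed = &to_borrow; // this borrow lives across the await below
        YieldOnce(false).await;
        format!("{} world!", borrowed)
    });

    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let first_pending = fut.as_mut().poll(&mut cx).is_pending();
    let second = match fut.as_mut().poll(&mut cx) {
        Poll::Ready(s) => Some(s),
        Poll::Pending => None,
    };
    (first_pending, second)
}

fn main() {
    let (first_pending, second) = poll_twice();
    println!("first poll pending: {first_pending}, second poll: {second:?}");
}
```

Between the two polls the state machine holds both to_borrow and a reference to it, so it has to stay at the same address; pinning it before the first poll guarantees that.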

Summary

Pin is the concept Rust introduced to deal with the unsafety that moves can cause. In later articles we will continue to explore how async works in Rust.

ByteDrift

What is Rust's Pin all about?