💀 Spooky Skelington Blog 💀


The Rust standard library exposes two reference counting abstractions; Rc<T> and Arc<T>. These make use of Rust's strict thread safety tools Send and Sync to allow low overhead non-atomic reference counting with Rc<T> that does not allow being passed between threads, along with the more costly Arc<T> which can be freely shared so long as T is Send and Sync.

Unfortunately, since the choice between atomic and non-atomic reference counting is made in the type signature, libraries can end up being overly pessimistic and using Arc<T> in non-atomic codepaths in case a downstream user might one day like to pass the object to another thread.

A silly idea I had, was that since the transition between non-atomic and atomic reference counting can only happen while the object is not shared between threads, it should be possible to build an Rc<T> that can be upgraded to an Arc<T> by using the sign of the reference count to disambiguate, and a branch in Rc<T> to choose between the atomic and non-atomic code-paths.

Fucking around

The structure itself is straightforward. An inner struct that contains the strong reference count as well as the object itself. I'm lazy and don't really have much use for weak references so we'll ignore that issue entirely and focus only on a strong count.

1struct Inner<T: ?Sized> {
2 strong: AtomicI32,
3 value: T,
4}

And then two different versions of the wrapper struct, one for Rc<T> and another for Arc<T> where Arc<T> additionally implements Send and Sync when T does. This matches the behavior of the standard library reference counting objects.

1pub struct Rc<T: ?Sized> {
2 ptr: NonNull<Inner<T>>,
3 phantom: PhantomData<Inner<T>>,
4}
1pub struct Arc<T: ?Sized> {
2 ptr: NonNull<Inner<T>>,
3 phantom: PhantomData<Inner<T>>,
4}
5
6unsafe impl<T: ?Sized + Sync + Send> Send for Arc<T> {}
7unsafe impl<T: ?Sized + Sync + Send> Sync for Arc<T> {}

In non-atomic mode strong count will count up from i32::MIN while in atomic mode it will count up from 0. This gives us a simple upgrade function, though we must make sure that if the inner value is already in the atomic range we don't double-upgrade and wrap back around to the land of the non-atomics.

1fn upgrade(&self) {
2 let strong = self.strong.load(Ordering::Relaxed);
3 if strong < 0 {
4 self.strong.store(
5 strong.wrapping_add(i32::MIN),
6 Ordering::Relaxed,
7 );
8 }
9}

The use of wrapping_add in the code here isn't because we want to actually wrap, but because it ensures we don't generate any annoying overflow handling code. #yolo #safety #fear-the-crab

Which leads finally to our increment and decrement operations. Decrement returns false if we should Drop::drop the object.

1fn incr_strong(&self) {
2 let strong = self.strong.load(Ordering::Relaxed);
3 if strong < 0 {
4 self.strong.store(strong.wrapping_add(1), Ordering::Relaxed);
5 } else {
6 self.strong.fetch_add(1, Ordering::Relaxed);
7 }
8}
1fn decr_strong(&self) -> bool {
2 let strong = self.strong.load(Ordering::Relaxed);
3 if strong < 0 {
4 self.strong.store(strong.wrapping_sub(1), Ordering::Relaxed);
5 strong != i32::MIN + 1
6 } else {
7 let strong = self.strong.fetch_sub(1, Ordering::Release);
8 strong != 1
9 }
10}

Filling in the rest of the owl gives us an approach where one can use Rc<T> in single-threaded code internally, exposing it through public APIs, while still allowing downstream users to upgrade their Rc<T> to enable atomic counting where necessary.

1let rc1 = Rc::new(());
2let rc2 = rc1.clone();
3let arc1 = Arc::from_rc(&rc1);
4let arc2 = arc1.clone();

Finding out

Is this useful? Well, probably not.

We're basically adding a single predictable branch to the fast path of the Rc<T>, which has negligible overhead, and then adding a bit of complexity to the increment and decrement paths for the upgraded Rc<T> on top of their necessary lock prefixed instruction. This seems to play out roughly like we might expect in the World's Worst Benchmark which simply spams clone() operations in a loop.

OptionClone Time
std::sync::Rc3.3ns
std::sync::Arc14.4ns
Rc3.4ns
Arc14.4ns
Rc upgraded20.5ns

Perhaps if the flexibility is really important to your API it might be worth doing something like this, but probably, as usual, the better option is to stop spamming reference counting operations. Thankfully, Rust's ability to track the lifetimes of references combined with the explicit clone() function means it's much easier to avoid superfluous reference counting.

There's a more complete implementation from which the above snippets are drawn over here