2022-04-27
The Rust standard library exposes two reference counting abstractions; Rc<T>
and Arc<T>
. These
make use of Rust's strict thread safety tools Send
and Sync
to allow low overhead non-atomic
reference counting with Rc<T>
that does not allow being passed between threads, along with the
more costly Arc<T>
which can be freely shared so long as T
is Send
and Sync
.
Unfortunately, since the choice between atomic and non-atomic reference counting is made in the type
signature, libraries can end up being overly pessimistic and using Arc<T>
in non-atomic codepaths
in case a downstream user might one day like to pass the object to another thread.
A silly idea I had, was that since the transition between non-atomic and atomic reference counting
can only happen while the object is not shared between threads, it should be possible to build an
Rc<T>
that can be upgraded to an Arc<T>
by using the sign of the reference count to disambiguate,
and a branch in Rc<T>
to choose between the atomic and non-atomic code-paths.
The structure itself is straightforward. An inner struct that contains the strong reference count as well as the object itself. I'm lazy and don't really have much use for weak references so we'll ignore that issue entirely and focus only on a strong count.
1
And then two different versions of the wrapper struct, one for Rc<T>
and another for Arc<T>
where Arc<T>
additionally implements Send
and Sync
when T
does. This matches the behavior
of the standard library reference counting objects.
1
1
In non-atomic mode strong count will count up from i32::MIN
while in atomic mode it will count up
from 0. This gives us a simple upgrade function, though we must make sure that if the inner value is
already in the atomic range we don't double-upgrade and wrap back around to the land of the
non-atomics.
1
The use of wrapping_add in the code here isn't because we want to actually wrap, but because it ensures we don't generate any annoying overflow handling code. #yolo #safety #fear-the-crab
Which leads finally to our increment and decrement operations. Decrement returns false if we should
Drop::drop
the object.
1
1
Filling in the rest of the owl gives us an approach where one can use Rc<T>
in single-threaded
code internally, exposing it through public APIs, while still allowing downstream users to upgrade
their Rc<T>
to enable atomic counting where necessary.
1 let rc1 = new;
2 let rc2 = rc1.clone;
3 let arc1 = from_rc;
4 let arc2 = arc1.clone;
Is this useful? Well, probably not.
We're basically adding a single predictable branch to the fast path of the Rc<T>
, which has
negligible overhead, and then adding a bit of complexity to the increment and decrement paths for
the upgraded Rc<T>
on top of their necessary lock prefixed instruction. This seems to play out
roughly like we might expect in the World's Worst Benchmark which simply spams clone()
operations
in a loop.
Option | Clone Time |
---|---|
std::sync::Rc | 3.3ns |
std::sync::Arc | 14.4ns |
Rc | 3.4ns |
Arc | 14.4ns |
Rc upgraded | 20.5ns |
Perhaps if the flexibility is really important to your API it might be worth doing something like
this, but probably, as usual, the better option is to stop spamming reference counting operations.
Thankfully, Rust's ability to track the lifetimes of references combined with the explicit clone()
function means it's much easier to avoid superfluous reference counting.
There's a more complete implementation from which the above snippets are drawn over here