Add license
Rewrite of the README.
Remove wrong data from lngrad 80 raw
CMR is a memory reclamation scheme for concurrent systems in Rust. It was developed for Martin Hafskjold Thoresen's master thesis[^master] at the Norwegian University of Science and Technology, in collaboration with Dan Alistarh's group at IST Austria, and it draws significant inspiration from Forkscan[^forkscan]. The goal of the project was to explore alternatives to the popoular Crossbeam project for concurrent meomry management in Rust.
CMR works by splitting the memory space of the program into two: managed
memory, and shared memory. In managed memory lives all objects which lifetime
is managed by the Rust compiler, like Box<T>
, Vec<T>
, or even Arc<T>
s.
This includes most of the program. Shared memory, on the other hand, is where
the objects that the Rust compiler cannot reason about lives. This may include
objects like nodes in a concurrent queue. CMR tracks all objects that live in
shared memory, and is not concerned about the objects that live in managed
memory. CMR keeps track of all pointers to shared memory, and is therefore
able to reason about the lifetime of the objects in shared memory.
There are number of advantages of not having to keep track of all program
memory: for instance, if the program consists of only a small part of shared
memory - and most programs do - then the overhead of CMR only depends on the
size of this memory, and not the size of the total program memory[^fork]. It
also encourages programmers to minimize the use of shared memory, since this
must be explicitly handled (one can draw similarities to unsafe
code in
Rust).
The basic algorithm is as follows:
reclaim:
1. have all threads report their roots
2. collect all roots
3. find all allocations that are reachable from the roots
4. free() the allocations that were not found.
1
: this implemented using POSIX signals: each thread spawned must register a
signal handler for SIGUSR1
. In this signal handler we write out a pointer to
a thread local vector, containing pointers to all roots this thread currently
has.
2
: having pointers to the vectors the reclaming thread can simply do some
pointer following to obtain all of the root pointers to shared memory.
3
: Instead of spending valuable time doing the reachability analysis while
all threads are waiting, we branch off a new process which does this for us.
Then we use mmap
ed memory to communicate between the child process and a
background thread. The background thread gets back all pointers that were no
longer reachable from any root, and frees this memory in 4
.
CMR defined its own pointer types NullabelPtr<T>
and Ptr<T>
. All accesses
to shared memory is done through a Ptr<T>
. In order to obtain a
NullablePtr<T>
we must load an Atomic<T>
into a Guard<T>
: the Guard
guarantees that the location of the loaded pointer is known to CMR. Guard
s
are not constructed explicitly, but declared using the guard!
macro. Thus, a
complete read from shared memory can look like this:
guard!(my_guard);
let nullable_ptr = cmr::guard(&some_atomic, my_guard);
if nullable_ptr.is_null() { return Err(...); }
let ptr = nullable_ptr.unwrap();
let value: &T = &*ptr;
The lifetime system of Rust guarantees that the reference value
is only valid
as long as ptr
, which in turn guarantees that my_guard
is not changed, so
CMR knows that this is a reachable pointer. This makes all operations in this
example safe - no unsafe
usage is required.
As an example, this is from the push
operation of the Michael-Scott queue:
pub struct Node<T> {
data: std::mem::ManuallyDrop<T>,
next: cmr::Atomic<Node<T>>,
}
pub struct MsQueue<T> {
head: cmr::SharedGuard<Node<T>>,
tail: cmr::Atomic<Node<T>>,
}
impl<T> MsQueue<T> {
pub fn push(&self, t: T) {
guards!(_new_node, _tail, _next);
let new_node = cmr::alloc(_new_node, Node::new(t));
loop {
let tail = cmr::guard(_tail, &self.tail).ptr().unwrap();
let next_ptr = &tail.next;
let ptr = cmr::guard(_next, next_ptr);
if ptr::addr(ptr) != 0 {
let _ = self.tail.cas(tail, ptr, SeqCst);
} else if next_ptr.cas(ptr::null(), new_node, SeqCst).is_ok() {
let _ = self.tail.cas(tail, new_node, SeqCst);
break;
}
}
}
}
Note that the unwrap
call afte cmr::guard
will never fail, since the
MS-queue has non-emptyness as an invariant.
Usage is guaranteed to be safe, due to the forced registering of the guards,
similarily to that of hazard pointers. A difference between the two is that
with hazard pointers we usually require a check after protecting a pointer to
see whether the data node in still in the data structure we're operating on;
this is not required in CMR, since memory cannot be freed in between reading
the pointer and registering it: cmr::guard
makes sure of this.
There are a number of technical complications with this scheme: for instance we
are required to fork
before having the signaled threads return their
execution, since we cannot work with stale data. However, fork
is not
"signal-safe", meaning it's not safe to call this within a signal handler. The
reason for this is that if a thread was in the middle of eg. allocating memory
when it was signaled, then internal locks have been taken: these will continue
to be taken in the child process, which blocks all allocation in this process.
This was solved by having threads register when they are allocating, and simply
abort the reclamation attempt should the lock be taken when the threads are
signaled. While working, this caused significant overhead.
We also require the thread local vector of roots to be readable and clear
able
at any time, so we had to write our own vector, the SignalVec
, which does
exactly this.
The big question is how this performs.
CMR itself is located in the src
directory. Example data structures are in
data-structures
, and benchmarks of these are in benchmarks
. For comparison,
similar benchmarks for Crossbeam and mutexes from the standard library are in
extern-benchmarks
.
Prior to Rust 1.32, programs usually had jemalloc
bundled, which caused some
issues with cmr
: since we are tracking the allocations, but not the types of
the allocation, we pass in wrong Layout
arguments to the
std::alloc::dealloc
function. jemalloc
depended on this to be correct, and
caused a few issues, so we have a hack for it in jemalloc-free-hack
, ensuring
that we call jemallocator::ffi::free
instead of the variant taking the layout
as a parameter.
While I am happy with how CMR turned out, I don't think I can properly avocate for using CMR in any setting but for experimenting with Rust. There are a few reasons for this: (1) due to it's coupling with the operating system, it is really only usable on Linux, (2) it's overhead is higher than that of epochs in Crossbeam, (3) the project is not actively maintained[^activity]
[^fork]: Strictly speaking, this is not the case, since we use fork()
to
branch out a new process in which we run the reachability. Due to CoW
semantics of memory pages as implemented by the operating system, the OS have
to clone all pages that are changed while the child process still runs.
[^forkscan]: TODO: link here.
[^master]: TODO: link here.
[^activity]: I suppose this does not need to be a hard requiremeny for a
library to be used, but it also means that feature requests, like Windows
support, will probably not be implemented.