@@ 1,86 1,190 @@
-# Memory Reclamation for Rust
+# CMR - Concurrent Memory Reclamation
-CMR is a memory management system for concurrent applications. The system is described in Martin
-Hafskjold Thoresens [masters thesis](thesis). This file offers a quick outline of the reasoning
-behind CMR as well as a rough sketch on how it works.
+CMR is a memory reclamation scheme for concurrent systems in Rust. It was
+developed for Martin Hafskjold Thoresen's master thesis[^master] at the
+Norwegian University of Science and Technology, in collaboration with Dan
+Alistarh's group at IST Austria, and it draws significant inspiration from
+Forkscan[^forkscan]. The goal of the project was to explore alternatives to the
+popoular Crossbeam project for concurrent meomry management in Rust.
-## Setup
+## Overview
-[Install Rust here](https://rustup.rs/).
+CMR works by splitting the memory space of the program into two: managed
+memory, and shared memory. In managed memory lives all objects which lifetime
+is managed by the Rust compiler, like `Box<T>`, `Vec<T>`, or even `Arc<T>`s.
+This includes most of the program. Shared memory, on the other hand, is where
+the objects that the Rust compiler cannot reason about lives. This may include
+objects like nodes in a concurrent queue. CMR tracks all objects that live in
+shared memory, and is not concerned about the objects that live in managed
+memory. CMR keeps track of all pointers to shared memory, and is therefore
+able to reason about the lifetime of the objects in shared memory.
-Clone the repo, enter it, and run `cargo build`, and you're done.
+There are number of advantages of not having to keep track of all program
+memory: for instance, if the program consists of only a small part of shared
+memory - and most programs do - then the overhead of CMR only depends on the
+size of this memory, and not the size of the total program memory[^fork]. It
+also encourages programmers to minimize the use of shared memory, since this
+must be explicitly handled (one can draw similarities to `unsafe` code in
+Rust).
+
+
+## Reclaiming Memory
+
+The basic algorithm is as follows:
```
-$ git clone git@github.com:ist-daslab/rust-drop-box cmr
-$ cd cmr
-$ cargo build
+reclaim:
+ 1. have all threads report their roots
+ 2. collect all roots
+ 3. find all allocations that are reachable from the roots
+ 4. free() the allocations that were not found.
```
+`1`: this implemented using POSIX signals: each thread spawned must register a
+signal handler for `SIGUSR1`. In this signal handler we write out a pointer to
+a thread local vector, containing pointers to all roots this thread currently
+has.
+
+`2`: having pointers to the vectors the reclaming thread can simply do some
+pointer following to obtain all of the root pointers to shared memory.
-# Why
+`3`: Instead of spending valuable time doing the reachability analysis while
+all threads are waiting, we branch off a new process which does this for us.
+Then we use `mmap`ed memory to communicate between the child process and a
+background thread. The background thread gets back all pointers that were no
+longer reachable from any root, and frees this memory in `4`.
-This project implements memory management for concurrent systems in Rust. This is not in of itself
-novel work; Crossbeam, a popular umbrella project for concurrency, has [crossbeam-epoch], which
-uses the popular EBR scheme for memory reclamation. There are also implementations of Hazard
-Pointers. CMR is ment to be a concept of an alternative to these schemes.
+## Usage
+
+CMR defined its own pointer types `NullabelPtr<T>` and `Ptr<T>`. All accesses
+to shared memory is done through a `Ptr<T>`. In order to obtain a
+`NullablePtr<T>` we must load an `Atomic<T>` into a `Guard<T>`: the `Guard`
+guarantees that the location of the loaded pointer is known to CMR. `Guard`s
+are not constructed explicitly, but declared using the `guard!` macro. Thus, a
+complete read from shared memory can look like this:
+
+```rust
+guard!(my_guard);
+let nullable_ptr = cmr::guard(&some_atomic, my_guard);
+if nullable_ptr.is_null() { return Err(...); }
+let ptr = nullable_ptr.unwrap();
+let value: &T = &*ptr;
+```
-Main issues with EBR, at least the way it is implemented in Crossbeam, is that the programmer has
-to explicitly retire memory, and that operations using shared memory often include a fair amount of
-`unsafe` code. In CMR, programmers do not have to retire (and currently, *cannot* retire) memory.
-In addition, users of CMR rarely have to write `unsafe` code (with occational exceptions with
-initialization); see `data-structures` for examples.
+The lifetime system of Rust guarantees that the reference `value` is only valid
+as long as `ptr`, which in turn guarantees that `my_guard` is not changed, so
+CMR knows that this is a reachable pointer. This makes all operations in this
+example safe - no `unsafe` usage is required.
-# How
-CMR acts as a tracing garbage collector. Whenever a thread wants to read a shared pointer, the
-thread registers the address in which that pointer is to be stored (which is usually on the stack).
-This way CMR has access to all roots in the program at any time. This also allows us to guarantee
-that pointers read are either `null` or valid. When it is time to reclaim memory, which all thread
-attempt to do every once in a while, we `fork()` off a new process, and find all memory allocations
-that are not reachable from any root in the program. This is possible to do safely, as all types
-stored in a shared pointer must implement the `Trace` trait, which writes out other pointers to a
-buffer.
-An an example, we look at `MsQueue::push`:
+### Example
+
+
+As an example, this is from the `push` operation of the Michael-Scott queue:
```rust
-pub fn push(&self, t: T) {
- guards!(_new_node, _tail, _next);
- let new_node = cmr::alloc(_new_node, Node::new(t));
- loop {
- let tail = cmr::guard(_tail, &self.tail).ptr().unwrap();
- let next_ptr = &tail.next;
- let ptr = cmr::guard(_next, next_ptr);
- if ptr::addr(ptr) != 0 {
- let _ = self.tail.cas(tail, ptr, SeqCst);
- } else if next_ptr.cas(ptr::null(), new_node, SeqCst).is_ok() {
- let _ = self.tail.cas(tail, new_node, SeqCst);
- break;
- }
- }
+pub struct Node<T> {
+ data: std::mem::ManuallyDrop<T>,
+ next: cmr::Atomic<Node<T>>,
+}
+
+pub struct MsQueue<T> {
+ head: cmr::SharedGuard<Node<T>>,
+ tail: cmr::Atomic<Node<T>>,
+}
+
+impl<T> MsQueue<T> {
+ pub fn push(&self, t: T) {
+ guards!(_new_node, _tail, _next);
+ let new_node = cmr::alloc(_new_node, Node::new(t));
+ loop {
+ let tail = cmr::guard(_tail, &self.tail).ptr().unwrap();
+ let next_ptr = &tail.next;
+ let ptr = cmr::guard(_next, next_ptr);
+ if ptr::addr(ptr) != 0 {
+ let _ = self.tail.cas(tail, ptr, SeqCst);
+ } else if next_ptr.cas(ptr::null(), new_node, SeqCst).is_ok() {
+ let _ = self.tail.cas(tail, new_node, SeqCst);
+ break;
+ }
+ }
+ }
}
```
-`guards!` declare that we need three memory locations in which to store shared pointers. We
-allocate a new node with `cmr::alloc`. When loading the tail, we know that it will not be `null`,
-since the MsQueue is non-empty, so we may convert the `NullablePtr` to a `Ptr`. If the `next` ptr
-is not `null` (its address is not `0`), we `cas` the `tail` of the queue to our new suspected tail.
-If not, we `cas` the `tail` to the newly allocated node. Note that despite reading shared memory,
-this function does not contain any `unsafe` code.
+Note that the `unwrap` call afte `cmr::guard` will never fail, since the
+MS-queue has non-emptyness as an invariant.
+
+
+## Safety
+
+Usage is guaranteed to be safe, due to the forced registering of the guards,
+similarily to that of hazard pointers. A difference between the two is that
+with hazard pointers we usually require a check after protecting a pointer to
+see whether the data node in still in the data structure we're operating on;
+this is not required in CMR, since memory cannot be freed in between reading
+the pointer and registering it: `cmr::guard` makes sure of this.
+
+
+## Complications
+
+There are a number of technical complications with this scheme: for instance we
+are required to `fork` before having the signaled threads return their
+execution, since we cannot work with stale data. However, `fork` is not
+"signal-safe", meaning it's not safe to call this within a signal handler. The
+reason for this is that if a thread was in the middle of eg. allocating memory
+when it was signaled, then internal locks have been taken: these will continue
+to be taken in the child process, which blocks all allocation in this process.
+This was solved by having threads register when they are allocating, and simply
+abort the reclamation attempt should the lock be taken when the threads are
+signaled. While working, this caused significant overhead.
+
+We also require the thread local vector of roots to be readable and `clear`able
+at any time, so we had to write our own vector, the `SignalVec`, which does
+exactly this.
+
+
+
+## Performance
+
+The big question is how this performs.
+
+
+
+## Project Structure
+
+CMR itself is located in the `src` directory. Example data structures are in
+`data-structures`, and benchmarks of these are in `benchmarks`. For comparison,
+similar benchmarks for Crossbeam and mutexes from the standard library are in
+`extern-benchmarks`.
+
+Prior to Rust 1.32, programs usually had `jemalloc` bundled, which caused some
+issues with `cmr`: since we are tracking the allocations, but not the types of
+the allocation, we pass in wrong `Layout` arguments to the
+`std::alloc::dealloc` function. `jemalloc` depended on this to be correct, and
+caused a few issues, so we have a hack for it in `jemalloc-free-hack`, ensuring
+that we call `jemallocator::ffi::free` instead of the variant taking the layout
+as a parameter.
-# Current Shortcommings
+## Should *I* use CMR?
-We do not run destructors when memory is freed. This turned out to be difficult, as we needed to
-decide at runtime whether the type dropped implenented `Drop` or not. This might be possible with
-some trick.
+While I am happy with how CMR turned out, I don't think I can properly avocate
+for using CMR in any setting but for experimenting with Rust. There are a few
+reasons for this: (1) due to it's coupling with the operating system, it is
+really only usable on Linux, (2) it's overhead is higher than that of epochs in
+Crossbeam, (3) the project is not actively maintained[^activity]
-Most operations that CMR exposes uses atomics; these always use the `SeqCst` ordering. This is
-mainly done for simplicity, although correctness of the system as a whole may suffer if the user is
-allowed to specify more relaxed orderings. Experimental benchmarks show that the performance of CMR
-does not suffer from this, at least not on x86.
-[crossbeam-epoch]: https://github.com/crossbeam-rs/crossbeam-epoch
-[thesis]: https://github.com/martinhath/master-thesis/blob/master/thesis.pdf
+[^fork]: Strictly speaking, this is not the case, since we use `fork()` to
+ branch out a new process in which we run the reachability. Due to CoW
+ semantics of memory pages as implemented by the operating system, the OS have
+ to clone all pages that are changed while the child process still runs.
+[^forkscan]: TODO: link here.
+[^master]: TODO: link here.
+[^activity]: I suppose this does not need to be a hard requiremeny for a
+ library to be used, but it also means that feature requests, like Windows
+ support, will probably not be implemented.