~akkartik/basic-layers

e3194bb87020fc753803e589ec59029e24d22565 — Kartik Agaram 11 months ago
initial commit
A  => 000organization.cc +149 -0
@@ 1,149 @@
//: You guessed right: the '000' prefix means you should start reading here.
//:
//: This project is set up to load all files with a numeric prefix. Just
//: create a new file and start hacking.
//:
//: The first few files (00*) are independent of what this program does, an
//: experimental skeleton that will hopefully make it both easier for others to
//: understand and more malleable, easier to rewrite and remould into radically
//: different shapes without breaking in subtle corner cases. The premise is
//: that understandability and rewrite-friendliness are related in a virtuous
//: cycle. Doing one well makes it easier to do the other.
//:
//: Lower down, this file contains a legal, bare-bones C++ program. It doesn't
//: do anything yet; subsequent files will contain :(...) directives to insert
//: lines into it. For example:
//:   :(after "more events")
//: This directive means: insert the following lines after a line in the
//: program containing the words "more events".
//:
//: A simple tool is included to 'tangle' all the files together in sequence
//: according to their directives into a single source file containing all the
//: code for the project, and then feed the source file to the compiler.
//: (It'll drop these comments starting with a '//:' prefix that only make
//: sense before tangling.)
//:
//: Directives free up the programmer to order code for others to read rather
//: than as forced by the computer or compiler. Each individual feature can be
//: organized in a self-contained 'layer' that adds code to many different data
//: structures and functions all over the program. The right decomposition into
//: layers will let each layer make sense in isolation.
//:
//:   "If I look at any small part of it, I can see what is going on -- I don't
//:   need to refer to other parts to understand what something is doing.
//:
//:   If I look at any large part in overview, I can see what is going on -- I
//:   don't need to know all the details to get it.
//:
//:   Every level of detail is as locally coherent and as well thought-out as
//:   any other level."
//:
//:       -- Richard Gabriel, "The Quality Without A Name"
//:          (http://dreamsongs.com/Files/PatternsOfSoftware.pdf, page 42)
//:
//: Directives are powerful; they permit inserting or modifying any point in
//: the program. Using them tastefully requires mapping out specific lines as
//: waypoints for future layers to hook into. Often such waypoints will be in
//: comments, capitalized to hint that other layers rely on their presence.
//:
//: A single waypoint might have many different code fragments hooking into
//: it from all over the codebase. Use 'before' directives to insert
//: code at a location in order, top to bottom, and 'after' directives to
//: insert code in reverse order. By convention, waypoints intended for
//: 'before' directives begin with 'End'. Notice below how the layers line up
//: above the "End Foo" waypoint.
//:
//:   File 001          File 002                File 003
//:   ============      ===================     ===================
//:   // Foo
//:   ------------
//:              <----  :(before "End Foo")
//:                     ....
//:                     ...
//:   ------------
//:              <----------------------------  :(before "End Foo")
//:                                             ....
//:                                             ...
//:   // End Foo
//:   ============
//:
//: Here's part of a layer in color: http://i.imgur.com/0eONnyX.png. Directives
//: are shaded dark.
//:
//: Layers do more than just shuffle code around. In a well-organized codebase
//: it should be possible to stop loading after any file/layer, build and run
//: the program, and pass all tests for loaded features. (Relevant is
//: http://youtube.com/watch?v=c8N72t7aScY, a scene from "2001: A Space
//: Odyssey".) Get into the habit of running the included script called
//: 'test_layers' before you commit any changes.
//:
//: This 'subsetting guarantee' ensures that this directory contains a
//: cleaned-up narrative of the evolution of this codebase. Organizing
//: autobiographically allows newcomers to rapidly orient themselves, reading
//: the first few files to understand a simple gestalt of a program's core
//: purpose and features, and later gradually working their way through other
//: features as the need arises.
//:
//: Programmers shouldn't need to understand everything about a program to
//: hack on it. But they shouldn't be prevented from a thorough understanding
//: of each aspect either. The goal of layers is to reward curiosity.

// Includes
// End Includes

// Types
// End Types

// Function prototypes are auto-generated in the 'build' script; define your
// functions in any order. Just be sure to declare each function header all on
// one line, ending with the '{'. Our auto-generation scripts are too minimal
// and simple-minded to handle anything else.
#include "function_list"  // by convention, files ending with '_list' are auto-generated

// Globals
//
// All statements in this section should always define a single variable on a
// single line. The 'build' script will simple-mindedly auto-generate extern
// declarations for them. Remember to define (not just declare) constants with
// extern linkage in this section, since C++ global constants have internal
// linkage by default.
//
// End Globals

int main(int argc, char* argv[]) {
  atexit(reset);

  // End One-time Setup

  // Commandline Parsing
  // End Commandline Parsing

  // End Main

  return 0;
}

// Unit Tests
// End Unit Tests

//: our first directive; insert the following headers at the start of the program
:(before "End Includes")
#include <stdlib.h>

//: Without directives or with the :(code) directive, lines get added at the
//: end.
//:
//: Regardless of where functions are defined, we can call them anywhere we
//: like as long as we format the function header in a specific way: put it
//: all on a single line without indent, end the line with ') {' and no
//: trailing whitespace. As long as functions uniformly start this way, our
//: 'build' script contains a little command to automatically generate
//: declarations for them.
:(code)
void reset() {
  // End Reset
}

:(before "End Includes")
#include<iostream>
using std::cerr;
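You can model the ordering rule for several fragments that hook into one waypoint in a few lines. This is a minimal sketch under simplifying assumptions, not the project's `tangle` tool. A program is a vector of lines, `Directive` and `apply_directives` are hypothetical names, and each fragment lands next to the first line that contains its waypoint text.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of a directive such as :(before "End Foo").
struct Directive {
  bool before;                     // true for 'before', false for 'after'
  std::string waypoint;            // text that identifies the target line
  std::vector<std::string> lines;  // fragment to insert
};

// Applies directives in load order, placing each fragment directly before or
// after the first line containing its waypoint. Returns false if a waypoint
// is missing.
bool apply_directives(std::vector<std::string>& program, const std::vector<Directive>& directives) {
  for (const Directive& d : directives) {
    std::vector<std::string>::iterator p = program.begin();
    while (p != program.end() && p->find(d.waypoint) == std::string::npos)
      ++p;
    if (p == program.end()) return false;
    if (!d.before) ++p;  // 'after' inserts just past the waypoint line
    program.insert(p, d.lines.begin(), d.lines.end());
  }
  return true;
}
```

Start from the lines `// Foo` and `// End Foo`. Apply two `before "End Foo"` fragments, then two `after "// Foo"` fragments. The result is `// Foo`, the second 'after' fragment, the first 'after' fragment, the two 'before' fragments in load order, and finally `// End Foo`. 'before' fragments stack up top to bottom, while each new 'after' fragment goes in front of the ones already placed.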

A  => 001test.cc +120 -0
@@ 1,120 @@
//: A simple test harness. To create new tests, define functions starting with
//: 'test_'. To run all tests so defined, run:
//:   $ ./a.out test
//:
//: Every layer should include tests, and can reach into previous layers.
//: However, it seems like a good idea never to reach into tests from previous
//: layers. Every test should be a contract that always passes as originally
//: written, regardless of any later layers. Avoid writing 'temporary' tests
//: that are only meant to work until some layer.

:(before "End Types")
typedef void (*test_fn)(void);
:(before "Globals")
// a global moved ahead of the Globals section, since we can't auto-generate
// an extern declaration for it
const test_fn Tests[] = {
  #include "test_list"  // auto-generated; see 'build*' scripts
};

:(before "End Globals")
bool Run_tests = false;
bool Passed = true;  // set this to false inside any test to indicate failure

:(before "End Includes")
#define CHECK(X) \
  if (Passed && !(X)) { \
    cerr << "\nF - " << __FUNCTION__ << "(" << __FILE__ << ":" << __LINE__ << "): " << #X << '\n'; \
    Passed = false; \
    return;  /* Currently we stop at the very first failure. */ \
  }

#define CHECK_EQ(X, Y) \
  if (Passed && (X) != (Y)) { \
    cerr << "\nF - " << __FUNCTION__ << "(" << __FILE__ << ":" << __LINE__ << "): " << #X << " == " << #Y << '\n'; \
    cerr << "  got " << (X) << '\n';  /* BEWARE: multiple eval */ \
    Passed = false; \
    return;  /* Currently we stop at the very first failure. */ \
  }

:(before "End Reset")
Passed = true;

:(before "End Commandline Parsing")
if (argc > 1 && is_equal(argv[1], "test")) {
  Run_tests = true;  --argc;  ++argv;  // shift 'test' out of commandline args
}

:(before "End Main")
if (Run_tests) {
  // Test Runs
  // we run some tests and then exit; assume no state need be maintained afterward

  long num_failures = 0;
  // End Test Run Initialization
  time_t t;  time(&t);
  for (size_t i=0;  i < sizeof(Tests)/sizeof(Tests[0]);  ++i) {
//?     cerr << "running " << Test_names[i] << '\n';
    run_test(i);
    if (Passed) cerr << '.';
    else ++num_failures;
  }
  cerr << '\n';
  // End Tests
  if (num_failures > 0) {
    cerr << num_failures << " failure"
         << (num_failures > 1 ? "s" : "")
         << '\n';
    return 1;
  }
  return 0;
}
cerr << "nothing to do\n";
return 1;

:(code)
void run_test(size_t i) {
  if (i >= sizeof(Tests)/sizeof(Tests[0])) {
    cerr << "no test " << i << '\n';
    return;
  }
  reset();
  // End Test Setup
  (*Tests[i])();
  // End Test Teardown
}

//: Convenience: run a single test
:(before "Globals")
// Names for each element of the 'Tests' global, respectively.
const string Test_names[] = {
  #include "test_name_list"  // auto-generated; see 'build*' scripts
};
:(after "Test Runs")
string maybe_single_test_to_run = argv[argc-1];
if (!starts_with(maybe_single_test_to_run, "test_"))
  maybe_single_test_to_run.insert(0, "test_");
for (size_t i=0;  i < sizeof(Tests)/sizeof(Tests[0]);  ++i) {
  if (Test_names[i] == maybe_single_test_to_run) {
    run_test(i);
    if (Passed) cerr << ".\n";
    return Passed ? 0 : 1;
  }
}

//:: Helpers
:(code)
bool is_equal(char* s, const char* lit) {
  return strcmp(s, lit) == 0;  // exact match, so "tests" doesn't count as "test"
}

bool starts_with(const string& s, const string& pat) {
  string::const_iterator a=s.begin(), b=pat.begin();
  for (/*nada*/;  a!=s.end() && b!=pat.end();  ++a, ++b)
    if (*a != *b) return false;
  return b == pat.end();
}

:(before "End Includes")
#include <string.h>  // C string functions used by is_equal
#include <string>
using std::string;
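The harness above reduces to three pieces: a table of function pointers, a global `Passed` flag that is reset before each test, and a `CHECK` macro that records a failure and returns from the test. The following is a standalone miniature. It uses hand-written tables in place of the auto-generated `test_list` and `test_name_list`, and every name in it is a hypothetical copy rather than the project's own definition.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>

// Standalone miniature of the harness (hypothetical names).
bool Passed = true;

// Wrapped in do/while(0) so the macro acts as one statement (a choice made
// for this sketch). 'return' still leaves the enclosing test function.
#define CHECK(X) \
  do { \
    if (Passed && !(X)) { \
      std::cerr << "\nF - " << __func__ << ": " << #X << '\n'; \
      Passed = false; \
      return; \
    } \
  } while (0)

typedef void (*test_fn)(void);
void test_passes() { CHECK(1 + 1 == 2); }
void test_fails() { CHECK(1 + 1 == 3); }

const test_fn Tests[] = { test_passes, test_fails };
const char* const Test_names[] = { "test_passes", "test_fails" };
const size_t Num_tests = sizeof(Tests) / sizeof(Tests[0]);

// Runs every test, resetting Passed first; returns the number of failures.
int run_all_tests() {
  int num_failures = 0;
  for (size_t i = 0; i < Num_tests; ++i) {
    Passed = true;  // stands in for reset()
    (*Tests[i])();
    if (Passed) std::cerr << '.';
    else ++num_failures;
  }
  std::cerr << '\n';
  return num_failures;
}

// Maps a commandline name to a test index, with or without the "test_"
// prefix. Returns -1 if there is no such test.
int find_test(std::string name) {
  if (name.compare(0, 5, "test_") != 0) name.insert(0, "test_");
  for (size_t i = 0; i < Num_tests; ++i)
    if (name == Test_names[i]) return static_cast<int>(i);
  return -1;
}
```

Here `run_all_tests()` reports one failure, and `find_test("fails")` resolves to the same index as `find_test("test_fails")`.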

A  => 002trace.cc +354 -0
@@ 1,354 @@
//: The goal of layers is to make programs easier to understand and more
//: malleable: easy to rewrite in radical ways without accidentally breaking
//: some corner case. Tests further both goals. They help understandability by
//: letting one make small changes and get feedback. What if I wrote this line
//: like so? What if I removed this function call, is it really necessary?
//: Just try it, see if the tests pass. Want to explore rewriting this bit in
//: this way? Tests put many refactorings on a firmer footing.
//:
//: But the usual way we write tests seems incomplete. Refactorings tend to
//: work in the small, but don't help with changes to function boundaries. If
//: you want to extract a new function you have to manually test-drive it to
//: create tests for it. If you want to inline a function its tests are no
//: longer valid. In both cases you end up having to reorganize code as well as
//: tests, an error-prone activity.
//:
//: In response, this layer introduces the notion of domain-driven *white-box*
//: testing. We focus on the domain of inputs the whole program needs to
//: handle rather than the correctness of individual functions. All white-box
//: tests invoke the program in a single way: by calling run() with some
//: input. As the program operates on the input, it traces out a list of
//: _facts_ deduced about the domain:
//:   trace(0, "label") << "fact 1: " << val << end();
//:
//: Tests can now check for these facts in the trace:
//:   CHECK_TRACE_CONTENTS("label: fact 1: 34\n"
//:                        "label: fact 2: 35\n");
//:
//: Since we never call anything but the run() function directly, we never have
//: to rewrite the tests when we reorganize the internals of the program. We
//: just have to make sure our rewrite deduces the same facts about the domain,
//: and that's something we're going to have to do anyway.
//:
//: To avoid the combinatorial explosion of integration tests, each layer
//: mainly logs facts to the trace with a common *label*. All tests in a layer
//: tend to check facts with this label. Validating the facts logged with a
//: specific label is like calling functions of that layer directly.
//:
//: To build robust tests, trace facts about your domain rather than details of
//: how you computed them.
//:
//: More details: http://akkartik.name/blog/tracing-tests
//:
//: ---
//:
//: Between layers and domain-driven testing, programming starts to look like a
//: fundamentally different activity. Instead of focusing on a) superficial,
//: b) local rules on c) code [like say http://blog.bbv.ch/2013/06/05/clean-code-cheat-sheet],
//: we allow programmers to engage with the a) deep, b) global structure of
//: the c) domain. If you can systematically track discontinuities in the
//: domain, you don't care if the code uses gotos as long as it passes all
//: tests. If tests are robust to reorganization, it becomes easier to try out
//: radically different implementations for the same program. If code is
//: super-easy to rewrite, it becomes less important what indentation style it
//: uses, or that the objects are appropriately encapsulated, or that the
//: functions are referentially transparent.
//:
//: Instead of plumbing, programming becomes building and gradually refining a
//: map of the environment the program must operate under. Whether a program
//: is 'correct' at a given point in time is a red herring; what matters is
//: avoiding regression by monotonically nailing down the more 'eventful'
//: parts of the terrain. It helps readers new and old, and rewards curiosity,
//: to organize large programs in self-similar hierarchies of example tests
//: colocated with the code that makes them work.
//:
//:   "Programming properly should be regarded as an activity by which
//:   programmers form a mental model, rather than as production of a program."
//:   -- Peter Naur (http://alistair.cockburn.us/ASD+book+extract%3A+%22Naur,+Ehn,+Musashi%22)

//:: Core interface

:(before "End Includes")
// Add to the trace (in production code) {

// Example usage:
//   trace(2, "abc") << "line 1" << end();
//
// This call emits this line to the trace:
//   "2 abc: line 1"
//
// The label is a namespace for assertions.
//
// The depth is an indicator of importance, helpful for hiding irrelevant
// details in tools. It is not used in tests.

#define trace(depth, layer)  !trace_stream \
    ? /*print nothing; all args attach to the other branch*/std::cerr \
    : trace_stream->stream(depth, layer)
// }

// Assertions on the trace (in tests) {

// Check that the trace contains some sequence of lines in order, though
// possibly with other lines mixed in.
// Lines are separated by newlines.
//
// Each line is of the form "label: contents".
//
// Example usage:
// To check for the example above:
//   CHECK_TRACE_CONTENTS("abc: line 1\n");
#define CHECK_TRACE_CONTENTS(...)  check_trace_contents(__FUNCTION__, __FILE__, __LINE__, __VA_ARGS__)

// Check that the trace doesn't contain a given line (of the form "label: contents").
#define CHECK_TRACE_DOESNT_CONTAIN(...)  CHECK(trace_doesnt_contain(__VA_ARGS__))

// Check that a trace contains a fixed number of lines with a given label.
#define CHECK_TRACE_COUNT(label, count) \
  if (Passed && trace_count(label, "") != (count)) { \
    cerr << "\nF - " << __FUNCTION__ << "(" << __FILE__ << ":" << __LINE__ << "): trace_count of " << label << " should be " << count << '\n'; \
    cerr << "  got " << trace_count(label, "") << '\n';  /* multiple eval */ \
    cerr << trace_stream->readable_contents(label); \
    Passed = false; \
    return;  /* Currently we stop at the very first failure. */ \
  }
// }

//:: Core data structures

:(before "End Types")
struct TraceLine {
  string contents;
  string label;
  int depth;  // 0 is 'sea level'; positive integers are progressively 'deeper' and lower level
  TraceLine(string c, string l, int d);
};

struct TraceStream {
  vector<TraceLine> past_lines;
  // accumulator for current trace_line
  ostringstream* curr_stream;
  string curr_label;
  int curr_depth;
  // other stuff

  TraceStream();
  // start accumulating to a new trace line
  ostream& stream(int depth, string label);
  // finalize the trace line most recently started
  void newline();
  // extract lines matching a given label
  // empty label matches all lines
  string readable_contents(string label);
};

:(before "End Globals")
TraceStream* trace_stream = NULL;
// Trace depths can go from 0 to MAX_DEPTH (both inclusive)
const int MAX_DEPTH = 9999;
std::ofstream trace_file;

:(code)
// start accumulating to a new trace line
ostream& TraceStream::stream(int depth, string label) {
  curr_stream = new ostringstream;
  curr_label = label;
  curr_depth = depth;
  return *curr_stream;
}

// finalize the trace line most recently started
void TraceStream::newline() {
  string trim(const string& s);
  if (!curr_stream) return;
  string curr_contents = curr_stream->str();
  if (!curr_contents.empty()) {
    past_lines.push_back(TraceLine(curr_contents, trim(curr_label), curr_depth));  // preserve indent in contents
    if (trace_file.is_open())
      trace_file << std::setw(4) << curr_depth << ' ' << curr_label << ": " << curr_contents << '\n';
  }

  // clean up
  delete curr_stream;
  curr_stream = NULL;
  curr_label.clear();
  curr_depth = MAX_DEPTH;
}

TraceLine::TraceLine(string c, string l, int d) {
  contents = c;
  label = l;
  depth = d;
}

TraceStream::TraceStream() {
  curr_stream = NULL;
  curr_depth = MAX_DEPTH;
}

//: Some syntax for finalizing trace lines
//:  trace(...) << ... << end();

:(before "End Types")
// Passing any object of this type to any ostream stops adding to the current
// trace line.
struct end {};
:(code)
ostream& operator<<(ostream& os, end /*unused*/) {
  if (trace_stream) trace_stream->newline();
  return os;
}

//: Making assertions on the trace

//: first clear trace_stream before every test
:(before "End Reset")
if (trace_stream) delete trace_stream;
trace_stream = new TraceStream;

:(code)
bool check_trace_contents(string FUNCTION, string FILE, int LINE, string expected) {
  if (!Passed) return false;
  if (!trace_stream) return false;
  vector<string> expected_lines = split(expected, "\n");
  size_t curr_expected_line = 0;
  while (curr_expected_line < expected_lines.size() && expected_lines.at(curr_expected_line).empty())
    ++curr_expected_line;
  if (curr_expected_line == expected_lines.size()) return true;
  string label, contents;
  split_label_contents(expected_lines.at(curr_expected_line), &label, &contents);
  for (vector<TraceLine>::iterator p = trace_stream->past_lines.begin();  p != trace_stream->past_lines.end();  ++p) {
    if (label != p->label) continue;
    if (contents != trim(p->contents)) continue;
    ++curr_expected_line;
    while (curr_expected_line < expected_lines.size() && expected_lines.at(curr_expected_line).empty())
      ++curr_expected_line;
    if (curr_expected_line == expected_lines.size()) return true;
    split_label_contents(expected_lines.at(curr_expected_line), &label, &contents);
  }

  if (line_exists_anywhere(label, contents)) {
    cerr << "\nF - " << FUNCTION << "(" << FILE << ":" << LINE << "): line [" << label << ": " << contents << "] out of order in trace:\n";
    cerr << trace_stream->readable_contents("");
  }
  else {
    cerr << "\nF - " << FUNCTION << "(" << FILE << ":" << LINE << "): missing [" << contents << "] in trace:\n";
    cerr << trace_stream->readable_contents(label);
  }
  Passed = false;
  return false;
}

string TraceStream::readable_contents(string label) {
  string trim(const string& s);  // prototype
  ostringstream output;
  label = trim(label);
  for (vector<TraceLine>::iterator p = past_lines.begin();  p != past_lines.end();  ++p)
    if (label.empty() || label == p->label)
      output << std::setw(4) << p->depth << ' ' << p->label << ": " << p->contents << '\n';
  return output.str();
}

bool trace_doesnt_contain(string expected) {
  vector<string> tmp = split_first(expected, ": ");
  if (tmp.size() == 1) {
    cerr << expected << ": missing label or contents in trace line\n";
    exit(1);
  }
  return trace_count(tmp.at(0), tmp.at(1)) == 0;
}

int trace_count(string label, string line) {
  if (!trace_stream) return 0;
  long result = 0;
  for (vector<TraceLine>::iterator p = trace_stream->past_lines.begin();  p != trace_stream->past_lines.end();  ++p) {
    if (label == p->label) {
      if (line == "" || trim(line) == trim(p->contents))
        ++result;
    }
  }
  return result;
}

int trace_count_prefix(string label, string prefix) {
  if (!trace_stream) return 0;
  long result = 0;
  for (vector<TraceLine>::iterator p = trace_stream->past_lines.begin();  p != trace_stream->past_lines.end();  ++p) {
    if (label == p->label) {
      if (starts_with(trim(p->contents), trim(prefix)))
        ++result;
    }
  }
  return result;
}

void split_label_contents(const string& s, string* label, string* contents) {
  static const string delim(": ");
  size_t pos = s.find(delim);
  if (pos == string::npos) {
    *label = "";
    *contents = trim(s);
  }
  else {
    *label = trim(s.substr(0, pos));
    *contents = trim(s.substr(pos+delim.size()));
  }
}

bool line_exists_anywhere(const string& label, const string& contents) {
  for (vector<TraceLine>::iterator p = trace_stream->past_lines.begin();  p != trace_stream->past_lines.end();  ++p) {
    if (label != p->label) continue;
    if (contents == trim(p->contents)) return true;
  }
  return false;
}

// helpers

// strip whitespace at start and end of a string
string trim(const string& s) {
  string::const_iterator first = s.begin();
  while (first != s.end() && isspace(*first))
    ++first;
  if (first == s.end()) return "";
  string::const_iterator last = --s.end();
  while (last != s.begin() && isspace(*last))
    --last;
  ++last;
  return string(first, last);
}

vector<string> split(string s, string delim) {
  vector<string> result;
  size_t begin=0, end=s.find(delim);
  while (true) {
    if (end == string::npos) {
      result.push_back(string(s, begin, string::npos));
      break;
    }
    result.push_back(string(s, begin, end-begin));
    begin = end+delim.size();
    end = s.find(delim, begin);
  }
  return result;
}

vector<string> split_first(string s, string delim) {
  vector<string> result;
  size_t end=s.find(delim);
  result.push_back(string(s, 0, end));
  if (end != string::npos)
    result.push_back(string(s, end+delim.size(), string::npos));
  return result;
}

:(before "End Includes")
#include <vector>
using std::vector;
#include <iostream>
using std::ostream;
#include <fstream>
using std::ofstream;
#include <sstream>
using std::ostringstream;
#include <iomanip>
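The `trace()` macro stays cheap when tracing is off because `<<` binds more tightly than `?:`. Every `<<` operand therefore attaches to the second branch and is never evaluated when `trace_stream` is NULL. Below is a minimal standalone sketch of the same trick. `MiniTrace`, `mini_trace`, `mini_trace_line`, and `expensive` are hypothetical names, and lines are appended to one string instead of being stored as `TraceLine`s.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for TraceStream: appends every line to one buffer.
struct MiniTrace {
  std::ostringstream out;
};
MiniTrace* mini_trace = NULL;  // NULL means tracing is disabled

// Same shape as trace(): when tracing is off, the expression is just std::cerr
// and any '<<' operands that follow are never evaluated.
#define mini_trace_line(label) !mini_trace \
    ? std::cerr \
    : mini_trace->out << (label) << ": "

int evaluations = 0;
// Hypothetical argument with a side effect, to observe evaluation.
int expensive() { ++evaluations; return 42; }
```

With `mini_trace` NULL, `mini_trace_line("app") << expensive() << '\n';` prints nothing and never calls `expensive()`. After pointing `mini_trace` at a `MiniTrace`, the same statement calls it once and the buffer holds `app: 42\n`.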

A  => 002trace_test.cc +125 -0
@@ 1,125 @@
void test_trace_check_compares() {
  trace(10, "test layer") << "foo" << end();
  CHECK_TRACE_CONTENTS("test layer: foo");
}

void test_trace_check_ignores_other_layers() {
  trace(10, "test layer 1") << "foo" << end();
  trace(10, "test layer 2") << "bar" << end();
  CHECK_TRACE_CONTENTS("test layer 1: foo");
  CHECK_TRACE_DOESNT_CONTAIN("test layer 2: foo");
}

void test_trace_check_ignores_leading_whitespace() {
  trace(10, "test layer 1") << " foo" << end();
  CHECK_EQ(trace_count("test layer 1", /*too little whitespace*/"foo"), 1);
  CHECK_EQ(trace_count("test layer 1", /*too much whitespace*/"  foo"), 1);
}

void test_trace_check_ignores_other_lines() {
  trace(10, "test layer 1") << "foo" << end();
  trace(10, "test layer 1") << "bar" << end();
  CHECK_TRACE_CONTENTS("test layer 1: foo");
}

void test_trace_check_ignores_other_lines2() {
  trace(10, "test layer 1") << "foo" << end();
  trace(10, "test layer 1") << "bar" << end();
  CHECK_TRACE_CONTENTS("test layer 1: bar");
}

void test_trace_ignores_trailing_whitespace() {
  trace(10, "test layer 1") << "foo\n" << end();
  CHECK_TRACE_CONTENTS("test layer 1: foo");
}

void test_trace_ignores_trailing_whitespace2() {
  trace(10, "test layer 1") << "foo " << end();
  CHECK_TRACE_CONTENTS("test layer 1: foo");
}

void test_trace_orders_across_layers() {
  trace(10, "test layer 1") << "foo" << end();
  trace(10, "test layer 2") << "bar" << end();
  trace(10, "test layer 1") << "qux" << end();
  CHECK_TRACE_CONTENTS("test layer 1: foo\n"
                       "test layer 2: bar\n"
                       "test layer 1: qux\n");
}

void test_trace_supports_count() {
  trace(10, "test layer 1") << "foo" << end();
  trace(10, "test layer 1") << "foo" << end();
  CHECK_EQ(trace_count("test layer 1", "foo"), 2);
}

void test_trace_supports_count2() {
  trace(10, "test layer 1") << "foo" << end();
  trace(10, "test layer 1") << "bar" << end();
  CHECK_EQ(trace_count("test layer 1", ""), 2);
}

void test_trace_count_ignores_trailing_whitespace() {
  trace(10, "test layer 1") << "foo\n" << end();
  CHECK_EQ(trace_count("test layer 1", "foo"), 1);
}

// pending: readable_contents() adds newline if necessary.
// pending: raise also prints to stderr.
// pending: raise doesn't print to stderr if Hide_errors is set.
// pending: raise doesn't have to be saved if Hide_errors is set, just printed.
// pending: raise prints to stderr if Trace_stream is NULL.
// pending: raise prints to stderr if Trace_stream is NULL even if Hide_errors is set.

// can't check trace because trace methods call 'split'

void test_split_returns_at_least_one_elem() {
  vector<string> result = split("", ",");
  CHECK_EQ(result.size(), 1);
  CHECK_EQ(result.at(0), "");
}

void test_split_returns_entire_input_when_no_delim() {
  vector<string> result = split("abc", ",");
  CHECK_EQ(result.size(), 1);
  CHECK_EQ(result.at(0), "abc");
}

void test_split_works() {
  vector<string> result = split("abc,def", ",");
  CHECK_EQ(result.size(), 2);
  CHECK_EQ(result.at(0), "abc");
  CHECK_EQ(result.at(1), "def");
}

void test_split_works2() {
  vector<string> result = split("abc,def,ghi", ",");
  CHECK_EQ(result.size(), 3);
  CHECK_EQ(result.at(0), "abc");
  CHECK_EQ(result.at(1), "def");
  CHECK_EQ(result.at(2), "ghi");
}

void test_split_handles_multichar_delim() {
  vector<string> result = split("abc,,def,,ghi", ",,");
  CHECK_EQ(result.size(), 3);
  CHECK_EQ(result.at(0), "abc");
  CHECK_EQ(result.at(1), "def");
  CHECK_EQ(result.at(2), "ghi");
}

void test_trim() {
  CHECK_EQ(trim(""), "");
  CHECK_EQ(trim(" "), "");
  CHECK_EQ(trim("  "), "");
  CHECK_EQ(trim("a"), "a");
  CHECK_EQ(trim(" a"), "a");
  CHECK_EQ(trim("  a"), "a");
  CHECK_EQ(trim("  ab"), "ab");
  CHECK_EQ(trim("a "), "a");
  CHECK_EQ(trim("a  "), "a");
  CHECK_EQ(trim("ab  "), "ab");
  CHECK_EQ(trim(" a "), "a");
  CHECK_EQ(trim("  a  "), "a");
  CHECK_EQ(trim("  ab  "), "ab");
}
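`test_trace_orders_across_layers` checks an ordering rule: the expected lines must appear in order, and other lines may be interleaved between them. That is a subsequence match. The sketch below covers just that rule, assuming each line is already split into a label and trimmed contents. `Line` and `contains_in_order` are hypothetical names, not part of the project.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One trace line as (label, contents). Assumes contents are already trimmed;
// the real check_trace_contents() also trims whitespace before comparing.
typedef std::pair<std::string, std::string> Line;

// True if all of 'expected' occurs in 'trace' in the same relative order,
// with unrelated lines allowed in between (a subsequence match).
bool contains_in_order(const std::vector<Line>& trace, const std::vector<Line>& expected) {
  size_t next = 0;  // index of the next expected line still to be found
  for (size_t i = 0; i < trace.size() && next < expected.size(); ++i)
    if (trace[i] == expected[next]) ++next;
  return next == expected.size();
}
```

Greedy matching is enough here. Consuming the earliest occurrence of each expected line never rules out a later match.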

A  => 003main.cc +24 -0
@@ 1,24 @@
// First implementation of the example program at https://git.sr.ht/~akkartik/basic-whitebox-test/tree/master/x.cc

int run(int x) {
  trace(0, "app") << "transforming " << x << end();
  int g(int);  // prototype
  int y = g(x);
  int z = 2*y;
  trace(0, "app") << x << " transformed to " << z << end();
  return z;
}

int g(int x) {
  int y = x+1;
  trace(1, "app") << x << " + 1 is " << y << end();
  return y;
}

void test_1() {
  run(3);
  CHECK_TRACE_CONTENTS(
      "app: transforming 3\n"
      "app: 3 + 1 is 4\n"
      "app: 3 transformed to 8\n");
}

A  => 004reorganize.cc +18 -0
@@ 1,18 @@
// Second possible implementation of the example program at https://git.sr.ht/~akkartik/basic-whitebox-test/tree/master/x.cc
//
// This is just a demonstration of the mechanisms provided, not an example of
// good taste. In real projects refactorings should just modify the layer
// involved, rather than take up a new layer.

:(replace{} "int run(int x)")
int run(int x) {
  trace(0, "app") << "transforming " << x << end();
  int y = x+1;
  trace(1, "app") << x << " + 1 is " << y << end();
  int z = 2*y;
  trace(0, "app") << x << " transformed to " << z << end();
  return z;
}

//: No new tests, but the test defined in the previous layer continues to run
//: and pass.
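This layer's point is that the test from the previous layer still passes after `g()` is inlined. Below is a minimal standalone sketch of that idea. It assumes a simplified trace that is just a list of `"label: fact"` strings, and `Trace`, `trace_fact`, `run_v1`, `run_v2`, and `test_run` are hypothetical names. Both implementations record the same facts, so one domain-level test covers both.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> Trace;  // simplified trace: "label: fact" strings

void trace_fact(const std::string& label, const std::string& fact) {
  Trace.push_back(label + ": " + fact);
}

int g(int x) {
  int y = x + 1;
  trace_fact("app", std::to_string(x) + " + 1 is " + std::to_string(y));
  return y;
}

// First implementation: delegates to a helper, like 003main.cc.
int run_v1(int x) {
  trace_fact("app", "transforming " + std::to_string(x));
  int z = 2 * g(x);
  trace_fact("app", std::to_string(x) + " transformed to " + std::to_string(z));
  return z;
}

// Second implementation: helper inlined, like 004reorganize.cc.
int run_v2(int x) {
  trace_fact("app", "transforming " + std::to_string(x));
  int y = x + 1;
  trace_fact("app", std::to_string(x) + " + 1 is " + std::to_string(y));
  int z = 2 * y;
  trace_fact("app", std::to_string(x) + " transformed to " + std::to_string(z));
  return z;
}

// The same domain-level test applies to either implementation.
bool test_run(int (*run)(int)) {
  Trace.clear();
  run(3);
  std::vector<std::string> expected = {
    "app: transforming 3",
    "app: 3 + 1 is 4",
    "app: 3 transformed to 8",
  };
  return Trace == expected;
}
```

The sketch compares traces for exact equality, which is stricter than `CHECK_TRACE_CONTENTS`. The real check only requires the expected lines to appear in order.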

A  => Readme.md +54 -0
@@ 1,54 @@
Example project that allows newcomers to gradually learn about its internals.
Newcomers can build subsets of its features, focusing on its core skeleton at
first and adding features at their own pace.

#### Try it out

First build the project:

```
$ ./build
```

Try running the tests:

```
$ ./a.out test
```

Now try building and running tests for subsets of layers:

```
$ ./build_and_test_until 000*
$ ./build_and_test_until 001*
$ ./build_and_test_until 002*
$ ./build_and_test_until 003*
$ ./build_and_test_until 004*
```

Each command builds all .cc files that start with a numeric prefix and are
lexically less than or equal to its argument. For example, building until
`001*` includes two layers: `000organization.cc` and `001test.cc`.
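The selection rule above can be sketched as a pure function over file names. This assumes the candidate names are already known. The real `enumerate/enumerate` tool scans the directory with `scandir` and may differ in details, so `layers_until` and `ends_with` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

bool ends_with(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size()
      && s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical: picks the .cc files with a numeric prefix that sort at or
// before 'last', in lexical order.
std::vector<std::string> layers_until(std::vector<std::string> files, const std::string& last) {
  std::sort(files.begin(), files.end());
  std::vector<std::string> result;
  for (const std::string& f : files) {
    if (f.empty() || !std::isdigit(static_cast<unsigned char>(f[0]))) continue;
    if (!ends_with(f, ".cc")) continue;
    if (f > last) break;  // sorted, so every later name is also past 'last'
    result.push_back(f);
  }
  return result;
}
```

Given this repo's file names and the argument `001test.cc` (what the shell expands `001*` to), the sketch selects `000organization.cc` and `001test.cc`.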

In a well-formed codebase based on layers, all such subsets of layers should
pass all their tests.

Later layers override earlier ones using _directives_ of the form `:(...)`.
See http://akkartik.name/post/wart-layers for more details, and tangle/Readme.md
in this repo for a reference of supported directives.

#### Coda

This repo pulls together several unconventional ideas:

1. A zero-dependency build system. See https://git.sr.ht/~akkartik/basic-build
for details.

1. A minimal test harness for C. See https://git.sr.ht/~akkartik/basic-test
for details.

1. Primitives for automatic _white-box_ testing, a more comprehensive and
flexible way to express automated tests. See https://git.sr.ht/~akkartik/basic-whitebox-test
for details.

For a more fully fleshed out codebase using all these ideas, check out https://github.com/akkartik/mu

A  => build +109 -0
@@ 1,109 @@
#!/bin/sh
# returns 0 on successful build or nothing to build
# non-zero exit status only on error during building
set -e  # stop immediately on error

# [0-9]*.cc -> main.cc -> a.out
# (layers)   |          |
#          tangle      $CXX

# can also be called with a layer to only build until
#   $ ./build --until 050
UNTIL_LAYER=${2:-zzz}

# we use two mechanisms to speed up rebuilds:
# - older_than: run a command if the output is older than any of the inputs
# - update: if a command is quick to run, always run it but update the result only on any change
#
# avoid combining both mechanisms to generate a single file
# otherwise you'll see spurious messages about files being updated
# risk: a file may unnecessarily update without changes, causing unnecessary work downstream

test "$CXX" || export CXX=c++
test "$CC" || export CC=cc
test "$CFLAGS" || export CFLAGS="-g -O3"
export CFLAGS="$CFLAGS -Wall -Wextra -ftrapv -fno-strict-aliasing"

# return 0 (success) if $1 doesn't exist or is older than _any_ of the remaining args
older_than() {
  local target=$1
  shift
  if [ ! -e $target ]
  then
#?     echo "$target doesn't exist"
    echo "updating $target" >&2
    return 0  # success
  fi
  local f
  for f in $*
  do
    if [ $f -nt $target ]
    then
      echo "updating $target" >&2
      return 0  # success
    fi
  done
  return 1  # failure
}

# redirect to $1, unless it's already identical
update() {
  if [ ! -e $1 ]
  then
    cat > $1
  else
    cat > $1.tmp
    diff -q $1 $1.tmp >/dev/null  &&  rm $1.tmp  ||  mv $1.tmp $1
  fi
}

update_cp() {
  if [ ! -e $2/$1 ]
  then
    cp $1 $2
  elif [ $1 -nt $2/$1 ]
  then
    cp $1 $2
  fi
}

noisy_cd() {
  cd $1
  echo "-- `pwd`" >&2
}

older_than enumerate/enumerate enumerate/enumerate.cc && {
  $CXX $CFLAGS enumerate/enumerate.cc -o enumerate/enumerate
}

older_than tangle/tangle tangle/*.cc && {
  noisy_cd tangle
    {
      grep -h "^struct .* {" [0-9]*.cc  |sed 's/\(struct *[^ ]*\).*/\1;/'
      grep -h "^typedef " [0-9]*.cc
    }  |update type_list
    grep -h "^[^ #].*) {" [0-9]*.cc  |sed 's/ {.*/;/'  |update function_list
    ls [0-9]*.cc  |grep -v "\.test\.cc$"  |sed 's/.*/#include "&"/'  |update file_list
    ls [0-9]*.test.cc  |sed 's/.*/#include "&"/'  |update test_file_list
    grep -h "^[[:space:]]*void test_" [0-9]*.cc  |sed 's/^\s*void \(.*\)() {$/\1,/'  |update test_list
    grep -h "^\s*void test_" [0-9]*.cc  |sed 's/^\s*void \(.*\)() {.*/"\1",/'  |update test_name_list
    $CXX $CFLAGS boot.cc -o tangle
    ./tangle test
  noisy_cd ..  # return to the parent directory; the braces above don't start a subshell
}

LAYERS=$(enumerate/enumerate --until "$UNTIL_LAYER"  |grep '\.cc$')
older_than main.cc $LAYERS enumerate/enumerate tangle/tangle && {
  # no update here; rely on 'update' calls downstream
  tangle/tangle $LAYERS  > main.cc
}

grep -h "^[^[:space:]#].*) {$" main.cc  |grep -v ":.*("  |sed 's/ {.*/;/'  |update function_list
grep -h "^\s*void test_" main.cc  |sed 's/^\s*void \(.*\)() {.*/\1,/'  |update test_list
grep -h "^\s*void test_" main.cc  |sed 's/^\s*void \(.*\)() {.*/"\1",/'  |update test_name_list

older_than a.out main.cc *_list && {
  $CXX $CFLAGS main.cc
}

exit 0

A  => build_and_test_until +18 -0
@@ 1,18 @@
#!/bin/sh
# Run tests for just a subset of layers.
#
# Usage:
#   build_and_test_until [file prefix] [test name]
# Provide the second arg to run just a single test.
set -e

# clean previous builds if they were building until a different layer
touch .until
PREV_UNTIL=`cat .until`
if [ "$PREV_UNTIL" != "$1" ]
then
  ./clean top-level
  echo $1 > .until
fi

./build --until $1  &&  ./a.out test $2

A  => clean +5 -0
@@ 1,5 @@
#!/bin/sh

rm -rf a.out* main.cc function_list test_function_list test_list test_name_list .until
rm -rf enumerate/enumerate enumerate/enumerate.dSYM
rm -rf tangle/tangle tangle/tangle.dSYM tangle/*_list

A  => enumerate/Readme +1 -0
@@ 1,1 @@
Tool used in build process.

A  => enumerate/enumerate.cc +26 -0
@@ 1,26 @@
#include<assert.h>
#include<cstdlib>
#include<cctype>  // isdigit
#include<dirent.h>
#include<vector>
using std::vector;
#include<string>
using std::string;
#include<iostream>
using std::cout;

int main(int argc, const char* argv[]) {
  assert(argc == 3);
  assert(string(argv[1]) == "--until");
  string last_file(argv[2]);

  dirent** files;
  int num_files = scandir(".", &files, NULL, alphasort);
  for (int i = 0; i < num_files; ++i) {
    string curr_file = files[i]->d_name;
    if (!isdigit(curr_file.at(0))) continue;
    if (!last_file.empty() && curr_file > last_file) break;
    cout << curr_file << '\n';
  }
  // don't bother freeing files
  return 0;
}

A  => tangle/000test.cc +31 -0
@@ 1,31 @@
typedef void (*test_fn)(void);

const test_fn Tests[] = {
  #include "test_list"  // auto-generated; see 'build*' scripts
};

// Names for each element of the 'Tests' global, respectively.
const string Test_names[] = {
  #include "test_name_list"  // auto-generated; see 'build*' scripts
};

bool Passed = true;

long Num_failures = 0;

#define CHECK(X) \
  if (!(X)) { \
    ++Num_failures; \
    cerr << "\nF " << __FUNCTION__ << "(" << __FILE__ << ":" << __LINE__ << "): " << #X << '\n'; \
    Passed = false; \
    return; \
  }

#define CHECK_EQ(X, Y) \
  if ((X) != (Y)) { \
    ++Num_failures; \
    cerr << "\nF " << __FUNCTION__ << "(" << __FILE__ << ":" << __LINE__ << "): " << #X << " == " << #Y << '\n'; \
    cerr << "  got " << (X) << '\n';  /* BEWARE: multiple eval */ \
    Passed = false; \
    return; \
  }

A  => tangle/001trace.cc +139 -0
@@ 1,139 @@
bool Hide_warnings = false;

struct trace_stream {
  vector<pair<string, string> > past_lines;  // [(layer label, line)]
  // accumulator for current line
  ostringstream* curr_stream;
  string curr_layer;
  trace_stream() :curr_stream(NULL) {}
  ~trace_stream() { if (curr_stream) delete curr_stream; }

  ostringstream& stream(string layer) {
    newline();
    curr_stream = new ostringstream;
    curr_layer = layer;
    return *curr_stream;
  }

  // be sure to call this before messing with curr_stream or curr_layer
  void newline() {
    if (!curr_stream) return;
    string curr_contents = curr_stream->str();
    curr_contents.erase(curr_contents.find_last_not_of("\r\n")+1);
    past_lines.push_back(pair<string, string>(curr_layer, curr_contents));
    delete curr_stream;
    curr_stream = NULL;
  }

  string readable_contents(string layer) {  // missing layer = everything
    newline();
    ostringstream output;
    for (vector<pair<string, string> >::iterator p = past_lines.begin(); p != past_lines.end(); ++p)
      if (layer.empty() || layer == p->first)
        output << p->first << ": " << with_newline(p->second);
    return output.str();
  }

  string with_newline(string s) {
    if (s.empty() || s[s.size()-1] != '\n') return s+'\n';
    return s;
  }
};

trace_stream* Trace_stream = NULL;

// Top-level helper. IMPORTANT: can't nest.
#define trace(layer)  !Trace_stream ? cerr /*no trace active: print to cerr*/ : Trace_stream->stream(layer)
// Warnings should go straight to cerr by default since calls to trace() have
// some unfriendly constraints (they delay printing, they can't nest)
#define raise  ((!Trace_stream || !Hide_warnings) ? cerr /*do print*/ : Trace_stream->stream("warn")) << __FILE__ << ":" << __LINE__ << " "

// raise << die exits after printing -- unless Hide_warnings is set.
struct die {};
ostream& operator<<(ostream& os, __attribute__((unused)) die) {
  if (Hide_warnings) return os;
  os << "dying\n";
  exit(1);
}

#define CLEAR_TRACE  delete Trace_stream, Trace_stream = new trace_stream;

#define DUMP(layer)  cerr << Trace_stream->readable_contents(layer)

// Trace_stream is a resource; lease_tracer uses RAII to manage it.
struct lease_tracer {
  lease_tracer() { Trace_stream = new trace_stream; }
  ~lease_tracer() { delete Trace_stream, Trace_stream = NULL; }
};

#define START_TRACING_UNTIL_END_OF_SCOPE  lease_tracer leased_tracer;

bool check_trace_contents(string FUNCTION, string FILE, int LINE, string layer, string expected) {  // empty layer == everything
  vector<string> expected_lines = split(expected, "\n");
  size_t curr_expected_line = 0;
  while (curr_expected_line < expected_lines.size() && expected_lines[curr_expected_line].empty())
    ++curr_expected_line;
  if (curr_expected_line == expected_lines.size()) return true;
  Trace_stream->newline();
  ostringstream output;
  for (vector<pair<string, string> >::iterator p = Trace_stream->past_lines.begin(); p != Trace_stream->past_lines.end(); ++p) {
    if (!layer.empty() && layer != p->first)
      continue;
    if (p->second != expected_lines[curr_expected_line])
      continue;
    ++curr_expected_line;
    while (curr_expected_line < expected_lines.size() && expected_lines[curr_expected_line].empty())
      ++curr_expected_line;
    if (curr_expected_line == expected_lines.size()) return true;
  }

  ++Num_failures;
  cerr << "\nF " << FUNCTION << "(" << FILE << ":" << LINE << "): missing [" << expected_lines[curr_expected_line] << "] in trace:\n";
  DUMP(layer);
  Passed = false;
  return false;
}

#define CHECK_TRACE_CONTENTS(...)  check_trace_contents(__FUNCTION__, __FILE__, __LINE__, __VA_ARGS__)

int trace_count(string layer, string line) {
  Trace_stream->newline();
  long result = 0;
  for (vector<pair<string, string> >::iterator p = Trace_stream->past_lines.begin(); p != Trace_stream->past_lines.end(); ++p) {
    if (layer == p->first)
      if (line == "" || p->second == line)
        ++result;
  }
  return result;
}

#define CHECK_TRACE_WARNS()  CHECK(trace_count("warn", "") > 0)
#define CHECK_TRACE_DOESNT_WARN() \
  if (trace_count("warn", "") > 0) { \
    ++Num_failures; \
    cerr << "\nF " << __FUNCTION__ << "(" << __FILE__ << ":" << __LINE__ << "): unexpected warnings\n"; \
    DUMP("warn"); \
    Passed = false; \
    return; \
  }

bool trace_doesnt_contain(string layer, string line) {
  return trace_count(layer, line) == 0;
}

#define CHECK_TRACE_DOESNT_CONTAIN(...)  CHECK(trace_doesnt_contain(__VA_ARGS__))

vector<string> split(string s, string delim) {
  vector<string> result;
  string::size_type begin=0, end=s.find(delim);
  while (true) {
    if (end == string::npos) {
      result.push_back(string(s, begin, string::npos));
      break;
    }
    result.push_back(string(s, begin, end-begin));
    begin = end+delim.size();
    end = s.find(delim, begin);
  }
  return result;
}

A  => tangle/001trace.test.cc +91 -0
@@ 1,91 @@
void test_trace_check_compares() {
  CHECK_TRACE_CONTENTS("test layer", "");
  trace("test layer") << "foo";
  CHECK_TRACE_CONTENTS("test layer", "foo");
}

void test_trace_check_filters_layers() {
  trace("test layer 1") << "foo";
  trace("test layer 2") << "bar";
  CHECK_TRACE_CONTENTS("test layer 1", "foo");
}

void test_trace_check_ignores_other_lines() {
  trace("test layer 1") << "foo";
  trace("test layer 1") << "bar";
  CHECK_TRACE_CONTENTS("test layer 1", "foo");
}

void test_trace_check_always_finds_empty_lines() {
  CHECK_TRACE_CONTENTS("test layer 1", "");
}

void test_trace_check_treats_empty_layers_as_wildcards() {
  trace("test layer 1") << "foo";
  CHECK_TRACE_CONTENTS("", "foo");
}

void test_trace_check_multiple_lines_at_once() {
  trace("test layer 1") << "foo";
  trace("test layer 2") << "bar";
  CHECK_TRACE_CONTENTS("", "foo\n"
                           "bar\n");
}

void test_trace_check_always_finds_empty_lines2() {
  CHECK_TRACE_CONTENTS("test layer 1", "\n\n\n");
}

void test_trace_orders_across_layers() {
  trace("test layer 1") << "foo";
  trace("test layer 2") << "bar";
  trace("test layer 1") << "qux";
  CHECK_TRACE_CONTENTS("", "foo\n"
                           "bar\n"
                           "qux\n");
}

void test_trace_supports_count() {
  trace("test layer 1") << "foo";
  trace("test layer 1") << "foo";
  CHECK_EQ(trace_count("test layer 1", "foo"), 2);
}

//// helpers

// can't check trace because trace methods call 'split'

void test_split_returns_at_least_one_elem() {
  vector<string> result = split("", ",");
  CHECK_EQ(result.size(), 1);
  CHECK_EQ(result[0], "");
}

void test_split_returns_entire_input_when_no_delim() {
  vector<string> result = split("abc", ",");
  CHECK_EQ(result.size(), 1);
  CHECK_EQ(result[0], "abc");
}

void test_split_works() {
  vector<string> result = split("abc,def", ",");
  CHECK_EQ(result.size(), 2);
  CHECK_EQ(result[0], "abc");
  CHECK_EQ(result[1], "def");
}

void test_split_works2() {
  vector<string> result = split("abc,def,ghi", ",");
  CHECK_EQ(result.size(), 3);
  CHECK_EQ(result[0], "abc");
  CHECK_EQ(result[1], "def");
  CHECK_EQ(result[2], "ghi");
}

void test_split_handles_multichar_delim() {
  vector<string> result = split("abc,,def,,ghi", ",,");
  CHECK_EQ(result.size(), 3);
  CHECK_EQ(result[0], "abc");
  CHECK_EQ(result[1], "def");
  CHECK_EQ(result[2], "ghi");
}

A  => tangle/002main.cc +51 -0
@@ 1,51 @@
int main(int argc, const char* argv[]) {
  if (flag("test", argc, argv))
    return run_tests();
  return tangle(argc, argv);
}

bool flag(const string& flag, int argc, const char* argv[]) {
  for (int i = 1; i < argc; ++i)
    if (string(argv[i]) == flag)
      return true;
  return false;
}

string flag_value(const string& flag, int argc, const char* argv[]) {
  for (int i = 1; i < argc-1; ++i)
    if (string(argv[i]) == flag)
      return argv[i+1];
  return "";
}

//// test harness

int run_tests() {
  for (unsigned long i=0; i < sizeof(Tests)/sizeof(Tests[0]); ++i) {
//?     cerr << "running " << Test_names[i] << '\n';
    START_TRACING_UNTIL_END_OF_SCOPE;
    setup();
    (*Tests[i])();
    verify();
  }

  cerr << '\n';
  if (Num_failures > 0)
    cerr << Num_failures << " failure"
         << (Num_failures > 1 ? "s" : "")
         << '\n';
  return Num_failures;
}

void verify() {
  Hide_warnings = false;
  if (Passed)
    cerr << ".";
}

void setup() {
  Hide_warnings = false;
  Passed = true;
}

A  => tangle/003tangle.cc +338 -0
@@ 1,338 @@
// Reorder a file based on directives starting with ':(' (tangle directives).
// Insert #line directives to preserve line numbers in the original.
// Clear lines starting with '//:' (tangle comments).

//// Preliminaries regarding line number management

struct Line {
  string filename;
  size_t line_number;
  string contents;
  Line() :line_number(0) {}
  Line(const string& text) :line_number(0) {
    contents = text;
  }
  Line(const string& text, const string& f, const size_t& l) {
    contents = text;
    filename = f;
    line_number = l;
  }
  Line(const string& text, const Line& origin) {
    contents = text;
    filename = origin.filename;
    line_number = origin.line_number;
  }
};

// Emit a list of line contents, inserting directives just at discontinuities.
// Needs to be a macro because 'out' can have the side effect of creating a
// new trace in Trace_stream.
#define EMIT(lines, out) if (!lines.empty()) { \
  string last_file = lines.begin()->filename; \
  size_t last_line = lines.begin()->line_number-1; \
  out << line_directive(lines.begin()->line_number, lines.begin()->filename) << '\n'; \
  for (list<Line>::const_iterator p = lines.begin(); p != lines.end(); ++p) { \
    if (last_file != p->filename || last_line != p->line_number-1) \
      out << line_directive(p->line_number, p->filename) << '\n'; \
    out << p->contents << '\n'; \
    last_file = p->filename; \
    last_line = p->line_number; \
  } \
}

string line_directive(size_t line_number, string filename) {
  ostringstream result;
  if (filename.empty())
    result << "#line " << line_number;
  else
    result << "#line " << line_number << " \"" << filename << '"';
  return result.str();
}

//// Tangle

string Toplevel = "run";

int tangle(int argc, const char* argv[]) {
  list<Line> result;
  for (int i = 1; i < argc; ++i) {
//?     cerr << "new file " << argv[i] << '\n';
    Toplevel = "run";
    ifstream in(argv[i]);
    tangle(in, argv[i], result);
  }

  EMIT(result, cout);
  return 0;
}

void tangle(istream& in, const string& filename, list<Line>& out) {
  string curr_line;
  size_t line_number = 1;
  while (!in.eof()) {
    getline(in, curr_line);
    if (starts_with(curr_line, ":(")) {
      ++line_number;
      process_next_hunk(in, trim(curr_line), filename, line_number, out);
      continue;
    }
    if (starts_with(curr_line, "//:")) {
      ++line_number;
      continue;
    }
    out.push_back(Line(curr_line, filename, line_number));
    ++line_number;
  }

  // Trace all line contents, inserting directives just at discontinuities.
  if (!Trace_stream) return;
  EMIT(out, Trace_stream->stream("tangle"));
}

// just for tests
void tangle(istream& in, list<Line>& out) {
  tangle(in, "", out);
}

void process_next_hunk(istream& in, const string& directive, const string& filename, size_t& line_number, list<Line>& out) {
  istringstream directive_stream(directive.substr(2));  // length of ":("
  string cmd = next_tangle_token(directive_stream);

  // first slurp all lines until next directive
  list<Line> hunk;
  {
    string curr_line;
    while (!in.eof()) {
      std::streampos old = in.tellg();
      getline(in, curr_line);
      if (starts_with(curr_line, ":(")) {
        in.seekg(old);
        break;
      }
      if (starts_with(curr_line, "//:")) {
        // tangle comments
        ++line_number;
        continue;
      }
      hunk.push_back(Line(curr_line, filename, line_number));
      ++line_number;
    }
  }

  if (cmd == "code") {
    out.insert(out.end(), hunk.begin(), hunk.end());
    return;
  }

  if (cmd == "before" || cmd == "after" || cmd == "replace" || cmd == "replace{}" || cmd == "delete" || cmd == "delete{}") {
    list<Line>::iterator target = locate_target(out, directive_stream);
    if (target == out.end()) {
      raise << "couldn't find target " << directive << '\n' << die();
      return;
    }

    indent_all(hunk, target);

    if (cmd == "before") {
      out.splice(target, hunk);
    }
    else if (cmd == "after") {
      ++target;
      out.splice(target, hunk);
    }
    else if (cmd == "replace" || cmd == "delete") {
      out.splice(target, hunk);
      out.erase(target);
    }
    else if (cmd == "replace{}" || cmd == "delete{}") {
      if (find_trim(hunk, ":OLD_CONTENTS") == hunk.end()) {
        out.splice(target, hunk);
        out.erase(target, balancing_curly(target));
      }
      else {
        list<Line>::iterator next = balancing_curly(target);
        list<Line> old_version;
        old_version.splice(old_version.begin(), out, target, next);
        old_version.pop_back();  old_version.pop_front();  // contents only please, not surrounding curlies

        list<Line>::iterator new_pos = find_trim(hunk, ":OLD_CONTENTS");
        indent_all(old_version, new_pos);
        hunk.splice(new_pos, old_version);
        hunk.erase(new_pos);
        out.splice(next, hunk);
      }
    }
    return;
  }

  raise << "unknown directive " << cmd << '\n' << die();
}

list<Line>::iterator locate_target(list<Line>& out, istream& directive_stream) {
  string pat = next_tangle_token(directive_stream);
  if (pat == "") return out.end();

  string next_token = next_tangle_token(directive_stream);
  if (next_token == "") {
    return find_substr(out, pat);
  }
  // first way to do nested pattern: pattern 'following' intermediate
  else if (next_token == "following") {
    string pat2 = next_tangle_token(directive_stream);
    if (pat2 == "") return out.end();
    list<Line>::iterator intermediate = find_substr(out, pat2);
    if (intermediate == out.end()) return out.end();
    return find_substr(out, intermediate, pat);
  }
  // second way to do nested pattern: intermediate 'then' pattern
  else if (next_token == "then") {
    list<Line>::iterator intermediate = find_substr(out, pat);
    if (intermediate == out.end()) return out.end();
    string pat2 = next_tangle_token(directive_stream);
    if (pat2 == "") return out.end();
    return find_substr(out, intermediate, pat2);
  }
  raise << "unknown keyword in directive: " << next_token << '\n';
  return out.end();
}

// indent all lines in l like indentation at exemplar
void indent_all(list<Line>& l, list<Line>::iterator exemplar) {
  string curr_indent = indent(exemplar->contents);
  for (list<Line>::iterator p = l.begin(); p != l.end(); ++p)
    if (!p->contents.empty())
      p->contents.insert(p->contents.begin(), curr_indent.begin(), curr_indent.end());
}

string next_tangle_token(istream& in) {
  in >> std::noskipws;
  ostringstream out;
  skip_whitespace(in);
  if (in.peek() == '"')
    slurp_tangle_string(in, out);
  else
    slurp_word(in, out);
  return out.str();
}

void slurp_tangle_string(istream& in, ostream& out) {
  in.get();
  char c;
  while (in >> c) {
    if (c == '\\') {
      // skip backslash and save next character unconditionally
      in >> c;
      out << c;
      continue;
    }
    if (c == '"') break;
    out << c;
  }
}

void slurp_word(istream& in, ostream& out) {
  char c;
  while (in >> c) {
    if (isspace(c) || c == ')') {
      in.putback(c);
      break;
    }
    out << c;
  }
}

void skip_whitespace(istream& in) {
  while (isspace(in.peek()))
    in.get();
}

list<Line>::iterator balancing_curly(list<Line>::iterator curr) {
  long open_curlies = 0;
  do {
    for (string::iterator p = curr->contents.begin(); p != curr->contents.end(); ++p) {
      if (*p == '{') ++open_curlies;
      if (*p == '}') --open_curlies;
    }
    ++curr;
    // no guard so far against unbalanced curly, including inside comments or strings
  } while (open_curlies != 0);
  return curr;
}

list<Line>::iterator find_substr(list<Line>& in, const string& pat) {
  for (list<Line>::iterator p = in.begin(); p != in.end(); ++p)
    if (p->contents.find(pat) != string::npos)
      return p;
  return in.end();
}

list<Line>::iterator find_substr(list<Line>& in, list<Line>::iterator p, const string& pat) {
  for (; p != in.end(); ++p)
    if (p->contents.find(pat) != string::npos)
      return p;
  return in.end();
}

list<Line>::iterator find_trim(list<Line>& in, const string& pat) {
  for (list<Line>::iterator p = in.begin(); p != in.end(); ++p)
    if (trim(p->contents) == pat)
      return p;
  return in.end();
}

string escape(string s) {
  s = replace_all(s, "\\", "\\\\");
  s = replace_all(s, "\"", "\\\"");
  s = replace_all(s, "\n", "\\n");
  return s;
}

string replace_all(string s, const string& a, const string& b) {
  for (size_t pos = s.find(a); pos != string::npos; pos = s.find(a, pos+b.size()))
    s = s.replace(pos, a.size(), b);
  return s;
}

// does s start with pat, after skipping whitespace?
// pat can't start with whitespace
bool starts_with(const string& s, const string& pat) {
  for (size_t pos = 0; pos < s.size(); ++pos)
    if (!isspace(s.at(pos)))
      return s.compare(pos, pat.size(), pat) == 0;
  return false;
}

string indent(const string& s) {
  for (size_t pos = 0; pos < s.size(); ++pos)
    if (!isspace(s.at(pos)))
      return s.substr(0, pos);
  return "";
}

string strip_indent(const string& s, size_t n) {
  if (s.empty()) return "";
  string::const_iterator curr = s.begin();
  while (curr != s.end() && n > 0 && isspace(*curr)) {
    ++curr;
    --n;
  }
  return string(curr, s.end());
}

string trim(const string& s) {
  string::const_iterator first = s.begin();
  while (first != s.end() && isspace(*first))
    ++first;
  if (first == s.end()) return "";

  string::const_iterator last = --s.end();
  while (last != s.begin() && isspace(*last))
    --last;
  ++last;
  return string(first, last);
}

const Line& front(const list<Line>& l) {
  assert(!l.empty());
  return l.front();
}

A  => tangle/003tangle.test.cc +392 -0
@@ 1,392 @@
void test_tangle() {
  istringstream in("a\n"
                   "b\n"
                   "c\n"
                   ":(before b)\n"
                   "d\n");
  list<Line> dummy;
  tangle(in, dummy);
  CHECK_TRACE_CONTENTS("tangle", "a\n"
                                 "d\n"
                                 "b\n"
                                 "c\n");
}

void test_tangle_with_linenumber() {
  istringstream in("a\n"
                   "b\n"
                   "c\n"
                   ":(before b)\n"
                   "d\n");
  list<Line> dummy;
  tangle(in, dummy);
  CHECK_TRACE_CONTENTS("tangle", "#line 1\n"
                                 "a\n"
                                 "#line 5\n"
                                 "d\n"
                                 "#line 2\n"
                                 "b\n"
                                 "c\n");
  // no other #line directives
  CHECK_TRACE_DOESNT_CONTAIN("tangle", "#line 3");
  CHECK_TRACE_DOESNT_CONTAIN("tangle", "#line 4");
}

void test_tangle_linenumbers_with_filename() {
  istringstream in("a\n"
                   "b\n"
                   "c\n"
                   ":(before b)\n"
                   "d\n");
  list<Line> dummy;
  tangle(in, "foo", dummy);
  CHECK_TRACE_CONTENTS("tangle", "a\n"
                                 "#line 5 \"foo\"\n"
                                 "d\n"
                                 "b\n"
                                 "c\n");
}

void test_tangle_line_numbers_with_multiple_filenames() {
  istringstream in1("a\n"
                    "b\n"
                    "c");
  list<Line> dummy;
  tangle(in1, "foo", dummy);
  CLEAR_TRACE;
  istringstream in2(":(before b)\n"
                    "d\n");
  tangle(in2, "bar", dummy);
  CHECK_TRACE_CONTENTS("tangle", "a\n"
                                 "#line 2 \"bar\"\n"
                                 "d\n"
                                 "#line 2 \"foo\"\n"
                                 "b\n"
                                 "c\n");
}

void test_tangle_linenumbers_with_multiple_directives() {
  istringstream in1("a\n"
                    "b\n"
                    "c");
  list<Line> dummy;
  tangle(in1, "foo", dummy);
  CLEAR_TRACE;
  istringstream in2(":(before b)\n"
                    "d\n"
                    ":(before c)\n"
                    "e");
  tangle(in2, "bar", dummy);
  CHECK_TRACE_CONTENTS("tangle", "a\n"
                                 "#line 2 \"bar\"\n"
                                 "d\n"
                                 "#line 2 \"foo\"\n"
                                 "b\n"
                                 "#line 4 \"bar\"\n"
                                 "e\n"
                                 "#line 3 \"foo\"\n"
                                 "c\n");
}

void test_tangle_with_multiple_filenames_after() {
  istringstream in1("a\n"
                    "b\n"
                    "c");
  list<Line> dummy;
  tangle(in1, "foo", dummy);
  CLEAR_TRACE;
  istringstream in2(":(after b)\n"
                    "d\n");
  tangle(in2, "bar", dummy);
  CHECK_TRACE_CONTENTS("tangle", "a\n"
                                 "b\n"
                                 "#line 2 \"bar\"\n"
                                 "d\n"
                                 "#line 3 \"foo\"\n"
                                 "c\n");
}

void test_tangle_skip_tanglecomments() {
  istringstream in("a\n"
                   "b\n"
                   "c\n"
                   "//: 1\n"
                   "//: 2\n"
                   "d\n");
  list<Line> dummy;
  tangle(in, dummy);
  CHECK_TRACE_CONTENTS("tangle", "a\n"
                                 "b\n"
                                 "c\n"
                                 "\n"
                                 "\n"
                                 "d\n");
  CHECK_TRACE_DOESNT_CONTAIN("tangle", "//: 1");
}

void test_tangle_with_tanglecomments_and_directive() {
  istringstream in("a\n"
                   "//: 1\n"
                   "b\n"
                   "c\n"
                   ":(before b)\n"
                   "d\n"
                   ":(code)\n"
                   "e\n");
  list<Line> dummy;
  tangle(in, dummy);
  CHECK_TRACE_CONTENTS("tangle", "a\n"
                                 "#line 6\n"
                                 "d\n"
                                 "#line 3\n"
                                 "b\n"
                                 "c\n"
                                 "#line 8\n"
                                 "e\n");
  CHECK_TRACE_DOESNT_CONTAIN("tangle", "//: 1");
}

void test_tangle_with_tanglecomments_inside_directive() {
  istringstream in("a\n"
                   "//: 1\n"
                   "b\n"
                   "c\n"
                   ":(before b)\n"
                   "//: abc\n"
                   "d\n"
                   ":(code)\n"
                   "e\n");
  list<Line> dummy;
  tangle(in, dummy);
  CHECK_TRACE_CONTENTS("tangle", "a\n"
                                 "#line 7\n"
                                 "d\n"
                                 "#line 3\n"
                                 "b\n"
                                 "c\n"
                                 "#line 9\n"
                                 "e\n");
  CHECK_TRACE_DOESNT_CONTAIN("tangle", "//: 1");
}

void test_tangle_with_multiword_directives() {
  istringstream in("a b\n"
                   "c\n"
                   ":(after \"a b\")\n"
                   "d\n");
  list<Line> dummy;
  tangle(in, dummy);
  CHECK_TRACE_CONTENTS("tangle", "a b\n"
                                 "d\n"
                                 "c\n");
}

void test_tangle_with_quoted_multiword_directives() {
  istringstream in("a \"b\"\n"
                   "c\n"
                   ":(after \"a \\\"b\\\"\")\n"
                   "d\n");
  list<Line> dummy;
  tangle(in, dummy);
  CHECK_TRACE_CONTENTS("tangle", "a \"b\"\n"
                                 "d\n"
                                 "c\n");
}

void test_tangle2() {
  istringstream in("a\n"
                   "b\n"
                   "c\n"
                   ":(after b)\n"
                   "d\n");
  list<Line> dummy;
  tangle(in, dummy);
  CHECK_TRACE_CONTENTS("tangle", "a\n"
                                 "b\n"
                                 "d\n"
                                 "c\n");
}

void test_tangle_at_end() {
  istringstream in("a\n"
                   "b\n"
                   "c\n"
                   ":(after c)\n"
                   "d\n");
  list<Line> dummy;
  tangle(in, dummy);
  CHECK_TRACE_CONTENTS("tangle", "a\n"
                                 "b\n"
                                 "c\n"
                                 "d\n");
}

void test_tangle_indents_hunks_correctly() {
  istringstream in("a\n"
                   "  b\n"
                   "c\n"
                   ":(after b)\n"
                   "d\n");
  list<Line> dummy;
  tangle(in, dummy);
  CHECK_TRACE_CONTENTS("tangle", "a\n"
                                 "  b\n"
                                 "  d\n"
                                 "c\n");
}

void test_tangle_warns_on_missing_target() {
  Hide_warnings = true;
  istringstream in(":(before)\n"
                   "abc def\n");
  list<Line> lines;
  tangle(in, lines);
  CHECK_TRACE_WARNS();
}

void test_tangle_warns_on_unknown_target() {
  Hide_warnings = true;
  istringstream in(":(before \"foo\")\n"
                   "abc def\n");
  list<Line> lines;
  tangle(in, lines);
  CHECK_TRACE_WARNS();
}

void test_tangle_delete_range_of_lines() {
  istringstream in("a\n"
                   "b {\n"
                   "c\n"
                   "}\n"
                   ":(delete{} \"b\")\n");
  list<Line> dummy;
  tangle(in, dummy);
  CHECK_TRACE_CONTENTS("tangle", "a\n");
  CHECK_TRACE_DOESNT_CONTAIN("tangle", "b");
  CHECK_TRACE_DOESNT_CONTAIN("tangle", "c");
}

void test_tangle_replace() {
  istringstream in("a\n"
                   "b\n"
                   "c\n"
                   ":(replace b)\n"
                   "d\n");
  list<Line> dummy;
  tangle(in, dummy);
  CHECK_TRACE_CONTENTS("tangle", "a\n"
                                 "d\n"
                                 "c\n");
  CHECK_TRACE_DOESNT_CONTAIN("tangle", "b");
}

void test_tangle_replace_range_of_lines() {
  istringstream in("a\n"
                   "b {\n"
                   "c\n"
                   "}\n"
                   ":(replace{} \"b\")\n"
                   "d\n"
                   "e\n");
  list<Line> dummy;
  tangle(in, dummy);
  CHECK_TRACE_CONTENTS("tangle", "a\n"
                                 "d\n"
                                 "e\n");
  CHECK_TRACE_DOESNT_CONTAIN("tangle", "b {");
  CHECK_TRACE_DOESNT_CONTAIN("tangle", "c");
}

void test_tangle_replace_tracks_old_lines() {
  istringstream in("a\n"
                   "b {\n"
                   "c\n"
                   "}\n"
                   ":(replace{} \"b\")\n"
                   "d\n"
                   ":OLD_CONTENTS\n"
                   "e\n");
  list<Line> dummy;
  tangle(in, dummy);
  CHECK_TRACE_CONTENTS("tangle", "a\n"
                                 "d\n"
                                 "c\n"
                                 "e\n");
  CHECK_TRACE_DOESNT_CONTAIN("tangle", "b {");
}

void test_tangle_nested_patterns() {
  istringstream in("a\n"
                   "c\n"
                   "b\n"
                   "c\n"
                   "d\n"
                   ":(after \"b\" then \"c\")\n"
                   "e");
  list<Line> dummy;
  tangle(in, dummy);
  CHECK_TRACE_CONTENTS("tangle", "a\n"
                                 "c\n"
                                 "b\n"
                                 "c\n"
                                 "e\n"
                                 "d\n");
}

void test_tangle_nested_patterns2() {
  istringstream in("a\n"
                   "c\n"
                   "b\n"
                   "c\n"
                   "d\n"
                   ":(after \"c\" following \"b\")\n"
                   "e");
  list<Line> dummy;
  tangle(in, dummy);
  CHECK_TRACE_CONTENTS("tangle", "a\n"
                                 "c\n"
                                 "b\n"
                                 "c\n"
                                 "e\n"
                                 "d\n");
}

// todo: include line numbers in tangle errors

//// helpers

void test_trim() {
  CHECK_EQ(trim(""), "");
  CHECK_EQ(trim(" "), "");
  CHECK_EQ(trim("  "), "");
  CHECK_EQ(trim("a"), "a");
  CHECK_EQ(trim(" a"), "a");
  CHECK_EQ(trim("  a"), "a");
  CHECK_EQ(trim("  ab"), "ab");
  CHECK_EQ(trim("a "), "a");
  CHECK_EQ(trim("a  "), "a");
  CHECK_EQ(trim("ab  "), "ab");
  CHECK_EQ(trim(" a "), "a");
  CHECK_EQ(trim("  a  "), "a");
  CHECK_EQ(trim("  ab  "), "ab");
}

void test_strip_indent() {
  CHECK_EQ(strip_indent("", 0), "");
  CHECK_EQ(strip_indent("", 1), "");
  CHECK_EQ(strip_indent("", 3), "");
  CHECK_EQ(strip_indent(" ", 0), " ");
  CHECK_EQ(strip_indent(" a", 0), " a");
  CHECK_EQ(strip_indent(" ", 1), "");
  CHECK_EQ(strip_indent(" a", 1), "a");
  CHECK_EQ(strip_indent(" ", 2), "");
  CHECK_EQ(strip_indent(" a", 2), "a");
  CHECK_EQ(strip_indent("  ", 0), "  ");
  CHECK_EQ(strip_indent("  a", 0), "  a");
  CHECK_EQ(strip_indent("  ", 1), " ");
  CHECK_EQ(strip_indent("  a", 1), " a");
  CHECK_EQ(strip_indent("  ", 2), "");
  CHECK_EQ(strip_indent("  a", 2), "a");
  CHECK_EQ(strip_indent("  ", 3), "");
  CHECK_EQ(strip_indent("  a", 3), "a");
}

A  => tangle/Readme.md +113 -0
@@ 1,113 @@
[Literate Programming](https://en.wikipedia.org/wiki/Literate_programming)
tool to convert a series of layers into a single .cc file.

These tangling directives differ from Knuth's classic implementation. The
classical approach starts out with labeled subsystems that are initially
empty, and adds code to them using two major directives:

```
<name> ≡
<code>
```

```
<name> +≡
<code>
```

_(`<code>` can span multiple lines.)_

This approach is best suited for top-down exposition.

On the other hand, the tangling directives here are better suited for
presenting a cleaned-up history of a codebase. The program starts out as a
simple skeleton of its core. Later versions then tell the story of its
evolution, with each version colocating all the code related to its new
features.

Read more:
* http://akkartik.name/post/wart-layers
* http://akkartik.name/post/literate-programming
* https://github.com/akkartik/mu/blob/master/000organization.cc

## directives

Add code to a project:

```
:(code)
<code>
```

Insert code before a specific line:

```
:(before <waypoint>)
<code>
```

Here `<waypoint>` is a substring matching a single line in the codebase. (We
never use regular expressions.) Surround the substring in `"` quotes if it
spans multiple words.
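The quoting rule can be illustrated with a small tokenizer. This is a minimal sketch, not tangle's actual parser: `parse_directive` is a hypothetical name, and it assumes quoted words never contain escaped quotes. It splits a directive line into its command and arguments, keeping a double-quoted waypoint together as a single word.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits a directive line such as
//   :(before "more events")
// into {"before", "more events"}. A double-quoted word may contain spaces;
// escapes are not supported. Returns an empty vector for non-directives.
std::vector<std::string> parse_directive(const std::string& line) {
  std::vector<std::string> words;
  if (line.size() < 3 || line.compare(0, 2, ":(") != 0 || line.back() != ')')
    return words;  // not a directive
  const std::string inner = line.substr(2, line.size() - 3);
  size_t i = 0;
  while (i < inner.size()) {
    if (inner[i] == ' ') { ++i; continue; }  // skip separators
    if (inner[i] == '"') {
      // Quoted word: everything up to the next double quote.
      size_t close = inner.find('"', i + 1);
      if (close == std::string::npos) close = inner.size();  // tolerate a missing close quote
      words.push_back(inner.substr(i + 1, close - i - 1));
      i = close + 1;
    } else {
      // Bare word: everything up to the next space.
      size_t stop = inner.find(' ', i);
      if (stop == std::string::npos) stop = inner.size();
      words.push_back(inner.substr(i, stop - i));
      i = stop;
    }
  }
  return words;
}
```

For example, `:(after "c" following "b")` yields the four words `after`, `c`, `following`, `b`.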

Insert code _after_ a specific line:

```
:(after <waypoint>)
<code>
```

Delete a specific previously added line (because it's not needed in a newer
version):

```
:(delete <line>)
```

Delete a block of code starting with a given header and surrounded by `{` and
`}`:

```
:(delete{} <header>)
```
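One way to find the extent of such a block is to count braces from the header line until they balance. The sketch below does exactly that; `delete_block` is a hypothetical name, and the real tool may locate the closing `}` by a different rule. Braces inside string literals or comments would confuse this naive count.

```cpp
#include <iterator>
#include <list>
#include <string>

using Lines = std::list<std::string>;

// Hypothetical helper: erases the first line containing `header` plus every
// line up to and including the one where its braces balance again.
// Returns false (leaving `out` untouched) if the header or the close is missing.
bool delete_block(Lines& out, const std::string& header) {
  auto start = out.begin();
  while (start != out.end() && start->find(header) == std::string::npos) ++start;
  if (start == out.end()) return false;
  int depth = 0;
  bool opened = false;  // the '{' may sit on a later line than the header
  auto p = start;
  for (; p != out.end(); ++p) {
    for (char c : *p) {
      if (c == '{') { ++depth; opened = true; }
      else if (c == '}') --depth;
    }
    if (opened && depth <= 0) break;
  }
  if (p == out.end()) return false;  // unbalanced block
  out.erase(start, std::next(p));
  return true;
}
```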

_(Caveat: doesn't directly support C's `do`..`while` loops.)_

Replace a specific line with new code:

```
:(replace <line>)
<code>
```

This is identical to:
```
:(before <line>)
<code>
:(delete <line>)
```
_(Assuming `<code>` did not insert a new line matching the substring `<line>`.)_
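The equivalence can be written out directly. In this minimal sketch (`replace_line` is a hypothetical helper over a `std::list` of strings), the `:(before)` half splices the new code in front of the matching line and the `:(delete)` half erases that same line.

```cpp
#include <list>
#include <string>

using Lines = std::list<std::string>;

// Hypothetical helper: replaces the first line containing `target` with
// `code`. Returns false if no line matches.
bool replace_line(Lines& out, const std::string& target, Lines code) {
  for (auto p = out.begin(); p != out.end(); ++p) {
    if (p->find(target) == std::string::npos) continue;
    out.splice(p, code);  // the ":(before <line>)" half
    out.erase(p);         // the ":(delete <line>)" half; p still names the old line
    return true;
  }
  return false;
}
```

Because the sketch holds on to the original line instead of searching again, the caveat above doesn't affect it. A literal before-then-delete that searched from the top would find a newly inserted line matching `<line>` first, which is presumably why the caveat exists.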

Replace a block of code with another:

```
:(replace{} <header>)
<code>
```
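Judging by tangle's own tests, a line reading `:OLD_CONTENTS` inside the replacement appears to stand for the body of the block being replaced, so new code can wrap the old. The sketch below models that behavior under stated assumptions: `replace_block` is hypothetical, braces are counted naively, and the header line is assumed to carry the opening `{` (as in `b {`).

```cpp
#include <iterator>
#include <list>
#include <string>

using Lines = std::list<std::string>;

// Hypothetical helper: replaces the block whose header contains `header`
// with `code`, expanding any ":OLD_CONTENTS" line into the old block's body
// (the lines strictly between the header and the closing brace).
bool replace_block(Lines& out, const std::string& header, const Lines& code) {
  auto start = out.begin();
  while (start != out.end() && start->find(header) == std::string::npos) ++start;
  if (start == out.end()) return false;
  int depth = 0;
  bool opened = false;
  auto close = start;
  for (; close != out.end(); ++close) {
    for (char c : *close) {
      if (c == '{') { ++depth; opened = true; }
      else if (c == '}') --depth;
    }
    if (opened && depth <= 0) break;
  }
  if (close == out.end()) return false;  // unbalanced block
  Lines body;
  if (close != start) body.assign(std::next(start), close);  // empty for "x {}"
  Lines replacement;
  for (const std::string& line : code) {
    if (line == ":OLD_CONTENTS")
      replacement.insert(replacement.end(), body.begin(), body.end());
    else
      replacement.push_back(line);
  }
  auto after = out.erase(start, std::next(close));
  out.splice(after, replacement);
  return true;
}
```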

Insert code before or after a line that isn't unique in the whole codebase,
by first locating a nearby `<waypoint>` that is:

```
:(before <line> following <waypoint>)
<code>
:(after <line> following <waypoint>)
<code>
```

```
:(before <waypoint> then <line>)
<code>
:(after <waypoint> then <line>)
<code>
```
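Judging by tangle's tests, both forms amount to a two-stage search: find `<waypoint>`, then the first line after it that matches `<line>`. The sketch below (with hypothetical helpers `find_nested` and `insert_after_nested`) mirrors one of those tests: with lines `a c b c d`, inserting `e` after `"b" then "c"` lands after the *second* `c`, the first one following `b`.

```cpp
#include <iterator>
#include <list>
#include <string>

using Lines = std::list<std::string>;

// Hypothetical helper: first line containing `line` that comes strictly
// after the first line containing `waypoint`; out.end() if either is missing.
// Uniqueness of the waypoint is not checked.
Lines::iterator find_nested(Lines& out, const std::string& waypoint,
                            const std::string& line) {
  auto p = out.begin();
  while (p != out.end() && p->find(waypoint) == std::string::npos) ++p;
  if (p == out.end()) return p;
  for (++p; p != out.end(); ++p)
    if (p->find(line) != std::string::npos) return p;
  return out.end();
}

// Hypothetical helper: inserts `code` after the line found by find_nested.
bool insert_after_nested(Lines& out, const std::string& waypoint,
                         const std::string& line, Lines code) {
  auto target = find_nested(out, waypoint, line);
  if (target == out.end()) return false;
  out.splice(std::next(target), code);
  return true;
}
```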

A  => tangle/boot.cc +38 -0
@@ 1,38 @@
#include<assert.h>
#include<cstdlib>
#include<cstring>

#include<vector>
using std::vector;
#include<list>
using std::list;
#include<utility>
using std::pair;

#include<string>
using std::string;

#include<iostream>
using std::istream;
using std::ostream;
using std::cin;
using std::cout;
using std::cerr;

#include<sstream>
using std::istringstream;
using std::ostringstream;

#include<fstream>
using std::ifstream;

#include<locale>
using std::isspace;  // unicode-aware

#include "type_list"

#include "function_list"

#include "file_list"

#include "test_file_list"