~ecs/tm

339c1b38133ef0c0628f38c5ca64de8b26184999 — Ember Sawady 3 years ago af88b50
Rewrite in POSIX sh

Also don't compress objects and make a few other small changes to the
spec.
14 files changed, 153 insertions(+), 268 deletions(-)

A COMMANDS.md
D Makefile
M README.md
D cat/main.c
D include/cat.h
D include/lib.h
A lib.sh
D lib/open_object.c
D lib/zerror.c
D main.c
A tm-cat
A tm-init
A tm-insert
A tm-update-ref
A COMMANDS.md => COMMANDS.md +13 -0
@@ 0,0 1,13 @@
# tm commands

## Plumbing

- `cat <hash>`: read an object
- `insert <path>`: write a blob or tree. Note: will not recursively add
  subdirectories.
- `update-ref <ref-name> <commit-hash>`: update a ref to point to a
  commit

TODO: porcelain for adding commits

## Porcelain

D Makefile => Makefile +0 -41
@@ 1,41 0,0 @@
.POSIX:
.SUFFIXES:
CC ?= c99
CPP ?= cpp
CFLAGS = \
	  -std=c99 \
	  -pedantic \
	  -Werror \
	  -Wall
CFLAGS!+=pkg-config --cflags zlib
LDFLAGS!=pkg-config --libs zlib
OUTDIR=.build

OBJS!=find . -type f -name '*.c' | sed 's/c$$/o/'

INCLUDE=-Iinclude

all: tm

tm: $(OBJS)
	@printf 'CCLD\t$@\n'
	@$(CC) -o $@ $(CFLAGS) $(LDFLAGS) $(OBJS)

-include $(OUTDIR)/cppcache

.SUFFIXES: .c .o

.c.o:
	@mkdir -p $(OUTDIR)
	@mkdir -p $$(dirname "$@")
	@printf 'CC\t$@\n'
	@touch $(OUTDIR)/cppcache
	@grep $< $(OUTDIR)/cppcache >/dev/null || \
		$(CPP) $(INCLUDE) -MM -MT $@ $< >> $(OUTDIR)/cppcache
	@$(CC) -c $(CFLAGS) $(INCLUDE) -o $@ $<

clean:
	rm -f $(OBJS)
	rm -f tm

.PHONY: clean

M README.md => README.md +36 -24
@@ 1,10 1,12 @@
# tm

Time Machine, a (mostly) dependency-free simple version control system
written in C99.
Time Machine, a simple version control system.

Note: WIP, expect major breakage.

See COMMANDS.md for a comprehensive list of all commands. Note that not
all of them are implemented.

## Goals

- Easy to convert between tm and git repos.


@@ 16,9 18,10 @@ Note: WIP, expect major breakage.

## Dependencies

POSIX, zlib.
POSIX, sha512sum(1).

TODO: replace zlib?
TODO: do we want to do compression?
TODO: rewrite performance-critical commands in C

## Deliberate omissions



@@ 45,18 48,16 @@ reason *not* to protect it from SaaS.)
Internals are similar to git, except where I thought I could get away
with something simpler.

All text is UTF-8. All files are text files, except for objects, which
are zlib-compressed text. All text files are newline-terminated.
All text is UTF-8. All files are text files, and are newline-terminated.

As in git, objects are identified by 160-bit SHA-1 hashes. A "pointer"
is the 40 byte hexadecimal representation of a hash, encoded in UTF-8.
As in git, objects are identified by hashes, here 512-bit SHA-512. A "pointer"
is the 128 byte hexadecimal representation of a hash, encoded in UTF-8.

```
.tm
| index
| objects
  | [aa-ff]
    | [hex SHA-1 hash]
  | [hex SHA-512 hash]
| refs
  | HEAD
  ...


@@ 74,8 75,8 @@ The first line of the object is the type of the object. Unlike in git,
the size of the object is *not* stored in the object, and the object
type is terminated with a newline instead of a NUL.

The SHA-1 hash is of the decompressed contents of the object, *including
the object type*.
The SHA-512 hash is of the contents of the object, *including the object
type*.
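
As a rough sketch of the rule above, an object's name could be computed
by hashing the type line followed by the contents. `object_hash` is a
hypothetical helper, not part of tm, and it assumes `sha512sum` is
installed (POSIX does not guarantee it).

```sh
# Hypothetical helper (not part of tm): print the hex SHA-512 of an object.
# $1 is the object type; the object's contents are read from stdin.
object_hash() {
	# Prepend the newline-terminated type line, then hash the whole stream.
	{ printf '%s\n' "$1"; cat; } | sha512sum | cut -d' ' -f1
}

printf 'hello\n' | object_hash blob
```

Because the type line is part of the hashed data, a blob and a tree with
otherwise identical contents get different names.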

The path at which an object with hash `$HASH` whose first two
characters are `X` and `Y` will be stored is `.tm/objects/XY/$HASH`.


@@ 85,6 86,11 @@ characters are `X` and `Y` will be stored is `.tm/objects/XY/$HASH`.
A blob is just a flat array of bytes. tm doesn't care about its
contents.

```
blob
The remainder of this object can be anything at all, including ✓ non-ASCII characters or � invalid UTF-8.
```

#### Commits

A commit is a tagged tree. More specifically, a commit encapsulates the


@@ 104,9 110,9 @@ The format of a commit is:

```
commit
tree deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
parent cafebabecafebabecafebabecafebabecafebabe
parent deafbeaddeafbeaddeafbeaddeafbeaddeafbead
tree deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
parent cafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabe
parent deafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbead
author J. Random Hacker <jrh@example.org>
committer K. Random Hacker <krh@example.org>
date SECS


@@ 118,19 124,25 @@ This is another line. It serves no purpose except demonstrating that the
body can have multiple lines.
```

This commit tags the tree `deadbeefdeadbeefdeadbeefdeadbeefdeadbeef` and
has two parents: `cafebabecafebabecafebabecafebabecafebabe` and
`deafbeaddeafbeaddeafbeaddeafbeaddeafbead`.
This commit tags the tree
`deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef`
and has two parents:
`cafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabe`
and
`deafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbead`.

Because the full commit hash is extremely long, it is permitted to use
any unique prefix in commands.
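
One way prefix lookup might work, as a minimal sketch: expand the prefix
as a glob and accept it only if exactly one object matches.
`resolve_prefix` and its arguments are hypothetical names used for
illustration, and the sketch assumes the prefix contains only hex digits.

```sh
# Hypothetical helper (not part of tm): print the single object name in
# directory $1 that starts with prefix $2. Fails if there are no matches
# or if the prefix is ambiguous.
resolve_prefix() {
	dir=$1
	prefix=$2
	# Replace the function's positional parameters with the glob matches.
	set -- "$dir/$prefix"*
	# An unmatched glob stays literal, so also check that the file exists.
	[ $# -eq 1 ] && [ -f "$1" ] || return 1
	printf '%s\n' "${1##*/}"
}
```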

Commits must have one tree, any number of parents, one committer, one
author, one date, one subject line, and any number of body lines. These
lines MUST occur in the order specified here.
Commits must have one tree, any number of parents, one committer, any
number of authors, one date, one subject line, and any number of body
lines. These lines MUST occur in the order specified here.

`SECS` is the number of seconds since 1970-01-01 00:00 UTC at which the
commit occurred.

The subject line MUST be less than 50 characters. The body MUST be
hard-wrapped at 72 characters. There must be a blank line between the
hard-wrapped at 72 characters. There MUST be a blank line between the
subject and the body.
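
The limits above could be checked mechanically. The following is only a
sketch: `check_message` is a hypothetical helper that reads the subject
and body (not the header lines) from stdin, and `length` counts
characters or bytes depending on the awk implementation and locale.

```sh
# Hypothetical helper (not part of tm): succeed only if the commit message
# on stdin follows the subject/body rules.
check_message() {
	awk '
		NR == 1 && length($0) >= 50 { bad = 1 }  # subject under 50 chars
		NR == 2 && $0 != ""         { bad = 1 }  # blank separator line
		NR > 2 && length($0) > 72   { bad = 1 }  # body wrapped at 72
		END { exit bad + 0 }
	'
}

printf 'Fix typo\n\nLonger explanation.\n' | check_message && echo "message ok"
```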

#### Trees


@@ 142,8 154,8 @@ The format of a tree is:

```
tree
rwxrwxr-x deadbeefdeadbeefdeadbeefdeadbeefdeadbeef docs
rw-rw-r-- 4242424242424242424242424242424242424242 README.md
775 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef docs
664 42424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242 README.md
```
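
Tree entries record permissions in octal. As an illustrative sketch, a
symbolic permission string such as the one printed by `ls -l` could be
converted like this; `perm_to_octal` is a hypothetical helper, and it
ignores setuid, setgid, and sticky bits.

```sh
# Hypothetical helper (not part of tm): convert a nine-character symbolic
# permission string (for example rwxrwxr-x) to three octal digits.
perm_to_octal() {
	printf '%s\n' "$1" | awk '{
		out = ""
		for (i = 0; i < 3; i++) {
			t = substr($0, i * 3 + 1, 3)   # one rwx triplet
			n = 0
			if (substr(t, 1, 1) == "r") n += 4
			if (substr(t, 2, 1) == "w") n += 2
			if (substr(t, 3, 1) == "x") n += 1
			out = out n
		}
		print out
	}'
}

perm_to_octal rwxrwxr-x   # prints 775
```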

This tree has two entries, `docs` and `README.md`. `docs` is

D cat/main.c => cat/main.c +0 -68
@@ 1,68 0,0 @@
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

#include "lib.h"

#include "cat.h"

int
tm_cat_main(int argc, char *argv[])
{
	char in[BUFSIZ];
	char out[BUFSIZ];
	z_stream stream;
	int ret;
	if (argc < 2) {
		printf("usage: %s <hash>...\n", argv[0]);
		return 1;
	}

	FILE *obj = tm_open_object(argv[1], "r");
	if (!obj) {
		perror(argv[0]);
		return 1;
	}

	stream.zalloc = Z_NULL;
	stream.zfree = Z_NULL;
	stream.opaque = Z_NULL;
	stream.avail_in = 0;
	stream.next_in = Z_NULL;
	if (handle_zerror(argv[0], inflateInit(&stream))) {
		return 2;
	}

	do {
		stream.avail_in = fread(in, 1, BUFSIZ, obj);
		if (ferror(obj)) {
			perror(argv[0]);
			goto error;
		}
		if (stream.avail_in == 0) {
			(void)inflateEnd(&stream);
			return 0;
		}
		stream.next_in = (unsigned char *)in;
		do {
			size_t have;
			stream.avail_out = BUFSIZ;
			stream.next_out = (unsigned char *)out;
			if (handle_zerror(argv[0],
					ret = inflate(&stream, Z_NO_FLUSH))) {
				goto error;
			}
			have = BUFSIZ - stream.avail_out;
			if (fwrite(out, 1, have, stdout) != have
					|| ferror(stdout)) {
				goto error;
			}
		} while (stream.avail_out == 0);
	}

	while (fgets(in, BUFSIZ, obj) && fputs(in, stdout) > 0);
	return 0;
error:
	(void)inflateEnd(&stream);
	return 2;
}

D include/cat.h => include/cat.h +0 -4
@@ 1,4 0,0 @@
#ifndef TM_CAT_H
#define TM_CAT_H
int tm_cat_main(int argc, char *argv[]);
#endif

D include/lib.h => include/lib.h +0 -20
@@ 1,20 0,0 @@
#ifndef TM_LIB_H
#define TM_LIB_H
#include <stdbool.h>
#include <stdio.h>

/* TODO: run tm from a subdir */
#define TM_DIR "./.tm/"
#define TM_REFDIR TM_DIR "refs/"
#define TM_OBJDIR TM_DIR "objects/"

/* Open the object with hash `hash`, passing `mode` to fopen() */
FILE *tm_open_object(const char *hash, const char *mode);

/*
 * Handle a zlib error.
 * Prints an error message to stderr corresponding to the error.
 * Returns `true` if there was an error, and false if there wasn't.
 */
bool handle_zerror(const char *prefix, int ret);
#endif

A lib.sh => lib.sh +38 -0
@@ 0,0 1,38 @@
TM_DIR="${TM_DIR:-.tm}"
TMPDIR="${TMPDIR:-/tmp}"
TMPDIR="$TMPDIR/tm.$$.$(date)"
mkdir -- "$TMPDIR"
trap "rm -rf '$TMPDIR'; exit" INT
trap "rm -rf '$TMPDIR'" EXIT

# Resolve a reference
# TODO: ref~X, ref^X
resolve_ref() {
	if [ "z$1" = "zindex" ]; then
		resolve_ref "$(cat "$TM_DIR/index")"
		return 0
	fi
	if [ -f "$TM_DIR/refs/$1" ]; then
		resolve_ref "$(cat "$TM_DIR/refs/$1")"
		return 0
	fi
	# Accept $1 as a hash prefix only if exactly one object matches it.
	set -- "$TM_DIR/objects/$1"*
	if [ $# -eq 1 ] && [ -f "$1" ]; then
		printf "%s\n" "$(basename -- "$1")"
		return 0
	fi
	return 1
}

# Write stdin, which must include object type, to the object store
# Write the resulting hash to stdout
write() {
	tmp="$TMPDIR/write"
	cat >"$tmp"
	# TODO: replace this with something POSIX
	hash="$(sha512sum -- "$tmp" | cut -f1 -d' ')"
	cat "$tmp" >"$TM_DIR/objects/$hash"
	rm -f -- "$tmp"
	printf "%s\n" "$hash"
}

D lib/open_object.c => lib/open_object.c +0 -26
@@ 1,26 0,0 @@
#define _POSIX_VERSION 200809L
/* Fuck you to hell and back, glibc */
#define _POSIX_C_SOURCE _POSIX_VERSION
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#include "lib.h"

FILE *
tm_open_object(const char *hash, const char *mode)
{
	char path[PATH_MAX] = { 0 };
	printf("%u\n", _PC_PATH_MAX);
	char *buf = &path[0];
	if (strlen(TM_OBJDIR) + strlen("XX/") + strlen(hash) > PATH_MAX) {
		return NULL;
	}

	strcat(buf, TM_OBJDIR);
	strncat(buf, hash, 2);
	strcat(buf, "/");
	strcat(buf, hash);
	return fopen(path, mode);
}

D lib/zerror.c => lib/zerror.c +0 -42
@@ 1,42 0,0 @@
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

#include "lib.h"

/* Handle zlib errors. Is a no-op if ret == Z_OK */
bool
handle_zerror(const char *prefix, int ret)
{
	const char *msg;
	switch (ret) {
	case Z_ERRNO:
		if (ferror(stdout)) {
			msg = "error writing to stdout";
		} else if (ferror(stdin)) {
			msg = "error reading from stdin";
		} else {
			msg = "zlib IO error on unknown file";
		}
		break;
	case Z_OK:
	case Z_STREAM_END:
		return false;
	case Z_STREAM_ERROR:
		msg = "invalid compression level";
		break;
	case Z_DATA_ERROR:
		msg = "invalid or incomplete zlib data";
		break;
	case Z_MEM_ERROR:
		msg = "out of memory";
		break;
	case Z_VERSION_ERROR:
		msg = "zlib version mismatch";
		break;
	default:
		msg = "unknown zlib error";
	}

	fprintf(stderr, "%s: %s\n", prefix, msg);
	return true;
}

D main.c => main.c +0 -43
@@ 1,43 0,0 @@
#include <stdio.h>
#include <string.h>

#include "cat.h"

/* Note: the usage message is hard-wrapped at 80 chars with 8-char tabs. */
void
usage(char *name)
{
	printf("usage: %s <command> [<subcommand>...] [<arg>...]\n", name);
	printf("For more detailed help, run `%s help`\n", name);
	printf("Commands:\n");
	printf("Porcelain (unimplemented):\n");
	printf("\tcommit: Commit the current index and update HEAD\n");
	printf("\tadd [-p] <file>...: Add files to the current index.\n");
	printf("\t\t-p: Split each file into a set of patches and\n"
		"\t\t    interactively select which ones to add.\n");
	printf("\trm <file>...: Remove files from the current index\n");
	printf("Plumbing:\n");
	printf("\tcat <hash>: Read the object identified by the hash <hash>.\n");
}

/*
 * Return values:
 * 0: success
 * 1: Usage error
 * 2: I/O or zlib error
 */
int
main(int argc, char *argv[])
{
	if (argc < 2) {
		usage(argv[0]);
		return 1;
	}

	if (strcmp(argv[1], "cat") == 0) {
		return tm_cat_main(argc - 1, &argv[1]);
	}

	usage(argv[0]);
	return 1;
}

A tm-cat => tm-cat +8 -0
@@ 0,0 1,8 @@
#!/bin/sh -eu

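if [ $# -ne 1 ]; then
	echo "usage: tm cat <ref>" >&2
	exit 1
fi
. "$(dirname -- "$0")/lib.sh"
# Resolve first: an `exit` inside $(...) would only leave the subshell.
if ! hash="$(resolve_ref "$1")"; then
	echo "error: invalid ref" >&2
	exit 1
fi
cat -- "$TM_DIR/objects/$hash"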

A tm-init => tm-init +9 -0
@@ 0,0 1,9 @@
#!/bin/sh -eu

if [ $# -ne 0 ]; then
	echo "usage: tm init" >&2
	exit 1
fi
. "$(dirname -- "$0")/lib.sh"
mkdir -p -- "$TM_DIR/objects" "$TM_DIR/refs"
echo tree | write >"$TM_DIR/index"

A tm-insert => tm-insert +31 -0
@@ 0,0 1,31 @@
#!/bin/sh -eu

usage() {
	echo "usage: tm insert [-t type] [<file>]" >&2
	exit 1
}

. "$(dirname -- "$0")/lib.sh"
type=blob

while getopts t: opt; do
	case "$opt" in
		t)
			type="$OPTARG"
			;;
		?)
			usage
			;;
	esac
done

shift "$((OPTIND - 1))"

if [ $# -gt 1 ]; then
	usage
fi

tmp="$TMPDIR/insert"
printf "%s\n" "$type" >"$tmp"
# Append, so the type line written above is not overwritten.
cat -- "$@" >>"$tmp"
write <"$tmp"

A tm-update-ref => tm-update-ref +18 -0
@@ 0,0 1,18 @@
#!/bin/sh -eu

if [ $# -ne 2 ]; then
	echo "usage: tm update-ref <refname> <ref>" >&2
	exit 1
fi
. "$(dirname -- "$0")/lib.sh"
out=""
if [ "z$1" = "zindex" ]; then
	out="$TM_DIR/index"
elif [ -f "$TM_DIR/refs/$1" ]; then
	out="$TM_DIR/refs/$1"
else
	printf "creating ref %s\n" "$1"
	out="$TM_DIR/refs/$1"
fi
# Resolve first: an `exit` inside $(...) would only leave the subshell.
if ! ref="$(resolve_ref "$2")"; then
	echo "error: invalid ref" >&2
	exit 1
fi
printf "%s\n" "$ref" >"$out"