~ecs/tm

A version-control system
clone

read-only
https://git.sr.ht/~ecs/tm
read/write
git@git.sr.ht:~ecs/tm

You can also use your local clone with git send-email.

#tm

Time Machine, a simple version control system.

Note: WIP, expect major breakage.

#Goals

  • Easy to convert between tm and git repos.
  • No other import/export facilities. If you want that, go through git.
  • Prioritize simplicity (in concept and in implementation) and portability over performance and features.
  • It should be possible to reimplement the tm plumbing from scratch in POSIX sh in a few weeks.

#Dependencies

POSIX, sha512sum(1).

  • TODO: do we want to do compression?
  • TODO: drop dependency on sha512sum(1)
  • TODO: rewrite performance-critical commands in C

#Deliberate omissions

  • Tags: use refs
  • Signed commits: put the signature in the commit message
  • Hooks: write a wrapper script
  • Branches: use refs
  • Configuration files: deal
  • Symlinks: deal

Support for the following might be removed in the future:

  • Binary files: potential workarounds include base64-encoding and something like git-lfs but with bittorrent.

#License

AGPLv3.

(While tm is unlikely to come in contact with a network, there's no reason not to protect it from SaaS.)

#Internals

Internals are similar to git, except where I thought I could get away with something simpler.

All text is UTF-8. All files are text files, and are newline-terminated.

Objects are identified by 512-bit SHA-512 hashes. A "pointer" is the hexadecimal representation of a hash, encoded in UTF-8.

.tm
| objects
  | [hex SHA-512 hash]
  ...
| refs
  | index
  | HEAD
  ...

#Objects

There are three types of objects: blobs, commits, and trees.

The first line of the object is the type of the object. Unlike in git, the size of the object is not stored in the object, and the object type is terminated with a newline instead of a NUL.

The SHA-512 hash is of the contents of the object, including the object type.

An object with hash $HASH will be stored at .tm/objects/$HASH.
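As a rough illustration of the rules above, here is a minimal sketch of storing a file as a blob. The function name `store_blob` and the temporary file name are assumptions made for this example and are not part of tm; it relies on sha512sum(1), as listed under Dependencies.

```sh
# Hypothetical helper (not part of tm): store file $1 as a blob object
# and print its pointer. Assumes it is run from the repository root.
store_blob() {
	mkdir -p .tm/objects || return 1
	tmp=.tm/objects/tmp.$$
	# The object is the type line followed by the raw contents.
	{ printf 'blob\n' && cat "$1"; } > "$tmp" || return 1
	# The hash covers the whole object, including the type line.
	hash=$(sha512sum < "$tmp") || return 1
	hash=${hash%% *}    # keep only the hex digest
	mv "$tmp" ".tm/objects/$hash" && printf '%s\n' "$hash"
}
```

Calling `store_blob README.md` would print a 128-character pointer and leave the object at .tm/objects/ under that name.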

#Blobs

A blob is just a flat array of bytes. tm doesn't care about its contents.

blob
The remainder of this object can be anything at all, though we can't
put an invalid UTF-8 character in for demonstration because it messes up
sourcehut.

#Commits

A commit is a tagged tree. More specifically, a commit encapsulates the following information:

  • The state of the working directory at some point in time
  • The previous state(s) of the working directory. Usually, these are ordered chronologically, but there's nothing stopping you from doing weird stuff.
  • The people who made the changes between this state and the immediately previous states.
  • The person who added this commit to the repository. This is distinct from the person who made the changes. A patch sent over email may be authored by one person and committed by another.
  • A human-readable description of the changes in this commit.

The format of a commit is:

commit
tree deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
parent cafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabe
parent deafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbead
author J. Random Hacker <jrh@example.org>
committer K. Random Hacker <krh@example.org>
date SECS
This line will become the subject of the patch
These lines will become the body of the patch.
This is another line. It serves no purpose except demonstrating that the
body can have multiple lines.

This commit tags the tree deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef and has two parents: cafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabecafebabe and deafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbeaddeafbead.

Because the full commit hash is extremely long, it is permitted to use any unique prefix in commands.
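One way prefix lookup could work is a glob over the object directory. This is only a sketch: `resolve_prefix` is a hypothetical name, and the real tm commands may resolve prefixes differently.

```sh
# Hypothetical helper (not part of tm): expand a hash prefix to the full
# pointer, failing if the prefix matches no object or more than one.
resolve_prefix() {
	# Replace the positional parameters with the list of matching objects.
	set -- .tm/objects/"$1"*
	# Exactly one match is required; an unmatched glob stays literal,
	# so also check that the single result really exists.
	[ $# -eq 1 ] && [ -e "$1" ] || return 1
	printf '%s\n' "${1##*/}"
}
```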

Commits must have one tree, any number of parents, any number of authors, one committer, one date, one subject line, and any number of body lines. These lines MUST occur in the order specified here.
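Because the header lines come in a fixed order, a reader can stop at the first line that is neither the tree nor a parent. The sketch below lists a commit's parents that way; `commit_parents` is a hypothetical name used only for illustration, and it expects a full pointer.

```sh
# Hypothetical helper (not part of tm): print the parent pointers of the
# commit stored at .tm/objects/$1, one per line.
commit_parents() {
	awk '
		NR == 1 || $1 == "tree" { next }   # skip the type line and the tree line
		$1 == "parent" { print $2; next }  # one pointer per parent line
		{ exit }                           # any other line ends the parent list
	' ".tm/objects/$1"
}
```

Stopping early matters: a subject line that happens to begin with the word "parent" is never mistaken for a header.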

SECS is the time at which the commit occurred, expressed as the number of seconds since 1970-01-01 00:00 UTC.

The subject line MUST be less than 72 characters, and SHOULD be less than 50 characters. The body SHOULD be hard-wrapped at 72 characters, except for lines which contain logs or other data which must be copied verbatim.

#Trees

A tree represents a directory. It contains a list of blobs or trees, each of which is associated with a name and a set of permissions.

The format of a tree is:

tree
775 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef docs
664 42424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242424242 README.md

Each line is of the form $mode $hash $filename, where $mode is the octal mode of the file, $hash points to the contents of the file (either a tree or a blob), and $filename is the name of the file.

Note that we can't tell from this whether docs and README.md are trees or blobs. We can get that information from the objects referenced. (Though in this case, it's obvious that docs is a tree and README.md is a blob.)
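Looking up each entry's type means reading the first line of the object it points to. A minimal sketch, assuming full pointers and a hypothetical function name `ls_tree`:

```sh
# Hypothetical helper (not part of tm): list the entries of the tree
# stored at .tm/objects/$1 as "mode type name".
ls_tree() {
	# Drop the "tree" type line, then read one entry per line.
	sed 1d ".tm/objects/$1" | while read -r mode hash name; do
		# The type is the first line of the referenced object.
		read -r type < ".tm/objects/$hash"
		printf '%s %s %s\n' "$mode" "$type" "$name"
	done
}
```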

#Refs

Refs are pointers to commits. Refs may be used anywhere an object is required, and are equivalent to specifying $(cat .tm/refs/$REF).
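In a script, that equivalence might look like the following sketch. The name `resolve_ref` is an assumption, as is the choice to treat anything that is not a ref as a pointer.

```sh
# Hypothetical helper (not part of tm): turn a ref name into the pointer
# it holds; anything that is not a ref is passed through unchanged.
resolve_ref() {
	if [ -f ".tm/refs/$1" ]; then
		cat ".tm/refs/$1"
	else
		printf '%s\n' "$1"
	fi
}
```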

The HEAD ref will always exist, and points to the commit currently checked out. New commits are parented to HEAD.

The index ref will always exist, and points to the commit which is currently being built on top of HEAD.

#Design guidelines

Avoid "best practices". If you think some way of doing something is better, make everything else illegal.

Similarly, avoid configuration. tm should do the right thing, and only the right thing.

tm has a small number of fundamental abstractions -- currently only objects and patches. There are a few non-trivial operations each one supports. Make sure you understand these, and try your hardest to frame new features in terms of these operations.

Avoid doing anything complicated. If you have to do something complicated, make sure to get the most out of it. This goes doubly for the plumbing.

#Draft remote spec

As in git, each remote gets its own set of branches under $remotename/.

Considerations

  • Should we replace the tarred up .tm with a set of patches?
  • Is there a way to make this simpler?
  • Do we actually want remotes?

#Fetch:

  • Client sends server the latest commit they have on the ref being fetched
  • If the server has that commit, it tars up all the commits since then and all new trees and blobs, along with the new ref, in the same structure as they have in .tm. The client then untars this into .tm.
  • If the server doesn't have that commit, the client sends that commit's parents, continuing breadth-first through the commit history until it finds a commit that the server also has. If the entire history up to the client's latest commit has been traversed and nothing in common has been found, the server just sends the whole .tm directory.

For the initial clone, the second case happens immediately and the client just gets the entire contents of the repo.
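The search for a common commit can be sketched locally. Since the spec is still a draft and defines no wire format, this example stands in for the server with a second directory on disk; `find_common` and its arguments are assumptions made for illustration, and commits are named by full pointer.

```sh
# Hypothetical helper (not part of tm): walk the history of commit $1
# breadth-first and print the first commit that also exists under
# $2/objects, where $2 is a directory standing in for the server.
# POSIX sh has no local variables, so the names below are global.
find_common() {
	queue=$1
	seen=
	while [ -n "$queue" ]; do
		next=
		for c in $queue; do
			# Skip commits already visited via another parent.
			case " $seen " in *" $c "*) continue ;; esac
			seen="$seen $c"
			if [ -e "$2/objects/$c" ]; then
				printf '%s\n' "$c"
				return 0
			fi
			# Queue this commit's parents for the next level.
			parents=$(awk '
				NR == 1 || $1 == "tree" { next }
				$1 == "parent" { print $2; next }
				{ exit }
			' ".tm/objects/$c")
			next="$next $parents"
		done
		queue=$next
	done
	# Nothing in common: the whole .tm directory would be sent.
	return 1
}
```

A non-zero exit status corresponds to the fallback case, which is also what happens on an initial clone.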

#Push:

  • Client tells server that it wants to push
  • Server responds with the latest commit that it has
  • Fetch steps are repeated, with server and client switched