~ivilata/gwit-spec

6937e1b9b502ef45826bbe96984867fa46b25c58 — Ivan Vilata-i-Balaguer 29 days ago afbc6a9
Rename the `_gwit` directory to `.gwit`.

Although a hidden directory has its usability issues, if gwit is to become
easier to use in arbitrary Git repositories, then it is probably more likely
for tools (like static site generators) not to have issues with a hidden
directory than one starting with underscore.  Such hidden directories are
commonplace in Git repos: think of `.gitlab` or `.github` for CI/CD tasks,
besides the obvious `.git` directory.
1 file changed, 22 insertions(+), 22 deletions(-)

M README.md => README.md +22 -22
@@ 80,9 80,9 @@ The site branch associated with a gwit site MUST be named `gwit-0x........`, whe

This means that the same Git repository may hold different but related gwit sites, each one in its own branch and with its own key. For instance, while the `master` or `main` branch may contain common sources for a static site generator, generated Gemini and Web files may go to separate gwit site branches.

-The different versions of a gwit site that constitute its history are Git commits in the site branch. To use one such commit as a valid site version, its top directory MUST include a `_gwit` directory (underscore `gwit`), which in turn:
+The different versions of a gwit site that constitute its history are Git commits in the site branch. To use one such commit as a valid site version, its top directory MUST include a `.gwit` directory (dot `gwit`), which in turn:

-- MUST include a `self.key` file containing the site key (and any signing subkeys) in OpenPGP public key format, ASCII-armored or not (e.g. the output of `gpg --export [--armor] <SITE-KEY>`). Although the primary key itself SHOULD NOT change, subsequent updates to `_gwit/self.key` MAY add new subkeys, identities, signatures, revocations and other metadata.
+- MUST include a `self.key` file containing the site key (and any signing subkeys) in OpenPGP public key format, ASCII-armored or not (e.g. the output of `gpg --export [--armor] <SITE-KEY>`). Although the primary key itself SHOULD NOT change, subsequent updates to `.gwit/self.key` MAY add new subkeys, identities, signatures, revocations and other metadata.
- SHOULD include a `self.ini` file with the **site configuration**. Its contents are described below.

Also, the commit MUST be signed by the private key associated with the site key, or by a signing subkey of it.


@@ 93,7 93,7 @@ No restrictions are placed upon content files themselves, but it is RECOMMENDED 

### Site configuration file

-`_gwit/self.ini` has the same format as [Git configuration files][git-config-file], which can be summarized as an [INI file][ini-file] where subsection definitions have a `[section-name "subsection-name"]` format. It MUST be encoded using UTF-8, all its values MUST be considered as simple strings (i.e. no special parsing of integers or pathnames), and includes MUST be disabled. Each single encoded value MUST NOT exceed 1000 bytes (unless otherwise stated below), values with multiple occurrences MUST NOT have more than 10 single values, and the whole file MUST NOT exceed 65536 bytes.
+`.gwit/self.ini` has the same format as [Git configuration files][git-config-file], which can be summarized as an [INI file][ini-file] where subsection definitions have a `[section-name "subsection-name"]` format. It MUST be encoded using UTF-8, all its values MUST be considered as simple strings (i.e. no special parsing of integers or pathnames), and includes MUST be disabled. Each single encoded value MUST NOT exceed 1000 bytes (unless otherwise stated below), values with multiple occurrences MUST NOT have more than 10 single values, and the whole file MUST NOT exceed 65536 bytes.

Recognized sections and values are described below, and unknown ones SHOULD be ignored. If a value marked as "single" is assigned more than once in the file, then the last assignment is used.
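
The size limits stated above for the site configuration file could be checked along the following lines. This is only a sketch: it assumes the file has already been parsed elsewhere into a mapping from full keys to their lists of values (the `values` argument and the `check_limits` name are hypothetical), and it does not model the per-value exceptions that the text says may be stated further below.

```python
MAX_VALUE_BYTES = 1000      # per single encoded value
MAX_VALUES_PER_KEY = 10     # occurrences of a multi-valued key
MAX_FILE_BYTES = 65536      # whole file


def check_limits(raw: bytes, values: dict[str, list[str]]) -> list[str]:
    """Return human-readable limit violations (empty list if none).

    `raw` is the file content as stored in Git; `values` is an assumed,
    already-parsed mapping such as {"site.<ID>.remote": ["https://..."]}.
    """
    problems = []
    if len(raw) > MAX_FILE_BYTES:
        problems.append(f"file has {len(raw)} bytes (limit {MAX_FILE_BYTES})")
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    for key, occurrences in values.items():
        if len(occurrences) > MAX_VALUES_PER_KEY:
            problems.append(f"{key}: {len(occurrences)} values (limit {MAX_VALUES_PER_KEY})")
        for value in occurrences:
            size = len(value.encode("utf-8"))  # the limit is in bytes, not characters
            if size > MAX_VALUE_BYTES:
                problems.append(f"{key}: value has {size} bytes (limit {MAX_VALUE_BYTES})")
    return problems
```

Measuring values after UTF-8 encoding matters because a value with non-ASCII characters can exceed 1000 bytes while being shorter than 1000 characters.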



@@ 102,7 102,7 @@ Recognized sections and values are described below, and unknown ones SHOULD be i
[ini-file]: https://en.wikipedia.org/wiki/INI_file
    "INI file (Wikipedia)"

-The `[site "<ID>"]` subsection of `_gwit/self.ini` contains some basic information about the gwit site, meant for its readers except where otherwise noted. Its `<ID>` MUST be the identifier of the site itself, encoded as `0x` plus the lower case hexadecimal digits of the full fingerprint of the PGP site key. Example: `[site "0xfedcba98765432100123456789abcdef76543210"]`. Values recognized in the subsection are:
+The `[site "<ID>"]` subsection of `.gwit/self.ini` contains some basic information about the gwit site, meant for its readers except where otherwise noted. Its `<ID>` MUST be the identifier of the site itself, encoded as `0x` plus the lower case hexadecimal digits of the full fingerprint of the PGP site key. Example: `[site "0xfedcba98765432100123456789abcdef76543210"]`. Values recognized in the subsection are:

- `name` (single, recommended): A short name or handle for the site. It MUST NOT (i) be empty or consist only of whitespace characters, (ii) contain newline or control characters, (iii) start with `0x` or `0X`. Example: "Foo Bar".
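
As an illustration of the identifier encoding and the `name` restrictions quoted above, a client might use helpers like these. Both function names are hypothetical, and "control characters" is interpreted here as Unicode category `Cc` (which covers newline, carriage return and tab); the text does not spell out that interpretation, so treat it as an assumption.

```python
import re
import unicodedata


def site_id_from_fingerprint(fingerprint: str) -> str:
    """Encode a full key fingerprint as `0x` plus lower case hex digits."""
    hex_digits = fingerprint.replace(" ", "").lower()
    if not re.fullmatch(r"[0-9a-f]+", hex_digits):
        raise ValueError("fingerprint must consist of hexadecimal digits")
    return "0x" + hex_digits


def is_valid_site_name(name: str) -> bool:
    """Check the three restrictions placed on `site.<ID>.name`."""
    # (i) not empty, not only whitespace
    if not name or name.isspace():
        return False
    # (ii) no newline or control characters (assumed: Unicode category Cc)
    if any(unicodedata.category(ch) == "Cc" for ch in name):
        return False
    # (iii) must not start with `0x` or `0X`
    return not name.lower().startswith("0x")
```

The last restriction presumably keeps names from being mistaken for site identifiers, which is why the check is case-insensitive on the `x`.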



@@ 127,7 127,7 @@ The scope of the different site configuration values is described below:
- `site.<ID>.license`, `site.<ID>.root`, `site.<ID>.index`, `site.<ID>.alt`: The value in a specific version, if defined, SHOULD be applied only to that version.
- `site.<ID>.remote`: The value in the latest site version, if defined, MAY be applied on initial site retrieval and updates. The handling of old values is at the discretion of the gwit client.

-This is a sample `_gwit/self.ini` file using all sections and values:
+This is a sample `.gwit/self.ini` file using all sections and values:

```
[site "0xfedcba98765432100123456789abcdef76543210"]


@@ 149,17 149,17 @@ alt = gemini://foo.example.net/bar-site/

### Site introductions

-A site's `_gwit` directory may also contain **site introductions**, which allow the site author to provide the information needed for the retrieval of other gwit sites. *This is the main means of content discovery in gwit*, thus site authors SHOULD provide such introductions for the sites that they link to.
+A site's `.gwit` directory may also contain **site introductions**, which allow the site author to provide the information needed for the retrieval of other gwit sites. *This is the main means of content discovery in gwit*, thus site authors SHOULD provide such introductions for the sites that they link to.

-An introduction for a given site MUST be contained in the file `_gwit/<ID>.ini`, where `<ID>` is the identifier of the introduced site, encoded as `0x` plus the lower case hexadecimal digits of the full fingerprint of the PGP site key. Example: `_gwit/0x0123456789abcdef0123456789abcdeffedcba98.ini`.
+An introduction for a given site MUST be contained in the file `.gwit/<ID>.ini`, where `<ID>` is the identifier of the introduced site, encoded as `0x` plus the lower case hexadecimal digits of the full fingerprint of the PGP site key. Example: `.gwit/0x0123456789abcdef0123456789abcdeffedcba98.ini`.

-The format and features of a site introduction file are those of a site configuration file (see further above). For introducing a site with identifier `<ID>`, the introduction file MUST contain a `[site "<ID>"]` subsection (the introduction proper), which MUST define at least one `site.<ID>.remote` value. The site identifier in the file name `_gwit/<ID>.ini` MUST match that in the file's `[site "<ID>"]` subsection.
+The format and features of a site introduction file are those of a site configuration file (see further above). For introducing a site with identifier `<ID>`, the introduction file MUST contain a `[site "<ID>"]` subsection (the introduction proper), which MUST define at least one `site.<ID>.remote` value. The site identifier in the file name `.gwit/<ID>.ini` MUST match that in the file's `[site "<ID>"]` subsection.

While the value of `site.<ID>.remote` may be used for retrieving the introduced site, the rest of values may be considered as mere hints (since there is no guarantee that they come from that site's author), and they SHOULD be overridden by the client with the equivalent values of the actual site configuration file, once available locally.

Also note that a gwit client MAY regard an introduction's `site.<ID>.name` as this site author's proposed name for that site (its edge name); as such, the client SHOULD allow configuring a petname value that overrides it along other proposed names for the site.

-This is a sample introduction, stored in the `_gwit/0x0123456789abcdef0123456789abcdeffedcba98.ini` file:
+This is a sample introduction, stored in the `.gwit/0x0123456789abcdef0123456789abcdeffedcba98.ini` file:

```
[site "0x0123456789abcdef0123456789abcdeffedcba98"]


@@ 184,15 184,15 @@ To retrieve a site for the first time, given `<SITE-ID>` as its identifier (a st

1. Clone the Git repository from the given location into temporary storage (e.g. `git clone --bare --branch <SITE-BRANCH> <SITE-LOCATION> <TEMP-REPO> && cd <TEMP-REPO>`).
2. Get the commit at the head of the site branch as `<HEAD-COMMIT>` (e.g. `git show-ref --verify --hash refs/heads/<SITE-BRANCH>`).
-3. Check that `self.key` exists as a file (blob) in the `_gwit` directory of `<HEAD-COMMIT>` (e.g. `git ls-tree --format='%(objecttype) %(objectname)' <HEAD-COMMIT> _gwit/self.key` reports `blob <KEY-FILE-HASH>`).
-4. Check that the fingerprint of the primary PGP key in `_gwit/self.key` is equal to `<SITE-ID>` (case-insensitively) (e.g. `git cat-file blob <KEY-FILE-HASH> | gpg --show-keys --with-fingerprint --with-colons | grep -A1 '^pub:' | grep -qiE '^fpr:+<SITE-ID>:$'`).
-5. Import `_gwit/self.key` into the client's keyring (e.g. `git cat-file blob <KEY-FILE-HASH> | gpg --homedir <CLIENT-GPG-DIR> --import`).
+3. Check that `self.key` exists as a file (blob) in the `.gwit` directory of `<HEAD-COMMIT>` (e.g. `git ls-tree --format='%(objecttype) %(objectname)' <HEAD-COMMIT> .gwit/self.key` reports `blob <KEY-FILE-HASH>`).
+4. Check that the fingerprint of the primary PGP key in `.gwit/self.key` is equal to `<SITE-ID>` (case-insensitively) (e.g. `git cat-file blob <KEY-FILE-HASH> | gpg --show-keys --with-fingerprint --with-colons | grep -A1 '^pub:' | grep -qiE '^fpr:+<SITE-ID>:$'`).
+5. Import `.gwit/self.key` into the client's keyring (e.g. `git cat-file blob <KEY-FILE-HASH> | gpg --homedir <CLIENT-GPG-DIR> --import`).
6. Check that `<HEAD-COMMIT>` has a valid signature by the key that matches `<SITE-ID>` (case-insensitively), or by a subkey of it (e.g. `git verify-commit --raw <HEAD-COMMIT> 2>&1 | sed -nE 's/^\[GNUPG:\] VALIDSIG .*\b(\S+)$/\1/p'` reports `<SITE-ID>`).
7. Save the temporary clone into persistent client storage.

Any error or failed check in the previous steps would cause the process to stop at the current step, discard any temporary data, and report an error.
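
The signature check in step 6 relies on the `VALIDSIG` status line that GnuPG emits, whose last field is the fingerprint of the primary key. A rough Python counterpart of the `sed` expression could look like the sketch below. The function names are hypothetical, the sample status line in the usage note is made up for illustration, and the helper tolerates an optional `0x` prefix on the expected fingerprint, which is an assumption since the exact form of `<SITE-ID>` in these steps is not visible in this diff.

```python
from typing import Optional


def primary_fingerprint(status_output: str) -> Optional[str]:
    """Return the last field of the first `[GNUPG:] VALIDSIG` line, if any."""
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "[GNUPG:]" and fields[1] == "VALIDSIG":
            return fields[-1]  # fingerprint of the primary key
    return None


def _bare_hex(fingerprint: str) -> str:
    """Lower-case a fingerprint and drop an optional `0x` prefix (assumption)."""
    fingerprint = fingerprint.lower()
    return fingerprint[2:] if fingerprint.startswith("0x") else fingerprint


def signed_by_site_key(status_output: str, site_fingerprint: str) -> bool:
    """Compare the reported primary key with the site key, case-insensitively."""
    reported = primary_fingerprint(status_output)
    return reported is not None and _bare_hex(reported) == _bare_hex(site_fingerprint)
```

For example, feeding it a status line such as `[GNUPG:] VALIDSIG 0123456789ABCDEF0123456789ABCDEFFEDCBA98 2024-01-01 1704067200 0 4 0 22 8 00 FEDCBA98765432100123456789ABCDEF76543210` would match a site whose key fingerprint is `fedcba98765432100123456789abcdef76543210`, even though the commit was signed by a subkey.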

-After the previous steps, the client MAY access the `_gwit/self.ini` file in the head of the site branch (e.g. `git cat-file blob <HEAD-COMMIT>:_gwit/self.ini`) and apply any relevant configuration values (see further above).
+After the previous steps, the client MAY access the `.gwit/self.ini` file in the head of the site branch (e.g. `git cat-file blob <HEAD-COMMIT>:.gwit/self.ini`) and apply any relevant configuration values (see further above).

**Note:** Example commands using `git verify-commit --raw <COMMIT>` report the fingerprint of the *primary key* of the key used to sign the commit. An alternative approach would be to get the signing key (e.g. `git show --no-patch --format=format:%GK <COMMIT>` as `<SIG-KEY>`), check that it is (a subkey of) the key that matches `<SITE-ID>` (e.g. `gpg --homedir <CLIENT-GPG-DIR> --list-keys --with-fingerprint --with-colons <SIG-KEY> | grep -A1 '^pub:' | grep -qiE '^fpr:+<SITE-ID>:$'`), then just run `git verify-commit <COMMIT>`.



@@ 206,14 206,14 @@ If someone wants to retrieve updates to a gwit site identified by `<SITE-ID>` fo
2. Try to fetch new objects from `<REMOTE>` (e.g. `git fetch --atomic --no-write-fetch-head <REMOTE> '+refs/heads/*:refs/remotes/<REMOTE>/*'`; this preserves all fetch heads for each remote).
3. Get the commit hash of the new head as `<NEW-HEAD>` (e.g. `git show-ref --verify --hash refs/remotes/<REMOTE>/<SITE-BRANCH>`).
4. Check that `<NEW-HEAD>` is not an ancestor of the current head (e.g. not `git merge-base --is-ancestor <NEW-HEAD> <OLD-HEAD>`). If it is, then `<REMOTE>` does not contain newer content.
-5. Update the site key (e.g. to allow new signing subkeys) by importing the `_gwit/self.key` file into the client's keyring (e.g. `git cat-file blob <NEW-HEAD>:_gwit/self.key | gpg --homedir <CLIENT-GPG-DIR> --import-options merge-only --import`).
+5. Update the site key (e.g. to allow new signing subkeys) by importing the `.gwit/self.key` file into the client's keyring (e.g. `git cat-file blob <NEW-HEAD>:.gwit/self.key | gpg --homedir <CLIENT-GPG-DIR> --import-options merge-only --import`).
6. Check that `<NEW-HEAD>` has a valid signature by the key that matches `<SITE-ID>` (case-insensitively), or by a subkey of it (e.g. `git verify-commit --raw <NEW-HEAD> 2>&1 | sed -nE 's/^\[GNUPG:\] VALIDSIG .*\b(\S+)$/\1/p'` reports `<SITE-ID>`).
7. If the current head is not an ancestor of `<NEW-HEAD>` (e.g. not `git merge-base --is-ancestor <OLD-HEAD> <NEW-HEAD>`), then `<REMOTE>` contains a **site history rewrite**. This scenario is supported by the specification, and this step may or may not succeed depending on different conditions (see further below).
8. Update the head of `<SITE-BRANCH>` in the clone to `<NEW-HEAD>` (e.g. `git update-ref refs/heads/<SITE-BRANCH> <NEW-HEAD>`).

Any error or failed check in the previous steps would cause the process to stop at the current step, discard any temporary data, and report an error. If the Git clone includes additional remotes, the client MAY choose to repeat the procedure with another one in case of error, or to look for newer content.
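
The two ancestry checks in steps 4 and 7 of the update procedure amount to a three-way classification, which might be sketched as follows. `UpdateKind`, `is_ancestor` and `classify_update` are hypothetical names; `is_ancestor` simply wraps the `git merge-base --is-ancestor` command quoted in the steps and assumes `git` is on the path.

```python
import subprocess
from enum import Enum


class UpdateKind(Enum):
    NO_NEWER_CONTENT = "no newer content"
    FAST_FORWARD = "fast-forward"
    HISTORY_REWRITE = "site history rewrite"


def is_ancestor(repo: str, maybe_ancestor: str, commit: str) -> bool:
    """Run `git merge-base --is-ancestor`; exit status 0 means yes, 1 means no."""
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", maybe_ancestor, commit]
    )
    if result.returncode not in (0, 1):
        raise RuntimeError("git merge-base failed")
    return result.returncode == 0


def classify_update(new_is_ancestor_of_old: bool, old_is_ancestor_of_new: bool) -> UpdateKind:
    """Combine step 4 (is the remote stale?) and step 7 (was history rewritten?)."""
    if new_is_ancestor_of_old:
        # Step 4: the remote head is already contained in the current history.
        return UpdateKind.NO_NEWER_CONTENT
    if old_is_ancestor_of_new:
        # Step 7 does not trigger: the new head extends the current history.
        return UpdateKind.FAST_FORWARD
    return UpdateKind.HISTORY_REWRITE
```

Note that a commit counts as its own ancestor for `git merge-base --is-ancestor`, so an unchanged head falls under "no newer content".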

-After the previous steps, the client MAY access the `_gwit/self.ini` file in the `<NEW-HEAD>` commit (e.g. `git cat-file blob <NEW-HEAD>:_gwit/self.ini`) and apply any relevant configuration values (see further above). In particular, a change in `site.<SITE-ID>.remote` MAY trigger another update with the new value (e.g. after `git remote set-url origin <NEW-REMOTE>`).
+After the previous steps, the client MAY access the `.gwit/self.ini` file in the `<NEW-HEAD>` commit (e.g. `git cat-file blob <NEW-HEAD>:.gwit/self.ini`) and apply any relevant configuration values (see further above). In particular, a change in `site.<SITE-ID>.remote` MAY trigger another update with the new value (e.g. after `git remote set-url origin <NEW-REMOTE>`).

### Site history rewrites



@@ 229,9 229,9 @@ Although site history rewrites (and subsequent cleanups) should be accepted in t

- As a general protection measure, a gwit client SHOULD retrieve content from other clones using the mechanisms described above (instead of copying their content straight into its own storage), as they may contain malicious hooks, tags, branches and others.

-- OpenPGP implementations like GnuPG require that keys be imported into a keyring before using them to verify signatures, which means that `_gwit/self.key` must be imported before verifying its own authenticity on initial site retrieval and updates. A gwit client MAY perform extra verifications on `_gwit/self.key` (e.g. with `gpg --show-keys`) before importing it, or it MAY set a temporary keyring (e.g. via GnuPG's `GNUPGHOME` environment variable) to import `_gwit/self.key` and verify commit signatures (initial retrieval steps 5-6 and site update steps 5-6), then import `_gwit/self.key` again into the client's keyring if the verification succeeded.
+- OpenPGP implementations like GnuPG require that keys be imported into a keyring before using them to verify signatures, which means that `.gwit/self.key` must be imported before verifying its own authenticity on initial site retrieval and updates. A gwit client MAY perform extra verifications on `.gwit/self.key` (e.g. with `gpg --show-keys`) before importing it, or it MAY set a temporary keyring (e.g. via GnuPG's `GNUPGHOME` environment variable) to import `.gwit/self.key` and verify commit signatures (initial retrieval steps 5-6 and site update steps 5-6), then import `.gwit/self.key` again into the client's keyring if the verification succeeded.

-- Depending on the implementation of Git, some operations expecting a commit or object name (hash) may instead act upon a tag or branch with the same name. This behavior may allow certain attacks, e.g. the site author may craft signed tags to avoid history rewrite detection in a client when retrieving site updates, or to trick a client into importing a `_gwit/self.key` file in a commit different from the head of the site branch on initial site retrieval; other attackers may insert unsigned tags or branches in their public clones that cause errors in clients using them as remotes.
+- Depending on the implementation of Git, some operations expecting a commit or object name (hash) may instead act upon a tag or branch with the same name. This behavior may allow certain attacks, e.g. the site author may craft signed tags to avoid history rewrite detection in a client when retrieving site updates, or to trick a client into importing a `.gwit/self.key` file in a commit different from the head of the site branch on initial site retrieval; other attackers may insert unsigned tags or branches in their public clones that cause errors in clients using them as remotes.

  As a way to fend off these attacks, clients SHOULD warn about and remove Git tags and branches with names matching the format of a SHA-1 or SHA-256 hash (40 or 64 hexadecimal digits, lower or upper case) right after cloning a repository (initial retrieval step 1) or fetching new objects (site update step 2), as those tags and branches are certainly malicious.
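
  Detecting such hash-like names is a simple pattern match. The sketch below assumes the client has already listed full ref names (for instance from `git for-each-ref`); `suspicious_refs` is a hypothetical helper that only handles `refs/tags/` and `refs/heads/`, so remote-tracking refs would need analogous treatment.

```python
import re

# 40 (SHA-1) or 64 (SHA-256) hexadecimal digits, lower or upper case.
HASH_LIKE = re.compile(r"[0-9a-fA-F]{40}|[0-9a-fA-F]{64}")


def suspicious_refs(ref_names: list[str]) -> list[str]:
    """Return the tags and branches whose short name looks like an object hash."""
    flagged = []
    for ref in ref_names:
        for prefix in ("refs/tags/", "refs/heads/"):
            if ref.startswith(prefix) and HASH_LIKE.fullmatch(ref[len(prefix):]):
                flagged.append(ref)
                break
    return flagged
```

  Using `fullmatch` avoids flagging names that merely contain a long run of hexadecimal digits, such as a 41-digit tag.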



@@ 298,7 298,7 @@ A client MAY display a petname-decorated version of a gwit URI. Such representat

As mentioned further above, a gwit client may learn the self-proposed name of a site from its configuration file, as well as the edge names of introduced sites. In that case, it should also allow to set a different petname for any such site.

-For instance, Alice retrieves Bob's site (with ID `<BOB-ID>`) for the first time using her gwit client. That site's `_gwit/self.ini` file sets `Bob's site` as the value of `site.<BOB-ID>.name`; the site also contains an introduction of Carol's site (with ID `<CAROL-ID>`) having `This is Carol` as the value of `site.<CAROL-ID>.name`.
+For instance, Alice retrieves Bob's site (with ID `<BOB-ID>`) for the first time using her gwit client. That site's `.gwit/self.ini` file sets `Bob's site` as the value of `site.<BOB-ID>.name`; the site also contains an introduction of Carol's site (with ID `<CAROL-ID>`) having `This is Carol` as the value of `site.<CAROL-ID>.name`.

Alice's gwit client follows the petname implementation hints described in the paper [Implementation of a petnames system in an existing chat application][petnames-impl]. Thus, when the client finds a link to



@@ 343,7 343,7 @@ Once the client has established the value of `<COMMIT>`, it MUST check that `<CO

The client MUST then resolve the path `<PATH>` in the URI (which has already been percent-decoded if necessary) to a file or directory in the Git tree associated with the commit `<COMMIT>`, by following the steps below, so as to produce some output:

-1. If `_gwit/self.ini` exists as a file (blob) in the desired commit `<COMMIT>` (e.g. `git ls-tree --format='%(objecttype) %(objectname)' <COMMIT> _gwit/self.ini` succeeds and reports `blob <CONF-FILE-HASH>`), then parse it (e.g. `git cat-file blob <CONF-FILE-HASH> | git config -f- …`). If it does not exist, treat site configuration as empty for the next steps.
+1. If `.gwit/self.ini` exists as a file (blob) in the desired commit `<COMMIT>` (e.g. `git ls-tree --format='%(objecttype) %(objectname)' <COMMIT> .gwit/self.ini` succeeds and reports `blob <CONF-FILE-HASH>`), then parse it (e.g. `git cat-file blob <CONF-FILE-HASH> | git config -f- …`). If it does not exist, treat site configuration as empty for the next steps.
2. Compute `<RELPATH>` by replacing repetitions of the forward slash (`/`) in `<PATH>` by a single slash, then removing leading and trailing slashes, then removing dot segments according to the `remove_dot_segments` algorithm described in Section 5.2.4 of RFC3986 (e.g. `/foo//../bar/` becomes `bar`).

   The resulting `<RELPATH>` is relative to the site's root directory `<ROOT>` (as per site configuration) and either empty (meaning `<ROOT>` itself), or it consists of one or more non-empty path components separated by a single slash (for other files or directories).
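
   The computation of `<RELPATH>` in step 2 might be sketched as below. This is a component-wise approximation rather than a literal transcription of `remove_dot_segments`: it assumes that, once leading and trailing slashes are gone, dropping `.` components and letting `..` cancel the preceding component (or nothing, at the top) yields the intended result. `compute_relpath` is a hypothetical name.

```python
def compute_relpath(path: str) -> str:
    """Normalize an already percent-decoded URI path into `<RELPATH>`."""
    components: list[str] = []
    for part in path.split("/"):
        if part in ("", "."):
            # Empty parts come from repeated, leading or trailing slashes.
            continue
        if part == "..":
            # Go up one level, but never above the site root (assumption).
            if components:
                components.pop()
            continue
        components.append(part)
    return "/".join(components)


print(compute_relpath("/foo//../bar/"))  # bar
```

   An empty result stands for the root directory itself, matching the description of `<RELPATH>` above.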


@@ 360,9 360,9 @@ When producing or displaying contents on URI retrieval, the gwit client MAY make

## Appendix: Enabling discovery of combined sites via Well-Known URIs

-One of gwit's goals is to make existing Web or Gemini static sites easy to publish in parallel as gwit sites. This may be as simple as distributing site files in a Git repository, along with `_gwit/self.key` and `_gwit/self.ini` files, and using the key in `_gwit/self.key` to sign commits.
+One of gwit's goals is to make existing Web or Gemini static sites easy to publish in parallel as gwit sites. This may be as simple as distributing site files in a Git repository, along with `.gwit/self.key` and `.gwit/self.ini` files, and using the key in `.gwit/self.key` to sign commits.

-For a more seamless integration, it should be possible to use the other protocols supported by such a **combined site** to both identify it as such and get the information needed to then access it over gwit. This information may be found in the files in the `_gwit` directory. However, since this is always found in the Git repository's top directory, if the site is configured in the other protocol to use some subdirectory `<SITE-ROOT>` as a root, then those files may not be available via the other protocol's URIs.
+For a more seamless integration, it should be possible to use the other protocols supported by such a **combined site** to both identify it as such and get the information needed to then access it over gwit. This information may be found in the files in the `.gwit` directory. However, since this is always found in the Git repository's top directory, if the site is configured in the other protocol to use some subdirectory `<SITE-ROOT>` as a root, then those files may not be available via the other protocol's URIs.

A Well-Known URI ([RFC8615][]) MAY be used to provide such site metadata, accessible via the other protocol's `/.well-known/gwit.ini` URI path, mapping to the repository file `<SITE-ROOT>/.well-known/gwit.ini`. The format and features of this file are those of a site introduction file (see further above), where the site introduces itself. The file MUST contain exactly one `[site "<ID>"]` subsection. As with any introduction, the only truly relevant pieces of information are the site ID and the value(s) of `site.<ID>.remote` (e.g. `git config -f … --get-regexp '^site\.0x[0-9a-f]+\.remote$'`).



@@ 377,4 377,4 @@ remote = https://git.example.net/foo/bar-site.git
remote = https://lab.example.org/foo-mirror/bar-site.git
```

-Since the same values of `site.<ID>.remote` may also appear in a site's configuration file `_gwit/self.ini`, a site author may make `<SITE-ROOT>/.well-known/gwit.ini` a relative symbolic link to the former to avoid duplicating information among both files.
+Since the same values of `site.<ID>.remote` may also appear in a site's configuration file `.gwit/self.ini`, a site author may make `<SITE-ROOT>/.well-known/gwit.ini` a relative symbolic link to the former to avoid duplicating information among both files.