~ivilata/gwit-spec

aca7fb3a6e55a7fe613f87e67b9658f80878fcdb — Ivan Vilata-i-Balaguer 1 year, 1 month ago 9a7cfd1
Minor corrections and enhancements for readability.
1 files changed, 5 insertions(+), 5 deletions(-)

M README.md
M README.md => README.md +5 -5
@@ 108,7 108,7 @@ The `[site "<ID>"]` subsection of `_gwit/self.ini` contains some basic informati
- `desc` (single, optional): A longer text (in an unspecified language) describing the site, maybe over several lines or paragraphs. Its encoding MUST NOT exceed 4000 bytes.
- `desc-<LANGUAGE>` (single, optional): A language-specific description for the site, with the same characteristics as `desc`. `<LANGUAGE>` MUST be a two-letter ISO 639-1 language code.
- `license` (single, recommended): A short text hinting about the legal terms of use for the site, if meaningful. It MUST NOT contain newline characters. Example: "CC-BY-4.0" (meaning "Creative Commons Attribution 4.0 International" as per the [SPDX License List][spdx-licenses]).
-- `root` (single, optional): A directory to be used as the site's **root directory** instead of, and relative to, the commit's top directory. If missing, it defaults to that top directory. It MUST consist of one or more non-empty path components separated by a single forward slash (`/`). It MUST NOT contain `.` or `..` path components. Convenient when using a static site generator that writes its output to a directory. Example (for a site containing Gemini files): `output`.
+- `root` (single, optional): A directory to be used as the site's **root directory** instead of, and relative to, the commit's top directory. If missing, it defaults to that top directory. It MUST consist of one or more non-empty path components separated by a single forward slash (`/`). It MUST NOT contain `.` or `..` path components. This is convenient when using a static site generator that writes its output to a directory. Example (for such a generated site): `output`.
- `index` (single, optional): The name of the **index file**. It MUST NOT be empty, `.` or `..`, or contain slash characters (`/`). When a gwit client is told to retrieve a directory, and it contains a file named as the index file, the contents of the file SHOULD be produced instead of a directory listing. Example (for a site containing Gemini files): `index.gmi`.
- `remote` (multiple, recommended): A location recommended by the author for retrieving the site, the URL of a Git remote. Multiple such locations may be given (for increased availability), each as a different `remote` value, which a client MAY consider in order of appearance. Example: `https://git.example.net/foo/bar-site.git`.
- `alt` (multiple, optional): If given, the prefix for this site's URIs in a publication system other than gwit. The gwit client MAY interpret links in this site using those prefixes as if they began with a single slash (`/`) instead of the prefix and subsequent slashes. This enables reusing site contents in gwit without needing to adapt local absolute links. Multiple such prefixes may be given, each as a different `alt` value. Example: `https://foo.example.net/bar/` enables rewriting `https://foo.example.net/bar//page.html` to `/page.html`.
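
The constraints on `root` and `index`, and the link rewriting that `alt` enables, can be illustrated with a minimal Python sketch. The function names (`is_valid_root`, `is_valid_index`, `rewrite_alt_link`) are hypothetical and not part of the specification, and the rewriting follows a literal reading of the `alt` description above; a real client may need additional checks.

```python
from typing import Iterable


def is_valid_root(root: str) -> bool:
    """Check a `root` value: non-empty components separated by single
    slashes, with no `.` or `..` component."""
    # Splitting on "/" yields an empty component for a leading, trailing
    # or doubled slash, and for the empty string itself.
    return all(part not in ("", ".", "..") for part in root.split("/"))


def is_valid_index(index: str) -> bool:
    """Check an `index` value: not empty, not `.` or `..`, no slashes."""
    return index not in ("", ".", "..") and "/" not in index


def rewrite_alt_link(link: str, alt_prefixes: Iterable[str]) -> str:
    """Rewrite a link that starts with one of the `alt` prefixes so that it
    begins with a single slash instead of the prefix and following slashes.
    Links that match no prefix are returned unchanged."""
    for prefix in alt_prefixes:
        if prefix and link.startswith(prefix):
            return "/" + link[len(prefix):].lstrip("/")
    return link
```

With the `alt` example above, `rewrite_alt_link("https://foo.example.net/bar//page.html", ["https://foo.example.net/bar/"])` returns `/page.html`.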


@@ 174,7 174,7 @@ If someone wants to use a client program to retrieve a gwit site for the first t
- The site identifier, i.e. the site key fingerprint. This MUST be a string of hexadecimal digits.
- The location of an existing copy of the site, accessible to it (either locally or remotely). This MUST be a local file system path or other URL format supported by Git for a remote.

-These ID/location pairs may be conveyed to the client via different methods (like person-to-person, search engines, or site directories), however this specification only covers a discovery mechanism (described further above) where each site can provide a number of introductions for other sites with their respective ID and location. At any rate, the choice among a variety of available locations for the initial retrieval of a particular site is up to the implementation.
+These ID/location pairs may be conveyed to the client via different methods (like person-to-person, search engines, or site directories), however this specification only covers the discovery mechanism described further above, where each site can provide a number of introductions for other sites with their respective ID and location. At any rate, the choice among a variety of available locations for the initial retrieval of a particular site is up to the implementation.

To retrieve a site for the first time, given `<SITE-ID>` as its identifier, `<SITE-LOCATION>` as its location, and `<SITE-BRANCH>` as its branch (derived from `<SITE-ID>` as described further above), the gwit client MUST clone the Git repository at `<SITE-LOCATION>` and verify that the head of the site branch is signed by the key matching `<SITE-ID>`. An implementation may follow the steps below, or some others with equivalent results:



@@ 244,7 244,7 @@ with parts in square brackets being optional, and where

- `<SITE>` indicates the target gwit site. It is the site identifier, encoded as a string of hexadecimal digits (case-insensitive) prefixed with `0x` or `0X`. Shortened versions of site key fingerprints (as accepted as key identifiers by some PGP implementations) MUST NOT be allowed, as they would weaken site authentication and open up attack vectors (esp. on initial retrieval).

-  Links found inside of a gwit site may also use the string `self` (case-insensitive) for `<SITE>`, which allows the site to easily link to a particular version of itself (i.e. `gwit://<VERSION>@self<PATH>…`). When parsing such a URI, a gwit client MUST first replace `self` with the site identifier as described above. URIs using `self` MUST NOT be allowed outside of a site, and a gwit client SHOULD replace `self` with the site identifier when exporting them (e.g. when copying the URI to the clipboard).
+  Links found inside of a gwit site may also use the string `self` (case-insensitive) for `<SITE>`, which allows the site to easily link to a particular version of itself (i.e. `gwit://<VERSION>@self<PATH>…`). When parsing such a URI, a gwit client MUST first replace `self` with the site identifier as described above. A URI using `self` MUST NOT be allowed outside of a site, and a gwit client SHOULD replace `self` with the site identifier when exporting it (e.g. when copying the URI to the clipboard).
- `<VERSION>`, when present and not empty, specifies a particular version of the target site. It is the object name (hash) of a Git commit in the site's history, encoded as a string of hexadecimal digits (case-insensitive). The name may be shortened by removing characters from its end, but this may cause content retrieval to fail if the client's Git clone of the site contains several commits with that same shortened name.

  When `<VERSION>` is missing or empty, the URI refers to whatever site version is most recent to a client when it accesses the site (i.e. the head of the site branch in the client's Git clone of the site).


@@ 262,7 262,7 @@ A link consisting of a URI with both site identifier and full version hash is ca
Some URI examples:

- `gwit://0x0123456789abcdef0123456789abcdeffedcba98/` links to the root directory of the latest known version of the site.
-- `gwit://0x0123456789abcdef0123456789abcdeffedcba98/posts.html#latest` links to the element with ID `latest` in the file `posts.html` of the latest known version of the site.
+- `gwit://0x0123456789abcdef0123456789abcdeffedcba98/posts.html#latest` links to the HTML element with ID `latest` in the file `posts.html` of the latest known version of the site.
- `gwit://9c359d88d4882d17d673a7fb89c9af8349a4fb7c@0x0123456789abcdef0123456789abcdeffedcba98/breaking-news.gmi` is a permanent link to the file `breaking-news.gmi` of version (Git commit) `9c359d88d4882d17d673a7fb89c9af8349a4fb7c` of the site.
- `gwit://9c359d88@0x0123456789abcdef0123456789abcdeffedcba98/tag/cats/` links to the directory `tag/cats` in the same version of the site as above (in shortened notation, thus not a permanent link).
- `gwit://v1.0@0x0123456789abcdef0123456789abcdeffedcba98/NEWS.txt`, with `v1.0` being a signed tag, links to the file `NEWS.txt` in the version of the site pointed by that tag.
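
As an illustration of how a client might split such URIs, here is a minimal Python sketch. The names `GwitUri` and `parse_gwit_uri` are hypothetical. The sketch assumes a 40-digit fingerprint (as in the examples above) and leaves everything after `<SITE>` unparsed in `rest`. It is not a complete or normative parser.

```python
import re
from typing import NamedTuple, Optional


class GwitUri(NamedTuple):
    site: str               # site identifier, normalized to lowercase
    version: Optional[str]  # None when missing or empty (latest known version)
    rest: str               # path and anything after it, left unparsed


# gwit://[<VERSION>@]<SITE><rest>
_URI = re.compile(
    r"gwit://(?:(?P<version>[^@/?#]*)@)?(?P<site>[^@/?#]+)(?P<rest>.*)",
    re.IGNORECASE | re.DOTALL,
)
# Assumption: a full site ID is `0x` followed by 40 hexadecimal digits.
_FULL_ID = re.compile(r"0x[0-9a-f]{40}", re.IGNORECASE)


def parse_gwit_uri(uri: str, current_site: Optional[str] = None) -> GwitUri:
    """Split a gwit URI. `current_site` is the ID of the site the link was
    found in, or None when the URI comes from outside of any site."""
    match = _URI.fullmatch(uri)
    if match is None:
        raise ValueError("not a gwit URI")
    site = match.group("site")
    if site.lower() == "self":
        # `self` is only meaningful inside a site; replace it first.
        if current_site is None:
            raise ValueError("`self` is not allowed outside of a site")
        site = current_site
    if _FULL_ID.fullmatch(site) is None:
        # Rejects shortened fingerprints, among other malformed IDs.
        raise ValueError("site ID must be a full fingerprint")
    return GwitUri(site.lower(), match.group("version") or None, match.group("rest"))
```

For instance, parsing `gwit://9c359d88@0x0123456789abcdef0123456789abcdeffedcba98/tag/cats/` yields the version `9c359d88` and the rest `/tag/cats/`, while a URI with a shortened site ID raises `ValueError`.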


@@ 358,7 358,7 @@ One of gwit's goals is to make existing Web or Gemini static sites easy to publi

For a more seamless integration, it should be possible to use the other protocols supported by such a **combined site** to both identify it as such and get the information needed to access it over gwit. However, since the `_gwit` directory is always found in the Git repository's top directory, if the site is configured to use a different root directory (i.e. `site.<ID>.root` in `_gwit/self.ini`), those files may not be available via URIs.

-A Well-Known URI ([RFC8615][]) MAY be used to provide such site metadata, accessible via the `/.well-known/gwit.ini` URI path, mapping to the repository file `<SITE-ROOT>/.well-known/gwit.ini`. The format and features of this file are those of a site introduction file (see further above), where the site introduces itself: there MUST be exactly one `[site "<ID>"]` subsection. As with any introduction, the only truly relevant pieces of information are the site ID and the value(s) of `site.<ID>.remote` (e.g. `git config -f … --get-regexp '^site\.0x[0-9a-f]+\.remote$'`).
+A Well-Known URI ([RFC8615][]) MAY be used to provide such site metadata, accessible via the `/.well-known/gwit.ini` URI path, mapping to the repository file `<SITE-ROOT>/.well-known/gwit.ini`. The format and features of this file are those of a site introduction file (see further above), where the site introduces itself. The file MUST contain exactly one `[site "<ID>"]` subsection. As with any introduction, the only truly relevant pieces of information are the site ID and the value(s) of `site.<ID>.remote` (e.g. `git config -f … --get-regexp '^site\.0x[0-9a-f]+\.remote$'`).
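
As a rough alternative to the `git config` invocation, the site ID and its `remote` values could be read with a small Python sketch like the one below. The function name `read_self_introduction` is hypothetical, and the parser handles only a simplified subset of the Git configuration syntax (no quoting, escapes, or line continuations), so it is an illustration rather than a replacement for Git's own parser.

```python
import re
from typing import Dict, List, Tuple

_SITE_SECTION = re.compile(r'\[\s*site\s+"(?P<id>0x[0-9a-f]+)"\s*\]', re.IGNORECASE)
_ANY_SECTION = re.compile(r"\[.*\]")
_KEY_VALUE = re.compile(r"(?P<key>[A-Za-z][A-Za-z0-9-]*)\s*=\s*(?P<value>.*)")


def read_self_introduction(text: str) -> Tuple[str, List[str]]:
    """Return (site ID, remotes) from the contents of a `gwit.ini` file.
    Raises ValueError unless exactly one `[site "<ID>"]` subsection exists."""
    sites: Dict[str, List[str]] = {}
    current = None  # list of remotes for the subsection being read, if any
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        section = _SITE_SECTION.fullmatch(line)
        if section:
            current = sites.setdefault(section.group("id").lower(), [])
            continue
        if _ANY_SECTION.fullmatch(line):
            current = None  # some other section: ignore its keys
            continue
        entry = _KEY_VALUE.fullmatch(line)
        if entry and current is not None and entry.group("key").lower() == "remote":
            current.append(entry.group("value").strip())
    if len(sites) != 1:
        raise ValueError("expected exactly one site subsection")
    (site_id, remotes), = sites.items()
    return site_id, remotes
```

The remotes are returned in order of appearance, which matches the order a client MAY consider them in.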

[RFC8615]: https://www.rfc-editor.org/rfc/rfc8615.html
    "Well-Known Uniform Resource Identifiers (URIs) (RFC 8615)"