~ivilata/gwit-spec

4cd66cdf816ac0d18550d712668f6a0bf71b19ad — Ivan Vilata-i-Balaguer 25 days ago 28b3593
Be more precise and consistent about full format of commit hashes.

Clarify that the precise format depends on which type of hashes is used by the
particular Git repository.
1 files changed, 3 insertions(+), 3 deletions(-)

M README.md
M README.md => README.md +3 -3
@@ 233,7 233,7 @@ Although site history rewrites (and subsequent cleanups) should be accepted in t

- Depending on the implementation of Git, some operations expecting a commit or object name (hash) may instead act upon a tag or branch with the same name. This behavior may allow certain attacks, e.g. the site author may craft signed tags to avoid history rewrite detection in a client when retrieving site updates, or to trick a client into importing a `.gwit/self.key` file in a commit different from the head of the site branch on initial site retrieval; other attackers may insert unsigned tags or branches in their public clones that cause errors in clients using them as remotes.

  As a way to fend off these attacks, clients SHOULD warn about and remove Git tags and branches with names matching the format of a SHA-1 or SHA-256 hash (40 or 64 hexadecimal digits, lower or upper case) right after cloning a repository (initial retrieval step 1) or fetching new objects (site update step 2), as those tags and branches are certainly malicious.
  As a way to fend off these attacks, clients SHOULD warn about and remove Git tags and branches with names matching the format of hashes used by the repository (either 40 or 64 hexadecimal digits for SHA-1 or SHA-256, lower or upper case) right after cloning it (initial retrieval step 1) or fetching new objects (site update step 2), as those tags and branches are certainly malicious.
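
As an illustration of the check described above, here is a minimal Python sketch that picks out tag and branch references whose short name has the format of a full hash for the repository's hash type. The function name `hash_like_refs`, the `HASH_LENGTHS` table and the `object_format` parameter are hypothetical and not part of the specification. The sketch assumes that the client has already listed the full reference names and determined the repository's object format (for instance with `git rev-parse --show-object-format` in recent Git versions).

```python
import re

# Number of hexadecimal digits in a full hash, per Git object format.
HASH_LENGTHS = {"sha1": 40, "sha256": 64}


def hash_like_refs(ref_names, object_format="sha1"):
    """Return the tag and branch refs whose short name looks like a full hash.

    `ref_names` is an iterable of full reference names such as
    "refs/tags/v1.0" or "refs/heads/main" (an assumption of this sketch).
    """
    pattern = re.compile(r"[0-9a-fA-F]{%d}" % HASH_LENGTHS[object_format])
    suspicious = []
    for ref in ref_names:
        for prefix in ("refs/tags/", "refs/heads/"):
            # Compare only the short name that follows the prefix.
            if ref.startswith(prefix) and pattern.fullmatch(ref[len(prefix):]):
                suspicious.append(ref)
    return suspicious
```

A client could then warn about each returned reference and delete it before going on with retrieval or update. How the deletion is performed is left out of this sketch.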

## URI format



@@ 263,7 263,7 @@ with parts in square brackets being optional, and where
  Except for the root directory, a component in `<PATH>` is a non-empty sequence of bytes that map directly to the bytes of its associated path name in a Git tree object. Bytes in the component corresponding to non-ASCII characters or reserved URI characters MUST be percent-encoded (as per Sections 2.1 and 2.2 of RFC3986). For instance, the Git path name `foo+f\xFCr+bar` would be encoded as `foo%2Bf%FCr%2Bbar` in a gwit URI. Clients SHOULD NOT further encode or decode the byte sequences with other encodings (like UTF-8 to get a Unicode string).
- `<FRAGMENT>`, when present and not empty, indicates some secondary resource inside of the primary resource referenced by the rest of the URI. Its interpretation is up to the gwit client, agreeing with standards applicable to the particular media type.
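
The percent-encoding of a `<PATH>` component can be sketched in Python with the standard `urllib.parse` module. The helper names `encode_component` and `decode_component` are hypothetical. Note that `quote` with `safe=""` encodes every byte outside the unreserved set of RFC3986, which is somewhat broader than the minimum required above, so this is only one possible approach.

```python
from urllib.parse import quote, unquote_to_bytes


def encode_component(raw: bytes) -> str:
    """Percent-encode the raw bytes of one Git path name component."""
    # safe="" also encodes "/", so each component must be encoded separately.
    return quote(raw, safe="")


def decode_component(encoded: str) -> bytes:
    """Recover the raw bytes of a component, without decoding them further."""
    return unquote_to_bytes(encoded)


# The example from the text: bytes in, ASCII-only URI component out.
print(encode_component(b"foo+f\xfcr+bar"))  # foo%2Bf%FCr%2Bbar
```

Both helpers work on bytes rather than on Unicode strings, which follows the recommendation not to re-encode or decode the byte sequences with other encodings such as UTF-8.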

A link consisting of a URI with both site identifier and full commit hash is called a **permanent link**, as it uniquely identifies an exact, immutable object in a site regardless of subsequent updates to it. These links may be preferred for certain applications like long-term archival or citation. However, a site history rewrite (see further above) may render such a link unavailable to others, so applications relying on permanent links may want to handle rewrites in a special manner.
A link consisting of a URI with both a site identifier and a full commit hash as used in the particular Git repository (either 40 or 64 hexadecimal digits for SHA-1 or SHA-256, lower or upper case) is called a **permanent link**, as it uniquely identifies an exact, immutable object in a site regardless of subsequent updates to it. These links may be preferred for certain applications like long-term archival or citation. However, a site history rewrite (see further above) may render such a link unavailable to others, so applications relying on permanent links may want to handle rewrites in a special manner.

Some URI examples:



@@ 329,7 329,7 @@ Let `gwit://[<VERSION>@]<SITE><PATH>` be the URI which identifies a particular f
The client MUST then establish which Git commit `<COMMIT>` to use, according to the `<VERSION>` in the URI (which has already been percent-decoded if necessary), by following the first of the steps below whose condition applies:

1. If `<VERSION>` is missing or empty, get the commit hash of the head of the site branch `<SITE-BRANCH>` (derived from `<SITE-ID>` as described further above) as `<COMMIT>` (e.g. `git show-ref --verify --hash refs/heads/<SITE-BRANCH>`).
2. Else, if `<VERSION>` matches the format of a SHA-1 or SHA-256 hash (40 or 64 hexadecimal digits, lower or upper case), use it as `<COMMIT>`. This is the case for a permanent link.
2. Else, if `<VERSION>` matches the format of hashes used by the repository (either 40 or 64 hexadecimal digits for SHA-1 or SHA-256, lower or upper case), use it as `<COMMIT>`. This is the case for a permanent link.
3. Else, if `<VERSION>` consists only of hexadecimal digits (lower or upper case), check that there is neither Git tag nor branch with that name (e.g. `git show-ref --tags --heads <VERSION>` reports nothing), then check that it is the prefix of a single commit object, and use its complete name as `<COMMIT>` (e.g. `git rev-parse --verify <VERSION>^{commit}` only reports the `<COMMIT>`).

   **Note:** The check for Git tags or branches named after `<VERSION>` prevents an attacker from using such a named reference in their public clone to confuse other gwit clients which try to access a URI with that abbreviated commit name as `<VERSION>`, and tricking them into accessing a different commit. Should the check fail, the client SHOULD report the situation as a potential attack (e.g. to help neutralize the problematic references or remotes). This security check is the reason why tag and branch names which are to be used in gwit URIs MUST NOT consist only of hexadecimal digits (lower or upper case).
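
The order of the conditions in steps 1 to 3 above can be sketched as a small Python classifier. This is an illustrative sketch only: the function name `classify_version`, its return labels and the `object_format` parameter are hypothetical, and any later steps not shown in this excerpt are lumped under `"other"`. The Git commands quoted in each step would still need to run afterwards to obtain or verify `<COMMIT>`.

```python
import re

# Number of hexadecimal digits in a full hash, per Git object format.
HASH_LENGTHS = {"sha1": 40, "sha256": 64}


def classify_version(version, object_format="sha1"):
    """Tell which step applies to an already percent-decoded <VERSION>."""
    if not version:
        # Step 1: missing or empty, use the head of the site branch.
        return "site-branch-head"
    if re.fullmatch(r"[0-9a-fA-F]+", version):
        if len(version) == HASH_LENGTHS[object_format]:
            # Step 2: full hash in the repository's format (permanent link).
            return "full-hash"
        # Step 3: hexadecimal digits only. The tag/branch check and the
        # unique-prefix check must still be performed on this value.
        return "abbreviated-hash"
    # Not covered by steps 1 to 3.
    return "other"
```

With these assumptions, a 64-digit hexadecimal `<VERSION>` in a SHA-1 repository does not match step 2 and falls through to step 3, where the prefix check would reject it.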