~sircmpwn/drewdevault.com

18554b4206db80688e95f3217e725d91ef57cffa — Drew DeVault a month ago 906b9b0
Gemini!
M .build.yml => .build.yml +1 -0
@@ 40,3 40,4 @@ tasks:
    cd drewdevault.com
    echo "StrictHostKeyChecking=no" >> ~/.ssh/config
    rsync -rP public/ deploy@drewdevault.com:/var/www/drewdevault.com/
    rsync -rP public/gemini/ deploy@drewdevault.com:/srv/gemini/drewdevault.com/

M config.toml => config.toml +24 -0
@@ 12,3 12,27 @@ unsafe = true

[markup.tableOfContents]
ordered = true

[mediaTypes]
[mediaTypes."text/gemini"]
suffixes = ["gmi"]

[outputFormats]
[outputFormats.Gemini]
name = "GEMTEXT"
isPlainText = true
isHTML = false
mediaType = "text/gemini"
protocol = "gemini://"
permalinkable = true
path = "gemini/"

[outputFormats.GEMRSS]
name = "GEMRSS"
isHTML = false
mediaType = "application/rss+xml"
protocol = "gemini://"
path = "gemini/"

[outputs]
section = ["HTML", "RSS", "GEMRSS"]

A content/_index.gmi => content/_index.gmi +17 -0
@@ 0,0 1,17 @@
```ASCII art of a rocket next to "Drew DeVault" in a stylized font
  /\
  ||    ________                         ________       ____   ____            .__   __
  ||    \______ \_______   ______  _  __ \______ \   ___\   \ /   /____   __ __|  |_/  |_
 /||\    |    |  \_  __ \_/ __ \ \/ \/ /  |    |  \_/ __ \   Y   /\__  \ |  |  \  |\   __\
/:||:\   |    `   \  | \/\  ___/\     /   |    `   \  ___/\     /  / __ \|  |  /  |_|  |
|:||:|  /_______  /__|    \___  >\/\_/   /_______  /\___  >\___/  (____  /____/|____/__|
|/||\|        \/            \/                 \/     \/             \/
  **
  **
```

# Drew DeVault's geminispace

=> gmni.gmi gmni: a Gemini client
=> gmnisrv.gmi gmnisrv: a Gemini server
=> https://drewdevault.com Drew DeVault's blog on the WWW

M content/_index.html => content/_index.html +1 -0
@@ 1,3 1,4 @@
---
title: Drew DeVault's blog
outputs: [html, gemtext]
---

A content/blog/A-story-of-two-libcs.gmi => content/blog/A-story-of-two-libcs.gmi +208 -0
@@ 0,0 1,208 @@
---
title: A tale of two libcs
date: 2020-09-25
---

I received a bug report from Debian today, who had fed some garbage into scdoc[0], and it gave them a SIGSEGV back. Diving into this problem gave me a good opportunity to draw a comparison between musl libc and glibc. Let's start with the stack trace:

```
==26267==ERROR: AddressSanitizer: SEGV on unknown address 0x7f9925764184
(pc 0x0000004c5d4d bp 0x000000000002 sp 0x7ffe7f8574d0 T0)
==26267==The signal is caused by a READ memory access.
    0 0x4c5d4d in parse_text /scdoc/src/main.c:223:61
    1 0x4c476c in parse_document /scdoc/src/main.c
    2 0x4c3544 in main /scdoc/src/main.c:763:2
    3 0x7f99252ab0b2 in __libc_start_main
/build/glibc-YYA7BZ/glibc-2.31/csu/../csu/libc-start.c:308:16
    4 0x41b3fd in _start (/scdoc/scdoc+0x41b3fd)
```

=> https://git.sr.ht/~sircmpwn/scdoc [0]: scdoc

And if we pull up that line of code, we find...

```
if (!isalnum(last) || ((p->flags & FORMAT_UNDERLINE) && !isalnum(next))) {
```

Hint: p is a valid pointer. "last" and "next" are both uint32_t. The segfault happens in the second call to isalnum. And, the key: it can only be reproduced on glibc, not on musl libc. If you did a double-take, you're not alone. There's nothing here which could have caused a segfault.

Since it was narrowed down to glibc, I pulled up the source code and went digging for the isalnum implementation, expecting some stupid bullshit. But before I get into their stupid bullshit, of which I can assure you there is *a lot*, let's briefly review the happy version. This is what the musl libc `isalnum` implementation looks like:

```
int isalnum(int c)
{
	return isalpha(c) || isdigit(c);
}

int isalpha(int c)
{
	return ((unsigned)c|32)-'a' < 26;
}

int isdigit(int c)
{
	return (unsigned)c-'0' < 10;
}
```

As expected, for any value of `c`, isalnum will never segfault. Because why the fuck would isalnum segfault? Okay, now, let's compare this to the glibc implementation[1]. When opening this header, you're greeted with the typical GNU bullshit, but let's trudge through and grep for isalnum.

=> https://sourceware.org/git/?p=glibc.git;a=blob;f=ctype/ctype.h;h=351495aa4feaf23993fe65afc0760615268d044e;hb=HEAD [1]: The glibc implementation

```
enum
{
  _ISupper = _ISbit (0),        /* UPPERCASE.  */
  _ISlower = _ISbit (1),        /* lowercase.  */
  // ...
  _ISalnum = _ISbit (11)        /* Alphanumeric.  */
};
```

This looks like an implementation detail, let's move on.

```
__exctype (isalnum);
```

But what's `__exctype`? Back up the file a few lines...

```
#define __exctype(name) extern int name (int) __THROW
```

Okay, apparently that's just the prototype. Not sure why they felt the need to write a macro for that. Next search result...

```
#if !defined __NO_CTYPE
# ifdef __isctype_f
__isctype_f (alnum)
// ...
```

Okay, this looks useful. What is `__isctype_f`? Back up the file now...

```
#ifndef __cplusplus
# define __isctype(c, type) \
  ((*__ctype_b_loc ())[(int) (c)] & (unsigned short int) type)
#elif defined __USE_EXTERN_INLINES
# define __isctype_f(type) \
  __extern_inline int                                                         \
  is##type (int __c) __THROW                                                  \
  {                                                                           \
    return (*__ctype_b_loc ())[(int) (__c)] & (unsigned short int) _IS##type; \
  }
#endif
```

Oh.... oh dear. It's okay, we'll work through this together. Let's see, `__isctype_f` is some kind of inline function... wait, this is the else branch of `#ifndef __cplusplus`. Dead end. Where the fuck is isalnum *actually* defined? Grep again... okay... here we are?

```
#if !defined __NO_CTYPE
# ifdef __isctype_f
__isctype_f (alnum)
// ...
# elif defined __isctype
# define isalnum(c)     __isctype((c), _ISalnum) // <- this is it
```

Hey, there's that implementation detail from earlier! Remember this?

```
enum
{
  _ISupper = _ISbit (0),        /* UPPERCASE.  */
  _ISlower = _ISbit (1),        /* lowercase.  */
  // ...
  _ISalnum = _ISbit (11)        /* Alphanumeric.  */
};
```

Let's suss out that macro real quick:

```
# include <bits/endian.h>
# if __BYTE_ORDER == __BIG_ENDIAN
#  define _ISbit(bit)   (1 << (bit))
# else /* __BYTE_ORDER == __LITTLE_ENDIAN */
#  define _ISbit(bit)   ((bit) < 8 ? ((1 << (bit)) << 8) : ((1 << (bit)) >> 8))
# endif
```

Oh, for fuck's sake. Whatever, let's move on and just assume this is a magic number. The other macro is `__isctype`, which is similar to the `__isctype_f` we were just looking at a moment ago. Let's go look at that `ifndef __cplusplus` branch again:

```
#ifndef __cplusplus
# define __isctype(c, type) \
  ((*__ctype_b_loc ())[(int) (c)] & (unsigned short int) type)
#elif defined __USE_EXTERN_INLINES
// ...
#endif
```

...

Well, at least we have a pointer dereference now, that could explain the segfault. What's `__ctype_b_loc`?

```
/* These are defined in ctype-info.c.
   The declarations here must match those in localeinfo.h.

   In the thread-specific locale model (see `uselocale' in <locale.h>)
   we cannot use global variables for these as was done in the past.
   Instead, the following accessor functions return the address of
   each variable, which is local to the current thread if multithreaded.

   These point into arrays of 384, so they can be indexed by any `unsigned
   char' value [0,255]; by EOF (-1); or by any `signed char' value
   [-128,-1).  ISO C requires that the ctype functions work for `unsigned
   char' values and for EOF; we also support negative `signed char' values
   for broken old programs.  The case conversion arrays are of `int's
   rather than `unsigned char's because tolower (EOF) must be EOF, which
   doesn't fit into an `unsigned char'.  But today more important is that
   the arrays are also used for multi-byte character sets.  */
extern const unsigned short int **__ctype_b_loc (void)
     __THROW __attribute__ ((__const__));
extern const __int32_t **__ctype_tolower_loc (void)
     __THROW __attribute__ ((__const__));
extern const __int32_t **__ctype_toupper_loc (void)
     __THROW __attribute__ ((__const__));
```

That is just so, super cool of you, glibc. I just *love* dealing with locales. Anyway, my segfaulted process is sitting in gdb, and equipped with all of this information I wrote the following monstrosity:

```
(gdb) print ((unsigned int **(*)(void))__ctype_b_loc)()[next]
Cannot access memory at address 0x11dfa68
```

Segfault found. Reading that comment again, we see "ISO C requires that the ctype functions work for 'unsigned char' values and for EOF". If we cross-reference that with the specification:

> In all cases [of functions defined by ctype.h,] the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF.

So the fix is obvious at this point. Okay, fine, my bad. My code is wrong. I apparently cannot just hand a UTF-32 codepoint to isalnum and expect it to tell me if it's between 0x30-0x39, 0x41-0x5A, or 0x61-0x7A.
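For illustration only, here is a minimal sketch of what a fix along those lines could look like. The name codepoint_isalnum is made up for this example and is not the actual change made to scdoc; it assumes the caller only cares about ASCII letters and digits:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not taken from scdoc. It only hands isalnum a
 * value that is representable as an unsigned char, which is what the C
 * standard requires of the caller. */
static bool
codepoint_isalnum(uint32_t cp)
{
	/* Anything outside ASCII is rejected before it can be used as an
	 * index into glibc's lookup table. */
	return cp < 0x80 && isalnum((unsigned char)cp);
}
```

With this guard, a codepoint such as 0x1F680 is reported as non-alphanumeric instead of being passed through to the libc.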

But, I'm going to go out on a limb here: maybe isalnum should never cause a program to segfault no matter what input you give it. Maybe because the spec says you *can* does not mean you *should*. Maybe, just maybe, the behavior of this function should not depend on five macros, whether or not you're using a C++ compiler, the endianness of your machine, a look-up table, thread-local storage, and two pointer dereferences.

Here's the musl version as a quick reminder:

```
int isalnum(int c)
{
	return isalpha(c) || isdigit(c);
}

int isalpha(int c)
{
	return ((unsigned)c|32)-'a' < 26;
}

int isdigit(int c)
{
	return (unsigned)c-'0' < 10;
}
```

Bye!

M content/blog/A-story-of-two-libcs.md => content/blog/A-story-of-two-libcs.md +1 -0
@@ 1,6 1,7 @@
---
title: A tale of two libcs
date: 2020-09-25
outputs: ["html", "gemtext"]
---

I received a bug report from Debian today, who had fed some garbage into

A content/blog/Gemini-TOFU.gmi => content/blog/Gemini-TOFU.gmi +54 -0
@@ 0,0 1,54 @@
I will have more to say about Gemini in the future, but for now, I wanted to write up some details about one thing in particular: the trust-on-first-use algorithm I implemented for my client, gmni. I think you should implement this algorithm, too!

=> /gmni.gmi gmni: A gemini client

First of all, it's important to note that the Gemini specification explicitly mentions TOFU and the role of self-signed certificates: they are the norm in Geminiland, and if your client does not support them then you're going to be unable to browse many sites. However, the exact details are left up to the implementation. Here's what mine does:

First, on startup, it finds the known_hosts file. For my client, this is `~/.local/share/gmni/known_hosts` (the exact path is adjusted as necessary per the XDG basedirs specification). Each line of this file represents a known host, and each host has four fields separated by spaces, in this order:

* Hostname (e.g. gemini.circumlunar.space)
* Fingerprint algorithm (e.g. SHA-512)
* Fingerprint, in hexadecimal, with ':' between each octet (e.g. 55:01:D8...)
* Unix timestamp of the certificate's notAfter date

If a known_hosts entry is encountered with a hashing algorithm you don't understand, it is disregarded.
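As a rough sketch of reading one such line, assuming the four space-separated fields described above: struct known_host, parse_known_host, and the buffer sizes are hypothetical names and choices for this example, and the real code in gmni may look quite different. This sketch only understands SHA-512 and disregards everything else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical record layout for one known_hosts line. */
struct known_host {
	char host[256];
	char algo[32];
	char fingerprint[512]; /* hex octets separated by ':' */
	time_t expires;        /* the certificate's notAfter date */
};

/* Parses "hostname algorithm fingerprint timestamp". Returns false for
 * malformed lines and for hash algorithms this sketch does not know. */
static bool
parse_known_host(const char *line, struct known_host *out)
{
	long long expires;
	assert(line != NULL && out != NULL);
	if (sscanf(line, "%255s %31s %511s %lld",
			out->host, out->algo, out->fingerprint, &expires) != 4) {
		return false;
	}
	if (strcmp(out->algo, "SHA-512") != 0) {
		return false; /* unknown algorithm: disregard the entry */
	}
	out->expires = (time_t)expires;
	return true;
}
```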

Then, when processing a request and deciding whether or not to trust its certificate, take the following steps:

1. Verify that the certificate makes sense. Check the notBefore and notAfter dates against the current time, and check that the hostname is correct (including wildcards). Apply any other scrutiny you want, like enforcing a good hash algorithm or an upper limit on the expiration date. If these checks do not pass, the trust state is INVALID, GOTO 5.

2. Compute the certificate's fingerprint. Use the entire certificate (in OpenSSL terms, `X509_digest` will do this), not just the public key.†

3. Look up the known_hosts record for this hostname. If one is found, but the record is expired, disregard it. If one is found, and the fingerprint does not match, the trust state is UNTRUSTED, GOTO 5. If one is found, and the fingerprint matches, the trust state is TRUSTED, GOTO 7.

4. The trust state is UNKNOWN. GOTO 5.

5. Display information about the certificate and its trust state to the user, and prompt them to choose an action, from the following options:

* If INVALID, the user's choices are ABORT or TRUST_TEMPORARY.
* If UNKNOWN, the user's choices are ABORT, TRUST_TEMPORARY, or TRUST_ALWAYS.
* If UNTRUSTED, abort the request and display a diagnostic message. The user must manually edit the known_hosts file to correct the issue.

6. Complete the requested action:

* If ABORT, terminate the request.
* If TRUST_TEMPORARY, update the session's list of known hosts.
* If TRUST_ALWAYS, append a record to the known_hosts file and update the session's list of known hosts.

7. Allow the request to proceed.

† Rationale: this fingerprint matches the output of `openssl x509 -sha512 -fingerprint`.

If the trust state is UNKNOWN, instead of requiring user input to proceed, the implementation MAY proceed with the request IF the UI displays that a new certificate was trusted and provides a means to review the certificate and revoke that trust.

Note that being signed by a certificate authority in the system trust store is not considered meaningful to this algorithm. Such a cert is TOFU'd all the same.
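Steps 1 through 4 of the algorithm can be sketched roughly as follows. This is not the code from gmni: evaluate_trust, struct host_record, and the enum names are invented for this example, and it assumes that the certificate checks from step 1 were already done by the caller, that a missing or expired record falls through to UNKNOWN, and that a record counts as expired once its timestamp is earlier than the current time.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

enum trust_state {
	TRUST_INVALID,
	TRUST_UNKNOWN,
	TRUST_UNTRUSTED,
	TRUST_TRUSTED,
};

/* Hypothetical in-memory form of a known_hosts entry. */
struct host_record {
	const char *fingerprint;
	time_t expires;
};

/* cert_valid is the result of step 1; known is NULL when there is no
 * known_hosts record for the hostname. */
static enum trust_state
evaluate_trust(bool cert_valid, const char *fingerprint,
	const struct host_record *known, time_t now)
{
	assert(fingerprint != NULL);
	if (!cert_valid) {
		return TRUST_INVALID;   /* step 1 failed */
	}
	if (known == NULL || known->expires < now) {
		return TRUST_UNKNOWN;   /* no usable record: step 4 */
	}
	if (strcmp(known->fingerprint, fingerprint) != 0) {
		return TRUST_UNTRUSTED; /* step 3: fingerprint mismatch */
	}
	return TRUST_TRUSTED;       /* step 3: fingerprint matches */
}
```

The caller would then prompt the user for anything other than TRUST_TRUSTED, as described in steps 5 and 6.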

That's it! If you have feedback on this approach, please send me an email.

=> mailto:sir@cmpwn.com Send me an email

My implementation doesn't *entirely* match this behavior, but it's close and I'll finish it up before 1.0. If you want to read the code, here it is:

=> https://git.sr.ht/~sircmpwn/gmni/tree/master/src/tofu.c src/tofu.c

Bonus recommendation for servers: you *should* use a self-signed certificate, and you *should not* use a certificate signed by one of the mainstream certificate authorities. We don't need to carry along the legacy CA cabal into our brave new Gemini future.

M content/blog/Gemini-TOFU.md => content/blog/Gemini-TOFU.md +1 -0
@@ 1,6 1,7 @@
---
title: TOFU recommendations for Gemini
date: 2020-09-21
outputs: [html, gemtext]
---

I will have more to say about [Gemini][0] in the future, but for now, I wanted to

A content/blog/Gemini-and-Hugo.gmi => content/blog/Gemini-and-Hugo.gmi +88 -0
@@ 0,0 1,88 @@
This is my first Gemini-exclusive blog post. Enjoy!

My blog on the WWW is managed by Hugo, a static site generator written in Go.

=> https://drewdevault.com My home page on the WWW
=> https://gohugo.io Hugo

I want to have something similar set up to allow me to more easily share content between my WWW site and my Gemini site, and so today I set out to teach Hugo about Gemini. At first I expected to be patching Hugo, but I was able to get something with a reasonable level of workitude with the OOTB tools for custom output formats.

=> https://gohugo.io/templates/output-formats/ Hugo's custom output formats

I had these goals from the outset:

1. I wanted to opt-in to mixing content between the Gemini site and the WWW site. Not all WWW content is appropriate to Gemini.

2. By no means was I going to attempt an automated translation of Markdown (the original source for my WWW articles) to Gemini. The Gemini experience should be first-class, so a manual translation was called for.

3. Some means of having Gemini-exclusive content is desirable. Not just blog posts like this, but also pages like information about my Gemini software.

=> /gmni.gmi gmni: a gemini client
=> /gmnisrv.gmi gmnisrv: a gemini server

In order to accomplish these goals, I needed to set aside some kind of Gemini-specific output directory for Hugo, convince it to read Gemtext alternate versions of my pages, and figure out how to designate some pages as Gemini-only. Turns out Hugo already supports custom output formats, which can have their own templates and needn't be HTML. The relevant config.toml additions, for me, were:

```
[mediaTypes]
[mediaTypes."text/gemini"]
suffixes = ["gmi"]

[outputFormats]
[outputFormats.Gemini]
name = "GEMTEXT"
isPlainText = true
isHTML = false
mediaType = "text/gemini"
protocol = "gemini://"
permalinkable = true
path = "gemini/"
```

This also accomplishes another goal: by adding `path = "gemini/"`, I can cordon the Gemini content off into a subdirectory, and avoid polluting the Gemini site with WWW content or vice versa.

However, after a few minutes trying to figure out how this worked, it dawned upon me that Hugo supports custom output formats but not custom *input* formats. This made goal #2 a challenge. Ultimately I came up with the following hack for layouts/blog/single.gmi:

```
# {{$.Title}}

{{ trim (readFile (replace $.File.Path ".md" ".gmi")) "\n" | safeHTML }}

(further templating code trimmed)
```

This just swaps .md for .gmi in the file extension of the input file, then reads it and runs it through safeHTML to get rid of the typical HTML garbage (e.g. &amp;). Gemtext is whitespace-sensitive, so I also trim off any leading or trailing newlines so that I can make it flow more nicely into the templated content.

In order to write a Gemini version of an article, I add `outputs: [html, gemtext]` to the frontmatter of the WWW version, then write a gemtext version to the same file path s/.md/.gmi/. Easy!

I was also able to write a layout for the Gemini index page which enumerates all of the articles with Gemini versions:

```
## Blog posts
{{ range (where .Site.RegularPages "Section" "blog") }}
{{- if .OutputFormats.Get "gemtext" }}
=> {{replace .Permalink "/gemini" "" 1}} {{.Date.Format "January 2, 2006"}}: {{.Title}}{{ end }}{{ end }}
```

Gemini's sensitivity to whitespace is again the reason why this is a bit ugly. A similar change to the WWW index page omits articles which have no HTML version. Also note the replacing of "/gemini" with "" in the permalinks - this was necessary to un-do the path = "gemini/" from config.toml so that once the gemini subdirectory was rehomed as the root of a Gemini site, the links lined up right.

I also wanted to generate a Gemini-specific RSS feed. I updated config.toml with another custom format:

```
[outputFormats.GEMRSS]
name = "GEMRSS"
isHTML = false
mediaType = "application/rss+xml"
protocol = "gemini://"
path = "gemini/"
```

Then I updated the default output formats for "section"-class pages, i.e. the listing page for the blog section.

```
[outputs]
section = ["HTML", "RSS", "GEMRSS"]
```

layouts/_default/section.gemrss.xml renders the feed, but I'll let you read that on your own time rather than paste that mess into this article. An oddity that I decided not to care about is that the rendered feed is *not* output to the gemini directory - I'll just update my build script to move it to the right location after Hugo finishes its work.

And that's it! A few minor tweaks & updates to my deploy script and this is ready to ship. Tada! Thanks for having me here in Geminispace - I'm enjoying my stay.

A content/blog/Gemini-and-Hugo.md => content/blog/Gemini-and-Hugo.md +5 -0
@@ 0,0 1,5 @@
---
title: Gemini and Hugo
date: 2020-09-27
outputs: [gemtext]
---

A content/gmni.gmi => content/gmni.gmi +16 -0
@@ 0,0 1,16 @@
# gmni: a Gemini client

gmni is a client for the Gemini protocol. Included are:

* A CLI utility (like curl): gmni
* A line-mode browser: gmnlm

=> https://sr.ht/~sircmpwn/gmni Development information
=> https://git.sr.ht/~sircmpwn/gmni Source code (git)

Browser features:

* Page history
* Regex searches
* Bookmarks
* TOFU support

A layouts/_default/section.gemrss.xml => layouts/_default/section.gemrss.xml +44 -0
@@ 0,0 1,44 @@
{{- $pctx := . -}}
{{- if .IsHome -}}{{ $pctx = .Site }}{{- end -}}
{{- $pages := slice -}}
{{- if or $.IsHome $.IsSection -}}
{{- $pages = $pctx.RegularPages -}}
{{- else -}}
{{- $pages = $pctx.Pages -}}
{{- end -}}
{{- $limit := .Site.Config.Services.RSS.Limit -}}
{{- if ge $limit 1 -}}
{{- $pages = $pages | first $limit -}}
{{- end -}}
{{- printf "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>" | safeHTML }}
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Drew DeVault's Geminispace blog</title>
    <link>gemini://drewdevault.com</link>
    <description>Drew DeVault's Geminispace blog</description>
    <generator>Hugo -- gohugo.io</generator>{{ with .Site.LanguageCode }}
    <language>{{.}}</language>{{end}}{{ with .Site.Author.email }}
    <managingEditor>{{.}}{{ with $.Site.Author.name }} ({{.}}){{end}}</managingEditor>{{end}}{{ with .Site.Author.email }}
    <webMaster>{{.}}{{ with $.Site.Author.name }} ({{.}}){{end}}</webMaster>{{end}}{{ with .Site.Copyright }}
    <copyright>{{.}}</copyright>{{end}}{{ if not .Date.IsZero }}
    <lastBuildDate>{{ .Date.Format "Mon, 02 Jan 2006 15:04:05 -0700" | safeHTML }}</lastBuildDate>{{ end }}
    {{ with .OutputFormats.Get "RSS" }}
      {{ printf "<atom:link href=%q rel=\"self\" type=%q />" .Permalink .MediaType | safeHTML }}
    {{ end }}
    {{ range $pages }}
    {{- if .OutputFormats.Get "GEMTEXT" -}}
    <item>
      <title>{{ .Title }}</title>
      {{ with .OutputFormats.Get "GEMTEXT" }}
      <link>{{replace .Permalink "/gemini" "" 1}}</link>
      {{ end }}
      <pubDate>{{ .Date.Format "Mon, 02 Jan 2006 15:04:05 -0700" | safeHTML }}</pubDate>
      {{ with .Site.Author.email }}<author>{{.}}{{ with $.Site.Author.name }} ({{.}}){{end}}</author>{{end}}
      {{ with .OutputFormats.Get "GEMTEXT" }}
      <guid>{{replace .Permalink "/gemini" "" 1}}</guid>
      {{ end }}
    </item>
    {{- end -}}
    {{ end }}
  </channel>
</rss>

A layouts/blog/single.gmi => layouts/blog/single.gmi +17 -0
@@ 0,0 1,17 @@
# {{$.Title}}

{{ trim (readFile (replace $.File.Path ".md" ".gmi")) "\n" | safeHTML }}

```An ASCII art rocket
   \ \_____
###[==_____>
   /_/
```

“{{$.Title}}” was published on {{.Date.Format "January 2, 2006"}}

=> / Back to the home page{{ with .OutputFormats.Get "html" }}
=> {{.Permalink}} View “{{$.Title}}” on the WWW
{{- end }}

The content for this site is CC-BY-SA. The code for this site is MIT.

A layouts/index.gmi => layouts/index.gmi +11 -0
@@ 0,0 1,11 @@
{{readFile (replace (replace $.File.Path ".md" ".gmi") ".html" ".gmi") | safeHTML}}
## Blog posts
{{ range (where .Site.RegularPages "Section" "blog") }}
{{- if .OutputFormats.Get "gemtext" }}
=> {{replace .Permalink "/gemini" "" 1}} {{.Date.Format "January 2, 2006"}}: {{.Title}}{{ end }}{{ end }}

A backlog of additional articles is available on the World Wide Web:

=> https://drewdevault.com Drew DeVault's blog

The content for this site is CC-BY-SA. The code for this site is MIT.

M layouts/index.html => layouts/index.html +2 -0
@@ 5,11 5,13 @@
    <h1>{{$.Title}}</h1>

    {{ range (where .Site.RegularPages "Section" "blog") }}
    {{- if .OutputFormats.Get "html" }}
    <div class="article">
      <span class="date">{{.Date.Format "January 2, 2006"}}</span>
      <a href="{{.Permalink}}">{{.Title}}</a>
    </div>
    {{ end }}
    {{ end }}
  </section>

  <aside>