~sircmpwn/drewdevault.com

66b44a4fee0a8df0afbf5c18d738997ecd780c89 — Drew DeVault a month ago 6964eed
fuck you glibc
2 files changed, 252 insertions(+), 1 deletions(-)

M assets/main.scss
A content/blog/2020-09-25-A-story-of-two-libcs.md
M assets/main.scss => assets/main.scss +9 -1
@@ 136,7 136,7 @@ pre {
    font-style: italic;
  }

  .c1 {
  .cm, .c1 {
    color: #333;
    font-style: italic;
  }


@@ 228,3 228,11 @@ details[open] {
.text-center {
  text-align: center;
}

blockquote {
  border-left: 5px solid #bbb;
  background-color: #eee;
  padding: 0 1rem;
  margin-left: calc(-1rem - 5px);
  margin-right: -1rem;
}

A content/blog/2020-09-25-A-story-of-two-libcs.md => content/blog/2020-09-25-A-story-of-two-libcs.md +243 -0
@@ 0,0 1,243 @@
---
title: A story of two libc's
date: 2020-09-25
---

I received a bug report from Debian today, who had fed some garbage into
[scdoc](https://git.sr.ht/~sircmpwn/scdoc), and it gave them a SIGSEV back.
Diving into this problem gave me a good opportunity to draw a comparison between
musl libc and glibc. Let's start with the stack trace:

```
==26267==ERROR: AddressSanitizer: SEGV on unknown address 0x7f9925764184
(pc 0x0000004c5d4d bp 0x000000000002 sp 0x7ffe7f8574d0 T0)
==26267==The signal is caused by a READ memory access.
    0 0x4c5d4d in parse_text /scdoc/src/main.c:223:61
    1 0x4c476c in parse_document /scdoc/src/main.c
    2 0x4c3544 in main /scdoc/src/main.c:763:2
    3 0x7f99252ab0b2 in __libc_start_main
/build/glibc-YYA7BZ/glibc-2.31/csu/../csu/libc-start.c:308:16
    4 0x41b3fd in _start (/scdoc/scdoc+0x41b3fd)
```

And if we pull up that line of code, we find...

```c
if (!isalnum(last) || ((p->flags & FORMAT_UNDERLINE) && !isalnum(next))) {
```

Hint: p is a valid pointer. "last" and "next" are both uint32_t. The segfault
happens in the second call to isalnum. And, the key: it can only be reproduced
on glibc, not on musl libc. If you did a double-take, you're not alone. There's
nothing here which could have caused a segfault.

Since it was narrowed down to glibc, I pulled up the source code and went
digging for the isalnum implementation, expecting some stupid bullshit. But
before I get into their stupid bullshit, of which I can assure you there is *a
lot*, let's briefly review the happy version. This is what the musl libc
`isalnum` implementation looks like:

```c
int isalnum(int c)
{
	return isalpha(c) || isdigit(c);
}

int isalpha(int c)
{
	return ((unsigned)c|32)-'a' < 26;
}

int isdigit(int c)
{
	return (unsigned)c-'0' < 10;
}
```

As expected, for any value of `c`, isalnum will never segfault. Because why the
fuck would isalnum segfault? Okay, now, let's compare this to the
[glibc implementation][ctype]. When opening this header, you're greeted with the
typical GNU bullshit, but let's trudge through and grep for isalnum.

[ctype]: https://sourceware.org/git/?p=glibc.git;a=blob;f=ctype/ctype.h;h=351495aa4feaf23993fe65afc0760615268d044e;hb=HEAD

The first result is this:

```c
enum
{
  _ISupper = _ISbit (0),        /* UPPERCASE.  */
  _ISlower = _ISbit (1),        /* lowercase.  */
  // ...
  _ISalnum = _ISbit (11)        /* Alphanumeric.  */
};
```

This looks like an implementation detail, let's move on.

```c
__exctype (isalnum);
```

But what's `__exctype`? Back up the file a few lines...

```c
#define __exctype(name) extern int name (int) __THROW
```

Okay, apparently that's just the prototype. Not sure why they felt the need to
write a macro for that. Next search result...

```c
#if !defined __NO_CTYPE
# ifdef __isctype_f
__isctype_f (alnum)
// ...
```


Okay, this looks useful. What is `__isctype_f`? Back up the file now...

```c
#ifndef __cplusplus
# define __isctype(c, type) \
  ((*__ctype_b_loc ())[(int) (c)] & (unsigned short int) type)
#elif defined __USE_EXTERN_INLINES
# define __isctype_f(type) \
  __extern_inline int                                                         \
  is##type (int __c) __THROW                                                  \
  {                                                                           \
    return (*__ctype_b_loc ())[(int) (__c)] & (unsigned short int) _IS##type; \
  }
#endif
```

Oh.... oh dear. It's okay, we'll work through this together. Let's see,
`__isctype_f` is some kind of inline function... wait, this is the else branch
of `#ifndef __cplusplus`. Dead end. Where the fuck is isalnum *actually*
defined? Grep again... okay... here we are?

```c
#if !defined __NO_CTYPE
# ifdef __isctype_f
__isctype_f (alnum)
// ...
# elif defined __isctype
# define isalnum(c)     __isctype((c), _ISalnum)
```

Hey, there's that implementation detail from earlier! Remember this?

```c
enum
{
  _ISupper = _ISbit (0),        /* UPPERCASE.  */
  _ISlower = _ISbit (1),        /* lowercase.  */
  // ...
  _ISalnum = _ISbit (11)        /* Alphanumeric.  */
};
```

Let's suss that out macro real quick:

```c
# include <bits/endian.h>
# if __BYTE_ORDER == __BIG_ENDIAN
#  define _ISbit(bit)   (1 << (bit))
# else /* __BYTE_ORDER == __LITTLE_ENDIAN */
#  define _ISbit(bit)   ((bit) < 8 ? ((1 << (bit)) << 8) : ((1 << (bit)) >> 8))
# endif
```

Oh, for fuck's sake. Whatever, let's move on and just assume this is a magic
number. The other macro is `__isctype`, which is similar to the `__isctype_f` we
were just looking at a moment ago. Let's go look at that `ifndef __cplusplus`
branch again:

```c
#ifndef __cplusplus
# define __isctype(c, type) \
  ((*__ctype_b_loc ())[(int) (c)] & (unsigned short int) type)
#elif defined __USE_EXTERN_INLINES
// ...
#endif
```

...

Well, at least we have a pointer dereference now, that could explain the
segfault. What's `__ctype_b_loc`?

```c
/* These are defined in ctype-info.c.
   The declarations here must match those in localeinfo.h.

   In the thread-specific locale model (see `uselocale' in <locale.h>)
   we cannot use global variables for these as was done in the past.
   Instead, the following accessor functions return the address of
   each variable, which is local to the current thread if multithreaded.

   These point into arrays of 384, so they can be indexed by any `unsigned
   char' value [0,255]; by EOF (-1); or by any `signed char' value
   [-128,-1).  ISO C requires that the ctype functions work for `unsigned
   char' values and for EOF; we also support negative `signed char' values
   for broken old programs.  The case conversion arrays are of `int's
   rather than `unsigned char's because tolower (EOF) must be EOF, which
   doesn't fit into an `unsigned char'.  But today more important is that
   the arrays are also used for multi-byte character sets.  */
extern const unsigned short int **__ctype_b_loc (void)
     __THROW __attribute__ ((__const__));
extern const __int32_t **__ctype_tolower_loc (void)
     __THROW __attribute__ ((__const__));
extern const __int32_t **__ctype_toupper_loc (void)
     __THROW __attribute__ ((__const__));
```

That is just so, super cool of you, glibc. I just *love* dealing with locales.
Anyway, my segfaulted process is sitting in gdb, and equipped with all of this
information I wrote the following monstrosity:

```
(gdb) print ((unsigned int **(*)(void))__ctype_b_loc)()[next]
Cannot access memory at address 0x11dfa68
```

Segfault found. Reading that comment again, we see "ISO C requires that the
ctype functions work for 'unsigned char' values and for EOF". If we
cross-reference that with the specification:

> In all cases [of functions defined by ctype.h,] the argument is an int, the
> value of which shall be representable as an unsigned char or shall equal the
> value of the macro EOF.

So the fix is obvious at this point. Okay, fine, my bad. My code is wrong. I
apparently cannot just hand a UCS-32 codepoint to isalnum and expect it to tell
me if it's between 0x30-0x39, 0x41-0x5A, or 0x61-0x7A.

But, I'm going to go out on a limb here: maybe isalnum should never cause a
program to segfault no matter what input you give it. Maybe because the spec
says you *can* does not mean you *should*. Maybe, just maybe, the behavior of
this function should not depend on five macros, whether or not you're using a
C++ compiler, the endianness of your machine, a look-up table, thread-local
storage, and two pointer dereferences.

Here's the musl version as a quick reminder:

```c
int isalnum(int c)
{
	return isalpha(c) || isdigit(c);
}

int isalpha(int c)
{
	return ((unsigned)c|32)-'a' < 26;
}

int isdigit(int c)
{
	return (unsigned)c-'0' < 10;
}
```

Bye!