60ca7467122dfb718110d942c95ee1bf8cdece60 — Chris Vittal 1 year, 2 months ago 73f20d5
Add getopt post.
1 files changed, 107 insertions(+), 0 deletions(-)

A content/posts/2019-10-28_gnu-getopt-sucks.md
A content/posts/2019-10-28_gnu-getopt-sucks.md => content/posts/2019-10-28_gnu-getopt-sucks.md +107 -0
@@ 0,0 1,107 @@
title = "The Problem With GNU getopt; Or, On Standards"
draft = true

GNU `getopt(3)` is broken. The term they would use is 'nonstandard', but
nonstandard interfaces make software less robust, less portable, and less
maintainable. Generally speaking, such interfaces exist to create vendor
lock-in. No matter how much GNU "respects your freedom" or allows you to do
anything with their software, they are still software vendors who are
incentivized to make it harder for us to use other versions of software. To be
clear, I prefer GNU's attempts at lock-in over proprietary lock-in, but the
consequences are often the same anyway: non-portable and fragile programs.

The [`getopt(3)`](https://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html)
function is a POSIX system interface implementing argument parsing according to
the [Utility Syntax Guidelines]. It's widely implemented across programming
languages as a sane way to handle command-line options like the `-l` in `ls -l`.
Those guidelines require that POSIX-conformant utilities place all their
options before their operands. A conformant `getopt` will stop parsing when the
first non-option value is encountered. Needless to say, glibc's `getopt` is not
conformant.

[Utility Syntax Guidelines]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html#tag_12_02
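
As a rough illustration of the conformant behavior, here is a minimal sketch using Python's standard `getopt` module as a stand-in for the C interface. It is a separate, pure-Python implementation that does not call libc; its `getopt.getopt` function follows the stop-at-first-operand rule. The `argv` list is a hypothetical invocation of an `env`-like utility that accepts only `-i`.

```python
import getopt

# Hypothetical arguments for an env-like utility (program name already removed).
argv = ["-i", "PATH=/usr/bin", "head", "-n", "15"]

# "i" is the option string: a single flag, -i, that takes no argument.
# getopt.getopt stops scanning at the first non-option argument.
opts, operands = getopt.getopt(argv, "i")

print(opts)      # [('-i', '')]
print(operands)  # ['PATH=/usr/bin', 'head', '-n', '15']
```

Everything from `PATH=/usr/bin` onward is returned untouched as operands, so the `-n 15` can be handed to `head` without the parser ever inspecting it.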

The glibc implementation of `getopt` does not conform to the Utility Syntax
Guidelines. In fact, it has some _extremely surprising_ behavior when you try to
write a command-line utility in C. To illustrate, I present an example that I ran
into while implementing the `env(1)` command for **ctools**{{fn(id=1)}}. Here,
`env` is linked to glibc:

    $ cat /dev/urandom | base64 | env -i PATH=/usr/bin head -n 15
    env: invalid option -- 'n'
    env [-i] [name=value]... [utility [argument...]]

What? The `-n` option was clearly for `head`. What happened? It turns out that
glibc's `getopt` _permutes_ the elements of `argv` as it scans. Why? To quote
glibc's [docs](https://www.gnu.org/software/libc/manual/html_node/Using-Getopt.html#Using-Getopt):

> * The default is to permute the contents of argv while scanning it so that
>   eventually all the non-options are at the end. This allows options to be
>   given in any order, even with programs that were not written to expect this.
> * POSIX demands the following behavior: the first non-option stops option
>   processing. This mode is selected by either setting the environment variable
>   POSIXLY\_CORRECT or beginning the options argument string with a plus sign
>   (‘+’).
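
The two modes quoted above can be sketched with Python's `getopt.gnu_getopt`, which the Python documentation describes as using GNU-style scanning by default and which honors both the leading `+` and `POSIXLY_CORRECT`. This is an approximation under that assumption, not glibc itself: the Python function returns the operands as a separate list rather than permuting `argv` in place. The `scan` helper and the `argv` list are hypothetical names for illustration.

```python
import getopt
import os

# Hypothetical env-like invocation, matching the shell session shown earlier.
argv = ["-i", "PATH=/usr/bin", "head", "-n", "15"]

def scan(optstring):
    """Return (options, operands), or the error message if scanning fails."""
    try:
        return getopt.gnu_getopt(argv, optstring)
    except getopt.GetoptError as err:
        return str(err)

# Start from the GNU default by making sure the variable is unset.
os.environ.pop("POSIXLY_CORRECT", None)
default = scan("i")    # scans past "head", finds -n, and reports an error

plus = scan("+i")      # leading '+': stop at the first non-option

os.environ["POSIXLY_CORRECT"] = "1"
posixly = scan("i")    # same effect, selected through the environment

print(default)
print(plus)
print(posixly)
```

In the default mode the scan fails on `-n`, just as `env` did. With either escape hatch, `-i` is the only option parsed and the remaining arguments are left alone as operands.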

To summarize: to write a utility that conforms to the syntax guidelines with no
dependencies other than the system interfaces, POSIXLY\_CORRECT must be set
under GNU; otherwise, _glibc may break the program_{{fn(id=2)}}. In trying to
write portable, robust software, I instead end up with a program that is broken
on the most widely installed system in the world.

Why do we even try to standardize? We write standards because we disagree. In
software, we disagree on what algorithm to use to sort a list, or what
programming language to use to write a client to send and read an email.
Standards mean that I don't have to care about what you use in order to know
that my message will be readable, or the list I give you will be sorted. It
doesn't matter if `/bin/true` is an empty file with the execute bit set or
a C program the entire text of which is `int main(void) { return 0; }`. They
both return a true value to my shell that I can rely on in a script. When
I use `getopt` to implement option parsing for a utility, I expect it to
behave, not _break my program_.

It's not in GNU's interest to break my program, but it's also not not in GNU's
interest to break my program. GNU wants us to write programs for GNU.
A proliferation of programs that only work under GNU makes it more attractive to
install GNU, and less attractive to install alternatives. There may have been
historical reasons for this (GNU's not UNIX, after all), but these days GNU/Linux
is far more the 800-pound *NIX gorilla than any official UNIX{{fn(id=3)}}.

Without standards, even de facto ones, alternatives proliferate. When
programmers can't rely on the properties of the system, they implement their
own. We end up with POSIX compatibility implementations for GNU and
GNU-behaving implementations for POSIX systems. These all get fewer eyes on
them than standard interfaces and so are often buggier than their standard
counterparts. Standard interfaces can furthermore be validated against
specifications and against other implementations.

We need to both push on standards and work within them. They are both an
artifact (like ISO C 1999) and a living process, at once frustratingly ossified
and a rock-solid base on which to build programs. We should be collaborative in
developing interfaces, actively working to prevent fragmentation in our
ecosystems while always being open to innovation. If we do this we can make
using standards obvious and freeing, rather than difficult and limiting. In
making and using standards, we free ourselves and we free our users, as we and
they can be confident that our utilities and interfaces will work, robustly,
predictably, and everywhere.

  '<a href="https://git.sr.ht/~sircmpwn/ctools" target="_blank">ctools</a> is an
  implementation of strictly conformant core POSIX utilities written in C. The
  self-contained nature of every utility makes it a really easy project to
  contribute to. You can find the source of the `env` command I mentioned
  <a href="https://git.sr.ht/~sircmpwn/ctools/tree/master/src/env.c"
  'I sincerely hope that no other utilities on my system rely on behavior that
  will change when POSIXLY\_CORRECT is set, but who am I kidding? 100% chance
  that some other program would break.',
  'Except perhaps macOS, which is often worse as we have to deal with
  proprietary bullshit _and_ GNU bullshit (from 2006!).',