#voreutils

Drop-in Policy-compatible coreutils replacement, at the very least.

This probably wants a better blurb.

GNU coreutils provide the following 105 binaries, according to dpkg -L coreutils | grep bin/ on Buster (8.30-3):

  • ☑ /bin/cat – actually faster for raw catting to/from pipes and to/from files, since it splice(2)s and copy_file_range(2)s by default
  • ☑ /bin/chgrp
  • ☐ /bin/chmod
  • ☑ /bin/chown
  • ☐ /bin/cp
  • ☐ /bin/date
  • ☑ /bin/dd
  • ☑ /bin/df
  • ☐ /bin/dir
  • ☑ /bin/echo – -n only
  • ☑ /bin/false
  • ☐ /bin/ln
  • ☐ /bin/ls
  • ☑ /bin/mkdir
  • ☑ /bin/mknod
  • ☑ /bin/mktemp
  • ☐ /bin/mv
  • ☑ /bin/pwd
  • ☐ /bin/readlink
  • ☑ /bin/rm
  • ☑ /bin/rmdir
  • ☑ /bin/sleep
  • ☐ /bin/stty
  • ☑ /bin/sync
  • ☐ /bin/touch
  • ☑ /bin/true
  • ☑ /bin/uname
  • ☐ /bin/vdir
  • ☑ /usr/bin/[ – no V7-style -l string -eq ...
  • ☑ /usr/bin/arch
  • ☑ /usr/bin/b2sum
  • ☑ /usr/bin/base32
  • ☑ /usr/bin/base64
  • ☑ /usr/bin/basename
  • ☐ /usr/bin/chcon
  • ☑ /usr/bin/cksum
  • ☐ /usr/bin/comm
  • ☐ /usr/bin/csplit
  • ☑ /usr/bin/cut – #993258: -d only accepts a single byte, but must accept a character; #992667: -c doesn't seem to actually take characters?; #992666: -n is ignored, which mangles output
  • ☐ /usr/bin/dircolors
  • ☑ /usr/bin/dirname
  • ☐ /usr/bin/du
  • ☑ /usr/bin/env – some parsing restrictions might be too strict (-0 in particular), but they match GNU env
  • ☑ /usr/bin/expand
  • ☑ /usr/bin/expr
  • ☑ /usr/bin/factor – only u64 (for now?); could also take -hx from NetBSD? Should maybe also include primes(6)
  • ☐ /usr/bin/fmt
  • ☑ /usr/bin/fold
  • ☑ /usr/bin/groups
  • ☑ /usr/bin/head
  • ☑ /usr/bin/hostid
  • ☑ /usr/bin/id
  • ☐ /usr/bin/install
  • ☐ /usr/bin/join
  • ☑ /usr/bin/link
  • ☑ /usr/bin/logname
  • ☑ /usr/bin/md5sum
  • ☑ /usr/bin/mkfifo – https://bugs.debian.org/990962 lole
  • ☑ /usr/bin/nice
  • ☐ /usr/bin/nl
  • ☑ /usr/bin/nohup – #1010133: gives up, without running the program, if stderr is closed; GNU nohup returns 125 instead of 127 for processing errors and when nohup.out can't be opened
  • ☑ /usr/bin/nproc
  • ☐ /usr/bin/numfmt
  • ☐ /usr/bin/od
  • ☑ /usr/bin/paste
  • ☑ /usr/bin/pathchk
  • ☐ /usr/bin/pinky
  • ☐ /usr/bin/pr
  • ☑ /usr/bin/printenv
  • ☑ /usr/bin/printf
  • ☐ /usr/bin/ptx
  • ☐ /usr/bin/realpath
  • ☐ /usr/bin/runcon
  • ☑ /usr/bin/seq
  • ☑ /usr/bin/sha1sum
  • ☑ /usr/bin/sha224sum
  • ☑ /usr/bin/sha256sum
  • ☑ /usr/bin/sha384sum
  • ☑ /usr/bin/sha512sum
  • ☑ /usr/bin/shred – a perfunctory novote
  • ☑ /usr/bin/shuf
  • ☐ /usr/bin/sort
  • ☐ /usr/bin/split
  • ☐ /usr/bin/stat
  • ☑ /usr/bin/stdbuf
  • ☑ /usr/bin/sum
  • ☑ /usr/bin/tac
  • ☐ /usr/bin/tail
  • ☑ /usr/bin/tee – --output-error is multiple levels of wrong
  • ☑ /usr/bin/test – see [
  • ☑ /usr/bin/timeout
  • ☑ /usr/bin/tr – implements -C as -c and [=e=] as e: this matches 4.4BSD and GNU tr, but is nevertheless a missing POSIX feature; OTOH, POSIX tr appears to think it operates on characters, which is both very optimistic and explodes instantly, is a horrible and confused hold-over from XPG3, and doesn't match historical implementations; implements [:class:] properly, unlike GNU tr; could also stand to do buffering lower than fgetc/fputc, as I imagine the overhead of calling those for each byte is non-insignificant (locked I/O: 33 MB/s, unlocked I/O: 46.6 MB/s, GNU tr: 180-200 MB/s); coreutils: tr: confusing error message w.r.t. backwards c-c set points at nonconformant behaviour; coreutils: tr: tr.1 (and tr --help) falsely claims -t is only valid when translating
  • ☑ /usr/bin/truncate
  • ☑ /usr/bin/tsort – GNU tsort is turbofucked, and returns 1 for loops
  • ☑ /usr/bin/tty
  • ☑ /usr/bin/unexpand – historically, I've had the worst time of my life writing it, due to coreutils: unexpand: nonconformantly (to both POSIX and heirloom) replaces single spaces with tabs; it also processes bytes, not characters, for some reason; POSIX words tab folding muddily (but see interpretation), and it's unclear whether non-space/tab blanks are processed (we don't, for compatibility and ease of implementation): https://www.mail-archive.com/austin-group-l@opengroup.org/msg09780.html
  • ☐ /usr/bin/uniq
  • ☑ /usr/bin/unlink
  • ☐ /usr/bin/users
  • ☑ /usr/bin/wc
  • ☐ /usr/bin/who
  • ☑ /usr/bin/whoami
  • ☑ /usr/bin/yes
  • ☑ /usr/sbin/chroot
  • ☑ /usr/bin/md5sum.textutils
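The cat entry says it copy_file_range(2)s by default. As a rough, non-authoritative sketch of that half (Linux-specific, assuming glibc 2.27 or later; this is not voreutils' actual cat, and copy_fd()/copy_fd_rw() are hypothetical names), the kernel is asked to move the data itself, and a plain read(2)/write(2) loop takes over when it refuses:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

// Fallback: plain read(2)/write(2) loop from the current offsets.
// Returns 0 on success, -1 with errno set on failure.
static int copy_fd_rw(int in_fd, int out_fd) {
	static char buf[64 * 1024];  // static: keeps 64 KiB off the stack (not reentrant)
	for(;;) {
		ssize_t rd = read(in_fd, buf, sizeof(buf));
		if(rd == 0)
			return 0;  // EOF
		if(rd < 0) {
			if(errno == EINTR)
				continue;
			return -1;
		}
		for(ssize_t off = 0; off < rd;) {  // write(2) may be short
			ssize_t wr = write(out_fd, buf + off, rd - off);
			if(wr < 0) {
				if(errno == EINTR)
					continue;
				return -1;
			}
			off += wr;
		}
	}
}

// Copies in_fd to out_fd, starting at their current offsets.
int copy_fd(int in_fd, int out_fd) {
	bool copied_any = false;
	for(;;) {
		ssize_t n = copy_file_range(in_fd, nullptr, out_fd, nullptr, 1 << 30, 0);
		if(n > 0) {
			copied_any = true;
			continue;
		}
		if(n == 0)  // 0 before any data may mean "unsupported" rather than EOF
			return copied_any ? 0 : copy_fd_rw(in_fd, out_fd);
		if(errno == EINTR)
			continue;
		// Not possible for this pair of files (pipes, cross-filesystem, O_APPEND,
		// old kernels, or EPERM from e.g. a seccomp filter): fall back.
		if(errno == EXDEV || errno == EINVAL || errno == ENOSYS || errno == EOPNOTSUPP || errno == EBADF ||
		   errno == EPERM)
			return copy_fd_rw(in_fd, out_fd);
		return -1;
	}
}
```

A 0 from copy_file_range(2) before anything was copied is treated as "maybe unsupported" rather than EOF, since some pseudo-filesystems have reportedly returned 0 there; the fallback then finds out for sure. Pipes just hit the fallback in this sketch; that's where splice(2) would come in.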

TODO: import descriptions of the one-line 1BSD-style imports

TODO: multicall binaries should maybe default to something, where appropriate, rather than abort, like NetBSD id(1)? This is already what we do with cksum.
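A minimal sketch of what such a default could look like, assuming a multicall binary that picks its personality from the basename of argv[0]; the applet names, the applet enum, and pick_applet() are hypothetical, not voreutils' actual dispatch code:

```cpp
#include <cassert>
#include <string_view>

// Hypothetical applets; a real multicall binary would map these to entry points.
enum class applet { md5sum, sha1sum, sha256sum };

struct applet_name {
	std::string_view name;
	applet which;
};

constexpr applet_name applets[] = {
    {"md5sum", applet::md5sum},
    {"sha1sum", applet::sha1sum},
    {"sha256sum", applet::sha256sum},
};

// Chooses an applet from the basename of argv[0].
// An unrecognised name falls back to dflt instead of aborting.
applet pick_applet(const char * self, applet dflt) {
	std::string_view base{self};
	if(auto slash = base.rfind('/'); slash != std::string_view::npos)
		base.remove_prefix(slash + 1);
	for(auto && a : applets)
		if(a.name == base)
			return a.which;
	return dflt;
}
```

A binary reached under an unexpected name (a renamed copy, say) then runs the default applet instead of dying.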

TODO: maybe posix_fadvise(POSIX_FADV_SEQUENTIAL) where appropriate?
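For reference, the call in question; advise_sequential() is a hypothetical wrapper, and whether it measurably helps any given tool is untested here:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>

// Tells the kernel fd will be read front to back (offset 0, length 0 = to EOF);
// on Linux this doubles the readahead window. posix_fadvise() returns an error
// number instead of setting errno; since it's only advice, callers can ignore
// failures such as ESPIPE on a pipe.
int advise_sequential(int fd) {
	return posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
}
```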

TODO: support SMACK in addition to SELinux? Or support neither? Either way, be consistent: rip it out (rm has it) or add it in.

TODO: support TrustedBSD maybe?

TODO: filesize.js-style (include/vore-human) sizing might be a bit suboptimal for this (df) sort of display?

TODO? Should part or all of "UNIX Programmer's Manual" be .Tn'd?

TODO: some sort of consistent uid/gid/pwent/grent caching?
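One possible shape for such a cache, sketched under assumptions: a per-process map from uid to name, filled lazily via getpwuid(3), with unknown uids cached as their decimal form. user_name() is hypothetical; a gid/grent variant would mirror it with getgrgid(3). It uses a plain function-local static for brevity; under the placement-new rule in Organisation it wouldn't.

```cpp
#include <cassert>
#include <pwd.h>
#include <string>
#include <sys/types.h>
#include <unordered_map>

// Returns the user name for uid, asking getpwuid(3) at most once per uid.
// Unknown uids map to their decimal form, as ls -l would print them.
// References stay valid across later insertions (unordered_map guarantees it).
const std::string & user_name(uid_t uid) {
	static std::unordered_map<uid_t, std::string> cache;
	auto it = cache.find(uid);
	if(it == cache.end()) {
		const passwd * pw = getpwuid(uid);
		it = cache.emplace(uid, pw ? std::string{pw->pw_name} : std::to_string(uid)).first;
	}
	return it->second;
}
```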

TODO: probably want to generate each locale at most once

#Building

You'll need a non-ancient C++ toolchain, a POSIX AWK, GNU make, mandoc (linting and HTML manuals only, MANDOC=: to disable), and shellcheck (for shell wrappers, SHELLCHECK=: to disable).

libb2 and libcrypto are required (searched with pkg-config if available). It'd be just libcrypto if the implementation correctly used the result size in EVP_MD.

libselinux-dev and pkg-config will provide SELinux support.

Run GNU make. See the head of the Makefile for tunables, notably:

  • VOREUTILS_VERSION, derived from git HEAD by default;
  • VOREUTILS_DATE{,_MODE}, derived from the latest git commit affecting each file by default;
  • OUTDIR (and {CMD,LIB,MAN,HTMLMAN}DIR), where artifacts land, and OBJDIR, where intermediate objects land; these can all be set independently;
  • SYMLINK, which, if set to "y", links binary altnames together symbolically;
  • VOREUTILS_LIB_PREFIX (/usr/lib/voreutils/), the location of libstdbuf.

#Installation

If you just want the manuals, copy MANDIR (out/man by default) to somewhere in your $MANPATH (like /usr/local/man).

Otherwise, point your $PATH at CMDDIR (out/cmd); if you're using groff, $MANPATH works automagically; otherwise, adjust it as well. Or copy {CMD,LIB,MAN}DIR (out/{cmd,lib,man}) to ~/{bin,lib,man}.

If you're feeling brave, copy them to /usr/local/{bin,lib,man}, which globally masks your system coreutils. Depending on which GNU coreutils bugs your system depends on, this may be undesirable.

VOREUTILS_LIB_PREFIX needs to be set correctly (to the final destination of LIBDIR) at build time for stdbuf to work right.
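Presumably the path matters because stdbuf(1) typically works by LD_PRELOADing a small library into the command it runs. The sketch below assumes voreutils' libstdbuf works like GNU's: stdbuf passes the requested modes in environment variables and preloads the library from the baked-in path, and the library's constructor calls setvbuf(3). The SKETCH_STDBUF_{I,O,E} variable names and apply_buffering() are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Applies one buffering request to stream; must run before any I/O on it.
// "0" = unbuffered, "L" = line-buffered, otherwise a buffer size in bytes
// (like stdbuf's -i/-o/-e arguments, minus K/M suffixes). Returns 0 on success.
int apply_buffering(std::FILE * stream, const char * mode) {
	if(!mode || !*mode)
		return 0;  // nothing requested: keep libc's default
	if(!std::strcmp(mode, "0"))
		return std::setvbuf(stream, nullptr, _IONBF, 0);
	if(!std::strcmp(mode, "L"))
		return std::setvbuf(stream, nullptr, _IOLBF, 0);
	char * end;
	unsigned long long size = std::strtoull(mode, &end, 10);
	if(*end || !size)
		return -1;
	// Supply our own buffer: given a null one, glibc ignores the size.
	// Deliberately never freed, since the stream uses it until exit.
	char * buf = static_cast<char *>(std::malloc(size));
	if(!buf)
		return -1;
	return std::setvbuf(stream, buf, _IOFBF, size);
}

// Runs when the dynamic loader maps the library into the target program,
// before that program's main().
__attribute__((constructor)) static void sketch_stdbuf_init() {
	apply_buffering(stdin, std::getenv("SKETCH_STDBUF_I"));
	apply_buffering(stdout, std::getenv("SKETCH_STDBUF_O"));
	apply_buffering(stderr, std::getenv("SKETCH_STDBUF_E"));
}
```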

#Organisation

Who knows yet!

The version is included in each output file, via the .version directive (=> it ends up in .note).
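As a sketch of the mechanism (assuming a GNU-as-compatible assembler targeting ELF, and a hypothetical VOREUTILS_VERSION string macro supplied by the build; the real plumbing may differ), a single top-level asm statement is enough:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical: the build would pass e.g. -DVOREUTILS_VERSION='"1.0"';
// this fallback only keeps the sketch self-contained.
#ifndef VOREUTILS_VERSION
#define VOREUTILS_VERSION "0.0.0-sketch"
#endif

// Top-level asm, handled by the assembler: .version emits an ELF note of
// type NT_VERSION, named after the string, into a .note section, then
// switches back to whatever section was current.
asm(".version \"" VOREUTILS_VERSION "\"");

// The same string, e.g. for --version output.
const char * vore_version() {
	return VOREUTILS_VERSION;
}
```

readelf -n on the resulting object should then list the note.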

Unlocked stdio is used by default; toggle the comment in include/vore-stdio to disable it (for testing or otherwise). TODO: temporarily permanently disabled for testing; enable later.
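A minimal sketch of such a toggle, assuming a single-threaded program (which is what makes skipping the stdio locks safe); the VORE_LOCKED_STDIO macro and copy_bytes() are hypothetical, not the contents of include/vore-stdio. For scale, the tr entry quotes roughly 33 MB/s locked versus 46.6 unlocked for a byte-at-a-time loop like this one.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

#ifdef VORE_LOCKED_STDIO
// Ordinary functions: lock the stream on every call.
#define vore_getc(f) std::getc(f)
#define vore_putc(c, f) std::putc(c, f)
#else
// POSIX *_unlocked variants skip the per-call lock; only safe because
// nothing else (e.g. another thread) touches the streams.
#define vore_getc(f) getc_unlocked(f)
#define vore_putc(c, f) putc_unlocked(c, f)
#endif

// Copies in to out a byte at a time. Returns false if either stream errored.
bool copy_bytes(std::FILE * in, std::FILE * out) {
	int c;
	while((c = vore_getc(in)) != EOF)
		if(vore_putc(c, out) == EOF)
			break;
	return !std::ferror(in) && !std::ferror(out);
}
```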

If C++ were good, gcc would have [[no_destroy]]; it doesn't: use placement new for (function-)static maps et al. like rm.cpp. Ideally we could do the same to main()-scope variables, but it's too verbose.
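The placement-new idiom, roughly (a sketch; seen_counts() and counts_t are hypothetical, not lifted from rm.cpp): the static storage is a trivially-destructible byte array, so nothing is destroyed at exit, which is about what Clang's [[clang::no_destroy]] gives a plain static.

```cpp
#include <cassert>
#include <map>
#include <new>
#include <string>

using counts_t = std::map<std::string, int>;  // hypothetical example type

// Constructed on first call, never destroyed: the map lives in a raw byte
// array via placement new, so no exit-time destructor is registered.
counts_t & seen_counts() {
	alignas(counts_t) static unsigned char storage[sizeof(counts_t)];
	static counts_t & map = *new(storage) counts_t;
	return map;
}
```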

The environment block is read-only (except as hidden by libc et al., but that's hidden) – argv and environ are const char * const *.

argc doesn't exist because argv is a forward iterator: consecutive elements are *(argv + n); argv[0] is self.
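Sketched with a hypothetical echo-like join_operands(): the vector is const all the way down and null-terminated, so iteration stops at the null pointer instead of counting to argc. A real main(int, char ** argv) can pass its argv straight in, since C++ converts char ** to const char * const * implicitly.

```cpp
#include <cassert>
#include <string>

// Joins the operands (everything after argv[0], which is self) with spaces.
std::string join_operands(const char * const * argv) {
	std::string out;
	for(auto arg = argv + 1; *arg; ++arg) {
		if(arg != argv + 1)
			out += ' ';
		out += *arg;
	}
	return out;
}
```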

Enable in-line eqn(1) with

.EQ
delim %%
.EN

(or whichever delimiter is best) after .Sh DESCRIPTION and disable it at the end.

If typesetting something that doesn't work in nroff mode (like the big equations in base64.1), provide an .ie n/.el alternative in .Fn-like syntax; otherwise (like the polynomial in cksum.1) enable eqn(1) preprocessing in man(1) by starting with '\" e.

If typesetting something that doesn't work in troff mode, prefer .ie t (cf. pathchk.1).

In mandoc, delimited eqn(1) breaks conditionals; wrap them in braces (.el \{ [text] % eqn % [text] \}).

#Tests

Tests need to be attached to a teletype. Use script(1), for example, if the test environment doesn't allocate one by default.

Test data is compacted per data directory w/find -exec b2sum {} + | sort | mawk '{h = substr($0, 1, 128); fn = substr($0, 1 + 128 + 2); if(h == hash) {tgt = "." fname; split(fn, curs, "/"); if(curs[2] == fnames[2]) tgt = fnames[3]; print "ln -sf -- \"" tgt "\" \"" fn "\""} else {hash = h; fname = fn; split(fname, fnames, "/")}}' | sh.
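The awk program above, restated as a C++ sketch for readability (entry and dedup_links() are hypothetical; like the original, it assumes input sorted by hash and paths of exactly the form ./subdir/file): every later file sharing a hash with the first file of its group becomes a symlink to that first file, by bare name within the same subdirectory, otherwise via ../.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One line of b2sum output: the hash and a path like "./subdir/file".
struct entry {
	std::string hash, path;
};

// awk's split(p, a, "/") yields a[1] = ".", a[2] = subdir, a[3] = name;
// this returns {a[2], a[3]}, assuming exactly ./subdir/file.
static std::pair<std::string, std::string> subdir_and_name(const std::string & p) {
	auto s1 = p.find('/');
	auto s2 = p.find('/', s1 + 1);
	return {p.substr(s1 + 1, s2 - s1 - 1), p.substr(s2 + 1)};
}

// Returns the {target, link} pairs the awk turns into `ln -sf -- target link`.
std::vector<std::pair<std::string, std::string>> dedup_links(const std::vector<entry> & sorted) {
	std::vector<std::pair<std::string, std::string>> links;
	const entry * first = nullptr;  // first file of the current hash group
	for(const entry & e : sorted) {
		if(!first || e.hash != first->hash) {
			first = &e;  // new group: this file stays, later duplicates point at it
			continue;
		}
		auto [first_dir, first_name] = subdir_and_name(first->path);
		std::string dir = subdir_and_name(e.path).first;
		// Same subdirectory: bare name; otherwise "./x/one" becomes "../x/one".
		links.emplace_back(dir == first_dir ? first_name : "." + first->path, e.path);
	}
	return links;
}
```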

#Compatibility

Free UNIXes, hopefully. Debian, OpenBSD, and FreeBSD are on CI, as normal, bare, and fucked baselines, respectively. I also test on NetBSD (and TODO: some Illumos distro) before release.

#Contributing

Send a patch inline, as an attachment, or a git link and a ref to pull from to the list (~nabijaczleweli/voreutils@lists.sr.ht) or me directly. I'm not picky, just please include the repo name in the subject prefix.

#Discussion

Please use the tracker, the list, or Twitter.

#Licences

Except where noted otherwise (e.g. in the headers of files from NetBSD, if there end up being any), all contents of this repository are subject to the 0-clause BSD licence.

#Special thanks

To all who support further development on Patreon, in particular:

  • ThePhD
  • Embark Studios
  • Jasper Bekkers