Drop-in Policy-compatible coreutils replacement, at the very least.
This probably wants a better blurb.
GNU coreutils provide the following 106 binaries, according to dpkg -L coreutils | grep bin/
on Bullseye (8.32-4+b1):
- ☑ /bin/cat – actually faster for raw catting to/from pipes/files/blockdevs under Linux: splice(2)s/copy_file_range(2)s/sendfile(2)s by default
- ☑ /bin/chgrp
- ☑ /bin/chmod
- ☑ /bin/chown
- ☐ /bin/cp
- ☐ /bin/date – #1014436: +%ON doesn't use alternate representation when available, always writes Arabic digits, #1014497: %z/%:z/%::z/%:::z interact with O/E modifiers and widths unpredictably, tangentially #444589: strftime(%Y) overflows into the negatives for very positive years
- ☑ /bin/dd – #1014839: oflag=seek_bytes brokenly interacts with sub-obs= truncation (polite spelling of "dd data loss moment lmao")
- ☑ /bin/df – #1014366: -P still defaults to blocksize=1024, #1014971: applies statvfs(2) values of topmost filesystem in overmount stack to all the ones below; also #863955: -l does not display zfs dataset containing a colon ":", and data for --total overflows unchecked (this is, admittedly, mildly pathological)
- ☐ /bin/dir
- ☑ /bin/echo – -n only, but see note inside
- ☑ /bin/false
- ☐ /bin/ln
- ☐ /bin/ls
- ☑ /bin/mkdir
- ☑ /bin/mknod
- ☑ /bin/mktemp
- ☐ /bin/mv
- ☑ /bin/pwd
- ☐ /bin/readlink
- ☑ /bin/rm – #1015273: -d doesn't try to remove unreadable directories, lies in error message, fails to prompt with -i
- ☑ /bin/rmdir
- ☑ /bin/sleep
- ☑ /bin/stty – #1018790: "ispeed anything" is valid and doesn't seem to do anything, #1018803: drain/-drain doesn't interact with -g/-a/default output?, #1018806: everything after -- ignored?, #1018844: raw and cooked behaviour don't match documentation, #1018919: default/-a output for characters wrong for disabled ones, actually POSIX default output/-a _POSIX_VDISABLE character "undef", contrasts with "<undef>" on all known implementations? Formatting loss->overinterpretation since Issue 6. → POSIX 0001604: default output for control characters, #1018958: default output doesn't actually show "deviations from stty sane" for c_cc[VTIME] and c_cc[VMIN], #1019344: * (non-POSIX) markings in manual wrong, #1019463: unchecked overflow for rows/cols, checked for all other arguments, #1019466: CHAR empty-string argument undocumented, ^C underconstrained, #1019468: changing ispeed/ospeed says "unable to perform all requested operations" but the change sticks, #1027442: soft-wrapping broken, wraps to cols+1; also XCU, stty, CHANGE HISTORY, Issue 6, para. 1 implies -xcase is still available, ispeed number/ospeed number descriptions copy-paste error, min/time are "used in non-canonical mode input processing (icanon)" – all fixed in 202x draft 3
- ☑ /bin/sync
- ☐ /bin/touch
- ☑ /bin/true
- ☑ /bin/uname
- ☐ /bin/vdir
- ☑ /usr/bin/[ – no V7-style -l string -eq ...
- ☑ /usr/bin/arch
- ☑ /usr/bin/b2sum
- ☑ /usr/bin/base32
- ☑ /usr/bin/base64
- ☑ /usr/bin/basename
- ☑ /usr/bin/basenc
- ☑ /usr/bin/chcon – #1015959: --reference always follows the link, even with -h; no way to use a link's context?, #1015996: -v always says "changing security context" even when it doesn't?
- ☑ /usr/bin/cksum – POSIX format specifier ambiguous/wrong?
- ☑ /usr/bin/comm – #1014008: --output-delimiter= (empty) separates with a single NUL, not the empty string, vice versa for --total, also the documentation for --[no]check-order is, uh, Not Great
- ☑ /usr/bin/csplit – #1029103: -z misdocumented, #1029119: makes an empty file when an expression goes backwards, must error per POSIX, #1029120: handles {*} repetitions inconsistently, #1029220: expr with lookbehind loses lines vs equivalent no-offset construct, #1029264: /regex/+1 matches past the end of the file?, #1029335: regexes don't match literal newline at the end, must do so per POSIX and for portability; also rexp-style operand needs to match the entire line, including the new-line, so /q$/ never matches; probably shouldn't need to?
- ☑ /usr/bin/cut – #993258: -d only accepts a single byte, must accept character, #992667: -c doesn't seem to actually take characters?, #992666: -n ignored => mangles output
- ☐ /usr/bin/dircolors
- ☑ /usr/bin/dirname
- ☑ /usr/bin/du – #1014357: -d0 documented to be the same as -s, but isn't, in two different ways, #1014360: -t is underdocumented and a lie, #1014361: --time-style=/$TIME_STYLE error message wrong, #1014622: -S breaks --time output in two different ways; -c --time always broken, #1014738: --exclude 4x slower when given a trivial string than with an equivalent glob?, what the title doesn't say is that it's actually 471.46x slower than the trivialest of optimisations; also honorary #1007777: consistently SIGABRTs on specific directory; no apparent reason? because I was remound of this one and we go to completion
- ☑ /usr/bin/env – #1016049: --{default,ignore,block}-signal with empty argument is valid and does nothing?. some parsing restrixions might be too strict (-0 in particular), but they match GNU env
- ☑ /usr/bin/expand
- ☑ /usr/bin/expr
- ☑ /usr/bin/factor – could also use -hx from NetBSD? also should maybe include primes(6)
- ☐ /usr/bin/fmt
- ☑ /usr/bin/fold
- ☑ /usr/bin/groups
- ☑ /usr/bin/head – same deal as cat for byte prefix (-c128)
- ☑ /usr/bin/hostid
- ☑ /usr/bin/id
- ☑ /usr/bin/install – #1034421: install(1): says -v only notes directory creation, but also writes regular file creation, #1034429: -s runs ["strip", $f] instead of ["strip", "--", $f]; should use strip from $STRIP
- ☐ /usr/bin/join
- ☑ /usr/bin/link
- ☑ /usr/bin/logname
- ☑ /usr/bin/md5sum
- ☑ /usr/bin/mkfifo – https://bugs.debian.org/990962 lole
- ☑ /usr/bin/nice
- ☑ /usr/bin/nl – #1034602: resets line numbers on each new section instead of new page, POSIX and SysV violation; also XCU nl STDOUT bears no relation to actual output from real implementations (unnumbered lines), XCU nl STDOUT bears no relation to actual output from real implementations (heading lines), XCU nl carries vestigial LC_CTYPE which contradicts other text since Issue 5, XCU nl -l described in terms of "blank lines", all implementations actually use empty lines; -l works across sections; no non-standard -d modes
- ☑ /usr/bin/nohup – #1010133: gives up if stderr closed, doesn't run program, GNU nohup returns 125 instead of 127 for processing/no-nohup.out
- ☑ /usr/bin/nproc
- ☐ /usr/bin/numfmt
- ☐ /usr/bin/od
- ☑ /usr/bin/paste – #1025342: tokenises -d on bytes, not characters, contrary to POSIX
- ☑ /usr/bin/pathchk
- ☑ /usr/bin/pinky – #1016117: columnation broken, GECOS cut off in short mode?
- ☑ /usr/bin/pr – #1034808: -d makes -l $odd enter a busy loop (seems to've only changed 1, but also happens with 11 :); let's see how it shakes out), #1034849: -ien don't accept characters for tab override, only bytes, #1034857: -p missing, #1034858: date format is %F %R always, must be %b %e %H:%M %Y if LC_TIME=C, #1034859: -s disables truncation in columnated output w/o -w, must be equivalent to -w512, #1035112: -e broken?, #1035117: -o + -n break the output completely, #1035319: -d ejects extraneous line at last page, #1035388: form feed behaves erratically, deviates from SVr4, #1035519: -i not respected for -o in header, #1035533: manual says -F changes header depth, #1035534: -s broken entirely?, #1035586: -n overly narrows the text in the columns? and eats the separator?, #1035591: -n width explodes sometimes?, #1035596: --number-lines= yields "extra characters or invalid number in the argument: ‘SHELL=/bin/bash’: ERANGE", #1035599: taking +0 to be a file-name is inconsistent with refusing +99999999999999999999, diverges from SysVr4, #1035600: columnation just doesn't work? at all?, #1036194: writes errors as encountered even if writing to teletype, also printf 'a\t\ncd\n' | pr -2F behaves weirdly but idk if I had that in one of these 15
- ☑ /usr/bin/printenv
- ☑ /usr/bin/printf – #1017110: 'X takes X to be a byte, not a character (and missing in manual)? fixed in 9.1-1, refuses ~all unicode values smaller than 0x7F for \u and \U?
- ☐ /usr/bin/ptx
- ☐ /usr/bin/realpath
- ☑ /usr/bin/runcon – #1013924: -c getfscon()s program verbatim but execve()s it; trojan moment?, cf. BUGS
- ☑ /usr/bin/seq
- ☑ /usr/bin/sha1sum
- ☑ /usr/bin/sha224sum
- ☑ /usr/bin/sha256sum
- ☑ /usr/bin/sha384sum
- ☑ /usr/bin/sha512sum
- ☑ /usr/bin/shred – a perfunctory novote
- ☑ /usr/bin/shuf – #1027412: -i EOVERFLOW when speccing either side of the range >2^32 (ILP32 only: limited to size_t?!?), #1027413: -i forbids 2^64-1/2^32-1, max value is ...-2?
- ☐ /usr/bin/sort
- ☑ /usr/bin/split – #1036651: -n <digit> with (some?) devices fails with EOVERFLOW, accepts some chardevs?, #1036827: --additional-suffix= doesn't always find the /, but also shouldn't look for it?
- ☐ /usr/bin/stat – #1034416: stat.1: are there any shells which actually provide stat as a built-in?
- ☑ /usr/bin/stdbuf
- ☑ /usr/bin/sum
- ☑ /usr/bin/tac
- ☐ /usr/bin/tail
- ☑ /usr/bin/tee – --output-error is multiple levels of wrong
- ☑ /usr/bin/test – see [
- ☑ /usr/bin/timeout
- ☑ /usr/bin/tr – implements -C as -c and [=e=] as e: this matches 4.4BSD and GNU tr, but is nevertheless a missing POSIX feature; OTOH, POSIX tr appears to think it operates on characters, which is both very optimistic and explodes instantly, is a horrible and confused hold-over from XPG3, and doesn't match historical implementations; implements [:class:] properly, unlike GNU tr; coreutils: tr: confusing error message w.r.t. backwards c-c set points at nonconformant behaviour; coreutils: tr: tr.1 (and tr --help) falsely claims -t is only valid when translating
- ☑ /usr/bin/truncate
- ☑ /usr/bin/tsort – GNU tsort is turbofucked, and returns 1 for loops
- ☑ /usr/bin/tty
- ☑ /usr/bin/unexpand – historically, i've had the worst time of my life writing it, due to #1012545: nonconformantly (to both POSIX and heirloom) replaces single spaces with tabs; it also processes bytes, not characters, for some reason; POSIX words tab folding muddily (but see interpretation) + it's unclear non-space/tab blanks are processed (we don't for compat + ease of impl): https://www.mail-archive.com/austin-group-l@opengroup.org/msg09780.html
- ☑ /usr/bin/uniq – #1017482: -w and -s select bytes, not characters, #1017643: -i entirely non-functional?, also: what are they cooking?
- ☑ /usr/bin/unlink
- ☑ /usr/bin/users
- ☑ /usr/bin/wc – #1026977: -c optimisation brokenly doesn't consume the input, #1027100: total overflows unchecked, #1027101: -c no longer optimises to st_size for regular files
- ☑ /usr/bin/who – #1016456: default isn't -s, -s doesn't force "only name, line, and time" output, #1016492: --ip/--lookup terminally broken with IPv6 entries
- ☑ /usr/bin/whoami
- ☑ /usr/bin/yes – we don't ignore --, mirroring POSIX echo
- ☑ /usr/sbin/chroot
- ☑ /usr/bin/md5sum.textutils
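The cat and head entries above mention moving bytes with splice(2), copy_file_range(2), and sendfile(2) under Linux. As a rough, Linux-only illustration of that idea (not this project's actual code), the sketch below tries copy_file_range(2) first and falls back to a plain read(2)/write(2) loop whenever the kernel refuses, for instance when either descriptor is a pipe. The helper name copy_all and the chunk and buffer sizes are assumptions made for the example.

```cpp
#include <cerrno>
#include <unistd.h>

// Hypothetical helper, for illustration only: copies everything from in to out.
// Returns true on success, false on error (errno is left set by the failing call).
static bool copy_all(int in, int out) {
	// Fast path: let the kernel move the data without bouncing it through userspace.
	// This stops on end-of-file (0) or on any error (-1); either way we drop to the loop below.
	while(copy_file_range(in, nullptr, out, nullptr, 1 << 20, 0) > 0)
		;

	// Slow path: also confirms end-of-file after a successful fast path.
	char buf[64 * 1024];
	for(;;) {
		ssize_t rd = read(in, buf, sizeof(buf));
		if(rd == 0)
			return true;  // end of input
		if(rd < 0) {
			if(errno == EINTR)
				continue;
			return false;
		}
		for(ssize_t off = 0; off < rd;) {
			ssize_t wr = write(out, buf + off, rd - off);
			if(wr < 0) {
				if(errno == EINTR)
					continue;
				return false;
			}
			off += wr;
		}
	}
}
```

A caller would hand it two open descriptors, e.g. copy_all(0, 1) for a bare-bones cat of standard input. A real implementation would presumably pick between the three syscalls based on the descriptor types rather than relying on a single fallback.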
Also an honorary glibc bug: #1017852: C locale is 7-bit (127 characters), must be 8-bit (256 characters) since POSIX Issue 7 TC2/Issue 8
TODO? for du/df, we process DU_BLOCK_SIZE=Q BLOCK_SIZE=2 as -B2 not as default block size, unlike coreutils
TODO: import descriptions of the one-line 1BSD-style imports
TODO: ioctl(FICLONE/FICLONERANGE) for head/cat/cp/&c. There's probably something to be gained from FS_IOC_FIEMAP too
TODO: multicalls should default to something rather than abort when appropriate like netbsd id(1) maybe? This is already what we do with cksum.
TODO: should posix_fadvise(sequential) where appropriate maybe?
TODO: support SMACK in addition to SELinux? or don't either way, rip it out (rm has it) or add it in
TODO: support TrustedBSD maybe?
#Building
You'll need a non-ancient C++ toolchain, a BSD AWK, GNU make, mandoc (linting and HTML manuals only, MANDOC=true to disable), and shellcheck (for shell wrappers, SHELLCHECK=: to disable).
libb2, libcrypto, and libacl are required (searched with pkg-config if available).
It'd be just libcrypto and libacl if the implementation correctly used the result size in EVP_MD.
libselinux-dev and pkg-config will provide SELinux support.
libgmp-dev and pkg-config will provide fast bignum support for factor.
Run GNU make. See the head of the Makefile for tunables,
notably VOREUTILS_VERSION, derived from git HEAD by default,
and VOREUTILS_DATE{,_MODE}, derived from the latest git commit affecting each file by default,
OUTDIR (and {CMD,LIB,MAN,HTMLMAN}DIR) where artifacts land, and
OBJDIR where intermediate objects land; these can all be set independently.
SYMLINK, if set to "y", will link binary altnames together symbolically.
VOREUTILS_LIB_PREFIX (/usr/lib/voreutils/) is the location of libstdbuf, and
VOREUTILS_INSTALL_LINK (apt) is the wayward-user cross-ref for install(1).
Makefile.local is sourced at the top for config persistence.
#Installation
If you just want the manuals, copy MANDIR (out/man by default) to somewhere in your $MANPATH (like /usr/local/man).
Otherwise, point your $PATH at CMDDIR (out/cmd); if you're using groff, $MANPATH works automagically, otherwise, adjust it as well.
Or copy {CMD,LIB,MAN}DIR (out/{cmd,lib,man}) to ~/{bin,lib,man}.
If you're feeling brave, copy them to /usr/local/{bin,lib,man}, which globally masks your system coreutils.
Depending on which GNU coreutils bugs your system depends on, this may be undesirable.
VOREUTILS_LIB_PREFIX needs to be set correctly (to the final destination of LIBDIR) at build time for stdbuf to work right.
#Organisation
Who knows yet!
The version is included in each output file, via the .version directive (=> it ends up in .note).
Unlocked stdio is used by default; toggle the comment in include/vore-stdio to disable (for testing or otherwise).
TODO: temporarily permanently disabled for testing; enable later.
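For readers unfamiliar with the term: "unlocked stdio" refers to the POSIX *_unlocked variants, which skip the per-call stream lock. The sketch below is only a guess at the general shape, not the contents of include/vore-stdio. It takes the stream lock once with flockfile(3) and then writes each byte with putc_unlocked(3). The helper name write_unlocked is made up for the example.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper, for illustration only: writes len bytes to f,
// locking the stream once instead of once per character.
static bool write_unlocked(std::FILE * f, const char * data, std::size_t len) {
	flockfile(f);  // POSIX: take the stream lock ourselves...
	bool ok = true;
	for(std::size_t i = 0; i < len; ++i)
		// ...so the per-byte calls can skip it.
		if(putc_unlocked(static_cast<unsigned char>(data[i]), f) == EOF) {
			ok = false;
			break;
		}
	funlockfile(f);
	return ok;
}
```

In a single-threaded utility the flockfile()/funlockfile() pair could be dropped entirely, which is presumably the saving a project-wide toggle is after.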
If C++ were good, gcc would have [[no_destroy]]; it doesn't: use placement new for (function-)static maps et al. like rm.cpp.
Ideally we could do the same to main()-scope variables, but it's too verbose.
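A minimal sketch of the placement-new pattern described above, assuming a function-static lookup table; the table contents and the names name_map and signal_numbers are invented for the example, and rm.cpp may well do it differently. The map is constructed into raw static storage, so no destructor is ever registered to run at exit.

```cpp
#include <map>
#include <new>
#include <string_view>

// Hypothetical table type, for illustration only.
using name_map = std::map<std::string_view, int>;

static const name_map & signal_numbers() {
	// Raw, suitably aligned storage: trivially destructible, so nothing is scheduled to run at exit.
	alignas(name_map) static unsigned char storage[sizeof(name_map)];
	// Constructed exactly once, on the first call; only the pointer is a static object,
	// and a pointer has no destructor, so the map itself is never torn down.
	static const name_map * const map = new(storage) name_map{{"HUP", 1}, {"INT", 2}, {"TERM", 15}};
	return *map;
}
```

Every call returns a reference to the same object, e.g. signal_numbers().at("INT"). The cost is the extra storage declaration per variable, which is likely the verbosity the note about main()-scope variables refers to.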
The environment block is read-only (except as hidden by libc et al., but that's hidden) – argv and environ are const char * const *.
argc doesn't exist because argv is a forward iterator: consecutive elements are *(argv + n); argv[0] is self.
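To make the convention above concrete, here is a small sketch that treats argv as a read-only, null-terminated sequence and never consults argc. Both helpers (count_operands and has_argument) are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, for illustration only: counts the arguments after argv[0]
// by walking forward until the terminating null pointer.
static std::size_t count_operands(const char * const * argv) {
	if(!*argv)  // degenerate case: no argv[0] at all
		return 0;
	std::size_t n = 0;
	for(++argv; *argv; ++argv)  // skip self, then advance one element at a time
		++n;
	return n;
}

// Hypothetical helper, for illustration only: true if any argument after argv[0] equals needle.
static bool has_argument(const char * const * argv, const char * needle) {
	if(!*argv)
		return false;
	while(*++argv)
		if(!std::strcmp(*argv, needle))
			return true;
	return false;
}
```

Since the array is guaranteed to end in a null pointer, the walk needs no length; the const char * const * type documents that neither the pointers nor the strings are modified.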
Enable in-line eqn(1) with
.EQ
delim %%
.EN
(or whichever delimiter is best) after .Sh DESCRIPTION and disable it at the end.
If typesetting something that doesn't work in nroff mode (like the big equations in base64.1) provide an .ie n/.el alternative in .Fn-like syntax;
otherwise (like the polynomial in cksum.1) enable eqn(1) preprocessing in man(1) by starting with '\" e.
If typesetting something that doesn't work in troff mode, prefer .ie t (cf. pathchk.1).
In mandoc, delimited eqn(1) breaks conditionals; wrap them in braces (.el \{ [text] % eqn % [text] \}).
#Tests
Need to be attached to a teletype. Use script(1), for example, if the test environment doesn't allocate one by default.
Test data is compacted per data directory w/ find -exec b2sum {} + | sort | mawk '{h = substr($0, 1, 128); fn = substr($0, 1 + 128 + 2); if(h == hash) {tgt = "." fname; split(fn, curs, "/"); if(curs[2] == fnames[2]) tgt = fnames[3]; print "[ -s \"" fn "\" ] && ln -sf -- \"" tgt "\" \"" fn "\""} else {hash = h; fname = fn; split(fname, fnames, "/")}}' | sh
#Compatibility
Free UNIXes, hopefully.
Debian, OpenBSD, and FreeBSD are on CI, as normal, bare, and fucked baselines, respectively.
I also test on NetBSD (and TODO: some Illumos distro) before release.
#Contributing/opining
Post to the tracker (~nabijaczleweli/voreutils@todo.sr.ht, preferable for bugs),
the list (~nabijaczleweli/voreutils@lists.sr.ht, preferable for opinions and patches), or
me directly (now with Platform integration!).
Not picky about patches — inline, attachment, and a git link and ref to pull are fine — just please include the repo name in the subject prefix.
#Licence
All contents of this repository are subject to the 0-clause BSD licence.
#Special thanks
To all who support further development on Patreon, in particular:
- ThePhD
- Embark Studios
- Lars Strojny
- EvModder