misc-tools
==========
Overview
--------
A collection of miscellaneous tools written in POSIX C99:
* genhtab     Generate static C99 hash tables (see genhtab_bench/ for a
              performance comparison with gperf)
* htmldecode  HTML decoding to UTF-8
* htmlencode  HTML encoding from UTF-8
* mbcut       Multibyte-aware string trimming
* natsort     Natural sorting for UTF-8
* urldecode   URL decoding
* urlencode   URL encoding
* textwidth   Like wcswidth(3), but with tab and backspace expansion (as
              cursor movement)
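To illustrate what URL encoding involves, here is a minimal sketch of percent-encoding in C99. It assumes the RFC 3986 unreserved set (letters, digits, "-", ".", "_", "~") is copied unchanged and every other byte becomes %XX, with spaces written as %20. The helper name url_encode is hypothetical; urlencode's actual character set and space handling may differ.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not urlencode's actual code): percent-encode the
 * NUL-terminated string in into out, which must have room for at least
 * 3 * strlen(in) + 1 bytes. Bytes in the RFC 3986 unreserved set are
 * copied unchanged; every other byte, including each byte of a UTF-8
 * multibyte sequence, is written as %XX with uppercase hex digits. */
static void url_encode(const char *in, char *out)
{
	static const char hex[] = "0123456789ABCDEF";

	assert(in != NULL && out != NULL);
	for (; *in != '\0'; in++) {
		unsigned char c = (unsigned char)*in;

		if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
		    (c >= '0' && c <= '9') || strchr("-._~", c) != NULL) {
			*out++ = (char)c;
		} else {
			*out++ = '%';
			*out++ = hex[c >> 4];
			*out++ = hex[c & 0x0F];
		}
	}
	*out = '\0';
}
```

Because each input byte expands to at most three output bytes, a buffer of 3 * strlen(in) + 1 bytes is always large enough. Decoding is the reverse: each %XX is turned back into a single byte.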
Note: for simplicity, Unicode handling is limited to code points: combining
characters and emoji are treated as sequences of code points rather than as
complete grapheme clusters.
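For example, "e" followed by U+0301 COMBINING ACUTE ACCENT displays as a single "é" but is two code points, whereas the precomposed U+00E9 is one. The minimal sketch below counts code points the way the note describes, by counting every UTF-8 byte that is not a continuation byte. The helper name utf8_codepoints is hypothetical and assumes valid UTF-8 input; it is not taken from utf8.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the Unicode code points in a NUL-terminated
 * UTF-8 string. Every byte that is not a continuation byte (10xxxxxx)
 * starts a new code point. No validation is done, so the input is
 * assumed to be well-formed UTF-8. */
static size_t utf8_codepoints(const char *s)
{
	size_t n = 0;

	assert(s != NULL);
	for (; *s != '\0'; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}
```

A grapheme-aware implementation would instead apply the segmentation rules of Unicode Standard Annex #29, which needs additional property tables.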
Dependencies
------------
A POSIX environment, plus the following at build time:
* Internet access and one of curl(1), wget(1) or fetch(1), needed for
  htmldecode, mbcut and natsort (to download Unicode data), and for
  genhtab when USE_XXHASH=true
Building and installation
-------------------------
Building and installation (default values are shown in curly brackets,
optional values in square brackets):
$ [BIN=<tool>] {CC=c99} {LTO=false} {NATIVE=false} ./build.sh
$ [BIN=<tool>] {DESTDIR=} {PREFIX=/usr/local} ./build.sh install
For genhtab, you can set USE_XXHASH=true to switch the hash function from
FNV-1a to XXH3/XXH32.
Cleanup:
$ [BIN=<tool>] ./build.sh clean
(use mrproper instead of clean if you want the binaries removed too)
Uninstall:
$ [BIN=<tool>] ./build.sh uninstall
Test:
$ ./test.sh [<tool>...]
For all operations, if no <tool> is specified, all tools are processed.
LTO=true is strongly recommended for mbcut and htmldecode to avoid binary
bloat, since utf8.c contains large Unicode lookup tables (LUTs).