hal: move all condition flags to Low HAL

This allows us to remove the kernel < word and make basic comparison words much

We also remove C) and NC), which aren't used at the moment and have an ambiguous
meaning when used with compare, because the C flag is inverted under ARM.

These flags would have a correct meaning if they were used with shift
instructions, but semantics for this haven't been developed yet, so for now I'm
just removing them.
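
For reference, a rough Python model of the inversion mentioned above. It is a
sketch and not code from this repository: the function names are made up, and
"borrow-style" stands for architectures such as x86, where the carry flag is
set when a compare borrows.

```python
MASK = 0xFFFFFFFF  # 32-bit registers


def arm_cmp_carry(a, b):
    """C flag after ARM `CMP a, b`: set when no borrow occurs (a >= b, unsigned)."""
    # ARM computes a - b as a + NOT(b) + 1; C is the carry out of that addition.
    total = (a & MASK) + (~b & MASK) + 1
    return total > MASK


def borrow_style_carry(a, b):
    """Carry after a compare on a borrow-style CPU: set when a < b, unsigned."""
    return (a & MASK) < (b & MASK)


# The two conventions always disagree, hence the ambiguity of C) and NC).
for a, b in [(1, 2), (2, 2), (3, 2)]:
    print(a, b, arm_cmp_carry(a, b), borrow_style_carry(a, b))
```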
rpi: ( hello, another comment! )
rpi: hello, this is a comment!
rpi: add missing -W,

This fixes broken - which fixes broken immediate which fixes broken c,
rpi: add findmeta findmod

This brings us to , in bootlo, enabling all 8b and 16b mods.
rpi: add and or xor
rpi: add [!],

This brings us up to :16b in bootlo. Speeding up!

@!+ can be verified with stuff like "$12345678 HERE @!+ -4 HERE +! HERE @@+"
rpi: add [@],

This brings us to @@+ in bootlo, which can be verified with stuff like:
$12345678 HERE @ ! HERE @@+

and again with w@@+ and c@@+ variants.
rpi: add W+n, A+n, and W<>A,

This brings us up to !+ in bootlo. It can be checked with something like
"$12345678 HERE ! HERE c@+ drop c@+ drop c@+ drop c@+" with QEMU monitor checks
in between steps.
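
A rough Python model of what that check should show, as a sketch only. It
assumes little-endian memory and assumes that c@+ has the signature
( a -- a+1 c ); the helper name c_fetch_plus is made up for illustration.

```python
import struct

# Assumption: little-endian memory, with the cell written by "$12345678 HERE !".
memory = bytearray(struct.pack("<I", 0x12345678))


def c_fetch_plus(addr):
    """Model of c@+ under the assumed signature ( a -- a+1 c )."""
    return addr + 1, memory[addr]


addr = 0  # stands in for HERE
seen = []
for _ in range(4):
    addr, byte = c_fetch_plus(addr)
    seen.append(byte)

# Bytes come out least significant first under the little-endian assumption.
print([hex(b) for b in seen])
```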
rpi: add 16b support to @!,

This brings us to @! in bootlo.

Also, use immediate modes for LDRH and STRH, which simplifies the code. I simply
hadn't noticed their presence yet; I thought I was forced to register-based
rpi: add [+n],

This gets us to 1+! and 1-!.
rpi: add +,

This makes +! work.
rpi: add W>A,

Both @ and ! words work now, in all widths.
rpi: add branch, code16b code8b and add 16b support to @ and !

This can be verified by typing "HERE @ HERE w@ HERE c@" and checking r9 in QEMU
monitor at each step.
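
A rough Python model of what the three fetches should return, as a sketch
only. The value stored at HERE is hypothetical (this message doesn't say what
is there), and little-endian, zero-extended loads are assumed.

```python
import struct

# Hypothetical contents at HERE; the commit does not say what is stored there.
memory = struct.pack("<I", 0xCAFEBABE)


def fetch(addr, width):
    """Zero-extended little-endian load of 1, 2 or 4 bytes (assumed semantics)."""
    return int.from_bytes(memory[addr:addr + width], "little")


cell = fetch(0, 4)  # models @
half = fetch(0, 2)  # models w@
byte = fetch(0, 1)  # models c@
print(hex(cell), hex(half), hex(byte))
```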

There's no findmod yet, but the code16b and code8b part can be verified by
dumping the 0x10000-0x10300 memory area from QEMU and seeing that the metadata
structure is fine.
rpi: add @!, and all base PS juggling words

These can be tested with the help of the QEMU monitor. For example,
"1 2 3 rot rot rot" and examining r9 between each rot.
Update README to reduce duplication with website contents
bootlo: optimize tuck and 2dup

I can't, for the life of me, remember why I wrote them thus during the HAL
rewrite. Was it to minimize stack operations while leveraging the speed of W
and A interactions? Or maybe it made more sense in early HAL designs.

In any case, tuck goes from 6 HAL operations to 4 and 2dup goes from 6 HAL
operations to 5.
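
Whatever HAL sequence is used, the stack effects have to stay the same. A
rough Python model of those effects, using the standard Forth definitions of
tuck and 2dup, can serve as a reference. It does not reproduce the HAL
operations themselves, and the function names are made up for the sketch.

```python
def tuck(stack):
    """Standard Forth tuck ( a b -- b a b ); the top of stack is the list's end."""
    a, b = stack[-2:]
    return stack[:-2] + [b, a, b]


def two_dup(stack):
    """Standard Forth 2dup ( a b -- a b a b )."""
    a, b = stack[-2:]
    return stack[:-2] + [a, b, a, b]


print(tuck([1, 2]))
print(two_dup([1, 2]))
```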
rpi: add litn and use it in compword
rpi: change call conventions

See "pushret, and popret," section in doc/hal.