~vdupras/duskos

a9fde351f68491eca745b1fedf8a419bc9c40130 — Virgil Dupras a month ago 89d1d5e
Document the HAL
6 files changed, 236 insertions(+), 142 deletions(-)

D fs/doc/cc/forth.txt
M fs/doc/cc/impl.txt
M fs/doc/cc/index.txt
M fs/doc/dict.txt
M fs/doc/hal.txt
M fs/doc/index.txt
D fs/doc/cc/forth.txt => fs/doc/cc/forth.txt +0 -47
@@ 1,47 0,0 @@
# Forth VM

The Forth VM doesn't have registers, making the implementation of this C backend
a bit more... interesting. It has, however, a PS space where each element can be
addressed by its byte offset relative to current PSP. These addresses are what
we use to fill the role generally reserved for CPU registers.

That "register" space directly follows the argument frame and is managed in the
exact same way as VM_ARGSFRAME operands, except that we still use the
VM_REGISTER location constant for it (if we use VM_ARGSFRAME for registers, we
end up "missing" an indirection level for reasons that are hard to explain with
words, but if you think really hard about it, you'll get it).

However, because we keep adding and removing elements from PS, and because our
addressing scheme is relative to PSP, we need to keep track of our relative PS
level at all times so that our call to "p'," can have the correct argument.

We do this through a variable we call "psoff" which starts at 0, increases by 4
bytes every time we compile a word that adds to the stack and decreases by 4
when we compile a word that removes an element from the stack.

Important note: PSP points to *current* element. When a function starts, PSP+0
points to the last argument in the ARGSFRAME.

< mem down                               mem up >
+-----------------------------------------------+
| ...   | REG1  | REG0  | ARG0  | ARG1  | ...   |
+-----------------------------------------------+
psoff=8 ^       psoff=0 ^

For example, let's say that we begin the function with psoff=0 and that we
"allocate a register". That will yield a VMOp with loc=VM_REGISTER and arg=-4.
(args=0 is the last argument for the function). PSP is increased by 4 bytes
(it's always 4 bytes).

Now, what happens if we want to compile an access to that register? We do
"psoff+args", which yields 0, which triggers the special case "compile dup".
We then increase psoff by 4 (we're now at 8). If we're in the middle of an
operation that will consume this value right away, we let that operation manage
the value it puts on PS. If that value needs to be kept in a "register", then
we allocate a new space for it, with "arg=-8".

What do we do with the old value at arg=-4? We leak it to PS. That's a side
effect of this scheme, which is a tradeoff against general simplicity. We clean
up PS only when we return with vmret, or when "ops$" is called (which is done
between each statement).


M fs/doc/cc/impl.txt => fs/doc/cc/impl.txt +25 -17
@@ 1,5 1,7 @@
# Dusk C compiler implementation details

Prerequisite: doc/hal

## How the code is generated

DuskCC is a single pass compiler. It generates its code as it consumes the


@@ 168,7 170,7 @@ to "release" Result structs when they're finally consumed.
Then, this result is either used as an operand of an assign operator, used as a
call argument, or used as a return value. The cycle is complete.

### :hal# vs :hal$
## :hal# vs :hal$

The main purpose of a Result is to, at some point, produce a HAL operand (or to
"be W"). This is done through the :hal# ( self -- halop ) method. This method


@@ 184,27 186,33 @@ Once a HAL operand has been generated, we usually don't reuse the Result because
the resulting operation lives in W. To protect us from ourselves, it's better to
use the ":hal$" method which additionally releases the Result (sets it to NONE).

### A register usage
## A register usage

The A register is used in some places during code generation, but only in
contexts that resolve immediately. Results stored in there can't be, for
example, still stored when a function call is generated.

### Result indirection levels

We track two types of indirection levels in the Result struct. First, there's
the "regular" level, that is, where the result presently points to, with all &
and * applied, relative to the current location of the result. For example, if
the Result presently points to RS+4 and has "**" applied to it (lvl=2), we know
that when push comes to shove, we'll have two "@," to apply before we can
consider that we hold the value we're looking for. When the base type is 32-bit,
then it's easy. All "@," ops can be made in 32-bit mode. When the base type is
16-bit or 8-bit, however, it's trickier and we need a second indirection
information: whether the final value we're about to get is a direct value or a
pointer. For this reason, we have the "blvl" (Bottom Level), that is, the
indirection level at which the final value lies. This value comes from the
initial CDecl. That value has a size "basesz" (also from the CDecl). All other
levels have a 4b size.
## Result indirection levels

An important challenge in the CC is to know when we've "hit bottom" in the
indirection chain, that is, the moment at which the instruction acquires the
width of the base type (32-bit, 16-bit or 8-bit).

A CDecl has a "lvl" attribute which indicates the indirection level of the
declaration (with a ":lvl" helper that conditionally adds 1 to it for types that
"naturally" yield references), so that's our starting point.

On top of that, the Result also maintains its own "lvl" variable. This is because
some operators (such as *, &, -> etc.) change the indirection level. So, when a
Result is created from a CDecl (which is the majority of the cases), it inherits
its "lvl" from "CDecl :lvl", and keeps track of indirection levels from there.

When that "lvl" hits 0, we know we've hit bottom, so that's when we apply width
to our instruction.

Another use for that "lvl" is pointer arithmetic. As you know, "p + 1" when
"p" is an "int*" actually generates "p + 4". We perform this mangling when "lvl"
is more than 0 and the other side of the operator is a lvl0 operand.
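
To make the pointer arithmetic rule concrete, here is a small Python model. It
is a sketch and not DuskCC code: the "Result" class, its fields, "elem_size" and
"add" are hypothetical names, and it assumes that pointers are 4 bytes wide and
that a lvl1 Result points to a value of the base type's width.

```python
from dataclasses import dataclass

POINTER_SIZE = 4  # assumption: pointers are 4 bytes wide


@dataclass
class Result:
    value: int   # constant or address, for illustration only
    lvl: int     # indirection level; 0 means we've "hit bottom"
    basesz: int  # width in bytes of the base type (4, 2 or 1)


def elem_size(r):
    # A lvl1 Result points to the base type; higher levels point to pointers.
    return r.basesz if r.lvl == 1 else POINTER_SIZE


def add(left, right):
    # Scale the lvl0 side when the other side has lvl > 0.
    if left.lvl > 0 and right.lvl == 0:
        scaled = right.value * elem_size(left)
        return Result(left.value + scaled, left.lvl, left.basesz)
    if right.lvl > 0 and left.lvl == 0:
        return add(right, left)
    return Result(left.value + right.value, left.lvl, left.basesz)


p = Result(value=0x1000, lvl=1, basesz=4)  # models "int *p"
one = Result(value=1, lvl=0, basesz=4)     # models the constant 1
print(hex(add(p, one).value))              # prints 0x1004
```

In this model, the same "+ 1" applied to a pointer with basesz=1 moves by a
single byte, because the scaling factor comes from what the pointer points to.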

## Caller save


M fs/doc/cc/index.txt => fs/doc/cc/index.txt +0 -1
@@ 30,4 30,3 @@ For this reason, the core of the language is very close to ANSI C.
* Usage (cc/usage)
* Implementation details (cc/impl)
* Standard Library (cc/lib)
* DuskCC's Forth VM details (cc/forth)

M fs/doc/dict.txt => fs/doc/dict.txt +5 -0
@@ 58,6 58,11 @@ $ - Initialize
? - As a suffix, means "Is it ...?". As a prefix, "do ... if flag"
[...] - Indicates immediateness

## Maybe a HAL word?

If the word you're looking for ends with ")" or "," and isn't in here, it might
be a HAL word and is documented in doc/hal.

## System variables and constants

### Constants

M fs/doc/hal.txt => fs/doc/hal.txt +205 -77
@@ 1,20 1,169 @@
# Harmonized Assembly Layer

TODO: description
The Harmonized Assembly Layer is a set of words implemented by all assemblers
which have the same semantics and compile native code that has consistent
results on all architectures. For example, "RSP) 2 +) 16b) +," will, on all
arches, compile a set of instructions that will result in the 16-bit addition
of RSP+2 into the Work register. On i386, this is the same as
"ax sp 2 d) 16b) add,".

This layer allows us to generate performant code in a cross-arch manner. It is
also what compilers such as the C compiler rely on to generate code.

Of course, as with any abstraction, we sometimes lose a little bit in speed and
binary space compared to direct assembler instructions, but in general, the
result is pretty good and direct assembler should be needed only in the tightest
of the loops.

At boot, a bare Dusk system only has the Low HAL (see below) loaded. If you
want to load the High HAL, do "f<< asm/hal.fs". This will load the assembler for
the native architecture.

## Concepts

### Low vs High

The core code of DuskCC is implemented in HAL. If you look at xcomp/bootlo,
you'll see tons of HAL references. This allows core words to be fast without
having to implement them natively for each supported architecture.

This means that the HAL has to be implemented at the *kernel* level, a concept
which is completely wicked: the kernel isn't only a kernel, it's an assembler
implemented in native code.

The HAL, however, contains many concepts that aren't needed for xcomp/bootlo and
that, if they were implemented in native code within the kernel, would represent
a burden too heavy to be worth it.

Therefore, we separate HAL implementations in two halves. The first half, the
"Low HAL", is implemented in native code in the kernel and is the strict minimum
needed to compile xcomp/bootlo. It is limited in the addressing modes it can
generate and has only the W register as a destination, with the exception of a
few specialized A-related words.

The "High HAL" is implemented in the assemblers and completes the API. The API
of each half is described in the sections below.

### Register allocation

The HAL has 4 virtual registers: W, A, PSP, RSP. Each architecture implementing
the HAL will need to map those virtual registers to actual registers. For
example, on i386, W=eax A=esi PSP=edi and RSP=esp.

### W and A registers

The HAL operates over 3 main locations: the W register, the A register, and
memory addresses.

The W register is the "work" register and the default destination of all HAL
instructions. When we say that "@," means "fetch", we mean "fetch into the
destination", which is the W register by default.

The A register is the "address" register. In the High HAL, it can be used in the
exact same manner as the W register, but in the Low HAL, it's limited in what it
can do. In xcomp/bootlo, it's used to hold address references and increment them
as needed.

### Operands

All HAL instructions take either no operand (inherent) or one operand parameter.
That operand parameter is a 32-bit number with an arch-specific bit structure
and that contains all the information the instruction needs to know the source
and destination of the instruction.

Operand words all end with ")". For example, "A) +," means "add the 32-bit value
at the location the A register points to into the W register".

Some operand words are not directly operands, but operand modifiers. For
example, "+)" adds a numerical offset to an operand. "W) 4 +)" refers to the
memory location where W points to, with a 4 bytes displacement. The "8b)"
modifier transforms the operand into an 8-bit operand.

By default, all operands refer to a memory location. Only through the "&)"
operand (see below) can we refer directly to a value in a register.

### &) operand modifier

The &) word takes an input operand and returns its reference counterpart. For
example, m) becomes i), W) becomes W&), etc. This also works with displacements.
For example, "RSP) 4 +) &)" yields an operand that points to RSP+4.

This operand might not be addressable directly by the host CPU. In that case, the
HAL operator will compile two instructions. For example, "RSP) 4 +) &) +," under
i386 would yield "bx sp 4 +) lea, ax bx add,". Only RSP) and PSP) can be
referenced with displacement.

The "&)" word never writes instructions directly; only operator words do. The
"lea," above wouldn't be written when "&)" is called, but when "+," is.

If the &) word is called with an operand that can't be referenced, this word has
no effect. For example "i) &)" is the same as "i)".
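
The rules above can be modelled in a few lines of Python. This is only a sketch:
real HAL operands are 32-bit numbers with an arch-specific bit structure, while
the "Operand" class, its fields and the "ref" function below are hypothetical
stand-ins. Raising an error for a displaced W) or A) is also an assumption; the
text only says that such operands can't be referenced.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Operand:
    base: str                # "W", "A", "PSP", "RSP", "mem" or "imm"
    disp: int = 0            # displacement, or the address/value for mem/imm
    reference: bool = False  # True once "&)" has been applied


def ref(op):
    """Model of "&)": return the reference counterpart of an operand."""
    if op.base == "mem":
        return Operand("imm", op.disp)  # m) becomes i)
    if op.base == "imm":
        return op                       # can't be referenced: no effect
    if op.disp != 0 and op.base not in ("PSP", "RSP"):
        # Assumption: the model rejects what the HAL can't express.
        raise ValueError("only RSP) and PSP) take a displacement with &)")
    return replace(op, reference=True)  # no instruction is written here


print(ref(Operand("mem", 0x1234)))  # the immediate value 0x1234
print(ref(Operand("RSP", 4)))       # a reference to RSP+4
```

Note that "ref" only returns a new operand. As in the HAL, any extra native
instruction would be produced later, by the operator word that consumes it.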

### <>) operand modifiers

The <>) word inverts the destination of the HAL instruction, allowing
arithmetic result to be stored directly in memory. For example,
"$1234 m) 8b) <>) +," adds the 8-bit value at address $1234 to W and stores the
result directly in address $1234 without affecting W.
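
As an illustration, here is a Python model of the example above. It is a sketch
under assumptions, not HAL code: "hal_add" is a hypothetical name, memory is
modelled as a little-endian bytearray, and the stored result is assumed to be
truncated to the operand width.

```python
def hal_add(mem, w, addr, width=4, inverted=False):
    """Return the value of W after a modelled "+," on a memory operand."""
    src = int.from_bytes(mem[addr:addr + width], "little")
    total = (w + src) & 0xFFFFFFFF
    if inverted:
        # "<>)": the result goes to memory and W is left untouched.
        mask = (1 << (8 * width)) - 1
        mem[addr:addr + width] = (total & mask).to_bytes(width, "little")
        return w
    # Default direction: the result goes to W and memory is left untouched.
    return total


mem = bytearray(0x2000)
mem[0x1234] = 0x10
w = hal_add(mem, 5, 0x1234, width=1, inverted=True)
print(w, hex(mem[0x1234]))  # prints 5 0x15
```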

### 8b) and 16b) arithmetics

8b) and 16b) modifiers only apply to memory access and all arithmetics are
"upscaled" to 32-bit with regards to flags settings and carry management
(the C flag is never set in 16b) or 8b) mode).

This also applies to cmp, which means that, for example,
"$4242 LIT>W, RSP) 8b) cmp," will never set the Z flag because even if RSP) is
$42, comparison is done on the whole W register.
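
The cmp example can be checked with a tiny Python model. It is a sketch, with
"cmp_sets_z" being a hypothetical name, and it assumes that the narrowed memory
value is zero-extended to 32-bit before the comparison.

```python
def cmp_sets_z(w, mem_value, width_bits=32):
    """Return True if a modelled "cmp," would set the Z flag."""
    # Only the memory access is narrowed to 8 or 16 bits...
    operand = mem_value & ((1 << width_bits) - 1)
    # ...while the comparison itself uses the whole 32-bit W register.
    return (w & 0xFFFFFFFF) == operand


print(cmp_sets_z(0x4242, 0x42, width_bits=8))  # prints False
print(cmp_sets_z(0x42, 0x42, width_bits=8))    # prints True
```

In this model, the only way to get Z out of an 8-bit comparison is to have the
upper 24 bits of W cleared beforehand.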

### Branching and flags

The HAL can generate branching, conditional or not, through its "branch"
instructions. "branchC,", the conditional branching generator, takes a "cond"
argument. This argument is generated by words like "Z)", ">)", etc. and the
number it yields is arch-specific. The idea is that through this number, the
"branchC," instruction knows the kind of native branch instruction to generate.

These conditions depend on flags being set or not, and the conditions under
which these flags are set are not exactly the same across architectures.

To be able to rely on consistent condition branching, HAL instructions make
guarantees on the flags set by certain instructions. If an instruction has a "Z"
next to it in the listing below, it's safe to conditionally branch using "Z)" or
"NZ)" right after having called it. Even if the native instruction for a
particular HAL word doesn't supply that flag, the HAL instruction will generate
the necessary native instructions to make it so, at the cost of speed. For this
reason, we minimize flag guarantees in HAL words.

Arithmetic conditions (">)", "<=)", etc.) have no associated flag and can only
be used after a "cmp,".

If you look at the signatures of branching words, you'll notice something weird:
they take an address parameter and yield an address result. This is because
those words can be used for both backward and forward branching. What they do is
write a branch to the supplied address, but also yield the address of the
memory location that can then be used by "branch!".

Therefore, a backward branch looks like "begin .. branch, drop" and a forward
branch looks like "0 branch, .. here swap branch!"

All addresses passed to branching words are absolute addresses. If the native
instructions use relative branching addressing, the HAL takes care of the
translation.
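
The backward and forward patterns can be modelled in Python as follows. This is
a sketch and not how any HAL backend is written: the "CodeBuffer" class and its
methods are hypothetical, each instruction takes one cell, and branches are
stored as relative offsets only to show the absolute-to-relative translation.

```python
class CodeBuffer:
    def __init__(self):
        self.cells = []  # one cell per modelled instruction

    def here(self):
        return len(self.cells)

    def nop(self):
        self.cells.append(("nop", 0))

    def branch(self, target):
        """Model of "branch,": takes an absolute target, yields braddr."""
        braddr = self.here()
        self.cells.append(("jmp_rel", target - braddr))
        return braddr

    def branch_fix(self, target, braddr):
        """Model of "branch!": retarget the branch written at braddr."""
        self.cells[braddr] = ("jmp_rel", target - braddr)

    def target_of(self, braddr):
        return braddr + self.cells[braddr][1]


buf = CodeBuffer()

# Backward branch: "begin .. branch, drop"
loop_start = buf.here()
buf.nop()
back = buf.branch(loop_start)  # the yielded address isn't needed afterwards

# Forward branch: "0 branch, .. here swap branch!"
fwd = buf.branch(0)            # target unknown yet, so we pass 0
buf.nop()
buf.branch_fix(buf.here(), fwd)

print(buf.target_of(back), buf.target_of(fwd))  # prints 0 4
```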

## Low HAL

Operand words:

W)    -- op
A)    -- op
PSP)  -- op
RSP)  -- op
m)    addr -- op
+)    op disp -- op Can be applied multiple times
8b)   op -- op
16b)  op -- op
32b)  op -- op
W)    -- op          Indirect W register
A)    -- op          Indirect A register
PSP)  -- op          Indirect PSP register
RSP)  -- op          Indirect RSP register
m)    addr -- op     Absolute address
+)    op disp -- op  Apply displacement to op. Can be applied multiple times.
8b)   op -- op       Make op 8-bit
16b)  op -- op       Make op 16-bit
32b)  op -- op       Make op 32-bit (default)

Maximum displacement in Low HAL: 8-bit



@@ 24,96 173,75 @@ Z)
NZ)

W=0>Z,     --
  Sets Z according to whether W is zero.
A=0>Z,     --
  Sets Z according to whether A is zero.
C>W,       cond --
  If cond is met, W=1. Otherwise, W=0.

execute,   a --
  Call address a
branch,    a -- a
  Branch to address a
branchC,   a cond -- a
  Branch to address a if condition is met
branch!    tgtaddr braddr --
  Given "braddr" yielded by a previous "branch" instruction, change the
  reference at the address so that it targets "tgtaddr". Used for forward
  branching.
branchA,   --

Compiler words:

ps+,  n --
rs+,  n --
W+n,  n --  *Z*
A+n,  n --  *Z*
W>A,  --
A>W,  --
W<>A, --
lea,  op --
neg,  --
<<n,  n --
>>n,  n --

Width-aware compiler words:

@,    op --
!,    op --
@!,   op --
+,    op --   *Z*
[@],  op --
[!],  op --
cmp,  op --
[+n], n op -- *Z*

## 8b) and 16b) arithmetics

8b) and 16b) modifiers only apply to memory access and all arithmetics are
"upscaled" to 32-bit with regards to flags settings and carry management
(the C flag is never set in 16b) or 8b) mode).

This also applies to cmp, which means that, for example,
"$4242 LIT>W, RSP) 8b) cmp," will never set the Z flag because even if RSP) is
$42, comparison is done on the whole W register.
  Branch to the address held in the A register.

Instructions:

@,    op --   Read source into W
!,    op --   Write W to source
@!,   op --   Swap W and source
+,    op --   *Z* Add source to W
[@],  op --   Read indirect source into W
[!],  op --   Write W to indirect source
cmp,  op --   Compare source to W
[+n], n op -- *Z* Add n to indirect source without affecting W

ps+,  n --   Add n to PSP
rs+,  n --   Add n to RSP
W+n,  n --   *Z* Add n to W
A+n,  n --   *Z* Add n to A
W>A,  --     Copy W to A
A>W,  --     Copy A to W
W<>A, --     Swap W and A
lea,  op --  Store the effective address of the operand in W
neg,  --     W = -W
<<n,  n --   Shift W left by n
>>n,  n --   Shift W right by n

## High HAL

The "high" layer of the HAL is provided by the assembler.

Operand words:

i)
A>)   A register is the destination (instead of W)
<>)   Direction of the operation is inverted
&)    Reference to operand (see below)
i)    Immediate operand
A>)   A register is the destination instead of W
<>)   Direction of the operation is inverted (see above)
&)    Reference to operand (see above)

Branching and conditions:

C)
C)    Carry flag
NC)
<)
<=)
>)
>=)
s<)
s<)   Signed comparison
s<=)
s>)
s>=)

Width-aware compiler words:

-,    op --
*,    op --
/,    op --
%,    op --
<<,   op --
>>,   op --

### &) operand modifier

The &) word takes an input operand and returns its reference counterpart. For
example, m) becomes i), W) becomes W*), etc. This also works with displacements.
For example, "RSP) 4 +) &)" yields an operand that points to RSP+4.

This operand might not be addressable directly by the host CPU. In that case, the
HAL operator will compile two instructions. For example, "RSP) 4 +) &) +," under
i386 would yield "bx sp 4 +) lea, ax bx add,". Only RSP) and PSP) can be
referenced with displacement.

The "&)" word never writes instructions directly, only operator words. The
"lea," above wouldn't be written when "&)" is called, but when "+," is.
Instructions:

If the &) word is called with an operand that can't be referenced, this word has
no effect. For example "i) &)" is the same as "i)".
-,    op --  W - operand
*,    op --  W * operand
/,    op --  W / operand
%,    op --  W modulo operand
<<,   op --  W lshift operand
>>,   op --  W rshift operand

M fs/doc/index.txt => fs/doc/index.txt +1 -0
@@ 19,6 19,7 @@ about sys/io, you'll want to open doc/sys/io.

usage     General usage guide
dict      Dictionary of system words
hal       Harmonized Assembly Layer
deploy    Deploy Dusk to another machine
terms     Terminology
dirs      Directory structure