# Dusk OS C compiler
The C compiler is a central piece of Dusk OS. It's written in Forth and is
loaded very early in the boot process so that it can compile drivers we're
about to use.
This compiler needs to meet two primary design goals:
1. Be as elegant and expressive as possible in the context of a Forth, that is,
be an elegant fallback to Forth's shortcomings.
2. Minimize the work needed to port existing C applications.
It is *not* a design goal of this C compiler to be able to compile POSIX
applications without changes. It is expected that a significant porting effort
will be needed each time.
Because of the first goal, we have to diverge from ANSI C. The standard library
will likely be significantly different, and the macro system too. Both will
hopefully fit better with Forth than their ANSI counterparts.
But because of the second goal, we want to stay reasonably close to ANSI. The
idea is that porting should be a mostly mechanical effort, one that is as
unlikely as possible to introduce subtle logic changes.
For this reason, the core of the language is very close to ANSI.
## Differences in the core language
* no 64bit types
* no long, redundant with int
* no double, float is always 32b
* char is always 8b, short is always 16b, int is always 32b
* tightened parsing requirements for simplification purposes
  * "unsigned" always goes first
  * no "signed" (always default), no "auto"
* No short-circuit guarantee for logical operators. In (a && b), b will be
evaluated even if a yields 0.
* Number literals are the same as in Dusk OS, so 12345, $1234 and 'X'. No 0x1234
or 0o777.
* String literals are not null-terminated. They are "counted strings", in the
exact same format as system strings.
* Added pspop() and pspush() built-in functions.
* By default, functions have internal (static) linkage. The "extern" keyword
gives them external linkage (an entry in the system dict).
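Two of these differences are easy to trip over when porting, so here is a small sketch in standard C (not Dusk C) that models them. The names rhs(), ansi_and(), eager_and() and counted_from_cstr() are made up for illustration. The counted string layout shown (one length byte followed by the characters) is an assumption about the system string format, not something this document specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts how many times the right-hand operand was evaluated. */
static int rhs_calls = 0;

int rhs(void) {
    rhs_calls++;
    return 1;
}

/* ANSI C: && short-circuits, so rhs() is skipped when a is 0. */
int ansi_and(int a) {
    return a && rhs();
}

/* Model of the behavior described above: both operands are evaluated
 * before the logical result is computed. */
int eager_and(int a) {
    int b = rhs();
    return (a != 0) & (b != 0);
}

/* ASSUMED counted string layout: one length byte, then the characters,
 * with no terminating NUL. Returns the total number of bytes written. */
size_t counted_from_cstr(unsigned char *dst, const char *src) {
    size_t len = strlen(src);
    assert(len <= 255); /* the length must fit in a single byte */
    dst[0] = (unsigned char)len;
    memcpy(dst + 1, src, len);
    return len + 1;
}
```

When porting, code that relies on the right-hand side of && or || being skipped (for example a null check followed by a dereference) is a candidate for an explicit "if".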
## Caller save
Native words don't save registers they use. For Forth words, it doesn't matter
much because all words are "atomic". Once they return, register values don't
matter anymore. Some native words call each other and in this case, careful
threading is necessary, but otherwise, it works well as is.
When other languages are concerned, however, this attribute becomes important
because if a C expression calls a Forth word, then it loses register values.
Therefore, it's important to remember that in Dusk OS, it's the caller's
responsibility to save/restore registers during a call.
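To make the convention concrete, here is a toy model in standard C. It is only a sketch: the "registers" are a plain array and clobbering_word() is a hypothetical stand-in for a native word that uses registers as scratch space. None of these names exist in Dusk OS.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file standing in for CPU registers. */
enum { NREGS = 4 };
uint32_t regs[NREGS];

/* Stand-in for a native word: it overwrites registers and restores nothing. */
void clobbering_word(void) {
    for (int i = 0; i < NREGS; i++) {
        regs[i] = 0xdeadbeefu;
    }
}

/* Caller-save: the caller copies out what it still needs, calls, restores. */
uint32_t call_with_caller_save(void) {
    regs[0] = 42;                      /* live value the caller wants to keep */
    uint32_t saved = regs[0];          /* save before the call */
    clobbering_word();
    assert(regs[0] == 0xdeadbeefu);    /* the callee did not preserve it */
    regs[0] = saved;                   /* restore after the call */
    return regs[0];
}

/* Without the save/restore pair, the live value is lost. */
uint32_t call_without_save(void) {
    regs[0] = 42;
    clobbering_word();
    return regs[0];
}
```

In this model, call_with_caller_save() gets its value back while call_without_save() returns whatever the callee left behind.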
## Function call stack frame
In C-compiled code, local variables and arguments being passed during function
calls are placed on something called a stack frame.
In Dusk, we have two stack frames. The "arguments" frame lives in PS and the
"local" frame (for local variables) lives on RS. The caller of a C function has
to allocate enough space to place the arguments it's passing to the function.
When you think about it, it's the same thing as with Forth words.
When the function begins, it allocates enough space for its local variables on
RS. Its arguments are already on PS, where they should be, so it does nothing.
When the function returns, it frees the local frame from RS. It also adjusts PSP
according to the function's "argument balance". If it returns more values than
it received arguments, PS grows. If it returns fewer, PS shrinks. Then, we
return.
Let's use an example:
    int foobar(int a, int b) {
        int x = 42;
        return a+b+x;
    }
From Forth, this function can be called like this:
    1 2 foobar . \ prints "45"
Here's what PS and RS look like at the moment foobar is called:
            |-----------|           |-------------|
    PSP+4 ->| $00000001 |  RSP+0 -> | return addr |
    PSP+0 ->| $00000002 |           |-------------|
            |-----------|
During the function prelude, PSP doesn't change, but the compiler assigns each
argument to its proper place in PS.
At the same time, the prelude also decreases RSP by 4 to make space for local
variables.
            |-----------|           |---------------|
    PSP+4 ->| int a = 1 |  RSP+4 -> | return addr   |
    PSP+0 ->| int b = 2 |  RSP+0 -> | int x = undef |
            |-----------|           |---------------|
Then, we execute "int x = 42", which sets RSP+0.
            |-----------|           |---------------|
    PSP+4 ->| int a = 1 |  RSP+4 -> | return addr   |
    PSP+0 ->| int b = 2 |  RSP+0 -> | int x = 42    |
            |-----------|           |---------------|
Then, "return a+b+x" does a "soft" push to PS, that is, it pushes to PS as if
the arguments had been popped during the prelude. This means that our return
value overwrites "a". PSP is increased by 4.
            |-----------|           |---------------|
    PSP+0 ->| $0000002d |  RSP+4 -> | return addr   |
    PSP-4 ->| int b = 2 |  RSP+0 -> | int x = 42    |
            |-----------|           |---------------|
Then, before returning, we deallocate the local stack.
            |-----------|           |-------------|
    PSP+0 ->| $0000002d |  RSP+0 -> | return addr |
            |-----------|           |-------------|
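The walkthrough above can be replayed with a small simulation in standard C. This is a sketch, not the code the compiler emits: ps, rs, psp, rsp, foobar_model() and call_foobar() are hypothetical names, stacks are modeled as arrays of 4-byte cells that grow downward, and the return address is a placeholder cell.

```c
#include <assert.h>
#include <stdint.h>

enum { CELLS = 16 };

/* Indexes count 4-byte cells, so "PSP+4" in the diagrams is ps[psp + 1]. */
uint32_t ps[CELLS], rs[CELLS];
int psp = CELLS, rsp = CELLS;

void ps_push(uint32_t v) { assert(psp > 0); ps[--psp] = v; }
uint32_t ps_pop(void) { assert(psp < CELLS); return ps[psp++]; }

/* Hand-written model of:
 *   int foobar(int a, int b) { int x = 42; return a+b+x; } */
void foobar_model(void) {
    rsp -= 1;                       /* prelude: reserve one cell for x */
    rs[rsp] = 42;                   /* int x = 42 */
    uint32_t a = ps[psp + 1];       /* argument at PSP+4 */
    uint32_t b = ps[psp + 0];       /* argument at PSP+0 */
    uint32_t result = a + b + rs[rsp];
    psp += 1;                       /* "soft" push: drop 2 args, push 1 value */
    ps[psp] = result;               /* lands in the slot that held a */
    rsp += 1;                       /* free the local frame */
}

/* Equivalent of the Forth line "1 2 foobar", returning what "." would print. */
uint32_t call_foobar(uint32_t a, uint32_t b) {
    ps_push(a);
    ps_push(b);
    rs[--rsp] = 0;                  /* placeholder for the return address */
    foobar_model();
    rsp++;                          /* returning pops the return address */
    return ps_pop();
}
```

With this model, call_foobar(1, 2) yields 45 and leaves both stack pointers where they started, which matches the net effect shown in the diagrams.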
## pspush() and pspop()
Built-in functions pspush() and pspop() allow direct control over PS. This
gives you the ability to pop or push a variable number of arguments from/to PS.
These functions, however, are incompatible with arguments and return values. If
you use both at the same time, they'll mess up PS. You can only use them
in functions that have a "void (void)" signature (except when calling Forth
words, see below). Let's see an example.
    void foobar() {
        int b = pspop();
        int a = pspop();
        int x = 42;
        pspush(a+b+x);
    }
This function does the exact same thing as the previous "foobar()", but stacks
will look different. After function prelude:
            |-----------|           |---------------|
    PSP+4 ->| $00000001 |  RSP+12-> | return addr   |
    PSP+0 ->| $00000002 |  RSP+8 -> | int b = undef |
            |-----------|  RSP+4 -> | int a = undef |
                           RSP+0 -> | int x = undef |
                                    |---------------|
Then, after the first 3 lines:
    PSP+0 ->|-----------|           |---------------|
                           RSP+12-> | return addr   |
                           RSP+8 -> | int b = 2     |
                           RSP+4 -> | int a = 1     |
                           RSP+0 -> | int x = 42    |
                                    |---------------|
And right before returning:
            |-----------|           |-------------|
    PSP+0 ->| $0000002d |  RSP+0 -> | return addr |
            |-----------|           |-------------|
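For readers who want to try this outside of Dusk, the same function can be modeled in standard C. The pspush() and pspop() below are stand-ins written for this sketch (a plain array used as a downward-growing stack), not the compiler's built-ins, and ps_mem, ps_top and ps_depth() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum { PS_CELLS = 16 };
uint32_t ps_mem[PS_CELLS];
int ps_top = PS_CELLS;          /* grows downward; empty when == PS_CELLS */

/* Stand-ins for the built-ins, with bounds checks added for the model. */
void pspush(uint32_t v) { assert(ps_top > 0); ps_mem[--ps_top] = v; }
uint32_t pspop(void) { assert(ps_top < PS_CELLS); return ps_mem[ps_top++]; }
int ps_depth(void) { return PS_CELLS - ps_top; }

/* Same body as the example above, with the types made explicit. */
void foobar(void) {
    uint32_t b = pspop();       /* b was pushed last, so it is popped first */
    uint32_t a = pspop();
    uint32_t x = 42;
    pspush(a + b + x);
}
```

Pushing 1 then 2 and calling foobar() leaves a single cell holding 45 on the modeled PS, the same end state as in the last diagram.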
## Calling Forth words
Words from the system dictionary can be called. They are considered to have a
void return type and an unspecified number of arguments.
Arguments to Forth words can be passed normally, but return values have to be
handled with pspop() and pspush(). Whenever you call such a function, you should
return to "PS normality" before using one of your function arguments, because if
you don't, PS offsets for those arguments will be wrong.
For example, let's say that you want to call "max", a Forth word with the
signature "a b -- n". You would do so like this:
    int mymax(int a, int b) {
        max(a, b);
        // don't use a or b before having called pspop(), they're broken.
        return pspop();
    }
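The reason a and b are "broken" after the call can be seen in a small model written in standard C. This is a sketch under assumptions: it supposes that passing arguments to a Forth word pushes copies of them to PS, and forth_max(), mymax_model() and call_mymax() are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { N = 16 };
int32_t ps[N];
int psp = N;                        /* cell index; the stack grows downward */

void pspush(int32_t v) { assert(psp > 0); ps[--psp] = v; }
int32_t pspop(void) { assert(psp < N); return ps[psp++]; }

/* Stand-in for the Forth word "max" ( a b -- n ). */
void forth_max(void) {
    int32_t b = pspop();
    int32_t a = pspop();
    pspush(a > b ? a : b);
}

/* Model of mymax(): on entry, a is at PSP+4 and b is at PSP+0. */
void mymax_model(void) {
    int base = psp;                 /* "PS normality" */
    pspush(ps[base + 1]);           /* ASSUMED: pass a copy of a */
    pspush(ps[base + 0]);           /* ASSUMED: pass a copy of b */
    forth_max();
    assert(psp == base - 1);        /* one extra cell: a and b offsets are off */
    int32_t result = pspop();       /* back to PS normality */
    assert(psp == base);
    psp += 1;                       /* soft push of the return value */
    ps[psp] = result;
}

int32_t call_mymax(int32_t a, int32_t b) {
    pspush(a);
    pspush(b);
    mymax_model();
    return pspop();
}
```

Between forth_max() and pspop(), PSP sits one cell lower than the compiler expects, so reading "b at PSP+0" would actually read the result of max. Popping the result first puts the offsets back in place.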
## Macros
Macros in Dusk's CC are simply markers inside which arbitrary Forth code is
interpreted. Those markers are #[ and ]#. Those markers are executed during the
AST generation phase, which means that you can arbitrarily modify the AST at any
point during parsing.
A common use of C macros is the definition and reuse of constants. Here's how
it looks:
    #[ 42 const FOOBAR ]#
    int foo() {
        return #[ FOOBAR Constant ]#;
    }
AST nodes are created with "createnode" accompanied with one of the AST_*
constants, or with the use of a "node helper" word, such as "Constant".
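For comparison when porting, the constant example above would typically start from ANSI C that looks like the following. This is only the standard C side of the picture, shown so the mapping is explicit. It is not Dusk C and does not go through the AST mechanism described here.

```c
#include <assert.h>

/* ANSI C counterpart of "#[ 42 const FOOBAR ]#": a preprocessor constant. */
#define FOOBAR 42

/* Compile-time check that the constant has the expected value. */
static_assert(FOOBAR == 42, "FOOBAR should expand to 42");

int foo(void) {
    /* Counterpart of "return #[ FOOBAR Constant ]#;" */
    return FOOBAR;
}
```

The difference is in when the substitution happens: the ANSI version is textual and done by the preprocessor, while the Dusk version runs Forth code that produces an AST node.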
Because macros can modify the AST, they can only be inserted at certain
designated places, known as "hash (#) bars". These are:
* In a Unit context (in between functions)
* Replacing a "factor" AST element. Factors are quite numerous. Some of them:
  * A constant
  * An lvalue (AST_IDENT)
  * A function call
  * An expression
In any other place, "#[" will be a parse error.
In the first case, the signature of the macro is ( node -- node ). By using PS
TOS, you can add a node to the active Unit.
The second case has a signature ( -- node ), that is, you are expected to produce
a node that fits the context where the macro appears. It will then be added wherever
the factor was expected. It will even have postfix AST rules applied to it,
which opens nice doors. For example, if your macro returns a simple AST_IDENT,
then right after the macro you can add parens to make it into a function call.
When a macro begins, PS level is recorded. If it doesn't end with the correct
PS size, an error is raised.
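The idea behind this check can be sketched in standard C. This models only the bookkeeping (record the depth, run the body, compare), not the compiler's actual implementation. The stack, run_macro_checked() and the two example macro bodies are hypothetical, and plain integers stand in for AST nodes.

```c
#include <assert.h>
#include <stdbool.h>

enum { DEPTH_MAX = 16 };
int stack[DEPTH_MAX];
int depth = 0;                      /* number of cells currently on the stack */

void push(int v) { assert(depth < DEPTH_MAX); stack[depth++] = v; }
int pop(void) { assert(depth > 0); return stack[--depth]; }

typedef void (*macro_body)(void);

/* Runs a macro body and reports whether the stack depth changed by exactly
 * expected_delta cells: 0 for ( node -- node ), +1 for ( -- node ). */
bool run_macro_checked(macro_body body, int expected_delta) {
    int before = depth;             /* PS level recorded when the macro begins */
    body();
    return depth - before == expected_delta;
}

/* A well-behaved ( -- node ) macro: leaves exactly one cell. */
void good_factor_macro(void) { push(42); }

/* A faulty macro: leaves one cell too many. */
void bad_factor_macro(void) { push(42); push(43); }
```

In this model the faulty macro is detected because it leaves two cells where one was expected. The real compiler raises an error in that situation.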
The macro opening symbol, "#[", obeys C tokenization rules, but the closing one,
"]#", obeys Forth tokenization rules, so it has to be followed by a space.
There are "shortcut words" for closing a macro:
    c]# --> Constant ]#
    i]# --> Ident ]#
    +]# --> over addnode ]#
## Linkage
By default, functions have internal linkage. You give a function external
linkage with "extern".
    void foo() { }
    extern void bar() { foo(); }
This unit will compile fine. Because "foo()" is in the same unit as "bar()",
"bar()" can call "foo()". However, "foo()" can't be called from another
unit or from Forth. "bar()" can.
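When porting, keep in mind that the defaults are inverted compared to ANSI C. As a point of reference, here is how the same unit would be spelled in standard C, where "static" is needed for internal linkage and external linkage is the default. The foo_calls counter is a hypothetical addition, there only to make the call observable.

```c
#include <assert.h>

int foo_calls = 0;  /* hypothetical counter, not part of the original example */

/* Dusk C "void foo() { }" has internal linkage; ANSI C needs "static". */
static void foo(void) { foo_calls++; }

/* Dusk C "extern void bar() { foo(); }" has external linkage, which is
 * what an ANSI C function gets without any keyword. */
void bar(void) {
    int before = foo_calls;
    foo();                              /* same unit, so the call resolves */
    assert(foo_calls == before + 1);
}
```

So when bringing ANSI code over, functions declared "static" can drop the keyword, and functions that other units or Forth code need to call must gain an explicit "extern".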