~vdupras/duskos

duskos/fs/doc/cc.txt -rw-r--r-- 9.3 KiB
68f5fa6fVirgil Dupras sys/file: introduce drive letters in paths 18 hours ago
                                                                                
# Dusk OS C compiler

The C compiler is a central piece of Dusk OS. It's written in Forth and is
loaded very early in the boot process so that it can compile drivers we're
about to use.

This compiler needs to meet two primary design goals:

1. Be as elegant and expressive as possible in the context of a Forth, that is,
   be an elegant fallback to Forth's shortcomings.
2. Minimize the work needed to port existing C applications.

It is *not* a design goal of this C compiler to be able to compile POSIX
applications without changes. It is expected that a significant porting effort
will be needed each time.

Because of the first goal, we have to diverge from ANSI C. The standard library
will likely be significantly different, and the macro system too. Both will
hopefully fit better with Forth than their ANSI counterparts.

But because of the second goal, we want to stay reasonably close to ANSI. The
idea is that porting should be a mostly mechanical effort, one that is as
unlikely as possible to introduce subtle logic changes.

For this reason, the core of the language is very close to ANSI.

## Differences in the core language

* no 64bit types
  * no long, redundant with int
  * no double, float is always 32b
  * char is always 8b, short is always 16b, int is always 32b
* tightened parsing requirements for simplification purposes
  * "unsigned" always goes first
  * no "signed" (always default), no "auto"
* No short-circuit guarantee for logical operators. In (a && b), b will be
  executed even if a yields 0.
* Number literals are the same as Dusk OS, so 12345, $1234 and 'X'. No 0x1234
  or 0o777.
* string literals are not null-terminated, but "counted strings". The exact same
  format as system strings.
* Added pspop() and pspush() built-in functions.
* By default, functions have internal (static) linkage. The "extern" keyword
  gives them external linkage (an entry in the system dict).
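
When porting, the lack of a short-circuit guarantee matters most for guard
expressions such as "p && p->len > 0". The sketch below is plain ANSI C, not
Dusk-specific code, and shows a rewrite that behaves the same whether or not
"&&" short-circuits. The "struct buf" type and both function names are
hypothetical, made up for illustration.

```c
#include <stddef.h>

/* Hypothetical type, for illustration only. */
struct buf {
    int len;
};

/* Relies on short-circuit evaluation: p->len is only read when p is
 * non-null. Under a compiler that always evaluates both operands of &&,
 * this would dereference a null pointer. */
int has_data_shortcircuit(struct buf *p) {
    return p && p->len > 0;
}

/* Same logic with the guard expressed through control flow. The second
 * condition is only reached when the first one held, regardless of how
 * && is evaluated. */
int has_data_explicit(struct buf *p) {
    if (p) {
        if (p->len > 0) {
            return 1;
        }
    }
    return 0;
}
```

The same concern applies to any right-hand operand with side effects, such as
a function call that should only happen when the left-hand side is true.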

## Caller save

Native words don't save registers they use. For Forth words, it doesn't matter
much because all words are "atomic". Once they return, register values don't
matter anymore. Some native words call each other and in this case, careful
threading is necessary, but otherwise, it works well as is.

When other languages are concerned, however, this attribute becomes important
because if a C expression calls a Forth word, then it loses register values.

Therefore, it's important to remember that in Dusk OS, it's the caller's
responsibility to save/restore registers during a call.

## Function call stack frame

In C-compiled code, local variables and arguments being passed during function
calls are placed on something called a stack frame.

In Dusk, we have two stack frames. The "arguments" frame lives in PS and the
"local" frame (for local variables) lives on RS.  The caller of a C function has
to allocate enough space to place the arguments it's passing to the function.
When you think about it, it's the same thing as with Forth words.

When the function begins, it allocates enough space for its local variables on
RS. Its arguments are already on PS, where they should be, so it does nothing.

When the function returns, it frees the local frame from RS. It also adjusts PSP
according to the function's "argument balance". If it returns more values than
the number of arguments it received, PS grows; if it returns fewer, PS shrinks.
Then, we return.

Let's use an example:

int foobar(int a, int b) {
    int x = 42;
    return a+b+x;
}

From Forth, this function can be called like this:

1 2 foobar . \ prints "45"

Here's what PS and RS look like at the moment foobar is called:

         |-----------|           |-------------|
 PSP+4 ->| $00000001 |  RSP+0 -> | return addr |
 PSP+0 ->| $00000002 |           |-------------|
         |-----------|

During the function prelude, PSP doesn't change, but the compiler assigns each
argument to its proper place in PS.

At the same time, the prelude also decreases RSP by 4 to make space for local
variables.

         |-----------|           |---------------|
 PSP+4 ->| int a = 1 |  RSP+4 -> | return addr   |
 PSP+0 ->| int b = 2 |  RSP+0 -> | int x = undef |
         |-----------|           |---------------|

Then, we execute "int x = 42", which sets RSP+0.

         |-----------|           |---------------|
 PSP+4 ->| int a = 1 |  RSP+4 -> | return addr   |
 PSP+0 ->| int b = 2 |  RSP+0 -> | int x = 42    |
         |-----------|           |---------------|

Then, "return a+b+x" does a "soft" push to PS, that is, it pushes to PS as if
the arguments had been popped during the prelude. This means that our return
value overwrites "a". PSP is increased by 4.

         |-----------|           |---------------|
 PSP+0 ->| $0000002d |  RSP+4 -> | return addr   |
 PSP-4 ->| int b = 2 |  RSP+0 -> | int x = 42    |
         |-----------|           |---------------|

Then, before returning, we deallocate the local stack.

         |-----------|           |-------------|
 PSP+0 ->| $0000002d |  RSP+0 -> | return addr |
         |-----------|           |-------------|
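
The walkthrough above can be replayed on a host machine. The sketch below is
ordinary C, not Dusk C, and only models the stack effects. The names "ps_mem",
"rs_mem", "foobar_model" and "run_example" are invented for illustration, cells
are indexed by element rather than by byte (so psp[1] stands for PSP+4), and
the return address slot on RS is left out because the host's own call mechanism
takes care of it.

```c
#include <assert.h>
#include <stdint.h>

#define CELLS 16

/* Host-side model of PS and RS. Both grow downward, as in the diagrams:
 * sp[0] is the top of the stack, sp[1] the cell below it. */
static int32_t ps_mem[CELLS];
static int32_t rs_mem[CELLS];
static int32_t *psp = ps_mem + CELLS;   /* empty PS */
static int32_t *rsp = rs_mem + CELLS;   /* empty RS */

static void ps_push(int32_t v) { assert(psp > ps_mem); *--psp = v; }
static int32_t ps_pop(void) { assert(psp < ps_mem + CELLS); return *psp++; }

int ps_depth(void) { return (int)(ps_mem + CELLS - psp); }
int rs_depth(void) { return (int)(rs_mem + CELLS - rsp); }

/* Models the stack effects of:
 *     int foobar(int a, int b) { int x = 42; return a+b+x; }   */
static void foobar_model(void) {
    int32_t result;

    /* Prelude: PSP is untouched. a lives at psp[1], b at psp[0]. */
    rsp -= 1;                 /* reserve one RS cell for x */
    rsp[0] = 42;              /* int x = 42 */

    result = psp[1] + psp[0] + rsp[0];

    /* "Soft" push: behave as if both arguments had been popped, then
     * push the result. Net effect: PSP moves up by one cell and the
     * result lands where a used to be. */
    psp += 2;
    *--psp = result;

    rsp += 1;                 /* free the local frame */
}

/* Equivalent of the Forth line "1 2 foobar", returning the result. */
int32_t run_example(void) {
    ps_push(1);
    ps_push(2);
    foobar_model();
    return ps_pop();
}
```

Calling run_example() yields 45 and leaves both modeled stacks empty, which
matches the last diagram once the result has been consumed.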

## pspush() and pspop()

Builtin functions pspush() and pspop() allow for direct control over PS. This
gives you the ability to pop or push a variable number of arguments from/to PS.

These functions, however, are incompatible with arguments and return values. If
you use both at the same time, they'll mess up your PS. You can only use them
in functions that have a "void (void)" signature (except when calling Forth
words, see below). Let's see an example.

void foobar() {
    int b = pspop();
    int a = pspop();
    int x = 42;
    pspush(a+b+x);
}

This function does the exact same thing as the previous "foobar()", but stacks
will look different. After function prelude:

         |-----------|           |---------------|
 PSP+4 ->| $00000001 |  RSP+12-> | return addr   |
 PSP+0 ->| $00000002 |  RSP+8 -> | int b = undef |
         |-----------|  RSP+4 -> | int a = undef |
                        RSP+0 -> | int x = undef |
                                 |---------------|

Then, after the first 3 lines:

 PSP+0 ->|-----------|           |---------------|
                        RSP+12-> | return addr   |
                        RSP+8 -> | int b = 2     |
                        RSP+4 -> | int a = 1     |
                        RSP+0 -> | int x = 42    |
                                 |---------------|

And right before returning:

         |-----------|           |-------------|
 PSP+0 ->| $0000002d |  RSP+0 -> | return addr |
         |-----------|           |-------------|
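
The "variable number of arguments" case is where these builtins are most
useful. Here is a hedged sketch of it: "sumn" is a hypothetical function with
the stack effect ( n1 .. nk k -- sum ). So that it can be tried with an
ordinary host C compiler, pspop() and pspush() are emulated below by a small
array-backed shim. In Dusk they are compiler builtins and the shim (including
the "psdepth" helper) would not exist.

```c
#include <assert.h>

/* Host-side stand-in for PS. In Dusk, pspop()/pspush() are builtins. */
#define PS_CELLS 32
static int ps[PS_CELLS];
static int depth = 0;

void pspush(int v) { assert(depth < PS_CELLS); ps[depth++] = v; }
int pspop(void) { assert(depth > 0); return ps[--depth]; }
int psdepth(void) { return depth; }

/* Hypothetical function with stack effect ( n1 .. nk k -- sum ).
 * It takes no C arguments and returns nothing: everything goes
 * through PS, so the number of consumed items can vary per call. */
void sumn(void) {
    int k = pspop();
    int acc = 0;
    while (k > 0) {
        acc = acc + pspop();
        k = k - 1;
    }
    pspush(acc);
}
```

With 10, 20, 30 and then 3 pushed to the modeled PS, sumn() leaves a single
60 behind.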

## Calling Forth words

Words from the system dictionary can be called. They are considered to have a
void return type and an unspecified number of arguments.

Arguments to Forth words can be passed normally, but return values have to be
handled with pspop() and pspush(). Whenever you call such a function, you should
return to "PS normality" before using one of your function arguments, because if
you don't, PS offsets for those arguments will be wrong.

For example, let's say that you want to call "max", a Forth word with the
signature "a b -- n". You would do so like this:

int mymax(int a, int b) {
    max(a, b);
    // don't use a or b before having called pspop(), they're broken.
    return pspop();
}
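
To see why "a" and "b" are broken between the call and pspop(), here is a
host-side model in ordinary C. It rests on an assumption drawn from the stack
frame diagrams, namely that arguments are addressed relative to PSP ("b" at
PSP+0, "a" at PSP+4). The names "forth_max" and "mymax_model" and the
"stale_b" output are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

static int32_t mem[8];
static int32_t *psp = mem + 8;   /* PS grows downward; psp[0] is TOS */

static void push(int32_t v) { assert(psp > mem); *--psp = v; }
static int32_t pop(void) { assert(psp < mem + 8); return *psp++; }

/* Stand-in for the Forth word "max" ( a b -- n ). */
static void forth_max(void) {
    int32_t b = pop();
    int32_t a = pop();
    push(a > b ? a : b);
}

/* Models mymax(). *stale_b receives what reading "b" at its usual
 * offset (PSP+0) would yield after max(a, b) but before pspop(). */
int32_t mymax_model(int32_t a, int32_t b, int32_t *stale_b) {
    int32_t result;

    push(a);                  /* the caller passes mymax's arguments */
    push(b);

    push(psp[1]);             /* max(a, b): copy of a ... */
    push(psp[1]);             /* ... then copy of b (now one cell down) */
    forth_max();              /* PS holds one extra cell: max's result */

    *stale_b = psp[0];        /* "b" at PSP+0 now reads max's result */

    result = pop();           /* pspop(): back to PS normality */
    assert(psp[0] == b && psp[1] == a);

    psp += 2;                 /* model cleanup: drop the arguments */
    return result;
}
```

With a = 9 and b = 2, the stale read of "b" gives 9 rather than 2. Once the
result has been popped, both arguments are back at their expected offsets.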

## Macros

Macros in Dusk's CC are simply markers inside which arbitrary Forth code is
interpreted. Those markers are #[ and ]#. Those markers are executed during the
AST generation phase, which means that you can arbitrarily modify the AST at any
point during parsing.

A common case with C macros is the definition and reuse of constants. Here's
how it looks:

#[ 42 const FOOBAR ]#
int foo() {
    return #[ FOOBAR Constant ]#;
}

AST nodes are created with "createnode" accompanied with one of the AST_*
constants, or with the use of a "node helper" word, such as "Constant".

Because macros can modify the AST, they can only be inserted at certain
designated places, known as "hash (#) bars". These are:

* In a Unit context (in between functions)
* Replacing a "factor" AST element, which are quite numerous. Some of them:
  * A constant
  * An lvalue (AST_IDENT)
  * A function call
  * An expression

In any other place, "#[" will be a parse error.

In the first case, the signature of the macro is ( node -- node ). By using PS
TOS, you can add a node to the active Unit.

The second case has a signature ( -- node ), that is, you are expected to
produce a node that is valid in the context where the macro appears. It will
then be added wherever the factor was expected. It will even have postfix AST
rules applied to it,
which opens nice doors. For example, if your macro returns a simple AST_IDENT,
then right after the macro you can add parens to make it into a function call.

When a macro begins, PS level is recorded. If it doesn't end with the correct
PS size, an error is raised.

The macro opening symbol, "#[", obeys C tokenization rules, but the closing one,
"]#", obeys Forth tokenization rules, so it has to be followed by a space.

There are "shortcut words" for closing a macro:

c]# --> Constant ]#
i]# --> Ident ]#
+]# --> over addnode ]#

## Linkage

By default, functions have internal linkage. You give a function external
linkage with "extern".

void foo() { }
extern void bar() { foo(); }

This unit will compile fine. Because "foo()" is in the same unit as "bar()",
"bar()" can call "foo()". However, "foo()" can't be called from another unit or
from Forth. "bar()" can.
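
For porting purposes, it may help to see the same unit written for a standard
C compiler, where the default is reversed: functions have external linkage
unless marked "static". This is a sketch in plain ANSI C, not Dusk C. The
"calls" counter and "call_count()" are added only so that the effect can be
observed.

```c
/* Standard C counterpart of the Dusk unit above. */
static int calls = 0;

/* Dusk: void foo() { }               (internal by default) */
static void foo(void) { calls++; }

/* Dusk: extern void bar() { foo(); } (explicitly external) */
void bar(void) { foo(); }

/* Observation helper, not part of the original example. */
int call_count(void) { return calls; }
```

In practice this means that, when porting, functions that other units or Forth
code need to reach have to gain an "extern" in their Dusk version.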