duskos/fs/doc/usage.txt -rw-r--r-- 15.3 KiB
c515797bVirgil Dupras comp/c/vm/i386: fix integer promotion bug in logical ops 4 hours ago
                                                                                
# Dusk OS usage

Warning: this OS is not usable yet. It lacks many convenience words that would
make it usable. But still, it can do many nice tricks...

Dusk OS is a Forth that generally follows conventions described in "Starting
Forth" by Leo Brodie, except that words are in lowercase. If you don't know
Forth, it's recommended that you start there.

Then, you can look at doc/dict to get a broad idea of the vocabulary that is
available to you. You will recognize many words in there from Starting Forth and
should be able to get started after that.

That being said, Dusk OS has some additional features that need explaining:

## Number literals

Dusk has no DEC/HEX mode. Number literals are parsed using a prefix system.

* "naked" numbers are parsed as decimal: 1234
* "$" is the prefix for hexadecimal notation: $12fe
* "'" is the prefix for a character literal and must be closed: 'A'
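To make the prefix rules concrete, here is a small model of them. It is written
in Python rather than Forth and is only a sketch, not Dusk's actual parser. The
name "parse_literal" is made up for this example. Accepting uppercase hex
digits and raising an error on anything else are assumptions of the sketch.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def parse_literal(token: str) -> int:
    """Model of the three literal forms: decimal, $hex and 'c'."""
    # Character literal: opening quote, one character, closing quote.
    if len(token) == 3 and token[0] == "'" and token[2] == "'":
        return ord(token[1])
    # Hexadecimal literal: "$" followed by at least one hex digit.
    if token.startswith("$") and len(token) > 1:
        digits = token[1:]
        if all(c in HEX_DIGITS for c in digits):
            return int(digits, 16)
    # "Naked" number: decimal digits only.
    if token.isascii() and token.isdigit():
        return int(token)
    raise ValueError("not a number literal: " + token)


print(parse_literal("1234"), parse_literal("$12fe"), parse_literal("'A'"))
```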

## Strings

A string is an address to an area in memory starting with a length byte followed
by that many characters. When we refer to a "string", we refer to that address.
For example, this code will yield a "hello" string to PS (Parameter Stack):

here 5 c, 'h' c, 'e' c, 'l' c, 'l' c, 'o' c,

The code above is the equivalent of:

S" hello"
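The memory layout described above can be modelled in a few lines. This is a
Python sketch of the layout only. "counted" and "read_counted" are hypothetical
helper names, not Dusk words.

```python
def counted(text: bytes) -> bytes:
    """Build the in-memory form: one length byte, then the characters."""
    if len(text) > 255:
        raise ValueError("a single length byte can't describe this string")
    return bytes([len(text)]) + text


def read_counted(mem: bytes, addr: int) -> bytes:
    """Read back the string whose length byte lives at addr."""
    length = mem[addr]
    return mem[addr + 1 : addr + 1 + length]


mem = b"\x00\x00" + counted(b"hello")
print(read_counted(mem, 2))
```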

## String literals

When a string literal word such as S", ." or ," is used, the following content
is parsed in an almost verbatim manner until the closing " is reached. We say
almost verbatim because we can write special characters with the '\' escape
character:

\n: newline ($0a)
\r: carriage return ($0d)
\0: 0
\\: '\' character
\": '"' character

Any other character following the '\' results in that character being parsed
as-is, the preceding '\' being ignored.
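The escape rules above can be sketched as a small function. This is a Python
model, not Dusk's code, and "unescape" is a made-up name. It only processes the
text between the quotes and doesn't model how the closing " is found. Keeping a
lone trailing '\' as-is is an assumption.

```python
# Escapes that map to a different character. Everything else after a '\'
# (including '\' and '"') is taken as-is.
ESCAPES = {"n": "\n", "r": "\r", "0": "\0"}


def unescape(src: str) -> str:
    out = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "\\" and i + 1 < len(src):
            nxt = src[i + 1]
            out.append(ESCAPES.get(nxt, nxt))
            i += 2  # skip the '\' and the character it escapes
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(unescape(r"line1\nline2")))
```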

## Values, cells, constants, aliases

A "cell" is a word that refers to an area in memory. Calling this word yields
the address directly following it:

create mycell 5 c, 'h' c, 'e' c, 'l' c, 'l' c, 'o' c,

Calling "mycell" will yield the string "hello".

A "value" is a 4 byte area where a value is stored. It's a bit like a cell,
but calling the value dereferences its address.

42 value myvalue

Calling "myvalue" yields 42. Moreover, it obeys "to" semantics (see below).

A constant is a read-only value that doesn't obey "to" semantics:

42 const myconst

An alias is a shortcut to another word:

alias noop myalias

Calling "myalias" is the same as calling "noop". Aliases obey "to" semantics and
can thus be changed.

## "to" semantics

Values and aliases are very similar to cells: they're a piece of memory attached
to a "handling" routine. With the cell, the routine is a noop, it returns the
address of the piece of memory.

With values and aliases, it's not a noop. The first fetches the value in
memory, the second jumps to the address contained by that memory.

These routines come with... side effects. How can you modify a value or an
alias? You need a "to" word.

The "to" words ("to", "to+", etc.) set a global variable with a pointer to an
alternate routine for value or alias words to execute. For example, the "to"
word makes that global variable point to "!".

This means that when you do "42 to myvalue", instead of "myvalue" executing the
equivalent of "addr-of-myvalue @", it executes "addr-of-myvalue !".

As soon as a "to" override is used, the global "to" pointer is reset to 0.

Refer to doc/dict for a complete list of "to" words.

Warning: this variable is global. Any usage of "to" will affect the next value
or alias that pops up. To avoid problems, always put your "to" call very, very
close to your value/alias call.
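The mechanism can be modelled outside of Forth. The sketch below is Python, not
Dusk code. "Machine", "value", "to" and "to_plus" are names invented for the
model, and None stands in for the 0 that the global pointer is reset to. It
only covers values, not aliases.

```python
class Machine:
    """Tiny model: a parameter stack plus the global "to" pointer."""

    def __init__(self):
        self.stack = []
        self.to_ptr = None  # None plays the role of 0: no override pending

    def value(self, initial):
        cell = [initial]  # the piece of memory attached to the value word

        def word():
            # Take the pending override, if any, and reset the pointer.
            handler, self.to_ptr = self.to_ptr, None
            if handler is None:
                self.stack.append(cell[0])  # default routine: fetch
            else:
                handler(cell)  # alternate routine set by a "to" word

        return word

    def to(self):
        def store(cell):
            cell[0] = self.stack.pop()

        self.to_ptr = store

    def to_plus(self):
        def add(cell):
            cell[0] += self.stack.pop()

        self.to_ptr = add


m = Machine()
myvalue = m.value(42)
m.stack.append(54)
m.to()
myvalue()  # models: 54 to myvalue
myvalue()  # models: myvalue
print(m.stack)
```

The model also shows the danger mentioned in the warning: whichever value word
runs first after "to" consumes the override, whether you meant it to or not.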

## Chain

It's a common pattern to want to "chain" behaviors in aliases. For example, one
could want to set the "emit" alias to a word that calls the previous "emit"
routine and add behavior to it.

One could manually set up a variable holding the old value and then call it, but
this gets verbose after a while.

The "chain" word does so with minimal boilerplate:

    chain emit myemitroutine

This reads the current target of the alias "emit" and writes a new word that
calls "myemitroutine" with this address at the top of PS. Then, it sets the
"emit" alias' target to "myemitroutine". If, for example, you wanted to override
"emit" so that every character emitted was preceded by 'X', you would write:

    : myemitroutine ( c 'emit -- ) 'X' over execute execute ;
    chain emit myemitroutine

This word works both in compiling mode and outside of it. For example, this
would work too:

    : moduleinit chain emit myemitroutine ;
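For readers more at ease with another language, here is the same idea modelled
in Python. It is a sketch of the pattern, not of how Dusk implements "chain".
The "aliases" dictionary and both function names are assumptions made for the
example. The previous target is passed as the last argument, mirroring the
address sitting at the top of PS.

```python
def chain(aliases, name, routine):
    """Repoint an alias to a wrapper that hands the old target to routine."""
    previous = aliases[name]

    def wrapper(*args):
        return routine(*args, previous)

    aliases[name] = wrapper


def my_emit_routine(c, prev_emit):
    # Same job as: 'X' over execute execute
    prev_emit("X")
    prev_emit(c)


out = []
aliases = {"emit": out.append}
chain(aliases, "emit", my_emit_routine)
for c in "hi":
    aliases["emit"](c)
print("".join(out))
```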

## Linked lists

Linked lists are a fundamental data structure in Dusk. They are simply addresses
in memory each pointing to the next element, with the last element of the list
pointing to 0. The first field of a LL (linked list) element is always the
pointer field.

Iterating a LL is easy: it's as simple as reading the next pointer with @
("llnext" makes the intent clearer).

When you want to append a new element to the list, you can call "lladd", which
makes the list's last element point to "here". You can then write your new
element.
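Here is a small Python model of that layout. It is a sketch under assumptions,
not Dusk's implementation: the "Memory" class, the little-endian 4 byte
pointers and the value returned by "lladd" are all choices made for the
example. Only the behavior described above (the link comes first, 0 ends the
list, "lladd" points the last element to "here") comes from this document.

```python
import struct


class Memory:
    def __init__(self, size=256):
        self.data = bytearray(size)
        self.here = 4  # address 0 is left unused because 0 ends a list

    def fetch(self, addr):
        return struct.unpack_from("<I", self.data, addr)[0]

    def store(self, n, addr):
        struct.pack_into("<I", self.data, addr, n)

    def comma(self, n):
        """Write a 4 byte number at "here" and advance it."""
        self.store(n, self.here)
        self.here += 4


def llnext(mem, node):
    return mem.fetch(node)  # the link is the first field


def lladd(mem, ll):
    """Make the last element of the list point to "here"."""
    while llnext(mem, ll) != 0:
        ll = llnext(mem, ll)
    mem.store(mem.here, ll)
    return mem.here


def lliter(mem, ll):
    while ll != 0:
        yield ll
        ll = llnext(mem, ll)


mem = Memory()
head = mem.here
mem.comma(0)   # link of the first element: end of list
mem.comma(11)  # payload
for payload in (22, 33):
    lladd(mem, head)
    mem.comma(0)        # the new element's link
    mem.comma(payload)  # the new element's payload
print([mem.fetch(node + 4) for node in lliter(mem, head)])
```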

## Dictionary entry metadata

Each entry in the dictionary can have metadata linked to it in the form of a
linked list. The pointer to the first element (or 0 if none) for an entry is
given by the word "emeta".

Each metadata element has this structure:

4b link to next
4b type ID
( any other type-specific data )

## Local variables

It's a common pattern, to avoid PS juggling, to place an element on RS (Return
Stack) and recall that element with r@. It works well, but unfortunately, this
only works with a single element. Another drawback is that this element can be
"buried" by a pattern (a "next" loop for example) that needs to place something
on RS.

A workaround to this is to declare a "private" value near the word and store
temporary values in there. It works too, but because those values are statically
allocated, this can't be done in words that require recursion. Also, it's a bit
verbose.

Local variables are there to the rescue! There are 4 of them: V1 V2 V3 V4. These
words do a simple thing: they reference an element on RS with "to" semantics.
They are also impervious to being "buried" because they remember their "slot" at
compile time and adjust their compilation accordingly.

In other words, if RSTART is the address in memory of RSP when the word starts
executing and considering the fact that RSP grows "downwards" in memory,
V1=RSTART-4 and V4=RSTART-16.

When using local variables, you are responsible for pushing and popping to/from
RS. All those variables give "to" semantics to an "RS slot". Example:

: foo ( a b c -- ) >r >r >r V1 . spc> V2 . spc> V3 . rfree ;

1 2 3 foo \ prints "3 2 1"

: inc ( a -- a+1 ) >r 1 to+ V1 V1 rdrop ;
42 inc . \ prints 43

\ this works too
: inc5 ( a -- a+5 ) >r 5 >r begin 1 to+ V1 next V1 rdrop ;
42 inc5 . \ prints 47
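The slot arithmetic can be checked with a short Python sketch. The function
names are made up, and the second function is a derivation from the text above
rather than a description of what Dusk compiles: if the R counter is k, then
RSP is RSTART-4k, so slot n sits 4*(k-n) bytes above RSP.

```python
CELL = 4  # bytes per RS element, as in V1=RSTART-4


def slot_address(rstart: int, n: int) -> int:
    """Absolute address of Vn, given RSP's address at word entry."""
    if not 1 <= n <= 4:
        raise ValueError("only V1 to V4 exist")
    return rstart - CELL * n


def slot_offset_from_rsp(rcnt: int, n: int) -> int:
    """Offset of Vn from the current RSP when rcnt elements are on RS."""
    if not 1 <= n <= 4:
        raise ValueError("only V1 to V4 exist")
    return CELL * (rcnt - n)


# In "inc5", two elements are on RS inside the loop, so V1 is one cell
# above RSP instead of being at RSP.
print(slot_offset_from_rsp(2, 1))
```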

### rfree

What's this "rfree" used above? It's an automatic RS adjuster. It looks at the
"R counter" and emits an RS adjustment equivalent to its current level, and then
sets this level to 0. In the example above, it's equivalent to "rdrop rdrop
rdrop".

Be aware that the "R counter" is not always accurate! If you have conditional
modifications to RS levels, "rfree" is going to be broken. See section below.

### Manual [rcnt] adjustments

The "R counter" that determines local variable slots is oblivious to conditional
code. It's not common to have code that conditionally maintains separate RS
levels (they always need to stay balanced, of course), but it can happen. For
example, in early "exit" paths, we often have to include a few "rdrop" before
the "exit" call. This messes up the "R counter". You can manually adjust it
through the [rcnt] variable. For example, if you want your next ">r" to push to
V1, you would precede it with:

    [ 0 [rcnt] ! ]

## Binary width modulation

In a 32-bit system, it is frequent to want to access memory in 3 widths: 32-bit,
16-bit and 8-bit.

Traditionally, Forths have the "c" variant of words for accessing memory in
8-bit. For example, "@" fetches a 32-bit number and "c@" fetches an 8-bit one.

Dusk has a few of these words, for convenience, as well as a "w" variant for
16-bit.

However, Dusk has a wide selection of useful, native memory-related words:
@ ! +! @! @+ !+ @@+ @!+ ,. These can all be useful in their 16-bit and 8-bit
variants. If we add a "c" and "w" variant to all these words, that's a lot of
noise.

But there's more! "to" semantics are built upon those words, so we'd need to add
a "c" and "w" variant to all "to" words too? That's heavy.

To make things lighter, Dusk has "8b" and "16b" modulator words. A memory word,
when preceded by one of these modulator words, will execute its 8-bit or 16-bit
variant. These words are immediate and work during compilation too.

Better yet: the choice of variant is applied at compile time, which means that
this modulation system has no runtime cost in terms of speed.

Those modulator words can only be used in front of one of the "memory" words
listed above or in front of a "to" word. It *doesn't* work in front of a "naked"
(no "to") value reference.

Do not use those modulators in front of other words, you'll crash the system.
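The "chosen at compile time" idea can be illustrated with a Python model of a
fetch word. This is a sketch, not Dusk code: "compile_fetch" is a made-up name
and little-endian byte order is an assumption. The width is picked once, when
the fetch routine is built, and the routine itself never tests it again.

```python
WIDTHS = {None: 4, "16b": 2, "8b": 1}  # no modulator means 32-bit


def compile_fetch(modulator=None):
    size = WIDTHS[modulator]  # decided here, at "compile time"

    def fetch(mem: bytes, addr: int) -> int:
        return int.from_bytes(mem[addr : addr + size], "little")

    return fetch


mem = bytes([0x78, 0x56, 0x34, 0x12])
fetch32 = compile_fetch()
fetch16 = compile_fetch("16b")
fetch8 = compile_fetch("8b")
print(hex(fetch32(mem, 0)), hex(fetch16(mem, 0)), hex(fetch8(mem, 0)))
```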

TODO: allow creation of width-modulable words in Forth. Something like:

: .myx ( n ) .x ; :16b .x2 ; :8b .x1 ;

TODO: add dictionary entry flag to indicate that the word is binary modulable.
this way, we can avoid crashes, making the system a bit easier to debug.

## Structures

Structures are an effective way to address offsets from base addresses while
keeping the general namespace clean. Structures have a name and a list of fields
and are declared thus:

    struct[ Foo
      sfield bar
      sfield baz
      smethod :bleh
    ]struct

This describes a 12 byte wide struct with 3 fields.

Anything goes inside of a struct. Whatever word you define there will be
included in the struct's namespace. Those words will not be present in the
system dictionary. While inside a struct definition, however, you can access
words inside the struct directly.

"sfield" and "smethod" have a special struct-specific behavior as they
automatically place themselves inside the struct at the correct offset and
increase the struct's size.

A struct size can be obtained with the "SZ" word automatically added to every
struct. It returns the size, in bytes, of the fields included in the struct. It
can also be used inside a struct definition to get the struct size "up until
now". This can be useful for initialization methods:

    struct[ Foo
      sfield bar
      sfield baz
      smethod :bleh
      : :new ( -- 'foo ) here SZ allot0 ;
    ]struct

A struct holds no data by itself and can't be used directly to access fields
from memory. You refer to fields in a struct by supplying it with a source
pointer, like this:

    create data1 1 , 2 , ' mybleh ,
    create data2 3 , 4 , ' mybleh ,
    data1 Foo bar . \ prints 1
    data2 Foo baz . \ prints 4

Fields obey "to" semantics:

    42 to+ data1 Foo bar
    data1 Foo bar . \ prints 43

Field access can be compiled:

    : foobar data2 Foo baz ;
    foobar . \ prints 4

A method is an alias to a word reference inside a struct. When the method is
invoked, it dereferences the alias and calls it, but it also pushes a reference
of the data structure on top of PS so that the method can work with its data.
By convention, method names start with ":", but nothing forces you to have it.
Example of a method that adds bar to baz:

    : mybleh ( 'data -- n ) dup Foo bar swap Foo baz + ;

If "mybleh" is defined before the previous examples, you could then do:

    data1 Foo :bleh . \ prints 3
    data2 Foo :bleh . \ prints 7

You will often want to bind data to structs. You can do so with "structbind":

    data1 structbind Foo MyData1
    MyData1 :bleh . \ prints 3
    : someword MyData1 baz ;
    someword . \ prints 2

A structbind is compiled with a level of indirection that allows it to be
rebound. So, you can rebind structbinds with the word "rebind" (we can't use
"to" because that applies to the original bind's field):

    data2 ' MyData1 rebind
    MyData1 :bleh . \ prints 7
    someword . \ prints 4

All structs have a ":self" method which is a noop, and thus returns a reference
to the associated data. This can be used to get a structbind's data reference:

    MyData1 :self \ data2 is on PS TOS

There are several kinds of fields:

* sfield: a 4 byte field
* sfieldw: a 2 byte field
* sfieldb: a 1 byte field
* sconst: a 4 byte field that doesn't obey "to" semantics
* sfield': a field that yields its address instead of a value. Useful for
           buffers. It must be called with a size argument.

You can also create gaps in the struct with "sallot":

struct[ Gaps
  sfield foo
  42 sallot
  sfield bar
]struct

In this struct, foo's offset is 0 and bar's is 46. You can also extend a
previous struct with a new struct:

    extends Foo struct[ Bar
      sfield bazooka
    ]struct
    create data3 1 , 2 , ' mybleh , 1234 ,
    data3 Bar bazooka . \ prints 1234
    data3 Bar bar . \ prints 1

Extended structs will have their "running size" pick up where the extended
struct left. They also inherit their whole namespace, which means that any word
in the extended namespace can be accessed directly without prefixing it with
the struct's name. Extending a struct does not modify the extended struct.
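The offset bookkeeping described in this section can be modelled in Python.
This is a sketch of the arithmetic only, not of Dusk's struct implementation.
"StructDef" and its methods are hypothetical names. The 4 byte size for
"smethod" is inferred from the 12 byte Foo struct above.

```python
FIELD_SIZES = {"sfield": 4, "sfieldw": 2, "sfieldb": 1, "sconst": 4,
               "smethod": 4}


class StructDef:
    def __init__(self, name, extends=None):
        self.name = name
        # An extending struct starts from a copy of the parent's namespace
        # and picks up the running size where the parent left off.
        self.offsets = dict(extends.offsets) if extends else {}
        self.size = extends.size if extends else 0

    def field(self, kind, name):
        self.offsets[name] = self.size
        self.size += FIELD_SIZES[kind]
        return self

    def sallot(self, n):
        self.size += n  # a gap: no name, only the size grows
        return self


foo = (StructDef("Foo").field("sfield", "bar").field("sfield", "baz")
       .field("smethod", ":bleh"))
gaps = StructDef("Gaps").field("sfield", "foo").sallot(42).field("sfield", "bar")
bar = StructDef("Bar", extends=foo).field("sfield", "bazooka")
print(foo.size, gaps.offsets["bar"], bar.offsets["bazooka"])
```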

If, instead of extending a struct, you want to "augment it", that is, to
supplement a struct that was defined earlier with new elements, you can use
struct+[ in this fashion:

    struct+[ Foo
      sfield bazooka
    ]struct

This does not create a new struct, but rather adds a new field to the existing
struct. It is useful for structs that require partial declaration because of
inter-dependency with other pieces of code.

Warning: do not augment a struct with new fields if it has already been extended
by another struct because this will generate slot conflicts. You can augment a
struct that has been extended, that will work, but only with non-field words.

## Input/Output

The system interpret loop, at boot time, feeds itself from "key", a ( -- c )
word which spits characters to be interpreted one at a time. This word is an
alias in which drivers plug themselves.

System output happens through "emit", a ( c -- ) alias which spits characters to
the system's "console", whatever that happens to be on the system.

Things could theoretically stay as simple as this, but two requirements lead us
to add some complexity on the I/O front: the need to have redirectable and
abstract I/Os for applications and the need to have a console that reads line by
line.

In the middle of the boot process, sys/io is loaded (see doc/io). This subsystem
introduces the StdIn and StdOut structures (with their "stdin" ( -- c ) and
"stdout" ( c -- ) proxies) and, by default, plugs new ConsoleIn and ConsoleOut
structures into them. These structures simply plug themselves into "key" and
"emit".

From that point on, the system interpret loop feeds itself from "stdin". "emit"
and "stdout" become synonymous, the latter being preferred. "emit" becomes the
"console only" output word and "stdout" is the "generally console, but
redirectable output" word.

The basic Dusk console, the sys/rdln subsystem, inserts itself between "key" and
"stdin". It feeds itself from key and provides line editing capabilities. When
a whole line is ready to be interpreted, that is fed to stdin.
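The line-by-line behavior can be pictured with a small Python generator. It is
a rough model, not sys/rdln itself. The name "rdln" is reused for convenience,
backspace is the only editing key modelled (an assumption), and returning None
from "key" to signal the end of input exists only so the example terminates.

```python
def rdln(key):
    """Collect characters from key, release them one whole line at a time."""
    line = []
    while True:
        c = key()
        if c is None:  # model-only end of input
            return
        if c == "\b":
            if line:
                line.pop()  # line editing: erase the last character
        elif c in "\r\n":
            line.append("\n")
            yield from line  # the whole line is now fed to "stdin"
            line = []
        else:
            line.append(c)


keys = iter("helo\bp\n")
print(repr("".join(rdln(lambda: next(keys, None)))))
```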

## Loading files

Dusk's interpreter can be fed with the contents of files through various words
such as :fload, f<< and ?f<<. Refer to doc/file.

If you want to compile C source files, you'll want to look at doc/cc.