# Dusk OS usage
Warning: this OS is not usable yet. It lacks many convenience words that would make
it usable. But still, it can do many nice tricks...
Dusk OS is a Forth that generally follows conventions described in "Starting
Forth" by Leo Brodie, except that words are in lowercase. If you don't know
Forth, it's recommended that you start there.
Then, you can look at doc/dict to get a broad idea of the vocabulary that is
available to you. You will recognize many words in there from Starting Forth and
should be able to get started after that.
That being said, Dusk OS has some additional features that need explaining:
## Number literals
Dusk has no DEC/HEX mode. Number literals are parsed using a prefix system.
* "naked" numbers are parsed as decimal: 1234
* "$" is the prefix for hexadecimal notation: $12fe
* "'" is the prefix for a character literal and must be closed: 'A'
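As a quick illustration of the three notations, the lines below print each literal with ".", assuming (as in the later examples in this document) that "." prints a number in decimal:

```forth
1234 .    \ prints 1234
$12fe .   \ prints 4862
'A' .     \ prints 65, the character code of A
```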
## Strings
A string is an address to an area in memory starting with a length byte followed
by that many characters. When we refer to a "string", we refer to that address.
For example, this code will yield a "hello" string to PS (Parameter Stack):
here 5 c, 'h' c, 'e' c, 'l' c, 'l' c, 'o' c,
The code above is the equivalent of:
S" hello"
## String literals
When a string literal word such as S" ." or ," is used, the following content
is parsed in an almost verbatim manner until the closing " is reached. We say
almost verbatim because we can write special characters with the '\' escape
character:
* \n: newline ($0a)
* \r: carriage return ($0d)
* \\: '\' character
* \": '"' character
Any other character following the '\' is parsed as-is, and the preceding '\' is
ignored.
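A small sketch of the escapes in action. It assumes that S" can be used inside a definition and relies on the length byte that starts every string; "esclen" is a made-up name:

```forth
\ The string holds 3 characters: 'a', a newline ($0a) and 'b'.
: esclen ( -- n ) S" a\nb" c@ ;
esclen .   \ prints 3
```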
## Values, cells, constants, aliases
A "cell" is a word that refers to an area in memory. Calling this word yields
the address directly following it:
create mycell 5 c, 'h' c, 'e' c, 'l' c, 'l' c, 'o' c,
Calling "mycell" will yield the string "hello".
A "value" is a 4 byte area where a value is stored. It's a bit like a cell,
but calling the value dereferences its address.
42 value myvalue
Calling "myvalue" yields 42. Moreover, it obeys "to" semantics (see below).
A constant is a read-only value that doesn't obey "to" semantics:
42 const myconst
An alias is a shortcut to another word:
alias noop myalias
Calling "myalias" is the same as calling "noop". Aliases obey "to" semantics and
can thus be changed.
## "to" semantics
Values and aliases are very similar to cells: they're a piece of memory attached
to a "handling" routine. With the cell, the routine is a noop: it simply returns
the address of the piece of memory.
With values and aliases, it's not a noop. The former fetches the value in memory,
the latter jumps to the address contained in that memory.
These routines come with... side effects. How can you modify a value or an
alias? You need a "to" word.
The "to" words ("to", "to+", etc.) set a global variable with a pointer to an
alternate routine for value or alias words to execute. For example, the "to"
word makes that global variable point to "!".
This means that when you do "42 to myvalue", instead of "myvalue" executing the
equivalent of "addr-of-myvalue @", it executes "addr-of-myvalue !".
As soon as a "to" override is used, the global "to" pointer is reset to 0.
Refer to doc/dict for a complete list of "to" words.
Warning: this variable is global. Any usage of "to" will affect the next value
or alias that pops up. To avoid problems, always put your "to" call very, very
close to your value/alias call.
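To make the mechanism concrete, here is a short sketch that uses only words described in this document. The names counter, saya, sayb and greet are made up for illustration, and the alias part assumes that "'" yields the address an alias expects as its target:

```forth
42 value counter
counter .          \ prints 42
54 to counter      \ "to" makes the next value word store instead of fetch
counter .          \ prints 54
1 to+ counter      \ "to+" adds to the stored value
counter .          \ prints 55

: saya 'a' emit ;
: sayb 'b' emit ;
alias saya greet
greet              \ emits a
' sayb to greet    \ retarget the alias
greet              \ emits b
```

Note that each "to" word sits directly in front of the value or alias it modifies, as the warning above recommends.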
It's a common pattern to want to "chain" behaviors in aliases. For example, one
could want to set the "emit" alias to a word that calls the previous "emit"
routine and add behavior to it.
One could manually set up a variable holding the old value and then call it, but
this gets verbose after a while.
The "chain" word does so with minimal boilerplate:
chain emit myemitroutine
This reads the current target of the alias "emit" and writes a new word that
calls "myemitroutine" with this address at the top of PS. Then, it sets the
"emit" alias' target to "myemitroutine". If, for example, you wanted to override
"emit" so that every character emitted was preceded by 'X', you would write:
: myemitroutine ( c 'emit -- ) 'X' over execute execute ;
chain emit myemitroutine
This word works both in compiling mode and outside of it. For example, this
would work too:
: moduleinit chain emit myemitroutine ;
## Linked lists
Linked lists are a fundamental data structure in Dusk. They are simply addresses
in memory each pointing to the next element, with the last element of the list
pointing to 0. The first field of a LL (linked list) element is always the
link to the next element.
Iterating over a LL is easy: it's as simple as reading the next element's
address with @ ("llnext" makes the intent clearer).
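As an illustration, here is a sketch of a word that counts the elements of a list by following links until it reaches 0. "llcount" is a hypothetical name, and the sketch assumes that the usual "begin ... while ... repeat" and "1+" words from Starting Forth are available:

```forth
\ ll is the address of the first element, or 0 for an empty list.
: llcount ( ll -- n )
  0 swap                \ ( n ll ) running count under the current element
  begin dup while       \ stop when the current link is 0
    swap 1+ swap @      \ count this element, then follow its link
  repeat drop ;
```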
When you want to append a new element to the list, you can call "lladd", which
makes the list's last element point to "here". You can then write your new
element.
## Dictionary entry metadata
Each entry in the dictionary can have metadata linked to it in the form of a
linked list. The pointer to the first element (or 0 if none) for an entry is
given by the word "emeta".
Each metadata element has this structure:
4b link to next
4b type ID
( any other type-specific data )
## Local variables
It's a common pattern, to avoid PS juggling, to place an element on RS (Return
Stack) and recall that element with r@. It works well, but unfortunately, this
only works with a single element. Another drawback is that this element can be
"buried" by a pattern (a "next" loop for example) that needs to place something
on RS.
A workaround to this is to declare a "private" value near the word and store
temporary values in there. It works too, but because those values are statically
allocated, this can't be done in words that require recursion. Also, it's a bit
Local variables come to the rescue! There are 4 of them: V1 V2 V3 V4. These
words do a simple thing: they reference an element on RS with "to" semantics.
They are also impervious to being "buried" because they remember their "slot" at
compile time and adjust their compilation accordingly.
In other words, if RSTART is the address in memory of RSP when the word starts
executing and considering the fact that RSP grows "downwards" in memory,
V1=RSTART-4 and V4=RSTART-16.
When using local variables, you are responsible for pushing and popping to/from
RS. All those variables give "to" semantics to an "RS slot". Example:
: foo ( a b c -- ) >r >r >r V1 . spc> V2 . spc> V3 . rfree ;
1 2 3 foo \ prints "3 2 1"
: inc ( a -- a+1 ) >r 1 to+ V1 V1 rdrop ;
42 inc . \ prints 43
\ this works too
: inc5 ( a -- a+5 ) >r 5 >r begin 1 to+ V1 next V1 rdrop ;
42 inc5 . \ prints 47
What's this "rfree" used above? It's an automatic RS adjuster. It looks at the
"R counter" and emits an RS adjustment equivalent to its current level, and then
sets this level to 0. In the example above, it's equivalent to "rdrop rdrop
rdrop".
Be aware that the "R counter" is not always accurate! If you have conditional
modifications to RS levels, "rfree" is going to be broken. See section below.
### Manual [rcnt] adjustments
The "R counter" that determines local variable slots is oblivious to conditional
code. It's not common to have code that conditionally maintains separate RS
levels (they always need to stay balanced, of course), but it can happen. For
example, in early "exit" paths, we often have to include a few "rdrop" before
the "exit" call. This messes up the "R counter". You can manually adjust it
through the [rcnt] variable. For example, if you want your next ">r" to push to
V1, you would precede it with:
[ 0 [rcnt] ! ]
## Binary width modulation
In a 32-bit system, it is frequent to want to access memory in 3 widths: 32-bit,
16-bit and 8-bit.
Traditionally, Forths have the "c" variant of words for accessing memory in
8-bit. For example, "@" fetches a 32-bit number and "c@" fetches an 8-bit one.
Dusk has a few of these words, for convenience, as well as a "w" variant for
16-bit.
However, Dusk has a wide selection of useful, native memory-related words:
@ ! +! @! @+ !+ @@+ @!+ ,. These can all be useful in their 16-bit and 8-bit
variants. If we add a "c" and "w" variant to all these words, that's a lot of
words.
But there's more! "to" semantics are built upon those words, so we'd need to add
a "c" and "w" variant to all "to" words too? That's heavy.
To make things lighter, Dusk has "8b" and "16b" modulator words. A memory word,
when preceded by one of these modulator words, will execute its 8-bit or 16-bit
variant. These words are immediate and work during compilation too.
Better yet: the choice of variant is applied at compile time, which means that
this modulation system has no runtime cost in terms of speed.
Those modulator words can only be used in front of one of the "memory" words
listed above or in front of a "to" word. They *don't* work in front of a "naked"
(no "to") value reference.
Do not use those modulators in front of other words: you'll crash the system.
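A minimal sketch of the modulators in use. It assumes they behave the same way when typed at the prompt as they do during compilation; "buf" is a made-up name:

```forth
create buf 0 ,        \ reserve one zeroed 32-bit cell
$1234 buf 16b !       \ 16-bit store
buf 16b @ .           \ 16-bit fetch, prints 4660 ($1234)
$41 buf 8b !          \ 8-bit store, only the first byte changes
buf c@ .              \ prints 65; "c@" is the 8-bit fetch
```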
TODO: allow creation of width-modulable words in Forth. Something like:
: .myx ( n ) .x ; :16b .x2 ; :8b .x1 ;
TODO: add a dictionary entry flag to indicate that the word is binary modulable.
This way, we can avoid crashes, making the system a bit easier to debug.
## Structures
Structures are an effective way to address offsets from base addresses while
keeping the general namespace clean. Structures have a name and a list of fields
and are declared thus:
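As a sketch consistent with the examples that follow (the closing word "]struct" is an assumption, since it does not appear elsewhere in this text), a struct named Foo with two 4-byte fields and one method could be declared like this:

```forth
struct[ Foo
  sfield bar      \ 4 bytes at offset 0
  sfield baz      \ 4 bytes at offset 4
  smethod :bleh   \ 4 bytes at offset 8
]struct
```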
This describes a 12-byte wide struct with 3 fields.
Anything goes inside of a struct. Whatever word you define there will be
included in the struct's namespace. Those words will not be present in the
system dictionary. While inside a struct definition, however, you can access
words inside the struct directly.
"sfield" and "smethod" have a special struct-specific behavior as they
automatically place themselves inside the struct at the correct offset and
increase the struct's size.
A struct size can be obtained with the "SZ" word automatically added to every
struct. It returns the size, in bytes, of the fields included in the struct. It
can also be used inside a struct definition to get the struct size "up until
now". This can be useful for initialization methods:
: :new ( -- 'foo ) here SZ allot0 ;
A struct holds no data by itself and can't be used directly to access fields
from memory. You refer to fields in a struct by supplying it with a source
pointer, like this:
create data1 1 , 2 , ' mybleh ,
create data2 3 , 4 , ' mybleh ,
data1 Foo bar . \ prints 1
data2 Foo baz . \ prints 4
Fields obey "to" semantics:
42 to+ data1 Foo bar
data1 Foo bar . \ prints 43
Field access can be compiled:
: foobar data2 Foo baz ;
foobar . \ prints 4
A method is an alias to a word reference inside a struct. When the method is
invoked, it dereferences the alias and calls it, but it also pushes a reference
to the data structure on top of PS so that the method can work with its data.
By convention, method names start with ":", but nothing forces you to have it.
Example of a method that adds bar to baz:
: mybleh ( 'data -- n ) dup Foo bar swap Foo baz + ;
If "mybleh" had been defined before the previous examples, then you could do:
data1 Foo :bleh . \ prints 3
data2 Foo :bleh . \ prints 7
You will often want to bind data to structs. You can do so with "structbind":
data1 structbind Foo MyData1
MyData1 :bleh . \ prints 3
: someword MyData1 baz ;
someword . \ prints 2
A structbind is compiled with a level of indirection that allows it to be
rebound. So, you can rebind structbinds with the word "rebind" (we can't use
"to" because that applies to the original bind's field):
data2 ' MyData1 rebind
MyData1 :bleh . \ prints 7
someword . \ prints 4
All structs have a ":self" method which is a noop, and thus returns a reference
to the associated data. This can be used to get a structbind's data reference:
MyData1 :self \ address of data2 is on PS TOS
There are several kinds of fields:
* sfield: a 4 byte field
* sfieldw: a 2 byte field
* sfieldb: a 1 byte field
* sconst: a 4 byte field that doesn't obey "to" semantics
* sfield': a field that yields its address instead of a value. Useful for
buffers. It must be called with a size argument.
You can also create gaps in the struct with "sallot":
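A sketch of such a struct, assuming that "sallot" takes its size from PS and that "]struct" closes the definition. The struct name Gapped is made up:

```forth
struct[ Gapped
  sfield foo   \ 4 bytes at offset 0
  42 sallot    \ 42-byte gap, offsets 4 to 45
  sfield bar   \ 4 bytes at offset 46
]struct
```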
In this struct, foo's offset is 0 and bar's is 46. You can also extend a
previous struct with a new struct:
extends Foo struct[ Bar
create data3 1 , 2 , ' mybleh , 1234 ,
data3 Bar bazooka . \ prints 1234
data3 Bar bar . \ prints 1
Extended structs will have their "running size" pick up where the extended
struct left off. They also inherit their whole namespace, which means that any word
in the extended namespace can be accessed directly without prefixing it with
the struct's name. Extending a struct does not modify the extended struct.
If, instead of extending a struct, you want to "augment it", that is, to
supplement a struct that was defined earlier with new elements, you can use
struct+[ in this fashion:
This does not create a new struct, but rather adds a new field to the existing
struct. It is useful for structs that require partial declaration because of
inter-dependency with other pieces of code.
Warning: do not augment a struct with new fields if it has already been extended
by another struct because this will generate slot conflicts. You can augment a
struct that has already been extended, but only with non-field words.
## I/O
The system interpret loop, at boot time, feeds itself from "key", a ( -- c )
word which spits characters to be interpreted one at a time. This word is an
alias in which drivers plug themselves.
System output happens through "emit", a ( c -- ) alias which spits characters to
the system's "console", whatever that happens to be on the system.
Things could theoretically stay as simple as this, but two requirements lead us
to add some complexity on the I/O front: the need to have redirectable and
abstract I/Os for applications and the need to have a console that reads line by
line.
In the middle of the boot process, sys/io is loaded (see doc/io). This subsystem
introduces the StdIn and StdOut structures (with their "stdin" ( -- c ) and
"stdout" ( c -- ) proxies) and, by default, plugs new ConsoleIn and ConsoleOut
structures into them. These structures simply plug themselves into "key" and
"emit".
From that point on, the system interpret loop feeds itself from "stdin". "emit"
and "stdout" become synonymous, the latter being preferred. "emit" becomes the
"console only" output word and "stdout" is the "generally console, but
redirectable output" word.
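In code, the difference between the two output words looks like this (a sketch based only on the stack signatures given above):

```forth
'A' emit     \ always written to the console
'A' stdout   \ written to the console by default, or wherever StdOut is redirected
```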
The basic Dusk console, the sys/rdln subsystem, inserts itself between "key" and
"stdin". It feeds itself from key and provides line editing capabilities. When
a whole line is ready to be interpreted, it is fed to stdin.
## Loading files
Dusk's interpreter can be fed with the contents of files through various words
such as :fload, f<< and ?f<<. Refer to doc/file.
If you want to compile C source files, you'll want to look at doc/cc.