# Dusk OS is simple

Many software projects claim to be simple. I believe that Dusk's claim to
simplicity is stronger than most non-Forth software projects.

Dusk's simplicity comes from it being a Forth. Forth's approach to simplicity is
revolutionary, but it's difficult to comprehend without good hands-on
experience with it. To describe this simplicity to the uninitiated, I'd say that
Forth's approach to complexity is to sidestep it.

Let's try to illustrate this simplicity with examples. Because Dusk's main
innovation compared to other Forths is to include a C compiler, my main
reference when comparing complexity is Fabrice Bellard's Tiny C Compiler.

Tcc enjoys a very good reputation among geeks, and Fabrice Bellard is generally
considered to be a genius. Nevertheless, Dusk's C compiler, excluding backends,
is 1200 lines of code, while tcc, excluding backends, is roughly 30,000 lines of
code. At the time of this writing, Dusk CC isn't quite complete yet, but there
isn't much left to add, and I don't think it will exceed 2000 lines by much.

The i386 backend of Dusk CC, including its assembler, is 600 lines of code. In
tcc, the i386 backend weighs in at 3800 lines of code.

How can we explain this difference? It's true that Forth code is generally
denser than C, but not by a factor of 15. It's true that I'm sometimes clever,
but not more than Fabrice. There are multiple reasons for these differences and
they all have to do with Forth's habit of sidestepping complexity.

First of all, there's relocation and binary format. Tcc produces ELF binaries,
whereas Dusk CC compiles code in memory, designed to run where it's written. Tcc
dedicates 10,000 lines of code to file format logic alone. That's huge, but if
you want to build a compiler in the UNIX world, you *have* to do this.
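
To make "designed to run where it's written" more concrete, here is a minimal
sketch in Python. It is not Dusk's code. The `emit_call` helper and the
`memory` buffer are hypothetical names made up for this illustration. The idea
it tries to show: when the address a call instruction is written at is already
its final address, the i386 `call rel32` displacement can be computed on the
spot, with no relocation record to emit and no linker step to resolve it later.

```python
import struct

def emit_call(memory, here, target):
    """Write an i386 `call rel32` at address `here` and return the next address.

    Assumption for this sketch: `here` is the address the code will actually
    run from, so the displacement computed now stays valid.
    """
    next_addr = here + 5                 # opcode byte + 4-byte displacement
    rel = target - next_addr             # rel32 is relative to the next instruction
    memory[here] = 0xE8                  # CALL rel32 opcode
    memory[here + 1:here + 5] = struct.pack("<i", rel)
    return next_addr

memory = bytearray(64)                   # stand-in for the system's memory
here = emit_call(memory, 16, 0)          # call a routine living at address 0
print(memory[16:21].hex())
```

A compiler that targets a relocatable file format can't take this shortcut,
because the final address isn't known when the bytes are written.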

Forth is a memory-oriented system. It comes with constraints, but it also comes
with simplicity benefits. As UNIX users, we're blind to these benefits and
see some of the complexity associated with computing as unavoidable. It's not.
That is why Forth's approach to simplicity is revolutionary, because it removes
a blindfold.

(TODO: there used to be a comparison between Dusk CC's macro system and tcc's
pre-processor, but the macro system has since changed significantly and that
comparison no longer holds. Re-compare when the new macro system is completed.)

A third simplicity factor is parsing boilerplate. Tcc's assembler's input is
text formatted in GNU assembler format. This parsing boilerplate is a
significant part of tcc's assembler-related complexity. This constraint in UNIX is
inevitable because inter-process communication in UNIX generally has to be done
through streams, usually in text format for maximum compatibility. This implies
serialization and deserialization boilerplate at multiple levels. In Forth,
memory is shared and no such constraint exists. Words communicate through
structured memory. We can thus afford to sidestep this complexity and use
regular Forth words to assemble binaries.
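
As a rough illustration of an assembler without a text parser, here is a
minimal sketch in Python, where plain functions stand in for Forth words. The
function names and the `code` buffer are hypothetical and are not Dusk's
actual assembler words. Only the i386 opcodes themselves (0xB8, 0x05, 0xC3)
are real. Each "word" appends its encoding directly to memory, so there is no
assembly source to tokenize or parse.

```python
import struct

# Hypothetical stand-in for the memory that assembler words write into.
code = bytearray()

def mov_eax_imm(value):
    """Append `mov eax, imm32`: opcode 0xB8 followed by a little-endian imm32."""
    code.append(0xB8)
    code.extend(struct.pack("<I", value))

def add_eax_imm(value):
    """Append `add eax, imm32`: opcode 0x05 followed by a little-endian imm32."""
    code.append(0x05)
    code.extend(struct.pack("<I", value))

def ret():
    """Append a near `ret`: opcode 0xC3."""
    code.append(0xC3)

# "Assembling" is just calling the words in order.
mov_eax_imm(40)
add_eax_imm(2)
ret()
print(code.hex())
```

The host language's own call syntax plays the role that the GNU assembler
text format plays in tcc.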