Make the assembler self-contained.
Split program load from input run.
Link pass; VM no longer uses symbol table.
This is a simple compiler based on and grown from the Meta-II meta-compiler. It's been adjusted to use Unicode symbols instead of ASCII keywords, and it has been cross-compiled to a Python implementation. The Python version was originally built using James M. Neighbors' Metacompilers Tutorial.
This version includes the indentation control operators to permit output of structured code (as contrasted with linear lists of assembly "order codes") like Python or C. (This is a little counterproductive because the whole point of the compiler was to go from a higher level description to lower level labels and jumps. Figuring out the loop and branch patterns automatically is kind of the point, eh?)
Makefile - demonstrate circular metacompilation.
newlang5.py.newlang5 - compiler self-description.
newlang5.py.py - Metacompiler in Python
If you remove the output clauses from the description file you get a pure (pseudo-declarative) grammar:
▶ PROGRAM
PROGRAM → '▶' ● ★ST '◀' ▪
ST → ● '→' EX1 '▪' ▪
EX1 → EX2 ★('|' EX2) ▪
EX2 → (EX3 | OUTPUT) ★(EX3 | OUTPUT) ▪
EX3 → ● | ≋ | '●' | 'ℕ' | '≋' | '(' EX1 ')' | '∅' | '★' EX3 ▪
OUTPUT → '«' ★ OUT1 '»' ▪
OUT1 → '⊙' | ≋ | '#' | '↵' | '⇤' | '⇥' | '↦' | '↤' ▪
◀
A PROGRAM
is a named sequence of zero or more ST
statements.
Statements are names followed by →
, an EX1
expression, and terminated
by ▪
, see the ST
grammar.
EX1
implements alternation of terms with the |
symbol.
EX2
implements sequences of terms
EX3
implements parentheses for grouping / precedence and the Kleene
Star operator.
Structure operations:
▶
and ◀
- These symbols bracket the whole grammar.→
and ▪
- These define ST
statements, named rules (each of which becomes a subroutine.)★
- Kleene Star operator (zero or more repetitions.)|
- Alternatives (tried in order, left-to-right.)Matching operations:
●
- Matches an identifier.≋
- Matches a string literal.ℕ
- Matches a numeric literal. (Not used.)∅
- Matches the null set (use this with the '|' alternation operator to make optional terms: "match this or nothing". Not used.)Output operations:
«
and »
- Bracket sequences of output terms.#
- Generate and output a label for branches (one per ST
statement. Not used.)⊙
- Outputs the last matched item (identifier or literal.)↵
- Output the current buffer with a newline, clear the buffer, add indention if any to buffer for next line.⇤
- Clear the buffer (no indentation will be added.)⇥
- Add one indentation to the output buffer.↦
- Increase auto-indentation level by one.↤
- Decrease auto-indentation level by one.Armed with this legend of symbols you should be able to study the pure
grammar above and discern how it recognizes itself. Then proceed to study
the newlang5.py.newlang5
grammar file to see how the output terms in
the grammar build the meta-compiler. The actual newlang5.py.py
meta-compiler source code is a little long-winded (mostly due to the
verbose output system) but it's worth examining to see how the grammar
statements look when they are converted into source code.
This implementation is circular in (at least) two ways. First, of course, it compiles itself (the infamous meta-circular compiler, or circular meta-compiler.) Second, it compiles from one high-level language (the grammar above) into another high-level language (Python). In doing so, it obscures the other interesting thing about the META-II compiler (beside the fact that it can describe and regenerate itself) which is that it converts the high-level grammar into a low-level machine code which has only branch and call operations for control-flow, no looping nor if..then..else built-in.
This is one of the core tasks of a compiler: to figure out the jump labels and branch instructions to implement the higher-level constructs (e.g. for loops or if..then..else) on behalf of the programmer. (We used to have to do this by hand, on paper!)
The META-II machine instructions ("order codes" in the old fashioned parlance) do a good job of illustrating this flow-control arrangement computation while letting most of the rest of the details (matching, output, conditionals, and so on) be abridged.
In order to see that in action, and to show how to adjust the grammar
file to generate a different but equivalent output there is a version of
the virtual machine form of META-II in the p
subdirectory.
This machine outputs (and is described by) a kind of assembly language that recapitulates the original META-II "order codes" with some of the output improvements of Neighbors in a format that resembles a series of function calls. The assembler reads this "assembly code" and builds a simple representation ("machine code") of the program in a "RAM" (Python list) with a symbol table (Python dict) that maps labels to the indices of instructions in RAM. Then a very simple virtual machine runs the program. If the input is the original grammar then the output is the identical "assembly code" of the program.
Unlike the newlang5.py.py
this virtual machine does not emit it's own
code. It's like the CPU, the program that runs on the CPU emits a copy
of itself, but it doesn't include the VM code (just as newlang5.py.py
doesn't include the code for the Python interpreter.)