~sforman/newlang5

Fun with META-II
Make the assembler self-contained.
Split program load from input run.
Link pass; VM no longer uses symbol table.

refs

trunk
browse  log 

clone

read-only
https://git.sr.ht/~sforman/newlang5
read/write
git@git.sr.ht:~sforman/newlang5

You can also use your local clone with git send-email.

#Very Simple Compiler based on Meta-II

This is a simple compiler based on and grown from the Meta-II meta-compiler. It's been adjusted to use Unicode symbols instead of ASCII keywords, and it has been cross-compiled to a Python implementation. The Python version was originally built using James M. Neighbors' Metacompilers Tutorial.

This version includes the indentation control operators to permit output of structured code (as contrasted with linear lists of assembly "order codes") like Python or C. (This is a little counterproductive because the whole point of the compiler was to go from a higher level description to lower level labels and jumps. Figuring out the loop and branch patterns automatically is kind of the point, eh?)

#Contents

Makefile - demonstrate circular metacompilation.
newlang5.py.newlang5 - compiler self-description.
newlang5.py.py - Metacompiler in Python

#Grammar

If you remove the output clauses from the description file you get a pure (pseudo-declarative) grammar:

▶ PROGRAM

PROGRAM → '▶' ● ★ST '◀' ▪

ST → ● '→' EX1 '▪' ▪

EX1 → EX2 ★('|' EX2) ▪

EX2 → (EX3 | OUTPUT) ★(EX3 | OUTPUT) ▪

EX3 → ● | ≋ | '●' | 'ℕ' | '≋' | '(' EX1 ')' | '∅' | '★' EX3 ▪

OUTPUT → '«' ★ OUT1 '»' ▪

OUT1 → '⊙' | ≋ | '#' | '↵' | '⇤' | '⇥' | '↦' | '↤' ▪

◀

#Statements

A PROGRAM is a named sequence of zero or more ST statements. Statements are names followed by , an EX1 expression, and terminated by , see the ST grammar.

EX1 implements alternation of terms with the | symbol.

EX2 implements sequences of terms

EX3 implements parentheses for grouping / precedence and the Kleene Star operator.

#Grammar Symbols and their Meaning

Structure operations:

  • and - These symbols bracket the whole grammar.
  • and - These define ST statements, named rules (each of which becomes a subroutine.)
  • - Kleene Star operator (zero or more repetitions.)
  • | - Alternatives (tried in order, left-to-right.)

Matching operations:

  • - Matches an identifier.
  • - Matches a string literal.
  • - Matches a numeric literal. (Not used.)
  • - Matches the null set (use this with the '|' alternation operator to make optional terms: "match this or nothing". Not used.)

Output operations:

  • « and » - Bracket sequences of output terms.
  • # - Generate and output a label for branches (one per ST statement. Not used.)
  • - Outputs the last matched item (identifier or literal.)
  • - Output the current buffer with a newline, clear the buffer, add indention if any to buffer for next line.
  • - Clear the buffer (no indentation will be added.)
  • - Add one indentation to the output buffer.
  • - Increase auto-indentation level by one.
  • - Decrease auto-indentation level by one.

Armed with this legend of symbols you should be able to study the pure grammar above and discern how it recognizes itself. Then proceed to study the newlang5.py.newlang5 grammar file to see how the output terms in the grammar build the meta-compiler. The actual newlang5.py.py meta-compiler source code is a little long-winded (mostly due to the verbose output system) but it's worth examining to see how the grammar statements look when they are converted into source code.

#The Task of the Compiler

This implementation is circular in (at least) two ways. First, of course, it compiles itself (the infamous meta-circular compiler, or circular meta-compiler.) Second, it compiles from one high-level language (the grammar above) into another high-level language (Python). In doing so, it obscures the other interesting thing about the META-II compiler (beside the fact that it can describe and regenerate itself) which is that it converts the high-level grammar into a low-level machine code which has only branch and call operations for control-flow, no looping nor if..then..else built-in.

This is one of the core tasks of a compiler: to figure out the jump labels and branch instructions to implement the higher-level constructs (e.g. for loops or if..then..else) on behalf of the programmer. (We used to have to do this by hand, on paper!)

The META-II machine instructions ("order codes" in the old fashioned parlance) do a good job of illustrating this flow-control arrangement computation while letting most of the rest of the details (matching, output, conditionals, and so on) be abridged.

In order to see that in action, and to show how to adjust the grammar file to generate a different but equivalent output there is a version of the virtual machine form of META-II in the p subdirectory.

This machine outputs (and is described by) a kind of assembly language that recapitulates the original META-II "order codes" with some of the output improvements of Neighbors in a format that resembles a series of function calls. The assembler reads this "assembly code" and builds a simple representation ("machine code") of the program in a "RAM" (Python list) with a symbol table (Python dict) that maps labels to the indices of instructions in RAM. Then a very simple virtual machine runs the program. If the input is the original grammar then the output is the identical "assembly code" of the program.

Unlike the newlang5.py.py this virtual machine does not emit it's own code. It's like the CPU, the program that runs on the CPU emits a copy of itself, but it doesn't include the VM code (just as newlang5.py.py doesn't include the code for the Python interpreter.)