~quf/stackell

idea for an { experimental, concatenative, stack-based, pattern matching } programming language
Add trivial parser tests
Fix GHC warnings
remove megaparsec wrapper for testing

refs

trunk
browse  log 

clone

read-only
https://git.sr.ht/~quf/stackell
read/write
git@git.sr.ht:~quf/stackell

You can also use your local clone with git send-email.

#stackell - an { experimental, concatenative, stack-based, pattern matching } programming language

#Current status

A proof of concept in form of a Python program is available.

#Introduction

Each stackell program is a text file of sentences. Each sentence spans one line and is either a word definition or a list of words. stackell is stack-based, which means that it manipulates a stack of values. Words are functions which manipulate the stack (usually only a few values at the top). stackell is a concatenative programming language, which means that the program is defined in terms of functions, not values; functions are composed left-to-right without an explicit composition operator. Put in a different way, stackell sentences are either word definitions, or word/function compositions.

Here is an example of a sentence which is not a definition; these are read left-to-right:

1 2 + 5 * .

This sentence consists of 4 words, two times 2, +, and . (a dot). The words are interpreted one by one left to right (alternatively, the functions corresponding to the words are concatenated left-to-right, then the resulting function is executed). Numerals are functions which push their value on the stack. In this example, the following happens:

  • 1 is pushed on the stack
  • 2 is pushed on the stack
  • + is a word that takes the top two values of the stack and places their sum instead; in this case, 2 and 1 are removed from the stack, and 3 is placed on the stack.
  • 5 is pushed on the stack
  • * works like +, but for the product instead; here, 5 and 3 are removed from the stack, and 15 is placed on top.
  • . is a word that takes the top value of the stack and outputs it, followed by a line feed; in this case, it removes 15 from the stack, outputs 15, and a line feed.

Definitions can be written in the point-free style of Forth:

square := dup * # a -- b ; squares the top value on the stack

The := (pronounced "definition" or "assignment") sign marks a definition. The part after # is a comment and is ignored by the interpreter. The comment a -- b is a stack-comment in the style of Forth. The first word on the left hand side of the := sign is the word being defined (aka definiendum). On the right hand side is the word's assigned meaning (aka definiens). The words on the right hand side are composed left-to-right; when interpreted, square would first duplicate the top item on the stack, then multiply (*) the top two values - effectively computing the square of the top value on the stack.

More generally, definitions may use pattern matching. For example, the dup function may be defined as follows:

dup a := a a  # a -- a a ; duplicates the top value of the stack
dup @ := @    # -- ; if the stack is empty, do nothing

These two sentences contains an := sign, which means they are definitions. The first word on the left hand side of the := sign is the word being defined (aka definiendum). On the right hand side of the := sign is its assigned meaning (aka definiens). On the right hand side, we have a bunch of words (two times a) delineated by spaces; these words are composed to give a new function. After the definiens, we have another word, a or []. These are matched (symbolically) against the top of the stack. a is a variable standing for any value; @ is a special symbol for the bottom of the stack. Once dup is interpreted:

  • The definitions are checked one by one for a match against the stack. In the above example, there is only one symbol to match. In general, the symbols are matched right-to-left (starting at the := sign) corresponding to top-down for the stack. On the first match, the right hand side is interpreted.

  • Before the right hand side is interpreted, the matching values are removed from the stack.

  • In the sentence on the right hand side, the matching symbols on the left hand side (like a in the above example) may be used to push the corresponding value on the stack again. For a different example, consider swap:

    swap @   := @   # @ -- @
    swap @ a := a   # a -- a
    swap a b := b a # a b -- b a
                    # before the match, b (closest to the left of :=) is the top value of the stack, a is the second from the top.
                    # After removing them, we push the (previous) top value first, then the other one.
                    # Afterwards, the previous second value is now on the top.
    

Word definitions may recur (and mutually recur). For example, this is a function to empty the stack:

flush @ := @     #   @ -- @
flush x := flush # [a] -- @

Matches may also match against values, and contain boolean guards in parantheses just before the := sign. For example, the absolute value function may be defined like this (assuming an appropriate definition of negate):

abs @           := @
abs 0           := 0
abs n ( n 0 > ) := n negate
abs n           := n

Guard clauses are arbitrary stackell sentences that should evaluate to a boolean (0 being false and nonzero values being true). To evaluate them, a sub-interpreter (with a copied stack and dictionary) is spawned.

#Implementation

The interpreter maintains:

  • a stack of values ("the" stack)
  • an instruction pointer that points to the next word to be interpreted and a return stack (either explicitely or implicitely through recursion) that points to the word to return to when we're done.
  • a dictionary, mapping words to a list of definitions (in the order they appear in). Each dictionary entry would be a list of tuples (left-hand side stack pattern, guard clause, right-hand side) The dictionary could possibly be (partially) compiled for performance: We could assign each word a unique number; then the guard clauses and right hand sides (aka definiens) would be represented as just arrays of numbers, giving faster lookups (array lookup versus string table lookup) and faster advancement to the next word.

Input file parsing should be pretty straightforward and can be done as a pass before evaluation since metaprogramming is not a goal. This enables mutual recursion because word definitions may reference other words before they have been defined.

When interpreted words are interleaved with definitions, the interpretation is performed once all definitions in the file have been noted down. This is inconsistent with interactive use (you cannot interpret a word before you define it), but seems to match the behaviour for mutually recursive definitions better.

#More ideas and random thoughts

  • There probably should be a heap or some other persistent data structure. Or maybe stackell should be a completely pure language (except input/output; not sure how to deal with this one yet). Making stackell words impure would probably encumber optimization (which may become a goal in the future) and cause problems for guard clauses (we really don't want to mutate arguments for failed matches in guard clauses...).

  • There probably should be more data types (at least arrays/lists and floating point numbers; perhaps also strings and booleans, or even abstract data types).

  • There should be some words that control the interpreter. For example, for interactive use, there should be a way to tell the interpreter to forget all definitions of a word; there should also be a way to import definitions from other files for better organization. Perhaps a convenient way to implement this would be to use word cases as differentiator; something like: Words that consist only of { symbols, lower case letters } are definable words; words that consist only of upper case characters are special words that cannot be used as variables (say, IMPORT xyz.stackell to import a module); words that consist of CamelCase are abstract data types or values.

  • Pattern matching based on types. Perhaps something like (Num a), (x :: Bool), "", ("Hello, "++xs), Red, Blue? Now we're getting into territory where we actually have to think about proper design.

  • Static type checking/type inference of stack comments. For example dup should be known to increment the stack size by one (and the type of the top two words is the same after dup, if that matters): However, that might be problem for operators such as ?dup:

    ?dup 0           := 0    # :: x -- x
    ?dup a ( a < 0 ) := a a  # :: x -- x x
    ?dup @           := @    # :: @ -- @
    
  • Even in interpreter mode, mutual recursion should be possible using forward declarations with false guard patterns. A useless example (where this is not even needed, because the non-recursive cases could simply be given first):

    isodd  ( false ) = 0 # or whatever; this definition _never_ applies
    iseven ( false ) = 0 # same.
    
    isodd @ = @
    isodd 0 = 0
    isodd n = decrement iseven
    
    iseven @ = @
    iseven 0 = 1
    iseven n = decrement isodd
    
  • It should be pretty easy to have reasonable stack traces (for both stacks) and error messages in case of problems (mainly inapplicable patterns). Maybe break points would be nice as well. There should probably be an explicit (small! max size ~3) stack of interpreters/interpreter states spawned for guard clauses. I'd like to have a reasonable stack trace and error message for the following diverging example:

    f ( f ) = 0
    
  • Matches against values could mostly be implemented as syntactic sugar for guard clauses (except if ADTs are in play). For example, abs 0 := 0 could be syntactic sugar for abs a ( a 0 = ) := 0, except that a is not bound on the right hand side. On the other hand, luma (ColorRGB r g b) := 0.2126 r * 0.7152 g * + 0.0722 b * + could not be implemented this way for an abstract data type Color Float Float Float.

  • If additional types are added: inheritance system and multiple dispatch?

  • lambdas/higher level programming: are they necessary/useful?

  • Compilation to a VM: Each definiendum and lambda is assigned a number; each definition is assigned an array of numbers (basically threading). Polymorphism can be dealt with by symbolic execution (given 3 dup and 2 definitions of dup (one of them polymorphic), compile two functions dup :: Int -> Int). Stack pattern matches (and using them on the right hand side of definitions) can probably be replaced by using functions such as rot, drop and dup. However, I have no idea how to do this algorithmically, and it's not clear that this is necessarily faster (especially if many variables are at play). Using a resizable array for the stack may or may not be a good idea (we would need to copy the whole array for every guard clause).

  • Let bindings? We're beginning to stray from being stack based. On the other hand, egcd from examples/egcd.stackell is extremely ugly.