~aathn/kalpis

A partially invertible functional language with locally invertible semantics
Minor fixes in README.md
Add Makefile and instructions for non-interactive usage
Add dune to list of dependencies

refs

master
browse  log 

clone

read-only
https://git.sr.ht/~aathn/kalpis
read/write
git@git.sr.ht:~aathn/kalpis

You can also use your local clone with git send-email.

#Kalpis: an Arrow-based Locally and Partially Invertible System

Kalpis is a small functional invertible programming language inspired by Sparcl (Matsuda and Wang 2020).

Like Sparcl, Kalpis allows partially invertible programming, which means that invertible computations can be parameterized by unidirectional ones. Unlike Sparcl, Kalpis does not have invertible or linear data types---instead, the language distinguishes between unidirectional and invertible contexts. This makes for a simpler foundation, while retaining key features such as mixing higher-order and invertible programming.

The name Kalpis is an acronym for the title above, alluding to the fact that the language can be given a locally invertible semantics by translation to an arrow combinator language.

#Getting Started

The project is written in OCaml, and can be built using dune. To start,

  1. install opam first following the instruction in the official site,
  2. check if you are using ocaml >= 5.0.0, and
  3. run opam install dune linenoise pp_loc menhir to install the project dependencies.

#Non-interactive Usage

The project includes a Makefile which can be used to build the project and evaluate all included examples non-interactively (supposing make is available on the system). To use it, simply run make at the project root.

Alternatively, to run the corresponding commands manually, you can invoke dune build && dune exec kalpis -- examples/*.kalpis in a bash-like shell.

The commands will output the types and values of all definitions in the example files. In either case, if the exit code is zero, the evaluation finished successfully.

#Interactive Usage

Invoke dune build (or make build) at the project root to build the project, and dune exec kalpis (or make run) to run the interpreter.

Once you run the second command, you will be presented with a command prompt for the Kalpis REPL. At the REPL, input arbitrary expressions to evaluate them, or run :l "<filename>" to load an example. For example, starting the REPL at the project root and running :l "examples/nat.kalpis" should give the following output.

Kalpis> :l "examples/nat.kalpis"
addn : nat -> nat <-> nat = <function>
mul : nat -> nat <-> nat = <function>

The function addn implements partially invertible addition on natural numbers. You can try it by running the following command in the same session.

Kalpis> let x = run (addn (S (S Z))) (S (S (S Z)))
x : nat = S (S (S (S (S Z))))

Here, addn (S (S Z)) produces a bijection which will add or subtract two to its argument. The built-in function run : ('a <-> 'b) -> 'a -> 'b allows running a bijection in the forward direction, in this case on the input three.

To run the inverse, the built-in inv : ('a <-> 'b) -> ('b <-> 'a) can be used.

Kalpis> run (inv (addn (S (S Z)))) x
- : nat = S (S (S Z))

As you can see, the original input of three reappears.

#Programming Introduction

The syntax of Kalpis is very similar to that of OCaml, with the extra addition of invertible programming constructs. As an example, consider the function addn (available in examples/nat.kalpis), which we define in Kalpis as follows.

type nat = Z | S of nat

let addn : nat -> nat <-> nat =
 fun* n m ->
  match n with
  | Z -> m
  | S n' -> S (addn n' <> m)

As showcased above, addn performs partially invertible addition on natural numbers.

The key differences from an ordinary OCaml program are:

  1. the bijection type <->
  2. bijection construction fun*
  3. bijection application <>

The arrows associate to the right, so the type of addn indicates a function taking a nat to produce a bijection nat <-> nat. We construct addn using the syntax fun* n m -> ..., which is parsed as fun n -> fun* m -> .... Here, n is an ordinary function parameter and m is a bijection parameter---the main difference between the two is that a bijection parameter must be used linearly (exactly once in every branch of the program), and can only appear in invertible contexts (explained in more detail below). Ordinary function parameters on the other hand may be used any number of times, but only in a unidirectional context. To apply addn, we use the syntax addn n' <> m, which should be parsed as (addn n') <> m---the symbol <> denotes applying the bijection addn n' to the input m.

Also, note that all (and only) top-level let-bindings are recursive.

As another example, consider the function autokey (available in examples/encryption.kalpis).

let autokey : char -> char list <-> char list =
 fun* k l ->
  match* l with
  | [] -> []    @ function [] -> true | _ -> false
  | h :: t ->
      let* (h, r) = pin (fun h' -> autokey h') <> (h, t) in
      shift (sub (int_of_char k) 65) <> h :: r
    @ function [] -> false | _ -> true

This function takes a primer character k which it interprets as an integer ('A' ↦ 0, 'B' ↦ 1, ..., 'Z' ↦ 25) determining a number of steps to shift the first character of the input. Each consecutive character is then similarly shifted by the amount given by their predecessor. For example, here is the result of autokey 'F' applied to "HELLO".

Kalpis> run (autokey 'F') "HELLO"
- : char list = "CXHAD"

The letter F represents a (cyclic left) shift of 5 characters, mapping H in the input to C in the output. Next, H represents a shift of 7 characters, mapping E in the input to X in the output, and so on.

This example features invertible pattern matching, marked by the keyword match*. When matching on the input to a bijection, we must ensure that the inverse execution can also determine which branch to take. For this purpose, postconditions marked by the symbol @ can be attached to each branch---here, they distinguish the base case where the empty list is returned. Each postcondition is a boolean function that must return true for any result of its branch and false for any result of the branches below it (this is checked at runtime following the symmetric first-match policy of (Yokoyama et al. 2011)). The inverse execution will check each condition in turn, selecting the first branch whose condition returns true. Whenever the branches are syntactically distinct (as in autokey), these postconditions can be omitted.

The syntax let* p = e1 in e2 is short for match* e1 with p -> e2. The definition of autokey uses the built-in pin : ('c -> 'a <-> 'b) -> 'c * 'a <-> 'c * 'b to allow passing the variable h (which comes from the invertible input) as (non-invertible) parameter to the recursive call. This works by returning a copy of h, which ensures that the same value is available during inverse execution. The function shift : int -> char <-> char is defined in examples/encryption.kalpis, and sub : int -> int -> int and int_of_char : char -> int are built-in.

See examples for more example programs. Some examples are derived from those at https://github.com/kztk-m/sparcl (see below).

#Correspondence to the Paper and Repository Structure

This repository implements a prototype of the Kalpis language presented in Sections 2 and 3 of the ESOP24 paper Reconciling Partial and Local Invertibility (Ågren Thuné, Matsuda and Wang 2024). The syntax in the repository is slightly different, closely adhering to OCaml rather than the Haskell-like style of the paper. The table below describes the correspondence between the syntax of the paper and the implementation.

Paper Implementation
def let
def• let*
sig (type signature)
λx.u fun x -> u
λ•x.r fun* x -> r
u† inv u
u₁ ◆ u₂ run u₁ u₂
u ◆ r u <> r
pin pin
case ... of match ... with
case• ... of match* ... with
with @

Formally, the main difference to the paper is that this interpreter implements a lightweight variant of let-polymorphism.

The repository contains two main folders: src/ and examples/. The src folder contains the (OCaml) implementation of the Kalpis interpreter, which roughly implements the semantics of Section 3 in the paper. The examples folder contains example Kalpis programs, including all examples presented or mentioned in Section 2 of the paper. The tables below describe the contents of each file and list the corresponding sections in the paper, if any.

#Contents of src/

File Description
builtin.ml builtin function types and definitions
eval.ml operational semantics (Section 3.5)
interpreter.ml main interpreter/REPL machinery
kalpis.ml interpreter entrypoint
lexer.mll lexer definition
message.ml error message machinery
parser.mly parser definition
syntax.ml syntax definitions (Sections 3.1, 3.2)
types.ml type checking (Section 3.4)
value.ml value definitions (Section 3.5)

#Contents of examples/

File Description
addlsb.kalpis binary addition
encryption.kalpis Caesar, Vigenère, and Autokey ciphers (Sections 1, 2.3, 2.4)
fib.kalpis Fibonacci numbers (Section 2.2)
huff.kalpis Huffman coding (mentioned in Section 2.4)
loop.kalpis invertible looping (trace combinator)
lz.kalpis LZ-style sliding-window compression (mentioned in Section 2.4)
nat.kalpis natural number addition and multiplication (Section 2.1)
pi.kalpis encoding of trees using pre- and inorder traversals
reverse.kalpis multiple implementations of list reversal
runlength.kalpis run-length encoding
subs.kalpis consecutive differences

The examples addlsb, fib, huff, loop, pi, reverse, and subs were adapted from the existing Sparcl implementation. The example lz also has a Sparcl counterpart; both are given simultaneously by the same authors.

#Documentation

#Unidirectional and Invertible Terms and Contexts

A core concept to be able to program in Kalpis is that of unidirectional and invertible terms and contexts. This is what allows the language to ensure that all well-typed bijections can be inverted. Some terms, like function abstractions (fun x -> ...), bijection abstractions (fun* x -> ...), and function applications (f x) are unidirectional and must only be used in unidirectional contexts. For example, the body of a function fun x -> ... is unidirectional, whereas the body of a bijection fun* x -> ... is invertible. This distinction is part of the syntax of the language, so for example the bijection fun* x -> x 0 is not a type error, but a syntax error. A variable introduced by an ordinary function binder fun x -> ... can only be used in a unidirectional context, whereas a variable introduced by a bijection binder fun* x -> ... may only (and must) be used in an invertible context.

This is not to say that unidirectional variables cannot appear inside invertible computations at all---a key point is that unidirectional contexts are embedded inside some invertible terms, such as the left-hand side of a bijection application f <> x. This allows referring to previously defined bijections, and applying partially invertible functions, like we saw with addn n' <> m in the introduction above.

The default interpreter context is unidirectional; this is why bijections must be applied using the run primitive at the top level---the syntax f <> x is only valid in an invertible context. One thing to note is that unidirectional and invertible variables live in distinct namespaces, so attempting to write the function fun* x y -> (x, y) (where x is unidirectional) will give an error Reference to unbound reversible variable x, rather than complaining that x is used in an invertible context.

The table below describes unidirectional and invertible terms in detail. The terms of the language are listed, with each subterm denoted U or I depending on whether it is a unidirectional or invertible context, and the term itself similarly marked depending on whether it is unidirectional or invertible. The metavariable X is used to denote terms which can be used either invertibly or unidirectionally.

Term U / I
x (unidirectional variable) U
fun x -> U U
fun* x -> I U
U₁ U₂ U
match U with {p -> U} U
x (invertible variable) I
U <> I I
match U with {p -> I} I
match* I with {p -> I @ U} I
(X₁, ..., Xₙ) X
Con X X
[] X
X₁ :: X₂ X
(integer) X
(character) X
(boolean) X

It may seem surprising at first that bijections fun* x -> ... are unidirectional, but while the result of the program fun* x -> ... is an invertible function, the program itself is not necessarily invertible. For example, if bijections were treated as invertible, the program let f = fun* x -> fun* y -> (x, y) would be (syntactically) valid, but its type would have to be 'a <-> ('b <-> ('a * 'b)). Then, the inverse execution of f would have to be able to extract an 'a value from an arbitrary bijection of type 'b <-> 'a * 'b, which would require a more advanced semantics if at all possible.

#Builtin Functions

Kalpis currently features a limited amount of builtin functions, defined in src/builtin.ml. No special syntax is supported for e.g., arithmetic operations. This table lists the builtin functions and their types, and briefly describes their behavior.

Name Type Description
pin ('c -> 'a <-> 'b) -> 'c * 'a <-> 'c * 'b convert a function to a bijection by returning a copy of the first input
inv ('a <-> 'b) -> 'b <-> 'a invert a bijection
run ('a <-> 'b) -> 'a -> 'b run the forward mapping of a bijection
lift ('a -> 'b) -> ('b -> 'a) -> 'a <-> 'b lift a pair of functions forming a bijection (unsafe)
new 'a -> () <-> 'a create a constant bijection whose inverse is well-defined only at the given point
run_weak ('a <-> 'b) -> 'a -> 'b run using a weaker semantics which does not ensure invertibility (experimental)
equal 'a -> 'a -> bool polymorphic equality
add int -> int -> int integer addition
sub int -> int -> int integer subtraction
div int -> int -> int integer division
lt_int int -> int -> bool integer comparison
lt_char char -> char -> char character comparison
int_of_char char -> int char to int conversion
char_of_int int -> char int to char conversion

#Syntactic Sugar

The core syntax of Kalpis is essentially limited to those terms described in the paper and listed the section above (Unidirectional and Invertible Terms and Contexts), but a wide range of syntactic sugar is supported to allow greater programming convenience. Many forms are similar to OCaml's syntax, while others are particular to invertible programming. This section describes the various forms of sugar.

#Function sugar
Sugar Meaning
fun x y z -> ... fun x -> fun y -> fun z -> ...
fun* x y z -> ... fun x -> fun y -> fun* z -> ...
fun p₁ p₂ -> ... fun x y -> match (x, y) with (p₁, p₂) -> ...
fun* p₁ p₂ -> ... fun* x y -> match x with p₁ -> match* y with p₂ -> ...
function {p -> U} fun x -> match x with {p -> U}
function* {p -> I @ U} fun* x -> match* x with {p -> I @ U}
let f x y z = ... let f = fun x y z -> ...
let* f x y z = ... let f = fun* x y z -> ...
#Match sugar
Sugar Meaning
let p = U₁ in U₂ match U₁ with p -> U₂
let* p = I₁ in I₂ match* I₁ with p -> I₂

When a postcondition (@ f) is omitted for an invertible match branch (p -> I), we automatically generate one based on the syntactic shape of the body I. This is done as part of parsing, as an advanced form of syntactic sugar. To be precise, a list of patterns pats_of_expr I is generated from I by inspecting its shape, representing possible return values from executing I. A postcondition is then generated which returns true if its input matches any of the patterns, and false otherwise. The definition of pats_of_expr can be found in src/syntax.ml.

#String literals

There is no builtin string type, but string literals are supported as syntactic sugar for character lists. For example, "hello" corresponds to the sequence ['h';'e';'l';'l';'o'].

#Limitations

Known current limitations of the language implementation are:

  • Only top-level let-bindings are polymorphic and recursive.
  • User-defined binary operators are not supported.

Some possible improvements for more convenient programming:

  • The pin syntax could be made implicit.