A partially invertible functional language with locally invertible semantics

Minor fixes in README.md

Add Makefile and instructions for non-interactive usage

Add dune to list of dependencies

- read-only
- https://git.sr.ht/~aathn/kalpis
- read/write
- git@git.sr.ht:~aathn/kalpis

Kalpis is a small functional invertible programming language inspired by Sparcl (Matsuda and Wang 2020).

Like Sparcl, Kalpis allows partially invertible programming, which means that invertible computations can be parameterized by unidirectional ones. Unlike Sparcl, Kalpis does not have invertible or linear data types---instead, the language distinguishes between unidirectional and invertible contexts. This makes for a simpler foundation, while retaining key features such as mixing higher-order and invertible programming.

The name Kalpis is an acronym for the title above, alluding to the fact that the language can be given a locally invertible semantics by translation to an arrow combinator language.

The project is written in OCaml, and can be built using `dune`

.
To start,

- install
`opam`

first following the instruction in the official site, - check if you are using
`ocaml`

>= 5.0.0, and - run
`opam install dune linenoise pp_loc menhir`

to install the project dependencies.

The project includes a `Makefile`

which can be used to build the project and evaluate all included examples non-interactively (supposing `make`

is available on the system).
To use it, simply run `make`

at the project root.

Alternatively, to run the corresponding commands manually, you can invoke `dune build && dune exec kalpis -- examples/*.kalpis`

in a bash-like shell.

The commands will output the types and values of all definitions in the example files. In either case, if the exit code is zero, the evaluation finished successfully.

Invoke `dune build`

(or `make build`

) at the project root to build the project, and `dune exec kalpis`

(or `make run`

) to run the interpreter.

Once you run the second command, you will be presented with a command prompt for the Kalpis REPL.
At the REPL, input arbitrary expressions to evaluate them, or run `:l "<filename>"`

to load an example.
For example, starting the REPL at the project root and running `:l "examples/nat.kalpis"`

should give the following output.

```
Kalpis> :l "examples/nat.kalpis"
addn : nat -> nat <-> nat = <function>
mul : nat -> nat <-> nat = <function>
```

The function `addn`

implements partially invertible addition on natural numbers.
You can try it by running the following command in the same session.

```
Kalpis> let x = run (addn (S (S Z))) (S (S (S Z)))
x : nat = S (S (S (S (S Z))))
```

Here, `addn (S (S Z))`

produces a bijection which will add or subtract two to its argument.
The built-in function `run : ('a <-> 'b) -> 'a -> 'b`

allows running a bijection in the forward direction, in this case on the input three.

To run the inverse, the built-in `inv : ('a <-> 'b) -> ('b <-> 'a)`

can be used.

```
Kalpis> run (inv (addn (S (S Z)))) x
- : nat = S (S (S Z))
```

As you can see, the original input of three reappears.

The syntax of Kalpis is very similar to that of OCaml, with the extra addition of invertible programming constructs.
As an example, consider the function `addn`

(available in `examples/nat.kalpis`

), which we define in Kalpis as follows.

```
type nat = Z | S of nat
let addn : nat -> nat <-> nat =
fun* n m ->
match n with
| Z -> m
| S n' -> S (addn n' <> m)
```

As showcased above, `addn`

performs partially invertible addition on natural numbers.

The key differences from an ordinary OCaml program are:

- the bijection type
`<->`

- bijection construction
`fun*`

- bijection application
`<>`

The arrows associate to the right, so the type of `addn`

indicates a function taking a `nat`

to produce a bijection `nat <-> nat`

.
We construct `addn`

using the syntax `fun* n m -> ...`

, which is parsed as `fun n -> fun* m -> ...`

.
Here, `n`

is an ordinary function parameter and `m`

is a bijection parameter---the main difference between the two is that a bijection parameter must be used linearly (exactly once in every branch of the program), and can only appear in *invertible* contexts (explained in more detail below).
Ordinary function parameters on the other hand may be used any number of times, but only in a unidirectional context.
To apply `addn`

, we use the syntax `addn n' <> m`

, which should be parsed as `(addn n') <> m`

---the symbol `<>`

denotes applying the bijection `addn n'`

to the input `m`

.

Also, note that all (and only) top-level `let`

-bindings are recursive.

As another example, consider the function `autokey`

(available in `examples/encryption.kalpis`

).

```
let autokey : char -> char list <-> char list =
fun* k l ->
match* l with
| [] -> [] @ function [] -> true | _ -> false
| h :: t ->
let* (h, r) = pin (fun h' -> autokey h') <> (h, t) in
shift (sub (int_of_char k) 65) <> h :: r
@ function [] -> false | _ -> true
```

This function takes a primer character `k`

which it interprets as an integer ('A' ↦ 0, 'B' ↦ 1, ..., 'Z' ↦ 25) determining a number of steps to shift the first character of the input.
Each consecutive character is then similarly shifted by the amount given by their predecessor.
For example, here is the result of `autokey 'F'`

applied to `"HELLO"`

.

```
Kalpis> run (autokey 'F') "HELLO"
- : char list = "CXHAD"
```

The letter `F`

represents a (cyclic left) shift of 5 characters, mapping `H`

in the input to `C`

in the output.
Next, `H`

represents a shift of 7 characters, mapping `E`

in the input to `X`

in the output, and so on.

This example features invertible pattern matching, marked by the keyword `match*`

.
When matching on the input to a bijection, we must ensure that the inverse execution can also determine which branch to take.
For this purpose, postconditions marked by the symbol `@`

can be attached to each branch---here, they distinguish the base case where the empty list is returned.
Each postcondition is a boolean function that must return `true`

for any result of its branch and `false`

for any result of the branches below it (this is checked at runtime following the symmetric first-match policy of (Yokoyama et al. 2011)).
The inverse execution will check each condition in turn, selecting the first branch whose condition returns true.
Whenever the branches are syntactically distinct (as in `autokey`

), these postconditions can be omitted.

The syntax `let* p = e1 in e2`

is short for `match* e1 with p -> e2`

.
The definition of `autokey`

uses the built-in `pin : ('c -> 'a <-> 'b) -> 'c * 'a <-> 'c * 'b`

to allow passing the variable `h`

(which comes from the invertible input) as (non-invertible) parameter to the recursive call.
This works by returning a copy of `h`

, which ensures that the same value is available during inverse execution.
The function `shift : int -> char <-> char`

is defined in `examples/encryption.kalpis`

, and `sub : int -> int -> int`

and `int_of_char : char -> int`

are built-in.

See `examples`

for more example programs. Some examples are derived from those at https://github.com/kztk-m/sparcl (see below).

This repository implements a prototype of the Kalpis language presented in Sections 2 and 3 of the ESOP24 paper *Reconciling Partial and Local Invertibility* (Ågren Thuné, Matsuda and Wang 2024).
The syntax in the repository is slightly different, closely adhering to OCaml rather than the Haskell-like style of the paper.
The table below describes the correspondence between the syntax of the paper and the implementation.

Paper | Implementation |
---|---|

`def` |
`let` |

`def•` |
`let*` |

`sig` |
(type signature) |

`λx.u` |
`fun x -> u` |

`λ•x.r` |
`fun* x -> r` |

`u†` |
`inv u` |

`u₁ ◆ u₂` |
`run u₁ u₂` |

`u ◆ r` |
`u <> r` |

`pin` |
`pin` |

`case ... of` |
`match ... with` |

`case• ... of` |
`match* ... with` |

`with` |
`@` |

Formally, the main difference to the paper is that this interpreter implements a lightweight variant of `let`

-polymorphism.

The repository contains two main folders: `src/`

and `examples/`

.
The `src`

folder contains the (OCaml) implementation of the Kalpis interpreter, which roughly implements the semantics of Section 3 in the paper.
The `examples`

folder contains example Kalpis programs, including all examples presented or mentioned in Section 2 of the paper.
The tables below describe the contents of each file and list the corresponding sections in the paper, if any.

`src/`

File | Description |
---|---|

`builtin.ml` |
builtin function types and definitions |

`eval.ml` |
operational semantics (Section 3.5) |

`interpreter.ml` |
main interpreter/REPL machinery |

`kalpis.ml` |
interpreter entrypoint |

`lexer.mll` |
lexer definition |

`message.ml` |
error message machinery |

`parser.mly` |
parser definition |

`syntax.ml` |
syntax definitions (Sections 3.1, 3.2) |

`types.ml` |
type checking (Section 3.4) |

`value.ml` |
value definitions (Section 3.5) |

`examples/`

File | Description |
---|---|

`addlsb.kalpis` |
binary addition |

`encryption.kalpis` |
Caesar, Vigenère, and Autokey ciphers (Sections 1, 2.3, 2.4) |

`fib.kalpis` |
Fibonacci numbers (Section 2.2) |

`huff.kalpis` |
Huffman coding (mentioned in Section 2.4) |

`loop.kalpis` |
invertible looping (`trace` combinator) |

`lz.kalpis` |
LZ-style sliding-window compression (mentioned in Section 2.4) |

`nat.kalpis` |
natural number addition and multiplication (Section 2.1) |

`pi.kalpis` |
encoding of trees using pre- and inorder traversals |

`reverse.kalpis` |
multiple implementations of list reversal |

`runlength.kalpis` |
run-length encoding |

`subs.kalpis` |
consecutive differences |

The examples `addlsb`

, `fib`

, `huff`

, `loop`

, `pi`

, `reverse`

, and `subs`

were adapted from the existing Sparcl implementation. The example `lz`

also has a Sparcl counterpart; both are given simultaneously by the same authors.

A core concept to be able to program in Kalpis is that of unidirectional and invertible terms and contexts.
This is what allows the language to ensure that all well-typed bijections can be inverted.
Some terms, like function abstractions (`fun x -> ...`

), bijection abstractions (`fun* x -> ...`

), and function applications (`f x`

) are unidirectional and must only be used in unidirectional contexts.
For example, the body of a function `fun x -> ...`

is unidirectional, whereas the body of a bijection `fun* x -> ...`

is invertible.
This distinction is part of the syntax of the language, so for example the bijection `fun* x -> x 0`

is not a type error, but a syntax error.
A variable introduced by an ordinary function binder `fun x -> ...`

can only be used in a unidirectional context, whereas a variable introduced by a bijection binder `fun* x -> ...`

may only (and must) be used in an invertible context.

This is not to say that unidirectional variables cannot appear inside invertible computations at all---a key point is that unidirectional contexts are embedded inside some invertible terms, such as the left-hand side of a bijection application `f <> x`

.
This allows referring to previously defined bijections, and applying partially invertible functions, like we saw with `addn n' <> m`

in the introduction above.

The default interpreter context is unidirectional; this is why bijections must be applied using the `run`

primitive at the top level---the syntax `f <> x`

is only valid in an invertible context.
One thing to note is that unidirectional and invertible variables live in distinct namespaces, so attempting to write the function `fun* x y -> (x, y)`

(where `x`

is unidirectional) will give an error `Reference to unbound reversible variable x`

, rather than complaining that `x`

is used in an invertible context.

The table below describes unidirectional and invertible terms in detail. The terms of the language are listed, with each subterm denoted U or I depending on whether it is a unidirectional or invertible context, and the term itself similarly marked depending on whether it is unidirectional or invertible. The metavariable X is used to denote terms which can be used either invertibly or unidirectionally.

Term | U / I |
---|---|

`x (unidirectional variable)` |
U |

`fun x -> U` |
U |

`fun* x -> I` |
U |

`U₁ U₂` |
U |

`match U with {p -> U}` |
U |

`x (invertible variable)` |
I |

`U <> I` |
I |

`match U with {p -> I}` |
I |

`match* I with {p -> I @ U}` |
I |

`(X₁, ..., Xₙ)` |
X |

`Con X` |
X |

`[]` |
X |

`X₁ :: X₂` |
X |

`(integer)` |
X |

`(character)` |
X |

`(boolean)` |
X |

It may seem surprising at first that bijections `fun* x -> ...`

are unidirectional, but while the *result* of the program `fun* x -> ...`

is an invertible function, the program itself is not necessarily invertible.
For example, if bijections were treated as invertible, the program `let f = fun* x -> fun* y -> (x, y)`

would be (syntactically) valid, but its type would have to be `'a <-> ('b <-> ('a * 'b))`

.
Then, the inverse execution of `f`

would have to be able to extract an `'a`

value from an arbitrary bijection of type `'b <-> 'a * 'b`

, which would require a more advanced semantics if at all possible.

Kalpis currently features a limited amount of builtin functions, defined in `src/builtin.ml`

.
No special syntax is supported for e.g., arithmetic operations.
This table lists the builtin functions and their types, and briefly describes their behavior.

Name | Type | Description |
---|---|---|

`pin` |
`('c -> 'a <-> 'b) -> 'c * 'a <-> 'c * 'b` |
convert a function to a bijection by returning a copy of the first input |

`inv` |
`('a <-> 'b) -> 'b <-> 'a` |
invert a bijection |

`run` |
`('a <-> 'b) -> 'a -> 'b` |
run the forward mapping of a bijection |

`lift` |
`('a -> 'b) -> ('b -> 'a) -> 'a <-> 'b` |
lift a pair of functions forming a bijection (unsafe) |

`new` |
`'a -> () <-> 'a` |
create a constant bijection whose inverse is well-defined only at the given point |

`run_weak` |
`('a <-> 'b) -> 'a -> 'b` |
run using a weaker semantics which does not ensure invertibility (experimental) |

`equal` |
`'a -> 'a -> bool` |
polymorphic equality |

`add` |
`int -> int -> int` |
integer addition |

`sub` |
`int -> int -> int` |
integer subtraction |

`div` |
`int -> int -> int` |
integer division |

`lt_int` |
`int -> int -> bool` |
integer comparison |

`lt_char` |
`char -> char -> char` |
character comparison |

`int_of_char` |
`char -> int` |
`char` to `int` conversion |

`char_of_int` |
`int -> char` |
`int` to `char` conversion |

The core syntax of Kalpis is essentially limited to those terms described in the paper and listed the section above (*Unidirectional and Invertible Terms and Contexts*), but a wide range of syntactic sugar is supported to allow greater programming convenience.
Many forms are similar to OCaml's syntax, while others are particular to invertible programming.
This section describes the various forms of sugar.

Sugar | Meaning |
---|---|

`fun x y z -> ...` |
`fun x -> fun y -> fun z -> ...` |

`fun* x y z -> ...` |
`fun x -> fun y -> fun* z -> ...` |

`fun p₁ p₂ -> ...` |
`fun x y -> match (x, y) with (p₁, p₂) -> ...` |

`fun* p₁ p₂ -> ...` |
`fun* x y -> match x with p₁ -> match* y with p₂ -> ...` |

`function {p -> U}` |
`fun x -> match x with {p -> U}` |

`function* {p -> I @ U}` |
`fun* x -> match* x with {p -> I @ U}` |

`let f x y z = ...` |
`let f = fun x y z -> ...` |

`let* f x y z = ...` |
`let f = fun* x y z -> ...` |

Sugar | Meaning |
---|---|

`let p = U₁ in U₂` |
`match U₁ with p -> U₂` |

`let* p = I₁ in I₂` |
`match* I₁ with p -> I₂` |

When a postcondition (`@ f`

) is omitted for an invertible match branch (`p -> I`

), we automatically generate one based on the syntactic shape of the body `I`

.
This is done as part of parsing, as an advanced form of syntactic sugar.
To be precise, a list of patterns `pats_of_expr I`

is generated from `I`

by inspecting its shape, representing possible return values from executing `I`

.
A postcondition is then generated which returns `true`

if its input matches any of the patterns, and `false`

otherwise.
The definition of `pats_of_expr`

can be found in `src/syntax.ml`

.

There is no builtin string type, but string literals are supported as syntactic sugar for character lists.
For example, `"hello"`

corresponds to the sequence `['h';'e';'l';'l';'o']`

.

Known current limitations of the language implementation are:

- Only top-level let-bindings are polymorphic and recursive.
- User-defined binary operators are not supported.

Some possible improvements for more convenient programming:

- The
`pin`

syntax could be made implicit.