Minor fixes in README.md
Add Makefile and instructions for non-interactive usage
Add dune to list of dependencies
Kalpis is a small functional invertible programming language inspired by Sparcl (Matsuda and Wang 2020).
Like Sparcl, Kalpis allows partially invertible programming, which means that invertible computations can be parameterized by unidirectional ones. Unlike Sparcl, Kalpis does not have invertible or linear data types---instead, the language distinguishes between unidirectional and invertible contexts. This makes for a simpler foundation, while retaining key features such as mixing higher-order and invertible programming.
The name Kalpis is an acronym for the title above, alluding to the fact that the language can be given a locally invertible semantics by translation to an arrow combinator language.
The project is written in OCaml, and can be built using dune
.
To start,
opam
first following the instruction in the official site,ocaml
>= 5.0.0, andopam install dune linenoise pp_loc menhir
to install the project dependencies.The project includes a Makefile
which can be used to build the project and evaluate all included examples non-interactively (supposing make
is available on the system).
To use it, simply run make
at the project root.
Alternatively, to run the corresponding commands manually, you can invoke dune build && dune exec kalpis -- examples/*.kalpis
in a bash-like shell.
The commands will output the types and values of all definitions in the example files. In either case, if the exit code is zero, the evaluation finished successfully.
Invoke dune build
(or make build
) at the project root to build the project, and dune exec kalpis
(or make run
) to run the interpreter.
Once you run the second command, you will be presented with a command prompt for the Kalpis REPL.
At the REPL, input arbitrary expressions to evaluate them, or run :l "<filename>"
to load an example.
For example, starting the REPL at the project root and running :l "examples/nat.kalpis"
should give the following output.
Kalpis> :l "examples/nat.kalpis"
addn : nat -> nat <-> nat = <function>
mul : nat -> nat <-> nat = <function>
The function addn
implements partially invertible addition on natural numbers.
You can try it by running the following command in the same session.
Kalpis> let x = run (addn (S (S Z))) (S (S (S Z)))
x : nat = S (S (S (S (S Z))))
Here, addn (S (S Z))
produces a bijection which will add or subtract two to its argument.
The built-in function run : ('a <-> 'b) -> 'a -> 'b
allows running a bijection in the forward direction, in this case on the input three.
To run the inverse, the built-in inv : ('a <-> 'b) -> ('b <-> 'a)
can be used.
Kalpis> run (inv (addn (S (S Z)))) x
- : nat = S (S (S Z))
As you can see, the original input of three reappears.
The syntax of Kalpis is very similar to that of OCaml, with the extra addition of invertible programming constructs.
As an example, consider the function addn
(available in examples/nat.kalpis
), which we define in Kalpis as follows.
type nat = Z | S of nat
let addn : nat -> nat <-> nat =
fun* n m ->
match n with
| Z -> m
| S n' -> S (addn n' <> m)
As showcased above, addn
performs partially invertible addition on natural numbers.
The key differences from an ordinary OCaml program are:
<->
fun*
<>
The arrows associate to the right, so the type of addn
indicates a function taking a nat
to produce a bijection nat <-> nat
.
We construct addn
using the syntax fun* n m -> ...
, which is parsed as fun n -> fun* m -> ...
.
Here, n
is an ordinary function parameter and m
is a bijection parameter---the main difference between the two is that a bijection parameter must be used linearly (exactly once in every branch of the program), and can only appear in invertible contexts (explained in more detail below).
Ordinary function parameters on the other hand may be used any number of times, but only in a unidirectional context.
To apply addn
, we use the syntax addn n' <> m
, which should be parsed as (addn n') <> m
---the symbol <>
denotes applying the bijection addn n'
to the input m
.
Also, note that all (and only) top-level let
-bindings are recursive.
As another example, consider the function autokey
(available in examples/encryption.kalpis
).
let autokey : char -> char list <-> char list =
fun* k l ->
match* l with
| [] -> [] @ function [] -> true | _ -> false
| h :: t ->
let* (h, r) = pin (fun h' -> autokey h') <> (h, t) in
shift (sub (int_of_char k) 65) <> h :: r
@ function [] -> false | _ -> true
This function takes a primer character k
which it interprets as an integer ('A' ↦ 0, 'B' ↦ 1, ..., 'Z' ↦ 25) determining a number of steps to shift the first character of the input.
Each consecutive character is then similarly shifted by the amount given by their predecessor.
For example, here is the result of autokey 'F'
applied to "HELLO"
.
Kalpis> run (autokey 'F') "HELLO"
- : char list = "CXHAD"
The letter F
represents a (cyclic left) shift of 5 characters, mapping H
in the input to C
in the output.
Next, H
represents a shift of 7 characters, mapping E
in the input to X
in the output, and so on.
This example features invertible pattern matching, marked by the keyword match*
.
When matching on the input to a bijection, we must ensure that the inverse execution can also determine which branch to take.
For this purpose, postconditions marked by the symbol @
can be attached to each branch---here, they distinguish the base case where the empty list is returned.
Each postcondition is a boolean function that must return true
for any result of its branch and false
for any result of the branches below it (this is checked at runtime following the symmetric first-match policy of (Yokoyama et al. 2011)).
The inverse execution will check each condition in turn, selecting the first branch whose condition returns true.
Whenever the branches are syntactically distinct (as in autokey
), these postconditions can be omitted.
The syntax let* p = e1 in e2
is short for match* e1 with p -> e2
.
The definition of autokey
uses the built-in pin : ('c -> 'a <-> 'b) -> 'c * 'a <-> 'c * 'b
to allow passing the variable h
(which comes from the invertible input) as (non-invertible) parameter to the recursive call.
This works by returning a copy of h
, which ensures that the same value is available during inverse execution.
The function shift : int -> char <-> char
is defined in examples/encryption.kalpis
, and sub : int -> int -> int
and int_of_char : char -> int
are built-in.
See examples
for more example programs. Some examples are derived from those at https://github.com/kztk-m/sparcl (see below).
This repository implements a prototype of the Kalpis language presented in Sections 2 and 3 of the ESOP24 paper Reconciling Partial and Local Invertibility (Ågren Thuné, Matsuda and Wang 2024). The syntax in the repository is slightly different, closely adhering to OCaml rather than the Haskell-like style of the paper. The table below describes the correspondence between the syntax of the paper and the implementation.
Paper | Implementation |
---|---|
def |
let |
def• |
let* |
sig |
(type signature) |
λx.u |
fun x -> u |
λ•x.r |
fun* x -> r |
u† |
inv u |
u₁ ◆ u₂ |
run u₁ u₂ |
u ◆ r |
u <> r |
pin |
pin |
case ... of |
match ... with |
case• ... of |
match* ... with |
with |
@ |
Formally, the main difference to the paper is that this interpreter implements a lightweight variant of let
-polymorphism.
The repository contains two main folders: src/
and examples/
.
The src
folder contains the (OCaml) implementation of the Kalpis interpreter, which roughly implements the semantics of Section 3 in the paper.
The examples
folder contains example Kalpis programs, including all examples presented or mentioned in Section 2 of the paper.
The tables below describe the contents of each file and list the corresponding sections in the paper, if any.
src/
File | Description |
---|---|
builtin.ml |
builtin function types and definitions |
eval.ml |
operational semantics (Section 3.5) |
interpreter.ml |
main interpreter/REPL machinery |
kalpis.ml |
interpreter entrypoint |
lexer.mll |
lexer definition |
message.ml |
error message machinery |
parser.mly |
parser definition |
syntax.ml |
syntax definitions (Sections 3.1, 3.2) |
types.ml |
type checking (Section 3.4) |
value.ml |
value definitions (Section 3.5) |
examples/
File | Description |
---|---|
addlsb.kalpis |
binary addition |
encryption.kalpis |
Caesar, Vigenère, and Autokey ciphers (Sections 1, 2.3, 2.4) |
fib.kalpis |
Fibonacci numbers (Section 2.2) |
huff.kalpis |
Huffman coding (mentioned in Section 2.4) |
loop.kalpis |
invertible looping (trace combinator) |
lz.kalpis |
LZ-style sliding-window compression (mentioned in Section 2.4) |
nat.kalpis |
natural number addition and multiplication (Section 2.1) |
pi.kalpis |
encoding of trees using pre- and inorder traversals |
reverse.kalpis |
multiple implementations of list reversal |
runlength.kalpis |
run-length encoding |
subs.kalpis |
consecutive differences |
The examples addlsb
, fib
, huff
, loop
, pi
, reverse
, and subs
were adapted from the existing Sparcl implementation. The example lz
also has a Sparcl counterpart; both are given simultaneously by the same authors.
A core concept to be able to program in Kalpis is that of unidirectional and invertible terms and contexts.
This is what allows the language to ensure that all well-typed bijections can be inverted.
Some terms, like function abstractions (fun x -> ...
), bijection abstractions (fun* x -> ...
), and function applications (f x
) are unidirectional and must only be used in unidirectional contexts.
For example, the body of a function fun x -> ...
is unidirectional, whereas the body of a bijection fun* x -> ...
is invertible.
This distinction is part of the syntax of the language, so for example the bijection fun* x -> x 0
is not a type error, but a syntax error.
A variable introduced by an ordinary function binder fun x -> ...
can only be used in a unidirectional context, whereas a variable introduced by a bijection binder fun* x -> ...
may only (and must) be used in an invertible context.
This is not to say that unidirectional variables cannot appear inside invertible computations at all---a key point is that unidirectional contexts are embedded inside some invertible terms, such as the left-hand side of a bijection application f <> x
.
This allows referring to previously defined bijections, and applying partially invertible functions, like we saw with addn n' <> m
in the introduction above.
The default interpreter context is unidirectional; this is why bijections must be applied using the run
primitive at the top level---the syntax f <> x
is only valid in an invertible context.
One thing to note is that unidirectional and invertible variables live in distinct namespaces, so attempting to write the function fun* x y -> (x, y)
(where x
is unidirectional) will give an error Reference to unbound reversible variable x
, rather than complaining that x
is used in an invertible context.
The table below describes unidirectional and invertible terms in detail. The terms of the language are listed, with each subterm denoted U or I depending on whether it is a unidirectional or invertible context, and the term itself similarly marked depending on whether it is unidirectional or invertible. The metavariable X is used to denote terms which can be used either invertibly or unidirectionally.
Term | U / I |
---|---|
x (unidirectional variable) |
U |
fun x -> U |
U |
fun* x -> I |
U |
U₁ U₂ |
U |
match U with {p -> U} |
U |
x (invertible variable) |
I |
U <> I |
I |
match U with {p -> I} |
I |
match* I with {p -> I @ U} |
I |
(X₁, ..., Xₙ) |
X |
Con X |
X |
[] |
X |
X₁ :: X₂ |
X |
(integer) |
X |
(character) |
X |
(boolean) |
X |
It may seem surprising at first that bijections fun* x -> ...
are unidirectional, but while the result of the program fun* x -> ...
is an invertible function, the program itself is not necessarily invertible.
For example, if bijections were treated as invertible, the program let f = fun* x -> fun* y -> (x, y)
would be (syntactically) valid, but its type would have to be 'a <-> ('b <-> ('a * 'b))
.
Then, the inverse execution of f
would have to be able to extract an 'a
value from an arbitrary bijection of type 'b <-> 'a * 'b
, which would require a more advanced semantics if at all possible.
Kalpis currently features a limited amount of builtin functions, defined in src/builtin.ml
.
No special syntax is supported for e.g., arithmetic operations.
This table lists the builtin functions and their types, and briefly describes their behavior.
Name | Type | Description |
---|---|---|
pin |
('c -> 'a <-> 'b) -> 'c * 'a <-> 'c * 'b |
convert a function to a bijection by returning a copy of the first input |
inv |
('a <-> 'b) -> 'b <-> 'a |
invert a bijection |
run |
('a <-> 'b) -> 'a -> 'b |
run the forward mapping of a bijection |
lift |
('a -> 'b) -> ('b -> 'a) -> 'a <-> 'b |
lift a pair of functions forming a bijection (unsafe) |
new |
'a -> () <-> 'a |
create a constant bijection whose inverse is well-defined only at the given point |
run_weak |
('a <-> 'b) -> 'a -> 'b |
run using a weaker semantics which does not ensure invertibility (experimental) |
equal |
'a -> 'a -> bool |
polymorphic equality |
add |
int -> int -> int |
integer addition |
sub |
int -> int -> int |
integer subtraction |
div |
int -> int -> int |
integer division |
lt_int |
int -> int -> bool |
integer comparison |
lt_char |
char -> char -> char |
character comparison |
int_of_char |
char -> int |
char to int conversion |
char_of_int |
int -> char |
int to char conversion |
The core syntax of Kalpis is essentially limited to those terms described in the paper and listed the section above (Unidirectional and Invertible Terms and Contexts), but a wide range of syntactic sugar is supported to allow greater programming convenience. Many forms are similar to OCaml's syntax, while others are particular to invertible programming. This section describes the various forms of sugar.
Sugar | Meaning |
---|---|
fun x y z -> ... |
fun x -> fun y -> fun z -> ... |
fun* x y z -> ... |
fun x -> fun y -> fun* z -> ... |
fun p₁ p₂ -> ... |
fun x y -> match (x, y) with (p₁, p₂) -> ... |
fun* p₁ p₂ -> ... |
fun* x y -> match x with p₁ -> match* y with p₂ -> ... |
function {p -> U} |
fun x -> match x with {p -> U} |
function* {p -> I @ U} |
fun* x -> match* x with {p -> I @ U} |
let f x y z = ... |
let f = fun x y z -> ... |
let* f x y z = ... |
let f = fun* x y z -> ... |
Sugar | Meaning |
---|---|
let p = U₁ in U₂ |
match U₁ with p -> U₂ |
let* p = I₁ in I₂ |
match* I₁ with p -> I₂ |
When a postcondition (@ f
) is omitted for an invertible match branch (p -> I
), we automatically generate one based on the syntactic shape of the body I
.
This is done as part of parsing, as an advanced form of syntactic sugar.
To be precise, a list of patterns pats_of_expr I
is generated from I
by inspecting its shape, representing possible return values from executing I
.
A postcondition is then generated which returns true
if its input matches any of the patterns, and false
otherwise.
The definition of pats_of_expr
can be found in src/syntax.ml
.
There is no builtin string type, but string literals are supported as syntactic sugar for character lists.
For example, "hello"
corresponds to the sequence ['h';'e';'l';'l';'o']
.
Known current limitations of the language implementation are:
Some possible improvements for more convenient programming:
pin
syntax could be made implicit.