~subsetpark/subsetpark

7120767fd0192904825296a9c05e6af5fd07ffe4 — Zach Smith 11 months ago 16a9b30
2 first drafts
4 files changed, 497 insertions(+), 8 deletions(-)

M .gitignore
D all.do
A posts/janet_c.md
A posts/lisp-syntax.md
M .gitignore => .gitignore +2 -1
@@ 26,4 26,5 @@ subsetpark-*.tar
/site/

# Local Netlify folder
.netlify
\ No newline at end of file
.netlify
/tag

D all.do => all.do +0 -7
@@ 1,7 0,0 @@
redo-ifchange ../whist/whist index.janet pages/* posts/* temple/* front.md assets/* notes/*

if [ -f "../whist/whist.html" ]; then
    mv ../whist/whist.html root/whist.html
fi

bag index.janet

A posts/janet_c.md => posts/janet_c.md +285 -0
@@ 0,0 1,285 @@
{:title "Writing a Janet Module in C"
 :date "2020-08-17"
 :status :draft}

%%%

After a long time managing to stay at a safe, high level in my adventures with
[Janet][janet], I've decided it's finally time for me to descend into the world
of [native modules][native]. To quote the manual:

[janet]: https://janet-lang.org/
[native]: https://janet-lang.org/capi/index.html

> One of the fundamental design goals of Janet is to extend it via the C
> programming language, as well as embed the interpreter in a larger program.
> To this end, Janet exposes most of its low level functionality in well
> defined C API. 

What this means is that Janet has a robust and extensive C API that you can use
to write C functions callable from Janet, and that jpm, the Janet build tool,
can compile that code and make it available as a module in a pure-Janet
context. This has two main applications:

- Implementing algorithms or data structures with finer-grained control, and
  better performance, than if they were written in Janet;
- Making any existing C-compatible library available to the Janet ecosystem.

This second application is particularly common. The [sqlite][], [postgres][],
and [markdown][] libraries are all written this way. And now it's my turn.
Ultimately, I'd like to write bindings for Janet to the [J Programming
Language][j], using the same native module technique. First, though, I have to
remember how to write C and write a toy Janet module.

[sqlite]: https://github.com/janet-lang/sqlite3
[postgres]: https://github.com/andrewchambers/janet-pq
[markdown]: https://github.com/pyrmont/markable
[j]: https://www.jsoftware.com/#/

## a simple Janet C module

There's a fair amount of documentation in the Janet manual about writing a C
module, including some reference code to start with. I figured it would be
worthwhile to get a *little* more detailed, recording the things I've learned
hacking a module together, for those who---like me---never write code as
low-level as C, or with manual memory management, in their ordinary programming
practice.

The module will expose a single function, `concat`, which will join two strings
together. This is available in the standard library, of course, but I found it
to be a good demo function since it touches three major areas:

- handling arguments from, and returning them to, Janet
- ordinary procedural business logic
- memory allocation

## programming environment

I don't have a terrifically built-out C programming environment. However, one
feature I do have set up is the [clangd][] language server. It's a little
outside the scope of this article to explain the Language Server Protocol, but
if you're starting from zero I think clangd is a good place to begin. In
particular, it automatically compiles my C file and lints my buffer with the
output. That makes for a very rapid feedback loop.

[clangd]: https://clangd.llvm.org/

## concat.c

### including the header

```c
#include <janet.h>
```

The `janet.h` header file exposes the entire Janet C API: all the functions for
handling Janet datatypes, exposing C functions as Janet functions, managing
garbage collection, et cetera.

By default this include will produce a compile error, because the compiler
doesn't know where to find `janet.h`. When building with jpm, the build tool
will ensure the correct compiler flags are set so that the header is available.
However, to get clangd to compile the file on its own, you need to point it to
the location of the header. I added a file `compile_flags.txt`, which clangd
will recognize, with the contents:

```plaintext
-I/usr/include/janet
```

Thus when clangd compiles `concat.c`, it will automatically add Janet's header
directory to the include path.

### defining the `concat` function

```c
static Janet concat(int32_t argc, Janet *argv) 
```

This is the declaration of the function `concat`. It's important to understand
this signature, as every Janet function defined in C has the same one.

`concat` takes two arguments:

- `argc`, an integer containing the number of arguments it was called with;
- `argv`, a C array of boxed `Janet` values containing all of those arguments.

## Janet types in C

To quote the manual:

> Janet has several built-in fundamental data types. These data types are the
> bread and butter of the Janet C API, and most functions and macros in the API
> expect at least some of their arguments to be of these types. However, as
> Janet is a dynamically typed language, Janet internally uses a boxed
> representation of these types, simply called `Janet`. 

To elaborate, we can say that under the hood, there is a *three-part continuum*
of types spanning from pure C to pure Janet.

At the bottom are the pure C types; to perform the basic underlying business
logic of your function, you will use ordinary C logic. For instance, since
we're dealing with strings, we'll be dealing with the `uint8_t *` type. 

In the middle are the `Janet<foo>` types; `JanetArray`, `JanetKV`, et cetera.
For complex data structures like arrays and tables, those types are C structs.
For simpler data structures that can be directly represented in C, those types
are aliases for raw C types. Strings fall into the simpler case: `JanetString`
is an alias for `const uint8_t *`. However, it's useful to think of this level
as a
consistent set of types, because all the business logic functions exposed in
the Janet C API target these types. And because a `JanetString` is `const`,
it's useful to have a distinction between mutable `uint8_t *` and `const
uint8_t *`.

At the top is the `Janet` type. Everything going in and out of the C interface
will be of this type. It's obviously more difficult to work on the underlying
values while they're boxed, but the boxing performs the crucial translation
between the static typing of C and the dynamic typing of Janet.

In other words, there is a distinct purpose for each level:

- C types: all Janet-agnostic business logic/interoperation with other
  compatible libraries
- `Janet*` types: C representations of each Janet data type with accompanying
  type-specific business logic
- `Janet`: receiving and returning arguments between Janet and C
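
Here's a compressed view of those layers in one place: a hypothetical helper,
not part of `concat.c`, that assumes it's handed a boxed value already known to
hold a string.

```c
/* Hypothetical helper (not part of concat.c), assuming `boxed` is already
 * known to hold a string. Each line lives at a different layer. */
static int32_t string_length_of(Janet boxed) {  /* top: the boxed, dynamic value      */
  JanetString js = janet_unwrap_string(boxed);  /* middle: the Janet C API's own type */
  const uint8_t *raw = js;                      /* bottom: the same bytes as plain C  */
  (void)raw;                                    /* unused here; shown for contrast    */
  return janet_string_length(js);               /* API functions target this layer    */
}
```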

## `concat`

Thus, each element of `argv` is a boxed `Janet`, though by convention we know
what those values should contain: in our case, two strings.

The return type of `concat` is also `Janet`, because we'll box our return value
after we've constructed it.

```c
{
  janet_fixarity(argc, 2);
  JanetString s1 = janet_getstring(argv, 0);
  JanetString s2 = janet_getstring(argv, 1);

...
```

There's some boilerplate in the argument handling. Because `concat` should
always receive exactly two arguments, we call `janet_fixarity` to validate that
it was called with two, failing with an error otherwise.

We then call `janet_getstring`, which does the work of indexing into `argv`,
validating the element's type, and unwrapping it. At this point we have moved
below the `Janet` layer into the static types of the Janet C API.
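
To make that concrete, here's roughly the work such a getter has to do, sketched
by hand (an illustrative approximation, not the library's actual source):

```c
/* Illustrative approximation of a typed argument getter (not the actual
 * library source): check the boxed value's type tag, panic with a useful
 * message on a mismatch, then unwrap to the middle-layer type. */
static JanetString get_string_arg(Janet *argv, int32_t n) {
  if (!janet_checktype(argv[n], JANET_STRING)) {
    janet_panicf("bad slot #%d, expected string, got %v", n, argv[n]);
  }
  return janet_unwrap_string(argv[n]);
}
```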

```c
...

  int s1_length = janet_string_length(s1);
  int s2_length = janet_string_length(s2);
  int32_t strlength = s1_length + s2_length;

...
```

To get the total length of the result of concatenation, we'll just add the size
of the two arguments together. Having unwrapped the arguments, we can use the
Janet datatype business logic exposed in `janet.h`, in this case the function
`janet_string_length()`.

```c
...

  char *newstring = janet_smalloc(strlength);

...
```

To create a new string, we allocate one on the heap.[^vla] 

[^vla]: Here's something I didn't know about C; modern C has support for
  dynamically allocating on the stack as well as the heap.
  \
  \
  If I had written `char newstring[strlength];` instead, the C compiler would
  have created a [variable-length
  array](https://en.wikipedia.org/wiki/Variable-length_array) and it would have
  allocated a new string to the stack (with all of the attendant hazards and
  benefits of stack allocation).
  \
  \
  My thanks to [Andrew Chambers](https://github.com/andrewchambers) for that
  info!

It's important to remember that any term produced in pure Janet is subject to
garbage collection and thus doesn't have to be manually allocated or freed. In
the C API, the garbage collector doesn't, by default, have any awareness of
memory allocated with the built-in function `malloc`, which will therefore leak
if it's not manually freed. `janet_smalloc` (the `s` stands for "scratch")
behaves just like `malloc` *except* that it registers the allocated memory with
the GC, ensuring it will be freed.[^gc] Otherwise, allocating the new string
works the same as in vanilla C.
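
For contrast, here's roughly what the plain-C version would look like (a
hypothetical alternative, not part of `concat.c`); since the GC never sees
memory from `malloc`, it has to be released by hand:

```c
/* Hypothetical plain-C alternative (not part of concat.c); requires
 * <stdlib.h>. The Janet GC never sees this pointer, so forgetting the
 * free() leaks the buffer. */
char *newstring = malloc(strlength);
/* ... populate newstring ... */
free(newstring);
```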

[^gc]: There are some details about *when* a C-allocated piece of memory will
  be freed that aren't relevant to simple invocations like this, but in C
  functions that might call other Janet functions, there may well be more
  detailed memory management necessary.

```c
...

  int i;
  for (i = 0; i < s1_length; i++) {
    newstring[i] = s1[i];
  }

  for (i = 0; i < s2_length; i++) {
    newstring[i + s1_length] = s2[i];
  }

...
```

Here we actually populate the new string. We are "below" the Janet datatypes'
level here, working with normal C strings.
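
The same copy could also be written with `memcpy` from `<string.h>`; here's a
hypothetical equivalent of the two loops:

```c
/* Hypothetical equivalent of the loops above, using memcpy (requires
 * <string.h>). */
memcpy(newstring, s1, s1_length);
memcpy(newstring + s1_length, s2, s2_length);
```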

```c
...

  JanetString res = janet_string((const uint8_t *)newstring, strlength);

...
```

Once the string has been constructed, we prepare the return value by
"ascending" the type layers. `janet_string()` takes a `const uint8_t *` and
returns a `JanetString`: this is a bit confusing, because those are the same
underlying type. However, `janet_string()` also does a bunch of background
memory management and hashing work. This is why it's good to maintain a
conceptual distinction between `const uint8_t *` and `JanetString`.

This also includes a manual cast of `newstring` to a `const` type.  This is
safe for us to do as we know we won't edit the string again after that cast.

```c
...

  janet_sfree(newstring);

...
```

`janet_smalloc()` has a companion function, `janet_sfree()`, which frees the
allocated memory. I don't believe this line is strictly necessary, because the
memory will be GCed, but we can make the GC's job a little easier and be
slightly more efficient by manually freeing it once we know we're finished with
`newstring`. The creation of the JanetString allocates new memory, so
`newstring` is not going to be used after that.

```c
...

  Janet wrapped = janet_wrap_string(res);
  return wrapped;
}
```

`janet_wrap_string()` goes from the middle layer to the top layer, `Janet`,
boxing the static type with a dynamic one.

Finally, we return the boxed element.
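
To actually call `concat` from Janet, the function still has to be registered
when the native module loads. Here's a minimal sketch following the
registration pattern described in the Janet manual (the docstring text is my
own placeholder):

```c
/* Minimal registration sketch, following the pattern in the Janet manual:
 * list the C functions in a JanetReg table, then register them in the
 * module entry point that jpm-built natives expose. */
static const JanetReg cfuns[] = {
  {"concat", concat, "(concat s1 s2)\n\nJoin two strings together."},
  {NULL, NULL, NULL}
};

JANET_MODULE_ENTRY(JanetTable *env) {
  janet_cfuns(env, "concat", cfuns);
}
```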

A posts/lisp-syntax.md => posts/lisp-syntax.md +210 -0
@@ 0,0 1,210 @@
{:title "lisp syntax"
:status :draft
:date "2021-09-01"}

%%%

Who cares about Lisp, exactly? Many different groups, apparently. Lisp---the
family of languages in some way diachronically related to the language, LISP,
or synchronically built around s-expressions[^sexps]---continues to inspire
thumbsuckers, as my old music journalism professor taught me to call them. When
you wrote a column, you had to allow yourself only a certain abstemious ratio
of thumbsuckers to actual, bread and butter news and reviews. Of course, I was
only interested in the conceptual stuff. I had absolutely no journalistic
gumption. Hence, I am a Lisp programmer.

[^sexps]: The first heresy!

One of the problems is quite simply that Lisp is a complex thing with many
interesting bits to it. We've already touched on the fact that it is *at least*
two things. These two things are already more than enough of a source of confusion:
at this point you don't know if I'm going to wax poetic about John McCarthy's
singular primeval genius, or if I'm going to try to convince you that all
modern business programming is properly done in Clojure.

Let's be more specific. Here are some of the reasons why a reasonable person on
the internet might be convinced that Lisp is Important and Worth Understanding:

- The singularity and genius of the
  [artifact](http://jmc.stanford.edu/articles/lisp.html) that was invented by
  McCarthy in 1958;
- The realization of the [Lambda
  Calculus](http://www.cs.cornell.edu/courses/cs312/2008sp/recitations/rec26.html)
  in computer program form;
- The conceptual integrity of the [Metacircular
  Evaluator](https://mitpress.mit.edu/sites/default/files/sicp/full-text/sicp/book/node76.html)
  and the attendant minimalism of the language it requires/implements;
- The unreasonable effectiveness of [macros](https://letoverlambda.com/);
- [Lisp Machines](https://wiki.c2.com/?LispMachine);
- The
  [various](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science))
  [future](https://lispcast.com/what-are-first-class-functions/)
  [technologies](https://mikelevins.github.io/posts/2020-12-18-repl-driven/)
  [rattling](https://www.adamtornhill.com/reviews/amop.htm)
  [around](https://www.reddit.com/r/lisp/comments/go3dzr/what_is_image_based_programming/)
  the Steel Bank Common Lisp implementation, which are still being rediscovered
  by other, newer languages, and of which some will remain forever undiscovered
  except by those Lispers who have been using them daily since 1980;
- The practice of [writing
  languages](https://beautifulracket.com/appendix/domain-specific-languages.html)
  to solve problems rather than simply writing programs;
- et cetera.

The point is that none of these things are *essentially related*. They're
related by virtue of being found in the same language, or family of languages,
yes; and many of them share the features of that language which make them
possible. But no two of them rely on exactly the same set of features. Let
alone the same essential, mystical Lispiness.

So let me pick a particular slice to talk about. I want to stress that I am not
articulating the *soul of Lisp*; I'm carving out a particular bit, which bit
will surely be related to other bits, make them more possible, make them more
ergonomic, et cetera.

# The Veil of Syntax

> The reliable facts are not that which is subjective, not that which is
> objective, not a mixture of the two, but never unreasonable at all.
>
> Phenomena are just recognized as they are, and existence as we conceive it
> never exists anywhere; it is just nonexistence.

- Nagarjuna, *Fundamental Wisdom of the Middle Way*, tr. Gudo Wafu Nishijima

I have to continually resist the urge toward essentialism. I nearly began this
section, "the central insight of Lisp is..."

Instead: one of the insights that will rattle around your head after you've
been using Lisp for a while, and change the way you experience other
programming languages, is this: **syntax** is a largely unuseful fiction in
programming.

## ASTs

Those of us who have ever tried to reach a little ways into the guts of the
programming languages we use have probably encountered an *Abstract Syntax
Tree*.

The Abstract Syntax Tree, or AST, is a tree-shaped data structure that relates
the different words in the source code of a program to each other. Let us
observe some:

*A Python program*

```python
def id(x):
  return x

```

```python
>>> import ast
>>> print(ast.dump(ast.parse("def id(x):\n  return x\n")))
Module(body=[FunctionDef(name='id', 
                         args=arguments(posonlyargs=[], 
                                        args=[arg(arg='x')], 
                                        kwonlyargs=[], 
                                        kw_defaults=[], 
                                        defaults=[]), 
                         body=[Return(value=Name(id='x', ctx=Load()))], 
                         decorator_list=[])], 
       type_ignores=[])
>>>
```

*An Elixir program*

```elixir
def id(x) do
  x
end
```

```elixir
iex(1)> quote do
...(1)>   def id(x) do
...(1)>     x
...(1)>   end
...(1)> end
{:def, [context: Elixir, import: Kernel],
 [{:id, [context: Elixir], [{:x, [], Elixir}]}, [do: {:x, [], Elixir}]]}
iex(2)>
```

Above, there are four things:
- Two examples of *program syntax*, one for a Python program that defines the
  identity function and one for an Elixir program that does the same;
- Two examples of *ASTs*: representations of those programs that use the
  language's own data structures to store terms representing the tokens (I'm
  going to keep saying "words" for simplicity), arranged in a relationship that
  mirrors the way the words relate to each other syntactically.

The syntaxes of the two languages are a little different. The ASTs are quite
different: Python uses objects, instances of specialized classes, while Elixir
uses tuples, lists, and keyword lists.

The important thing that both ASTs convey is the *structure* of the words, how
they relate to each other. And they relate to each other as a tree. There is a
root node, representing the whole program, and it has 0 or more children, which
are other nodes, representing parts of the program. Those children have their
own children until we hit the bottom. In both cases, for instance, a `def` form
is a node which has more-or-less three children: 

1. a name;
2. arguments;
3. a function body.

The children, in order to be useful, will probably be complex nodes themselves,
with their own children---especially the function body.

Such is the nature of all source code, quite nearly[^allsource]. It's all
trees. A parser takes source code and produces an abstract syntax tree, which
is then interpreted, evaluated, or analyzed.

[^allsource]: Except maybe Forth.

### Taking the Syntax Out

The motto of the LFE (Lisp Flavoured Erlang) programming language is, ["Taking
the syntax out of distributed systems programming"](https://lfe.io/). This
seems like something of an odd boast. Generally, syntax of some kind would seem
to be a virtue. Otherwise, you're just programming directly with those trees.

Well, yes, quite. Let's repeat the above AST exercise in a Lisp. I'll pick
[Janet](https://janet-lang.org/).

*A Janet program*

```clj 
(defn id
 [x]
 x)
```

```clj
repl:1:> (quote (defn id [x] x))
(defn id [x] x)
```

In this case, the program is the familiar moonscape of parentheses, denuded of
colons and curly braces and do/ends. And the AST of that program is trivial:
it's the exact same thing.

In Janet, a tuple is a sequence of tokens bounded by parentheses or square
brackets, and a symbol is a bare word without any quotes around it. So the AST
for my program is simply a tuple with four elements: a symbol, a symbol, a
tuple, and a symbol. That inner tuple has a single element: a symbol.

The really important thing here is that lists (or tuples), too, can be trees.
Let us decide to treat the first element of a list as the "type" of a node, and
all the subsequent elements as the children of that node.

We have just invented s-expressions, the core syntactic construct at the heart
of every Lisp (I *do* feel quite comfortable being essentialist here).

The only use of s-expressions is that they give us something that's quite easy
to write, quite easy to parse, and that *directly expresses a tree structure.*

In other words, they allow us to skip the syntax step. And it is one of Lisp's
core contentions that, while a pleasant syntax is superficially appealing, you
won't miss it.