~abcdw/trop.in

faa92a3c73f38e8b2b6d58472d09547d41580f24 — jgart 1 year, 4 months ago 13687bc
Various fixes for scheme ssgs review
1 files changed, 116 insertions(+), 118 deletions(-)

M pages/posts/2023-05-10-scheme-ssgs-review.md
M pages/posts/2023-05-10-scheme-ssgs-review.md => pages/posts/2023-05-10-scheme-ssgs-review.md +116 -118
@@ 32,7 32,7 @@ and provides a lot of functionality outside of SSG scope, so we don't
cover it in this writing.

Basically, we have only one option left at the moment: Haunt, and the
further discussion will be related to it, but before exploring it, we
need to get to common ground and cover the topic of different markup
languages.



@@ 40,7 40,7 @@ languages.
Markup languages are used for defining documentation structure,
formatting, and relationship between its parts.  They play an
important role in SSGs; different languages can suit different tasks
better: simple and expressive for human convenience, powerful
and capable for intermediate representation and manipulation,
compatible and widespread for distribution.



@@ 61,9 61,9 @@ it's defined in the SGML Doctype language.  Often used for
representing and exchanging data.

HTML is a more user-friendly markup language: it's defined in plain
English, has more forgiving parsers and interpreters, and allows things
like uppercased tags and tags without a matching closing tag.  Such
flexibility can be convenient for users, but it makes it harder to
operate on it programmatically (parse, process and serialize).

XHTML (XML serialization of HTML) is a version of HTML, which is


@@ 75,7 75,7 @@ relationship between them and tools for XML can't be used for HTML in
general case.

### Lightweight Markup Languages
This is another family of markup languages, which are simpler, less
verbose, and more human-oriented in general.  The notable members are
Wiki, Markdown, Org-mode, reStructuredText, BBCode, AsciiDoc.



@@ 85,9 85,9 @@ final output is produced, usually in the form of (X)HTML documents.

### Other Markup Languages
There are a number of languages and typesetting systems not covered by
the previous two sections: Texinfo, LaTeX, Skribe, Hiccup, SXML.
The goals for them can be different: preparing hardcopies, use as an
intermediate format, or better suitability for specific needs
like writing documentation.

## Haunt Overview


@@ 100,13 100,13 @@ HTML.  Let's discuss various parts of this process in more details.

### SXML
SXML is a representation of XML using S-expressions: lists, symbols
and strings, which can be less verbose than the original representation
and much easier to work with in Scheme.

[SXML](https://okmij.org/ftp/Scheme/xml.html#SXML-spec) is used as an
intermediate format for pages and their parts in Haunt, which is
relatively easy to process, manipulate and later serialize to target
formats like XHTML.  It can be crafted by creating s-expressions from
Scheme code manually, or programmatically, or with a mix of both.  It
looks like this:
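As a hedged illustration (the concrete content here is hypothetical), a small SXML fragment might be:

```scheme
;; A hypothetical SXML fragment: a heading and a paragraph with a
;; link, expressed as plain Scheme lists, symbols and strings.
;; The (@ ...) form is SXML's convention for attributes.
'(div (h1 "Hello, SXML!")
      (p "Learn more "
         (a (@ (href "https://example.org")) "here")
         "."))
```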



@@ 126,7 126,7 @@ looks like this:
As mentioned in the introduction, there is no direct
relationship between XML and HTML, and while we usually can parse
arbitrary HTML and convert it to SXML without losing significant
information, we can't directly use XML parsers for that.  For example,
this HTML is not valid XML:

```html
<!-- hypothetical example: a bare boolean attribute is valid HTML,
     but not well-formed XML -->
<input type="checkbox" hidden>
```
@@ 136,28 136,28 @@ this HTML is not valid XML:
Luckily, we can present boolean attributes in full form as
`hidden="hidden"`, which is valid both in HTML[^4] and XML.

Most lightweight markup languages as well as SSGs usually target HTML,
but an SSG needs to combine content, templates and data from various
sources and merge them together, so SXML looks like a solid choice for
an intermediate representation.

### The Transformation Workflow
Each site page is built out of a series of subsequently applied
transformations.  A transformation is basically a function, which
accepts some metadata and data and returns other data (usually SXML)
and sometimes additional metadata.  Because a transformation is
a pure function, a few transformations can be composed into one bigger
transformation.
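To make this concrete, here is a sketch of two such transformations and their composition; all names here are hypothetical illustrations, not Haunt's API:

```scheme
;; Hypothetical transformations: each takes metadata and data (SXML)
;; and returns new data.  Pure functions like these compose directly.
(define (add-title metadata sxml)
  `((h1 ,(assq-ref metadata 'title)) ,@sxml))

(define (wrap-in-article metadata sxml)
  `((article ,@sxml)))

;; Compose two transformations into one bigger transformation.
(define (compose-transformations f g)
  (lambda (metadata sxml)
    (f metadata (g metadata sxml))))
```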

We will cover it in more detail in the next section, but readers,
templates, layouts, serializers and builders are all just transformations.
For example, the top-level template, called a layout, just produces SXML
for the final page, which can be serialized to the target format.
To demonstrate the workflow we will take a bottom-up approach.

Let's take a simple Markdown file, where one wants to write the
content of a blog post in a human-friendly markup language, and let's
add metadata to the top of this file: title, publish date, tags.

```Markdown
title: Hello, CommonMark!


@@ 168,7 168,7 @@ tags: markdown, commonmark
## This is a CommonMark post

CommonMark is a **strongly** defined, *highly* compatible
specification of Markdown. Learn more about CommonMark
[here](http://commonmark.org/).
```



@@ 190,29 190,30 @@ data (SXML).
Metadata+data representing one post is a good unit of operation.  With
one more transformation (it can be just a template: a function adding
`html`, `head`, `body` tags and a few more minor things) the SSG can
produce SXML that is almost ready for serialization.  After deciding on
the resulting file name, one more serialization step produces the final
HTML.

Some additional transformations can be desirable in between: for
example, substituting relative links to source markup files with links
to the generated html files, but overall it fits this general
transformation workflow well.

Let's zoom out a little and take a look at the directory structure,
rather than a single file.  Usually, SSGs operate on a number of files
and in addition to simple pages can generate composite pages like a
list of articles, rss feeds, etc.  For this purpose our unit of
operation becomes a list of data+metadata objects: instead of parsing
one markup file, the SSG traverses the whole directory and generates a
list of objects for future transformation.  The overall idea is still
the same, but instead of one output file per input file, the SSG
produces a list containing only a few or even one output file.
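A hedged sketch of that traversal, where `read-all-posts` is a hypothetical helper and `read-post` stands in for whatever reader applies:

```scheme
;; Hypothetical sketch: traverse a directory of markup files and
;; produce a list of metadata+data objects for later transformations.
(use-modules (ice-9 ftw))

(define (read-all-posts directory read-post)
  (map (lambda (file)
         (read-post (string-append directory "/" file)))
       (scandir directory
                (lambda (file) (string-suffix? ".md" file)))))
```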

### The Implementation

#### The Entry Point
The entry point in Haunt is a `site` record, which can be created with
a function that has the following docstring:

```
Create a new site object.  All arguments are optional:


@@ 232,35 233,33 @@ READERS: A list of reader objects for processing posts
BUILDERS: A list of procedures for building pages from posts
```
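A hedged usage sketch, assuming the keyword arguments follow the docstring above and that `blog` is one of the builders Haunt provides out of the box:

```scheme
;; A minimal haunt.scm sketch.  Treat this as illustrative rather than
;; a verbatim configuration: keyword names follow the docstring above,
;; and `(blog)` is assumed to come from Haunt's stock builders.
(use-modules (haunt site)
             (haunt builder blog))

(site #:title "My Site"
      #:domain "example.org"
      #:builders (list (blog)))
```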

The primary thing here is a list of builders. As previously mentioned,
a builder is a special case of complex transformation, which does all the
work of parsing, templating, generating collections, serialization, etc.

The rest of the list is basically metadata or auxiliary functions.
While many of those values can be useful, almost none of them are needed
in many cases.  `scheme` and `domain` are used for rss/atom feeds, which
are rare for personal or landing pages.  Similar logic is applicable
to the rest of the function arguments, except for maybe `build-directory`,
which almost always makes sense.

Providing default values for them is convenient, but making them fields
of `site` records incorporates unnecessary assumptions about the
blog-oriented nature of the site and can negatively impact the rest of
the implementation by adding unwanted coupling as well as reducing its
composability.  One of the options to avoid this is to make them values
in the default-metadata rather than fields in the record.


#### Builders, Themes and Readers
Builders are functions which accept `site` and `posts`, apply a series
of transformations and return a list of artifacts.  Themes and Readers
are basically transformations used in the build process.  Artifacts are
records with an `artifact-writer` field, containing a closure that writes
the actual output file.  There are a number of different builders provided
out of the box, but the most basic one (static-page) is missing; luckily
it's not hard to implement, so let's do it.

```scheme
(define* (page-theme #:key (footer %default-footer))


@@ 296,26 295,26 @@ path."
```

As described in a section about transformations, the series of
transformations happens here:
- `read-post` basically parses markdown and returns SXML + metadata.
- `render-post` uses the layout from `theme` to produce the SXML post body.
- `serialized-artifact` creates a closure, which wraps `sxml->html`
  and will later serialize obtained SXML for the page to HTML.

The implementation using already existing APIs is quite easy, but
unfortunately not perfect.  While functions and records are composable
enough to produce the desired results, the names are quite confusing and
tightly related to blogs, and don't make much sense in the context
of other site types.

Every builder always accepts a list of posts, which were read and
transformed into sxml ahead of time.  This transformation is implicit
and again blog related, which makes the implementation less generic.
It could be implemented in the `blog` builder, but this way other
builders like atom-feed won't be able to reuse the posts already read
by the `blog` builder and would need to read them again.  This is due
to the fact that the build process has three primary steps and looks
like this:

```scheme
;; 1. Prepare site and posts


@@ 351,7 350,7 @@ this:
```

Just a series of transformations, which enriches one associative data
structure.  Moreover, it makes the implementation of such
transformations much more composable:

```scheme


@@ 409,81 408,80 @@ page, which relies on the content of previous steps, for example a
collection of generated rss/atom links.

However, such an implementation has its own flaws: more flexibility and
a less rigid structure can lead to more user mistakes and a steeper
learning curve.  The original implementation could theoretically run
builders in parallel, but here one would need to implement it on the
user or builder side.

### Readers
As a component of the build process we encountered a step where a
file written in a markup language is read by readers.  There are two
parts to it: reading metadata and reading the actual content.  Let's
cover implementation details for them.

#### Metadata
As shown in the example code snippet in the section related to
transformation, one can provide additional metadata in a simple
key-value format, delimited by `---` from the content of the markup
file.  There are two main issues with the implementation; let's
discuss them.

The metadata is required for built-in readers, and even if one doesn't
want to set any values, they have to add `---` at the beginning of the
file.  This requirement is not needed and could be easily avoided.

The metadata reader accepts only simple colon-delimited key-value pairs.
It is potentially not as flexible as yaml frontmatter.  Metadata in
such a format usually is not a part of the markup grammar, which means
files are written in invalid markup.  However, it's not a big deal,
as readers can use custom metadata parsers.
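A hedged sketch of such a custom parser (`read-metadata` is a hypothetical name, not Haunt's API): read `key: value` lines until the `---` delimiter and return an alist.

```scheme
;; Hypothetical custom metadata parser: consume `key: value` lines
;; until a `---` delimiter (or end of file), returning an alist.
(use-modules (ice-9 rdelim))

(define (read-metadata port)
  (let loop ((metadata '()))
    (let ((line (read-line port)))
      (cond
       ((or (eof-object? line) (string=? line "---"))
        (reverse metadata))
       ((string-index line #\:)
        => (lambda (i)
             (loop (cons (cons (string-trim-both (substring line 0 i))
                               (string-trim-both (substring line (1+ i))))
                         metadata))))
       (else (loop metadata))))))
```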

#### Guile-Commonmark and Tree-Sitter
Guile-Commonmark is used in Haunt by default to parse markdown files
into SXML; it doesn't support embedded html, tables, footnotes and
comments, so it can be quite inconvenient for many use cases.  It
somehow works and serves basic needs, and more advanced use cases can
potentially be implemented with more feature-full libraries like
a hypothetical `guile-ts-markdown`
([tree-sitter](https://tree-sitter.github.io/) based markdown parser).

## Conclusion
Haunt is the primary player in the Scheme static site generators arena
at the moment of this writing.  It gives you all the basics to get up
and running.  The number of available learning resources in the wild is
much smaller than for similar solutions from other languages'
ecosystems, but the provided documentation and source code are enough
for a seasoned schemer to get started and, more importantly, to learn
everything about it in a matter of hours, which is not possible with
projects like `hugo` or `jekyll`.

The functionality can be lacking in some cases, but due to the hackable
nature of the project, it is possible to gradually build upon the basics
and add everything needed.  Unfortunately, the current state of the
Scheme ecosystem, and Guile in particular, feels behind more mainstream
languages, but hopefully the popularity of Guile will reach a higher
level and the ecosystem will start growing in the near future.

### Future Work
There are a number of improvement points for Haunt in particular, and
Guile Scheme in general.  We need more complete tooling for working with
markup languages like org, md, html, yaml, etc.  As a generic solution,
tree-sitter seems like a good candidate to quickly cover this huge area.

A more streamlined and composable build process for Haunt, as described
in the Builders section, could add to Haunt's flexibility in general as
well as encourage the use of reusable components.

Possible integrations with other tools like Guix, REPL, Emacs for
easier deployment, better caching, more interactive development and
other goodies.

More documentation, materials and tools for possible workflows and use
cases, from citation capabilities and automatic url resolution to
one-huge-file workflows and org-roam integration.

**Acknowledgments.** Kudos to [David
Thompson](https://dthompson.us/about.html) for making Haunt and [Erik
Edrosa](http://www.erikedrosa.com/) for making guile-commonmark.


[^1]: https://jamstack.org/generators/haunt/