~abcdw/trop.in

13687bce92eeac91b25b0891fcd334cef5534640 — Andrew Tropin 9 days ago 0b569ec
Publish scheme ssgs review
1 files changed, 175 insertions(+), 54 deletions(-)

M pages/posts/2023-05-10-scheme-ssgs-review.md
M pages/posts/2023-05-10-scheme-ssgs-review.md => pages/posts/2023-05-10-scheme-ssgs-review.md +175 -54
@@ 1,6 1,6 @@
title: Scheme Static Site Generators Review
date: 2023-05-10 12:00
tags: architecture, tech
date: 2023-05-23 12:00
tags: architecture, tech, scheme
abstract: An overview of the Scheme ecosystem in the field of static site generators (SSGs), a review of the Haunt SSG architecture, and possible ways to improve it and the Guile ecosystem.
---



@@ 248,16 248,19 @@ Providing default values for them is convinient, but making them to be
fields of `site` records incorporates unnecessary assumptions about the
blog nature of the site, which can negatively impact the rest of the
implementation by adding unwanted coupling and reducing composability.
One option to avoid this is to make them values in
default-metadata rather than fields in the record.

TODO: Write about possible alternative for site fields.

#### Builders
Builders are functions that accept `site` and `posts` and return a
list of artifacts.  Artifacts are records with an
`artifact-writer` field, containing a closure that writes the actual output
file.  There are a number of different builders provided out of the
box, but the most basic one (static-page) is missing.  Luckily, it's not
hard to implement, so let's do it.
#### Builders, Themes and Readers
Builders are functions that accept `site` and `posts`, apply a series
of transformations and return a list of artifacts.  Themes and
Readers are basically transformations used somewhere in the build
process.  Artifacts are records with an `artifact-writer` field,
containing a closure that writes the actual output file.  There are a number
of different builders provided out of the box, but the most basic one
(static-page) is missing.  Luckily, it's not hard to implement, so
let's do it.

```scheme
(define* (page-theme #:key (footer %default-footer))


@@ 300,68 303,186 @@ transmorations happens here:
- `serialized-artifact` creates a closure, which wraps `sxml->html`
  and will later serialize obtained SXML for the page to HTML.

### Readers
There is a concept of readers, small functions


### Guile-Commonmark
It is used in Haunt by default to parse markdown files into SXML.  It
doesn't support embedded html, tables, footnotes and comments, so it
can be quite inconvenient for many use cases.

TODO: It was something else important that was missing in
guile-commonmark?

### Mix of implicit and explicit things
The implementation using the already existing API is quite easy, but
unfortunately not perfect.  While functions and records are composable
enough to produce the desired results, the names are quite confusing:
they are tightly related to blogs and don't make much sense in the context
of other site types.

### Metadata
Accepts only one-line metadata.  Doesn't accept files without metadata.
Metadata is not a part of the html grammar -> post is not a valid html.

register-metadata-parser! is a reimplementation of multimethods.
Every builder always accepts a list of posts, which were read and
transformed into SXML beforehand.  This is implicit and again blog
related, which makes the implementation less generic.  Reading could be
implemented in the `blog` builder, but then other builders like
atom-feed wouldn't be able to reuse the posts already read by the `blog` builder
and would need to read them again.  This is due to the fact that the
build process has 3 primary steps and looks like this:

```scheme
;; 1. Prepare site and posts

### Site Alist
;; 2. Build artifacts
(builder1 site posts) ;; => artifacts-1
(builder2 site posts) ;; => artifacts-2
(builder3 site posts) ;; => artifacts-3

;; 3. Produce actual site:
(serialize-artifacts
 (append artifacts-1 artifacts-2 artifacts-3))
```
`((posts-directory . "pages/posts")
  (build-directory . "target/site"))
```

### Theme
The layout for posts and collections is the same: one layout is
coupled to both of them.

### TODOs
Links to md files are not converted to appropriate urls.  Do we want
to implement it, and if so, how?
This makes the build process rigid and procedures harder to
compose.  A more streamlined alternative could look like
this:

Org-roam workflow
```scheme
(define readers (list ...))
;; threading macro passes the result of the form
;; as a first argument to the next form
(->
 (make-site ...) ;;=> ((site . <site-record>))
 (read-posts
  "posts/" readers) ;;=> ((posts . <list-of-posts>) (site . <site-record>))
 (static-page "index.md" "index.html") ;;=> ((artifacts <index-artifact>) ...)
 (blog-posts theme) ;;=> ((artifacts <post1-artifact> <index-artifact>) ...)
 (collection "main") ;;=> ((artifacts <coll-artifact> <post1-artifact> ...) ...)
 (atom) ;; takes value from posts and appends a few more artifacts
 (atom-by-tags)
 (serialize-artifacts!))
```

### Workflows
- ox-haunt
- md
- citations
- one file multiple post
It is just a series of transformations that enrich a single associative data
structure.  Moreover, it makes the implementation of such
transformations much more composable:

### Build/Deploy
Rebuild and redeploy do it for the whole site every time.
```scheme
(define (read-posts o dir readers)
  (let* (;; (dir (site-posts-dir (assoc-ref o 'site))) ; could be
         (posts (map (read-with-readers readers) (files-in dir))))
    (alist-update o 'posts (lambda (x) (append x posts)))))

(define* (static-page o file destination
                      #:key
                      (reader commonmark-reader)
                      (page-layout default-page-layout))
  (let* ((sxml-body (get-sxml (reader file)))
         (sxml-page (page-layout sxml-body))
         (page (serialized-artifact destination sxml-page sxml->html)))
    (alist-update o 'artifacts (lambda (x) (append x (list page))))))

(define* (blog-posts o destination-dir
                     #:key
                     (page-layout default-page-layout)
                     (post-layout post-layout))
  "Only the first post is handled here, to demonstrate the idea of
reusability more clearly."
  (let* ((post (first (assoc-ref o 'posts)))
         (destination (string-append destination-dir (post-file post)))
         (sxml-content (get-sxml post))
         (sxml-body (post-layout sxml-content))
         (sxml-page (page-layout sxml-body))
         (page (serialized-artifact destination sxml-page sxml->html)))
    (alist-update o 'artifacts (lambda (x) (append x (list page))))))

(define* (collection o name
                     #:key
                     (filter-function identity)
                     (collection-generator default-collection-generator)
                     (page-layout default-page-layout))
  (let* ((posts (filter-function (assoc-ref o 'posts)))
         (file (string-append name ".html"))
         (sxml-body (collection-generator posts))
         (sxml-page (page-layout sxml-body))
         (collection (serialized-artifact file sxml-page sxml->html)))
    (alist-update o 'artifacts (lambda (x) (append x (list collection))))))
```
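
The snippets above rely on `alist-update` and the `->` threading macro,
neither of which is defined in this post.  Below is a minimal sketch of
how they could look.  Both definitions are assumptions made for
illustration, not part of Haunt or Guile, and the symbols `index` and
`post1` merely stand in for real artifacts.

```scheme
(use-modules (srfi srfi-1)) ; alist-delete

;; Hypothetical helper: return a new alist in which KEY is bound to
;; (PROC old-value).  A missing key starts from DEFAULT.
(define* (alist-update alist key proc #:optional (default '()))
  (let ((old (or (assoc-ref alist key) default)))
    (cons (cons key (proc old))
          (alist-delete key alist))))

;; Hypothetical thread-first macro: the value of each form is passed
;; as the first argument to the next form.
(define-syntax ->
  (syntax-rules ()
    ((_ x) x)
    ((_ x (f args ...) rest ...) (-> (f x args ...) rest ...))
    ((_ x f rest ...) (-> (f x) rest ...))))

;; Two steps, each appending one artifact to the shared alist.
(-> '()
    (alist-update 'artifacts (lambda (x) (append x '(index))))
    (alist-update 'artifacts (lambda (x) (append x '(post1)))))
;; => ((artifacts index post1))
```

Defaulting a missing key to the empty list lets every step use `append`
without checking whether an earlier step has already created the key.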

TODO:
The naming of intermediate transformations is much more suitable
(there is no notion of a post in the static-page builder).  The
transformations are more atomic and easier to reuse (page-layout and
similar), so there is no need to combine them into records like
`theme`.  It's easier to restructure complex transformations: for
example, a collection can be a part of the `blog` builder or a separate
step, as in the example above.  There is no need to special-case the
read and serialize steps: the read step can skip posts that are flagged
as drafts or apply some other advanced logic.  Finally, it's now
possible to build a page that relies on the output of previous steps,
for example a collection of generated rss/atom links.
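
As an illustration of skipping drafts, here is a sketch of one more
pipeline step.  The name `remove-drafts` and the `draft?` predicate are
hypothetical, and posts are represented as plain metadata alists to
keep the example self-contained; with real Haunt posts the predicate
would inspect the post metadata instead.

```scheme
(use-modules (srfi srfi-1)) ; remove

;; Hypothetical pipeline step: drop every post for which DRAFT?
;; returns true and leave all other keys of the alist O untouched.
(define (remove-drafts o draft?)
  (map (lambda (entry)
         (if (eq? (car entry) 'posts)
             (cons 'posts (remove draft? (cdr entry)))
             entry))
       o))

;; Assumption for this sketch: a post is an alist of metadata and a
;; draft is marked with a true value under the `draft' key.
(define (draft? post)
  (assq-ref post 'draft))

(remove-drafts
 '((posts ((title . "a") (draft . #t))
          ((title . "b")))
   (site . s))
 draft?)
;; => ((posts ((title . "b"))) (site . s))
```

Because the step takes the alist as its first argument, it can be
placed right after the read step in the threaded pipeline.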

However, such an implementation has its own flaws.  More flexibility and
a less rigid structure can lead to more user mistakes and a steeper
learning curve.  The original implementation could theoretically run
builders in parallel, but here one would need to implement that on the
user or builder side.

How to build page with links to rss?
How to build a collection with different template?
### Readers
As a part of the build process we encountered a step where a file
written in a markup language is read by readers.  It has two parts:
reading the metadata and reading the actual content.  Let's cover
the implementation details of both.

#### Metadata
As shown in the example code snippet in the section on
transformations, one can provide additional metadata in a simple
key-value format, delimited by `---` from the content of the markup
file.  There are two main issues with the implementation; let's
discuss them.

The metadata is required by the built-in readers: even if one doesn't
want to set any values, they have to add `---` at the beginning of the
file.  This requirement is unnecessary and could be easily avoided.

The metadata reader accepts only simple `:` delimited key-value pairs.  It
may not be as flexible as yaml frontmatter.  Metadata in such a format
is usually not a part of the markup grammar, which means the files are
written in invalid markup.  However, it's not a big deal, as
readers can use custom metadata parsers.
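
To show that the `---` requirement can be relaxed, here is a minimal
sketch of a metadata splitter that treats a file without a metadata
header as content only.  It is not how Haunt reads metadata; the names
`parse-metadata-line` and `split-metadata` are made up for this
example.

```scheme
(use-modules (srfi srfi-1)) ; list-index, take, drop, every

;; Parse "key: value" into (key . "value"); return #f for other lines.
(define (parse-metadata-line line)
  (let ((i (string-index line #\:)))
    (and i
         (cons (string->symbol (string-trim-both (substring line 0 i)))
               (string-trim-both (substring line (+ i 1)))))))

;; Return two values: an alist of metadata and the body string.  Text
;; before the first `---' line counts as metadata only if every line
;; of it parses as a key-value pair; otherwise all of TEXT is the body.
(define (split-metadata text)
  (let* ((lines (string-split text #\newline))
         (idx (list-index (lambda (l)
                            (string=? (string-trim-both l) "---"))
                          lines))
         (pairs (and idx (map parse-metadata-line (take lines idx)))))
    (if (and pairs (every identity pairs))
        (values pairs (string-join (drop lines (+ idx 1)) "\n"))
        (values '() text))))

(split-metadata "title: Hi\n---\nBody") ;; => ((title . "Hi")) and "Body"
(split-metadata "Just text")            ;; => () and "Just text"
```

The check is a heuristic: a file without metadata whose opening lines
all contain a colon and are followed by a `---` rule would still be
misread, so a real implementation would need a stricter rule.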

#### Guile-Commonmark and Tree-Sitter
Guile-Commonmark is used in Haunt by default to parse markdown files
into SXML.  It doesn't support embedded html, tables, footnotes and
comments, so it can be quite inconvenient for many use cases.  It
works well enough to serve basic needs, and more advanced use cases can
potentially be implemented with more full-featured libraries like a
hypothetical `guile-ts-markdown`
([tree-sitter](https://tree-sitter.github.io/) based markdown parser).

## Conclusion
The conclusion here
Haunt is the primary player in the field of Scheme static site generators at
the time of writing.  It gives all the basics to get up and running.
The number of learning resources available in the wild is much smaller
than for similar solutions from other language ecosystems, but the
provided documentation and source code are enough for a seasoned schemer
to get started and, more importantly, to learn everything about it in
a matter of hours, which is not possible for projects like `hugo` or
`jekyll`.

The functionality can be lacking in some cases, but due to the hackable
nature of the project it's possible to gradually build upon the basics and
add all the things needed.  Unfortunately, the current state of the
Scheme ecosystem, and Guile in particular, feels behind more
mainstream languages, but luckily the popularity of Guile has reached a
critical level and the ecosystem will start growing in the near
future.

### Future Work
Possible future steps are improving SXML/HTML ecosystem in Guile,
producing tree-sitter based parsers for various formats
There are a number of possible improvements for Haunt in particular and
Guile and Scheme in general.  The first is more complete tooling for working with
markup languages: org, md, html, yaml, etc.  As a generic solution,
tree-sitter seems a good candidate to quickly cover this huge area.

A more streamlined and composable build process for Haunt, as described in the
Builders section, could also be a good thing, making the SSG more
flexible and its components more reusable.

Possible integrations with other tools like Guix, REPL, Emacs for
easier deployment, better caching, more interactive development and
other goodies.

More documentation, materials and tools for possible workflows and use
cases, from citation capabilities and automatic url resolution to
one-huge-file workflows and org-roam integration.

**Acknowledgments.** Thank you to [David
**Acknowledgments.** Kudos to [David
Thompson](https://dthompson.us/about.html) for making Haunt.