~fosskers/transducers.el

Ergonomic, efficient data processing for Emacs Lisp.
release: 1.1.0
docs: minor docstring corrections
fix: `t-for-each` yields `t`

clone

read-only
https://git.sr.ht/~fosskers/transducers.el
read/write
git@git.sr.ht:~fosskers/transducers.el

You can also use your local clone with git send-email.

Transducers: Ergonomic, efficient data processing

Table of Contents

I think Transducers are a fundamental primitive that decouples critical logic from list/sequence processing, and if I had to do Clojure all over I would put them at the bottom.

– Rich Hickey

Transducers are an ergonomic and extremely memory-efficient way to process a data source. Here "data source" means simple collections like Lists or Vectors, but also potentially large files or generators of infinite data.

Transducers…

  • allow the chaining of operations like map and filter without allocating memory between each step.
  • aren't tied to any specific data type; they need only be implemented once.
  • vastly simplify "data transformation code".
  • have nothing to do with "lazy evaluation".
  • are a joy to use!

Example: While skipping every second line of a file, sum the lengths of only evenly-lengthed lines.

(t-transduce
  ;; How do we want to process each element?
  (t-comp (t-step 2) (t-map #'length) (t-filter #'cl-evenp))
  ;; How do we want to combine all the elements together?
  #'+
  ;; What's our original data source?
  (t-file-read "README.org"))
2120

Looking for Transducers in other Lisps? Check out the Common Lisp and Fennel implementations!

1. History and Motivation

Originally invented in Clojure and adapted to Scheme as SRFI-171, Transducers are an excellent way to think about - and efficiently operate on - collections or streams of data. Transduction operations are strict and don't involve "laziness" or "thunking" in any way, yet only process the exact amount of data you ask them to.

This library is mostly a port of the Common Lisp implementation, with a few alterations to account for the minor differences between Common Lisp and Emacs Lisp.

2. Installation

This package is not (yet) available on MELPA.

3. Usage

3.1. Importing

Since this is just a library, you can import it as usual:

(require 'transducers)

Every function in the library is prefixed by transducers- but you're encouraged to use read-symbol-shorthands to shorten this to t-. This can be done interactively in your own files via add-file-local-variable-prop-line, which you can use to set your top line to:

;; -*- read-symbol-shorthands: (("t-" . "transducers-")); -*-

or via add-file-local-variable, which results in:

;; Local Variables:
;; read-symbol-shorthands: (("t-" . "transducers-"))
;; End:

at the bottom of the file. After this, you can make relatively clean calls like:

(t-transduce (t-map #'1+) #'t-vector '(1 2 3))
[2 3 4]

This can also be done in .org files, so that Transducers can be used in their short forms even in Babel source blocks. That's exactly what this README does!

The remaining examples below use t- for brevity.

3.2. Transducers, Reducers, and Sources

;; The fundamental pattern.
(t-transduce <transducer-chain> <reducer> <source>)

Data processing largely has three concerns:

  1. Where is my data coming from? (sources)
  2. What do I want to do to each element? (transducers)
  3. How do I want to collect the results? (reducers)

Each full "transduction" requires all three. We pass one of each to the t-transduce function, which drives the process. It knows how to pull values from the source, feed them through the transducer chain, and wrap everything together via the reducer.

  • Typical transducers are t-map, t-filter, and t-take.
  • Typical reducers are +, t-count, t-cons, and t-fold.
  • Typical sources are lists, vectors, strings, hash tables, and files.

Generators are a special kind of source that yield infinite data. Typical generators are t-repeat, t-cycle, and t-ints.

Let's sum the squares of the first 1000 odd integers:

(t-transduce
 (t-comp (t-filter #'cl-oddp)          ;; (2) Keep only odd numbers.
         (t-take 1000)                 ;; (3) Keep the first 1000 filtered odds.
         (t-map (lambda (n) (* n n)))) ;; (4) Square those 1000.
 #'+         ;; (5) Reducer: Add up all the squares.
 (t-ints 1)) ;; (1) Source: Generate all positive integers.
1333333000

Two things of note here:

  1. t-comp is used here to chain together different transducer steps. Notice that the order appears "backwards" from usual function composition. It may help to imagine that t-comp is acting like the thread-last macro here.
  2. The reduction via + is listed as Step 5, but really it's occuring throughout the transduction process. Each value that makes it through the composed transducer chain is immediately added to an internal accumulator.

Explore the other transducers and reducers to see what's possible! You'll never write a loop again.

3.3. Using the t-fold Reducer

A reducer is a function that "reduces" or "folds" the results of the transducer chain into some single value. This could be a collection or some scalar. Some reducers can even short-circuit, yielding a desired value early.

t-fold is the ultimate reducer, and thus deserves special attention. t-fold creates an ad-hoc reducer based on a given 2-argument function. An optional seed can be given as the initial accumulator value, which also becomes the return value in case there were no input left in the transduction.

The normal CL functions + and * are automatically valid reducers, because they yield sane values even when given 0 or 1 arguments. Other functions like max cannot be used as-is as reducers since they can't be called without arguments. For functions like this, t-fold is appropriate.

;; The length of the longest word in this README.
(let ((xf (t-comp (t-map #'split-string)
                  #'t-concatenate
                  (t-filter (lambda (w) (string-match-p "^[a-zA-Z]+$" w)))
                  (t-map #'length))))
  (t-transduce xf (t-fold #'max) (t-file-read "README.org")))
14

In Clojure this function is called completing.

4. Example Gallery

4.1. Words in a File

(t-transduce (t-comp (t-map #'split-string)
                     #'t-concatenate)
             #'t-count
             (t-file-read "README.org"))
1101

4.2. Splitting a string by its lines

Transducing over a string yields its characters:

(t-transduce #'t-pass #'t-count "hello\nworld!")
12

If you want to transduce over its lines instead, create a temporary buffer first:

(with-temp-buffer
  (insert "hello\nworld!")
  (t-transduce #'t-pass #'t-cons (current-buffer)))
hello world!

4.3. Reading and Writing CSV data

This library also provides two transducers for processing CSV data: t-from-csv and t-into-csv. The original data can come from any source, like a file, open buffer, or raw string.

t-from-csv reads the data into a stream of Hash Tables with each value keyed to the fields provided in the first line. t-into-csv reverses the process, given a sequence of headers to select.

(t-transduce (t-comp #'t-from-csv
                     (t-into-csv ["Age" "Name"]))
             #'t-cons
             ["Name,Age,Hair" "Alice,35,Blond" "Bob,26,Black"])
("Age,Name" "35,Alice" "26,Bob")

Here we're immediately converting back into CSV strings, but with t-comp we're free to add as many intermediate steps as we like.

4.4. Reading and Writing JSON data

It is also possible to read from and write to JSON buffers. Reading assumes that the buffer contains a top-level array. By default, yielded objects are plists, but this can be customized via the :object-type keyword.

(with-temp-buffer
  (insert "[{\"name\": \"Colin\"},{\"name\": \"Jack\"}]")
  (t-transduce #'t-pass #'t-cons (t-from-json-buffer (current-buffer))))
((:name "Colin") (:name "Jack"))

Note that t-from-json-buffer is a "source".

Likewise, t-into-json-buffer is a reducer that writes a stream of lisp values back into the current buffer.

(with-temp-buffer
  (t-transduce #'t-pass #'t-into-json-buffer '((:name "Colin") (:name "Jack")))
  (buffer-string))
[{"name":"Colin"},{"name":"Jack"}]

Note that t-into-json-buffer makes no assumptions about where point initially is the current buffer, nor is the buffer automatically saved. Such concerns are the responsibility of the user.

4.5. Reducing into Property Lists and Assocation Lists

There is no special reducer function for plists, because none is needed. If you have a stream of cons cells, you can break it up with t-uncons and then collect with t-cons as usual:

(t-transduce (t-comp (t-map (lambda (pair) (cons (car pair) (1+ (cdr pair)))))
                     #'t-uncons)
             #'t-cons
             (t-plist '(:a 1 :b 2 :c 3)))
(:a 2 :b 3 :c 4)

Likewise, Association Lists are already lists-of-cons-cells, so no special treatment is needed:

(t-transduce #'t-pass #'t-cons '((:a . 1) (:b . 2) (:c . 3)))
((:a . 1) (:b . 2) (:c . 3))

5. Issue Tracker and Mailing List

6. Resources

Created: 2023-12-24 Sun 23:59

Validate