~fosskers/cl-transducers

Ergonomic, efficient data processing for Common Lisp.
docs: mention `completing`
test: improve coverage
release: 1.0.0

clone

read-only
https://git.sr.ht/~fosskers/cl-transducers
read/write
git@git.sr.ht:~fosskers/cl-transducers

You can also use your local clone with git send-email.

Transducers: Ergonomic, efficient data processing

Table of Contents

I think Transducers are a fundamental primitive that decouples critical logic from list/sequence processing, and if I had to do Clojure all over I would put them at the bottom.

– Rich Hickey

Transducers are an ergonomic and extremely memory-efficient way to process a data source. Here "data source" means simple collections like Lists or Vectors, but also potentially large files or generators of infinite data.

Transducers…

  • allow the chaining of operations like map and filter without allocating memory between each step.
  • aren't tied to any specific data type; they need only be implemented once.
  • vastly simplify "data transformation code".
  • have nothing to do with "lazy evaluation".
  • are a joy to use!

Example: While skipping every second line of a file, sum the lengths of only evenly-lengthed lines.

(transduce
  ;; How do we want to process each element?
  (comp (step 2) (map #'length) (filter #'evenp))
  ;; How do we want to combine all the elements together?
  #'+
  ;; What's our original data source?
  #p"giant-file.txt")

This library has been confirmed to work with SBCL, CCL, ECL, and Clasp.

⚠ ABCL cannot be used due to lack of support for tail-call elimination and tail-recursive labels.

1. History and Motivation

Originally invented in Clojure and adapted to Scheme as SRFI-171, Transducers are an excellent way to think about - and efficiently operate on - collections or streams of data. Transduction operations are strict and don't involve "laziness" or "thunking" in any way, yet only process the exact amount of data you ask them to.

This library draws inspiration from both the original Clojure and SRFI-171, while adding many other convenient operations commonly found in other languages.

2. Installation

This library is available on Ultralisp. With Ultralisp installed as a distribution, you can simply run the following to download the main system:

(ql:quickload :transducers)

For the JSON extensions:

(ql:quickload :transducers-jzon)

3. Usage

3.1. Importing

Since this library reuses some symbol names also found in :cl, it is expected that you import transducers as follows in your defpackage:

(:local-nicknames (#:t #:transducers))

You can then make relatively clean calls like:

(t:transduce (t:map #'1+) #'t:vector '(1 2 3))
;; => #(2 3 4)

3.2. Transducers, Reducers, and Sources

;; The fundamental pattern.
(t:transduce <transducer-chain> <reducer> <source>)

Data processing largely has three concerns:

  1. Where is my data coming from? (sources)
  2. What do I want to do to each element? (transducers)
  3. How do I want to collect the results? (reducers)

Each full "transduction" requires all three. We pass one of each to the transduce function, which drives the process. It knows how to pull values from the source, feed them through the transducer chain, and wrap everything together via the reducer.

  • Typical transducers are map, filter, and take.
  • Typical reducers are +, count, t:cons, and fold.
  • Typical sources are lists, vectors, strings, hash tables, and files.

Generators are a special kind of source that yield infinite data. Typical generators are repeat, cycle, and ints.

Let's sum the squares of the first 1000 odd integers:

(t:transduce
 (t:comp (t:filter #'oddp)             ;; (2) Keep only odd numbers.
         (t:take 1000)                 ;; (3) Keep the first 1000 filtered odds.
         (t:map (lambda (n) (* n n)))) ;; (4) Square those 1000.
 #'+         ;; (5) Reducer: Add up all the squares.
 (t:ints 1)) ;; (1) Source: Generate all positive integers.
;; => 1333333000 (31 bits, #x4F790C08)

Two things of note here:

  1. comp is used here to chain together different transducer steps. Notice that the order appears "backwards" from usual function composition. It may help to imagine that comp is acting like the ->> macro here. comp is supplied here as a convenience; you're free to use alexandria:compose if you wish.
  2. The reduction via + is listed as Step 5, but really it's occuring throughout the transduction process. Each value that makes it through the composed transducer chain is immediately added to an internal accumulator.

Explore the other transducers and reducers to see what's possible! You'll never write a loop again.

3.3. Using the fold Reducer

A reducer is a function that "reduces" or "folds" the results of the transducer chain into some single value. This could be a collection or some scalar. Some reducers can even short-circuit, yielding a desired value early.

fold is the ultimate reducer, and thus deserves special attention. fold creates an ad-hoc reducer based on a given 2-argument function. A SEED is required as the initial accumulator value, which also becomes the return value in case there were no input left in the transduction.

The normal CL functions + and * are automatically valid reducers, because they yield sane values even when given 0 or 1 arguments. Other functions like max cannot be used as-is as reducers since they can't be called without arguments. For functions like this, fold is appropriate.

;; The length of the longest word in this README.
(let ((xf (t:comp (t:map #'str:words)
                  #'t:concatenate
                  (t:filter (lambda (w) (every #'alpha-char-p w)))
                  (t:map #'length))))
  (t:transduce xf (t:fold #'cl:max 0) #p"README.org"))
;; => 14

In Clojure this function is called completing.

3.4. Processing JSON Data

The system transducers-jzon provides automatic JSON streaming support via the jzon library. Like transducers itself, it is expected that you import this system with a nickname:

(:local-nicknames (#:j #:transducers-jzon))

Only two functions are exposed: read and write.

  • read is a source that accepts a pathname, open stream, or a string. It produces parsed JSON values as Lisp types. JSON Objects become Hash Tables.
  • write is a reducer that expects an open stream. It writes the stream of Lisp types into their logical JSON equivalents.

Here is a simple example of reading some JSON data from a string, doing nothing to it, and outputting it again to a new string:

(with-output-to-string (stream)
  (t:transduce #'t:pass (j:write stream) (j:read "[{\"name\": \"A\"}, {\"name\": \"B\"}]")))
;; => "[{\"name\":\"A\"},{\"name\":\"B\"}]"

Note that the JSON data must be a JSON array. There is otherwise no size limit; the library can handle any amount of JSON input.

For more examples, see the Gallery below.

4. API and Compatibility Charts

This library offers an extensive implementation of the Transducer pattern compared to a number of languages. Note that the xforms library offers a number extensions to what is normally available in Clojure.

4.1. Transducers

  CL transducers loop macro Clojure Scheme Rust Haskell
pass   map identity map identity Just collect map id
map for x being the...
filter if / when
filter-map   keep   mapMaybe
remove   unless      
unique   distinct   nub
dedup   dedupe    
drop  
drop-while  
take  
take-while while etc.
replace        
Flat Map     mapcat tappend-map flat_map >>=
uncons   cat      
concatenate   cat flatten join
flatten        
segment   partition-all    
window       chunks  
group-by   partition-by    
intersperse   interpose
enumerate   map-indexed  
step by take-nth      
scan        
random-sample          
log Print in loop body   trace  

4.2. Higher-order Transducers

Transducers which can alter the transduction chain itself during runtime.

  CL transducers loop macro Clojure Scheme Rust Haskell
branch          
inject          
split          
zip      

4.3. Reducers

  CL transducers loop macro Clojure Scheme Rust Haskell
Into List collect conj
Into Vector vconcat conj  
Into String concat str  
Into Map   conj  
Into Set     conj      
count  
average          
anyp    
allp    
first return etc.    
last      
fold   completing  
max maximize    
min minimize    
find return etc.      
for-each do     mapM_

Why oh why is it so difficult to find an implementation of average in many languages?

4.4. Generators

  CL transducers loop macro Clojure Scheme Rust Haskell
ints for x from N to M     1.. [1..]
cycle      
repeat repeat    
random          
shuffle          

4.5. Data Sources

  CL transducers loop macro Clojure Scheme Rust Haskell
File Lines  
JSON Stream    

5. Example Gallery

5.1. Words in a File

(t:transduce (t:comp (t:map #'str:words) #'t:concatenate)
             #'t:count #p"README.org")
;; => 977

5.2. Reducing into Property Lists and Assocation Lists

There is no special reducer function for plists, because none is needed. If you have a stream of cons cells, you can break it up with uncons and then collect with cons as usual:

(t:transduce (t:comp (t:map (lambda (pair) (cons (car pair) (1+ (cdr pair)))))
                     #'t:uncons)
             #'t:cons (t:plist '(:a 1 :b 2 :c 3)))
;; => (:a 2 :b 3 :c 4)

Likewise, Association Lists are already lists-of-cons-cells, so no special treatment is needed:

(t:transduce #'t:pass #'t:cons '((:a . 1) (:b . 2) (:c . 3)))
;; => ((:a . 1) (:b . 2) (:c . 3)))

5.3. JSON: Calculating average age

Since JSON Objects are parsed as Hash Tables, we use the usual functions to retrieve fields we want.

(t:transduce (t:filter-map (lambda (ht) (gethash "age" ht)))
             #'t:average
             (j:read "[{\"age\": 34}, {\"age\": 25}]"))
;; => 59/2 (29.5)

5.4. Sieve of Eratosthenes

An ancient method of calculating Prime Numbers.

(let ((xf (t:comp (t:inject (lambda (prime) (t:filter (lambda (n) (/= 0 (mod n prime))))))
                  (t:take 10))))
  (cons 2 (t:transduce xf #'t:cons (t:ints 3 :step 2))))
;; => (2 3 5 7 11 13 17 19 23 29 31)

6. Limitations

  1. This library is generally portable, but assumes your CL implementation supports tail-recursive usage of labels.
  2. A way to model the common zip function has not yet been found.

7. Further Work

  • [ ] Notes on performance.
  • [ ] More higher-order transducers.

Created: 2023-09-26 Tue 00:58

Validate