Bump version number
Add Tiny C example unit tests
Add more in-lisp and README documentation
Monadic parser combinator library.
Conventional parsers are modeled as a two-stage process: a scanner that takes characters and produce tokens, and a parser that takes tokens and produce syntax trees. Monadic parsers instead model parsers as smaller parsers that can compose together, much like the procedures of a conventional program.
Other parser combinator libraries I've found for Common Lisp are either too macro-heavy for me, or warn that they are not production-ready. I don't trust third-party libraries that don't trust themselves, and so I've made my own, going for a simple interface targeted for public consumption.
Parsnip targets user-facing compilers, interpreters, or other readers of character-based languages, where programs would like to recover from or give insight about parser errors. Parsnip does not target performance-intensive or byte-based decoders, such as those used in network stacks, or JSON/XML decoders for request data for high-performance web applications.
Any comments, questions, issues, or patches are greatly appreciated! I do my main development on Sourcehut, with a mailing list and issue tracker.
Parsnip is available on Quicklisp:
(ql:quickload :parsnip)
Parsnip can also be installed locally. Make sure to also install the sole dependency Alexandria:
$ cd ~/common-lisp/ # Or wherever you store your definitions
$ git clone https://git.sr.ht/~shunter/parsnip
(require :parsnip)
(use-package :parsnip)
;; digit := [0-9]
(defparser one-digit ()
(char-if #'digit-char-p))
(defparser digits ()
(collect1 'one-digit))
;; whole-part := digit+
(defparser whole-part ()
(let! ((digits 'digits))
(ok (parse-integer (coerce digits 'string)))))
;; decimal-part := '.' [0-9]+
(defparser decimal-part ()
(let! ((digits (progn! (char-of #\.) 'digits)))
(ok (/ (parse-integer (coerce digits 'string))
(expt 10 (length digits))))))
;; number := whole-part [ decimal-part ]
(defparser decimal-number ()
(let! ((whole-value 'whole-part)
(decimal-value (or! 'decimal-part (ok 0))))
(ok (+ whole-value decimal-value))))
(defun parse-from-string (parser string)
(with-input-from-string (stream string)
(parse parser stream)))
(parse-from-string 'decimal-number "123.47") ;; => 12347/100
Parsnip aims to provide rich information for parsers aimed at end-users:
(use-package :xyz.shunter.parsnip.examples.json)
;; bad.json: [10,20,,]
(with-open-file (s "/tmp/bad.json")
(decode-json s))
/tmp/bad.json:1:7: Expected (#\f #\n #\t #\{ #\[
(:integer . 10) #\") on #<STREAM>
[Condition of type PARSER-ERROR]
(with-open-file (s "/tmp/bad.json")
(handler-case (decode-json s)
(parser-error (c)
(values (parser-error-line c)
(parser-error-column c)))))
=> 1
7
(handler-case (decode-json-from-string "[10,20,{\"foo\":\"bar\",}]")
(parser-error (c)
(format t "~A~%" c)
(parser-error-return-trace c)))
NIL:1:20: Expected #\" on #<STRING-INPUT-STREAM>
((xyz.shunter.parsnip.examples.json::value 1 0)
(xyz.shunter.parsnip.examples.json::json-array 1 0)
(xyz.shunter.parsnip.examples.json::value 1 7)
(xyz.shunter.parsnip.examples.json::json-object 1 7)
(xyz.shunter.parsnip.examples.json::json-string 1 20))
The test suite shows how each function works, and how it's expected to perform.
After a couple months of working on this project in my spare time, I believe it is ready for public use. However, you may have certain requirements for your own project which would hold you off from using this library over something else:
sb-cover
while developing this API, to limit erroneous behavior stemming from edge cases.
Every release includes a code coverage report, and every push to the repository triggers an automated system test.Time * Exposure
.
I also appreciate multiple eyes looking at this project.
Any comments, questions, and suggestions are well appreciated :)When the library reaches 1.0, I need to consider what parts of the library to solidify. I recognize these as breaking changes:
I recognize these as non-breaking changes:
The JSON example matches close to the grammar notation of the RFC8259 JSON specification. Outside of a couple outliers (e.g. the value definition is moved to the end), the code is laid out nearly section-by-section as stated in the RFC.
The Tiny C example is an extremely bare-bones subset of C, with a single primitive integer type and limited mechanisms. It demonstrates an example of what patterns can be used to parse C-family grammars with parser combinators.
Parsers are given four "tracks" of continuations: success/failure with consumed input, and "empty" success/failure (that is, without consumed output).
Most combinators that handle failures, only handle empty failures. The two exceptions are handle-rewind
and try!
, which saves the input stream's position and rewinds to an "empty" state before handling the failure.
A consumed success only matters to retrack all future empty continuations to consumed continuations (so a consumed success followed by an empty failure counts as a consumed failure).
This model is used by Haskell's parsec and megaparsec libraries. I tried to use a simple ok/fail model, but after writing the example JSON and Tiny C files, found that this model works best.
Return a parser that consumes nothing and returns the given value:
(with-input-from-string (s "abc123")
(list (parse (ok :hello) s)
(alexandria:read-stream-content-into-string)))
=> (:hello "abc123")
Return a parser that consumes nothing and fails, reporting the expected value.
Return a parser that consumes a character that satisfies the given predicate.
Return a parser that consumes the given character.
Return a parser that consumes a character in the given character bag.
Return a parser that consumes the given simple string. This parser may have consumed input on a failure.
Return a parser that consumes nothing and returns the given value (or nil
) if the input stream is exhausted.
Return a new parser that applies the given function to the parser's result, and then runs the parser the function returns. This function forms the basis of stringing multiple parsers together.
Return a parser that runs all given parsers, binds them all to their variables, evaluates the body, and then runs the parser the body returns.
Return a new parser that, on failure, applies the handler function to the parser's expected value and parse trace (as a list), and then runs the parser the handler returns.
handle
does not handle partial-parse failures, which can be recovered from via handle-rewind
.
Return a new parser that saves the stream's current position and, on failure, rewinds the stream, applies the handler function to the parser's expected value and parse trace (as a list), and then runs the parser the handler returns.
handle-rewind
only works when the parser is given a seekable stream.
Return a new parser that saves the stream's current position and, on failure, rewinds the stream before passing the failure down.
try!
only works when the parser is given a seekable stream.
Return a parser that strings together all given parsers and returns the last parser's result.
Return a parser that strings together all given parsers and returns the first parser's result.
Return a parser that strings together all given parsers and returns the second parser's result.
Return a parser that tries each given parser in order (until a partial-parse failure) and returns the result of the first successful parse.
If all parsers fail, then the parser error accumulates a list of all possible expected values.
Return a parser that runs the given parser until failure, and collects all results into a list.
Return a parser that runs the given parser once, keeps parsing until failure, and then collects all results into a list.
Return a parser that runs the given character parser until failure, and collects all characters into a string.
Return a parser that accepts a sequence of value-parser
input separated by sep-parser
input; such as values separated by commas.
Return a parser that keeps running until failure, and reduces its result into one value.
If initial-parser
is supplied, the parser may succeed without calling FUNCTION by returning INITIAL-PARSER's response.
Parse and pretend no input was consumed.
Keep parsing until failure, discard the results, and pretend no input was consumed.
Consume and return the number value of a digit.
Consume and return a natural number.
Define a parser as a function. It can then be referenced as a function designator.
Run a parser through a given stream and raise any failures as a parser-error
.
Parser errors are raised by parse
when a parser cannot recover from an error.
Parser error readers provide the line and column a parser ended at,
the return-trace of defparser
-defined parsers and the lines and columns each parser started at,
and an object that describes what the parser expected:
A return-trace is a list of (parser-name line column)
-structured objects detailing the state of the composite parser when it failed.
For both the parser-error and return-trace, lines start at 1
and columns start at 0
, and is initialized per parse call.