~sjm/notesviz_parse

389e9f2c8c954fe6feb7d99d974a167ddf2a1b35 — Sam Marshall 2 years ago 3f1e8bf master
clean up org file

add some extra context & headings
1 files changed, 35 insertions(+), 23 deletions(-)

M notesviz_parse.org => notesviz_parse.org +35 -23
@@ 12,27 12,27 @@ I've generated a basic Elixir project, and I'll be "importing" the pre-existing 

I'm fairly new to literate programming, too - especially within org-mode. But I'm familiar enough with org-mode that I'm fairly comfortable. I think we can only see how this goes.

** tests
** Tests
:PROPERTIES:
:header-args: :remsh sam@sam-laptop :sname console :session console
:END:

#+begin_src elixir
NotesvizParse.Parser.org_link("[[test][another test]]")
NotesvizParse.Document.Org.org_link("[[test][another test]]")
#+end_src

#+RESULTS:
: {:ok, [link: [uri: "test", desc: "another test"]], "", %{}, {1, 0}, 22}

#+begin_src elixir
NotesvizParse.Parser.org_doc("one [[test][another test]]")
NotesvizParse.Document.Org.links("one [[test][another test]]")
#+end_src

#+RESULTS:
: {:ok, [link: [uri: "test", desc: "another test"]], "", %{}, {1, 0}, 26}

#+begin_src elixir
NotesvizParse.Parser.org_doc("one [[test][another test]] two [[test2][docs]]")
NotesvizParse.Document.Org.links("one [[test][another test]] two [[test2][docs]]")
#+end_src

#+RESULTS:


@@ 42,7 42,7 @@ NotesvizParse.Parser.org_doc("one [[test][another test]] two [[test2][docs]]")


#+begin_src elixir
NotesvizParse.Parser.org_doc("""
NotesvizParse.Document.Org.links("""
one [[test][another test]] two [[test2][docs]]

 blah blah


@@ 108,23 108,30 @@ end

** Parsing a link
:PROPERTIES:
:header-args: :noweb-ref parser-helpers
:header-args: :noweb-ref org-parser-helpers
:END:

*** Parser Combinators

While I've barely touched Elixir before, I'm a little familiar with parser combinators in other languages. So I had a google to see what was out there and the first library to pop up for Elixir was [[https://github.com/dashbitco/nimble_parsec][nimble parsec]]. This seems like as good a start as any, so I'll begin by adding this as a dependency to the library.

#+begin_src elixir :noweb-ref deps
{:nimble_parsec, "~> 1.0"}
#+end_src

*** Parsing a single link

I've never used this library before, though - so it might need a little bit of play. I think the first goal is to parse just an org-mode link by itself. It looks as follows:

: [[target][name]]

where target refers to the file or place the link will lead to, and the name is the part generally displayed to the user.

**** A beginning (, middle, and end)

We can split the link into three quite neat sections which should always be there - the start, middle, and end of the link. They should always look as follows.


#+begin_src elixir
def start_link do
  string("[[")


@@ 143,8 150,9 @@ def middle_link do
end
#+end_src

**** The text in the link

 [[https://hexdocs.pm/nimble_parsec/NimbleParsec.html#ascii_string/3][ascii_string/3]] is a good way to slurp up text within a certain range of characters. I'll include an argument to ensure that we can use this in a pipe chain later.
[[https://hexdocs.pm/nimble_parsec/NimbleParsec.html#ascii_string/3][ascii_string/3]] is a good way to slurp up text within a certain range of characters. I'll include an argument to ensure that we can use this in a pipe chain later.


#+begin_src elixir :noweb yes


@@ 164,6 172,8 @@ It should be noted that this pattern disallows a =\= too, as you can see by chec

I'd like to tag specific uses of the link_text, so I'll wrap with some more specific functions.

**** A little refinement

#+begin_src elixir
def uri_text() do
  link_text()


@@ 176,6 186,7 @@ def desc_text() do
end
#+end_src

**** All together

With the link text declared, we can put together the full link parser. I'm ignoring the =[[][]]= parts and only collecting the =link_text= for later use.



@@ 190,20 201,21 @@ def org_link() do
end
#+end_src

This is an entryway fn, so it likely won't end up being defined here. We'll usually be parsing a file, not a single link. But this helps to debug and build the parser. We can always have multiple entrypoints, and this could abstract out into a function in the helpers.

org_doc is intended to parse its way through a whole document, extracting just the links. This should extract text until it hits a link, then parse the link, then extract text until it hits a link, so on and so forth.
*** The Entryway Parsers

These define parsers which accept a string and output data structures. =:links= works through an entire document and expects at least one link somewhere in the body text; if there isn't one, it returns an error saying so. Otherwise it repeatedly works through the input, returning a list of all the links it can find.

*** The Parsers
=:link= is mostly present for testing purposes - it consumes a single org-mode link, and does nothing else with the input.

#+begin_src elixir :noweb-ref parse-link
#+begin_src elixir :noweb-ref parse-org-link
defparsec(
  :org_link,
  :link,
  org_link()
)

defparsec(
  :org_doc,
  :links,
  times(
    eventually(org_link()),
    min: 1


@@ 211,22 223,22 @@ defparsec(
)
#+end_src
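
As a rough, standalone sketch of how the pieces above fit together (this is not the code tangled from this file): the module names =LinkSketch= and =LinkSketch.Helper=, the =link-sketch= noweb ref, and the =[not: ?]]= character range are all assumptions made for illustration, and the real =link_text= pattern is stricter. It is written as a script, so =Mix.install= needs network access to fetch nimble_parsec.

#+begin_src elixir :noweb-ref link-sketch
# Assumption: run as a standalone .exs script, outside the project.
Mix.install([{:nimble_parsec, "~> 1.0"}])

defmodule LinkSketch.Helper do
  import NimbleParsec

  # Hypothetical link text: one or more ASCII characters other than "]".
  # Takes a combinator argument so it can sit in a pipe chain.
  def link_text(combinator \\ empty()) do
    ascii_string(combinator, [not: ?]], min: 1)
  end

  # Start, middle and end of the link are matched but ignored;
  # only the two pieces of link text are kept and tagged.
  def org_link do
    ignore(string("[["))
    |> unwrap_and_tag(link_text(), :uri)
    |> ignore(string("]["))
    |> unwrap_and_tag(link_text(), :desc)
    |> ignore(string("]]"))
    |> tag(:link)
  end
end

defmodule LinkSketch do
  import NimbleParsec
  import LinkSketch.Helper

  # Parses exactly one link at the start of the input.
  defparsec(:link, org_link())

  # Skips ahead to each link in turn; fails if there is not at least one.
  defparsec(:links, times(eventually(org_link()), min: 1))
end

{:ok, single, _rest, _context, _position, _offset} =
  LinkSketch.link("[[test][another test]]")

IO.inspect(single)
# => [link: [uri: "test", desc: "another test"]]

{:ok, all, _rest, _context, _position, _offset} =
  LinkSketch.links("one [[test][another test]] two [[test2][docs]]")

IO.inspect(all)
# => [link: [uri: "test", desc: "another test"], link: [uri: "test2", desc: "docs"]]
#+end_src

The helper functions live in their own module because =defparsec= builds the parser at compile time, so the combinators have to be compiled before the module that calls them. The explicit =:noweb-ref= is only there to keep this sketch out of the =org-parser-helpers= reference that this subtree sets by default.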

** the Org document parsing module

** the parsing module

The module itself. I'm not usually a fan of names like =Parser=, but it's tough to know what else to call it. We're taking a link and...well, parsing it.
The module itself. This is =NotesvizParse.Document.Org=, which only parses
org links for now - but currently that is all that is necessary. In the future it can be expanded to parse a full document if necessary, and we can build other parsers such as =NotesvizParse.Document.Markdown=. This should be a better name than =Parser= alone.

Originally this was a single module, but the [[https://hexdocs.pm/nimble_parsec/NimbleParsec.html#defparsec/3][NimbleParsec docs]] suggest that a helper module is the preferred way to compose these little guys for a =defparsec= call.
The [[https://hexdocs.pm/nimble_parsec/NimbleParsec.html#defparsec/3][NimbleParsec docs]] suggest that a helper module is the preferred way to compose functions into a single parser, so that's the separation I've made, simply adding =Helper= onto the end of the name.

#+begin_src elixir :noweb yes :tangle ./lib/notesviz_parse/parser.ex
defmodule NotesvizParse.Parser.Helper do
#+begin_src elixir :noweb yes :tangle ./lib/notesviz_parse/document_org.ex
defmodule NotesvizParse.Document.Org.Helper do
  import NimbleParsec
  <<parser-helpers>>
  <<org-parser-helpers>>
end

defmodule NotesvizParse.Parser do
defmodule NotesvizParse.Document.Org do
  import NimbleParsec
  import NotesvizParse.Parser.Helper
  <<parse-link>>
  import NotesvizParse.Document.Org.Helper
  <<parse-org-link>>
end
#+end_src