~sjm/notesviz_parse

3f1e8bf2a432c19c787e99cf3825ad5007cd3e4b — Sam Marshall 2 years ago e831011
chore: add fns for parsing org links
4 files changed, 213 insertions(+), 2 deletions(-)

A lib/notesviz_parse/parser.ex
M mix.exs
A mix.lock
M notesviz_parse.org
A lib/notesviz_parse/parser.ex => lib/notesviz_parse/parser.ex +50 -0
@@ 0,0 1,50 @@
defmodule NotesvizParse.Parser.Helper do
  import NimbleParsec
  def start_link do
    string("[[")
  end
  def end_link do
    string("]]")
  end
  def middle_link do
    string("][")
  end
  def link_text(), do: ascii_string(
        [{:not, ?[..?]}],
        min: 1
      )
  def uri_text() do
    link_text()
    |> unwrap_and_tag(:uri)
  end
  
  def desc_text() do
    link_text()
    |> unwrap_and_tag(:desc)
  end
  def org_link() do
    ignore(start_link())
    |> concat(uri_text())
    |> concat(ignore(middle_link()))
    |> concat(desc_text())
    |> concat(ignore(end_link()))
    |> tag(:link)
  end
end

defmodule NotesvizParse.Parser do
  import NimbleParsec
  import NotesvizParse.Parser.Helper
  defparsec(
    :org_link,
    org_link()
  )
  
  defparsec(
    :org_doc,
    times(
      eventually(org_link()),
      min: 1
    )
  )
end

M mix.exs => mix.exs +0 -2
@@ 9,14 9,12 @@ defmodule NotesvizParse.MixProject do
        deps: deps()
      ]
    end
  # Run "mix help compile.app" to learn about applications.
    def application do
      [
        extra_applications: [:logger],
        mod: {NotesvizParse.Application, []}
      ]
    end
    # Run "mix help deps" to learn about dependencies.
    defp deps do
      [
        {:nimble_parsec, "~> 1.0"}

A mix.lock => mix.lock +3 -0
@@ 0,0 1,3 @@
%{
  "nimble_parsec": {:hex, :nimble_parsec, "1.1.0", "3a6fca1550363552e54c216debb6a9e95bd8d32348938e13de5eda962c0d7f89", [:mix], [], "hexpm", "08eb32d66b706e913ff748f11694b17981c0b04a33ef470e33e11b3d3ac8f54b"},
}

M notesviz_parse.org => notesviz_parse.org +160 -0
@@ 12,6 12,55 @@ I've generated a basic Elixir project, and I'll be "importing" the pre-existing 

I'm also fairly new to literate programming, especially within org-mode, though I know org-mode itself well enough to feel comfortable. We'll just have to see how it goes.

** tests
:PROPERTIES:
:header-args: :remsh sam@sam-laptop :sname console :session console
:END:

#+begin_src elixir
NotesvizParse.Parser.org_link("[[test][another test]]")
#+end_src

#+RESULTS:
: {:ok, [link: [uri: "test", desc: "another test"]], "", %{}, {1, 0}, 22}

#+begin_src elixir
NotesvizParse.Parser.org_doc("one [[test][another test]]")
#+end_src

#+RESULTS:
: {:ok, [link: [uri: "test", desc: "another test"]], "", %{}, {1, 0}, 26}

#+begin_src elixir
NotesvizParse.Parser.org_doc("one [[test][another test]] two [[test2][docs]]")
#+end_src

#+RESULTS:
: {:ok,
:  [link: [uri: "test", desc: "another test"], link: [uri: "test2", desc: "docs"]],
:  "", %{}, {1, 0}, 46}


#+begin_src elixir
NotesvizParse.Parser.org_doc("""
one [[test][another test]] two [[test2][docs]]

 blah blah

ho more stuff [[http://link.com][link]] [[id:org id][yes]]

""")
#+end_src

#+RESULTS:
: {:ok,
:  [
:    link: [uri: "test", desc: "another test"],
:    link: [uri: "test2", desc: "docs"],
:    link: [uri: "http://link.com", desc: "link"],
:    link: [uri: "id:org id", desc: "yes"]
:  ], "\n\n", %{}, {5, 60}, 118}

** Config

A standard mix file, slightly deconstructed for neatness.


@@ 58,6 107,9 @@ end
#+end_src

** Parsing a link
:PROPERTIES:
:header-args: :noweb-ref parser-helpers
:END:

While I've barely touched Elixir before, I'm a little familiar with parser combinators in other languages. A quick search for Elixir options turned up [[https://github.com/dashbitco/nimble_parsec][NimbleParsec]] first. It seems as good a start as any, so I'll begin by adding it as a dependency of the library.



@@ 70,3 122,111 @@ I've never used this library before, though - so might need a little bit of play
: [[target][name]]

where =target= is the file or location the link leads to, and =name= is the part generally displayed to the user.

We can pick out three delimiters that should appear in every link: the start, the middle, and the end. They always look as follows.

#+begin_src elixir
def start_link do
  string("[[")
end
#+end_src

#+begin_src elixir
def end_link do
  string("]]")
end
#+end_src

#+begin_src elixir
def middle_link do
  string("][")
end
#+end_src


[[https://hexdocs.pm/nimble_parsec/NimbleParsec.html#ascii_string/3][ascii_string/3]] is a good way to slurp up text whose characters fall within (or, with =:not=, outside) a given range. I'll wrap it in its own function so it can be reused and dropped into a pipe chain later.


#+begin_src elixir :noweb yes
def link_text(), do: ascii_string(
      [{:not, <<link-text-disallowed>>}],
      min: 1
    )
#+end_src

There's a whole specification for URIs (RFC 3986) that allows and disallows all sorts of characters, but as this is for personal use I'm not going to parse URIs properly right now. Instead I'll just disallow =[= and =]= for ease of parsing and leave it at that. Since =link_text= is shared, the same restriction applies to the description as well as the target. If I really do need proper URI parsing, I'll either find a NimbleParsec URI parser that somebody else has written or build my own (hopefully not the latter).
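As an aside, and not something the plan above considers: Elixir's standard library ships =URI.parse/1=, which could split a target after the link parser has pulled it out as raw text. This is a minimal sketch assuming targets like the ones in the tests above. =URI.parse/1= doesn't validate its input, so this only extracts a scheme, host and path; it isn't proper URI validation. The =:noweb-ref scratch= header (an arbitrary, unused name) keeps the block out of =<<parser-helpers>>=.

#+begin_src elixir :noweb-ref scratch
# Sketch only: the link parser keeps each target as raw text, and the
# standard-library URI module splits it up afterwards. URI.parse/1 does
# not validate its input, so unusual targets still come back as a struct.
targets = ["http://link.com", "id:org id", "test"]

for target <- targets do
  %URI{scheme: scheme, host: host, path: path} = URI.parse(target)
  IO.inspect({target, scheme, host, path})
end
#+end_src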

Note that this pattern disallows =\= too: =?[..?]= is the codepoint range 91..93, and =\= (92) sits between =[= (91) and =]= (93), as you can see on an [[http://www.asciitable.com/][ASCII table]]. I'm not too bothered, as I don't expect to use many backslashes in links, but I might need to fix this one day. That can sit with the self-admonition above.

#+begin_src elixir :noweb-ref link-text-disallowed
?[..?]
#+end_src
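To double-check the backslash claim, here's a hedged sketch meant to run as a standalone script with =elixir= (=Mix.install/1= needs Elixir 1.12+ and network access). =BackslashDemo=, =range_text= and =pair_text= are hypothetical names for this demo only. The second parser is one possible fix, not something this project uses: it lists =[= and =]= as separate =:not= entries, which (as I understand NimbleParsec) must all hold, so =\= stays allowed. It carries its own =:noweb-ref scratch= header so it isn't tangled into the parser.

#+begin_src elixir :noweb-ref scratch
# Standalone script: Mix.install/1 fetches NimbleParsec (Elixir 1.12+).
Mix.install([{:nimble_parsec, "~> 1.0"}])

# [ \ and ] are consecutive ASCII codepoints, so 91..93 covers all three.
IO.inspect({?[, ?\\, ?]})
IO.inspect(?\\ in ?[..?])

defmodule BackslashDemo do
  import NimbleParsec

  # The range used above: rejects [, \ and ].
  defparsec(:range_text, ascii_string([{:not, ?[..?]}], min: 1))

  # Hypothetical alternative: exclude [ and ] individually, so \ is allowed.
  defparsec(:pair_text, ascii_string([{:not, ?[}, {:not, ?]}], min: 1))
end

# Stops at the backslash and leaves it unparsed in the rest.
IO.inspect(BackslashDemo.range_text("a\\b"))
# Consumes the whole input, backslash included.
IO.inspect(BackslashDemo.pair_text("a\\b"))
#+end_src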

I'd like to tag specific uses of =link_text=, so I'll wrap it in some more specific functions.

#+begin_src elixir
def uri_text() do
  link_text()
  |> unwrap_and_tag(:uri)
end

def desc_text() do
  link_text()
  |> unwrap_and_tag(:desc)
end
#+end_src


With the link text declared, we can put together the full link parser. I'm ignoring the =[[][]]= parts and only collecting the =link_text= for later use.

#+begin_src elixir
def org_link() do
  ignore(start_link())
  |> concat(uri_text())
  |> concat(ignore(middle_link()))
  |> concat(desc_text())
  |> concat(ignore(end_link()))
  |> tag(:link)
end
#+end_src

This is an entrypoint function, so it probably won't end up being defined here: we'll usually be parsing a file, not a single link. But it helps with debugging and building the parser. We can always have multiple entrypoints, and this could be abstracted out into a function in the helpers.

=org_doc= is intended to work through a whole document and extract just the links: it skips text until it hits a link, parses that link, then skips text until the next one, and so on.
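Because =org_doc= uses =times(..., min: 1)=, a document with no links at all should fail to parse rather than return an empty list. Here's a hedged standalone sketch of that behaviour, using a condensed copy of the parsers above inside a hypothetical =OrgDocCheck= module. =org_doc_any= is a possible variant built on =repeat=, not part of this project. As with the other asides, =:noweb-ref scratch= keeps it out of the tangled file.

#+begin_src elixir :noweb-ref scratch
# Standalone script: Mix.install/1 fetches NimbleParsec (Elixir 1.12+).
Mix.install([{:nimble_parsec, "~> 1.0"}])

defmodule OrgDocCheck do
  import NimbleParsec

  # Condensed copy of the link parser above: [[uri][desc]]
  link_text = ascii_string([{:not, ?[..?]}], min: 1)

  org_link =
    ignore(string("[["))
    |> unwrap_and_tag(link_text, :uri)
    |> ignore(string("]["))
    |> unwrap_and_tag(link_text, :desc)
    |> ignore(string("]]"))
    |> tag(:link)

  # As in the document: one or more links, skipping text in between.
  defparsec(:org_doc, times(eventually(org_link), min: 1))

  # Hypothetical variant: zero or more links, so link-free text still parses.
  defparsec(:org_doc_any, repeat(eventually(org_link)))
end

# No link to find, so this returns an {:error, ...} tuple.
IO.inspect(OrgDocCheck.org_doc("no links here"))
# The variant succeeds with no links and leaves the text as the rest.
IO.inspect(OrgDocCheck.org_doc_any("no links here"))
# Text after the last link is left unparsed, as in the tests above.
IO.inspect(OrgDocCheck.org_doc_any("a [[x][y]] trailing"))
#+end_src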

*** The Parsers

#+begin_src elixir :noweb-ref parse-link
defparsec(
  :org_link,
  org_link()
)

defparsec(
  :org_doc,
  times(
    eventually(org_link()),
    min: 1
  )
)
#+end_src


** the parsing module

The module itself. I'm not usually a fan of names like =Parser=, but it's tough to know what else to call it. We're taking a link and... well, parsing it.

Originally this was a single module, but the [[https://hexdocs.pm/nimble_parsec/NimbleParsec.html#defparsec/3][NimbleParsec docs]] suggest that a helper module is the preferred way to compose these little guys for a =defparsec= call.

#+begin_src elixir :noweb yes :tangle ./lib/notesviz_parse/parser.ex
defmodule NotesvizParse.Parser.Helper do
  import NimbleParsec
  <<parser-helpers>>
end

defmodule NotesvizParse.Parser do
  import NimbleParsec
  import NotesvizParse.Parser.Helper
  <<parse-link>>
end
#+end_src