~huyngo/xrvs.net

3c7457a964ace6bd6cd396e5076aeca6ab7651f6 — Ngô Ngọc Đức Huy a month ago a2e6336 master
Add new post about DICT server
A content/posts/2023-04-11-dict-server.md => content/posts/2023-04-11-dict-server.md +171 -0
@@ 0,0 1,171 @@
---
title: "Writing a DICT (RFC 2229) server"
date: 2023-04-11
lang: en
categories: [ blog, development, hacking ]
tags: [dict]
translationKey: "2023-04-11-dict-server"
---

In last few weeks, I've implemented a minimal, barely compliant[^1]
[DICT][rfc2229] server called ExTra (also stylized ex.tra).  The server
implements the protocol as described in the linked specification, as well as
reading from existing text-databases used by other servers ([dictd][dictd] and
[GNU dico][dico]). I've done this as an exercise for learning Elixir, mainly,
and it lacks many features in comparison with the two other existing
implementations.

[rfc2229]: https://www.rfc-editor.org/rfc/rfc2229
[dictd]: https://github.com/cheusov/dictd
[dico]: https://puszcza.gnu.org.ua/software/dico/

## Spec summary

DICT is a simple protocol for looking up dictionaries over
<abbr>TCP</abbr>/<abbr>IP</abbr>.  It include a handful (well, more than
handful, if you include the optional authentication) commands like `MATCH` or
`DEFINE` to retrieve entries from a dictionary database.  The database format
is not defined, so technically, one can make it to work with, say, an SQL
database, but it's customary to use the format the reference implementation
(dictd) uses.

I largely built this based on [Elixir's guide for building a key-value
server][kv]

[kv]: https://elixir-lang.org/getting-started/mix-otp/introduction-to-mix.html

## Architecture

This diagram shows a very rough and simplified architecture overview of ExTra.
Yea, I know, it looks ugly, but organizing a diagram is hard, so I'll describe
it in details below.

![Architecture overview of ExTra](/images/extra-arch.png)

The processes[^2] are supervised by the application supervisor, and are
respawned once they crash. There are three main processes concerned here:

- `ExTra.Server`: This one accepts connections from the dedicated port
- `Task.Supervisor`: This one is a supervisor that will spawn new processes
    that will bind to clients until disconnected.
- `ExTra.Dict` is a [GenServer][gensrv] that executes commands sent to the
    server, and reads from dictionaries to generate response.  Note that
    GenServer is not a TCP/IP server.

[gensrv]: https://hexdocs.pm/elixir/GenServer.html

## TCP server

The <abbr>TCP</abbr> server is implemented using erlang's `:gen_tcp`:

```elixir
{:ok, client} = :gen_tcp.accept(socket)
```

Here the `client` is a connection to client. We send and receive data via this
process.  To read from this until disconnection, we would put it on an infinite
loop (`acceptor` in the diagram).  However, that also means the server is
locked to that client and cannot accept another---you probably have learned
this from a network class implementing an echo server in C.  This is where the
task supervisor comes in: instead of running directly into the infinite loop,
we spawn a `Task` that does it:

```elixir
Task.Supervisor.start_child(ExTra.ConnSupervisor, fn -> serve_first(client, host) end)
```

In the loop, commands are parsed and run, something like
`ExTra.Command.parse(data)` and `ExTra.Command.run(commands)`
The `ExTra.Command` module then sends these commands to `ExTra.Dict` GenServer
to execute it, something like `ExTra.Dict.command(ExTra.Dict, command)`.

## GenServer

The commands are sent to the `GenServer` and handled by some other modules:

```elixir
@impl true
def handle_call({:define, dictionary, word}, _from, state) do
  {:reply, ExTra.Dict.Define.define(dictionary, word), state}
end

def handle_call({:match, dictionary, strategy, word}, _from, state) do
  {:reply, ExTra.Dict.Match.match(dictionary, strategy, word), state}
end
```

This level of abstraction may seems a bit convoluted, but using GenServer here
would allow for caching matches and definitions, and separating matches and
definitions to a separate modules allow for different search modules depending
on config.  Not a necessary thing, just for educational purpose.

## Matching definitions

The `.dict` file stores entries as well as metadata as plain text, while the
`.index` file store positions of the entries as:

```
<entry> <start> <length>
```

Where `<start>` and `<length>` are in quartosexagesimal or base 64.  Numeral
base 64, not base 64 encoding that is implemented in the standard library.
After a few shortening, the conversion is done in less than 20 lines:

```elixir
def base64num(num) do
  alphabet =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
    |> Stream.with_index()
    |> Enum.into(%{})

  len = String.length(num)

  num
  |> String.to_charlist()
  |> Stream.with_index()
  |> Stream.map(fn {c, i} -> {alphabet[c], len - i - 1} end)
  # Left-shift 6 * power bits is equal to multiply by (2^6)^power, but faster
  |> Stream.map(fn {digit, power} -> Bitwise.bsl(digit, power * 6) end)
  |> Enum.sum()
end
```

Firstly, I mapped the digits in the alphabet to its respective values, which
are also their indices in the char list.  The digits in the input string are
then mapped to their values based on this map, while their indices are mapped
to the power.  Finally, these values are powered and summed as the answer.

Upon reading these values, the definition from the `dict` files can be
retrieved as simply as:

```elixir
# the content that comes before what we need
_ = IO.read(file, start)
# has to `binread` to interpret the UTF-8 encoded characters
IO.binread(file, length)
```

## Further

The full implementation can be found on [SourceHut][src].  As this is the first
application I've written in Elixir, I'm sure there's a lot of stuff I've
written here isn't recommended, so if you have some suggestion for improvement,
please send me.

As of writing, there are still several features I'd like to implement that I
haven't, such as:

- Proper response for `STATUS` and `OPTION MIME` commands
- Implement more matching strategies
- Make uses of the GenServer's state for caching matches

[src]: https://git.sr.ht/~huyngo/ex.tra

[^1]: The `OPTION MIME` command is not yet compliant, actually. It currently is
  a no-op command while it should check `00-database-mime-header` in the file
  and respond an empty line if not present.
  However, none of the clients (that is, only `dict` and `dico`, as far as I
  know) does anything with that field, so it doesn't cause any trouble.
[^2]: To be understood as <abbr title="erlang virtual machine">BEAM</abbr>
  processes and not <abbr title="operating system">OS</abbr> processes

A static/images/extra-arch.png => static/images/extra-arch.png +0 -0
A static/images/extra-arch.uml => static/images/extra-arch.uml +28 -0
@@ 0,0 1,28 @@
@startuml
actor client
node "Application Supervisor" as sup
node Task.Supervisor as task
node "ExTra.Server" as server
node "loop_acceptor" as accept1
node "loop_acceptor" as accept2
component "ExTra.Command" as cmd
node "ExTra.Dict" as dict
collections "dictionaries" as dicts
cloud "network"

sup -right-> task
sup -up--> dict
sup -down--> server

server <-down- network : "listen for new connections"
server .right.> accept2 : "spawn new loop after first acceptor is occupied"

task -> accept1
task -> accept2

dict <-up- dicts : "reads"

client -up--> accept1 : "send command"
accept1 -up-> cmd : "parse command"
cmd <-up-> dict : "execute command"
@enduml

M themes/anibus => themes/anibus +1 -1
@@ 1,1 1,1 @@
Subproject commit 2884ad30ade0dd470cd1cac3646553ea6922144f
Subproject commit 52b3ef1bb34ab48421de277e9e9da7f53f49ed2b