Author: Ngô Ngọc Đức Huy

---
title: "Implementing DICT protocol: Part 1"
date: 2022-01-16
lang: en
categories: [ blog ]
tags: [dict, dictionary, go, golang, rfc2229, tcp ]
draft: true
translationKey: "2022-01-16-Dict-1"
---

# DICT Protocol

What is the DICT protocol?

> The Dictionary Server Protocol (DICT) is a TCP transaction based query/response protocol that allows a client to access dictionary definitions from a set of natural language dictionary databases.
>
> DICT Protocol - RFC 2229

Notable implementations include dict(d) and GNU dico(d). The former is the reference implementation and supports multiple database formats, as listed in dictfmt(1).

I intend to implement a server and multiple clients (CLI, GUI, web) for this protocol, as well as some tools to easily create dictd-readable databases.

# Why?

No practical reason, really, but dict was one of the first command-line tools I was introduced to, and it's easily one of my favorites, along with curl and jq. It's basically just a dictionary app, but it's cool:

  • works perfectly in terminal
  • easily self-hostable
  • fast
  • has cool dictionaries (though only Debian, Arch and derivatives distribute those)

Also, I'm writing dictionaries for my conlangs, and I want to distribute them via this protocol. Admittedly, reimplementing a server that already exists doesn't help with that, but I tend to go down rabbit holes.

I also like to explore non-web protocols, and starting with something simple like DICT might be a good idea.

# Reading the spec

The spec (linked at the top of this post) is shorter and easier to read than I expected. Ignoring the introduction, examples, and references, it's less than 20 pages. There are four classes of commands:

  • Querying the database: DEFINE, MATCH
  • SHOW metadata about the servers and the databases
  • Utilities: tell the server the CLIENT name, check STATUS, show HELP, set an OPTION (OPTION MIME), and QUIT
  • Authentication: AUTH and SASLAUTH

The authentication commands are optional, and I don't find them useful, so I won't implement them. That leaves the first three categories.
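On the server side, this split could become a small dispatch table. The following is a minimal sketch that assumes nothing beyond the command names above. `category`, `commands`, `classify` and `reply` are hypothetical names, and the real handlers are left out. The two refusal lines use the wording of RFC 2229's `500` (command not recognized) and `502` (command not implemented) codes.

```go
package main

import (
	"fmt"
	"strings"
)

// category is a hypothetical label for the command classes listed above.
type category int

const (
	unknown category = iota // not a command this server knows
	query                   // DEFINE, MATCH
	show                    // SHOW ...
	utility                 // CLIENT, STATUS, HELP, OPTION, QUIT
	auth                    // AUTH, SASLAUTH: recognized but not implemented
)

// commands maps each command verb to its category.
var commands = map[string]category{
	"DEFINE": query, "MATCH": query,
	"SHOW":   show,
	"CLIENT": utility, "STATUS": utility, "HELP": utility, "OPTION": utility, "QUIT": utility,
	"AUTH":   auth, "SASLAUTH": auth,
}

// classify looks up the category of a command line by its first word,
// upper-casing it so that "define" and "DEFINE" are treated alike.
func classify(line string) category {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return unknown
	}
	return commands[strings.ToUpper(fields[0])] // missing keys yield unknown
}

// reply returns the status line for commands this sketch refuses outright,
// or "" when the command should go on to a real (not shown) handler.
func reply(line string) string {
	switch classify(line) {
	case unknown:
		return "500 Syntax error, command not recognized"
	case auth:
		return "502 Command not implemented"
	default:
		return ""
	}
}

func main() {
	for _, line := range []string{"define * programming", "AUTH user secret", "FROB"} {
		fmt.Printf("%q -> %q\n", line, reply(line))
	}
}
```

Upper-casing the verb before the lookup is a design choice of this sketch. Whether a real server should be that lenient is worth checking against the spec.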

# Handling TCP

DICT is based on TCP, and there is a neat interactive TCP tool called telnet, which I used for testing the commands.

## telnet

DICT runs on port 2628:

```
$ telnet dict.org 2628
Trying 199.48.130.6...
Connected to dict.org.
Escape character is '^]'.
220 dict.dict.org dictd 1.12.1/rf on Linux 4.19.0-10-amd64 <auth.mime> <89168346.27665.1642303045@dict.dict.org>
```
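The greeting is a `220` status line. According to RFC 2229, it ends with two angle-bracketed fields: a dot-separated list of capabilities (here `auth` and `mime`) and a message ID. A client could extract them with something like the sketch below. `banner` and `parseBanner` are hypothetical names, and the parser simply assumes that the last two space-separated fields are those two bracketed values.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// banner holds the parts of a DICT 220 greeting this sketch cares about.
type banner struct {
	Text         string   // free-form server description
	Capabilities []string // e.g. [auth mime]
	MsgID        string   // e.g. <89168346.27665.1642303045@dict.dict.org>
}

// parseBanner splits "220 text <cap1.cap2> <msg-id>" into its parts,
// assuming the last two fields are the bracketed capabilities and msg-id.
func parseBanner(line string) (banner, error) {
	line = strings.TrimRight(line, "\r\n")
	fields := strings.Fields(line)
	if len(fields) < 3 || fields[0] != "220" {
		return banner{}, errors.New("not a 220 banner: " + line)
	}
	caps, msgID := fields[len(fields)-2], fields[len(fields)-1]
	if !strings.HasPrefix(caps, "<") || !strings.HasSuffix(caps, ">") ||
		!strings.HasPrefix(msgID, "<") || !strings.HasSuffix(msgID, ">") {
		return banner{}, errors.New("missing capabilities or msg-id: " + line)
	}
	b := banner{
		Text:  strings.Join(fields[1:len(fields)-2], " "),
		MsgID: msgID,
	}
	// An empty "<>" means the server advertises no capabilities.
	if inner := strings.Trim(caps, "<>"); inner != "" {
		b.Capabilities = strings.Split(inner, ".")
	}
	return b, nil
}

func main() {
	b, err := parseBanner("220 dict.dict.org dictd 1.12.1/rf on Linux 4.19.0-10-amd64 <auth.mime> <89168346.27665.1642303045@dict.dict.org>")
	if err != nil {
		panic(err)
	}
	fmt.Println(b.Text)         // dict.dict.org dictd 1.12.1/rf on Linux 4.19.0-10-amd64
	fmt.Println(b.Capabilities) // [auth mime]
	fmt.Println(b.MsgID)
}
```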

Let's try out some commands to understand how this works. Note that I prefix each command with `~>` so that it stands out from the response, and I truncate long results with `[...]`.

Let's first see which databases there are:

```
~> SHOW DB
110 166 databases present
[...]
.
250 ok
```

There are a lot of dictionaries here, including GCIDE, WordNet, the Jargon File, V.E.R.A., and FOLDOC, but most of them are FreeDict dictionaries.
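The `[...]` above hides the listing itself. Each line there is a database name followed by a quoted description, the same `name "text"` shape that the MATCH results further down use. Below is a sketch of a parser for such lines. It assumes the double-quoted form with backslash escapes. The hypothetical `unquoteDICT` and `parseEntry` helpers do not handle single-quoted strings.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// entry is one `name "text"` line from a SHOW DB, SHOW STRAT or MATCH reply.
type entry struct {
	Name, Text string
}

// unquoteDICT strips surrounding double quotes and resolves backslash
// escapes such as \" and \\. Unquoted atoms are returned unchanged.
func unquoteDICT(s string) (string, error) {
	if !strings.HasPrefix(s, `"`) {
		return s, nil
	}
	var b strings.Builder
	for i := 1; i < len(s); i++ {
		switch c := s[i]; c {
		case '\\':
			i++
			if i == len(s) {
				return "", errors.New("dangling backslash")
			}
			b.WriteByte(s[i])
		case '"':
			if i != len(s)-1 {
				return "", errors.New("text after closing quote")
			}
			return b.String(), nil
		default:
			b.WriteByte(c)
		}
	}
	return "", errors.New("unterminated quoted string")
}

// parseEntry splits a line like `wn "WordNet (r) 3.0 (2006)"` into its parts.
func parseEntry(line string) (entry, error) {
	line = strings.TrimRight(line, "\r\n")
	name, rest, ok := strings.Cut(line, " ")
	if !ok {
		return entry{}, errors.New("no description: " + line)
	}
	text, err := unquoteDICT(strings.TrimLeft(rest, " \t"))
	if err != nil {
		return entry{}, err
	}
	return entry{Name: name, Text: text}, nil
}

func main() {
	for _, line := range []string{
		`wn "WordNet (r) 3.0 (2006)"`,
		`jargon "c programmer's disease"`,
	} {
		e, err := parseEntry(line)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %s\n", e.Name, e.Text)
	}
}
```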

To match a word, the syntax is:

```
~> MATCH database strategy word
```
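Parameters are separated by spaces, so a multi-word query, such as a phrase like `cargo cult programming`, has to be quoted. Here is a small sketch of building such command lines. The hypothetical `command` helper wraps any parameter that contains whitespace, quotes, or backslashes in double quotes with backslash escapes. It also ends each line with CRLF, as RFC 2229 asks of command lines.

```go
package main

import (
	"fmt"
	"strings"
)

// quote returns p unchanged if it is a plain atom; otherwise it wraps p in
// double quotes and backslash-escapes any `"` or `\` inside it.
func quote(p string) string {
	if p != "" && !strings.ContainsAny(p, " \t\"'\\") {
		return p
	}
	r := strings.NewReplacer(`\`, `\\`, `"`, `\"`)
	return `"` + r.Replace(p) + `"`
}

// command joins a verb and its parameters into one CRLF-terminated line.
func command(verb string, params ...string) string {
	parts := []string{verb}
	for _, p := range params {
		parts = append(parts, quote(p))
	}
	return strings.Join(parts, " ") + "\r\n"
}

func main() {
	fmt.Printf("%q\n", command("MATCH", "jargon", "substring", "program"))
	// prints "MATCH jargon substring program\r\n"
	fmt.Printf("%q\n", command("MATCH", "*", "exact", "cargo cult programming"))
	// prints "MATCH * exact \"cargo cult programming\"\r\n"
}
```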

The strategy determines how the server matches the word you're looking up. To list all available strategies, send:

```
~> SHOW STRATEGIES
```

The dictd server supports various strategies. For example, `substring` matches entries that contain the queried word as a substring:

```
~> MATCH jargon substring program
152 13 matches found
jargon "c programmer's disease"
jargon "cargo cult programming"
jargon "mickey mouse program"
jargon "perfect programmer syndrome"
jargon "program"
[...]
.
250 ok [d/m/c = 0/13/5775; 0.000r 0.000u 0.000s]
```

This command only shows which words in the database, if any, satisfy the match. It does not show their definitions. To actually view a definition, give the database name and the word to the DEFINE command. Note that you can also use `*` as the database name for both DEFINE and MATCH, which defines or matches the word across all databases.

```
~> DEFINE * programming
150 3 definitions retrieved
151 "programming" wn "WordNet (r) 3.0 (2006)"
programming
    [...]
.
151 "programming" jargon "The Jargon File (version 4.4.7, 29 Dec 2003)"
programming
 n.

    [...]

.
151 "programming" foldoc "The Free On-line Dictionary of Computing (30 December 2018)"
programming

.
250 ok [d/m/c = 3/0/145; 0.000r 0.000u 0.000s]
```

That's the gist of looking up words with the DICT protocol. You can find more commands with:

```
~> HELP
[...]
.
250 ok
```

Finally, to end the session, the command is:

```
~> QUIT
221 bye [d/m/c = 0/0/0; 123.000r 0.000u 0.000s]
```

Note that, except for QUIT, each response above ends with a line containing only a period, followed by a `250 ok` status line, which is the equivalent of HTTP's `200 OK`. These response codes are defined in the protocol specification.

Some commands (MATCH, DEFINE, and QUIT above) also append statistics to the status line, though these are optional. I figured out that d means definitions and m means matches, and s is probably the time the query took (but why is it always zero?). I have no clue what c, r, and u mean. I might check the source code to figure that out, but let's leave that for another time.
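Because the trailer is optional and only some status lines carry it, a client could parse it when present and ignore it otherwise. The sketch below assumes the `[d/m/c = a/b/c; xr yu zs]` shape seen above. The struct fields just mirror the letters, since the meaning of most of them is still a guess. `stats` and `parseStats` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// stats mirrors the letters of the trailer. Only D (definitions) and
// M (matches) have a known meaning so far; the rest are kept as-is.
type stats struct {
	D, M, C int
	R, U, S float64
}

// parseStats extracts the "[d/m/c = ...; ...r ...u ...s]" trailer from a
// status line. ok is false when the line has no trailer, as with "250 ok".
func parseStats(line string) (st stats, ok bool, err error) {
	start := strings.IndexByte(line, '[')
	end := strings.LastIndexByte(line, ']')
	if start < 0 || end < start {
		return stats{}, false, nil
	}
	counts, times, found := strings.Cut(line[start+1:end], ";")
	if !found {
		return stats{}, false, errors.New("missing ';' in " + line)
	}
	_, values, found := strings.Cut(counts, "=")
	if !found {
		return stats{}, false, errors.New("missing '=' in " + line)
	}
	nums := strings.Split(strings.TrimSpace(values), "/")
	if len(nums) != 3 {
		return stats{}, false, errors.New("expected three counts in " + line)
	}
	for i, dst := range []*int{&st.D, &st.M, &st.C} {
		if *dst, err = strconv.Atoi(nums[i]); err != nil {
			return stats{}, false, err
		}
	}
	// Each timing field is a number followed by its letter, e.g. "0.000r".
	for _, f := range strings.Fields(times) {
		var dst *float64
		switch f[len(f)-1] {
		case 'r':
			dst = &st.R
		case 'u':
			dst = &st.U
		case 's':
			dst = &st.S
		default:
			return stats{}, false, errors.New("unknown timing field " + f)
		}
		if *dst, err = strconv.ParseFloat(f[:len(f)-1], 64); err != nil {
			return stats{}, false, err
		}
	}
	return st, true, nil
}

func main() {
	for _, line := range []string{
		"250 ok [d/m/c = 0/13/5775; 0.000r 0.000u 0.000s]",
		"221 bye [d/m/c = 0/0/0; 123.000r 0.000u 0.000s]",
		"250 ok",
	} {
		st, ok, err := parseStats(line)
		fmt.Printf("%-50s %+v ok=%v err=%v\n", line, st, ok, err)
	}
}
```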

## Go

Of course, we are not going to make users type these commands, though they are fairly intuitive and easy to remember. I chose Go for the CLI client without any conscious consideration of whether it's the best fit. I'm trying out new things[^0], after all.

The documentation of Go's `net` package shows how to make a TCP connection:

```go
conn, err := net.Dial("tcp", "golang.org:80")
if err != nil {
	// handle error
}
fmt.Fprintf(conn, "GET / HTTP/1.0\r\n\r\n")
status, err := bufio.NewReader(conn).ReadString('\n')
// ...
```

Let's copy that and replace the HTTP request with DICT commands:

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

func main() {
	conn, err := net.Dial("tcp", "dict.org:2628")
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	buf := bufio.NewReader(conn)

	// RFC 2229 asks for command lines terminated by CRLF.
	fmt.Fprintf(conn, "MATCH jargon word programming\r\n")
	fmt.Fprintf(conn, "QUIT\r\n")

	for {
		response, err := buf.ReadString('\n')
		if err != nil {
			// Usually io.EOF, once the server closes the connection after QUIT.
			fmt.Println(err)
			break
		}
		fmt.Print(response)
	}
}
```

Running this code, we get the following response:

```
220 dict.dict.org dictd 1.12.1/rf on Linux 4.19.0-10-amd64 <auth.mime> <89266600.1914.1642341395@dict.dict.org>
152 4 matches found
jargon "cargo cult programming"
jargon "programming"
jargon "programming fluid"
jargon "voodoo programming"
.
250 ok [d/m/c = 0/4/3814; 0.000r 0.000u 0.000s]
221 bye [d/m/c = 0/0/0; 0.000r 0.000u 0.000s]
EOF
```

which is a good start.

There is a problem with this code: we are reading line by line rather than reading the whole response to each command. This way, we can't tell whether, say, the third line belongs to the response to the first command or to the second. One solution is to check whether a line is prefixed with a status code, but is there a better one?

Let's wait till next week!

[^0]: Not really; I've written a CLI client for the Wiktionary API in Go before.