
{"title": "LRI operations report, 2021-W13", "date": "2021-04-05 13:00", "lang": "en", "tags": ["lorkep lri", "dn42", "network", "dns"], "toc": false}

This is the report for the week of 29 March–4 April 2021 on the state and activities of the Lorkep Long-Range Interconnect (LRI), a virtual network and autonomous system operating on dn42. This week's highlight is the automation of DNS record generation for all nodes in the Lorkep network.

# DNS updates

The new release of dns.nix (previously nix-dns) brings two important features: PTR records and zone-wide TTL (via the $TTL control entry). Both features are now in use on the LRI's authoritative DNS servers.
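For illustration, a generated reverse zone might now look something like this (the zone name, hostnames, and serial below are invented for this example, not actual LRI data):

```
$TTL 3600
@                                IN SOA  ns1.lri.example. hostmaster.lri.example. (
                                         2021040401 10800 3600 604800 3600 )
1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0  IN PTR  behemoth.lri.example.
```

The `$TTL` control entry sets the default TTL for every record in the zone, so per-record TTLs no longer need to be spelled out.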

In parallel, work on a new Grafana service status dashboard was held back by the lack of DNS records for some of the application containers in the network. Until now, collected metrics were associated with a machine (an `instance` label, in Prometheus parlance) based either on the first DNS label of the address used to connect to the scrape target, or on a mapping of IP addresses to instance names. [^instance-names] As metrics collection expanded to all containers, the risk grew of a mismatch between the manually maintained instance name and however a container was identified across the network, so the IP address mapping was removed and metrics are now scraped solely through DNS names.
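As a rough Python sketch of the old behavior (all names and addresses here are invented; the actual logic lived in the Prometheus relabeling configuration):

```python
# Sketch of the old instance-name derivation (hypothetical names):
# prefer the first DNS label of the scrape address, falling back to a
# manually maintained IP-to-instance mapping — the part that was removed.
INSTANCE_BY_IP = {"fd42:d42:d42::10": "behemoth"}

def instance_for(target: str) -> str:
    """Return the Prometheus `instance` label for a scrape target."""
    host = target.rsplit(":", 1)[0].strip("[]")  # drop port and brackets
    if host in INSTANCE_BY_IP:                   # IP-address mapping
        return INSTANCE_BY_IP[host]
    return host.split(".")[0]                    # first DNS label

print(instance_for("behemoth.example.dn42:9100"))  # behemoth
print(instance_for("[fd42:d42:d42::10]:9100"))     # behemoth
```

With only the last line kept, a container scraped by DNS name and the same container scraped by IP address can no longer end up under two different instance labels.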

To ensure that all application containers had DNS addresses, inspiration was taken from the approach used for Lorkep nodes. The Lorkep nixos-configurations repository contains a Nix file acting as a registry of all nodes in the network; from this registry, nodes are assigned their IP addresses in the network, and DNS records are generated for nodes acting as routers. The same is now done for all application containers, based on a new registry file.
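A minimal Python sketch of the registry idea, assuming a hypothetical registry shape and an invented address prefix (the real registry is a Nix file):

```python
# Hypothetical registry: each container gets a stable index, from which
# its address inside an invented ULA prefix is derived.
REGISTRY = {"web": 1, "grafana": 2}
PREFIX = "fd42:d42:d42:80::"

def records(registry: dict[str, int]) -> list[tuple[str, str, str]]:
    """Generate (name, type, value) AAAA records from the registry."""
    return [(name, "AAAA", f"{PREFIX}{index:x}")
            for name, index in sorted(registry.items())]

print(records(REGISTRY))
# [('grafana', 'AAAA', 'fd42:d42:d42:80::2'), ('web', 'AAAA', 'fd42:d42:d42:80::1')]
```

Because both the address assignment and the DNS records derive from the same registry entry, a container's name and address cannot drift apart.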

Furthermore, the new PTR record support combined with the new registries enabled the automatic generation of reverse DNS zones to replace static zone files.

As a bonus, here's a function that maps an IPv6 address into the corresponding reverse DNS domain name:

```nix
# SPDX-FileCopyrightText: 2021 Aluísio Augusto Silva Gonçalves
# SPDX-License-Identifier: MIT
let
  inherit (lib)
    concatLists concatStringsSep drop fixedWidthString pipe removeSuffix
    reverseList splitString stringToCharacters take;
  inherit (aasg-nixexprs.lib) indexOf;

  # Expand the "::" shorthand into the zeroed hextets it stands for.
  uncompressZeros = hextets:
    let
      zeroAt = indexOf "" hextets;
      hextetCount = builtins.length hextets;
      compressed = builtins.genList (_: "0000") (8 - hextetCount + 1);
    in
    if zeroAt == -1
    then hextets
    else (take zeroAt hextets) ++ compressed ++ (drop (zeroAt + 1) hextets);

  # Pad a hextet to four digits with leading zeros.
  fillHextet = hextet: fixedWidthString 4 "0" hextet;
in
# Let's assume `zone` is "8.b.d.0.1.0.0.2.ip6.arpa" and `ip` is "2001:db8::1".
zone: ip: pipe ip [
  # Split the IPv6 address string into its colon-separated groups.
  # In our example, the result is [ "2001" "db8" "" "1" ].
  (splitString ":")
  # Expand the groups hidden by ::.  Now we have
  # [ "2001" "db8" "0000" "0000" "0000" "0000" "0000" "1" ].
  uncompressZeros
  # Each group represents 4 hexadecimal digits, so add missing zeros
  # to the left of each hextet:
  # [ "2001" "0db8" "0000" "0000" "0000" "0000" "0000" "0001" ].
  (map fillHextet)
  # Split each string into its hexadecimal digits, so we end up with
  # [ "2" "0" "0" "1" "0" "d" "b" "8" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" ].
  (map stringToCharacters)
  concatLists
  # Reverse the array of characters.
  reverseList
  # Concatenate all the characters (separated by .) and append
  # ip6.arpa.  We now have the complete reverse DNS name:
  # "1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.b.d.0.1.0.0.2.ip6.arpa".
  (ip': ip' ++ [ "ip6" "arpa" ])
  (concatStringsSep ".")
  # Remove the zone name from the result because dns.nix will append
  # it regardless: "1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0".
  (removeSuffix ".${zone}")
]
```
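The same mapping can be cross-checked against Python's standard library, which computes the identical nibble-reversed name (this sketch is mine, not part of the repository):

```python
import ipaddress

def reverse_name(ip: str) -> str:
    """Nibble-reverse an IPv6 address into its ip6.arpa name,
    mirroring the steps of the Nix function above."""
    hextets = ipaddress.IPv6Address(ip).exploded.split(":")  # expand "::"
    digits = "".join(hextets)                # 32 hexadecimal digits
    return ".".join(reversed(digits)) + ".ip6.arpa"

name = reverse_name("2001:db8::1")
# 24 leading labels (a 1 and 23 zeros), then the /32 zone part.
assert name == ipaddress.ip_address("2001:db8::1").reverse_pointer
print(name)
```

`IPv6Address.exploded` takes care of the zero-uncompression and hextet-padding steps in one go.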


As the result of a misguided attempt at fixing the Lorkep LRI looking glass [^::1], bird-lg-go is now packaged for NixOS in dn42.nix, a Nix repository for software and services commonly used on dn42.

# Reproducible static websites

Most websites on Behemoth were switched from a mix of IPFS proxying and plain directory serving to serving directories out of the Nix store. Caddy's configuration for those sites is loaded from a symlink managed by nix-env and pulled into the main Caddy config through an `import` directive; this enables website updates without triggering a full system build.

This required writing `services.caddy.config` to a file and overriding the command lines set by Caddy's NixOS module so that this file is used directly, rather than adapted to JSON at build time. Adapting it at build time would make Caddy try to resolve the `import` and either break the build (because the symlink does not exist then) or eliminate the very dynamism that motivated the symlink-and-import approach in the first place.

Overall, the new Caddy config is as follows:

```nix
{ config, lib, pkgs, ... }:
let
  inherit (lib) mkForce mkMerge mkOrder;

  cfg = config.services.caddy;
  configFile = pkgs.writeText "Caddyfile" cfg.config;
  wwwConfig = "/srv/www/.nix-profile/Caddyfile";
in
{
  services.caddy.config = mkMerge [
    # I guess this is why NixOS needs to convert the config into JSON:
    # Caddy doesn't support more than one global options block, and
    # NixOS can't guarantee (due to backwards compatibility?) that the
    # user config doesn't have its own block, so it resorts to patching
    # the JSON config post-facto.
    (mkOrder 400 ''
      {
        acme_ca ${cfg.ca}
        email ${cfg.email}
      }
    '')
    ''
      import ${wwwConfig}

      # …other sites here…
    ''
  ];

  # Override the service command lines to prevent the config from being
  # adapted into JSON and choking on or eliminating imports during the
  # build.
  systemd.services.caddy.serviceConfig.ExecStart = mkForce
    "${cfg.package}/bin/caddy run --config ${configFile} --adapter caddyfile";
  systemd.services.caddy.serviceConfig.ExecReload = mkForce
    "${cfg.package}/bin/caddy reload --config ${configFile} --adapter caddyfile";

  # Reload Caddy when the www Caddyfile changes.
  systemd.paths.caddy-reimport = {
    after = [ "caddy.service" ];
    pathConfig.PathChanged = [ wwwConfig ];
  };
  systemd.services.caddy-reimport = {
    description = "Caddy config reimport";
    serviceConfig.Type = "oneshot";
    serviceConfig.ExecStart =
      "/run/current-system/systemd/bin/systemctl try-reload-or-restart --no-block caddy.service";
  };
}
```

A side effect of the new setup, though arguably its main motivator, is a reduced dependency on IPFS. Some sort of race condition or missing lock cleanup causes the pre-start configuration step NixOS runs for IPFS to hang and eventually be terminated by systemd. More than once I had to point aasg.name at Cloudflare to keep it running while I brought IPFS back up; that will no longer be a problem. [^dnslink]

# Task list for 2021-W14

  • Now that metrics from all nodes and containers are collected, work on the new service status dashboard can resume.
  • This should be a calm week, so there should be time to deploy the new node mentioned in the previous report.
  • Maybe join recursive-servers.dn42 or delegation-servers.dn42.

[^::1]: For reasons not yet understood, a socket bound to ::1 on Charybdis cannot be connected to. Binding to another address works, and is the workaround that was put in place, but it runs counter to what is practiced in the rest of the network.

[^dnslink]: aasg.name is pinned and its DNSLink updated by its own CI build, so it is not affected by this change. The X-Ipfs-Path HTTP header is lost, however.

[^instance-names]: The mapping actually goes from a container's IP address to its hosting node's name, an artifact of the containerized services originally running directly on the node.