~williamvds/website

a553bd0293de7e629dc576e498fb34fc60382f47 — williamvds 1 year, 10 months ago 54454d9
blog: Add A Journey Into Nix
1 files changed, 1400 insertions(+), 0 deletions(-)

A content/blog/a_journey_into_nix.md
A content/blog/a_journey_into_nix.md => content/blog/a_journey_into_nix.md +1400 -0
@@ 0,0 1,1400 @@
+++
title = "A Journey into Nix and NixOS"
description = """
I chronicle my introduction to and experiments with using a rather radical
GNU/Linux operating system on my webserver
"""
slug="journey-into-nix"
date = 2022-10-21
+++

Before I started this journey, I ran this website and a few personal services on
the [cheapest VPS OVH offer](https://www.ovhcloud.com/en-gb/vps), but the
limited disk space has frequently given me trouble. 20GB isn't a lot to work
with when you're using it to synchronise and stash personal files. I planned to
cancel its upcoming renewal, and instead migrate to another VPS provider
that provides a bit better value for money. However, this introduces the problem
of manually migrating and setting things up again. Naturally, having done most
of the setup over a year ago, I'd forgotten most of it. I wasn't keen on
repeating that mistake: I wanted a proper automated deployment method. One that
doesn't rely on me doing everything manually, or rely on all the knowledge
staying in my head.

Ansible and similar are the "industry-standard" tools for doing this, but I've
seen Ansible used at a previous job, and witnessed how slow it can be, and how
messy it can get with its piles of YAML. Needless to say, I wasn't left with the
best impressions, and wasn't too keen on it for personal use. I'd come across
Nix online through articles and discussions, including [Xe Iaso's blogs and
talks](https://christine.website/talks/nixos-pain-2021-11-10), and [the Self
Hosted podcast](https://selfhosted.show/83). Its approach sounded novel and
interested me, so I took the server migration as an opportunity to experiment
with it.

I wrote most of this article as I went along, and it chronicles most of my
trial-and-error in creating a NixOS configuration. It's mostly a stream of consciousness,
serving primarily as a record of what I struggled with and thoughts that I had
along the way. In a subsequent article, I'll write up my perspective on Nix and
NixOS and what I'd like to see from it in the future. That one will hopefully be
less ramble-y and more interesting to the average reader.

It's been a long road: I've been working on it on and off for months now,
starting in May 2022 and continuing until just before this article was
published. Thankfully I consider this experiment a success: my data and services
were all migrated to the new server before my old VPS was up for
renewal, as I'd planned. It's been quite messy and frustrating at times, but I'm
quite happy with how things turned out. I think all this work is a "once and for
all" effort to ensure I don't have to manually set up my server again. I'm also
reasonably happy with Nix, and I think I'll be using it more in the future.

## Preparatory reading

As with most things, I initially started with some web searches, and came across
[How to Learn Nix](https://ianthehenry.com/posts/how-to-learn-nix/), which I
ended up reading through quite a lot of.

I additionally came across [Nix Pills](https://nixos.org/guides/nix-pills/), which
I think I briefly skimmed, and the [Nix Wiki](https://nixos.wiki), which I
initially referred to frequently while getting started.

## Installing on my Arch computer

Fairly straightforward, it's available in Arch's community repository, so I
installed it through pacman. After hitting some errors, I followed the wiki's
guide and added the nixpkgs-unstable channel.

Then looked at managing user config files. I found home-manager but it sounded like
way more than I need - I want to keep my dotfiles for portability when I can't
use Nix, I just need a little Nix expression for installing the few config files
that need to live outside of ~/.config as symlinks.

A user Nix config for installing applications would be neat, but I decided to
postpone until I'm further along. I decided to make my first goal reproducing my
webserver configuration with Nix. So on to setting up a playground environment
before investing in another webserver.

## Raspberry Pi

I have a Raspberry Pi free for some experimentation, so I thought I'd use it as
a testbed in place of a real VPS.

The wiki article suggests downloading and writing the SD card, but it's a
graphical installer. I don't have a micro HDMI for the Pi so I want a headless
installation with SSH (which is what I used with Arch on ARM). I assume I need
to build my own image as suggested in the wiki, including my ssh key. Initially
I followed the wiki article, using cross-compilation.

```nix
{ ... }: {
  nixpkgs.crossSystem.system = "aarch64-linux";
  imports = [
    <nixpkgs/nixos/modules/installer/sd-card/sd-image-aarch64.nix>
  ];
  users.extraUsers.root.openssh.authorizedKeys.keys = [
    "ssh-rsa ...."
  ];
}
```

```sh
$ nix-build --system aarch64-linux --keep-failed --store /data/nix-raspberrypi '<nixpkgs/nixos>' -A config.system.build.sdImage -I nixos-config="$HOME"/.config/nix/raspberrypi-sdcard.nix -I nixpkgs=channel:nixos-21.11-aarch64
```

What does the command mean? Strangely it wanted to rebuild a lot of
applications. Why isn't it downloading everything from the cache? First
roadblock was an error in liburing, but hmm this appears to be building fine on
hydra. I find the `--keep-failed` option to actually inspect what's wrong. Part of the
post install script referred to a nonexistent file. Tried adding the nixos-21.11
and nixpkgs-21.11-aarch64 channels - no change. Found the `--dry-run` option - why
are the hashes on hydra different to my nix-build? (Was it because I didn't pass
`--pure`?). I followed some of my previous reading material and managed to
override the liburing derivation using `overrideAttrs`: 

```nix
{ ... }: {
  nixpkgs.crossSystem.system = "aarch64-linux";
  imports = [
    <nixpkgs/nixos/modules/installer/sd-card/sd-image-aarch64.nix>
  ];
  nixpkgs.overlays = [
    /*
    (self: super:
      {
        liburing = super.liburing.overrideAttrs (attrs: {
          postInstall = ''
            # Copy the examples into $bin. Most reverse dependency of this package should
            # reference only the $out output
            mkdir -p $bin/bin
            cp ./examples/io_uring-cp examples/io_uring-test $bin/bin
            cp ./examples/link-cp $bin/bin/io_uring-link-cp
          '';
        });
      })
    */
  ];
  users.extraUsers.root.openssh.authorizedKeys.keys = [
    "ssh-rsa ...."
  ];
}
```

Also try the undocumented `--store` option to override the store location,
because my root drive is running out of space. This causes issues later, so I
clear up some space and use the default `/nix/store` instead.

More building...

Why am I building the Linux kernel? And the AMD drivers and Nouveau? I don't
need those. I hit "no space on device" errors, my tmpfs is running out of space
building all of this stuff.
Find some GitHub issues that mention remounting `/tmp` and resizing to 14GB.
Fine, I do that. (Later someone mentions that it actually uses `$TMPDIR` rather
than hardcoding `/tmp`, so I could have used a different filesystem.)

Find the `nix-store --read-log` option, which is quite neat. Before finding it,
I ran builds several times over after losing the tmux scrollback buffer (and
later increasing the tmux scrollback limit).

More building...

Make: error 2 during linking(?)

At this point I was lurking in the NixOS on ARM channel, where someone mentioned
to someone else that cross compilation isn't reliable, and suggested the
qemu-user method. Fine, time to try that.

I install `qemu-user` and the `binfmt` packages from the AUR, add this to
`/etc/nix/nix.conf`:

```
extra-sandbox-paths = /usr/bin/qemu-aarch64-static
extra-platforms = aarch64-linux
```

Still get some errors. I try overriding `nixpkgs.system` and `system`, no dice.
Apparently `--argstr system aarch64-linux` isn't working right now. Find another
undocumented option, `--system`, which does appear to work.

Later I find out (from the chat?) the distributed SD card image does support
SSH, you just need to create `/root/.ssh/authorized_keys` on the flashed system.
So that whole exercise was a waste of time.

I flash the SD card, plug Pi and Ethernet in, no green light, but lights on
Ethernet. Pi debugging ensues, replugging the SD card before Ethernet eventually
works. Pi shows up on the router's web UI, but isn't assigned an IP address.
DHCPD issue? I have no idea, and I'm not sure how to debug. I give up, maybe I
should try a VM, which should be simpler, right?

# Building a QEMU image

I find a couple of guides for making VMs with Nix:
* https://gist.github.com/tarnacious/f9674436fff0efeb4bb6585c79a3b9ff
* https://gist.github.com/573/c1d73a4fd04b8f8ca63885393856f9ea

I also find a link to the [VM Nix
module](https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/virtualisation/qemu-vm.nix),
which has all the available options. I ideally want no graphical window, just a
shell.

Why isn't `nix-build -A vm '<nixpkgs/nixos>' --arg configuration "{ imports = [
<nixpkgs/nixos/modules/virtualisation/build-vm.nix> ]; }" -I
nixos-config=vm.nix` working? Works if I put everything in `vm.nix` and add the
`build-vm.nix` import. Probably the `configuration` argument being ignored.

VM template:

```nix
{ pkgs, ... }:

{
  imports = [
    <nixpkgs/nixos/modules/profiles/qemu-guest.nix>
    <nixpkgs/nixos/modules/virtualisation/qemu-vm.nix>
  ];

  config = {
    system.stateVersion = "22.05";

    fileSystems."/" = {
      device = "/dev/disk/by-label/nixos";
      fsType = "ext4";
      autoResize = true;
    };

    boot = {
        growPartition = true;
        kernelParams = [ "console=ttyS0" ];
        loader.grub.device = "/dev/vda";
        loader.timeout = 0;
    };

    users.extraUsers.root.password = "";

    services.getty.autologinUser = "root";

    networking.hostName = "vm";

    virtualisation = {
      cores = 4;
      graphics = false;
      qemu.options = [ "-serial mon:stdio" ];
      # Forward any needed ports
      forwardPorts = [
        { from = "host"; host.port = 8080; guest.port = 80; }
      ];
    };

    nix.extraOptions = "
      extra-experimental-features = nix-command flakes
    ";
  };
}
```

Runs successfully with:

```sh
$ nix-build '<nixpkgs/nixos>' -A vm --arg configuration ./vm.nix
$ ./result/bin/run-vm-vm
```

> I'm not sure how to break down the `nix-build` call. Include `<nixpkgs/nixos>`
> and `vm.nix`, evaluate the expression `vm`? What is the `vm` attribute? Where
> does it come from?
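
For what it's worth, here is how I'd now read that call, as a rough sketch
rather than an authoritative answer. `<nixpkgs/nixos>` resolves to
`nixos/default.nix` in nixpkgs, which (as far as I can tell) is a function
taking a `configuration` argument and returning an attribute set with entries
such as `system`, `vm` and `config`. `--arg configuration ./vm.nix` supplies the
argument, and `-A vm` selects the `vm` attribute, which appears to be the same
thing as `config.system.build.vm`. Assuming that reading is right, the command
is roughly equivalent to building this expression:

```nix
# Sketch of what `nix-build '<nixpkgs/nixos>' -A vm --arg configuration ./vm.nix`
# evaluates. Assumes `nixpkgs` is on NIX_PATH (e.g. via a channel).
let
  # nixos/default.nix is a function; nix-build calls it with the --arg values.
  nixos = import <nixpkgs/nixos> {
    configuration = ./vm.nix;
  };
in
# -A vm picks this attribute out of the resulting set.
nixos.vm
```

If so, it would also explain the earlier failure: when `configuration` isn't
passed it defaults to `<nixos-config>`, so passing `--arg configuration` and
`-I nixos-config=vm.nix` together probably means `vm.nix` is never looked at.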

# Writing a Nix module for my webserver

## Nextcloud

I first find [the wiki page](https://nixos.wiki/wiki/Nextcloud), but it's pretty
barren. It doesn't even link to the manual page. I follow the next search result and
find the manual section.

Setup is mostly straight-forward, most of the time spent trying to create these
secret files on the disk. Required a bit more exploration of the Nix language.

I initially think I could manage them by overriding options/attributes when
calling `nix-build`. However the `nix-build` switch `--option` appears to refer
to Nix options, i.e. things that go in `/etc/nix/nix.conf`. `--arg` and
`--argstr` appear to be function arguments, i.e. things specified in `{ arg }:`
at the start of a module.
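
To make the `--arg`/`--argstr` distinction concrete, here's a tiny standalone
expression. The file name `args-demo.nix` and both argument names are made up
for illustration. As far as I understand it, these switches only fill in the
arguments of the top-level function in the file being evaluated; the arguments
of a NixOS module (`{ pkgs, ... }:`) are supplied by the module system instead,
which is why this approach doesn't reach into a module.

```nix
# args-demo.nix (hypothetical file): a plain function of two optional arguments.
{ name ? "world", excited ? false }:

{
  # --argstr passes a string as-is; --arg passes a Nix expression (here a bool).
  greeting = "Hello, ${name}" + (if excited then "!" else ".");
}
```

Evaluating it with
`nix-instantiate --eval --strict args-demo.nix --argstr name Nix --arg excited true`
should call the function with those two values.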

Is there a point in exploring this approach, if it'll be in plaintext in the
store anyway?
In the next section, I look into NixOps and if/how it handles secrets sensibly.

Selecting Nextcloud apps is a bit janky, I was expecting it to just need the app
name, but it wants you to specify the download URL, hash, and all that
jazz manually. I assume it's for reproducibility, and no one's created packages
for Nextcloud apps yet.

## Dealing with secrets

Sorting this out is probably going to be unavoidable if I want to use Nix
"properly", that is, running one command to completely reproduce a system with
virtually zero manual work required.

I find [the wiki article on the subject](https://nixos.wiki/wiki/Comparison_of_secret_managing_schemes)

Lots of options - none of them ideal or zero extra effort. Proper secret
management within Nix and the Nix store has been on the back-burner for a long
time: https://github.com/NixOS/nix/issues/8.

My ideal: hook into pass, my password manager, to retrieve my secrets. Let me
specify the required ones in my configuration, and deploy them when I'm running
NixOps or whatever, with appropriate (and configurable) file permissions.
Let them be persistent so I don't have to redeploy them every time, and clean up
older ones if I remove them from the config.

## NixOps

Okay, the default NixOps key system looks fine. You can override where keys are
saved so I can create them in `/secrets` or something. I'm thinking to use
NixOps to deploy to real servers anyway, so I'm gonna have to learn it at some
point. Looks like it has some features for delaying/restarting systemd units
when secrets change or are made available, which is handy.

What is up with the [NixOps manual on
nixos.org](https://nixos.org/nixops/manual/)? It has very little info, is
lacking context on _how_ to write NixOps configuration files and load commands.

Proper manual appears to be on
[releases.nixos.org](https://releases.nixos.org/nixops/nixops-1.7/manual/manual.html)

Can I override the SSH identity file used or do I really need to adjust my
~/.ssh/config?

I type in the host name as I would if I were running `ssh <host>`, but for some
reason not all SSH options are picked up. E.g. I need to set
`deployment.targetPort`, otherwise NixOps tries to connect to port 22.
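
For reference, a minimal sketch of the kind of NixOps network file this
implies. The option names are the ones I understand from the NixOps 1.7 manual;
the machine name, port `2222`, the key name and the local file path are
placeholders I've picked for illustration, not my real configuration.

```nix
# network.nix (hypothetical): one machine reached over a non-default SSH port,
# with one secret deployed outside the Nix store.
{
  network.description = "Test VM";

  vm = { config, pkgs, ... }: {
    deployment.targetHost = "localhost";
    # Not picked up from ~/.ssh/config, so it has to be repeated here.
    deployment.targetPort = 2222;

    deployment.keys."nextcloud-admin" = {
      # Read on the deploying machine and copied over SSH (placeholder path).
      keyFile = ./secrets/nextcloud-admin;
      # Overrides the default key directory.
      destDir = "/secrets";
      user = "nextcloud";
      group = "nextcloud";
      permissions = "0400";
    };
  };
}
```

Services that need the secret can then be ordered after the matching
`nextcloud-admin-key.service` unit, which is the delaying/restarting feature
mentioned above.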

`$ nixops deploy` is building stuff - is it using the version of nixpkgs I'm
running NixOps from? Do I need to create another deployment to reload the NixOps
network file?

No, it looks okay in the end. But hits some errors at the end of the deploy, I
assume running post-update GRUB hooks?

```
vm> updating GRUB 2 menu...
vm> installing the GRUB 2 boot loader on /dev/vda...
vm> Installing for i386-pc platform.
vm> /nix/store/4bmcgmin11ixrn2ij2jjinlddcxql1g6-grub-2.06/sbin/grub-install: warning: File system `ext2' doesn't support embedding.
vm> /nix/store/4bmcgmin11ixrn2ij2jjinlddcxql1g6-grub-2.06/sbin/grub-install: warning: Embedding is not possible.  GRUB can only be installed in this setup by using blocklists.  However, blocklists are UNRELIABLE and their use is discouraged..
vm> /nix/store/4bmcgmin11ixrn2ij2jjinlddcxql1g6-grub-2.06/sbin/grub-install: error: will not proceed with blocklists.
vm> /nix/store/lc7cvzrgsxks87g3jnjbg8pg2pqn15vh-install-grub.pl: installation of GRUB on /dev/vda failed: No such file or directory
vm> error: Traceback (most recent call last):
  File "/nix/store/4zh7crx1r2sizvpyb1c9h109yfimlzn2-python3-3.9.12-env/lib/python3.9/site-packages/nixops/deployment.py", line 893, in worker
    raise Exception(
Exception: unable to activate new configuration (exit code 1)
```

Why is it installing for i386? My VM is x86_64. Why is it detecting the filesystem
as ext2? `/dev/vda` is ext4.

I assume these are errors from updating within the VM itself, and probably not
NixOps' fault. It just happened to update some packages.

Further research: `grub-install` is apparently trying to update an MBR partition
with UEFI stuff? Try to force the VM to use UEFI with GRUB by inspecting [the
qemu
module](https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/virtualisation/qemu-vm.nix).

I find a related option, and try setting `virtualisation.useEFIBoot = true`: get
the following error:

```
qemu-kvm: -drive if=pflash,format=raw,unit=1,file=: A block device must be specified for "file"
```

Further inspection: there is a dependency between `useEFIBoot` and
`useBootLoader` (the latter defaults to `false`)
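
In other words, the two options seemingly need to be set together. A minimal
fragment, assuming both options are spelled as they appear in the qemu-vm
module linked above:

```nix
# Fragment for vm.nix: EFI boot only seems to get a firmware image
# when the VM is also told to use a real boot loader.
{
  virtualisation = {
    useBootLoader = true;
    useEFIBoot = true;
  };
}
```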

I take a brief detour to enable KVM in my BIOS.

With the new changes, the VM just boots into the UEFI shell, instead of directly
to the system.

Back to the normal VM. It's quicker now due to the KVM modules loaded, but of
course back to the same NixOps error. 

Maybe I should create VMs through NixOps instead, which will be better
supported? How about a quick detour and using
[sops-nix](https://github.com/Mic92/sops-nix) instead?

(In hindsight, this detour was not quick at all)

## sops-nix and Flakes

Again, not ideal because I'd prefer to just use passwords stored in `pass`. But
I understand why sops is preferable when it comes to Nix - the encrypted secrets
can be used as an input to the system configuration. For now it seems like one
of the easier options to go for. I like that it apparently hooks into
`nixos-rebuild` and the like.

Quick question, how do I use `nixos-rebuild` when I'm not on NixOS? [Turns out
it's packaged in nixpkgs, duh](https://github.com/NixOS/nixpkgs/issues/44135). I
run it with `$ nix-shell -p nixos-rebuild`

sops-nix requires the target host's SSH key which is rather annoying, since it
means having to deploy after creating a machine. And some manual work involved
in extracting the SSH public key of new machines.

Setup guide is rather convoluted and complex, due to multiple encryption
methods: using age keys, converting GPG or ssh keys to age keys.

Try to copy sops-nix example, so I guess I'm trying out flakes now:

```nix
{
  inputs = {
    nixpkgs.url = "nixpkgs/nixpkgs-unstable";
    sops-nix.url = "github:Mic92/sops-nix";
    sops-nix.inputs.nixpkgs.follows = "nixpkgs"; # use the same nixpkgs from this flake's input, this seems hacky
  };

  outputs = { self, nixpkgs, sops-nix }: {
    nixosConfigurations."vm" = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        ./vm.nix
        sops-nix.nixosModules.sops
      ];
    };
  };
}
```

Try to deploy with `$ nixos-rebuild switch --target-host vm --flake '.#vm'`

```
error: getting status of '/nix/store/ppm1c1rch1p6w260kpfxma6kww8wkfdf-source/nix/flake.nix': No such file or directory
```

Okay, turns out I need to `git add flake.nix` for some reason.

```
error: cannot look up '<nixpkgs/nixos/modules/profiles/qemu-guest.nix>' in pure evaluation mode (use '--impure' to override)
```

I guess I need to explicitly add the nixpkgs flake?
Do flakes change the way imports work? How do I translate a `<nixpkgs/...>`
import into an expression?

Now I try to rebuild my VM, but now I'm getting a new error:

```
error: The option `sops' does not exist. Definition values:
       - In `/home/william/.config/nix/vm.nix':
           {
             defaultSopsFile = /home/william/.config/nix/secrets/common.yaml;
             secrets = {
               "nextcloud/admin" = { };
               "nextcloud/database" = { };
```

Durr, this is because I'm still running `nix-build` but I'm using the sops
option in my configuration. I assume I need to convert this to the new `nix
build` command to make use of Flakes? Let's give this a shot.

[The VM
module](https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/virtualisation/build-vm.nix)
creates `system.build.vm`, which is probably what I'm currently invoking with
`nix-build`. The `nix build --help` has the following example:

> Build a NixOS system configuration from a flake, and make a profile point to the result:

```
# nix build --profile /nix/var/nix/profiles/system \
  ~/my-configurations#nixosConfigurations.machine.config.system.build.toplevel
```

That looks pretty similar! Surely I can change `~/my-configurations` and
`machine` appropriately, replace `system.build.toplevel` with `system.build.vm`,
and that's my new build command?

```
$ nix build .#nixosConfigurations.vm.config.system.build.vm
warning: Git tree '/home/william/.config' is dirty
error: cannot look up '<nixpkgs/nixos/modules/profiles/qemu-guest.nix>' in pure evaluation mode (use '--impure' to override)

       at /nix/store/h7ncs2xdj5nmfwfrgnhxv4wr0q6v6fqd-source/nix/vm.nix:8:5:

            7|   imports = [
            8|     <nixpkgs/nixos/modules/profiles/qemu-guest.nix>
             |     ^
            9|     <nixpkgs/nixos/modules/virtualisation/qemu-vm.nix>
(use '--show-trace' to show detailed location information)
```

Progress? Now, how to import these nixpkgs modules... they're not exposed
through the nixpkgs flake, are they?

The Flakes wiki page has a [Making your evaluations
pure](https://nixos.wiki/wiki/Flakes#Making_your_evaluations_pure) section, this
suggests I need to somehow download these nixpkgs modules that I'm trying to
import? Can I import a module within a flake's repository directly? Surely the
entire repository is already in the nix store anyway?

Yes!

> I turns out that I've had a incorrect presumption. I don't grasp the reason
> but even in the case of being a flake input inputs.home-manager will still
> return the store path of the home-manager flake. Which means I can still use
> it as an ordinary non-flake input That's Great
<cite>
[A
comment](https://www.reddit.com/r/NixOS/comments/tt56cw/how_to_use_a_flake_input_as_nonflake_input/i313skw/?context=3) by ykis-0-0 on reddit.com/r/NixOS
</cite>

This hint led me to add a trace to within the `outputs` function in
`flake.nix`, and lo and behold, it just points to the store path for nixpkgs!

```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";

    sops-nix.url = "github:Mic92/sops-nix";
    sops-nix.inputs.nixpkgs.follows = "nixpkgs"; # nasty hack to use the same nixpkgs from this flake's input
  };

  outputs = { self, nixpkgs, sops-nix }: {
    nixosConfigurations."vm" = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        (nixpkgs + "/nixos/modules/profiles/qemu-guest.nix")
        (nixpkgs + "/nixos/modules/virtualisation/qemu-vm.nix")
        ./vm.nix
        sops-nix.nixosModules.sops
      ];
    };
  };
}
```

Not ideal, since now the import has moved outside of the `vm.nix` module which
actually needs these imports.

Question: Can I somehow move this import to inside the module?  
(I found out the answer much later: Yes! Using `specialArgs`)
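
A minimal sketch of that later answer, under the assumption that the only thing
the module needs from the flake is the `nixpkgs` input. `specialArgs` is merged
into the arguments every module receives, and (unlike ordinary module
arguments) it is available while `imports` is being resolved. The inline
function below stands in for the top of `vm.nix`.

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";

  outputs = { self, nixpkgs }: {
    nixosConfigurations."vm" = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";

      # Hand the flake input to all modules as an extra argument.
      specialArgs = { inherit nixpkgs; };

      modules = [
        # Stand-in for ./vm.nix: the module asks for `nixpkgs` by name
        # and does its own imports again.
        ({ nixpkgs, ... }: {
          imports = [
            (nixpkgs + "/nixos/modules/profiles/qemu-guest.nix")
            (nixpkgs + "/nixos/modules/virtualisation/qemu-vm.nix")
          ];
        })
      ];
    };
  };
}
```

For modules that live inside nixpkgs itself, the built-in `modulesPath`
argument looks like an even simpler route, e.g.
`(modulesPath + "/profiles/qemu-guest.nix")`, with no `specialArgs` needed.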

Getting another error now:

    $ nix build .#nixosConfigurations.vm.config.system.build.vm
    warning: Git tree '/home/william/.config' is dirty
    error: getting status of '/nix/store/2awpx4cw6r4y4zxd3rnx93xq4h2mb9l3-source/nix/common.nix': No such file or directory
    (use '--show-trace' to show detailed location information)

If I explore this store, `common.nix` is indeed missing, but most of the files
are there! Also, this store contains a snapshot of my entire Git repository,
most of which isn't needed by the Nix stuff. But that's suspicious - is it
because `common.nix` isn't staged yet? Why indeed, it isn't. Is this what the
warning is about? Maybe that warning should also specify that only staged files
will be used during the build, and especially check special files like
`flake.nix`, it's confusing for builds to immediately fail with an error that
doesn't make sense unless you know that it's copying Git-tracked files to the
Nix store. I have everything ignored by default in my Git repository (because it
lives in `~/.config`, which many applications dump files into), so I hit this
problem repeatedly.

Finally, success! I've now converted to flakes, time to test out sops-nix again.

I can see sops-nix trying to do something at the start, but since I've wiped my
VM its ssh-key has changed:

    setting up secrets...
    sops-install-secrets: Imported /etc/ssh/ssh_host_rsa_key with fingerprint cb3e5e4ce4278da9a1bb7beb7ff790665581e7ad
    /nix/store/6a7866mckid9n3s8bvanp5ag5fk2awyl-sops-install-secrets-0.0.1/bin/sops-install-secrets: Failed to decrypt '/nix/store/p2inyam4wdxgljisqx035y1i8f85npi5-common.yaml': Error getting data key: 0 successful groups required, got 0
    Activation script snippet 'setupSecrets' failed (1)

Time to redo part of the sops-nix setup. I get why it's done this way - using
the host's SSH key is a neat idea - but I don't like that it's a manual task
that needs to be done after initially booting a system. Maybe it's more of a
problem with these throwaway VMs that I'm using.

Updated the age key for the VM, then ran `sops updatekeys <file>`. New error on
boot:

    setting up secrets...
    sops-install-secrets: Imported /etc/ssh/ssh_host_rsa_key with fingerprint cb3e5e4ce4278da9a1bb7beb7ff790665581e7ad
    /nix/store/6a7866mckid9n3s8bvanp5ag5fk2awyl-sops-install-secrets-0.0.1/bin/sops-install-secrets: Failed to decrypt '/nix/store/zl4yjsw14if8wqrblnf1lk50y01zvl6w-common.yaml': Error getting data key: 0 successful groups required, got 0
    Activation script snippet 'setupSecrets' failed (1)

Is sops-nix trying to use the host's RSA key? I thought the age method only
supported the ed25519 key?
My mistake: the sops-nix guide says to set the following:

```
sops.age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];
```

Also, I was accidentally using my actual host's SSH keys because I ran
`$ ssh-keyscan localhost -p 2222` instead of `$ ssh-keyscan -p 2222 localhost`.
That one's on `ssh-keyscan`.  
On top of that, `ssh-keyscan` doesn't detect the ed25519 keys on the VM for some
reason, so I had to copy the public key manually and pipe it through to
`ssh-to-age`. One last `$ sops updatekeys ...`, and now there are secrets!

    setting up secrets...
    sops-install-secrets: Imported /etc/ssh/ssh_host_rsa_key with fingerprint cb3e5e4ce4278da9a1bb7beb7ff790665581e7ad

    ...

    $ ll /run/secrets/
    total 4
    drwxr-x--x 2 root keys  0 May 29 12:59 nextcloud
    -r-------- 1 root root 30 May 29 12:59 smtp

Went back and amended Nextcloud configuration to use sops-nix instead of
Nix options.
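
Roughly, the amended configuration looks like the fragment below. Treat it as a
sketch: the secret names and the `common.yaml` path come from the error output
above, but the `owner` setting and the exact wiring into the Nextcloud options
are my assumptions about a reasonable setup, and the rest of the Nextcloud
configuration is omitted.

```nix
# Fragment of a NixOS module using sops-nix for the Nextcloud passwords.
{ config, ... }:

{
  sops.defaultSopsFile = ./secrets/common.yaml;
  # Derive the age identity from the host's ed25519 SSH key.
  sops.age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];

  # Decrypted at activation time to /run/secrets/nextcloud/...
  sops.secrets."nextcloud/admin".owner = "nextcloud";
  sops.secrets."nextcloud/database".owner = "nextcloud";

  services.nextcloud.config = {
    # Only paths end up in the Nix store, never the passwords themselves.
    adminpassFile = config.sops.secrets."nextcloud/admin".path;
    dbpassFile = config.sops.secrets."nextcloud/database".path;
  };
}
```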

Finally, time to get back to the task at hand.

## Quick diversion: setting up a Nix formatter

Initially running `nix fmt` gives a somewhat helpful (but not exactly
user-friendly) error:

    warning: Git tree '/home/william/.config' is dirty
    error: flake 'git+file:///home/william/.config?dir=nix' does not provide attribute 'formatter.x86_64-linux'

It should probably indicate to check `nix fmt --help`, because that page has a
few examples to copy-paste from. I chose `nixpkgs-fmt` and added this to the
`outputs` set in my `flake.nix` as directed:

```nix
{
  # `outputs` is a function of the flake's inputs
  outputs = { self, nixpkgs, ... }: {
    formatter.x86_64-linux = nixpkgs.legacyPackages.x86_64-linux.nixpkgs-fmt;
  };
}
```

## Making a package: systemd-failmsg

I use [systemd-failmsg](https://github.com/dino-/systemd-failmsg) to get emails
when a service fails on my webserver. I definitely want it on my new server, but
it's as-yet unpackaged. So I guess I need to create a proper Nix package for it?

Let's see [this wiki
page](https://nixos.wiki/wiki/Nixpkgs/Create_and_debug_packages#Rough_process_for_creating_a_package)
It basically says "look for something similar in nixpkgs and copy that". Which
isn't ideal. Some basic templates would be nice.

Okay, I've copied something that looks reasonable:

```nix
{ stdenv, lib, fetchFromGithub, ... }:

stdenv.mkDerivation rec {
  pname = "systemd-failmsg";
  version = "1.3";

  src = fetchFromGithub {
    owner = "dino-";
    repo = pname;
    rev = "v${version}";
    sha256 = "7a2a6cc9311f1370b1c295d3cf0428604e03ced08bb894e073cd58209f5ef537";
  };

  meta = with lib; {
    homepage = "https://github.com/dino-/systemd-failmsg";
    description = "systemd toplevel override which sends emails alerts when systemd services fail";
    license = licenses.isc;
    platforms = platforms.linux;
  };
}
```

How do I build it? `$ nix build --file ...` gives me:

    error: anonymous function at /home/william/.config/nix/pkgs/systemd-failmessage.nix:1:1 called without required argument 'stdenv'

The wiki article doesn't go into detail on using `$ nix build` to build a
package. Most of the existing documentation uses `nix-build` instead, which
obviously doesn't make use of flakes.
Okay, a different approach: specifying the package in `flake.nix`. The wiki page
has very little detail on how to use `outputs.packages` other than specifying
that it's used by `$ nix build .#<package>`. nixpkgs only uses `legacyPackages`,
so I have to search elsewhere. I looked at [the Flake
RFC](https://github.com/NixOS/rfcs/pull/49/files), in which a comment mentions the
`callPackage` pattern used by nixpkgs for its legacy packages, but within the
`outputs` function of my flake `nixpkgs` just refers to the store path. It took
me a bunch of searching until I found [this
blog](https://www.breakds.org/post/flake-part-1-packaging/) which mentions
actually importing `nixpkgs` to provide the classic `pkgs` input. Makes sense
once it's done in an example!

```nix
{
  # ...
  outputs = { self, nixpkgs, sops-nix }:
    let
      forAllSystems = f: nixpkgs.lib.genAttrs nixpkgs.lib.systems.flakeExposed (system: f system);
    in
    {
      packages = forAllSystems (system: import ./pkgs {
        pkgs = import nixpkgs { inherit system; };
      } );

      nixosConfigurations."vm" = nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        modules = [
          (nixpkgs + "/nixos/modules/profiles/qemu-guest.nix")
          (nixpkgs + "/nixos/modules/virtualisation/qemu-vm.nix")
          ./vm.nix
          ./common.nix
          sops-nix.nixosModules.sops
        ];
      };

      formatter.x86_64-linux = nixpkgs.legacyPackages.x86_64-linux.nixpkgs-fmt;
    };
}
```

I also copied the `forAllSystems` function used by nixpkgs' `flake.nix` so my
packages are cross-platform by default.
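
The `./pkgs` import above refers to a small file that isn't shown. A minimal
sketch of what such a `pkgs/default.nix` could look like follows; the layout is
my illustration, and the `systemd-failmessage.nix` file name is taken from the
earlier error message. `callPackage` is what fills in `stdenv`, `lib` and
friends from `pkgs`, which is exactly what the bare `nix build --file` attempt
was missing.

```nix
# pkgs/default.nix (sketch): maps package names to derivations.
{ pkgs }:

{
  # callPackage inspects the function's arguments and supplies the
  # matching attributes from pkgs; `{ }` means "no manual overrides".
  systemd-failmsg = pkgs.callPackage ./systemd-failmessage.nix { };
}
```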

Tried `$ nix build .#systemd-failmsg`, and encountered a very helpful error message!

    error: You meant fetchFromGitHub, with a capital H

Given how specific this error is ("with a capital H"), this is probably a
hard-coded message for this specific misspelling of a function. However, it's an
example of what error messages should look like in practice.

The most related section of the NixOS manual appears to be [Adding custom
packages](https://nixos.org/manual/nixos/stable/index.html#sec-custom-packages),
which dutifully links to the more detailed [nixpkgs
manual](https://nixos.org/manual/nixpkgs/stable/) which actually covers
packaging in detail. My skim-reading caused me to skip over this link - I
instead searched for `mkDerivation` in the NixOS manual but found very little.
Nevertheless, I was able to piece the puzzle together from the NixOS manual
alone, working out my roadblocks:

- Implementing the `installPhase` attribute
- Using `bash` as an input to run the package's install script (otherwise the
  script would result in "/bin/bash: bad interpreter: No such file or
  directory")
- Using the `$out` variable as the install prefix (instead of simply `out`)
  - The error message for this is a bit vague:  
    
    error: builder for '/nix/store/4k49hl64mvwijpxc5v53szqv5vjzkhw6-systemd-failmsg-1.3.drv' failed to produce output path for output 'out' at /nix/store/...

However, after these issues I find everything as expected under `result/`. I'm
pleasantly surprised that Nix automatically substitutes the shebang in a script
which is part of the package - `#!/bin/bash` is replaced with
`#!/nix/store/.../bash`. Excellent, that spares me from some tedious patching.
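
Putting those three fixes together, the package ends up looking roughly like
this. It's a reconstruction from the earlier draft plus the fixes listed above
(with the `fetchFromGitHub` spelling corrected), so take it as a sketch; in
particular, putting `bash` in `nativeBuildInputs` is just one way of making it
available to the install step.

```nix
{ stdenv, lib, fetchFromGitHub, bash, ... }:

stdenv.mkDerivation rec {
  pname = "systemd-failmsg";
  version = "1.3";

  src = fetchFromGitHub {
    owner = "dino-";
    repo = pname;
    rev = "v${version}";
    sha256 = "7a2a6cc9311f1370b1c295d3cf0428604e03ced08bb894e073cd58209f5ef537";
  };

  # Needed at build time to run the upstream install script.
  nativeBuildInputs = [ bash ];

  # `$out` (with the dollar sign) is the store path the build must populate.
  installPhase = ''
    PREFIX=$out bash install.sh
  '';

  meta = with lib; {
    homepage = "https://github.com/dino-/systemd-failmsg";
    description = "systemd toplevel override which sends emails alerts when systemd services fail";
    license = licenses.isc;
    platforms = platforms.linux;
  };
}
```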

I can now search for my package and even show its derivation, nifty!

```
$ nix search . systemd-failmsg
* packages.x86_64-linux.systemd-failmsg (1.3)
  systemd toplevel override which sends emails alerts when systemd services fail

$ nix show-derivation .#systemd-failmsg
{
  "/nix/store/l14p85r1cn2fix4sisr6c1cfmgzm4x3p-systemd-failmsg-1.3.drv": {
    "outputs": {
      "out": {
        "path": "/nix/store/bxj557f73c5k8ayirj30qlh4c4lvwc5w-systemd-failmsg-1.3"
      }
    },
    "inputSrcs": [
      "/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh"
    ],
    "inputDrvs": {
      "/nix/store/3i6jqp61ra3031kjs1jlrmmqd2jyixd6-source.drv": [
        "out"
      ],
      "/nix/store/42pr7zqjf0y29v19q1wxn6hs5gdl5car-bash-5.1-p16.drv": [
        "dev",
        "out"
      ],
      "/nix/store/ddmyhp06jqy8bxj715zwsmbcnzvx8iax-stdenv-linux.drv": [
        "out"
      ]
    },
    "system": "x86_64-linux",
    "builder": "/nix/store/0d3wgx8x6dxdb2cpnq105z23hah07z7l-bash-5.1-p16/bin/bash",
    "args": [
      "-e",
      "/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh"
    ],
    "env": {
      "buildInputs": "",
      "builder": "/nix/store/0d3wgx8x6dxdb2cpnq105z23hah07z7l-bash-5.1-p16/bin/bash",
      "configureFlags": "",
      "depsBuildBuild": "",
      "depsBuildBuildPropagated": "",
      "depsBuildTarget": "",
      "depsBuildTargetPropagated": "",
      "depsHostHost": "",
      "depsHostHostPropagated": "",
      "depsTargetTarget": "",
      "depsTargetTargetPropagated": "",
      "doCheck": "",
      "doInstallCheck": "",
      "installPhase": "PREFIX=$out bash install.sh\n",
      "name": "systemd-failmsg-1.3",
      "nativeBuildInputs": "/nix/store/qcalxj277ld4jiklmd2lzx6gkcvkc67k-bash-5.1-p16-dev",
      "out": "/nix/store/bxj557f73c5k8ayirj30qlh4c4lvwc5w-systemd-failmsg-1.3",
      "outputs": "out",
      "patches": "",
      "pname": "systemd-failmsg",
      "propagatedBuildInputs": "",
      "propagatedNativeBuildInputs": "",
      "src": "/nix/store/ig1jp44jq5vy1lmdkm6ilimhq96v6157-source",
      "stdenv": "/nix/store/28hqpbwpzvpff7ldbhxdhzcpdc34lgsa-stdenv-linux",
      "strictDeps": "",
      "system": "x86_64-linux",
      "version": "1.3"
    }
  }
}
```

And finally, I can add my package to the machine's configuration.
The question is... how? Again, this would be easier if I could pass stuff from
`flake.nix` into child modules. Doing a bit more research, it is indeed
possible, with the `specialArgs` attribute, which is a set that is merged with
the usual parameters that are passed to imported modules. I figured out that the
`rec` keyword means "recursive" from the NixOS manual, and it allows references
to other attributes within an attribute set, so I can include my flake's
packages as an argument to the other modules:

```nix
{
  outputs = { self, nixpkgs, sops-nix }@inputs:
    let
      forAllSystems = f: nixpkgs.lib.genAttrs nixpkgs.lib.systems.flakeExposed (system: f system);
    in rec {
      packages = forAllSystems (system: import ./pkgs {
        pkgs = import nixpkgs { inherit system; };
      });

      nixosConfigurations."vm" = nixpkgs.lib.nixosSystem rec {
        system = "x86_64-linux";
        # Merge all flake inputs with the packages
        specialArgs = inputs // { mypkgs = packages.${system}; };
        modules = [ ./vm.nix ];
      };
    };
}
```

And add it to my system's configuration:

```nix
{ mypkgs, ... }:

{
  imports = [
    /* ... */
  ];

  environment.systemPackages = with mypkgs; [
    systemd-failmsg
  ];
}
```

Here's the tree of the package:

```
$ tree /nix/store/...-systemd-failmsg-1.3/
├── bin
│   └── failmsg.sh
├── lib
│   └── systemd
│       └── system
│           ├── failmsg@.service
│           ├── failmsg@.service.d
│           │   └── toplevel-override.conf
│           └── service.d
│               └── toplevel-override.conf
└── share
    └── systemd-failmsg
        ├── always-fails.service
        └── doc
            ├── LICENSE
            └── README.md
```

I can rebuild and run the VM successfully, but the systemd services aren't
available. I do see the script in `bin/failmsg.sh`, but nothing else.

Instead of using the provided install script, I'll try to do things the
NixOS way.
My initial alternative approach involved creating a Nix module for
`systemd-failmsg`, following the pattern used by other packages/modules that
provide systemd services, e.g.
[nginx](https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/services/web-servers/nginx/default.nix).
I tried to create the systemd unit file with `systemd.services.<service> = `, and
a number of config files for the overrides using `environment.etc`. This,
however, produced a number of "Permission denied" errors while building the
`env` package. Seemingly the `etc/systemd/system` path was already owned by root
at the respective part of the build. Was it already locked down and considered
read-only? I didn't have any luck searching for this issue, but I assume this is
an un(der)documented special case for systemd and other core system packages.

I eventually discovered the correct solution while browsing existing examples
and issues. I was first directed to `/run/current-system/sw`, where I found my
service under `/lib/systemd/system` as I expected. This explains why `/lib`
doesn't exist - it's instead stashed away in here. It also shows that Nix does
intelligently install the systemd service and the configuration files. The
service did not however show up in `systemctl list-units`, suggesting I was
missing another step.

This step was adding the package to the `systemd.packages` list, which
[according to the
manual](https://nixos.org/manual/nixos/stable/index.html#sect-nixos-systemd-nixos)
"enables" the service. After doing this, my service and its systemd overrides
show up under `/etc/systemd`, and are all picked up correctly.

In the end I was left with this fairly short module:

```nix
{ config, lib, mypkgs, ... }:

with lib;

let
  cfg = config.services.systemd-failmsg;
in
{
  options = {
    services.systemd-failmsg.enable = mkEnableOption "systemd-failmsg";
  };

  config = mkIf cfg.enable {
    environment.systemPackages = [ mypkgs.systemd-failmsg ];
    systemd.packages = [ mypkgs.systemd-failmsg ];
  };
}
```

And enabling the service instead involved adding the following somewhere in my
system's configuration:

```nix
services.systemd-failmsg.enable = true;
```

Conveniently, some Nextcloud services were failing on boot, which gave me a
chance to test the failure message easily. Despite being able to execute the
script manually successfully, when executed by systemd it couldn't find the
`failmsg.sh` script. I found the `substituteInPlace` command in the manual and
applied a quick fix to the derivation:

```nix
postBuild = ''
  substituteInPlace failmsg@.service \
    --replace /usr/bin/failmsg.sh $out/bin/failmsg.sh
'';
```

The systemd service still failed, this time because it couldn't find the
commands used, including `hostname`, `id`, `sendmail`. A bit of digging around
issues and the manual showed that I needed to set up the path correctly - makes
sense, considering all the magic Nix does with `$PATH`.

I was initially considering using the `substituteInPlace` command again to
replace them with the appropriate executable path, e.g.
`--replace hostname ${inetutils}/bin/hostname`, but that didn't feel right.
Looking at some more examples and the manual, I found the `writeShellApplication`
builder. This is meant more for packages wherein the only output is a single
shell script, but it automagically handles the script's `$PATH` for you, so the
script can use the commands like normal instead of absolute paths to the Nix
store.

In the end I saw `wrapProgram` in some examples, which as the name suggests,
wraps a program, setting environment variables as desired. I applied this to the
`failmsg.sh` script in the derivation, adding the required dependencies to `$PATH`:

```nix
let
    inputs = [ inetutils system-sendmail coreutils ];
in

# ...

  nativeBuildInputs = [ makeWrapper ];
  buildInputs = [ bash ] ++ inputs;

  installPhase = ''
    PREFIX=$out bash install.sh

    wrapProgram $out/bin/failmsg.sh \
      --prefix PATH : ${lib.makeBinPath inputs}
  '';
```

(Question for the future: what differentiates `nativeBuildInputs`,
`buildInputs`, and the like? Which are build-time only, which are runtime? What
does "native" mean?)

This resolved the issue with unavailable commands, but the service was still
failing due to network unavailability while the service was starting up. The
problem lies upstream, due to the service only containing
`After=network.target`, so the service doesn't have a hard requirement on
network availability. I applied a fairly simple fix by adding a `Requires`
and `After` to the systemd unit in the Nix module:

```nix
  systemd.services."failmsg@".unitConfig = {
    After = "network-online.target";
    Requires = "network-online.target";
  };
```

(TODO: should I submit this patch upstream?)

Finally, the service is working as expected, and I'm content with its
Nix implementation.

## Passing `nix flake check`

I came across the `nix flake check` command, which runs tests on your flake. So
I thought it would be a good idea to ensure my flake passes.


The first problem I hit was making my `systemd-failmsg` package only available
on Linux, otherwise the check produces the following error:

    $ nix flake check
    error: Package ‘systemd-failmsg-1.3’ in /nix/store/...-source/nix/pkgs/systemd-failmsg/default.nix:22 is not supported on ‘x86_64-darwin’, refusing to evaluate.

It was surprisingly difficult to find the appropriate method for "only create
this attribute if this condition holds". After a while I remembered coming
across the pattern of extending a set with another attribute set which may be
empty depending on the condition. It's a simple enough function, which is
probably just implemented like so:

    c: set: if c then set else {}

In the end I found the function in the nixpkgs manual:
`lib.attrsets.optionalAttrs`.
Underneath it was the handy type specification in the lovely format I know from
Haskell:

    optionalAttrs :: Bool -> AttrSet

After discovering [the NixOS options search](https://search.nixos.org/options) I
can't help but wish there were a [Hoogle](https://hoogle.haskell.org/)
equivalent for Nix/nixpkgs library functions.

Back to the problem at hand, I try using the condition `pkgs.stdenv.isLinux`,
but this causes a strange error on `nix flake check`:

    error: attribute 'busybox' missing

Adding `--show-trace`, I eventually realised that at the very start of the trace,
the `system` causing the error is actually `mipsel-linux`:

     … while checking the derivation 'packages.mipsel-linux.systemd-failmsg'

     at /nix/store/cl9c0n6wlgga14mcqkv7hypc6djdab00-source/nix/pkgs/default.nix:4:3:

          3|   // pkgs.lib.optionalAttrs pkgs.stdenv.isLinux {
          4|   systemd-failmsg = pkgs.callPackage ./systemd-failmsg.nix { };
           |   ^
          5| }

    … while checking flake output 'packages'

    at /nix/store/3hnh67ga9pw39j495b94h4fwhgqscm37-source/nix/flake.nix:13:7:

        12|     in rec {
        13|       packages = forAllSystems (system: import ./pkgs {
          |       ^
        14|         pkgs = import nixpkgs { inherit system; };

This system appears to currently be broken, so I'll try specifically filtering
out `mipsel-linux`.

```nix
{ pkgs }:
{ }
  // pkgs.lib.optionalAttrs (pkgs.stdenv.isLinux && pkgs.system != "mipsel-linux") {
  systemd-failmsg = pkgs.callPackage ./systemd-failmsg { };
}
```

And `nix flake check` passes!

After coming across a common `flake.nix` pattern, I decided to invert this
system filtering, and instead only support a subset of platforms in my flake:

```nix
{
  outputs = { self, nixpkgs, sops-nix }@inputs:
    let
      # Specify the list of supported platforms
      systems = [ "x86_64-linux" ];
      forAllSystems = f: nixpkgs.lib.genAttrs systems (system: f system);
    in
    rec {
      packages = forAllSystems (system: import ./pkgs {
        pkgs = import nixpkgs { inherit system; };
      });
    };
}
```

Considering only the platforms I care about seems like an easier way to go about
this, rather than trying to support all the platforms Nix does. If I want to
deploy to other platforms in the future I can simply add to `systems`.

## Overlays: fixing nextcloud-news-updater

[Nextcloud's news-updater](https://github.com/nextcloud/news-updater) is a handy
tool for speedily updating RSS feeds. It unfortunately hard-codes running the
`occ` command under the Nextcloud installation directory. However, NixOS has its
own `nextcloud-occ` script which conveniently wraps the normal command with
`sudo` and the normal Nix stuff.

I generally followed the examples on the Flakes wiki page and managed to get it
patched via an overlay.

It's worth noting that I spent a long time trying to debug my first attempt, as
`nix flake check` would result in "error: infinite recursion encountered" in
fixed-point.nix. It took me a while to find that the issue was specifying the
arguments as a set `{ self, super }:` instead of two separate arguments `self:
super:`. And this is why types and type checking are useful. That, and useful
errors.
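For illustration, an overlay of the right shape could look like the following. It's only a sketch: the patch file `./use-nextcloud-occ.patch` is hypothetical, and it assumes the package is exposed as `nextcloud-news-updater` in nixpkgs.

```nix
# An overlay is a function of two *curried* arguments, not of one attribute set.
# `self` is the final package set after all overlays have been applied,
# `super` is the package set as it was before this overlay.
self: super: {
  nextcloud-news-updater = super.nextcloud-news-updater.overrideAttrs (old: {
    # Hypothetical patch that calls `nextcloud-occ` instead of the
    # hard-coded `occ` under the Nextcloud installation directory
    patches = (old.patches or [ ]) ++ [ ./use-nextcloud-occ.patch ];
  });
}
```

Saved as, say, `overlay.nix`, this would be applied with something like `import nixpkgs { inherit system; overlays = [ (import ./overlay.nix) ]; }`.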

## Packaging [Firefox Syncserver](https://github.com/mozilla-services/syncserver)

This was easily the most painful part of the conversion process.

At the time of writing, there's no official package for this Python app in
nixpkgs, because it's stuck on Python 2 and its dependencies have been broken.
The new implementation
[syncstorage-rs](https://github.com/mozilla-services/syncstorage-rs) [was
recently packaged](https://github.com/NixOS/nixpkgs/pull/176835), however, I
wanted to stick with the old version I was already using. Also, the new app only
supports MySQL and I didn't want to run yet another database engine on top of
PostgreSQL, when a SQLite database would do just fine.

Unfortunately there's no "official" tool for easily creating a Nix derivation
for a Python package, with all its dependencies nicely bundled up.

I took a look at a few tools to do just that, with mixed success.

Ideally what I want is a tool that I provide a package name (and maybe version),
which then pulls down all its dependencies and generates Nix derivations for
them as necessary, effectively pinning them in the Nix fashion. I wouldn't mind
if it autogenerated code for me; I just don't want to write it all myself.

### [pypi2nix](https://github.com/nix-community/pypi2nix)

Searching through older Nix guides and discussion, pypi2nix was brought up as an
easy way to import existing Python packages. However, the project's repository
is now marked as archived, and it was indeed marked as broken in nixpkgs. I gave
up on it quite quickly, looking for other suggested alternatives.

### [mach-nix](https://github.com/DavHau/mach-nix)

On the surface, it seems like a sensible system: you supply a list of
requirements, just as in your average Python project, and dependency resolution
is managed for you.

However, I quickly ran into trouble with [Python 2
support](https://github.com/DavHau/mach-nix), with many packages showing "not
supported for interpreter python2.7" errors. mach-nix isn't able to handle
evaluation errors due to packages marked as broken, which [appears to be more of
a problem with the Nix language](https://github.com/NixOS/nix/pull/5564). This
apparently makes it hard for mach-nix to do its dependency resolution without
running into evaluation errors for Python 2 packages.

I tried manually editing mach-nix to use `tryEval` in a few more places, but I
couldn't reliably fix the problem. Local testing in the `nix repl` seemed to
reliably prevent errors when evaluating derivations directly, but it failed to
catch evaluation errors for a derivation's dependencies. It also wasn't easy to
get a list of all of a derivation's dependencies ([see an attempt
here](https://github.com/NixOS/nix/pull/5036)), which made trying to fix this
even more tedious.

At this point I also realised the underlying method mach-nix uses to make things
"pure" as Nix requires: fetching a Git repository which indexes everything
available in PyPI and other Python repositories. This index currently sits at
over 400MB, which I'm sure will only grow with time. Personally, I feel this
is a sledgehammer approach. I appreciate Nix already isn't particularly
conservative when it comes to disk usage, but this is extreme.

### [poetry2nix](https://github.com/nix-community/poetry2nix)

I initially thought poetry2nix was only for packages already using
[Poetry](https://python-poetry.org), so I overlooked it in favour of trying out
other Python-to-Nix tools. Reading a bit more into it, I realised I could
repackage syncserver myself.

I add `python3Packages.poetry` to my dev shell, and go through the `poetry init`
process, requiring python `2.7`. But when trying to `poetry add` some
dependencies, it complains about the current Python version not being `2.7`. It's
a bit confusing that a build tool needs to run on the version the end
application uses, but I suppose it may need to evaluate the `setup.py` script.
This initially wasn't easy: [poetry2nix doesn't officially support Python
2](https://github.com/nix-community/poetry2nix/issues/560) because `poetry` is
marked as broken for Python 2 in nixpkgs.

After running `poetry env use python2.7`, it complains about my current version
`2.7.xx` not matching `2.7`. Of course, I need to make the version requirement
fuzzy by instead specifying `^2.7`. After the adjustment, `poetry env use
python2.7` works.

`poetry add` got a bit further now, but started complaining about the explicit
version requirements conflicting with transitive dependencies. To my horror, it
turns out `pip` doesn't do proper dependency resolution: apparently it simply
accepts the first version of a package that is requested. Naturally, Poetry
[doesn't provide an escape-hatch for this
situation](https://github.com/python-poetry/poetry/issues/697), making it a pain
to port packages relying on this broken `pip` behaviour to Poetry.
My solution was to include and patch a nested copy of the broken
dependency. This could probably be replaced with the usual Nix source patching
mechanism along with a poetry2nix override.

Before working out how poetry2nix overrides worked, I tried manually adjusting
the `poetry.lock` file to change the source of another broken dependency,
[umemcache](https://github.com/esnme/ultramemcache), which has unreleased fixes.

Annoyingly, I hit a Poetry bug [which interfered with
poetry2nix](https://github.com/nix-community/poetry2nix/issues/701), where
dependency metadata wasn't being specified, which is what poetry2nix relies on
to make all the dependency fetching pure. I fixed this by updating nixpkgs with
`nix flake update`. Unfortunately this introduced more issues, including an [infinite
recursion error](https://github.com/nix-community/poetry2nix/issues/648), and a
large number of packages requiring `setuptools` to be added to their
dependencies. I managed to work around both with a tedious number of poetry2nix
overrides.

Another limitation I hit was Poetry [not locking URL
dependencies](https://github.com/python-poetry/poetry/issues/2060), causing
confusing errors about hashes being required when building with poetry2nix. The
fix is to override the `src` of the offending packages, repeating the URL and
providing the correct hash.
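As a rough sketch of what these overrides look like, assuming the `poetry2nix` attribute that nixpkgs provided at the time, and using made-up package names, a made-up URL and a placeholder hash:

```nix
{ pkgs }:

pkgs.poetry2nix.mkPoetryApplication {
  projectDir = ./.;
  python = pkgs.python2;

  # Extend poetry2nix's built-in overrides with project-specific ones
  overrides = pkgs.poetry2nix.overrides.withDefaults (self: super: {
    # Hypothetical package that fails to build without setuptools
    some-package = super.some-package.overridePythonAttrs (old: {
      buildInputs = (old.buildInputs or [ ]) ++ [ self.setuptools ];
    });

    # Hypothetical URL dependency that Poetry did not lock with a hash:
    # repeat the URL here and pin it manually.
    url-package = super.url-package.overridePythonAttrs (old: {
      src = pkgs.fetchurl {
        url = "https://example.org/url-package-1.0.tar.gz";
        # Placeholder only; the hash mismatch error from the first build
        # attempt reports the real value to paste in.
        sha256 = pkgs.lib.fakeSha256;
      };
    });
  });
}
```

Each broken dependency gets its own entry in the override set, which is where the tedium comes from.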

After this, I finally had a build-able package. I then used the `dependencyEnv`
attribute to get a store path containing all the dependencies, including
Gunicorn, which my service needs to execute. I hit a collision between some
files, so I had to set `ignoreCollisions = true;`:

    Collision between backports-functools-lru-cache and python2.7-configparser: python2.7/site-packages/backports/__init__.pyc

### Configuring the service

From there, I amended my NixOS module for firefox-syncserver until it was up and
running. Some fun problems:

Nix: confusing errors when generating the config file with `format.generate` and
`recursiveUpdate`. I've forgotten the details of what caused this.

Gunicorn: fun trial and error with the systemd service hardening, additionally
needed the `@chown` and `@setuid` permissions, for changing the owner of a file
in `/tmp`, and setting the user for worker processes respectively.

Gunicorn: having no logging by default! And making it tedious to configure
logging.

Syncserver: forgetting the `https://` in the `public_url` config option causes
the annoying mismatch between `public_url` and the origin URL received by
Gunicorn. The error message also unhelpfully duplicates the received URL,
causing me to confuse the issue for a bug in Nginx or Gunicorn itself.
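The Gunicorn hardening adjustment mentioned above might look something like the following. This is an illustrative sketch rather than the exact config: it assumes the unit is named `firefox-syncserver`, that `@chown` and `@setuid` refer to systemd's system call filter groups, and that the hardening started from an allow-list which excluded `@privileged` calls.

```nix
{
  systemd.services.firefox-syncserver.serviceConfig = {
    # Entries are applied in order: start from the common allow-list,
    # remove privileged calls, then re-allow the two groups Gunicorn needs.
    SystemCallFilter = [
      "@system-service"
      "~@privileged"
      "@chown" # changing the owner of a file in /tmp
      "@setuid" # setting the user for worker processes
    ];
  };
}
```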

<!--
Pros and cons

Searching for packages in the nixpkgs used as an input to a flake:

$ nix search --inputs-from .# nixpkgs <package>
-->

## My website

I use a fairly simple deployment method for my website: `$ git push` with a
[post-receive
hook](https://git.sr.ht/~williamvds/website/tree/master/item/post-receive) on
the receiving end, which rebuilds the website using
[Zola](https://www.getzola.org) (and [Graphviz](https://graphviz.org) for some
diagrams).

This introduces a problem to the Nix system configuration for the webserver:
cloning a Git repository and updating it separately is an impure operation. It
doesn't make much sense to wrap the website in a Nix package if I want to be
able to painlessly update the website separately to the system it runs on. 
However, I would like the NixOS configuration to include the initial setup of
the repository (cloning it, and setting up that post-receive hook).

While searching for potential methods of implementing this, I came across
[`system.activationScripts`](https://nixos.org/manual/nixos/stable/options.html#opt-system.activationScripts)
in other people's configurations. This seems to fit the bill - it allows you to
specify arbitrary scripts that run on boot and `nixos-rebuild`. I created a
script that clones the repository to an appropriate location, and installs a
wrapped version of the post-receive hook.
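An activation script along these lines could look roughly like the sketch below. The repository URL, the target directory and the `./post-receive` file are all hypothetical stand-ins, and the hook here is installed as-is rather than wrapped.

```nix
{ pkgs, ... }:

let
  # Hypothetical locations, for illustration only
  repoUrl = "https://example.org/website.git";
  repoDir = "/var/lib/website/repo.git";
in
{
  system.activationScripts.website.text = ''
    # Only clone on the first activation; later updates arrive via `git push`
    if [ ! -d ${repoDir} ]; then
      ${pkgs.git}/bin/git clone --bare ${repoUrl} ${repoDir}
    fi

    # (Re)install the hook on every activation so it follows the configuration
    ${pkgs.coreutils}/bin/install -D -m 0755 ${./post-receive} ${repoDir}/hooks/post-receive
  '';
}
```

Commands are referenced by their store paths because activation scripts run with a minimal `$PATH`. One caveat of this sketch: the network may not be up when the script runs on boot, in which case the clone would fail and be retried on the next activation.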

# Deploying

I used [nixos-infect](https://github.com/elitak/nixos-infect) as a cloud-init
script when allocating a VM on Hetzner (with Ubuntu as the base OS).
It works perfectly, and after a reboot the new VM claims it is NixOS:

    $ cat /etc/os-release
    BUG_REPORT_URL="https://github.com/NixOS/nixpkgs/issues"
    BUILD_ID="22.05.3475.ed9b904c5eb"
    DOCUMENTATION_URL="https://nixos.org/learn.html"
    HOME_URL="https://nixos.org/"
    ID=nixos
    LOGO="nix-snowflake"
    NAME=NixOS
    PRETTY_NAME="NixOS 22.05 (Quokka)"
    SUPPORT_URL="https://nixos.org/community.html"
    VERSION="22.05 (Quokka)"
    VERSION_CODENAME=quokka
    VERSION_ID="22.05"

Then I attempted to deploy my system configuration with `nixos-rebuild`:

    $ nixos-rebuild switch --target-host atlas2 --flake .#webserver
    error:
           Failed assertions:
           - The ‘fileSystems’ option does not specify your root file system.
           - You must set the option ‘boot.loader.grub.devices’ or 'boot.loader.grub.mirroredBoots' to make the system bootable.

Probably missing the `/etc/nixos/hardware-configuration.nix`. Copied it from the
VM and included it in the system config.

Another error:

    error:
           Failed assertions:
           - You must set the option ‘boot.loader.grub.devices’ or 'boot.loader.grub.mirroredBoots' to make the system bootable.

Weird that nixos-infect doesn't sort out the bootloader for you. I manually set
`boot.loader.grub.device = "/dev/sda";` and that sorted that.

The next deploy worked, but there were a couple of errors, because I forgot to
update the sops keys.

I also messed up and set `PermitRootLogin no` in the SSH config without setting
up a user with `sudo` access, so I rolled back over the existing SSH connection:

    $ nixos-rebuild switch --rollback

Glad to see that switching works without rebooting! It even disables & stops
services, deletes groups, and all that stuff.

While sorting this out, it was annoying having to swap my local SSH config
back-and-forth between root and a normal user.

Found the `--use-remote-sudo` option for `nixos-rebuild`, but this seems to
require passwordless sudo. I set that up with the following options:

```nix
users.users.<user>.extraGroups = [ "wheel" ];
security.sudo.wheelNeedsPassword = false;
```

Re-did the deploy with `--use-remote-sudo` and it had the same result, so I'm
taking it as done.

Some strange errors on the next rebuild:

    error: cannot add path '/nix/store/1rhrxs9n736a4f3gqlqmi211xnhi10ka-nextcloud-config.php' because it lacks a valid signature

From [this issue](https://github.com/NixOS/nix/issues/2127), it appears to be
caused by `nixos-rebuild` copying things from my local store to the server's
store - by default copying is apparently only allowed from `cache.nixos.org`.

At this point I messed up the SSH and sudo permissions and was locked out of
root. I also didn't have a root password, so I went through Hetzner rescue. The
lack of a normal `$PATH` made it somewhat tedious to run the commands needed,
including `bash` and `passwd`. I had to search the store for them.

The "valid signature" issue turned out to be due to swapping to the normal user
with `--use-remote-sudo`. Apparently one needs to tweak
[`nix.settings.trusted-users`](https://search.nixos.org/options?channel=unstable&from=0&size=50&sort=relevance&type=packages&query=nix.settings.trusted-users)
to include any users that are able to `nixos-rebuild`. To resolve this, I added
`@wheel` to this setting, and ran `nixos-rebuild` from the machine itself to
work around the issue. I added this to [the wiki
page](https://nixos.wiki/index.php?title=Nixos-rebuild) to highlight the problem
before it occurs.
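The resulting setting is short. A minimal sketch, assuming the deploying user is a member of `wheel`:

```nix
{
  # Allow root and members of the wheel group to copy unsigned store paths
  # to this machine, which `nixos-rebuild --target-host` relies on.
  nix.settings.trusted-users = [ "root" "@wheel" ];
}
```

Bear in mind that trusted users are effectively granted root-level control over the Nix daemon, so this is best limited to accounts that can already `sudo`.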

## Performing the migration

At this point I had a fully functional NixOS system, reproducing just about
everything I needed from the old server! As a temporary measure, I replicated my
normal website DNS entries but with a prefix: `new.williamvds.me` - this was
quite easy since I'd made the domain name a Nix option.

I followed some standard data migration procedures, copying over my Nextcloud
data and PostgreSQL database. Things generally "just worked", though I noticed a
few odd things that needed minor adjustments to options.

Some testing ensued: I made sure all the new services worked by logging into
them on my devices. After some more hassle with firefox-syncserver, I was fairly
confident all was in order, so I migrated across by editing the normal DNS
entries to point to the new server's IP address.