~magic_rb/website

1646cb84bd1998479b1357783a815bad029e844c — Magic_RB 5 months ago 1c13d31
Add SearX blog post

Signed-off-by: Magic_RB <magic_rb@redalder.org>
2 files changed, 481 insertions(+), 0 deletions(-)

A blog/packaging-searx/.gitignore
A blog/packaging-searx/part1.org
A blog/packaging-searx/.gitignore => blog/packaging-searx/.gitignore +2 -0
@@ 0,0 1,2 @@
searx
searx-nix

A blog/packaging-searx/part1.org => blog/packaging-searx/part1.org +479 -0
@@ 0,0 1,479 @@
#+title: Packaging Searx - Part 1
#+date: <2022-07-24 Sun>

#+begin_src shell :exports none :results none
  git clone https://github.com/searx/searx
#+end_src

#+begin_src gitignore :exports none :tangle .gitignore
  searx
  searx-nix
#+end_src

In this N part blog post series, I'll show you the exact process of packaging [[https://github.com/searx/searx][Searx]], a metasearch engine. Here's an excerpt from Searx's readme to shine a bit of light on what we'll be packaging.

#+begin_quote
Searx is a free internet metasearch engine which aggregates results from more than 70 search services. Users are neither tracked nor profiled. Additionally, searx can be used over Tor for online anonymity.
#+end_quote

So if you're a privacy nerd or want to ensure Google doesn't know what you're cooking tonight, read on and you'll learn how Searx works from a system administrator and packager perspective.

Searx is already packaged in nixpkgs, but for the sake of this blog post, let's pretend it isn't. I'll go over all the things I check, verify and all the things I do when packaging. So I'll quit mumbling and start Nix-ing!

* Discovery

First, it's imperative that we find the upstream repo we'll be working with. It may sound simple enough, and in the case of Searx it luckily is, but it can also be challenging: it all depends on how well-known the project is and how unique its name is. My recommendation is to use a search engine and search for ~searx git~ in this case, which gets us [[https://github.com/searx/searx]].

Now that we have a link to the repo, we need to identify the language and, in the case of some languages, the build system. There are several ways to do this. One is to just look at the root of the repo and look for a few recognizable files. I'll leave an incomplete table below.

| files / directories               | language / build system |
|-----------------------------------+-------------------------|
| Cargo.toml, Cargo.lock            | Rust - cargo            |
| requirements.txt, setup.py        | Python 2/3              |
| CMakeLists.txt                    | C, C++ - cmake          |
| meson.build                       | C, C++ - meson          |
| composer.json, composer.lock      | PHP - composer          |
| package.json, package-lock.json   | Node - npm              |
| package.json, yarn.lock           | Node - yarn             |
| *.cabal, stack.yaml, package.yaml | Haskell - stack/cabal   |

I won't list tools to use when packaging these different languages, because the recommended set changes often and I'd have to keep this blog post up to date :), but it's easy enough to search for them. Generally, searching for ~<package-manager>2nix~ is a good starting point.

#+begin_note dream2nix
  [[https://github.com/nix-community/dream2nix][dream2nix]] is a new and shiny thing, I personally haven't used it yet and won't use it in this blog post, but do keep it in mind and check whether it's relevant to your project the next time you're packaging.
#+end_note

Looking at the repository we see a ~requirements.txt~ and a ~setup.py~. The first one is valuable because we *should* have a list of all Python packages we need, and the second we need to keep in mind, since it contains custom arbitrary Python that we may need to inspect and fix.

#+begin_src fundamental
  certifi==2022.5.18.1
  babel==2.9.1
  flask-babel==2.0.0
  flask==2.1.1
  jinja2==3.1.2
  lxml==4.9.0
  pygments==2.8.0
  python-dateutil==2.8.2
  pyyaml==6.0
  httpx[http2]==0.23.0
  Brotli==1.0.9
  uvloop==0.16.0; python_version >= '3.7'
  uvloop==0.14.0; python_version < '3.7'
  httpx-socks[asyncio]==0.7.4
  langdetect==1.0.9
  setproctitle==1.2.2
#+end_src

It's also worth looking at the ~Dockerfile~ and any ~Makefile~, ~Justfile~, or ~scripts~ folder. Here we have a ~Dockerfile~ and also a ~Makefile~, lucky! Let's start with the ~Dockerfile~, I'll pick out the important bits only.

#+begin_src dockerfile
  FROM alpine:3.15
#+end_src

A pretty crucial piece of information here: we now know the distro the container uses, so we can discern the environment a bit, and we know that Searx will happily run on musl libc.

#+begin_src dockerfile
  ENTRYPOINT ["/sbin/tini","--","/usr/local/searx/dockerfiles/docker-entrypoint.sh"]
#+end_src

Here we see where we should look for the startup script.

#+begin_src dockerfile
  ENV INSTANCE_NAME=searx \
      AUTOCOMPLETE= \
      BASE_URL= \
      MORTY_KEY= \
      MORTY_URL= \
      SEARX_SETTINGS_PATH=/etc/searx/settings.yml \
      UWSGI_SETTINGS_PATH=/etc/searx/uwsgi.ini
#+end_src

Here we have an *incomplete* list of arguments we can pass into the Docker container. It's important to notice later where they're handled: in the scripting or in the actual program itself?

#+begin_src shell
  apk add --no-cache -t build-dependencies \
    build-base \
    py3-setuptools \
    python3-dev \
    libffi-dev \
    libxslt-dev \
    libxml2-dev \
    openssl-dev \
    tar \
    git \
#+end_src

Here we see a list of packages installed with apk, but you (and me actually) may not know what ~-t build-dependencies~ does. It's best to look at the manpage for ~apk add~, so search for ~apk-add man~. According to [[https://www.mankier.com/8/apk-add]], ~-t~ adds a virtual package with the dependencies listed on the command line and then installs that package. So we have one package, ~build-dependencies~, containing a set of packages we need at build time.

#+begin_src shell
  apk add --no-cache \
    ca-certificates \
    su-exec \
    python3 \
    py3-pip \
    libxml2 \
    libxslt \
    openssl \
    tini \
    uwsgi \
    uwsgi-python3 \
    brotli \
#+end_src

Next we have a list of packages needed at runtime, this one is really important to remember since we may have to add these in a special way later. You'll see what I mean.

#+begin_src shell
  pip3 install --upgrade pip wheel setuptools \
#+end_src

Then it upgrades ~pip~, ~wheel~, and ~setuptools~. I personally had to look up what ~wheel~ is. But looking at [[https://pkgs.alpinelinux.org/packages?name=*wheel*&branch=edge][Alpine Linux packages]] yields no results, so let's just ignore it for now. If it doesn't come up later it's not important.

#+begin_src shell
 pip3 install --no-cache -r requirements.txt \
#+end_src

Second to last, it installs the packages specified in ~requirements.txt~, as expected.

#+begin_src shell
  apk del build-dependencies \
  && rm -rf /root/.cache
#+end_src

And lastly it does some cleanup. Which is interesting, because I expected those dependencies to be used later by some custom searx native component, but I guess it makes sense they're not.

#+begin_src dockerfile
  COPY searx ./searx
  COPY dockerfiles ./dockerfiles
#+end_src

We now see where that startup script comes from.

#+begin_src dockerfile
  RUN /usr/bin/python3 -m compileall -q searx; \
      touch -c --date=@${TIMESTAMP_SETTINGS} searx/settings.yml; \
      touch -c --date=@${TIMESTAMP_UWSGI} dockerfiles/uwsgi.ini; \
      if [ ! -z $VERSION_GITCOMMIT ]; then\
        echo "VERSION_STRING = VERSION_STRING + \"-$VERSION_GITCOMMIT\"" >> /usr/local/searx/searx/version.py; \
      fi; \
      find /usr/local/searx/searx/static -a \( -name '*.html' -o -name '*.css' -o -name '*.js' \
      -o -name '*.svg' -o -name '*.ttf' -o -name '*.eot' \) \
      -type f -exec gzip -9 -k {} \+ -exec brotli --best {} \+
#+end_src

This is a complicated little beast. We see ~searx/settings.yml~, ~dockerfiles/uwsgi.ini~ and ~/usr/local/searx/searx/version.py~. We also see that it compiles all the Python files, but that will be taken care of by nixpkgs. Interestingly, it also pre-compresses all the static assets: the find command looks for all files ending in ~.html~, ~.css~, ~.js~, ~.svg~, ~.ttf~ and ~.eot~, then runs ~gzip -9 -k~ and ~brotli --best~ on them. (Here I again had to search for what brotli is; it looks to be a [[https://github.com/google/brotli][compression scheme]].)
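To get a feel for what that ~find~ invocation does, here is a small sketch run against a throwaway directory. The file names and contents are made up, and I left ~brotli~ out since it may not be installed; only the ~gzip~ half is shown.

#+begin_src shell
  # Throwaway directory standing in for searx/static (hypothetical contents).
  static_dir="$(mktemp -d)"
  printf 'body { color: black; }\n' > "$static_dir/style.css"
  printf 'console.log("hi");\n' > "$static_dir/app.js"
  printf 'not an asset\n' > "$static_dir/notes.txt"

  # Same shape as the Dockerfile's command, minus brotli: group the -name
  # tests, restrict to regular files, compress and keep the originals (-k).
  find "$static_dir" \( -name '*.css' -o -name '*.js' \) -type f -exec gzip -9 -k {} +

  ls "$static_dir"
  compressed_count="$(find "$static_dir" -name '*.gz' | wc -l | tr -d ' ')"
  original_count="$(find "$static_dir" \( -name '*.css' -o -name '*.js' \) | wc -l | tr -d ' ')"
  rm -rf "$static_dir"
#+end_src

Every matching asset ends up with a ~.gz~ sibling next to it while ~notes.txt~ is left alone, presumably so that the web server can hand out the pre-compressed copy without compressing on every request.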

That's all from the Dockerfile. Now we need to look at the script it calls.

** ~docker-entrypoint.sh~ script

#+begin_src shell
  printf "\nEnvironment variables:\n\n"
  printf "  INSTANCE_NAME settings.yml : general.instance_name\n"
  printf "  AUTOCOMPLETE  settings.yml : search.autocomplete\n"
  printf "  BASE_URL      settings.yml : server.base_url\n"
  printf "  MORTY_URL     settings.yml : result_proxy.url\n"
  printf "  MORTY_KEY     settings.yml : result_proxy.key\n"
  printf "  BIND_ADDRESS  uwsgi bind to the specified TCP socket using HTTP protocol. Default value: \"${DEFAULT_BIND_ADDRESS}\"\n"
#+end_src

That's a nice little rundown of the supported configuration options and also that Searx is configured with ~settings.yml~, this knowledge will come in handy when we're writing the NixOS module for Searx.

#+begin_src shell
  # update settings.yml
  sed -i -e "s|base_url : False|base_url : ${BASE_URL}|g" \
     -e "s/instance_name : \"searx\"/instance_name : \"${INSTANCE_NAME}\"/g" \
     -e "s/autocomplete : \"\"/autocomplete : \"${AUTOCOMPLETE}\"/g" \
     -e "s/ultrasecretkey/$(openssl rand -hex 32)/g" \
     "${CONF}"
#+end_src

This command confirms that we are in fact dealing with a ~settings.yml~.
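Here's a sketch of those substitutions applied to a tiny sample file. The YAML excerpt and all the values are made up for illustration, I use a fixed secret instead of ~openssl rand~, and I write to a second file instead of using ~sed -i~ so the before and after can be compared.

#+begin_src shell
  # Hypothetical excerpt of a reference settings.yml, reduced to the patched keys.
  conf="$(mktemp)"
  printf '%s\n' \
      'general:' \
      '    instance_name : "searx"' \
      'search:' \
      '    autocomplete : ""' \
      'server:' \
      '    base_url : False' \
      '    secret_key : "ultrasecretkey"' > "$conf"

  # Stand-ins for the container's environment variables (made-up values).
  BASE_URL="https://search.example.org/"
  INSTANCE_NAME="my-searx"
  AUTOCOMPLETE="duckduckgo"
  SECRET="0123abcd" # fixed here; the real script uses $(openssl rand -hex 32)

  # The base_url expression uses | as the delimiter because the URL contains slashes.
  sed -e "s|base_url : False|base_url : ${BASE_URL}|g" \
      -e "s/instance_name : \"searx\"/instance_name : \"${INSTANCE_NAME}\"/g" \
      -e "s/autocomplete : \"\"/autocomplete : \"${AUTOCOMPLETE}\"/g" \
      -e "s/ultrasecretkey/${SECRET}/g" \
      "$conf" > "$conf.patched"

  patched="$(cat "$conf.patched")"
  echo "$patched"
  rm -f "$conf" "$conf.patched"
#+end_src

Note that the patterns only match if the reference file spells the defaults exactly like this, spaces around the colon included. That's worth remembering, because a NixOS module that generates the whole file wouldn't need this kind of text patching at all.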

#+begin_src shell
  sed -i -e "s/image_proxy : False/image_proxy : True/g" \
              "${CONF}"
  cat >> "${CONF}" <<-EOF

  # Morty configuration
  result_proxy:
     url : ${MORTY_URL}
     key : !!binary "${MORTY_KEY}"
  EOF
#+end_src

This bit is interesting, I initially thought that the script updates the existing config with new values, but the code block above would mean that on every restart a new ~result_proxy~ block would be added. Which means that it must take a default config, write your settings in and replace the current one with that.
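The duplication is easy to reproduce. A minimal sketch, where ~append_morty_block~ is a made-up stand-in for the Morty part of the script and the URL is an arbitrary example value:

#+begin_src shell
  conf="$(mktemp)"
  printf 'general:\n    instance_name : "searx"\n' > "$conf"

  # Hypothetical stand-in for the Morty part of the entrypoint script.
  append_morty_block() {
      printf '\n# Morty configuration\nresult_proxy:\n   url : %s\n' "$2" >> "$1"
  }

  # Simulate two container starts patching the same file in place.
  append_morty_block "$conf" "http://localhost:3000/"
  append_morty_block "$conf" "http://localhost:3000/"

  block_count="$(grep -c '^result_proxy:' "$conf")"
  echo "result_proxy blocks after two starts: $block_count"
  rm -f "$conf"
#+end_src

Two starts leave two ~result_proxy~ blocks behind, so patching can only be safe when it runs on a fresh copy of the reference config.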

It's common to realize things like this. It's unusual to get all assumptions right initially, but when you go further into the package, you'll naturally stumble upon issues caused by your assumptions. Just make sure you remember what you know and what you assume.

#+begin_src bash
  if [ -f "${CONF}" ]; then
      if [ "${REF_CONF}" -nt "${CONF}" ]; then
          # There is a new version
          if [ $FORCE_CONF_UPDATE -ne 0 ]; then
              # Replace the current configuration
              printf '⚠️  Automaticaly update %s to the new version\n' "${CONF}"
              if [ ! -f "${OLD_CONF}" ]; then
                  printf 'The previous configuration is saved to %s\n' "${OLD_CONF}"
                  mv "${CONF}" "${OLD_CONF}"
              fi
              cp "${REF_CONF}" "${CONF}"
              $PATCH_REF_CONF "${CONF}"
          else
              # Keep the current configuration
              printf '⚠️  Check new version %s to make sure searx is working properly\n' "${NEW_CONF}"
              cp "${REF_CONF}" "${NEW_CONF}"
              $PATCH_REF_CONF "${NEW_CONF}"
          fi
      else
          printf 'Use existing %s\n' "${CONF}"
      fi
  else
      printf 'Create %s\n' "${CONF}"
      cp "${REF_CONF}" "${CONF}"
      $PATCH_REF_CONF "${CONF}"
  fi
#+end_src

When you encounter such an ugly piece of code, you don't need to understand it fully; the general gist of it is more than enough. At a glance we see that configuration is based on a reference config, which is then patched to produce the final config.
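Stripped down to that gist, the logic looks roughly like the sketch below. It's a simplification under my own assumptions: ~sync_conf~ is a made-up name, there's no backup, no force flag and no patch step, and it relies on the ~-nt~ (newer than) test, which bash, dash and busybox all support.

#+begin_src shell
  # Simplified, hypothetical version of the update logic: no backups, no force flag.
  sync_conf() {
      ref_conf="$1"
      conf="$2"
      if [ ! -f "$conf" ]; then
          echo "create"
          cp "$ref_conf" "$conf"
      elif [ "$ref_conf" -nt "$conf" ]; then
          echo "reference is newer"
          cp "$ref_conf" "$conf"
      else
          echo "use existing"
      fi
  }

  work_dir="$(mktemp -d)"
  echo "reference v1" > "$work_dir/ref.yml"

  first="$(sync_conf "$work_dir/ref.yml" "$work_dir/settings.yml")"
  second="$(sync_conf "$work_dir/ref.yml" "$work_dir/settings.yml")"

  # Pretend the settings file is old, so the reference counts as newer.
  touch -t 200001010000 "$work_dir/settings.yml"
  third="$(sync_conf "$work_dir/ref.yml" "$work_dir/settings.yml")"

  echo "$first / $second / $third"
  rm -rf "$work_dir"
#+end_src

The three calls walk through the three branches: the config is created, then reused, then refreshed once the reference is newer.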

#+begin_src shell
  # make sure there are uwsgi settings
  update_conf ${FORCE_CONF_UPDATE} "${UWSGI_SETTINGS_PATH}" "/usr/local/searx/dockerfiles/uwsgi.ini" "patch_uwsgi_settings"

  # make sure there are searx settings
  update_conf "${FORCE_CONF_UPDATE}" "${SEARX_SETTINGS_PATH}" "/usr/local/searx/searx/settings.yml" "patch_searx_settings"
#+end_src

Looking at the call sites, we see both the reference config file paths and the functions used for patching.

#+begin_src shell
  patch_uwsgi_settings() {
      CONF="$1"

      # Nothing
  }
#+end_src

Interestingly the ~uwsgi~ config doesn't get patched, so the reference one should be fine in most cases.

#+begin_src shell
  exec su-exec searx:searx uwsgi --master --http-socket "${BIND_ADDRESS}" "${UWSGI_SETTINGS_PATH}"
#+end_src

And finally we see the command used to actually launch Searx.

** What is ~uwsgi~

I once again had to look this up. According to Wikipedia it's similar to CGI, if you're familiar with that. If not then, well, it's used to allow web servers like Nginx to serve arbitrary scripts in arbitrary languages. So ~client -> Nginx -> uwsgi -> Python backend~.

*** Aren't we missing a full webserver?

#+begin_quote
uWSGI natively speaks HTTP, FastCGI, SCGI and its specific protocol named “uwsgi”
#+end_quote

No, uwsgi can serve as a lightweight webserver. So ideally in the NixOS module we'd support all methods, HTTP, FastCGI, SCGI and uwsgi, but that's something to worry about later.

* Packaging

Now that we know all there is to know from the Docker image and related files, we can start writing Nix expressions. First, let's quickly create a new repository. We'll start with a Flake, since it's easier and can be easily ported to nixpkgs if done right.

#+begin_src shell :results none
  git init searx-nix
#+end_src

#+begin_src nix :tangle searx-nix/flake.nix
  {
    inputs.nixpkgs.url = "github:NixOS/nixpkgs";

    outputs =
      {
        self,
        nixpkgs
      }:
      let
        supportedSystems = [ "x86_64-linux" ];
        forAllSystems' = nixpkgs.lib.genAttrs;
        forAllSystems = forAllSystems' supportedSystems;

        pkgsForSystem =
          system:
          import nixpkgs { inherit system; };
      in
        {
          packages = forAllSystems
            (system:
              let
                pkgs = pkgsForSystem system;
              in
                {
                  default = pkgs.callPackage ./searx.nix {};
                }
            );
        };
  }
#+end_src

We then create a tiny ~flake.nix~. The cruft around it is generic and not really important; the important bit is src_nix{pkgs.callPackage ./searx.nix {}}, which ensures that our actual package doesn't care whether it's in a flake or not.

Looking up ~nixpkgs python~ gets us to the nixpkgs manual (the information is both in the official one and ryantm's, but the latter is better since it isn't one huge HTML page): [[https://ryantm.github.io/nixpkgs/languages-frameworks/python/][ryantm's nixpkgs manual]].

#+begin_src nix
  { lib, python3 }:

  python3.pkgs.buildPythonApplication rec {
    pname = "luigi";
    version = "2.7.9";

    src = python3.pkgs.fetchPypi {
      inherit pname version;
      sha256 = "035w8gqql36zlan0xjrzz9j4lh9hs0qrsgnbyw07qs7lnkvbdv9x";
    };

    propagatedBuildInputs = with python3.pkgs; [ tornado python-daemon ];

    meta = with lib; {
      ...
    };
  }
#+end_src

As an example we're given a derivation for luigi. I don't know and don't need to know what luigi is. To speed up packaging, it's important to ignore irrelevant information and not research it.

Based on the example derivation we can build our own. Instead of ~python3.pkgs.fetchPypi~ we're going to use ~fetchFromGitHub~ as that's more universal and easier to work with.

#+begin_src nix :tangle searx-nix/searx.nix
  {
    lib,
    python3,
    fetchFromGitHub
  }:
  with lib;
  let
    pname = "searx";
    version = "1.0.0";
  in
  python3.pkgs.buildPythonApplication {
    inherit pname version;

    src = fetchFromGitHub {
      rev = version;
      repo = pname;
      owner = pname;
      # If you update the version, you need to switch back to ~lib.fakeSha256~ and copy the new hash
      sha256 = "sha256-sIJ+QXwUdsRIpg6ffUS3ItQvrFy0kmtI8whaiR7qEz4="; # lib.fakeSha256;
    };

    postPatch = ''
      sed -i 's/==.*$//' requirements.txt
    '';

    # tests try to connect to network
    doCheck = false;

    pythonImportsCheck = [ "searx" ];

    # Since Python is weird, we need to put any dependencies we know of here
    # and not into ~buildInputs~ or ~nativeBuildInputs~ as one might expect.
    # As a starting point, just copy everything from ~requirements.txt~ and
    # hope for the best.
    propagatedBuildInputs = with python3.pkgs;
      [
        certifi
        babel
        flask-babel
        flask
        jinja2
        lxml
        pygments
        python-dateutil
        pyyaml
        # httpx[http2]
        httpx
        brotli
        # uvloop==0.16.0; python_version >= '3.7'
        # uvloop==0.14.0; python_version < '3.7'
        uvloop
        # httpx-socks[asyncio]
        httpx-socks
        langdetect
        setproctitle

        # sometimes the packages in ~requirements.txt~ may not be enough, so if something is missing, just add it
        requests
      ];

    meta = with lib; {
      # You'll fill this in later when upstreaming to nixpkgs
    };
  }
#+end_src

#+begin_src shell :results none :exports none
  git -C searx-nix add searx.nix flake.nix
#+end_src

#+begin_note clarifications
Let me just clarify a few things.

#+begin_src nix
  {
    cmake,
    gnumake,
    gcc
  }:
#+end_src

That pattern works because Nix has a special builtin, ~builtins.functionArgs~, which allows one to inspect a function that takes an attribute set and get back the names of all its arguments. ~callPackage~ then uses those names to call the function with your requested packages.

#+begin_src nix
  {
    # Explicit list of the dependencies we want passed in
    deps =
      [
        "cmake"
        "gnumake"
        "gcc"
      ];
    fn =
      {
        cmake,
        gnumake,
        gcc
      }:
      # the actual derivation would go here
      null;
  }
#+end_src

The above would also work, but we like conciseness.

Lastly, you may ask what's up with the ~lib.fakeSha256~. Well, it's a placeholder hash (a string of zeros) which stands for /I don't know yet/. The point is that when Nix downloads the source code and checks the hash, it won't match, therefore it will print out the one you gave it and the one it calculated. You can then replace ~lib.fakeSha256~ with the actual hash.
#+end_note

At this point I looked at the already existing derivation, because I was curious.

#+begin_src nix
  # tests try to connect to network
  doCheck = false;

  pythonImportsCheck = [ "searx" ];

  postPatch = ''
    sed -i 's/==.*$//' requirements.txt
  '';
#+end_src

The src_nix{doCheck = false} is there by experimentation. I didn't know what src_nix{pythonImportsCheck = [ "searx" ]} does, so I looked around. I first went to [[https://github.com/NixOS/nixpkgs][nixpkgs]], clicked on ~Go to file~, searched for ~python~ and then went to ~pkgs/top-level/python-packages.nix~. Inspecting the file, on line 41 I found the definition of ~buildPythonPackage~; ~buildPythonApplication~ is defined next to it, on top of the same file.

#+begin_src nix
  buildPythonPackage = makeOverridablePythonPackage (lib.makeOverridable (callPackage ../development/interpreters/python/mk-python-derivation.nix {
    inherit namePrefix;     # We want Python libraries to be named like e.g. "python3.6-${name}"
    inherit toPythonModule; # Libraries provide modules
  }));
#+end_src

This points to a file called ~mk-python-derivation.nix~, so again, ~Go to file~. [[https://github.com/NixOS/nixpkgs/blob/nixos-22.05/pkgs/development/interpreters/python/mk-python-derivation.nix][mk-python-derivation.nix]] tells us a lot, but still not what ~pythonImportsCheck~ does, it's only mentioned as ~pythonImportsCheckHook~, which prompted me to look for said hook. Going to the containing directory and into ~hooks/python-imports-check-hook.sh~ we can satiate our curiosity.

Lastly, the src_nix{postPatch = ''...''} is used to patch out the version constraints in ~requirements.txt~, which otherwise seem to cause an error at build time.
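To see what that ~sed~ expression actually does, here it is applied to three lines taken from the ~requirements.txt~ shown earlier. I pipe the lines in instead of editing a file, so the sketch runs anywhere.

#+begin_src shell
  # Three sample lines from requirements.txt, run through the same expression.
  stripped="$(printf '%s\n' \
      'flask==2.1.1' \
      'httpx[http2]==0.23.0' \
      "uvloop==0.16.0; python_version >= '3.7'" \
      | sed 's/==.*$//')"
  echo "$stripped"
#+end_src

Everything from ~==~ to the end of the line is dropped, so the extras in square brackets survive, but the ~python_version~ markers disappear as well and both ~uvloop~ lines collapse into a plain ~uvloop~.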

With all these things, we get a successful build.

In the next blog post we'll start with the NixOS module by first trying to actually get a full launch of Searx. Till then!