#K8s setup

As part of moving our infrastructure to Europe we are experimenting with Kubernetes. This repository is a non-reusable log of my steps to install it on our servers, and documentation of the current status (a moving target).

In the SourceHut spirit, the goal is to keep the setup as simple as possible. In its current form, all Kubernetes components are being run from Alpine packages. The only containers being pulled from the Internet right now are the HAProxy ingress controller and the ceph-csi controllers. And we hope to build those ourselves eventually, too.

#High level overview

We currently have three servers in the EU data-center: two compute servers, sakuya2.sr.ht and sakuya3.sr.ht, and a storage server, patchouli2.sr.ht.

We are aiming for the following production setup:

  • Etcd
    • Needs at least 3 nodes for HA, so runs on all nodes
  • Control plane
    • Comprised of kube-apiserver, kube-controller-manager, and kube-scheduler
    • Does not cluster itself and is essentially stateless
    • For now, running on both compute nodes will provide sufficient redundancy
  • Kubelets
    • The actual nodes of the Kubernetes cluster, running workloads
    • Comprised of kubelet, kube-proxy, container runtime (containerd), etc.
    • Initially, use just the two compute servers
    • If needed, the storage server could be added, with labels used so that it only runs very specific workloads (see the sketch after this list)
  • CoreDNS
    • The de facto standard for cluster DNS, so we will probably end up running it
    • We run it out-of-cluster from Alpine packages
    • Currently on both compute nodes for redundancy
    • Since it already runs out-of-cluster, it might also make for a good solution for internal (non-cluster) DNS
  • Ingress controller
    • Running the HAProxy ingress controller
    • As a daemon set (i.e. one instance on each cluster member)
    • Using the host network (exposing all configured ingresses to the internet, much like in the US setup)
    • An experiment running it out-of-cluster is planned
  • Routing
    • Is set up manually (see below)
  • Storage
    • The storage server provides remote block devices via Ceph
    • The compute servers could potentially provide X% of their SSD storage as remote block devices?
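
If the storage server does join the cluster at some point, restricting it to specific workloads could be done with an ordinary node label plus a taint. A minimal sketch; the label key node-role.sr.ht/storage is made up for illustration:

kubectl label nodes patchouli2 node-role.sr.ht/storage=true
kubectl taint nodes patchouli2 node-role.sr.ht/storage=true:NoSchedule

Workloads meant for the storage server would then carry a matching nodeSelector and toleration in their pod spec; the taint keeps everything else off the node.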

See here for a good reference on HA Kubernetes setups.

#System preparation

Make sure the testing repo is enabled (all Kubernetes packages are currently in testing). In /etc/apk/repositories:

http://dl-cdn.alpinelinux.org/alpine/edge/testing
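
With the testing repo enabled, the components referenced below can be installed straight from packages. Roughly (package names as found in current Alpine edge; the exact per-host set is what the Makefiles in this repo install, and names may change as the aports graduate out of testing):

apk add etcd containerd cni-plugins
apk add kube-apiserver kube-controller-manager kube-scheduler
apk add kubelet kube-proxy kubectl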

Remove the default nginx to free up the port for the ingress controller:

rc-update del nginx

Metrics are now scraped directly from the node exporter (see metrics.sr.ht), so nginx is no longer needed.

Load br_netfilter on boot, so that bridged pod traffic passes through iptables (which kube-proxy relies on):

printf "br_netfilter\n" >> /etc/modules

Enable IP forwarding:

printf "net.ipv4.ip_forward = 1\n" > /etc/sysctl.d/ip_forward.conf

Add entries for node names to /etc/hosts mapping to internal IPs. Make sure that a node's hostname is not mapped to localhost. Example:

127.0.0.1   localhost localhost.localdomain
::1         localhost localhost.localdomain
10.0.0.132  patchouli2
10.0.0.134  sakuya2
10.0.0.135  sakuya3

#Network

#Service network

Network:

  • 10.32.0.0/24

#Public virtual service IPs

We have to handle SSH traffic to multiple destinations. SSH is inherently hard to route (no SNI, host header, or such). Hence, we will need dedicated IPs for certain services (git, hg?, build runner). By far the simplest solution is to maintain a mapping by hand.

Each service should get at least two IPs for redundancy. Each IP is manually assigned to a cluster member. The range and numbering scheme are to be determined, but for example:

  • git.sr.ht
    • 46.23.81.200 (assigned to sakuya2)
    • 46.23.81.201 (assigned to sakuya3)
  • k8s.runners.sr.ht
    • 46.23.81.202 (assigned to sakuya2)
    • 46.23.81.203 (assigned to sakuya3)

DNS has to be configured manually. Each virtual IP has to be brought up on the host's main network interface, in addition to the host's own IP. A Kubernetes service can then be declared like this:

apiVersion: v1
kind: Service
metadata:
  name: buildsrht-ssh
spec:
  selector:
    app: buildsrht-ssh
  ports:
    - protocol: TCP
      port: 22
      targetPort: 22
  externalIPs:
    - 46.23.81.202
    - 46.23.81.203

This will cause kube-proxy to intercept and handle traffic destined for the specified IP/port pairs.
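
Bringing a virtual service IP up by hand looks like this (the /25 prefix matches the netmask used in the routing example below); the DNS side is simply an A record per assigned IP, and the persistent interface configuration is shown in the Routing section:

ip addr add 46.23.81.202/25 dev eth0   # on sakuya2; analogous on sakuya3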

#Pod network

Network:

  • 10.200.0.0/16

Subnets:

  • 10.200.132.0/24 (patchouli2.sr.ht/10.0.0.132)
  • 10.200.134.0/24 (sakuya2.sr.ht/10.0.0.134)
  • 10.200.135.0/24 (sakuya3.sr.ht/10.0.0.135)
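
Each host hands its pod subnet to the CNI plugin. Assuming the standard bridge and host-local plugins (the repo's actual CNI configuration may differ), the config on sakuya2 would look roughly like this, e.g. in /etc/cni/net.d/10-bridge.conf, with the bridge cni0 acting as gateway at 10.200.134.1:

{
  "cniVersion": "0.4.0",
  "name": "podnet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipam": {
    "type": "host-local",
    "ranges": [[{ "subnet": "10.200.134.0/24" }]],
    "routes": [{ "dst": "0.0.0.0/0" }]
  }
}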

#Routing

Each node must have the following configured:

  • Its host address
  • Any virtual service IPs the host should handle (see above)
  • A route to the service network via the cni0 interface (.1 of the host's pod network subnet)
  • A route to each other host's pod network subnet via that host's internal address

E.g. on sakuya2, in /etc/network/interfaces (assuming the virtual service IPs provided in the example above):

auto eth0
iface eth0 inet static
    hostname sakuya2
    address 46.23.81.134
    netmask 255.255.255.128
    gateway 46.23.81.129

iface eth0 inet static
    address 46.23.81.200
    netmask 255.255.255.128

iface eth0 inet static
    address 46.23.81.202
    netmask 255.255.255.128

auto eth1
iface eth1 inet static
    address 10.0.0.134
    netmask 255.255.255.0
    up ip route add 10.32.0.0/24 via 10.200.134.1
    up ip route add 10.200.132.0/24 via 10.0.0.132
    up ip route add 10.200.135.0/24 via 10.0.0.135
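
A quick way to sanity-check the routes once the interfaces are up (example on sakuya2):

ip route get 10.32.0.10      # should go via 10.200.134.1 (cni0)
ip route get 10.200.135.10   # should go via 10.0.0.135 (sakuya3)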

#Installation

I created this repo so that we have something to base changes/updates/discussions on. It's not very polished and files are scattered all over the place. However, if you look at the instructions below with links to the main Makefiles, you should be able to see what files go where.

#Generate and distribute secrets

Note: Do not run this, as it would overwrite existing keys installed on the hosts. It is mainly intended for reference.

  • Run make gen (generates everything that uses keys; keep the output safe)
  • Run make dist-secrets to push generated secrets to /root/k8s.sr.ht on all hosts

#Distribute files and install

This step can be repeated as often as needed. It will use the secrets distributed in the previous step.

  • Run make dist-files to distribute all files (minus secrets) to ~/k8s.sr.ht on all hosts
  • Host management: on any of the hosts, you can run make install in one of the following sub-directories of ~/k8s.sr.ht:
    • etcd to install etcd
    • kube-api to install the Kubernetes control plane
    • coredns to install CoreDNS (requires control plane on same host)
    • kubelet to install a Kubernetes node (kubelet etc.)
  • Cluster management: anywhere with kubectl set up, you can run make install in one of the following sub-directories of ~/k8s.sr.ht:
    • acls to create/update some basic ACLs
    • ingress-controller to install/update the HAProxy ingress controller
    • ceph-csi to install/update the Ceph CSI driver
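
Put together, bootstrapping a fresh compute node might look something like this (directory names as above; the order is just the one that seems natural, etcd and the control plane before the kubelet):

make dist-files                          # from the repo checkout, pushes files to all hosts
ssh sakuya2.sr.ht
cd ~/k8s.sr.ht/etcd && make install
cd ~/k8s.sr.ht/kube-api && make install
cd ~/k8s.sr.ht/coredns && make install   # control plane hosts only
cd ~/k8s.sr.ht/kubelet && make install

The cluster-level pieces (acls, ingress-controller, ceph-csi) are then installed the same way from any machine with kubectl configured.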

#Usage

On all hosts, the admin config can be found at /root/admin.kubeconfig:

export KUBECONFIG=/root/admin.kubeconfig
kubectl get nodes

Note that the kubeconfig contains private keys, so treat it with care. Currently, the API endpoint is not exposed to the outside, but it could be (most setups do this), so that we could use kubectl remotely.
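
Until then, an SSH tunnel is enough for occasional remote use. A sketch, assuming the API server listens on the default secure port 6443 on the host; depending on the names in the API server certificate, --tls-server-name may also be needed:

scp root@sakuya2.sr.ht:/root/admin.kubeconfig .
ssh -N -L 6443:localhost:6443 root@sakuya2.sr.ht &
kubectl --kubeconfig admin.kubeconfig --server https://localhost:6443 get nodes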

The demo service was deployed like this:

kubectl create deployment demo --image=httpd --port=80
kubectl expose deployment demo
kubectl create ingress demo-localhost --rule="demo.sr.ht/*=demo:80"

Test (you can pick any cluster member instead of sakuya2):

curl http://demo.sr.ht -sS --connect-to "::sakuya2.sr.ht:"