#K8s setup

As part of moving our infrastructure to Europe, we are experimenting with Kubernetes. This repository is a non-reusable log of my steps to install it on our servers, as well as documentation of the current status (a moving target).

In the SourceHut spirit, the goal is to keep the setup as simple as possible. In its current form, all Kubernetes components are run from Alpine packages. The only containers being pulled from the Internet right now are the HAProxy ingress controller and the ceph-csi controllers, and we hope to eventually build those ourselves, too.

#High level overview

We currently have three servers in the EU data-center: two compute servers, sakuya2.sr.ht and sakuya3.sr.ht, and a storage server, patchouli2.sr.ht.

We are aiming for the following production setup:

  • Etcd
    • Needs at least 3 nodes for HA, so runs on all nodes
  • Control plane
    • Comprised of kube-apiserver, kube-controller-manager, and kube-scheduler
    • Does not cluster itself and is essentially stateless
    • For now, running on both compute nodes will provide sufficient redundancy
  • Kubelets
    • The actual nodes of the Kubernetes cluster, running workloads
    • Comprised of kubelet, kube-proxy, container runtime (containerd), etc.
    • Initially, use just the two compute servers
    • If needed, the storage server could be added, with labels used so that it only runs very specific workloads (see the sketch after this list)
  • CoreDNS
    • The de facto standard for cluster DNS, so we will probably end up running it
    • We run it out-of-cluster from Alpine packages
    • Currently on both compute nodes for redundancy
    • Hence, it might also make for a good solution for internal (non-cluster) DNS
  • Ingress controller
    • Running the HAProxy ingress controller
    • As a daemon set (i.e. one instance on each cluster member)
    • Using the host network (exposing all configured ingresses to the internet, much like in the US setup)
    • An experiment running it out-of-cluster is planned
  • Routing
    • Is set up manually (see below)
  • Storage
    • The storage server provides remote block devices via Ceph
    • The compute servers could potentially provide X% of their SSD storage as remote block devices?
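
If the storage server ever does join as a kubelet, a label plus a taint would be one way to keep general workloads off it. This is only a sketch; the sr.ht/role key is a made-up example:

kubectl label nodes patchouli2 sr.ht/role=storage
kubectl taint nodes patchouli2 sr.ht/role=storage:NoSchedule

Workloads meant for the storage node would then carry a matching nodeSelector and toleration in their pod spec.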

See here for a good reference on HA Kubernetes setups.

#System preparation

Make sure the testing repo is enabled (all Kubernetes packages are currently in testing). In /etc/apk/repositories:

http://dl-cdn.alpinelinux.org/alpine/edge/testing
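
After adding that line, refresh the package index and check that the Kubernetes packages show up (just a sanity check; the package names come from the Alpine testing repo):

apk update
apk search kubelet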

Remove the default nginx to free up the port for the ingress controller:

rc-update del nginx
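
This only removes nginx from its runlevel; if it is still running, it can also be stopped right away (assuming the OpenRC service is simply called nginx):

rc-service nginx stop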

Metrics are scraped directly from the node exporter (see metrics.sr.ht), so nginx is no longer needed.

Load br_netfilter on boot:

printf "br_netfilter\n" >> /etc/modules

Enable IP forwarding:

printf "net.ipv4.ip_forward = 1\n" > /etc/sysctl.d/ip_forward.conf

Add entries to /etc/hosts mapping the node names to their internal IPs. Make sure that a node's hostname is not mapped to localhost. Example:

127.0.0.1   localhost localhost.localdomain
::1         localhost localhost.localdomain
10.0.0.132  patchouli2
10.0.0.134  sakuya2
10.0.0.135  sakuya3
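
A quick way to check the mapping on each node is to ping its own hostname and make sure it resolves to the internal IP rather than 127.0.0.1 (shown here for sakuya2):

ping -c 1 sakuya2    # should show 10.0.0.134, not 127.0.0.1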

#Network

#Pod network

Network:

  • 10.200.0.0/16

Subnets:

  • 10.200.132.0/24 (patchouli2.sr.ht/10.0.0.132)
  • 10.200.134.0/24 (sakuya2.sr.ht/10.0.0.134)
  • 10.200.135.0/24 (sakuya3.sr.ht/10.0.0.135)
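
Assuming each node's /24 ends up in its spec.podCIDR (whether assigned by kube-controller-manager or configured statically), the mapping can be double-checked from any machine with kubectl set up:

kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR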

#Routing

Add routes on each node for the other two. E.g. on patchouli2, in /etc/network/interfaces:

auto eth1
iface eth1 inet static
    address 10.0.0.132
    netmask 255.255.255.0
    up ip route add 10.200.134.0/24 via 10.0.0.134
    up ip route add 10.200.135.0/24 via 10.0.0.135
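
The corresponding stanza on sakuya2 would then route to the other two nodes (assuming its internal interface is also eth1):

auto eth1
iface eth1 inet static
    address 10.0.0.134
    netmask 255.255.255.0
    up ip route add 10.200.132.0/24 via 10.0.0.132
    up ip route add 10.200.135.0/24 via 10.0.0.135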

#Installation

I created this repo so that we have something to base changes, updates, and discussions on. It's not very polished, and files are scattered all over the place. However, if you look at the instructions below, with links to the main Makefiles, you should be able to see which files go where.

#Generate and distribute secrets

Note: Do not run this, as it would overwrite existing keys installed on the hosts. It is mainly intended for reference.

  • Run make gen (generates anything that uses keys, keep safe)
  • Run make dist-secrets to push generated secrets to /root/k8s.sr.ht on all hosts

#Distribute files and install

This step can be repeated as often as needed. It will use the secrets distributed in the previous step.

  • Run make dist-files to distribute all files (minus secrets) to ~/k8s.sr.ht on all hosts
  • Host management: on any of the hosts, you can run make install in one of the following sub-directories of ~/k8s.sr.ht:
    • etcd to install etcd
    • kube-api to install the Kubernetes control plane
    • coredns to install CoreDNS (requires control plane on same host)
    • kubelet to install a Kubernetes node (kubelet etc.)
  • Cluster management: anywhere with kubectl set up, you can run make install in one of the following sub-directories of ~/k8s.sr.ht (an example sequence is sketched after this list):
    • acls to create/update some basic ACLs
    • ingress-controller to install/update the HAProxy ingress controller
    • ceph-csi to install/update the Ceph CSI driver
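
As an illustration only (the exact order is not prescribed here), bringing up a compute node and then the cluster-level pieces might look like this:

# On the host itself, e.g. sakuya2:
cd ~/k8s.sr.ht/etcd && make install
cd ~/k8s.sr.ht/kube-api && make install
cd ~/k8s.sr.ht/coredns && make install    # control plane runs on this host
cd ~/k8s.sr.ht/kubelet && make install

# From anywhere with kubectl set up:
cd ~/k8s.sr.ht/acls && make install
cd ~/k8s.sr.ht/ingress-controller && make install
cd ~/k8s.sr.ht/ceph-csi && make install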

#Usage

On all hosts, the admin config can be found at /root/admin.kubeconfig:

export KUBECONFIG=/root/admin.kubeconfig
kubectl get nodes

Note that the kubeconfig contains private keys, so treat it with care. Currently, the API endpoint is not exposed to the outside, but it could be (most setups do this), which would allow using kubectl remotely.

The demo service was deployed like this:

kubectl create deployment demo --image=httpd --port=80
kubectl expose deployment demo
kubectl create ingress demo-localhost --rule="demo.sr.ht/*=demo:80"

Test (you can pick any cluster member instead of sakuya2):

curl http://demo.sr.ht -sS --connect-to "::sakuya2.sr.ht:"
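
If the test fails, a first step is to check that the objects created above actually exist:

kubectl get deployment demo
kubectl get service demo
kubectl get ingress demo-localhost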