makerhouse_network
Public scripts, services, and configuration for running MakerHouse's home network. This network supports:
- TODO features here
For more high level details, see [this blog post](TODO TODO TODOOOOOO)
TODO use the drawing at https://docs.google.com/drawings/d/1UkQKlT5fA8L5bAdiAecp-bR1siNsGnlf4KK2kBhsDHk/edit
Setup
Setting up a replicated pi cluster from scratch is an involved process consisting of several steps:
Setting up the cluster
- Purchasing the hardware
- (optional) Network setup
- Flashing the OS
- Installing K3S and linking the nodes together
Configuring the cluster to be useful
- Configuring the load balancer and reverse proxy
- Installing a distributed storage solution
- Setting up SSL certificate handling and dynamic DNS
Setting up customization for IoT and other uses
- Deploying an image registry for custom container images
- Setting up monitoring/alerting and IoT messaging
There are knowledge prerequisites for following this guide:
- Some basic networking (e.g. how to find a remote device's IP address and SSH into it)
- Linux command line fundamentals (navigating to files, opening and editing them, and running commands)
- It's also useful to know what DHCP is and how to configure it and subnets in your router, for the optional network setup step.
Even if you have advanced knowledge of kubernetes, be prepared to spend several hours on initial setup, plus an hour or two here and there to further refine it.
Purchasing the Hardware
For the cluster network, you will need:
- An ethernet switch (preferably gigabit) with as many ports as the number of nodes in your cluster, plus one.
- A power supply for your switch
- An ethernet cable running to whatever existing network you have.
For each node, you will need:
- A raspberry pi 4 (or better), recommended 4GB. Ideally all nodes are the same type of pi with the same hardware specs.
- A USB C power supply (5V with at least 2A)
- A short ethernet cable (to connect the pi to the network switch)
For sufficient storage, you will need (per node):
- A USB 3 NVMe M.2 SSD enclosure https://www.amazon.com/gp/product/B07MNFH1PX
- An NVMe M.2 SSD (I picked this 256GB one)
Before continuing on:
- connect your switch to power and the LAN
- connect each raspberry pi via ethernet to the switch (whichever port doesn't matter)
- Install an SSD into each enclosure, then plug one enclosure into one of the blue USB ports on each raspberry pi
- At this point, it helps to label the SSDs with the name you expect each node to be, e.g. k3s1, k3s2 etc. to keep track of where the image 'lives'.
A note on earlier versions of raspberry pi:
Try to avoid using raspberry pis earlier than the pi 4. To check for compatibility, run:
uname -a
If the output contains armv6l then kubernetes does not support the device. This issue describes that kubernetes support for armv6l has been dropped, so you would either have to compile the binaries yourself or use precompiled ones.
A comment at the end of that issue links to compiled binaries for armv6l:
https://github.com/aojea/kubernetes-raspi-binaries
(Optional) Network setup
This guide will assume your router is set up with a LAN subnet of 192.168.0.0/23 (i.e. allowing for IP addresses from 192.168.0.1 all the way to 192.168.1.254).
- 192.168.0.1 is the address of the router
- IP addresses from 192.168.0.2-254 are for exposed cluster services (i.e. virtual devices)
- IP addresses from 192.168.1.2-254 are for physical devices (the raspi's, other IoT devices, laptops, phones etc.)
  - We recommend having a static IP address range not managed by DHCP, e.g. 192.168.1.2-30, and avoiding leasing 192.168.1.1 as it'd be confusing.
If you wish to have public services, set up port forwarding rules for 192.168.0.2 (or the equivalent loadBalancerIP set below) for ports 80 and 443, so that your services can be viewed outside the local network.
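To make the address plan above concrete, here is a minimal sketch that reports which part of the plan a given address falls in. The helper name classify_ip is hypothetical (it is not part of the MakerHouse scripts), and the 192.168.1.2-30 static range is only the example range suggested above, so adjust it to whatever your router actually uses.

```shell
# classify_ip ADDRESS: print which part of the 192.168.0.0/23 plan an address
# belongs to. Hypothetical helper for illustration only.
classify_ip() {
  prefix=${1%.*}   # first three octets, e.g. 192.168.0
  host=${1##*.}    # last octet, e.g. 10
  case "$host" in
    ''|*[!0-9]*) echo "invalid"; return 1 ;;
  esac
  if [ "$prefix" = "192.168.0" ] && [ "$host" -eq 1 ]; then
    echo "router"
  elif [ "$prefix" = "192.168.0" ] && [ "$host" -ge 2 ] && [ "$host" -le 254 ]; then
    echo "service"          # virtual IPs handed out to cluster services
  elif [ "$prefix" = "192.168.1" ] && [ "$host" -ge 2 ] && [ "$host" -le 30 ]; then
    echo "device-static"    # ASSUMPTION: the example static range 192.168.1.2-30
  elif [ "$prefix" = "192.168.1" ] && [ "$host" -ge 31 ] && [ "$host" -le 254 ]; then
    echo "device-dhcp"
  else
    echo "outside-plan"
  fi
}
```

For example, classify_ip 192.168.0.10 prints service, while classify_ip 192.168.1.1 prints outside-plan because that address is deliberately left unused.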
Flashing the OS
Setup SSD boot
Follow these instructions to install a USB bootloader onto each raspberry pi. Stop when you get to step 9 (inserting the Raspberry Pi OS) as we'll be installing Ubuntu instead.
Use https://www.balena.io/etcher/ or similar to write an Ubuntu 20.04 ARM 64-bit LTS image to one of the SSDs. We'll do the majority of setup on this drive, then clone it to the other pi's (with some changes).
Enable cgroups and SSH
Unplug and re-plug the SSD, then navigate to the boot partition and ensure there's a file labeled ssh there (if not, create a blank one). This allows us to remote in to the raspi's.
Now we will enable cgroups which are used by k3s to manage the resources of processes that are running on the cluster.
Append to /boot/firmware/cmdline.txt (see here):
cgroup_enable=memory cgroup_memory=1
Example of a correct config:
ubuntu@k3s1:~$ cat /boot/firmware/cmdline.txt
net.ifnames=0 dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=LABEL=writable rootfstype=ext4 elevator=deadline rootwait fixrtc cgroup_enable=memory cgroup_memory=1
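Since cmdline.txt must stay a single line, appending by hand is easy to get wrong. The sketch below is one possible way to add the two flags only when they are missing; ensure_cgroup_flags is a hypothetical helper name, and the path you pass depends on where the boot partition is mounted on your machine.

```shell
# ensure_cgroup_flags FILE: append each cgroup flag to the end of the first
# line of FILE unless it is already present. Safe to run more than once.
ensure_cgroup_flags() {
  file=$1
  for flag in cgroup_enable=memory cgroup_memory=1; do
    if ! grep -qw -- "$flag" "$file"; then
      # Rewrite via a temporary file so this works with any POSIX sed.
      sed "1 s/\$/ $flag/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    fi
  done
}
# Usage (path is an example; run as root if the file is not writable):
#   ensure_cgroup_flags /boot/firmware/cmdline.txt
```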
Verify installation
Plug in the SSD, then plug in power to your raspberry pi. Look on your router to find the IP address of the raspberry pi. You should be able to SSH into it with username and password ubuntu.
While we're inside, run passwd to change away from the default password.
Run sudo shutdown now (the sudo password is ubuntu unless you just changed it) and remove power once its LED stops blinking.
Clone to other pi's
Remove the SSD and use your software of choice (e.g. gparted for linux) to clone it to the other blank SSDs. For each SSD, mount it and edit /etc/hostname to be something unique (e.g. k3s1, k3s2...)
At this time, you can edit your router settings to assign static IP addresses to each raspberry pi for easier access later.
Installing k3s and linking the nodes together
We will have one server node named k3s1 and two worker nodes (k3s2 and k3s3). These instructions generally follow the installation guide from Rancher.
Set up k3s1 as master
SSH into the pi, and run the install script from get.k3s.io (see install options for more details):
export INSTALL_K3S_VERSION=v1.19.7+k3s1
curl -sfL https://get.k3s.io | sh -s - --disable servicelb --disable local-storage
Note:
- We include the K3S version for repeatability.
- ServiceLB and local storage are disabled to make way for MetalLB and Longhorn (distributed storage) configured later in this guide.
Before exiting k3s1, run sudo cat /var/lib/rancher/k3s/server/node-token and copy it for the next step of linking the client nodes.
Install and link the remaining nodes
To install on worker nodes and add them to the cluster, run the installation script with the K3S_URL and K3S_TOKEN environment variables. Note use of raw IP - this is more reliable than depending on the cluster DNS (Pihole) to be serving, since that service will itself be hosted on k3s.
export K3S_URL=https://<k3s1 IP address>:6443
export INSTALL_K3S_VERSION=v1.19.7+k3s1
export K3S_TOKEN=<token from k3s1>
curl -sfL https://get.k3s.io | sh -
Where K3S_URL is the URL and port of a k3s server, and K3S_TOKEN comes from /var/lib/rancher/k3s/server/node-token on the server node (described in the prior step).
Verifying
That should be it! You can confirm the node successfully joined the cluster by running kubectl get nodes when SSH'd into k3s1:
~ kubectl get nodes
NAME   STATUS   ROLES                  AGE   VERSION
k3s1   Ready    control-plane,master   5m    v1.21.0+k3s1
k3s2   Ready    <none>                 1m    v1.21.0+k3s1
k3s3   Ready    <none>                 1m    v1.21.0+k3s1
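If you would rather not eyeball the STATUS column, a small filter can list the nodes that are not ready. This is only a sketch: not_ready_nodes is a hypothetical helper, and it treats anything other than exactly "Ready" (including a cordoned "Ready,SchedulingDisabled" node) as not ready.

```shell
# not_ready_nodes: read `kubectl get nodes --no-headers` output on stdin and
# print the name of every node whose STATUS column is not exactly "Ready".
not_ready_nodes() {
  awk 'NF >= 2 && $2 != "Ready" { print $1 }'
}
# Usage on k3s1 (needs a running cluster):
#   kubectl get nodes --no-headers | not_ready_nodes
```

No output means every node reported Ready.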
Set Up Remote Access
It's useful to run cluster management commands from a personal computer rather than having to SSH into the master every time.
Let's grab the k3s.yaml file from master, and convert it into our local config:
ssh ubuntu@k3s1 "sudo cat /etc/rancher/k3s/k3s.yaml" > ~/.kube/config
Now edit the server address to be the address of the pi, since from the server's perspective the master is localhost:
sed -i "s/127.0.0.1/<actual server IP address>/g" ~/.kube/config
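The two steps above can also be wrapped so the original file is kept as a backup and the dots in 127.0.0.1 are matched literally. This is a sketch under assumptions: point_kubeconfig_at is a hypothetical helper, and the address in the usage comment is only an example.

```shell
# point_kubeconfig_at FILE IP: replace the loopback address in a copied
# k3s.yaml with the server's LAN address, keeping the original as FILE.bak.
point_kubeconfig_at() {
  sed -i.bak "s/127\.0\.0\.1/$2/g" "$1"
}
# Usage from your personal computer:
#   mkdir -p ~/.kube
#   ssh ubuntu@k3s1 "sudo cat /etc/rancher/k3s/k3s.yaml" > ~/.kube/config
#   point_kubeconfig_at ~/.kube/config 192.168.1.5   # example address
```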
Configuring the load balancer and reverse proxy
We will be using MetalLB to allow us to "publish" virtual cluster services on actual IP addresses (in our 192.168.0.2-254 range). This allows us to type in e.g. 192.168.0.10 in a browser and see a webpage hosted from our cluster, without having a device with that specific IP address.
We will also use Traefik to reverse-proxy incoming requests. This lets different services respond to different subdomains (mqtt.mkr.house and registry.mkr.house, for instance) without having to do lots of manual IP address mapping.
MetalLB load balancing / endpoint handling
Install MetalLB onto the cluster following https://metallb.universe.tf/installation/:
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.5/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.5/manifests/metallb.yaml
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
kubectl apply -f metallb-configmap.yml
- See ./core/metallb-configmap.yml
Note: instructions say to do kubectl edit configmap -n kube-system kube-proxy but there's no such config map in k3s. This wasn't a problem for our installation.
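For orientation, a layer-2 address pool for MetalLB v0.9.x that covers the service range used in this guide might look like the file written below. This is an illustrative sketch, not a copy of ./core/metallb-configmap.yml: the output file name and the pool name "default" are assumptions, so compare against the real file before applying anything.

```shell
# Write an example MetalLB v0.9.x ConfigMap with one layer-2 address pool.
# ASSUMPTION: pool name and output file name are illustrative only.
cat > metallb-configmap-example.yml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.0.2-192.168.0.254
EOF
```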
Test whether metallb is working by starting an exposed service, then cleaning up after:
1. kubectl apply -f ./core/lbtest.yaml
2. kubectl describe service hello
   * Look for "IPAllocated" in event log
   * Visit 192.168.0.3 and confirm "Welcome to nginx!" is visible
3. kubectl delete service hello
4. kubectl delete deployment hello
Troubleshooting
Some failure modes of MetalLB leave only a fraction of the VIPs unresponsive.
- Check to see if all MetalLB pods are in state "Running":
kubectl get pods -n metallb-system -o wide
```
speaker-7l7kv                 1/1   Running   2   16d   192.168.1.5   pi4-1   <none>   <none>
controller-65db86ddc6-fkpnj   1/1   Running   2   16d   10.42.0.75    pi4-1   <none>   <none>
speaker-st749                 1/1   Running   1   16d   192.168.1.7   pi4-3   <none>   <none>
speaker-8wcwj                 1/1   Running   0   16m   192.168.1.6   pi4-2   <none>   <none>
```
- For more details, download kubetail (see bottom of this page) and run:
./kubetail.sh -l component=speaker -n metallb-system
- If you see an error like "connection refused" referencing 192.168.1.#:7946, check to see if one of the "speaker" pods isn't actually running.
Traefik configuration
Traefik is already installed by default with k3s. We still need to configure it, though.
Generate the dashboard password:
htpasswd -c passwd admin
cat ./passwd
- get the part after the colon, before the trailing slash. That's $password
- Update config (/var/lib/rancher/k3s/server/manifests/traefik.yaml, move it to traefik-customized.yaml):
ssl.insecureSkipVerify: true
metrics.serviceMonitor.enabled: true
dashboard.enabled: true
dashboard.serviceType: "LoadBalancer"
dashboard.auth.basic.admin: $password
loadBalancerIP: "192.168.0.2"
logLevel: "debug"
- Edit /etc/systemd/system/k3s.service and add --disable traefik to disable original traefik config
sudo systemctl daemon-reload
sudo service k3s restart
- Test the configuration:
kubectl apply -f ./core/default-ingress.yml
kubectl get ingress
- You should see something like
hello <none> i.mkr.house 192.168.0.2 80 2m2s
Note: Attempts to query *.mkr.house internally lead to the router admin page. You'll need to use a mobile network to test external ingress properly, i.e. that with the lbtest.yaml and default-ingress.yml applied, a "Welcome to nginx!" page is displayed from outside the network.
Troubleshooting tips
- You can use journalctl -u k3s to view k3s logs and look for errors.
Installing a distributed storage solution
Now we can set up a distributed storage solution, so that workloads can move freely between the raspberry pi's without worrying about locality of data to any particular pi.
We'll be using Longhorn, the recommended solution from Rancher.
Follow the installation guide to set it up. See core/longhorn.yaml for the MakerHouse configured version.
Note: this solution requires an arm64 architecture, NOT armhf/armv7l which is the default for Raspbian / Raspberry PI OS.
Be sure to also set it as the default storage class, or else certain helm charts will fail to provision their persistent volumes without specifying storageClass explicitly:
kubectl patch storageclass longhorn -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
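To confirm which storage class is currently the default, you can filter the kubectl listing, which marks the default class with "(default)" after its name. The helper name default_storage_classes is hypothetical, and the sample line in the comment is illustrative rather than captured from the MakerHouse cluster.

```shell
# default_storage_classes: read `kubectl get storageclass --no-headers` output
# on stdin and print the name of each class marked "(default)".
default_storage_classes() {
  awk '$2 == "(default)" { print $1 }'
}
# Usage (needs a running cluster):
#   kubectl get storageclass --no-headers | default_storage_classes
# A line such as "longhorn (default) driver.longhorn.io Delete ..." yields "longhorn".
```

If this prints nothing, or more than one name, charts that rely on the default class may not provision volumes the way you expect.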
The Longhorn UI is not exposed by default; you can expose it with these instructions.
Setting up SSL certificate handling and dynamic DNS
Now we will set up SSL certificate handling, so that we can serve https pages without browsers complaining about "risky business".
Dynamic DNS will also be configured so that an external DNS provider (in our case, Hover) can direct web traffic to our cluster using a domain name.
Certificate Management
The following instructions are based on https://opensource.com/article/20/3/ssl-letsencrypt-k3s, but with substitutions for arm64 packages (this tutorial assumes just "arm").
Note that you will need to have ports 80 and 443 forwarded to whatever address is given by kubectl get ingress, which is what Traefik is configured to use in /var/lib/rancher/k3s/server/manifests/traefik-customized.yaml (See "Traefik configuration" above).
The first two instructions aren't needed if core/cert-manager-arm.yaml is correct for the setup:
curl -sL https://github.com/jetstack/cert-manager/releases/download/v0.11.0/cert-manager.yaml | sed -r 's/(image:.*):(v.*)$/\1-arm64:\2/g' > cert-manager-arm.yaml
grep "image:" cert-manager-arm.yaml
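To see what the sed expression above does before running it against the full manifest, it can be tried on a single line. The sketch below wraps the same substitution in a hypothetical helper, to_arm64_images, using -E (equivalent to -r in GNU sed); it only demonstrates the text rewrite and says nothing about which image tags are actually published.

```shell
# to_arm64_images: append "-arm64" to the image name on every "image:" line
# read from stdin, leaving the version tag untouched.
to_arm64_images() {
  sed -E 's/(image:.*):(v.*)$/\1-arm64:\2/g'
}
# Usage:
#   printf '%s\n' 'image: "quay.io/jetstack/cert-manager-controller:v0.11.0"' | to_arm64_images
```

Lines that do not contain an image reference pass through unchanged.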
Now we apply the cert manager:
kubectl create namespace cert-manager
kubectl apply -f cert-manager-arm.yaml
kubectl --namespace cert-manager get pods
kubectl apply -f letsencrypt-issuer-prod.yaml
kubectl apply -f ingresstest.yaml
- (TODO ingress test file) including "annotations" and "tls" sections described here, "request a certificate for our website"
kubectl get certificate
- Should be "true", although this may take a couple seconds after init
- If not, check if i.mkr.house resolves to the current house IP. May have to update Hover manually for this portion.
kubectl describe certificate
- Should say "Certificate issued successfully"
- Confirm behavior by going to https://i.mkr.house from external network and seeing the test page.
Private Registry
A private registry hosts customized containers - such as our custom NodeRed installation with specific addons for handling google sheets, google assistant etc.
This parallels the guide at https://www.linuxtechi.com/setup-private-docker-registry-kubernetes/
For "simple password" i.e. htpasswd setup (following these instructions):
sudo apt -y install apache2-utils
htpasswd -Bc htpasswd registry_htpasswd
kubectl create secret generic private-registry-htpasswd --from-file ./htpasswd
kubectl describe secret private-registry-htpasswd
- Values:
user: registry_htpasswd
pass: <your password here>
Then start the deployment:
kubectl apply -f private-registry.yml
- This creates a persistent volume (via Longhorn), deployment/pod, an exposed service on 192.168.0.5 and a TLS certificate.
- Add to pihole DNS: "registry" and "registry.lan" mapping to that IP
To test the registry, let's try tagging and pushing an image:
- docker login registry.mkr.house:443
- (add username & password when prompted)
docker pull ubuntu:20.04
docker tag ubuntu:20.04 registry.mkr.house:443/ubuntu
docker push registry.mkr.house:443/ubuntu
To see what's in the registry:
curl -X GET --basic -u registry_htpasswd https://registry.mkr.house:443/v2/_catalog | python -m json.tool
To pull the image:
docker pull registry.mkr.house:443/ubuntu
Now we need to set up each node so it knows to look for the registry, following these instructions (note: not TLS)
- ssh ubuntu@k3s1
- sudo vim /etc/rancher/k3s/registries.yaml
mirrors:
  "registry.mkr.house:443":
    endpoint:
      - "https://registry.mkr.house:443"
configs:
  "registry.mkr.house:443":
    auth:
      username: "registry_htpasswd"
      password: "<your password here>"
    tls:
      insecure_skip_verify: true
- sudo service k3s restart, then logout
Let's copy it to the remaining nodes and restart the k3s agent on each:
scp ubuntu@k3s1:/etc/rancher/k3s/registries.yaml .
scp ./registries.yaml ubuntu@k3s2:/home/ubuntu/
ssh ubuntu@k3s2
sudo mkdir -p /etc/rancher/k3s/
sudo mv registries.yaml /etc/rancher/k3s/
sudo service k3s-agent restart
- Repeat the copy, move, and restart commands above for k3s3.
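With more than a couple of worker nodes, repeating those commands by hand gets tedious. The loop below is a sketch of one way to script it, assuming ./registries.yaml has already been copied locally as shown above. push_registries and the RUN variable are hypothetical names; setting RUN=echo prints the commands instead of executing them, and ssh -t is used so sudo can prompt for a password if it needs one.

```shell
# RUN is an optional command prefix; RUN=echo previews without executing.
RUN=${RUN:-}

# push_registries NODE...: copy ./registries.yaml to each worker node, move it
# into place and restart the k3s agent there.
push_registries() {
  for node in "$@"; do
    $RUN scp ./registries.yaml "ubuntu@$node:/home/ubuntu/"
    $RUN ssh -t "ubuntu@$node" "sudo mkdir -p /etc/rancher/k3s/ && sudo mv registries.yaml /etc/rancher/k3s/ && sudo service k3s-agent restart"
  done
}
# Usage:
#   RUN=echo; push_registries k3s2 k3s3   # preview the commands
#   RUN=;     push_registries k3s2 k3s3   # actually run them
```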
Prometheus monitoring & Grafana dashboarding
We'll set up Prometheus to collect metrics for us - including timeseries data we expose from IoT devices via NodeRed.
Grafana will host dashboards showing visualizations of the data we collect.
To install Prometheus we will be using Helm, as there is a nice community provided helm "chart" that does a lot of config and setup work for us.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm upgrade --install prometheus prometheus-community/kube-prometheus-stack --values k3s-prometheus-stack-values.yaml
- If you need to modify the config, you can see what changes to the *values.yaml file do by running:
helm upgrade --dry-run prometheus prometheus-community/kube-prometheus-stack --values k3s-prometheus-stack-values.yaml
We'll set up an additional scrape config (for e.g. nodered custom metrics; see here for documentation on the config).
kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml --dry-run=client -o yaml > additional-scrape-configs.yaml
kubectl apply -f additional-scrape-configs.yaml
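The file packed into that secret holds a list of standard Prometheus scrape configs. As a rough sketch of its shape, the snippet below writes an example with a single static target. The job name, service name, port and metrics path are placeholders rather than the MakerHouse values, and the secret only takes effect if the Prometheus spec references it (the additionalScrapeConfigs field of the Prometheus resource).

```shell
# Write an example additional scrape config with one static target.
# ASSUMPTION: job name, target host/port and metrics path are placeholders.
cat > prometheus-additional-example.yaml <<'EOF'
- job_name: "nodered"
  metrics_path: /metrics
  static_configs:
    - targets: ["nodered:1880"]
EOF
```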
Troubleshooting
If prometheus runs out of space, the "prometheus-prometheus-kube-prometheus-prometheus-0" pod will crashloop forever with an obscure stack trace. Resizing the volume that prometheus uses is somewhat tricky:
- go to 192.168.0.4 (the longhorn web ui) to assess how much storage you can assign.
kubectl edit deployment prometheus-kube-prometheus-operator
- Set "replicas" to 0. The operator automatically updates other prometheus entities in kubernetes, so if it's running you can't edit replicasets etc. without them immediately being reverted.
kubectl edit statefulset prometheus-prometheus-kube-prometheus-prometheus
- Set "replicas" to 0. This generates the pod which binds to the data volume. Longhorn storage must be unbound before it can be resized.
vim ~/makerhouse/k3s/k3s-prometheus-stack-values.yaml
- Under prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage, change to e.g. "50Gi"
helm upgrade prometheus prometheus-community/kube-prometheus-stack --values k3s-prometheus-stack-values.yaml
- Longhorn should indicate the volume is being resized. You can also check with kubectl describe pvc prometheus-prometheus-prometheus-kube-prometheus-prometheus-0 and look for an event like "External resizer is resizing volume pvc-9da184ed-28f9-48d1-82ea-3e0c0a93cf1d"
  - If the status of the pvc is still "Bound", run kubectl get pods | grep prometheus to see whether the prometheus operator or the main prometheus pod is still running for some reason. It should be deletable with kubectl delete pod <foo> if the deployment and statefulset are both set to 0 replicas.
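The two "set replicas to 0" steps can also be done non-interactively. The sketch below swaps kubectl edit for kubectl scale, which is a different route to the same replica count rather than the procedure described above; stop_prometheus and the KUBECTL variable are hypothetical names, and the resource names are the ones used in the steps above. The operator is scaled down first so it cannot revert the statefulset.

```shell
# KUBECTL can be overridden, e.g. KUBECTL=echo to preview the commands.
KUBECTL=${KUBECTL:-kubectl}

# stop_prometheus: scale the operator to 0 first, then the prometheus
# statefulset, so the data volume is released before resizing.
stop_prometheus() {
  $KUBECTL scale deployment prometheus-kube-prometheus-operator --replicas=0
  $KUBECTL scale statefulset prometheus-prometheus-kube-prometheus-prometheus --replicas=0
}
# Usage: stop_prometheus, then edit the values file and run the helm upgrade.
```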
If you want to delete unneeded metrics:
curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]=a_bad_metric&match[]={region="mistake"}'
curl -X POST -g 'http://prometheus:9090/api/v1/admin/tsdb/delete_series?match[]={instance="192.168.1.5:6443"}'
- Deletes all metrics for a particular target/instance.
curl -X POST -g http://prometheus:9090/api/v1/admin/tsdb/clean_tombstones
- Do this to actually garbage collect the data - note that this may grow the used disk size (up to 2X if you're deleting most things!) before it shrinks it
MQTT (NodeRed + Mosquitto)
We will be using MQTT to pass messages to and from embedded IoT and other devices, and Node-RED to set up automation flows based on messages seen.
Let's build the nodered image to include some extra plugins not provided by the default one:
cd ./nodered && docker build -t registry.mkr.house:443/nodered:latest . && docker image push registry.mkr.house:443/nodered:latest
Both MQTT and NodeRed are included in the mqtt.yaml config. "mosquitto" is the specific MQTT broker we're installing.
kubectl apply -f mqtt.yaml -f configmap-mosquitto.yml
To support Google Assistant commands, we'll need a JWT file. More details on the plugin page for how to acquire this file for your particular instance.
kubectl create secret generic nodered-jwt-key --from-file=/home/ubuntu/makerhouse/k3s/secretfile.json
Maintenance Log
2021-04-30 Master node reinstall
Prep:
- Set router DHCP to 8.8.8.8 DNS
- Copied pihole config ("Teleporter" setting)
- Saved Nodered flows
- TODO Copy k3s keys
Unlisted dependency:
- When setting up SSL cert-manager, certificates couldn’t be issued because the Hover IP hadn’t been updated. Manually update IP in Hover to current house IP.
2021-07-22 personal website install
Needed to extend the "SUBDOMAIN" env var in ddns-lexicon.yml, and possibly also add the record to hover.com (may be doing an update, not an upsert?) in addition to creating ingress/service/deployment k3s configs
2021-09-02 pihole out of disk
Ran pihole -g -r to recreate gravity.db, also deleted /etc/pihole/pihole-FTL.db