# K3s Cluster Setup Plan (Enterprise Ready)

## 1. Architecture Overview

We will deploy a High-Availability (HA) K3s cluster consisting of 3 control plane nodes with embedded etcd. With three etcd members, quorum is two, so the cluster tolerates the failure of one node.

* **Topology:** 3 nodes, each acting as both server and agent.
* **Operating System:** Ubuntu 24.04 (via Terraform/Cloud-Init).
* **Networking:**
    * VLAN 40 (IP range: `10.100.40.0/24`).
    * **VIP (Virtual IP):** Floating IPs managed by `kube-vip`, one for the API server and one for the ingress controller.
* **Ingress Flow:**
    * `Internet` -> `Traefik in the k3s cluster (VIP 10.100.40.6)` -> `Traefik Ingress (K3s)` -> `Pod`.
* **GitOps:**
    * **Tool:** FluxCD.
    * **Repository Structure:**
        * `stabify-infra` (current): Bootstraps the nodes, installs K3s, installs the Flux binary.
        * `stabify-gitops` (new): Watched by Flux. Contains system workloads (Cert-Manager, Traefik Internal) and user apps.

## 2. Terraform Changes (`terraform/`)

We will update the existing `locals.tf` to reflect the 3-node HA structure.

* **`terraform/locals.tf`**:
    * Refactor the `vms` map:
        * `vm-k3s-master-400` (`10.100.40.10`)
        * `vm-k3s-master-401` (`10.100.40.11`)
        * `vm-k3s-master-402` (`10.100.40.12`)
    * Define VIPs:
        * `k3s-api-vip`: `10.100.40.1` (or `.5`) - Endpoint for kubectl and the nodes.
        * `k3s-ingress-vip`: `10.100.40.2` (or `.6`) - Endpoint for Traefik Edge.
* **`terraform/main.tf`**:
    * Add `opnsense_unbound_host_override` resources for the VIPs to ensure internal DNS resolution.

## 3. Ansible Role Design (`infrastructure/ansible/`)

We will create a new role `k3s` and a corresponding playbook.

* **Inventory (`inventory.ini`)**:
    * Add a `[k3s_masters]` group.
* **Role: `k3s`**:
    * **Task: System Prep:** Install `open-iscsi`, `nfs-common`, `curl`. Configure sysctl (bridged traffic).
    * **Task: Install K3s (First Node):**
        * Exec: `curl -sfL https://get.k3s.io | sh -`
        * Args: `--cluster-init --disable traefik --disable servicelb --tls-san k3s-api.stabify.de`
    * **Task: Install K3s (Other Nodes):**
        * Args: `--server https://:6443 --token `
    * **Task: Install Kube-VIP:**
        * Deploy the manifest for control plane HA (ARP mode).
        * Deploy the manifest for the Service LoadBalancer (ARP mode).
    * **Task: Bootstrap Flux:**
        * Install the Flux CLI.
        * Run `flux bootstrap git ...`.

## 4. Network & DNS Strategy

* **DNS Records (OPNsense):**
    * `vm-k3s-master-*.stabify.de` -> Node IPs (managed by Terraform).
    * `k3s-api.stabify.de` -> `10.100.40.5` (VIP).
    * `*.k3s.stabify.de` -> `10.100.40.6` (Ingress VIP).
* **Traefik Edge Config (in the k3s cluster):**
    * File provider for TLS passthrough to k3s services.
    * ConfigMap: `traefik-edge-dynamic-k3s`
    * Rule: `HostSNIRegexp('^.+\.k3s\.stabify\.de$')`
    * Target: `10.100.40.6:443` (TLS passthrough).

## 5. Next Steps for Implementation

1. **Refactor Terraform:** Update `locals.tf` to 3 masters. Apply to create the VMs.
2. **DNS Update:** Verify the OPNsense records.
3. **Ansible Development:** Create the `k3s` role.
4. **Execute Ansible:** Deploy the cluster.
5. **Flux Bootstrap:** Link the cluster to the GitOps repo.
6. **Traefik Edge:** Configure routing.

This plan ensures a clean separation of concerns: Terraform provisions the VMs, Ansible installs the OS-level and cluster software, and Flux manages the workloads.
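
As a rough illustration of the two K3s install tasks in section 3, the argument lists could be assembled as in the sketch below. It only prints the installer command lines and does not run them. The function names and the `JOIN_URL` / `JOIN_TOKEN` variables are hypothetical, and the join URL is deliberately left as an input because the plan does not fix it. Repeating the `--disable` and `--tls-san` flags on joining nodes is an assumption added here: K3s expects the set of disabled components to match across all servers.

```shell
#!/bin/sh
# Dry-run sketch: build the K3s installer command for each node role.
# Nothing is installed; the commands are only printed.

# API hostname taken from the plan (used as an extra TLS SAN).
K3S_API_HOST="k3s-api.stabify.de"

# Arguments for the first server, which initialises embedded etcd.
k3s_first_node_args() {
    printf '%s\n' "server --cluster-init --disable traefik --disable servicelb --tls-san ${K3S_API_HOST}"
}

# Arguments for a server joining the existing cluster.
# $1 = join URL (hypothetical input), $2 = cluster token (hypothetical input)
k3s_join_node_args() {
    printf '%s\n' "server --server $1 --token $2 --disable traefik --disable servicelb --tls-san ${K3S_API_HOST}"
}

# Wrap an argument string in the installer pipeline.
# "sh -s -" forwards the remaining words to the install script.
k3s_install_cmd() {
    printf '%s\n' "curl -sfL https://get.k3s.io | sh -s - $1"
}

# Print the command for the first node.
k3s_install_cmd "$(k3s_first_node_args)"

# Print the join command only when both inputs are provided.
if [ -n "${JOIN_URL:-}" ] && [ -n "${JOIN_TOKEN:-}" ]; then
    k3s_install_cmd "$(k3s_join_node_args "$JOIN_URL" "$JOIN_TOKEN")"
fi
```

In an Ansible role the same strings would more likely live in templated variables, with the token kept in a vault rather than passed on the command line; the shell form is used here only to make the difference between the first node and the joining nodes explicit.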