infrastructure/K3S_PLANNING.md
2026-01-10 21:42:51 +00:00

K3s Cluster Setup Plan (Enterprise Ready)

1. Architecture Overview

We will deploy a High-Availability (HA) K3s cluster consisting of 3 Control Plane nodes with embedded etcd. With three etcd members the quorum is two, so the cluster stays available if any single node fails.
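The single-node failure tolerance follows from etcd's majority quorum. A minimal sketch of that arithmetic is shown here; the function names are illustrative and not part of any tool.

```shell
#!/bin/sh
# Quorum arithmetic for an etcd cluster with n voting members.
# quorum = floor(n / 2) + 1; tolerated failures = n - quorum.

etcd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

etcd_fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Compare a few cluster sizes, including the planned three-member cluster.
for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(etcd_quorum "$n") tolerated_failures=$(etcd_fault_tolerance "$n")"
done
```

Three members tolerate one failure, and a fourth member would not raise that number, which is why odd cluster sizes are the usual choice.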

  • Topology: 3 Nodes, each running as a K3s Server that also schedules workloads (combined Server and Agent roles).
  • Operating System: Ubuntu 24.04 (via Terraform/Cloud-Init).
  • Networking:
    • VLAN 40 (IP Range: 10.100.40.0/24).
    • VIPs (Virtual IPs): Two floating IPs managed by kube-vip, one for the API Server and one for the Ingress Controller.
  • Ingress Flow:
    • Internet -> Traefik Edge (VM 302) -> K3s VIP (LoadBalancer) -> Traefik Ingress (K3s) -> Pod.
  • GitOps:
    • Tool: FluxCD.
    • Repository Structure:
      • stabify-infra (Current): Bootstraps the nodes, installs K3s, installs the Flux CLI.
      • stabify-gitops (New): Watched by Flux. Contains system workloads (Cert-Manager, Traefik Internal) and User Apps.

2. Terraform Changes (terraform/)

We will update the existing locals.tf to reflect the 3-node HA structure.

  • terraform/locals.tf:

    • Refactor vms map:
      • vm-k3s-master-400 (10.100.40.10)
      • vm-k3s-master-401 (10.100.40.11)
      • vm-k3s-master-402 (10.100.40.12)
    • Define VIPs:
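      • k3s-api-vip: 10.100.40.5 - Endpoint for kubectl and Nodes. (10.100.40.1 was the alternative candidate; .5 matches the DNS records in Section 4.)
      • k3s-ingress-vip: 10.100.40.6 - Endpoint for Traefik Edge. (10.100.40.2 was the alternative candidate; .6 matches the DNS records in Section 4.)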
  • terraform/main.tf:

    • Add opnsense_unbound_host_override resources for the VIPs to ensure internal DNS resolution.
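Before the address plan goes into locals.tf, it can be sanity-checked with a few lines of shell. This sketch assumes the .5/.6 VIPs from the DNS section; the function names are illustrative only.

```shell
#!/bin/sh
# Address plan from this document; the VIPs follow the DNS records in Section 4.
NODE_IPS="10.100.40.10 10.100.40.11 10.100.40.12"
API_VIP="10.100.40.5"
INGRESS_VIP="10.100.40.6"

# Succeeds if the address is a usable host address in 10.100.40.0/24 (1-254).
in_vlan40() {
  case "$1" in
    10.100.40.*) rest=${1#10.100.40.} ;;
    *) return 1 ;;
  esac
  case "$rest" in
    ''|*[!0-9]*) return 1 ;;   # empty or not purely numeric
  esac
  [ "$rest" -ge 1 ] && [ "$rest" -le 254 ]
}

# Succeeds if the address is already assigned to a node.
is_node_ip() {
  for ip in $NODE_IPS; do
    [ "$ip" = "$1" ] && return 0
  done
  return 1
}

# Prints a verdict for one VIP and returns non-zero on a conflict.
check_vip() {
  if ! in_vlan40 "$1"; then echo "$1: outside 10.100.40.0/24"; return 1; fi
  if is_node_ip "$1"; then echo "$1: collides with a node address"; return 1; fi
  echo "$1: ok"
}

check_vip "$API_VIP"
check_vip "$INGRESS_VIP"
[ "$API_VIP" != "$INGRESS_VIP" ] || echo "API and ingress VIP must differ"
```

The check does not know about the gateway or DHCP ranges on VLAN 40, so those still need to be confirmed against the OPNsense configuration.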

3. Ansible Role Design (infrastructure/ansible/)

We will create a new role k3s and a corresponding playbook.

  • Inventory (inventory.ini):

    • Add [k3s_masters] group.
  • Role: k3s:

    • Task: System Prep: Install open-iscsi, nfs-common, curl. Configure sysctl (bridged traffic).
    • Task: Install K3s (First Node):
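      • Exec: curl -sfL https://get.k3s.io | sh -s - server <Args>
      • Args: --cluster-init --disable traefik --disable servicelb --tls-san k3s-api.stabify.de
    • Task: Install K3s (Other Nodes):
      • Args: --server https://<First-Node-IP>:6443 --token <Secret>, plus the same --disable flags as the first node (K3s requires these to match on every server) and the same --tls-san so each server's certificate covers the VIP name.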
    • Task: Install Kube-VIP:
      • Deploy Manifest for Control Plane HA (ARP Mode).
      • Deploy Manifest for Service LoadBalancer (ARP Mode).
    • Task: Bootstrap Flux:
      • Install Flux CLI.
      • Run flux bootstrap git ....
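The two K3s install tasks can be sketched as the commands the role would eventually run. This is a dry-run sketch that only prints the commands; the function names are illustrative, `<Secret>` stays a placeholder as in the plan, and the first node IP is assumed to be vm-k3s-master-400 (10.100.40.10).

```shell
#!/bin/sh
# Dry run: prints the install commands instead of executing them.
API_FQDN="k3s-api.stabify.de"
FIRST_NODE_IP="10.100.40.10"   # assumption: vm-k3s-master-400 is the first node

# Flags that are repeated on all three servers.
common_args() {
  echo "--disable traefik --disable servicelb --tls-san $API_FQDN"
}

# First node: initialises the embedded etcd cluster.
first_node_cmd() {
  echo "curl -sfL https://get.k3s.io | sh -s - server --cluster-init $(common_args)"
}

# Other nodes: join the existing cluster as additional servers.
join_node_cmd() {
  echo "curl -sfL https://get.k3s.io | sh -s - server --server https://$FIRST_NODE_IP:6443 --token <Secret> $(common_args)"
}

first_node_cmd
join_node_cmd
```

If no token is preset on the first node, K3s generates one, and the join token can be read from /var/lib/rancher/k3s/server/node-token on that node. Keeping the shared flags in one place makes it harder for the servers to drift apart.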

4. Network & DNS Strategy

  • DNS Records (OPNsense):

    • vm-k3s-master-*.stabify.de -> Node IPs (Managed by Terraform).
    • k3s-api.stabify.de -> 10.100.40.5 (VIP).
    • *.k3s.stabify.de -> 10.100.40.6 (Ingress VIP).
  • Traefik Edge Config (vm-docker-traefik-302):

    • New Router/Service in config/dynamic/30-k3s.yaml.
    • Rule: HostRegexp(`^.+\.k3s\.stabify\.de$`)
    • Target: https://10.100.40.6:443 (PassHostHeader=true).
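The Traefik Edge entry could look roughly like the file rendered by the sketch below, which also checks a few hostnames against the same pattern. The router and service names are placeholders. Entry points, TLS settings, and certificate verification toward the IP backend are left out because the plan does not specify them.

```shell
#!/bin/sh
# Prints a candidate config/dynamic/30-k3s.yaml to stdout.
render_k3s_route() {
  cat <<'EOF'
http:
  routers:
    k3s:                       # router name is a placeholder
      rule: 'HostRegexp(`^.+\.k3s\.stabify\.de$`)'
      service: k3s
  services:
    k3s:                       # service name is a placeholder
      loadBalancer:
        passHostHeader: true
        servers:
          - url: "https://10.100.40.6:443"
EOF
}

# Checks a hostname against the same pattern using grep -E.
matches_k3s_rule() {
  printf '%s\n' "$1" | grep -Eq '^.+\.k3s\.stabify\.de$'
}

render_k3s_route
for host in app.k3s.stabify.de k3s.stabify.de stabify.de; do
  if matches_k3s_rule "$host"; then
    echo "$host -> routed to K3s"
  else
    echo "$host -> not matched"
  fi
done
```

The pattern requires at least one label in front of .k3s.stabify.de, so the bare k3s.stabify.de name is not forwarded to the cluster.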

5. Next Steps for Implementation

  1. Refactor Terraform: Update locals.tf to 3 Masters. Apply to create VMs.
  2. DNS Update: Verify OPNsense records.
  3. Ansible Development: Create k3s role.
  4. Execute Ansible: Deploy Cluster.
  5. Flux Bootstrap: Link cluster to GitOps repo.
  6. Traefik Edge: Configure routing.

This plan ensures a clean separation of concerns: Terraform provisions the VMs (including the OS via Cloud-Init), Ansible prepares the OS and installs the cluster software, and Flux manages the workloads.