preparation for k3s

commit a415c515e3 (parent f57870280c)
Author: Ubuntu
Date: 2026-01-10 21:42:51 +00:00
16 changed files with 471 additions and 15 deletions

.gitignore (vendored, 3 changed lines)

@@ -30,6 +30,9 @@ infrastructure/apps/vault/certs/
 *~
 .DS_Store
 
+# CI/CD Strategy
+00-CICD-Strategy/
+
 # OS generated
 Thumbs.db

K3S_PLANNING.md (new file, 78 lines)

@@ -0,0 +1,78 @@
# K3s Cluster Setup Plan (Enterprise Ready)

## 1. Architecture Overview

We will deploy a High-Availability (HA) K3s cluster consisting of 3 Control Plane nodes (embedded etcd). This setup is resilient against the failure of a single node.

* **Topology:** 3 Nodes (Server + Agent mixed).
* **Operating System:** Ubuntu 24.04 (via Terraform/Cloud-Init).
* **Networking:**
  * VLAN 40 (IP Range: `10.100.40.0/24`).
  * **VIP (Virtual IP):** A floating IP managed by `kube-vip` for the API Server and Ingress Controller.
* **Ingress Flow:**
  * `Internet` -> `Traefik Edge (VM 302)` -> `K3s VIP (LoadBalancer)` -> `Traefik Ingress (K3s)` -> `Pod`.
* **GitOps:**
  * **Tool:** FluxCD.
  * **Repository Structure:**
    * `stabify-infra` (Current): Bootstraps the nodes, installs K3s, installs the Flux binary.
    * `stabify-gitops` (New): Watched by Flux. Contains system workloads (Cert-Manager, Traefik Internal) and user apps.
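
To make the split concrete, here is a minimal sketch of how Flux inside the cluster could watch `stabify-gitops`. The repository URL, branch, path, and object names are placeholders (not confirmed values), and a private repository would additionally need an SSH `secretRef`:

```yaml
# Sketch only: Flux source and reconciliation for the stabify-gitops repository.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: stabify-gitops
  namespace: flux-system
spec:
  interval: 1m
  url: ssh://git@git.example.de/stabify/stabify-gitops.git  # placeholder URL
  ref:
    branch: main
  # A secretRef with an SSH deploy key would be required for a private repository.
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-workloads
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: stabify-gitops
  path: ./clusters/production  # placeholder path
  prune: true
```
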
## 2. Terraform Changes (`terraform/`)

We will update the existing `locals.tf` to reflect the 3-node HA structure.

* **`terraform/locals.tf`**:
  * Refactor the `vms` map:
    * `vm-k3s-master-400` (`10.100.40.10`)
    * `vm-k3s-master-401` (`10.100.40.11`)
    * `vm-k3s-master-402` (`10.100.40.12`)
  * Define VIPs:
    * `k3s-api-vip`: `10.100.40.5` - Endpoint for `kubectl` and the nodes.
    * `k3s-ingress-vip`: `10.100.40.6` - Endpoint for the Traefik Edge.
* **`terraform/main.tf`**:
  * Add `opnsense_unbound_host_override` resources for the VIPs to ensure internal DNS resolution.

## 3. Ansible Role Design (`infrastructure/ansible/`)

We will create a new role `k3s` and a corresponding playbook.

* **Inventory (`inventory.ini`)**:
  * Add a `[k3s_masters]` group.
* **Role: `k3s`**:
  * **Task: System Prep:** Install `open-iscsi`, `nfs-common`, `curl`. Configure sysctl (bridged traffic).
  * **Task: Install K3s (First Node):**
    * Exec: `curl -sfL https://get.k3s.io | sh -`
    * Args: `--cluster-init --disable traefik --disable servicelb --tls-san k3s-api.stabify.de`
  * **Task: Install K3s (Other Nodes):**
    * Args: `--server https://<First-Node-IP>:6443 --token <Secret>`
  * **Task: Install Kube-VIP:**
    * Deploy manifest for Control Plane HA (ARP mode).
    * Deploy manifest for Service LoadBalancer (ARP mode).
  * **Task: Bootstrap Flux:**
    * Install the Flux CLI.
    * Run `flux bootstrap git ...` (see the sketch after this list).
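
A minimal sketch of the bootstrap step as an Ansible task, assuming the Flux CLI is already installed on the first master; the repository URL, branch, and path are placeholders, not confirmed values:

```yaml
# Sketch only: run flux bootstrap from the first master node.
# URL, branch and path below are assumptions and must be adapted.
- name: Bootstrap FluxCD against the stabify-gitops repository
  shell: >
    flux bootstrap git
    --url=ssh://git@git.example.de/stabify/stabify-gitops.git
    --branch=main
    --path=clusters/production
    --silent
  environment:
    KUBECONFIG: /etc/rancher/k3s/k3s.yaml
  when: inventory_hostname == groups['k3s_masters'][0]
```
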
## 4. Network & DNS Strategy
* **DNS Records (OPNsense):**
* `vm-k3s-master-*.stabify.de` -> Node IPs (Managed by Terraform).
* `k3s-api.stabify.de` -> `10.100.40.5` (VIP).
* `*.k3s.stabify.de` -> `10.100.40.6` (Ingress VIP).
* **Traefik Edge Config (`vm-docker-traefik-302`):**
* New Router/Service in `config/dynamic/30-k3s.yaml`.
* Rule: `HostRegexp('^.+\.k3s\.stabify\.de$')`
* Target: `https://10.100.40.6:443` (PassHostHeader=true).
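
A minimal sketch of that dynamic configuration file, assuming the edge Traefik is v3 (the `HostRegexp` pattern above uses v3 syntax) and exposes a `websecure` entry point; the entry point name and TLS handling are assumptions:

```yaml
# config/dynamic/30-k3s.yaml - sketch only; entry point and TLS verification are assumptions.
http:
  routers:
    k3s-wildcard:
      rule: "HostRegexp(`^.+\\.k3s\\.stabify\\.de$`)"
      entryPoints:
        - websecure
      service: k3s-ingress
      tls: {}
  services:
    k3s-ingress:
      loadBalancer:
        passHostHeader: true
        servers:
          - url: "https://10.100.40.6:443"
        # Depending on the certificates served by the in-cluster Traefik, a
        # serversTransport (cluster CA or insecureSkipVerify) may be required.
```
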
## 5. Next Steps for Implementation

1. **Refactor Terraform:** Update `locals.tf` to 3 masters. Apply to create the VMs.
2. **DNS Update:** Verify the OPNsense records.
3. **Ansible Development:** Create the `k3s` role.
4. **Execute Ansible:** Deploy the cluster.
5. **Flux Bootstrap:** Link the cluster to the GitOps repo.
6. **Traefik Edge:** Configure routing.

This plan ensures a clean separation of concerns: Terraform provisions the VMs, Ansible installs the OS and cluster software, and Flux manages the workloads.

import_fix.sh (new executable file, 39 lines)

@@ -0,0 +1,39 @@
#!/bin/bash
set -e

# Change into the Terraform directory
cd terraform

echo "Starting the import process..."
echo "This registers existing VMs in the Terraform state so they are not recreated."
echo "Errors for VMs that do not exist are expected and can be ignored."
echo ""

# Helper function
import_vm() {
  NAME=$1
  ID=$2
  echo ">>> Importing $NAME (ID: $ID)..."
  # Run the import; ignore errors if the VM is already in the state or does not exist
  terraform import "proxmox_vm_qemu.vm_deployment[\"$NAME\"]" "$ID" || echo "⚠️  Import for $NAME skipped (possibly missing or already in the state)."
  echo ""
}

# Import all VMs defined in locals.tf
# Docker
import_vm "vm-docker-mailcow-300" 300
import_vm "vm-docker-apps-301" 301
import_vm "vm-docker-traefik-302" 302

# K3s
import_vm "vm-k3s-master-400" 400
import_vm "vm-k3s-master-401" 401
import_vm "vm-k3s-master-402" 402

# Bastion
import_vm "vm-bastion-900" 900
import_vm "vm-bastion-901" 901

echo "--------------------------------------------------------"
echo "✅ Done. Please run 'terraform plan' again now."
echo "--------------------------------------------------------"

inventory.ini (modified)

@@ -3,12 +3,12 @@ vm-docker-apps-301.stabify.de ansible_host=10.100.30.11
 vm-docker-traefik-302.stabify.de ansible_host=10.100.30.12
 # vm-docker-mailcow-300.stabify.de ansible_host=10.100.30.10
 
-[k3s_hosts]
-# vm-k3s-master-400.stabify.de ansible_host=10.100.40.10
-# ...
+[k3s_masters]
+vm-k3s-master-400.stabify.de ansible_host=10.100.40.10
+vm-k3s-master-401.stabify.de ansible_host=10.100.40.11
+vm-k3s-master-402.stabify.de ansible_host=10.100.40.12
 
 [all:vars]
 ansible_user=ansible
 ansible_ssh_common_args='-o StrictHostKeyChecking=no'
 ansible_ssh_private_key_file=~/.ssh/id_ed25519_ansible_prod

Ansible playbook for the K3s cluster (new file, 43 lines)

@@ -0,0 +1,43 @@
---
- name: K3s Cluster Deployment
  hosts: k3s_masters
  become: true
  gather_facts: true
  vars:
    # Vault path for the K3s secrets
    vault_k3s_path: "secret/data/k3s"
    # GitOps repository URL (for the common role and ArgoCD)
    git_repo_url: "https://git.cloud-infra.prod.openmailserver.de/stabify/infrastructure.git" # Adjust this to your actual URL if necessary

  pre_tasks:
    - name: Load K3s token from Vault
      community.hashi_vault.vault_kv2_get:
        path: "infrastructure/k3s"
        engine_mount_point: "secret"
        url: "{{ lookup('env', 'VAULT_ADDR') }}"
        token: "{{ lookup('env', 'VAULT_TOKEN') }}"
        ca_cert: "{{ lookup('env', 'VAULT_CACERT') | default(playbook_dir ~ '/../../vault-ca.crt') }}"
      register: vault_k3s_data
      delegate_to: localhost
      ignore_errors: true
      vars:
        ansible_connection: local

    - name: Set k3s_token from Vault (or fail if missing)
      set_fact:
        k3s_token: "{{ vault_k3s_data.secret.token }}"
        kubevip_version: "{{ vault_k3s_data.secret.kubevip_version | default('v0.8.0') }}"
      when: vault_k3s_data.secret.token is defined

    - name: Fail if no token was found (safety check)
      fail:
        msg: "No K3s token found in Vault! Please create 'secret/infrastructure/k3s' with the key 'token'."
      when: k3s_token == "SECRET_TOKEN_REPLACE_ME" and vault_k3s_data.secret.token is undefined

  roles:
    - role: common # User, Docker (not strictly required for K3s, but useful for tooling), CA certs
    - role: k3s

k3s role defaults (new file, 4 lines)

@@ -0,0 +1,4 @@
k3s_version: "v1.28.5+k3s1"
k3s_token: "SECRET_TOKEN_REPLACE_ME" # Should be provided via Vault
k3s_api_vip: "10.100.40.5"
k3s_interface: "ens18" # Network interface of the VMs

k3s role tasks: argocd.yml (new file, 15 lines)

@@ -0,0 +1,15 @@
- name: Create the ArgoCD namespace
  shell: /usr/local/bin/kubectl create namespace argocd --dry-run=client -o yaml | /usr/local/bin/kubectl apply -f -
  environment:
    KUBECONFIG: /etc/rancher/k3s/k3s.yaml

- name: Install ArgoCD (stable manifest)
  shell: /usr/local/bin/kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
  environment:
    KUBECONFIG: /etc/rancher/k3s/k3s.yaml

- name: Wait for the ArgoCD server (optional, can take a while)
  shell: /usr/local/bin/kubectl wait --for=condition=available deployment/argocd-server -n argocd --timeout=300s
  ignore_errors: yes
  environment:
    KUBECONFIG: /etc/rancher/k3s/k3s.yaml

k3s role tasks: install.yml (new file, 72 lines)

@@ -0,0 +1,72 @@
---
- name: Install dependencies
  apt:
    name:
      - open-iscsi
      - nfs-common
      - curl
    state: present
    update_cache: yes

- name: Load the br_netfilter kernel module
  modprobe:
    name: br_netfilter
    state: present

- name: Ensure br_netfilter is loaded at boot
  copy:
    content: "br_netfilter\n"
    dest: /etc/modules-load.d/k3s.conf
    mode: '0644'

- name: Enable bridged IPv4 traffic
  sysctl:
    name: net.bridge.bridge-nf-call-iptables
    value: '1'
    state: present
    sysctl_file: /etc/sysctl.d/k8s.conf
    reload: yes

- name: Fetch the K3s installation script
  get_url:
    url: https://get.k3s.io
    dest: /usr/local/bin/k3s_install.sh
    mode: '0700'

- name: Init first master node
  shell: >
    INSTALL_K3S_VERSION={{ k3s_version }}
    K3S_TOKEN={{ k3s_token }}
    /usr/local/bin/k3s_install.sh server
    --cluster-init
    --tls-san {{ k3s_api_vip }}
    --tls-san k3s-api.stabify.de
    --disable traefik
    --disable servicelb
    --etcd-expose-metrics=true
  when: inventory_hostname == groups['k3s_masters'][0]
  args:
    creates: /var/lib/rancher/k3s/server/node-token

- name: Wait until the first master API is reachable
  wait_for:
    host: "{{ hostvars[groups['k3s_masters'][0]]['ansible_host'] }}"
    port: 6443
    delay: 10
    timeout: 300
  when: inventory_hostname != groups['k3s_masters'][0]

- name: Join the other master nodes
  shell: >
    INSTALL_K3S_VERSION={{ k3s_version }}
    K3S_TOKEN={{ k3s_token }}
    K3S_URL=https://{{ hostvars[groups['k3s_masters'][0]]['ansible_host'] }}:6443
    /usr/local/bin/k3s_install.sh server
    --tls-san {{ k3s_api_vip }}
    --tls-san k3s-api.stabify.de
    --disable traefik
    --disable servicelb
    --etcd-expose-metrics=true
  when: inventory_hostname != groups['k3s_masters'][0]
  args:
    creates: /var/lib/rancher/k3s/server/node-token

k3s role tasks: kube-vip.yml (new file, 11 lines)

@@ -0,0 +1,11 @@
- name: Deploy Kube-VIP RBAC Manifest
  template:
    src: kube-vip-rbac.yaml.j2
    dest: /var/lib/rancher/k3s/server/manifests/kube-vip-rbac.yaml
    mode: '0644'

- name: Deploy Kube-VIP DaemonSet Manifest
  template:
    src: kube-vip-daemonset.yaml.j2
    dest: /var/lib/rancher/k3s/server/manifests/kube-vip-daemonset.yaml
    mode: '0644'

k3s role tasks: main.yml (new file, 14 lines)

@@ -0,0 +1,14 @@
- name: Include K3s Installation
  include_tasks: install.yml

- name: Include Kube-VIP Setup
  include_tasks: kube-vip.yml
  when: inventory_hostname == groups['k3s_masters'][0]

- name: Include ArgoCD Setup
  include_tasks: argocd.yml
  when: inventory_hostname == groups['k3s_masters'][0]

# - name: Include FluxCD Setup
#   include_tasks: flux.yml
#   when: inventory_hostname == groups['k3s_masters'][0]

k3s role template: kube-vip-daemonset.yaml.j2 (new file, 75 lines)

@@ -0,0 +1,75 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-vip-ds
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-vip-ds
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-vip-ds
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: Exists
              - matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: Exists
      containers:
        - args:
            - manager
          env:
            - name: vip_arp
              value: "true"
            - name: port
              value: "6443"
            - name: vip_interface
              value: "{{ k3s_interface }}"
            - name: vip_cidr
              value: "32"
            - name: cp_enable
              value: "true"
            - name: cp_namespace
              value: kube-system
            - name: vip_ddns
              value: "false"
            - name: svc_enable
              value: "true" # Allows LoadBalancer services (for the ingress)
            - name: svc_leasename
              value: plndr-svcs-lock
            - name: vip_leaderelection
              value: "true"
            - name: vip_leasename
              value: plndr-cp-lock
            - name: vip_leaseduration
              value: "5"
            - name: vip_renewdeadline
              value: "3"
            - name: vip_retryperiod
              value: "1"
            - name: address
              value: "{{ k3s_api_vip }}"
            - name: prometheus_server
              value: ":2112"
          image: ghcr.io/kube-vip/kube-vip:{{ kubevip_version }}
          imagePullPolicy: Always
          name: kube-vip
          securityContext:
            capabilities:
              add:
                - NET_ADMIN
                - NET_RAW
      hostNetwork: true
      serviceAccountName: kube-vip
      tolerations:
        - effect: NoSchedule
          operator: Exists
        - effect: NoExecute
          operator: Exists

k3s role template: kube-vip-rbac.yaml.j2 (new file, 30 lines)

@@ -0,0 +1,30 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-vip
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:kube-vip-role
rules:
  - apiGroups: [""]
    resources: ["services", "services/status", "nodes", "endpoints"]
    verbs: ["list", "get", "watch", "update"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["list", "get", "watch", "update", "create"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: system:kube-vip-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-vip-role
subjects:
  - kind: ServiceAccount
    name: kube-vip
    namespace: kube-system

setup_k3s_secrets.sh (new executable file, 49 lines)

@@ -0,0 +1,49 @@
#!/bin/bash
set -e

# Config
VAULT_ADDR="https://10.100.30.11:8200"
VAULT_CA="./vault-ca.crt"

# Check dependencies
if ! command -v vault &> /dev/null; then
  echo "❌ 'vault' CLI not found."
  exit 1
fi

if [ ! -f "$VAULT_CA" ]; then
  echo "⚠️  $VAULT_CA not found. Trying to download it..."
  scp -i ~/.ssh/id_ed25519_ansible_prod ansible@10.100.30.11:/opt/vault/certs/ca.crt "$VAULT_CA"
fi

echo "🔐 Setting up K3s secrets in Vault"
echo "-----------------------------"

# Auth
if [ -z "$VAULT_TOKEN" ]; then
  read -sp "Please enter the Vault root token: " VAULT_TOKEN
  echo ""
  export VAULT_TOKEN
fi
export VAULT_ADDR
export VAULT_CACERT="$VAULT_CA"

# 1. Generate the K3s token
K3S_TOKEN=$(openssl rand -base64 32)
echo "✅ K3s token generated."

# 2. Set the Kube-VIP version
KUBEVIP_VERSION="v0.8.0"

# 3. Write to Vault
echo "Writing to secret/infrastructure/k3s..."
vault kv put secret/infrastructure/k3s \
  token="$K3S_TOKEN" \
  kubevip_version="$KUBEVIP_VERSION" \
  kubevip_address="10.100.40.5"

echo ""
echo "✅ Secrets created successfully!"
echo "   K3s token: (stored in Vault)"
echo "   Kube-VIP IP: 10.100.40.5"

terraform: Vault data sources (modified)

@@ -10,7 +10,8 @@ data "vault_generic_secret" "opnsense" {
path = "secret/infrastructure/opnsense" path = "secret/infrastructure/opnsense"
} }
data "vault_generic_secret" "vm_creds" { data "vault_kv_secret_v2" "vm_creds" {
count = var.use_vault ? 1 : 0 count = var.use_vault ? 1 : 0
path = "secret/infrastructure/vm-credentials" mount = "secret"
name = "infrastructure/vm-credentials"
} }

terraform/locals.tf (modified)

@@ -1,10 +1,10 @@
 locals {
   # SSH Public Key for Provisioning
-  ssh_key = var.use_vault ? data.vault_generic_secret.vm_creds[0].data["ssh_public_key"] : var.ssh_public_key
+  ssh_key = var.use_vault ? data.vault_kv_secret_v2.vm_creds[0].data["ssh_public_key"] : var.ssh_public_key
 
   # CI Credentials
-  ci_user = var.use_vault ? data.vault_generic_secret.vm_creds[0].data["ci_user"] : var.ci_user
-  ci_password = var.use_vault ? data.vault_generic_secret.vm_creds[0].data["ci_password"] : var.ci_password
+  ci_user = var.use_vault ? data.vault_kv_secret_v2.vm_creds[0].data["ci_user"] : var.ci_user
+  ci_password = var.use_vault ? data.vault_kv_secret_v2.vm_creds[0].data["ci_password"] : var.ci_password
 
   vms = {
     # VLAN 30: Docker
@@ -12,14 +12,19 @@ locals {
     "vm-docker-apps-301" = { id = 301, cores = 2, memory = 4096, vlan = 30, tags = "docker,apps", ip = "10.100.30.11", gw = "10.100.30.1" }
     "vm-docker-traefik-302" = { id = 302, cores = 1, memory = 2048, vlan = 30, tags = "docker,ingress", ip = "10.100.30.12", gw = "10.100.30.1" }
 
-    # VLAN 40: K3s
+    # VLAN 40: K3s (HA Control Plane)
     "vm-k3s-master-400" = { id = 400, cores = 2, memory = 4096, vlan = 40, tags = "k3s,master", ip = "10.100.40.10", gw = "10.100.40.1" }
-    "vm-k3s-worker-401" = { id = 401, cores = 2, memory = 4096, vlan = 40, tags = "k3s,worker", ip = "10.100.40.11", gw = "10.100.40.1" }
-    "vm-k3s-worker-402" = { id = 402, cores = 2, memory = 4096, vlan = 40, tags = "k3s,worker", ip = "10.100.40.12", gw = "10.100.40.1" }
-    "vm-k3s-worker-403" = { id = 403, cores = 2, memory = 4096, vlan = 40, tags = "k3s,worker", ip = "10.100.40.13", gw = "10.100.40.1" }
+    "vm-k3s-master-401" = { id = 401, cores = 2, memory = 4096, vlan = 40, tags = "k3s,master", ip = "10.100.40.11", gw = "10.100.40.1" }
+    "vm-k3s-master-402" = { id = 402, cores = 2, memory = 4096, vlan = 40, tags = "k3s,master", ip = "10.100.40.12", gw = "10.100.40.1" }
 
     # VLAN 90: Bastion
     "vm-bastion-900" = { id = 900, cores = 1, memory = 2048, vlan = 90, tags = "bastion", ip = "10.100.90.10", gw = "10.100.90.1" }
     "vm-bastion-901" = { id = 901, cores = 1, memory = 2048, vlan = 90, tags = "bastion", ip = "10.100.90.11", gw = "10.100.90.1" }
   }
+
+  # Extra DNS entries for VIPs (Virtual IPs)
+  extra_dns = {
+    "k3s-api" = { ip = "10.100.40.5", tags = "k3s,vip,api" }
+    "k3s-ingress" = { ip = "10.100.40.6", tags = "k3s,vip,ingress" }
+  }
 }

terraform/main.tf (modified)

@@ -64,12 +64,20 @@ resource "proxmox_vm_qemu" "vm_deployment" {
   tags = each.value.tags
 
   lifecycle {
-    ignore_changes = [ network ]
+    ignore_changes = [
+      network,
+      sshkeys,
+      ciuser,
+      cipassword
+    ]
   }
 }
 
 resource "opnsense_unbound_host_override" "dns_entries" {
-  for_each = local.vms
+  for_each = merge(
+    { for k, v in local.vms : k => { ip = v.ip, tags = v.tags } },
+    local.extra_dns
+  )
 
   enabled = true
   hostname = each.key
@@ -77,3 +85,12 @@ resource "opnsense_unbound_host_override" "dns_entries" {
   description = "Managed by Terraform: ${each.value.tags}"
   server = each.value.ip
 }
+
+# Wildcard DNS record for K3s Ingress
+resource "opnsense_unbound_host_override" "dns_wildcard_k3s" {
+  enabled = true
+  hostname = "*"
+  domain = "k3s.stabify.de"
+  description = "Managed by Terraform: Wildcard for K3s Ingress VIP"
+  server = local.extra_dns["k3s-ingress"].ip
+}
}