A GitOps-driven multi-site homelab managed by ArgoCD, bootstrapped via Terraform IaC and CI/CD to deploy a Talos Kubernetes cluster on Proxmox with Cilium CNI and a Prometheus/Loki/Alloy observability stack.
Three sites with distinct roles, each deployed by its own GitHub Actions workflow:
| Site | Role | Platform | Deploys via |
|---|---|---|---|
| Vieta | Primary homelab: Kubernetes cluster, services, storage | Talos K8s on Proxmox | main-deploy.yaml |
| Minerva | Secondary site: lightweight services | Docker Compose | minerva-deploy.yaml |
| Cloud Edge | Public-facing edge: HAProxy TLS relay, VPN tunneling | NixOS on Oracle Cloud (ARM) | cloud-edge.yaml |
| Domain | Tools |
|---|---|
| IaC & CI/CD | Terraform (S3-backed state), Ansible, GitHub Actions, Tailscale runner, Renovate |
| Compute | Proxmox (Intel NUC), Talos Linux, NixOS, Raspberry Pi workers |
| Orchestration | Kubernetes, ArgoCD (App of Apps), Helm |
| Networking | Cilium (CNI, kube-proxy replacement, Gateway API, Hubble), UniFi, Cloudflare, HAProxy, WireGuard, Tailscale |
| Storage | NFS CSI, local-path-provisioner (planned: Democratic CSI on TrueNAS) |
| Data | CloudNativePG, Crossplane (DBaaS) |
| Observability | Prometheus, Alertmanager, Grafana, Loki, Alloy |
| Identity & Secrets | Authentik (SSO), External Secrets Operator, cert-manager (planned: Vault) |
| Security | Kyverno (admission policy), Tetragon (eBPF runtime security), Trivy (CI scanning) |
| DNS | Blocky (filtering), Unbound (DNSSEC + DoT upstream), Cloudflare |
Push to main triggers an orchestrator workflow that detects which layers changed and runs them in order. PRs get a Terraform plan comment for review, and changes under charts//apps/ are gated by a chart-and-policy workflow that renders every wrapper chart, schema-checks it with kubeconform, and runs the cluster's real Kyverno policies against the output before it can reach ArgoCD. Tailscale connects the GitHub runner to the homelab network. Renovate keeps dependencies (Helm charts, container images, Terraform providers, Action versions, Nix flake refs, and more) up to date by opening PRs against the repo.
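The change-detection step can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's actual workflow logic: the `terraform/` path prefix, the function name `layers_to_run`, and the rule that a layer counts as changed when its name appears as a path component are all hypothetical. Only the four layer names come from this README.

```python
from pathlib import PurePosixPath

# Layer names as listed in this README, in the order they must run.
LAYER_ORDER = ["00-global", "01-network", "02-infrastructure", "03-services"]


def layers_to_run(changed_files):
    """Return the layers touched by a push, sorted into run order.

    Assumption: a file belongs to a layer when the layer name is one of
    its path components (e.g. terraform/01-network/vlans.tf).
    """
    touched = set()
    for path in changed_files:
        for part in PurePosixPath(path).parts:
            if part in LAYER_ORDER:
                touched.add(part)
                break
    # Iterating LAYER_ORDER (not the set) is what enforces the ordering.
    return [layer for layer in LAYER_ORDER if layer in touched]


if __name__ == "__main__":
    print(layers_to_run([
        "terraform/03-services/cilium.tf",
        "terraform/01-network/vlans.tf",
        "README.md",
    ]))
```

Whether a change in an earlier layer should also force later layers to re-run is a design choice this sketch leaves out.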
Trivy scans for secrets and IaC misconfigurations across Terraform, Helm charts, Kubernetes manifests, and Docker Compose. Findings are reported as SARIF to the GitHub Security tab.
The primary site, structured as four Terraform layers plus the applications deployed by ArgoCD. State flows forward via remote state outputs.
For the physical build (10" mini-rack, hardware, and 3D-printed mounts), see the Vieta physical setup.
| Layer | Scope |
|---|---|
| `00-global` | S3 state backend, shared config |
| `01-network` | UniFi VLANs, firewall, DHCP, DNS; Cloudflare records |
| `02-infrastructure` | Proxmox VMs/LXCs, Talos cluster bootstrap, NFS |
| `03-services` | Cluster platform (CNI, certs, ingress, secrets) |
| ArgoCD | Applications via GitOps |
Manages the Vieta site network through the UniFi controller API and Cloudflare. Covers VLANs, zone-based firewall policies, switch port profiles, static DHCP reservations with local DNS, and Cloudflare DNS records.
| VLAN | ID | Subnet | Purpose |
|---|---|---|---|
| Default | 10 | `10.10.10.0/24` | Consumer devices, IoT, mDNS enabled |
| Athena | 20 | `10.10.1.0/24` | Homelab infrastructure, network-isolated |
Inter-VLAN traffic is blocked by default. Only SSH, HTTPS, and SMB are permitted from Default into Athena.
| Rule | From | To | Ports | Action |
|---|---|---|---|---|
| Service access | Internal (Default) | Athena | 22, 443, 445 | Allow |
| VPN gateway | Athena | External (`10.0.3.2`) | All | Allow |
| VPN lockout | Internal (Default) | External (`10.0.3.2`) | All | Block |
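To make the default-deny behaviour concrete, here is a toy model of the rule table above. It is only a sketch: the `Rule` type, the zone labels, and the first-match evaluation order are assumptions made for illustration, and it does not reflect how the UniFi controller actually evaluates zone-based policies. Traffic within a single zone is not modelled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    src: str
    dst: str
    ports: Optional[FrozenSet[int]]  # None means "all ports"
    action: str


# Mirrors the three rows of the table; "External" stands for 10.0.3.2.
RULES = [
    Rule("Service access", "Default", "Athena", frozenset({22, 443, 445}), "Allow"),
    Rule("VPN gateway", "Athena", "External", None, "Allow"),
    Rule("VPN lockout", "Default", "External", None, "Block"),
]


def evaluate(src, dst, port):
    """Return the action of the first matching rule, else the default Block."""
    for rule in RULES:
        if rule.src == src and rule.dst == dst:
            if rule.ports is None or port in rule.ports:
                return rule.action
    return "Block"  # inter-VLAN traffic is blocked unless a rule allows it


if __name__ == "__main__":
    print(evaluate("Default", "Athena", 22))   # SSH into Athena
    print(evaluate("Default", "Athena", 80))   # plain HTTP into Athena
    print(evaluate("Athena", "Default", 443))  # reverse direction
```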
Cloudflare manages the lippok.dev zone. A wildcard and root A record are created in 03-services pointing to the Kubernetes Gateway LoadBalancer IP. Oracle records are managed under cloud-edge.
| Subdomain | DNS | TLS terminated at | Use case |
|---|---|---|---|
| `*.lippok.dev` | Local LB IP | Local Gateway | Local-only services |
| `*.cloud.lippok.dev` | Oracle IP | HAProxy | Cloud-hosted services |
| `*.relay.lippok.dev` | Oracle IP | HAProxy TLS passthrough to Local | Proxied services |
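The table above implies a most-specific-match rule: a name under `cloud.lippok.dev` or `relay.lippok.dev` should follow its own row rather than the broader `*.lippok.dev` one. A minimal sketch of that lookup, assuming longest-suffix matching (the function name `tls_terminated_at` and the matching strategy are illustrative, not taken from the repo or from HAProxy's configuration):

```python
# (suffix, "TLS terminated at" cell from the table above)
ZONES = [
    ("lippok.dev", "Local Gateway"),
    ("cloud.lippok.dev", "HAProxy"),
    ("relay.lippok.dev", "HAProxy TLS passthrough to Local"),
]


def tls_terminated_at(hostname):
    """Return the table entry for the longest matching wildcard suffix."""
    host = hostname.lower().rstrip(".")
    best = None
    for suffix, target in ZONES:
        # "*.suffix" matches any name that ends in ".suffix".
        if host.endswith("." + suffix):
            if best is None or len(suffix) > len(best[0]):
                best = (suffix, target)
    return best[1] if best else None


if __name__ == "__main__":
    for name in ("grafana.lippok.dev", "oci.cloud.lippok.dev", "dns.relay.lippok.dev"):
        print(name, "->", tls_terminated_at(name))
```

Here `grafana.lippok.dev` is a made-up hostname used only to exercise the first row.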
- Entry: Client (DoH) to `dns.relay.lippok.dev` via Oracle HAProxy.
- Tunnel: TLS relay over WireGuard to homelab (E2EE).
- Homelab: Terminates TLS and resolves through Blocky (filtering) then Unbound (DNSSEC).
- Upstream: ODoH-style via VPN + DoT to 1.1.1.1.
Validation: dns-check.cloud.lippok.dev only resolves to oci.cloud.lippok.dev behind Blocky.
Provisions VMs and containers on a Proxmox host (Intel NUC) and bootstraps a Talos-based Kubernetes cluster. All IPs and MACs are sourced from 01-network via remote state.
- OS: Talos Linux (immutable, API-driven, no SSH)
- Image: Built via Talos Image Factory with `qemu-guest-agent` extension
- CNI: Set to `none` at bootstrap (Cilium installed in `03-services`)
- kube-proxy: Disabled (Cilium takes over)
| Node Role | Count | Platform |
|---|---|---|
| Control plane | 1 | Proxmox VM |
| Workers (general) | 3 | Raspberry Pi (Athena VLAN) |
| Worker (database) | 1 | Proxmox VM, tainted dedicated=database:NoSchedule |
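The effect of the `dedicated=database:NoSchedule` taint can be illustrated with a simplified model of Kubernetes taint/toleration matching. This is a sketch of the general scheduling rule, not code from this repo; the helper names `tolerates` and `schedulable` are hypothetical, and soft effects such as `PreferNoSchedule` are ignored.

```python
def tolerates(toleration, taint):
    """True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every effect.
    effect_ok = not toleration.get("effect") or toleration["effect"] == taint["effect"]
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # An empty key with Exists matches every taint key.
        key_ok = not toleration.get("key") or toleration["key"] == taint["key"]
        return key_ok and effect_ok
    return (
        toleration.get("key") == taint["key"]
        and toleration.get("value", "") == taint.get("value", "")
        and effect_ok
    )


def schedulable(pod_tolerations, node_taints):
    """A pod fits only if every hard taint on the node is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in pod_tolerations)
        for taint in node_taints
        if taint["effect"] in ("NoSchedule", "NoExecute")
    )


DB_TAINT = {"key": "dedicated", "value": "database", "effect": "NoSchedule"}
DB_TOLERATION = {"key": "dedicated", "operator": "Equal",
                 "value": "database", "effect": "NoSchedule"}

if __name__ == "__main__":
    print(schedulable([], [DB_TAINT]))               # ordinary pod
    print(schedulable([DB_TOLERATION], [DB_TAINT]))  # database pod
```

A toleration only permits scheduling onto the tainted node; keeping database pods off the other workers additionally requires a node selector or affinity rule.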
Debian 12 LXC container with dual storage (SSD for OS, HDD for data). Exports /srv/nfs/kubernetes to the cluster. Proxmox firewall defaults to DROP; only K8s nodes and the NUC are whitelisted via IP set. Configured via Ansible.
A Debian LXC running WireGuard as the homelab end of the Cloud Edge tunnel (Ansible-provisioned). It NATs the Oracle edge into the Athena VLAN, letting the cloud HAProxy reach internal services for the *.relay.lippok.dev path.
Exports kubeconfig, talosconfig, cluster info, and NFS server details for the next layer.
Bootstraps all platform-level services that make the cluster operational. Reads state from both 01-network (LB CIDR) and 02-infrastructure (kubeconfig, NFS server). Everything here is a prerequisite for the applications managed by ArgoCD.
Cilium replaces kube-proxy and serves as the cluster CNI. It handles LoadBalancer IP advertisement via L2 announcements on all nodes (the IP pool is sourced from the 01-network output), provides ingress through the Kubernetes Gateway API (cilium gatewayClassName), and exposes flow-level observability through Hubble with its UI and relay.
A single Gateway resource handles all ingress with HTTP (80) and HTTPS (443) listeners. The HTTPS listener terminates TLS with a wildcard *.lippok.dev certificate. Services are exposed by creating HTTPRoute resources in their own namespaces.
- Issuer: Let's Encrypt (production ACME)
- Challenge: DNS-01 via Cloudflare API token
- Certificate: Wildcard `*.lippok.dev` + root, stored in the `gateway` namespace
The cluster mounts persistent volumes via csi-driver-nfs, talking to the NFS server provisioned in 02-infrastructure (IP and export path passed through outputs). The default StorageClass nfs-client provides NFS 4.1 mounts to all pods.
Why NFS? Several seemingly odd decisions in this cluster trace back to one constraint: avoiding SD card wear on the Raspberry Pi workers. Local PVCs on the Pis would burn through SD cards quickly under typical Kubernetes write patterns, so persistent storage is offloaded to NFS. The same constraint is why the database worker is a dedicated VM on the NUC (tainted `dedicated=database:NoSchedule`) rather than scheduling Postgres onto the Pis.
Planned migration: Once the new NAS/TrueNAS is online, remove the temporary Proxmox database worker VM, NFS LXC, and `local-path-provisioner`. Switch to Democratic CSI for dynamic ZFS-backed iSCSI/NFS provisioning and snapshots, with a new Talos database VM hosted on TrueNAS.
Manages secret distribution across namespaces.
- Backend (current): Kubernetes secrets in a dedicated `secret-store` namespace, seeded by Terraform.
- Backend (planned): HashiCorp Vault
- ClusterSecretStore reads from the temporary backend via a dedicated ServiceAccount + RBAC
ArgoCD is deployed via Helm in 03-services. Everything beyond the platform services is managed through ArgoCD's App of Apps pattern: a root Application watches the apps/ directory in this repo and automatically syncs each application definition to the cluster.
| Service | Role |
|---|---|
| ArgoCD | GitOps controller: self-managed via App of Apps |
| CloudNative-PG | PostgreSQL operator; provides databases for services |
| Crossplane | DBaaS: provisions Postgres databases, PgBouncer, and credentials |
| Kyverno | Admission policy engine; PSS Restricted via kyverno-policies + custom rules, in Enforce mode |
| Tetragon | Cilium eBPF runtime-security agent; observe-only TracingPolicy tripwires shipped to Loki via Alloy |
| Local Path Provisioner | Node-local dynamic storage for DBs |
A unified Grafana stack for metrics, logs, and alerting (Prometheus + Alloy + Loki).
| Service | Role |
|---|---|
| kube-prometheus-stack | Prometheus + Alertmanager + Grafana; scrapes cluster metrics plus Proxmox and UniFi via dedicated exporters and Cilium/Hubble |
| Loki | Log aggregation backend (single-binary, filesystem-backed) |
| Alloy | Log/telemetry agent: DaemonSet collects pod logs + Talos/Proxmox/UniFi syslog; StatefulSet ships Kubernetes events |
| Service | Role |
|---|---|
| Authentik | Self-hosted identity provider and SSO; backed by CNPG PostgreSQL |
| Service | Role |
|---|---|
| Tailscale Operator | Kubernetes-native Tailscale integration for secure mesh access |
| Blocky + Unbound | Internal DNS stack: Blocky for filtering/caching, Unbound as DNSSEC-validating resolver with DoT upstream |
| Gateway External Routes | Nginx reverse-proxy deployed as HTTPRoute targets to bridge non-Kubernetes hosts (NAS, Proxmox, router) into cluster ingress |
| Service | Role |
|---|---|
| Gatus | Endpoint health monitoring and status page; Discord alerting, PostgreSQL-backed history |
| IT-Tools | Self-hosted suite of developer and network utilities |
Secondary site running services via Docker Compose. Deployed via minerva-deploy.yaml.
Public-facing edge node on Oracle Cloud's Always Free ARM tier. Provides:
- HAProxy: SNI routing for `*.cloud` and `*.relay` subdomains
- WireGuard: encrypted tunnel back to the homelab
- Tailscale: out-of-band management
| Layer | Scope |
|---|---|
| `cloud-edge/*.tf` | OCI instance, VCN, security list, edge subnet/firewall, Cloudflare *.cloud and *.relay records |
| `cloud-edge/nixos/` | NixOS flake (deployed via nixos-anywhere); full host configuration for the oracle-edge node |