talos-proxmox-cluster/terraform/README.md

Talos Cluster on Proxmox - Terraform Configuration

This Terraform project creates and provisions a Talos Kubernetes cluster on Proxmox VE with integrated Proxmox Cloud Controller Manager (CCM) and Container Storage Interface (CSI) driver.

Features

  • 🚀 Automated VM provisioning on Proxmox VE
  • ☁️ Proxmox Cloud Controller Manager - Native Proxmox integration for Kubernetes
  • 💾 Proxmox CSI Driver - Dynamic volume provisioning using Proxmox storage
  • 🔄 High Availability - Multi-node control plane with optional VIP
  • 🌐 Flexible networking - DHCP or static IP configuration
  • 📦 Full stack deployment - From VMs to running Kubernetes cluster

Prerequisites

  1. Proxmox VE server with API access
  2. Terraform >= 1.0
  3. SSH access to Proxmox node
  4. Network requirements:
    • Available IP addresses for VMs (DHCP or static)
    • Network connectivity between VMs
    • Access to download Talos ISO (for initial setup)

Quick Start

1. Create terraform.tfvars

Create a terraform.tfvars file with your Proxmox and cluster configuration:

# Proxmox Connection
proxmox_endpoint = "https://proxmox.example.com:8006"
proxmox_username = "root@pam"
proxmox_password = "your-password"
proxmox_node     = "pve"

# Proxmox API Tokens (required for CCM/CSI)
proxmox_ccm_token_secret = "your-ccm-token-secret"
proxmox_csi_token_secret = "your-csi-token-secret"

# Cluster Configuration
cluster_name     = "talos-cluster"
cluster_endpoint = "https://10.0.0.100:6443"

# VM Configuration
controlplane_count = 3
worker_count       = 2

# Network (DHCP - IPs will be auto-assigned)
# For static IPs, see advanced configuration below

2. Initialize and Apply

terraform init
terraform plan
terraform apply

3. Get Cluster Access

# Get talosconfig
terraform output -raw talosconfig > ~/.talos/config

# Get kubeconfig
terraform output -raw kubeconfig > ~/.kube/config

# Verify cluster
talosctl version --nodes <controlplane-ip>
kubectl get nodes

4. Verify Proxmox Integration

# Check CCM is running
kubectl get pods -n kube-system | grep proxmox-cloud-controller

# Check CSI is running
kubectl get pods -n csi-proxmox

# View available storage classes
kubectl get storageclass

# Create a test PVC
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: proxmox-data
EOF
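After creating the claim, check that it binds. Depending on how the storage class is configured, a WaitForFirstConsumer binding mode may keep the PVC Pending until a pod actually uses it:

```shell
# Check the claim status; Bound means a Proxmox volume was provisioned.
kubectl get pvc test-pvc

# Events at the bottom of the describe output show provisioning errors, if any.
kubectl describe pvc test-pvc
```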

Configuration Options

Basic Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| proxmox_endpoint | Proxmox API endpoint | - |
| proxmox_username | Proxmox username | root@pam |
| proxmox_password | Proxmox password | - |
| proxmox_insecure | Allow insecure Proxmox API connections | true |
| proxmox_ssh_user | SSH user for Proxmox node | root |
| proxmox_node | Proxmox node name | - |
| proxmox_storage | Storage location for VM disks | local |
| proxmox_network_bridge | Network bridge for VMs | vmbr40 |
| proxmox_ccm_token_secret | Proxmox API token for CCM (sensitive) | - |
| proxmox_csi_token_secret | Proxmox API token for CSI (sensitive) | - |
| cluster_name | Talos cluster name | - |
| cluster_endpoint | Cluster API endpoint | - |
| vm_id_prefix | Starting VM ID prefix | 800 |
| talos_version | Talos version to use | v1.9.1 |
| talos_iso_url | Custom Talos ISO URL | "" (uses default) |

Network Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| controlplane_ips | Static IPs for control plane nodes | [] (DHCP) |
| worker_ips | Static IPs for worker nodes | [] (DHCP) |
| gateway | Default gateway (required for static IPs) | "" |
| netmask | Network mask in CIDR notation | 24 |
| nameservers | DNS nameservers | ["1.1.1.1", "8.8.8.8"] |
| cluster_vip | Virtual IP for HA control plane | "" (disabled) |

Proxmox Integration

| Variable | Description | Default |
|----------|-------------|---------|
| proxmox_region | Region identifier for CCM | proxmox |

VM Resources

| Variable | Description | Default |
|----------|-------------|---------|
| controlplane_count | Number of control plane nodes | 3 |
| worker_count | Number of worker nodes | 2 |
| controlplane_cpu | CPU cores per control plane | 2 |
| controlplane_memory | Memory (MB) per control plane | 4096 |
| controlplane_disk_size | Disk size (GB) per control plane | 20 |
| worker_cpu | CPU cores per worker | 4 |
| worker_memory | Memory (MB) per worker | 8192 |
| worker_disk_size | Disk size (GB) per worker | 10 |

Static IP Configuration

For production deployments, use static IPs. The IP lists and the gateway must be configured together (netmask and nameservers have sensible defaults):

# Control plane IPs
controlplane_ips = [
  "10.0.0.101",
  "10.0.0.102",
  "10.0.0.103"
]

# Worker IPs
worker_ips = [
  "10.0.0.104",
  "10.0.0.105"
]

# Network settings (required for static IPs)
gateway     = "10.0.0.1"      # Default gateway
netmask     = 24              # CIDR notation (e.g., 24 = 255.255.255.0)
nameservers = ["1.1.1.1", "8.8.8.8"]  # DNS servers

# Use VIP for control plane endpoint
cluster_vip      = "10.0.0.100"
cluster_endpoint = "https://10.0.0.100:6443"

Important: When using static IPs, you must set:

  • controlplane_ips and/or worker_ips - the lists of IP addresses
  • gateway - the network gateway IP address

and may override the defaults for:

  • netmask - network mask in CIDR notation (default: 24)
  • nameservers - DNS servers (default: ["1.1.1.1", "8.8.8.8"])

If the IP lists or the gateway are missing, the nodes fall back to DHCP.
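As a quick sanity check for the netmask value, a CIDR prefix can be expanded to its dotted-quad equivalent with a small shell helper (illustrative only, not part of this project):

```shell
# Convert a CIDR prefix length (1-31) to a dotted-quad netmask.
cidr_to_netmask() {
  local prefix=$1
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  printf '%d.%d.%d.%d\n' \
    $(( (mask >> 24) & 255 )) $(( (mask >> 16) & 255 )) \
    $(( (mask >> 8)  & 255 )) $((  mask        & 255 ))
}

cidr_to_netmask 24   # 255.255.255.0
cidr_to_netmask 22   # 255.255.252.0
```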

High Availability Setup

For HA control plane, configure a virtual IP:

cluster_vip      = "10.0.0.100"
cluster_endpoint = "https://10.0.0.100:6443"
controlplane_count = 3  # Minimum 3 for HA
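Once applied, the VIP floats between healthy control plane nodes. A quick reachability check (the VIP below matches the example above; the endpoint may require authentication, but any HTTPS response shows a control plane node is serving the VIP):

```shell
# Any response here proves the VIP is being answered by a control plane node.
curl -ks https://10.0.0.100:6443/version
```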

Custom Talos Version

talos_version = "v1.9.1"
# Or use custom ISO URL
talos_iso_url = "https://custom-mirror.com/talos.iso"

Advanced Configuration

Custom Storage Backend

proxmox_storage = "ceph-storage"  # or "nfs-backup", etc.

Custom Network Bridge

proxmox_network_bridge = "vmbr1"

Custom VM ID Range

vm_id_prefix = 1000  # VMs will be 1000, 1001, 1002, etc.

Proxmox API Token Setup

The CCM and CSI drivers require Proxmox API tokens for authentication. Generate tokens in Proxmox:

  1. Navigate to Datacenter → Permissions → API Tokens
  2. Create a token for CCM with appropriate permissions
  3. Create a token for CSI with storage permissions
  4. Add the token secrets to your terraform.tfvars:
proxmox_ccm_token_secret = "your-ccm-api-token-secret"
proxmox_csi_token_secret = "your-csi-api-token-secret"
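The same setup can be done from the Proxmox node's shell with pveum. The role/user names and privilege lists below are a sketch based on the upstream CCM/CSI documentation; verify them against the driver versions you deploy:

```shell
# Role and token for the CSI driver (privilege list is an assumption; check upstream docs).
pveum role add CSI -privs "VM.Audit VM.Config.Disk Datastore.Allocate Datastore.AllocateSpace Datastore.Audit"
pveum user add kubernetes-csi@pve
pveum aclmod / -user kubernetes-csi@pve -role CSI
pveum user token add kubernetes-csi@pve csi -privsep 0

# Read-only role and token for the CCM.
pveum role add CCM -privs "VM.Audit"
pveum user add kubernetes-ccm@pve
pveum aclmod / -user kubernetes-ccm@pve -role CCM
pveum user token add kubernetes-ccm@pve ccm -privsep 0
```

pveum user token add prints the token secret exactly once; that value is what goes into proxmox_ccm_token_secret / proxmox_csi_token_secret.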

Architecture

The project creates:

  1. Control Plane VMs (default: 3)

    • Run Kubernetes control plane components
    • Can schedule workload pods if configured
    • Participate in etcd cluster
    • Run Proxmox CCM for cloud provider integration
  2. Worker VMs (default: 2)

    • Run application workloads
    • Join the cluster automatically
    • Support CSI for dynamic volume provisioning
  3. Talos Configuration

    • Machine secrets and certificates
    • Node-specific configurations
    • Client configurations (talosconfig, kubeconfig)
    • Cloud provider configuration for CCM integration
  4. Proxmox Integration

    • CCM (Cloud Controller Manager): Provides node lifecycle management and metadata
    • CSI (Container Storage Interface): Enables dynamic PV provisioning from Proxmox storage

Workflow

  1. VM Creation: VMs are created in Proxmox with Talos ISO attached
  2. Boot to Maintenance: VMs boot into Talos maintenance mode
  3. Configuration Apply: Terraform applies Talos machine configurations with cloud-provider settings
  4. Cluster Bootstrap: First control plane node bootstraps the cluster
  5. Node Join: Remaining nodes join automatically
  6. Kubeconfig Generation: Cluster credentials are generated
  7. CCM Installation: Proxmox Cloud Controller Manager is deployed (if enabled)
  8. CSI Installation: Proxmox CSI driver and storage class are deployed (if enabled)
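For reference, steps 3-6 correspond roughly to this manual talosctl sequence (node IP and file name are placeholders; Terraform performs the equivalent calls through the Talos provider):

```shell
# Apply machine config to a node still in maintenance mode (no TLS trust established yet).
talosctl apply-config --insecure --nodes 10.0.0.101 --file controlplane.yaml

# Bootstrap etcd on exactly one control plane node, then fetch cluster credentials.
talosctl bootstrap --nodes 10.0.0.101 --endpoints 10.0.0.101
talosctl kubeconfig --nodes 10.0.0.101 --endpoints 10.0.0.101
```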

Proxmox Integration Details

Cloud Controller Manager (CCM)

The CCM provides:

  • Node Management: Automatic node registration with Proxmox metadata
  • Node Labels: Topology labels (region, zone, instance-type)
  • Node Lifecycle: Proper handling of node additions and removals

Nodes are automatically labeled with:

node.kubernetes.io/instance-type: proxmox
topology.kubernetes.io/region: <proxmox_region>
topology.kubernetes.io/zone: <proxmox_node>
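To confirm the CCM applied these labels:

```shell
# Show the topology labels as columns; empty values mean the CCM has not labeled the node yet.
kubectl get nodes \
  -L node.kubernetes.io/instance-type \
  -L topology.kubernetes.io/region \
  -L topology.kubernetes.io/zone
```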

Container Storage Interface (CSI)

The CSI driver provides:

  • Dynamic Provisioning: Automatically create volumes in Proxmox storage
  • Volume Expansion: Support for expanding PVCs
  • Multiple Storage Backends: Use any Proxmox storage (LVM, ZFS, Ceph, NFS, etc.)

Example usage:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: proxmox-data
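With a WaitForFirstConsumer storage class, the claim above may stay Pending until a pod mounts it, which triggers provisioning. A minimal consumer (pod name and image are illustrative):

```shell
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: my-data-consumer
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-data
EOF
```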

Accessing the Cluster

Talos CLI

# Export talosconfig
terraform output -raw talosconfig > ~/.talos/config

# Get nodes
talosctl get members

# Get service status
talosctl services

# Access logs
talosctl logs kubelet

Kubernetes CLI

# Export kubeconfig
terraform output -raw kubeconfig > ~/.kube/config

# Get cluster info
kubectl cluster-info
kubectl get nodes -o wide
kubectl get pods -A

# Check Proxmox integrations
kubectl get pods -n kube-system | grep proxmox
kubectl get pods -n csi-proxmox
kubectl get storageclass

Maintenance

Upgrading Talos

# Update talos_version variable
talos_version = "v1.9.2"

# Apply changes
terraform apply

# Or upgrade a node manually
talosctl upgrade --nodes <node-ip> --image ghcr.io/siderolabs/installer:v1.9.2

Scaling Workers

# Update worker_count
worker_count = 5

# Apply changes
terraform apply

Removing the Cluster

terraform destroy

Troubleshooting

VMs not getting IP addresses

For DHCP:

  • Check Proxmox network bridge configuration
  • Verify DHCP server is running on the network
  • Ensure VMs are connected to the correct network bridge

For Static IPs:

  • Verify all required parameters are set: controlplane_ips/worker_ips, gateway, and netmask
  • Check that IPs are in the correct subnet
  • Ensure gateway IP is correct and reachable
  • Verify no IP conflicts with existing devices

Cannot connect to nodes

  • Verify firewall rules allow port 50000 (Talos API)
  • Check VM networking in Proxmox
  • Check whether the node still answers in maintenance mode: talosctl version --insecure --nodes <ip>
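A quick connectivity probe (the IP is a placeholder):

```shell
# Is the Talos API port open at all?
nc -zv 10.0.0.101 50000

# Can we talk to the API? --insecure only works while the node is in maintenance mode.
talosctl version --insecure --nodes 10.0.0.101
```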

Bootstrap fails

  • Check control plane IPs are correct
  • Verify cluster_endpoint is accessible
  • Review logs: talosctl logs etcd

ISO upload fails

  • Verify SSH access to Proxmox node
  • Check /var/lib/vz/template/iso/ permissions
  • Manually upload ISO if needed

CCM/CSI not working

  • Verify Proxmox API token secrets are correct
  • Check that tokens have appropriate permissions in Proxmox
  • Review the CCM/CSI pod logs and the rendered template configuration for errors

Project Structure

.
├── main.tf                    # Main VM, Talos, CCM/CSI resources
├── variables.tf               # Input variables
├── outputs.tf                 # Output values
├── versions.tf                # Provider versions (Talos, Proxmox, Helm, K8s)
├── locals.tf                  # Local values
├── terraform.tfvars           # Your configuration (create this)
├── templates/
│   ├── install-disk-and-hostname.yaml.tmpl
│   ├── static-ip.yaml.tmpl    # Static IP configuration
│   ├── node-labels.yaml.tmpl
│   └── vip-config.yaml.tmpl
└── files/
    ├── cp-scheduling.yaml
    └── cloud-provider.yaml

License

Based on examples from siderolabs/contrib