# Talos Cluster on Proxmox - Terraform Configuration

This Terraform project creates and provisions a Talos Kubernetes cluster on Proxmox VE with integrated Proxmox Cloud Controller Manager (CCM) and Container Storage Interface (CSI) driver.

## Features

- 🚀 **Automated VM provisioning** on Proxmox VE
- ☁️ **Proxmox Cloud Controller Manager** - Native Proxmox integration for Kubernetes
- 💾 **Proxmox CSI Driver** - Dynamic volume provisioning using Proxmox storage
- 🔄 **High Availability** - Multi-node control plane with optional VIP
- 🌐 **Flexible networking** - DHCP or static IP configuration
- 📦 **Full stack deployment** - From VMs to running Kubernetes cluster

## Prerequisites

1. **Proxmox VE** server with API access
2. **Terraform** >= 1.0
3. **SSH access** to Proxmox node
4. **Network requirements**:
   - Available IP addresses for VMs (DHCP or static)
   - Network connectivity between VMs
   - Access to download Talos ISO (for initial setup)

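Before running Terraform, it can help to confirm that the client tools used throughout this README are installed. The following is a minimal sketch; the tool list (`terraform`, `talosctl`, `kubectl`, `ssh`) is an assumption based on the commands shown in this document, and `missing_tools` is a hypothetical helper name.

```bash
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed tool list: the CLIs this README invokes.
missing="$(missing_tools terraform talosctl kubectl ssh)"
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing" >&2
fi
```

The check only verifies that each command exists; it does not verify versions such as Terraform >= 1.0.
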
## Quick Start

### 1. Create terraform.tfvars

Create a `terraform.tfvars` file with your Proxmox and cluster configuration:

```hcl
# Proxmox Connection
proxmox_endpoint = "https://proxmox.example.com:8006"
proxmox_username = "root@pam"
proxmox_password = "your-password"
proxmox_node = "pve"

# Proxmox API Tokens (required for CCM/CSI)
proxmox_ccm_token_secret = "your-ccm-token-secret"
proxmox_csi_token_secret = "your-csi-token-secret"

# Cluster Configuration
cluster_name = "talos-cluster"
cluster_endpoint = "https://10.0.0.100:6443"

# VM Configuration
controlplane_count = 3
worker_count = 2

# Network (DHCP - IPs will be auto-assigned)
# For static IPs, see advanced configuration below
```

### 2. Initialize and Apply

```bash
terraform init
terraform plan
terraform apply
```

### 3. Get Cluster Access

```bash
# Get talosconfig
terraform output -raw talosconfig > ~/.talos/config

# Get kubeconfig
terraform output -raw kubeconfig > ~/.kube/config

# Verify cluster
talosctl version --nodes <controlplane-ip>
kubectl get nodes
```

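Redirecting output with `>` replaces whatever is already at `~/.talos/config` and `~/.kube/config`. If those files hold credentials for other clusters, a backup step may be worth adding first. This is a sketch only; `backup_if_exists` is a hypothetical helper, not part of this project.

```bash
# Copy an existing file to <file>.bak.<timestamp> before it gets overwritten.
# Prints the backup path, or nothing if the file did not exist.
backup_if_exists() {
  local file="$1" backup
  [ -e "$file" ] || return 0
  backup="${file}.bak.$(date +%Y%m%d%H%M%S)"
  cp -p "$file" "$backup" && printf '%s\n' "$backup"
}

# Possible usage before exporting the cluster credentials:
#   mkdir -p ~/.talos ~/.kube
#   backup_if_exists ~/.talos/config
#   backup_if_exists ~/.kube/config
```
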
### 4. Verify Proxmox Integration

```bash
# Check CCM is running
kubectl get pods -n kube-system | grep proxmox-cloud-controller

# Check CSI is running
kubectl get pods -n csi-proxmox

# View available storage classes
kubectl get storageclass

# Create a test PVC
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: proxmox-data
EOF
```

## Configuration Options

### Basic Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `proxmox_endpoint` | Proxmox API endpoint | - |
| `proxmox_username` | Proxmox username | `root@pam` |
| `proxmox_password` | Proxmox password | - |
| `proxmox_insecure` | Allow insecure Proxmox API connections | `true` |
| `proxmox_ssh_user` | SSH user for Proxmox node | `root` |
| `proxmox_node` | Proxmox node name | - |
| `proxmox_storage` | Storage location for VM disks | `local` |
| `proxmox_network_bridge` | Network bridge for VMs | `vmbr40` |
| `proxmox_ccm_token_secret` | Proxmox API token for CCM (sensitive) | - |
| `proxmox_csi_token_secret` | Proxmox API token for CSI (sensitive) | - |
| `cluster_name` | Talos cluster name | - |
| `cluster_endpoint` | Cluster API endpoint | - |
| `vm_id_prefix` | Starting VM ID prefix | `800` |
| `talos_version` | Talos version to use | `v1.9.1` |
| `talos_iso_url` | Custom Talos ISO URL | `""` (uses default) |

### Network Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `controlplane_ips` | Static IPs for control plane nodes | `[]` (DHCP) |
| `worker_ips` | Static IPs for worker nodes | `[]` (DHCP) |
| `gateway` | Default gateway (required for static IPs) | `""` |
| `netmask` | Network mask in CIDR notation | `24` |
| `nameservers` | DNS nameservers | `["1.1.1.1", "8.8.8.8"]` |
| `cluster_vip` | Virtual IP for HA control plane | `""` (disabled) |

### Proxmox Integration

| Variable | Description | Default |
|----------|-------------|---------|
| `proxmox_region` | Region identifier for CCM | `proxmox` |

### VM Resources

| Variable | Description | Default |
|----------|-------------|---------|
| `controlplane_count` | Number of control plane nodes | `3` |
| `worker_count` | Number of worker nodes | `2` |
| `controlplane_cpu` | CPU cores per control plane | `2` |
| `controlplane_memory` | Memory (MB) per control plane | `4096` |
| `controlplane_disk_size` | Disk size per control plane | `20` |
| `worker_cpu` | CPU cores per worker | `4` |
| `worker_memory` | Memory (MB) per worker | `8192` |
| `worker_disk_size` | Disk size per worker | `10` |

### Static IP Configuration

For production deployments, use static IPs. **All three parameters (IPs, gateway, and netmask) must be configured together:**

```hcl
# Control plane IPs
controlplane_ips = [
  "10.0.0.101",
  "10.0.0.102",
  "10.0.0.103"
]

# Worker IPs
worker_ips = [
  "10.0.0.104",
  "10.0.0.105"
]

# Network settings (required for static IPs)
gateway = "10.0.0.1" # Default gateway
netmask = 24 # CIDR notation (e.g., 24 = 255.255.255.0)
nameservers = ["1.1.1.1", "8.8.8.8"] # DNS servers

# Use VIP for control plane endpoint
cluster_vip = "10.0.0.100"
cluster_endpoint = "https://10.0.0.100:6443"
```

**Important**: When using static IPs, you must configure:

- `controlplane_ips` and/or `worker_ips` - List of IP addresses
- `gateway` - Network gateway IP address
- `netmask` - Network mask in CIDR notation (default: 24)
- `nameservers` - DNS servers (default: ["1.1.1.1", "8.8.8.8"])

`netmask` and `nameservers` fall back to their defaults when omitted. If the IP lists or `gateway` are missing, the nodes will use DHCP instead.

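Since the static IPs, gateway, and netmask have to agree with each other, a quick check that every node address falls in the gateway's network can catch typos before `terraform apply`. The sketch below uses only shell arithmetic; `ip_to_int` and `same_subnet` are hypothetical helper names, and the addresses are the example values from this section.

```bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.; set -- $1; IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if IP and GATEWAY share the same network for the given CIDR prefix.
same_subnet() {
  local ip="$1" gateway="$2" prefix="$3" mask
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$gateway") & mask )) ]
}

# Check the example addresses against gateway 10.0.0.1 with netmask 24.
for ip in 10.0.0.101 10.0.0.102 10.0.0.103 10.0.0.104 10.0.0.105; do
  same_subnet "$ip" 10.0.0.1 24 || echo "$ip is outside the gateway's network" >&2
done
```

The check is purely arithmetic: it does not detect address conflicts with other devices on the network.
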
### High Availability Setup

For HA control plane, configure a virtual IP:

```hcl
cluster_vip = "10.0.0.100"
cluster_endpoint = "https://10.0.0.100:6443"
controlplane_count = 3 # Minimum 3 for HA
```

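The three-node minimum follows from etcd's majority rule: the cluster stays writable only while more than half of its members are reachable. The arithmetic can be sketched as follows (the function names are illustrative):

```bash
# etcd needs a majority (quorum) of members to stay writable.
etcd_quorum() { echo $(( $1 / 2 + 1 )); }

# Number of control plane nodes that can fail while quorum is kept.
etcd_fault_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 5; do
  echo "$n control plane node(s): quorum $(etcd_quorum "$n"), tolerates $(etcd_fault_tolerance "$n") failure(s)"
done
```

Two nodes tolerate no failures, the same as one, which is why odd control plane counts such as 3 or 5 are the usual choice.
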
### Custom Talos Version

```hcl
talos_version = "v1.9.1"
# Or use custom ISO URL
talos_iso_url = "https://custom-mirror.com/talos.iso"
```

## Advanced Configuration

### Custom Storage Backend

```hcl
proxmox_storage = "ceph-storage" # or "nfs-backup", etc.
```

### Custom Network Bridge

```hcl
proxmox_network_bridge = "vmbr1"
```

### Custom VM ID Range

```hcl
vm_id_prefix = 1000 # VMs will be 1000, 1001, 1002, etc.
```

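To see which VM IDs a given prefix would occupy, the range can be listed ahead of time. This sketch assumes IDs are handed out sequentially starting at `vm_id_prefix`, as the comment above suggests; the exact ordering between control plane and worker VMs is not stated here, and `vm_ids` is a hypothetical helper.

```bash
# List the VM IDs a cluster would use, assuming sequential assignment
# starting at the prefix (one ID per control plane and worker VM).
vm_ids() {
  local prefix="$1" total="$2" i=0
  while [ "$i" -lt "$total" ]; do
    echo $(( prefix + i ))
    i=$(( i + 1 ))
  done
}

# Example: 3 control plane nodes + 2 workers starting at 1000.
vm_ids 1000 5
```

Comparing this list with the IDs already in use on the Proxmox node (for example from `qm list`) helps avoid collisions with existing VMs.
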
### Proxmox API Token Setup

The CCM and CSI drivers require Proxmox API tokens for authentication. Generate tokens in Proxmox:

1. Navigate to Datacenter → Permissions → API Tokens
2. Create a token for CCM with appropriate permissions
3. Create a token for CSI with storage permissions
4. Add the token secrets to your `terraform.tfvars`:

```hcl
proxmox_ccm_token_secret = "your-ccm-api-token-secret"
proxmox_csi_token_secret = "your-csi-api-token-secret"
```

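A leftover placeholder such as `your-ccm-api-token-secret` is an easy mistake to miss. Proxmox generates token secrets as UUID-formatted strings, so a shape check can flag obvious placeholders. Treat this as a rough sanity check under that assumption, not as validation against the Proxmox API; `looks_like_token_secret` is a hypothetical helper.

```bash
# Succeed if the value has the UUID shape Proxmox uses for API token secrets.
# This checks the format only; it cannot tell whether the token is valid.
looks_like_token_secret() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Example: the placeholder from the snippet above is rejected.
looks_like_token_secret "your-ccm-api-token-secret" ||
  echo "token secret still looks like a placeholder" >&2
```
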
## Architecture

The project creates:

1. **Control Plane VMs** (default: 3)
   - Run Kubernetes control plane components
   - Can schedule workload pods if configured
   - Participate in etcd cluster
   - Run Proxmox CCM for cloud provider integration

2. **Worker VMs** (default: 2)
   - Run application workloads
   - Join the cluster automatically
   - Support CSI for dynamic volume provisioning

3. **Talos Configuration**
   - Machine secrets and certificates
   - Node-specific configurations
   - Client configurations (talosconfig, kubeconfig)
   - Cloud provider configuration for CCM integration

4. **Proxmox Integration**
   - **CCM (Cloud Controller Manager)**: Provides node lifecycle management and metadata
   - **CSI (Container Storage Interface)**: Enables dynamic PV provisioning from Proxmox storage

## Workflow

1. **VM Creation**: VMs are created in Proxmox with Talos ISO attached
2. **Boot to Maintenance**: VMs boot into Talos maintenance mode
3. **Configuration Apply**: Terraform applies Talos machine configurations with cloud-provider settings
4. **Cluster Bootstrap**: First control plane node bootstraps the cluster
5. **Node Join**: Remaining nodes join automatically
6. **Kubeconfig Generation**: Cluster credentials are generated
7. **CCM Installation**: Proxmox Cloud Controller Manager is deployed (if enabled)
8. **CSI Installation**: Proxmox CSI driver and storage class are deployed (if enabled)

## Proxmox Integration Details

### Cloud Controller Manager (CCM)

The CCM provides:

- **Node Management**: Automatic node registration with Proxmox metadata
- **Node Labels**: Topology labels (region, zone, instance-type)
- **Node Lifecycle**: Proper handling of node additions and removals

Nodes are automatically labeled with:

```yaml
node.kubernetes.io/instance-type: proxmox
topology.kubernetes.io/region: <proxmox_region>
topology.kubernetes.io/zone: <proxmox_node>
```

### Container Storage Interface (CSI)

The CSI driver provides:

- **Dynamic Provisioning**: Automatically create volumes in Proxmox storage
- **Volume Expansion**: Support for expanding PVCs
- **Multiple Storage Backends**: Use any Proxmox storage (LVM, ZFS, Ceph, NFS, etc.)

Example usage:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: proxmox-data
```

## Accessing the Cluster

### Talos CLI

```bash
# Export talosconfig
terraform output -raw talosconfig > ~/.talos/config

# Get nodes
talosctl get members

# Get service status
talosctl services

# Access logs
talosctl logs kubelet
```

### Kubernetes CLI

```bash
# Export kubeconfig
terraform output -raw kubeconfig > ~/.kube/config

# Get cluster info
kubectl cluster-info
kubectl get nodes -o wide
kubectl get pods -A

# Check Proxmox integrations
kubectl get pods -n kube-system | grep proxmox
kubectl get pods -n csi-proxmox
kubectl get storageclass
```

## Maintenance

### Upgrading Talos

```bash
# In terraform.tfvars, update the talos_version variable:
#   talos_version = "v1.9.2"

# Apply changes
terraform apply

# Or upgrade manually, one node at a time
talosctl upgrade --nodes <node-ip> --image ghcr.io/siderolabs/installer:v1.9.2
```

### Scaling Workers

```bash
# In terraform.tfvars, update worker_count:
#   worker_count = 5

# Apply changes
terraform apply
```

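When static addressing is in use, raising `worker_count` without extending `worker_ips` is a likely source of errors. The sketch below assumes the module expects exactly one `worker_ips` entry per worker, which this README does not state explicitly; `ip_count_matches` is a hypothetical helper.

```bash
# Succeed if the number of IPs passed equals the expected node count.
# Usage: ip_count_matches <count> <ip> [<ip> ...]
ip_count_matches() {
  local expected="$1"
  shift
  [ "$#" -eq "$expected" ]
}

# Example: scaling to 5 workers while only two static IPs are listed.
ip_count_matches 5 10.0.0.104 10.0.0.105 ||
  echo "worker_ips has fewer entries than worker_count" >&2
```
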
### Removing the Cluster

```bash
terraform destroy
```

## Troubleshooting

### VMs not getting IP addresses

**For DHCP:**

- Check Proxmox network bridge configuration
- Verify DHCP server is running on the network
- Ensure VMs are connected to the correct network bridge

**For Static IPs:**

- Verify all required parameters are set: `controlplane_ips`/`worker_ips`, `gateway`, and `netmask`
- Check that IPs are in the correct subnet
- Ensure gateway IP is correct and reachable
- Verify no IP conflicts with existing devices

### Cannot connect to nodes

- Verify firewall rules allow port 50000 (Talos API)
- Check VM networking in Proxmox
- Ensure nodes are in maintenance mode: `talosctl version --nodes <ip>`

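A plain TCP probe can separate network or firewall problems from Talos-level ones. The following sketch uses bash's `/dev/tcp` pseudo-device together with `timeout` from coreutils; `port_open` is a hypothetical helper, and the node address in the usage comment is a placeholder.

```bash
# Succeed if a TCP connection to HOST:PORT opens within SECS seconds (default 3).
port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Possible usage against a node:
#   port_open <node-ip> 50000 || echo "Talos API port is not reachable"
#   port_open <node-ip> 6443  || echo "Kubernetes API port is not reachable"
```

A successful connection only shows that the port is reachable; it says nothing about whether the API behind it is healthy.
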
### Bootstrap fails

- Check control plane IPs are correct
- Verify `cluster_endpoint` is accessible
- Review logs: `talosctl logs etcd`

### ISO upload fails

- Verify SSH access to Proxmox node
- Check `/var/lib/vz/template/iso/` permissions
- Manually upload ISO if needed

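For a manual upload, the stock ISO can be fetched from the Talos GitHub releases. The sketch below builds that URL for a given version; whether this project downloads the same file by default is an assumption, since the default behind `talos_iso_url` is not shown here. `talos_iso_url` as a shell function is a hypothetical helper.

```bash
# Build the download URL for the stock Talos metal ISO of a given release.
# Arguments: version (e.g. v1.9.1) and optional architecture (default amd64).
talos_iso_url() {
  local version="$1" arch="${2:-amd64}"
  echo "https://github.com/siderolabs/talos/releases/download/${version}/metal-${arch}.iso"
}

# Possible manual upload (host name is a placeholder):
#   curl -fL -o talos.iso "$(talos_iso_url v1.9.1)"
#   scp talos.iso root@<proxmox-host>:/var/lib/vz/template/iso/
```
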
### CCM/CSI not working

- Verify Proxmox API token secrets are correct
- Check that tokens have appropriate permissions in Proxmox
- Review the CCM and CSI pod logs, and check the configuration rendered from `templates/proxmox-ccm.yaml.tmpl`

## Project Structure

```
.
├── main.tf                    # Main VM and Talos resources
├── variables.tf               # Input variables
├── outputs.tf                 # Output values (talosconfig, kubeconfig)
├── versions.tf                # Provider versions (Talos, Proxmox)
├── locals.tf                  # Local values and computed variables
├── state.tf                   # Remote state configuration
├── terraform.tfvars           # Your configuration (not in git)
├── terraform.tfvars.example   # Example configuration template
├── templates/
│   ├── install-disk-and-hostname.yaml.tmpl  # Hostname and disk config
│   ├── static-ip.yaml.tmpl                  # Static IP configuration
│   ├── vip-config.yaml.tmpl                 # VIP configuration for HA
│   └── proxmox-ccm.yaml.tmpl                # Proxmox CCM/CSI configuration
└── files/
    ├── cp-scheduling.yaml     # Control plane scheduling config
    └── cloud-provider.yaml    # Cloud provider config
```

## References

- [Talos Documentation](https://www.talos.dev/)
- [Talos Terraform Provider](https://registry.terraform.io/providers/siderolabs/talos)
- [Proxmox Terraform Provider](https://registry.terraform.io/providers/bpg/proxmox)
- [Proxmox CCM](https://github.com/sergelogvinov/proxmox-cloud-controller-manager)
- [Proxmox CSI](https://github.com/sergelogvinov/proxmox-csi-plugin)
- [Siderolabs Contrib Examples](https://github.com/siderolabs/contrib/tree/main/examples/terraform)

## License

Based on examples from [siderolabs/contrib](https://github.com/siderolabs/contrib)