# Talos Cluster on Proxmox - Terraform Configuration
This Terraform project creates and provisions a Talos Kubernetes cluster on Proxmox VE with integrated Proxmox Cloud Controller Manager (CCM) and Container Storage Interface (CSI) driver.
## Features
- 🚀 **Automated VM provisioning** on Proxmox VE
- ☁️ **Proxmox Cloud Controller Manager** - Native Proxmox integration for Kubernetes
- 💾 **Proxmox CSI Driver** - Dynamic volume provisioning using Proxmox storage
- 🔄 **High Availability** - Multi-node control plane with optional VIP
- 🌐 **Flexible networking** - DHCP or static IP configuration
- 📦 **Full stack deployment** - From VMs to running Kubernetes cluster
## Prerequisites
1. **Proxmox VE** server with API access
2. **Terraform** >= 1.0
3. **SSH access** to Proxmox node
4. **Network requirements**:
- Available IP addresses for VMs (DHCP or static)
- Network connectivity between VMs
- Access to download Talos ISO (for initial setup)
## Quick Start
### 1. Create terraform.tfvars
Create a `terraform.tfvars` file with your Proxmox and cluster configuration:
```hcl
# Proxmox Connection
proxmox_endpoint = "https://proxmox.example.com:8006"
proxmox_username = "root@pam"
proxmox_password = "your-password"
proxmox_node = "pve"
# Proxmox API Tokens (required for CCM/CSI)
proxmox_ccm_token_secret = "your-ccm-token-secret"
proxmox_csi_token_secret = "your-csi-token-secret"
# Cluster Configuration
cluster_name = "talos-cluster"
cluster_endpoint = "https://10.0.0.100:6443"
# VM Configuration
controlplane_count = 3
worker_count = 2
# Network (DHCP - IPs will be auto-assigned)
# For static IPs, see advanced configuration below
```
### 2. Initialize and Apply
```bash
terraform init
terraform plan
terraform apply
```
### 3. Get Cluster Access
```bash
# Get talosconfig
terraform output -raw talosconfig > ~/.talos/config
# Get kubeconfig
terraform output -raw kubeconfig > ~/.kube/config
# Verify cluster
talosctl version --nodes <controlplane-ip>
kubectl get nodes
```
### 4. Verify Proxmox Integration
```bash
# Check CCM is running
kubectl get pods -n kube-system | grep proxmox-cloud-controller
# Check CSI is running
kubectl get pods -n csi-proxmox
# View available storage classes
kubectl get storageclass
# Create a test PVC
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: proxmox-data
EOF
```
## Configuration Options
### Basic Configuration
| Variable | Description | Default |
|----------|-------------|---------|
| `proxmox_endpoint` | Proxmox API endpoint | - |
| `proxmox_username` | Proxmox username | `root@pam` |
| `proxmox_password` | Proxmox password | - |
| `proxmox_insecure` | Allow insecure Proxmox API connections | `true` |
| `proxmox_ssh_user` | SSH user for Proxmox node | `root` |
| `proxmox_node` | Proxmox node name | - |
| `proxmox_storage` | Storage location for VM disks | `local` |
| `proxmox_network_bridge` | Network bridge for VMs | `vmbr40` |
| `proxmox_ccm_token_secret` | Proxmox API token for CCM (sensitive) | - |
| `proxmox_csi_token_secret` | Proxmox API token for CSI (sensitive) | - |
| `cluster_name` | Talos cluster name | - |
| `cluster_endpoint` | Cluster API endpoint | - |
| `vm_id_prefix` | Starting VM ID prefix | `800` |
| `talos_version` | Talos version to use | `v1.9.1` |
| `talos_iso_url` | Custom Talos ISO URL | `""` (uses default) |
### Network Configuration
| Variable | Description | Default |
|----------|-------------|---------|
| `controlplane_ips` | Static IPs for control plane nodes | `[]` (DHCP) |
| `worker_ips` | Static IPs for worker nodes | `[]` (DHCP) |
| `gateway` | Default gateway (required for static IPs) | `""` |
| `netmask` | Network mask in CIDR notation | `24` |
| `nameservers` | DNS nameservers | `["1.1.1.1", "8.8.8.8"]` |
| `cluster_vip` | Virtual IP for HA control plane | `""` (disabled) |
### Proxmox Integration
| Variable | Description | Default |
|----------|-------------|---------|
| `proxmox_region` | Region identifier for CCM | `proxmox` |
### VM Resources
| Variable | Description | Default |
|----------|-------------|---------|
| `controlplane_count` | Number of control plane nodes | `3` |
| `worker_count` | Number of worker nodes | `2` |
| `controlplane_cpu` | CPU cores per control plane | `2` |
| `controlplane_memory` | Memory (MB) per control plane | `4096` |
| `controlplane_disk_size` | Disk size (GB) per control plane | `20` |
| `worker_cpu` | CPU cores per worker | `4` |
| `worker_memory` | Memory (MB) per worker | `8192` |
| `worker_disk_size` | Disk size (GB) per worker | `10` |
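The defaults above suit a small test cluster; for heavier workloads, the same variables can be overridden in `terraform.tfvars` (the values below are illustrative, not recommendations):

```hcl
# Sizing overrides (example values)
controlplane_cpu    = 4
controlplane_memory = 8192   # MB
worker_count        = 3
worker_cpu          = 8
worker_memory       = 16384  # MB
worker_disk_size    = 100    # GB
```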
### Static IP Configuration
For production deployments, use static IPs. **All three parameters (IPs, gateway, and netmask) must be configured together:**
```hcl
# Control plane IPs
controlplane_ips = [
  "10.0.0.101",
  "10.0.0.102",
  "10.0.0.103"
]

# Worker IPs
worker_ips = [
  "10.0.0.104",
  "10.0.0.105"
]

# Network settings (required for static IPs)
gateway     = "10.0.0.1"             # Default gateway
netmask     = 24                     # CIDR prefix length (24 = 255.255.255.0)
nameservers = ["1.1.1.1", "8.8.8.8"] # DNS servers

# Use VIP for control plane endpoint
cluster_vip      = "10.0.0.100"
cluster_endpoint = "https://10.0.0.100:6443"
```
**Important**: When using static IPs, you must configure:
- `controlplane_ips` and/or `worker_ips` - List of IP addresses
- `gateway` - Network gateway IP address
- `netmask` - Network mask in CIDR notation (default: 24)
- `nameservers` - DNS servers (default: ["1.1.1.1", "8.8.8.8"])
If any of these are missing, the nodes will use DHCP instead.
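Under the hood these variables are rendered into a Talos machine-config patch (via `templates/static-ip.yaml.tmpl`). The generated patch looks roughly like the sketch below; the interface name `eth0` is an assumption, and the actual template output may differ:

```yaml
machine:
  network:
    interfaces:
      - interface: eth0        # assumed interface name
        addresses:
          - 10.0.0.101/24      # controlplane_ips[0] with netmask prefix
        routes:
          - network: 0.0.0.0/0
            gateway: 10.0.0.1  # gateway variable
    nameservers:
      - 1.1.1.1
      - 8.8.8.8
```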
### High Availability Setup
For HA control plane, configure a virtual IP:
```hcl
cluster_vip = "10.0.0.100"
cluster_endpoint = "https://10.0.0.100:6443"
controlplane_count = 3 # Minimum 3 for HA
```
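The VIP is implemented by Talos itself: each control plane node is configured to advertise the shared address, and one node holds it at a time. The patch rendered from `templates/vip-config.yaml.tmpl` is roughly the sketch below (interface name is an assumption):

```yaml
machine:
  network:
    interfaces:
      - interface: eth0   # assumed interface name
        dhcp: true
        vip:
          ip: 10.0.0.100  # cluster_vip
```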
### Custom Talos Version
```hcl
talos_version = "v1.9.1"
# Or use custom ISO URL
talos_iso_url = "https://custom-mirror.com/talos.iso"
```
## Advanced Configuration
### Custom Storage Backend
```hcl
proxmox_storage = "ceph-storage" # or "nfs-backup", etc.
```
### Custom Network Bridge
```hcl
proxmox_network_bridge = "vmbr1"
```
### Custom VM ID Range
```hcl
vm_id_prefix = 1000 # VMs will be 1000, 1001, 1002, etc.
```
### Proxmox API Token Setup
The CCM and CSI drivers require Proxmox API tokens for authentication. Generate tokens in Proxmox:
1. Navigate to Datacenter → Permissions → API Tokens
2. Create a token for CCM with appropriate permissions
3. Create a token for CSI with storage permissions
4. Add the token secrets to your `terraform.tfvars`:
```hcl
proxmox_ccm_token_secret = "your-ccm-api-token-secret"
proxmox_csi_token_secret = "your-csi-api-token-secret"
```
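Tokens can also be created from the Proxmox shell with `pveum`. The sketch below is based on the upstream CCM/CSI project documentation; the role names, user names, and privilege lists are assumptions, so check the linked projects for the currently required privileges:

```bash
# CCM: read-only VM access (role/user/token names are examples)
pveum role add CCM -privs "VM.Audit"
pveum user add kubernetes-ccm@pve
pveum aclmod / -user kubernetes-ccm@pve -role CCM
pveum user token add kubernetes-ccm@pve ccm -privsep 0

# CSI: storage allocation and disk configuration
pveum role add CSI -privs "VM.Audit VM.Config.Disk Datastore.Allocate Datastore.AllocateSpace Datastore.Audit"
pveum user add kubernetes-csi@pve
pveum aclmod / -user kubernetes-csi@pve -role CSI
pveum user token add kubernetes-csi@pve csi -privsep 0
```

Each `token add` command prints the token secret once; that value goes into the corresponding `*_token_secret` variable.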
## Architecture
The project creates:
1. **Control Plane VMs** (default: 3)
- Run Kubernetes control plane components
- Can schedule workload pods if configured
- Participate in etcd cluster
- Run Proxmox CCM for cloud provider integration
2. **Worker VMs** (default: 2)
- Run application workloads
- Join the cluster automatically
- Support CSI for dynamic volume provisioning
3. **Talos Configuration**
- Machine secrets and certificates
- Node-specific configurations
- Client configurations (talosconfig, kubeconfig)
- Cloud provider configuration for CCM integration
4. **Proxmox Integration**
- **CCM (Cloud Controller Manager)**: Provides node lifecycle management and metadata
- **CSI (Container Storage Interface)**: Enables dynamic PV provisioning from Proxmox storage
## Workflow
1. **VM Creation**: VMs are created in Proxmox with Talos ISO attached
2. **Boot to Maintenance**: VMs boot into Talos maintenance mode
3. **Configuration Apply**: Terraform applies Talos machine configurations with cloud-provider settings
4. **Cluster Bootstrap**: First control plane node bootstraps the cluster
5. **Node Join**: Remaining nodes join automatically
6. **Kubeconfig Generation**: Cluster credentials are generated
7. **CCM Installation**: Proxmox Cloud Controller Manager is deployed (if enabled)
8. **CSI Installation**: Proxmox CSI driver and storage class are deployed (if enabled)
## Proxmox Integration Details
### Cloud Controller Manager (CCM)
The CCM provides:
- **Node Management**: Automatic node registration with Proxmox metadata
- **Node Labels**: Topology labels (region, zone, instance-type)
- **Node Lifecycle**: Proper handling of node additions and removals
Nodes are automatically labeled with:
```yaml
node.kubernetes.io/instance-type: proxmox
topology.kubernetes.io/region: <proxmox_region>
topology.kubernetes.io/zone: <proxmox_node>
```
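These labels can be used to pin workloads to a particular Proxmox node. A hypothetical example (the zone value must match your `proxmox_node`):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-to-pve              # hypothetical example pod
spec:
  nodeSelector:
    topology.kubernetes.io/zone: pve  # matches proxmox_node
  containers:
    - name: app
      image: nginx:alpine
```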
### Container Storage Interface (CSI)
The CSI driver provides:
- **Dynamic Provisioning**: Automatically create volumes in Proxmox storage
- **Volume Expansion**: Support for expanding PVCs
- **Multiple Storage Backends**: Use any Proxmox storage (LVM, ZFS, Ceph, NFS, etc.)
Example usage:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: proxmox-data
```
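A workload consumes the claim like any other PVC, for instance (hypothetical pod):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                    # hypothetical example pod
spec:
  containers:
    - name: app
      image: nginx:alpine
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-data        # PVC from the example above
```

Note that a Proxmox disk attaches to a single VM at a time, which is why `ReadWriteOnce` is the access mode used here.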
## Accessing the Cluster
### Talos CLI
```bash
# Export talosconfig
terraform output -raw talosconfig > ~/.talos/config
# Get nodes
talosctl get members
# Get service status
talosctl services
# Access logs
talosctl logs kubelet
```
### Kubernetes CLI
```bash
# Export kubeconfig
terraform output -raw kubeconfig > ~/.kube/config
# Get cluster info
kubectl cluster-info
kubectl get nodes -o wide
kubectl get pods -A
# Check Proxmox integrations
kubectl get pods -n kube-system | grep proxmox
kubectl get pods -n csi-proxmox
kubectl get storageclass
```
## Maintenance
### Upgrading Talos
```bash
# Update talos_version in terraform.tfvars (e.g. talos_version = "v1.9.2"),
# then apply the change
terraform apply

# Or upgrade a node manually
talosctl upgrade --nodes <controlplane-ip> --image ghcr.io/siderolabs/installer:v1.9.2
```
### Scaling Workers
```bash
# Set worker_count = 5 in terraform.tfvars, then apply the change
terraform apply
```
### Removing the Cluster
```bash
terraform destroy
```
## Troubleshooting
### VMs not getting IP addresses
**For DHCP:**
- Check Proxmox network bridge configuration
- Verify DHCP server is running on the network
- Ensure VMs are connected to the correct network bridge
**For Static IPs:**
- Verify all required parameters are set: `controlplane_ips`/`worker_ips`, `gateway`, and `netmask`
- Check that IPs are in the correct subnet
- Ensure gateway IP is correct and reachable
- Verify no IP conflicts with existing devices
### Cannot connect to nodes
- Verify firewall rules allow port 50000 (Talos API)
- Check VM networking in Proxmox
- Ensure nodes are reachable in maintenance mode: `talosctl version --insecure --nodes <ip>`
### Bootstrap fails
- Check control plane IPs are correct
- Verify cluster_endpoint is accessible
- Review logs: `talosctl logs etcd`
### ISO upload fails
- Verify SSH access to Proxmox node
- Check `/var/lib/vz/template/iso/` permissions
- Manually upload ISO if needed
### CCM/CSI not working
- Verify Proxmox API token secrets are correct
- Check that tokens have appropriate permissions in Proxmox
- Review template logs for CCM/CSI configuration
## Project Structure
```
.
├── main.tf                    # Main VM and Talos resources
├── variables.tf               # Input variables
├── outputs.tf                 # Output values (talosconfig, kubeconfig)
├── versions.tf                # Provider versions (Talos, Proxmox)
├── locals.tf                  # Local values and computed variables
├── state.tf                   # Remote state configuration
├── terraform.tfvars           # Your configuration (not in git)
├── terraform.tfvars.example   # Example configuration template
├── templates/
│   ├── install-disk-and-hostname.yaml.tmpl  # Hostname and disk config
│   ├── static-ip.yaml.tmpl                  # Static IP configuration
│   ├── vip-config.yaml.tmpl                 # VIP configuration for HA
│   └── proxmox-ccm.yaml.tmpl                # Proxmox CCM/CSI configuration
└── files/
    ├── cp-scheduling.yaml     # Control plane scheduling config
    └── cloud-provider.yaml    # Cloud provider config
```
## References
- [Talos Documentation](https://www.talos.dev/)
- [Talos Terraform Provider](https://registry.terraform.io/providers/siderolabs/talos)
- [Proxmox Terraform Provider](https://registry.terraform.io/providers/bpg/proxmox)
- [Proxmox CCM](https://github.com/sergelogvinov/proxmox-cloud-controller-manager)
- [Proxmox CSI](https://github.com/sergelogvinov/proxmox-csi-plugin)
- [Siderolabs Contrib Examples](https://github.com/siderolabs/contrib/tree/main/examples/terraform)
## License
Based on examples from [siderolabs/contrib](https://github.com/siderolabs/contrib)