add terraform project
terraform/README.md
Normal file
@@ -0,0 +1,453 @@
# Talos Cluster on Proxmox - Terraform Configuration

This Terraform project creates and provisions a Talos Kubernetes cluster on Proxmox VE with integrated Proxmox Cloud Controller Manager (CCM) and Container Storage Interface (CSI) driver.

## Features

- 🚀 **Automated VM provisioning** on Proxmox VE
- ☁️ **Proxmox Cloud Controller Manager** - Native Proxmox integration for Kubernetes
- 💾 **Proxmox CSI Driver** - Dynamic volume provisioning using Proxmox storage
- 🔄 **High Availability** - Multi-node control plane with optional VIP
- 🌐 **Flexible networking** - DHCP or static IP configuration
- 📦 **Full stack deployment** - From VMs to running Kubernetes cluster

## Prerequisites

1. **Proxmox VE** server with API access
2. **Terraform** >= 1.0
3. **SSH access** to Proxmox node
4. **Network requirements**:
   - Available IP addresses for VMs (DHCP or static)
   - Network connectivity between VMs
   - Access to download Talos ISO (for initial setup)

## Quick Start

### 1. Create terraform.tfvars

Create a `terraform.tfvars` file with your Proxmox and cluster configuration:

```hcl
# Proxmox Connection
proxmox_endpoint = "https://proxmox.example.com:8006"
proxmox_username = "root@pam"
proxmox_password = "your-password"
proxmox_node     = "pve"

# Proxmox API Tokens (required for CCM/CSI)
proxmox_ccm_token_secret = "your-ccm-token-secret"
proxmox_csi_token_secret = "your-csi-token-secret"

# Cluster Configuration
cluster_name     = "talos-cluster"
cluster_endpoint = "https://10.0.0.100:6443"

# VM Configuration
controlplane_count = 3
worker_count       = 2

# Network (DHCP - IPs will be auto-assigned)
# For static IPs, see "Static IP Configuration" below
```
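
As a rough illustration of where the connection values above end up, the sketch below wires them into a provider block. It assumes the project uses the `bpg/proxmox` and `siderolabs/talos` providers listed under References. The variable declarations are abbreviated, and the project's actual `versions.tf` and `main.tf` may differ.

```hcl
# Hypothetical sketch: not copied from this project's versions.tf or main.tf.
terraform {
  required_version = ">= 1.0"

  required_providers {
    proxmox = {
      source = "bpg/proxmox"
    }
    talos = {
      source = "siderolabs/talos"
    }
  }
}

# Abbreviated declarations so the fragment stands on its own.
variable "proxmox_endpoint" { type = string }
variable "proxmox_username" { default = "root@pam" }
variable "proxmox_insecure" { default = true }
variable "proxmox_ssh_user" { default = "root" }

variable "proxmox_password" {
  type      = string
  sensitive = true
}

provider "proxmox" {
  endpoint = var.proxmox_endpoint
  username = var.proxmox_username
  password = var.proxmox_password
  insecure = var.proxmox_insecure

  # The provider uses SSH for operations the API does not cover,
  # which is why SSH access to the node is a prerequisite.
  ssh {
    agent    = true
    username = var.proxmox_ssh_user
  }
}
```

Whether the SSH agent or a password is used for the SSH connection is an assumption here; check the provider documentation for the options that match your setup.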

### 2. Initialize and Apply

```bash
terraform init
terraform plan
terraform apply
```

### 3. Get Cluster Access

```bash
# Get talosconfig (overwrites any existing ~/.talos/config)
mkdir -p ~/.talos
terraform output -raw talosconfig > ~/.talos/config

# Get kubeconfig (overwrites any existing ~/.kube/config)
mkdir -p ~/.kube
terraform output -raw kubeconfig > ~/.kube/config

# Verify cluster
talosctl version --nodes <controlplane-ip>
kubectl get nodes
```

### 4. Verify Proxmox Integration

```bash
# Check CCM is running
kubectl get pods -n kube-system | grep proxmox-cloud-controller

# Check CSI is running
kubectl get pods -n csi-proxmox

# View available storage classes
kubectl get storageclass

# Create a test PVC
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: proxmox-data
EOF
```

## Configuration Options

### Basic Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `proxmox_endpoint` | Proxmox API endpoint | - |
| `proxmox_username` | Proxmox username | `root@pam` |
| `proxmox_password` | Proxmox password | - |
| `proxmox_insecure` | Allow insecure Proxmox API connections | `true` |
| `proxmox_ssh_user` | SSH user for Proxmox node | `root` |
| `proxmox_node` | Proxmox node name | - |
| `proxmox_storage` | Storage location for VM disks | `local` |
| `proxmox_network_bridge` | Network bridge for VMs | `vmbr40` |
| `proxmox_ccm_token_secret` | Proxmox API token for CCM (sensitive) | - |
| `proxmox_csi_token_secret` | Proxmox API token for CSI (sensitive) | - |
| `cluster_name` | Talos cluster name | - |
| `cluster_endpoint` | Cluster API endpoint | - |
| `vm_id_prefix` | Starting VM ID prefix | `800` |
| `talos_version` | Talos version to use | `v1.9.1` |
| `talos_iso_url` | Custom Talos ISO URL | `""` (uses default) |
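
For orientation, rows of this table map onto `variable` blocks roughly as follows. This is a sketch built from the names, descriptions, and defaults in the table, not a copy of the project's `variables.tf`. Marking the secrets as `sensitive` is assumed from the "(sensitive)" note.

```hcl
# Hypothetical excerpt in the style of variables.tf.

# No default: Terraform prompts for it or reads it from terraform.tfvars.
variable "proxmox_endpoint" {
  description = "Proxmox API endpoint"
  type        = string
}

# A default makes the variable optional.
variable "proxmox_insecure" {
  description = "Allow insecure Proxmox API connections"
  type        = bool
  default     = true
}

# sensitive = true keeps the value out of plan and apply output.
variable "proxmox_ccm_token_secret" {
  description = "Proxmox API token for CCM"
  type        = string
  sensitive   = true
}

variable "talos_version" {
  description = "Talos version to use"
  type        = string
  default     = "v1.9.1"
}
```

Any variable can also be supplied through an environment variable named `TF_VAR_<name>`, for example `TF_VAR_proxmox_ccm_token_secret`, which keeps secrets out of `terraform.tfvars`.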

### Network Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `controlplane_ips` | Static IPs for control plane nodes | `[]` (DHCP) |
| `worker_ips` | Static IPs for worker nodes | `[]` (DHCP) |
| `gateway` | Default gateway (required for static IPs) | `""` |
| `netmask` | Network mask in CIDR notation | `24` |
| `nameservers` | DNS nameservers | `["1.1.1.1", "8.8.8.8"]` |
| `cluster_vip` | Virtual IP for HA control plane | `""` (disabled) |

### Proxmox Integration

| Variable | Description | Default |
|----------|-------------|---------|
| `proxmox_region` | Region identifier for CCM | `proxmox` |

### VM Resources

| Variable | Description | Default |
|----------|-------------|---------|
| `controlplane_count` | Number of control plane nodes | `3` |
| `worker_count` | Number of worker nodes | `2` |
| `controlplane_cpu` | CPU cores per control plane | `2` |
| `controlplane_memory` | Memory (MB) per control plane | `4096` |
| `controlplane_disk_size` | Disk size per control plane | `20` |
| `worker_cpu` | CPU cores per worker | `4` |
| `worker_memory` | Memory (MB) per worker | `8192` |
| `worker_disk_size` | Disk size per worker | `10` |

### Static IP Configuration

For production deployments, use static IPs. **The IP lists, `gateway`, and `netmask` must be configured together:**

```hcl
# Control plane IPs
controlplane_ips = [
  "10.0.0.101",
  "10.0.0.102",
  "10.0.0.103"
]

# Worker IPs
worker_ips = [
  "10.0.0.104",
  "10.0.0.105"
]

# Network settings (required for static IPs)
gateway     = "10.0.0.1"             # Default gateway
netmask     = 24                     # CIDR notation (e.g., 24 = 255.255.255.0)
nameservers = ["1.1.1.1", "8.8.8.8"] # DNS servers

# Use VIP for control plane endpoint
cluster_vip      = "10.0.0.100"
cluster_endpoint = "https://10.0.0.100:6443"
```

**Important**: When using static IPs, you must configure:
- `controlplane_ips` and/or `worker_ips` - List of IP addresses
- `gateway` - Network gateway IP address
- `netmask` - Network mask in CIDR notation (default: 24)
- `nameservers` - DNS servers (default: ["1.1.1.1", "8.8.8.8"])

If any of these are missing, the nodes will use DHCP instead.
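
The rule above can be expressed in Terraform along these lines. This is a minimal sketch, assuming variables with the names and defaults from the tables. The local names and the exact fallback condition are hypothetical and may not match the project's `locals.tf`.

```hcl
# Hypothetical sketch of static-IP handling.
variable "controlplane_ips" { default = [] }
variable "gateway" { default = "" }

variable "netmask" {
  type    = number
  default = 24

  # Reject values that cannot be a valid IPv4 prefix length.
  validation {
    condition     = var.netmask >= 1 && var.netmask <= 32
    error_message = "The netmask must be a CIDR prefix length between 1 and 32."
  }
}

locals {
  # Static addressing applies only when both an IP list and a gateway are set.
  controlplane_static = length(var.controlplane_ips) > 0 && var.gateway != ""

  # Combine each IP with the prefix length, e.g. "10.0.0.101/24".
  # The list is empty when no gateway is set, which means DHCP is used.
  controlplane_addresses = [
    for ip in var.controlplane_ips : "${ip}/${var.netmask}"
    if var.gateway != ""
  ]
}

output "controlplane_addresses" {
  value = local.controlplane_addresses
}
```

With the example values from this section, `controlplane_addresses` would contain `10.0.0.101/24`, `10.0.0.102/24`, and `10.0.0.103/24`.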

### High Availability Setup

For HA control plane, configure a virtual IP:

```hcl
cluster_vip        = "10.0.0.100"
cluster_endpoint   = "https://10.0.0.100:6443"
controlplane_count = 3 # Minimum 3 for HA
```
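
These constraints could be enforced with variable validation. The sketch below is an assumption, not the project's code. It requires an odd control plane count, which is the usual etcd quorum recommendation rather than something this README states, and it checks that the VIP parses as an IPv4 address.

```hcl
# Hypothetical validation rules for the HA settings.
variable "controlplane_count" {
  type    = number
  default = 3

  validation {
    condition     = var.controlplane_count >= 1 && var.controlplane_count % 2 == 1
    error_message = "The controlplane_count value must be an odd number so that etcd can keep quorum."
  }
}

variable "cluster_vip" {
  type    = string
  default = ""

  # An empty string disables the VIP. Otherwise cidrhost() fails on
  # anything that is not a valid address, and can() turns that into false.
  validation {
    condition     = var.cluster_vip == "" || can(cidrhost("${var.cluster_vip}/32", 0))
    error_message = "The cluster_vip value must be empty or a valid IPv4 address."
  }
}
```

With rules like these, `terraform plan` fails early for `controlplane_count = 2` or a malformed VIP, before any VM is created.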

### Custom Talos Version

```hcl
talos_version = "v1.9.1"
# Or use custom ISO URL
talos_iso_url = "https://custom-mirror.com/talos.iso"
```
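
How the empty `talos_iso_url` default falls back to a version-specific URL might look like the sketch below. The fallback shown is the `metal-amd64.iso` asset from the Talos GitHub releases. That choice is an assumption, and the project may use a different default source.

```hcl
# Hypothetical sketch of the ISO URL fallback.
variable "talos_version" { default = "v1.9.1" }
variable "talos_iso_url" { default = "" }

locals {
  # Prefer the custom URL when set, otherwise derive one from the version.
  talos_iso_url = (
    var.talos_iso_url != ""
    ? var.talos_iso_url
    : "https://github.com/siderolabs/talos/releases/download/${var.talos_version}/metal-amd64.iso"
  )
}

output "talos_iso_url" {
  value = local.talos_iso_url
}
```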

## Advanced Configuration

### Custom Storage Backend

```hcl
proxmox_storage = "ceph-storage" # or "nfs-backup", etc.
```

### Custom Network Bridge

```hcl
proxmox_network_bridge = "vmbr1"
```

### Custom VM ID Range

```hcl
vm_id_prefix = 1000 # VMs will be 1000, 1001, 1002, etc.
```
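
A sequential ID scheme like the one in the comment above can be computed with `range`. In this sketch the local names are hypothetical, and numbering control plane nodes before workers is an assumption about the ordering.

```hcl
# Hypothetical sketch of sequential VM ID allocation.
variable "vm_id_prefix" { default = 1000 }
variable "controlplane_count" { default = 3 }
variable "worker_count" { default = 2 }

locals {
  # Control plane nodes take the first IDs: 1000, 1001, 1002.
  controlplane_vm_ids = [
    for i in range(var.controlplane_count) : var.vm_id_prefix + i
  ]

  # Workers continue where the control plane stops: 1003, 1004.
  worker_vm_ids = [
    for i in range(var.worker_count) : var.vm_id_prefix + var.controlplane_count + i
  ]
}

output "vm_ids" {
  value = concat(local.controlplane_vm_ids, local.worker_vm_ids)
}
```

Under this scheme, changing `controlplane_count` later shifts the worker IDs, so a real implementation may prefer separate ID ranges per role.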

### Proxmox API Token Setup

The CCM and CSI drivers require Proxmox API tokens for authentication. Generate tokens in Proxmox:

1. Navigate to Datacenter → Permissions → API Tokens
2. Create a token for CCM with appropriate permissions
3. Create a token for CSI with storage permissions
4. Add the token secrets to your `terraform.tfvars`:

```hcl
proxmox_ccm_token_secret = "your-ccm-api-token-secret"
proxmox_csi_token_secret = "your-csi-api-token-secret"
```
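
To show where a token secret ends up, the sketch below renders a cluster configuration for the CCM with `yamlencode`. The key names (`clusters`, `url`, `token_id`, `token_secret`, `region`) follow the upstream proxmox-cloud-controller-manager configuration format as far as I recall it, so verify them against the upstream documentation. The `proxmox_ccm_token_id` variable and its default are hypothetical and do not appear in this project's variable tables.

```hcl
# Hypothetical sketch: key names should be checked against the upstream CCM docs.
variable "proxmox_endpoint" { type = string }
variable "proxmox_insecure" { default = true }
variable "proxmox_region" { default = "proxmox" }

# Assumed variable: a token ID has the form "<user>@<realm>!<token-name>".
variable "proxmox_ccm_token_id" { default = "kubernetes@pve!ccm" }

variable "proxmox_ccm_token_secret" {
  type      = string
  sensitive = true
}

locals {
  # The result is treated as sensitive because it contains the token secret.
  ccm_config = yamlencode({
    clusters = [
      {
        url          = "${var.proxmox_endpoint}/api2/json"
        insecure     = var.proxmox_insecure
        token_id     = var.proxmox_ccm_token_id
        token_secret = var.proxmox_ccm_token_secret
        region       = var.proxmox_region
      }
    ]
  })
}
```

The CSI driver would use an equivalent block built from `proxmox_csi_token_secret`.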

## Architecture

The project creates:

1. **Control Plane VMs** (default: 3)
   - Run Kubernetes control plane components
   - Can schedule workload pods if configured
   - Participate in etcd cluster
   - Run Proxmox CCM for cloud provider integration

2. **Worker VMs** (default: 2)
   - Run application workloads
   - Join the cluster automatically
   - Support CSI for dynamic volume provisioning

3. **Talos Configuration**
   - Machine secrets and certificates
   - Node-specific configurations
   - Client configurations (talosconfig, kubeconfig)
   - Cloud provider configuration for CCM integration

4. **Proxmox Integration**
   - **CCM (Cloud Controller Manager)**: Provides node lifecycle management and metadata
   - **CSI (Container Storage Interface)**: Enables dynamic PV provisioning from Proxmox storage

## Workflow

1. **VM Creation**: VMs are created in Proxmox with Talos ISO attached
2. **Boot to Maintenance**: VMs boot into Talos maintenance mode
3. **Configuration Apply**: Terraform applies Talos machine configurations with cloud-provider settings
4. **Cluster Bootstrap**: First control plane node bootstraps the cluster
5. **Node Join**: Remaining nodes join automatically
6. **Kubeconfig Generation**: Cluster credentials are generated
7. **CCM Installation**: Proxmox Cloud Controller Manager is deployed (if enabled)
8. **CSI Installation**: Proxmox CSI driver and storage class are deployed (if enabled)
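
Steps 3 to 6 correspond to resources of the `siderolabs/talos` provider. The sketch below is modelled on the general pattern of the siderolabs/contrib examples, not on this project's `main.tf`. It assumes static control plane IPs, omits workers and config patches, and uses `talos_cluster_kubeconfig` as a resource, which requires a recent provider version.

```hcl
# Hypothetical sketch of the Talos part of the workflow.
terraform {
  required_providers {
    talos = {
      source = "siderolabs/talos"
    }
  }
}

variable "cluster_name" { type = string }
variable "cluster_endpoint" { type = string }
variable "controlplane_ips" { type = list(string) }

# Machine secrets and certificates shared by every node.
resource "talos_machine_secrets" "this" {}

# Render the control plane machine configuration.
data "talos_machine_configuration" "controlplane" {
  cluster_name     = var.cluster_name
  cluster_endpoint = var.cluster_endpoint
  machine_type     = "controlplane"
  machine_secrets  = talos_machine_secrets.this.machine_secrets
}

# Step 3: push the configuration to each node in maintenance mode.
resource "talos_machine_configuration_apply" "controlplane" {
  count = length(var.controlplane_ips)

  client_configuration        = talos_machine_secrets.this.client_configuration
  machine_configuration_input = data.talos_machine_configuration.controlplane.machine_configuration
  node                        = var.controlplane_ips[count.index]
}

# Step 4: bootstrap etcd on the first control plane node only.
resource "talos_machine_bootstrap" "this" {
  depends_on = [talos_machine_configuration_apply.controlplane]

  client_configuration = talos_machine_secrets.this.client_configuration
  node                 = var.controlplane_ips[0]
}

# Step 6: fetch cluster credentials after bootstrap.
resource "talos_cluster_kubeconfig" "this" {
  depends_on = [talos_machine_bootstrap.this]

  client_configuration = talos_machine_secrets.this.client_configuration
  node                 = var.controlplane_ips[0]
}

data "talos_client_configuration" "this" {
  cluster_name         = var.cluster_name
  client_configuration = talos_machine_secrets.this.client_configuration
  endpoints            = var.controlplane_ips
}

output "talosconfig" {
  value     = data.talos_client_configuration.this.talos_config
  sensitive = true
}

output "kubeconfig" {
  value     = talos_cluster_kubeconfig.this.kubeconfig_raw
  sensitive = true
}
```

The `depends_on` chain is what enforces the ordering of the list: configuration is applied before bootstrap, and the kubeconfig is read only after bootstrap succeeds.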

## Proxmox Integration Details

### Cloud Controller Manager (CCM)

The CCM provides:
- **Node Management**: Automatic node registration with Proxmox metadata
- **Node Labels**: Topology labels (region, zone, instance-type)
- **Node Lifecycle**: Proper handling of node additions and removals

Nodes are automatically labeled with:
```yaml
node.kubernetes.io/instance-type: proxmox
topology.kubernetes.io/region: <proxmox_region>
topology.kubernetes.io/zone: <proxmox_node>
```

### Container Storage Interface (CSI)

The CSI driver provides:
- **Dynamic Provisioning**: Automatically create volumes in Proxmox storage
- **Volume Expansion**: Support for expanding PVCs
- **Multiple Storage Backends**: Use any Proxmox storage (LVM, ZFS, Ceph, NFS, etc.)

Example usage:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: proxmox-data
```
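
The `proxmox-data` storage class referenced by the claim could be defined through the Kubernetes provider roughly as follows. This is a sketch under assumptions: the provisioner name and the `storage` parameter follow the upstream proxmox-csi-plugin documentation as far as I recall it, the filesystem type is arbitrary, and reusing `proxmox_storage` for CSI volumes is a guess about this project.

```hcl
# Hypothetical sketch: requires a configured "kubernetes" provider.
variable "proxmox_storage" { default = "local" }

resource "kubernetes_storage_class" "proxmox_data" {
  metadata {
    name = "proxmox-data"
  }

  # Provisioner name as published by the upstream CSI plugin (verify before use).
  storage_provisioner = "csi.proxmox.sinextra.dev"
  reclaim_policy      = "Delete"

  # Delay binding until a pod is scheduled, so the volume is created
  # on the Proxmox node where the pod will run.
  volume_binding_mode = "WaitForFirstConsumer"

  # Needed for the volume expansion feature listed above.
  allow_volume_expansion = true

  parameters = {
    "csi.storage.k8s.io/fstype" = "ext4"
    storage                     = var.proxmox_storage
  }
}
```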

## Accessing the Cluster

### Talos CLI

```bash
# Export talosconfig
terraform output -raw talosconfig > ~/.talos/config

# Get nodes
talosctl get members

# Get service status
talosctl services

# Access logs
talosctl logs kubelet
```

### Kubernetes CLI

```bash
# Export kubeconfig
terraform output -raw kubeconfig > ~/.kube/config

# Get cluster info
kubectl cluster-info
kubectl get nodes -o wide
kubectl get pods -A

# Check Proxmox integrations
kubectl get pods -n kube-system | grep proxmox
kubectl get pods -n csi-proxmox
kubectl get storageclass
```

## Maintenance

### Upgrading Talos

```bash
# Update the talos_version variable in terraform.tfvars:
#   talos_version = "v1.9.2"

# Apply changes
terraform apply

# Or upgrade manually, one node at a time
talosctl upgrade --nodes <node-ip> --image ghcr.io/siderolabs/installer:v1.9.2
```

### Scaling Workers

```bash
# Update worker_count in terraform.tfvars:
#   worker_count = 5

# Apply changes
terraform apply
```
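
Scaling works this way because the worker VMs are presumably created from a single resource with a `count`. The sketch below uses the `bpg/proxmox` VM resource with variable names from the configuration tables. The resource name, VM naming scheme, disk interface, and VM ID formula are assumptions, and the ISO attachment is omitted for brevity.

```hcl
# Hypothetical sketch of a count-based worker pool.
variable "cluster_name" { type = string }
variable "proxmox_node" { type = string }
variable "proxmox_storage" { default = "local" }
variable "proxmox_network_bridge" { default = "vmbr40" }
variable "vm_id_prefix" { default = 800 }
variable "controlplane_count" { default = 3 }
variable "worker_count" { default = 2 }
variable "worker_cpu" { default = 4 }
variable "worker_memory" { default = 8192 }
variable "worker_disk_size" { default = 10 }

resource "proxmox_virtual_environment_vm" "worker" {
  # Raising worker_count adds instances; existing ones are left untouched.
  count = var.worker_count

  name      = "${var.cluster_name}-worker-${count.index + 1}"
  node_name = var.proxmox_node
  vm_id     = var.vm_id_prefix + var.controlplane_count + count.index

  cpu {
    cores = var.worker_cpu
  }

  memory {
    dedicated = var.worker_memory
  }

  disk {
    datastore_id = var.proxmox_storage
    interface    = "scsi0"
    size         = var.worker_disk_size
  }

  network_device {
    bridge = var.proxmox_network_bridge
  }
}
```

With `count`, lowering `worker_count` destroys the highest-numbered workers first, so drain those nodes with `kubectl drain` before applying a scale-down.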

### Removing the Cluster

```bash
terraform destroy
```

## Troubleshooting

### VMs not getting IP addresses

**For DHCP:**
- Check Proxmox network bridge configuration
- Verify DHCP server is running on the network
- Ensure VMs are connected to the correct network bridge

**For Static IPs:**
- Verify all required parameters are set: `controlplane_ips`/`worker_ips`, `gateway`, and `netmask`
- Check that IPs are in the correct subnet
- Ensure gateway IP is correct and reachable
- Verify no IP conflicts with existing devices

### Cannot connect to nodes

- Verify firewall rules allow port 50000 (Talos API)
- Check VM networking in Proxmox
- Ensure nodes are in maintenance mode: `talosctl version --nodes <ip>`

### Bootstrap fails

- Check control plane IPs are correct
- Verify `cluster_endpoint` is accessible
- Review logs: `talosctl logs etcd`

### ISO upload fails

- Verify SSH access to Proxmox node
- Check `/var/lib/vz/template/iso/` permissions
- Manually upload ISO if needed

### CCM/CSI not working

- Verify Proxmox API token secrets are correct
- Check that tokens have appropriate permissions in Proxmox
- Review template logs for CCM/CSI configuration

## Project Structure

```
.
├── main.tf                 # Main VM, Talos, CCM/CSI resources
├── variables.tf            # Input variables
├── outputs.tf              # Output values
├── versions.tf             # Provider versions (Talos, Proxmox, Helm, K8s)
├── locals.tf               # Local values
├── terraform.tfvars        # Your configuration (create this)
├── templates/
│   ├── install-disk-and-hostname.yaml.tmpl
│   ├── static-ip.yaml.tmpl # Static IP configuration
│   ├── node-labels.yaml.tmpl
│   └── vip-config.yaml.tmpl
└── files/
    ├── cp-scheduling.yaml
    └── cloud-provider.yaml
```

## References

- [Talos Documentation](https://www.talos.dev/)
- [Talos Terraform Provider](https://registry.terraform.io/providers/siderolabs/talos)
- [Proxmox Terraform Provider](https://registry.terraform.io/providers/bpg/proxmox)
- [Proxmox CCM](https://github.com/sergelogvinov/proxmox-cloud-controller-manager)
- [Proxmox CSI](https://github.com/sergelogvinov/proxmox-csi-plugin)
- [Siderolabs Contrib Examples](https://github.com/siderolabs/contrib/tree/main/examples/terraform)

## License

Based on examples from [siderolabs/contrib](https://github.com/siderolabs/contrib)