ESXi to Proxmox Migration Plan
Created: 2025-12-28
Updated: 2025-12-28 (Application Layer Integration)
Status: Ready for Execution
Executive Summary
Current Situation:
- 2x ESXi hosts (NUC9i9QNX) with production workloads
- 1x Proxmox staging server already running (Home Assistant + Frigate with Coral TPU)
- Complete application stack: Plex, media automation (Radarr/Sonarr/SABnzbd), Pi-hole DNS, Palo Alto firewall
- Critical External Dependency: NFS storage at 10.1.1.150 (27TB, 21TB used) - ALL media stored here
End Goal:
- 3-node Proxmox VE cluster with HA capability
- All applications migrated with zero data loss
- Hardware passthrough working (iGPU for Plex, Coral TPU for Frigate)
- NFS mounts reconfigured on all VMs
- Traefik reverse proxy with Cloudflare SSL working
Key Application Insights:
- VM Naming Clarified: “platinum” (10.1.1.125) = Plex, “iridium” (10.1.1.126) = Support services
- No Media Migration Needed: All media on NFS (10.1.1.150), only config/database migrations
- Docker Stack: Complete media automation on single VM with Traefik reverse proxy
- Service Dependencies: Mapped complete data flow from user request → Plex streaming
- Network Management: UniFi Controller on iridium VM manages network infrastructure
Migration Architecture - Application View
End-State Application Distribution
Node 1: proxmox-01 (10.1.1.120)
Production Workloads:
- Plex Media Server (platinum VM → renamed “plex”)
  - Intel iGPU passthrough for Quick Sync transcoding
  - NFS mount: 10.1.1.150:/volume1/datastore/media → /mnt/media
  - Database: Local on VM (~500MB)
- Plex Support Services (iridium VM → renamed “plex-support”)
  - Tautulli (Plex monitoring and statistics)
  - Cloudflared (remote access tunnel)
  - UniFi Controller (network management)
  - No iGPU needed (can reclaim passthrough)
- Media Automation Stack (docker VM → renamed “docker-media”)
  - Radarr, Sonarr, SABnzbd (media management)
  - Traefik reverse proxy (Cloudflare SSL)
  - Ombi, Overseerr (user requests)
  - Portainer, Watchtower, Uptime Kuma
  - NFS mount: 10.1.1.150:/volume1/datastore/media → /mnt/media
- Pi-hole DNS (pihole VM)
  - Network-wide DNS and ad-blocking
  - Critical: All network clients depend on this
- Palo Alto Firewall (jarnetfw VM)
  - Inter-VLAN routing
  - NAT, security policies
  - Critical: Network outage if offline
Node 2: proxmox-02 (10.1.1.121)
Production Workloads:
- Home Assistant + Frigate (home-sec VM, migrated from staging)
- USB controller passthrough for Coral TPU
- Docker containers: Home Assistant, Frigate, Mosquitto
- Local storage for Frigate recordings
Spare Capacity:
- Room for growth and lab VMs
Node 3: pve-staging (10.1.1.123)
Role: HA Witness + Light Workloads
- Quorum node for 3-node cluster
- K8s lab, templates, Docker services
Pre-Migration Decisions Required
1. Hardware Upgrade Timing ⚠️ CRITICAL DECISION
RECOMMENDATION: Install 2TB NVMe drives BEFORE migration (Option A)
- Clean Proxmox install on larger drives
- No storage migration later
- Order drives now, install before Phase 1
2. Proxmox Storage Backend
RECOMMENDATION: ZFS for NUCs (better for VMs, snapshots, replication)
- Built-in compression and checksums
- Native VM snapshot support
- Better for Plex database and Docker volumes
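To make the ZFS recommendation concrete, here is a dry-run tuning sketch for a freshly installed node. The pool name "rpool" (the PVE installer default) and 64 GiB of RAM are assumptions; it prints the changes rather than applying them.

```shell
# Dry-run sketch: post-install ZFS tuning on a Proxmox node.
# Assumptions: pool is named "rpool", host has 64 GiB RAM - adjust both.
ram_gib=64
arc_max_bytes=$(( ram_gib / 2 * 1024 * 1024 * 1024 ))   # cap ARC at half of RAM so VMs keep memory
echo "options zfs zfs_arc_max=${arc_max_bytes}"          # -> append to /etc/modprobe.d/zfs.conf
echo "zfs set compression=lz4 rpool"                     # lightweight compression (often the default)
echo "zfs get compression,checksum rpool"                # verify pool properties
```

Capping the ARC matters on a NUC that also runs Plex and Docker VMs; ZFS will otherwise happily use half the host's RAM for cache by default.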
3. VM Renaming (RECOMMENDED)
Current naming is confusing due to ESXi VM names vs hostnames. Recommend renaming during migration:
- platinum (10.1.1.125) → “plex” (Plex Media Server)
- iridium (10.1.1.126) → “plex-support” (Tautulli, Cloudflared, UniFi Controller)
- docker (10.1.1.32) → “docker-media” (Media automation stack)
- pihole (10.1.1.35) → “pihole” (keep same)
- jarnetfw (10.1.1.103) → “paloalto-fw” (keep functionality same)
4. VMs to Decommission
Confirm these can be deleted (all currently offline on Host 2):
- server-2019 (old Blue Iris host - replaced by Frigate)
- home-security (old VM - replaced by Frigate on Proxmox)
- xsoar (purpose unknown, offline)
- win11-sse, win-10 (lab VMs, offline)
Phase 0: Preparation & Backups
0.1: Hardware Preparation
- Order 2x 2TB WD Blue SN580 NVMe drives
- Create Proxmox VE 8.x bootable USB installer
- Prepare external backup storage (USB drive or network backup location)
0.2: Pre-Migration Backups - CRITICAL
Backup Script
Create and run this backup script BEFORE any migration:
#!/bin/bash
# Pre-Migration Backup Script
# Run this BEFORE starting migration
BACKUP_DIR="/tmp/esxi-migration-backup-$(date +%Y%m%d)"
mkdir -p $BACKUP_DIR
echo "=== Starting Pre-Migration Backups ==="
echo "Backup directory: $BACKUP_DIR"
# 1. Plex Database Backup
echo "1. Backing up Plex database..."
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 \
"sudo tar czf /tmp/plex-backup.tar.gz \
'/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Plug-in Support/Databases/' \
'/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Preferences.xml'"
scp -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125:/tmp/plex-backup.tar.gz $BACKUP_DIR/
echo "✓ Plex backup complete"
# 2. Docker Compose + Env Files
echo "2. Backing up Docker compose and configs..."
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32 \
"tar czf /tmp/docker-backup.tar.gz /home/luke/docker/"
scp -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32:/tmp/docker-backup.tar.gz $BACKUP_DIR/
echo "✓ Docker backup complete"
# 3. Pi-hole Config
echo "3. Backing up Pi-hole configuration..."
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.35 \
"sudo tar czf /tmp/pihole-backup.tar.gz /etc/pihole/"
scp -i ~/.ssh/esxi_migration_rsa luke@10.1.1.35:/tmp/pihole-backup.tar.gz $BACKUP_DIR/
echo "✓ Pi-hole backup complete"
# 4. Iridium VM (Tautulli, Cloudflared, UniFi)
echo "4. Backing up iridium services (Tautulli, UniFi, Cloudflared)..."
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126 \
"sudo tar czf /tmp/iridium-backup.tar.gz /config /data 2>/dev/null || sudo tar czf /tmp/iridium-backup.tar.gz /opt /etc/cloudflared 2>/dev/null || echo 'Partial backup'"
scp -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126:/tmp/iridium-backup.tar.gz $BACKUP_DIR/
echo "✓ Iridium backup complete (Tautulli, UniFi, Cloudflared configs)"
# 5. Home Assistant Backup
echo "5. Backing up Home Assistant..."
ssh -i ~/.ssh/esxi_migration_rsa ubuntu@10.1.1.208 \
"docker exec homeassistant ha backups new --name pre-migration"
echo "✓ Home Assistant backup created (stored in container)"
# 6. List all VMs for reference
echo "6. Documenting VM inventory..."
ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.120 \
"vim-cmd vmsvc/getallvms" > $BACKUP_DIR/esxi-host1-vms.txt
ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.121 \
"vim-cmd vmsvc/getallvms" > $BACKUP_DIR/esxi-host2-vms.txt
echo "✓ VM inventory documented"
# 7. Export ESXi network configs
echo "7. Exporting ESXi network configurations..."
ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.120 \
"esxcli network vswitch standard list" > $BACKUP_DIR/esxi-host1-network.txt
ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.121 \
"esxcli network vswitch standard list" > $BACKUP_DIR/esxi-host2-network.txt
echo "✓ Network configs exported"
echo ""
echo "=== Backup Complete ==="
echo "All backups saved to: $BACKUP_DIR"
echo ""
echo "Next steps:"
echo "1. Review backups in $BACKUP_DIR"
echo "2. Copy to external storage: cp -r $BACKUP_DIR /path/to/external/drive/"
echo "3. Verify Palo Alto firewall config is exported via web UI"
echo "4. Document current IP addresses and credentials"
Palo Alto Firewall Backup (Manual)
- Log into Palo Alto web UI (https://10.1.1.103)
- Export configuration: Device → Setup → Operations → Export named configuration
- Save to $BACKUP_DIR/paloalto-config.xml
- Screenshot all firewall rules (Policies → Security)
- Screenshot NAT policies (Policies → NAT)
- Document interface → VLAN mappings, for example:
  - ethernet1/1 → VLAN 300 (Public)
  - ethernet1/2 → VLAN 50 (Lab)
  - ethernet1/3 → VLAN 0 (Management)
NFS Mount Verification
# Verify NFS is accessible from current VMs
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "df -h | grep 10.1.1.150"
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32 "df -h | grep 10.1.1.150"
# Test NFS from Proxmox staging (verify Proxmox can access NFS)
ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.123 "mount -t nfs4 10.1.1.150:/volume1/datastore/media /mnt/test && ls /mnt/test && umount /mnt/test"
0.3: Environment Variables & Credentials Documentation
Create a secure document with:
- Cloudflare API key (for Traefik)
- Cloudflare email
- Domain name (${DOMAINNAME} from docker-compose.yml)
- SABnzbd Usenet credentials
- Palo Alto firewall admin credentials
- All VM root/admin passwords
- NFS server credentials (if any)
0.4: Copy External Dependencies
- Download docker-compose.yml from docker VM:
scp -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32:/home/luke/docker/docker-compose.yml $BACKUP_DIR/
- Download .env file (if exists):
scp -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32:/home/luke/docker/.env $BACKUP_DIR/ || echo "No .env file"
- Download Traefik configs:
scp -r -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32:/home/luke/docker/traefik/ $BACKUP_DIR/
Duration: 2-3 hours
Validation: Verify all backups exist and are not empty before proceeding
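The validation step above can be scripted. This sketch assumes the archive names produced by the Phase 0 backup script; adjust the list if yours differ.

```shell
# Sketch: verify each expected backup archive exists and is non-empty.
verify_backups() {
  local dir="$1"; shift
  local rc=0
  for f in "$@"; do
    if [ -s "$dir/$f" ]; then
      echo "OK    $f"
    else
      echo "MISSING/EMPTY $f"
      rc=1
    fi
  done
  return $rc
}

BACKUP_DIR="/tmp/esxi-migration-backup-$(date +%Y%m%d)"
verify_backups "$BACKUP_DIR" \
  plex-backup.tar.gz docker-backup.tar.gz pihole-backup.tar.gz iridium-backup.tar.gz \
  || echo "Fix missing backups before starting Phase 1"
```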
Phase 1: Install Proxmox on Host 2
1.1: Pre-Install Checks
- All Phase 0 backups completed and verified
- 2TB NVMe drive physically installed (if doing hardware upgrade)
- Proxmox USB installer prepared
- All offline VMs on Host 2 confirmed safe to delete
1.2: Proxmox Installation
Same as original plan - install Proxmox on ghost-esx-02 (10.1.1.121):
- Hostname: proxmox-02
- IP: 10.1.1.121/24
- Gateway: 10.1.1.1
- Filesystem: ZFS (RAID0) with compression
1.3: Network Configuration
Configure dual 10GbE bond with VLAN-aware bridge (same as original plan)
1.4: Intel iGPU Passthrough Configuration
(Same as original plan - enable IOMMU, load VFIO, blacklist i915)
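For reference, a dry-run sketch of the host prep this step points back to (GRUB-booted Intel host). It prints the changes instead of applying them; the file paths are the standard Debian/PVE locations, so verify against your install before editing anything.

```shell
# Dry-run sketch: host-side changes for Intel iGPU passthrough on Proxmox.
grub_args="intel_iommu=on iommu=pt"
echo "GRUB_CMDLINE_LINUX_DEFAULT=\"quiet ${grub_args}\"  # edit /etc/default/grub, then run: update-grub"
printf '%s\n' vfio vfio_iommu_type1 vfio_pci              # append these to /etc/modules
echo "blacklist i915                                      # -> /etc/modprobe.d/blacklist-i915.conf"
echo "update-initramfs -u -k all && reboot"
echo "dmesg | grep -e DMAR -e IOMMU                       # confirm IOMMU active after reboot"
```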
1.5: NFS Storage Testing - NEW
Critical: Test NFS mount BEFORE migrating VMs
# On proxmox-02:
# Install NFS client
apt install -y nfs-common
# Test NFS mount
mkdir -p /mnt/nfs-test
mount -t nfs4 10.1.1.150:/volume1/datastore/media /mnt/nfs-test
# Verify mount
df -h | grep 10.1.1.150
ls -lah /mnt/nfs-test
# Check read/write permissions
touch /mnt/nfs-test/proxmox-write-test.txt
rm /mnt/nfs-test/proxmox-write-test.txt
# Unmount test
umount /mnt/nfs-test
Expected Results:
- NFS mount successful
- Can read existing media files
- Can create/delete test files
- ~27TB total, ~21TB used
Validation Checklist:
- Proxmox web UI accessible at https://10.1.1.121:8006
- SSH access working
- Network connectivity (ping 10.1.1.1, 8.8.8.8)
- Intel iGPU shows as available for passthrough
- NFS mount working with read/write access
- Test VM boots successfully
Duration: 3-4 hours
Phase 2: Migrate VMs from Host 1 to Proxmox Host 2
Migration Strategy: Incremental approach, lowest to highest risk
2.1: Test Migration - “pihole” VM (DNS Service) - UPDATED
Why First:
- Relatively simple (single application, no storage dependencies)
- Can use DNS fallback (8.8.8.8) during migration
- Good test of migration process
Pre-Migration:
# Update DHCP to use fallback DNS temporarily (on Palo Alto or DHCP server)
# Add secondary DNS: 8.8.8.8
# Backup Pi-hole config (already done in Phase 0)
# Verify backup exists
ls -lh $BACKUP_DIR/pihole-backup.tar.gz
Migration Steps:
1. Stop Pi-hole VM on ESXi:
ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.120 "vim-cmd vmsvc/power.off <VMID>"
2. Export VM disk:
# On ESXi host (clone to a NEW name - do not overwrite the existing pihole-flat.vmdk backing file):
ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.120 "vmkfstools -i /vmfs/volumes/<datastore>/pihole/pihole.vmdk /vmfs/volumes/<datastore>/pihole/pihole-export.vmdk"
# Copy to Proxmox:
scp -i ~/.ssh/esxi_migration_rsa root@10.1.1.120:/vmfs/volumes/<datastore>/pihole/pihole-export*.vmdk /var/lib/vz/images/
3. Create VM on Proxmox:
# On proxmox-02 (management network is untagged; Proxmox VLAN tags must be 1-4094, so omit the tag):
qm create 101 --name pihole --memory 1024 --cores 1 --net0 virtio,bridge=vmbr0
# Import disk
qm importdisk 101 /var/lib/vz/images/pihole-export.vmdk local-zfs
qm set 101 --scsi0 local-zfs:vm-101-disk-0
qm set 101 --boot order=scsi0
qm set 101 --ostype l26
4. Start VM and validate:
qm start 101
# Wait 30 seconds, then test
ping 10.1.1.35
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.35
# Verify Pi-hole running
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.35 "pihole status"
# Test DNS resolution
nslookup google.com 10.1.1.35
5. Revert DHCP to use Pi-hole as primary DNS
Validation:
- Pi-hole VM boots on Proxmox
- Network connectivity (ping gateway, internet)
- Pi-hole web UI accessible (http://10.1.1.35/admin)
- DNS queries working
- Ad blocking functional
Rollback: Restart Pi-hole VM on ESXi if issues
Downtime: ~15-20 minutes
Duration: 1-2 hours with testing
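The export → copy → import sequence above repeats for every remaining VM, so a small dry-run generator keeps the names consistent. The datastore name, the "-export" suffix, and thin provisioning are assumptions; substitute your values before running the printed commands.

```shell
# Helper sketch: print the ESXi-export -> Proxmox-import commands for one VM.
print_migration_cmds() {
  local esxi_host="$1" vm="$2" vmid="$3" datastore="$4"
  cat <<EOF
ssh root@${esxi_host} "vmkfstools -i /vmfs/volumes/${datastore}/${vm}/${vm}.vmdk -d thin /vmfs/volumes/${datastore}/${vm}/${vm}-export.vmdk"
scp root@${esxi_host}:/vmfs/volumes/${datastore}/${vm}/${vm}-export*.vmdk /var/lib/vz/images/
qm importdisk ${vmid} /var/lib/vz/images/${vm}-export.vmdk local-zfs
EOF
}

print_migration_cmds 10.1.1.120 pihole 101 datastore1
```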
2.2: Migrate “iridium” VM (Plex Support Services) - UPDATED ⭐ MEDIUM PRIORITY
Why Second:
- Important services but not critical infrastructure
- Supports Plex (should be migrated before Plex itself)
- UniFi Controller manages network devices
- Good complexity test without critical dependencies
Applications:
- Tautulli (Plex monitoring)
- Cloudflared (remote access tunnel)
- UniFi Controller (network management)
Pre-Migration Checklist:
- Tautulli/UniFi/Cloudflared configs backed up (Phase 0)
- Document Cloudflare tunnel token
- Note UniFi devices (will temporarily lose management)
Migration Steps:
1. Document current configuration:
# Check what's running:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126 "ps aux | grep -E 'tautulli|cloudflare|unifi' | grep -v grep"
# Check Tautulli port:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126 "curl -I http://localhost:8181"
# Document Cloudflare tunnel token (visible in process list)
2. Stop services gracefully (if possible):
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126 "sudo s6-svc -d /run/s6-rc/servicedirs/svc-tautulli || echo 'Manual stop failed, VM shutdown will stop services'"
3. Shut down VM on ESXi:
ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.120 "vim-cmd vmsvc/power.off <VMID>"
4. Export VM disk to Proxmox: (Same process as Pi-hole - export VMDK, copy to Proxmox, import)
5. Create VM on Proxmox:
# On proxmox-02 (untagged management network - omit the VLAN tag on the NIC):
qm create 104 --name plex-support --memory 8192 --cores 4 --net0 virtio,bridge=vmbr0
qm importdisk 104 /var/lib/vz/images/iridium.vmdk local-zfs
qm set 104 --scsi0 local-zfs:vm-104-disk-0
qm set 104 --boot order=scsi0
qm set 104 --ostype l26
6. Start VM and verify services:
qm start 104
# Wait 60 seconds for boot
sleep 60
# SSH into VM
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126
7. Validate Tautulli:
# Check Tautulli is running:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126 "ps aux | grep tautulli | grep -v grep"
# Access Tautulli web UI:
curl -I http://10.1.1.126:8181
# Or open in browser: http://10.1.1.126:8181
# Verify Tautulli can connect to Plex (10.1.1.125):
# Check Tautulli UI → Settings → Plex Media Server
8. Validate Cloudflared:
# Check Cloudflared tunnel is running:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126 "ps aux | grep cloudflared | grep -v grep"
# Test remote access (if configured):
# Try accessing services via Cloudflare tunnel URL
9. Validate UniFi Controller:
# Check UniFi Controller is running:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126 "ps aux | grep unifi | grep -v grep"
# Access UniFi Controller web UI:
# https://10.1.1.126:8443
# Verify UniFi devices are reconnecting:
# UniFi UI → Devices → Check all devices show "Connected"
# (May take 2-5 minutes for devices to reconnect)
Validation Checklist:
- iridium VM boots on Proxmox
- Network connectivity (ping gateway, internet)
- Tautulli web UI accessible (http://10.1.1.126:8181)
- Tautulli can connect to Plex server (10.1.1.125)
- Cloudflared tunnel running (check process)
- Remote access working (if configured)
- UniFi Controller web UI accessible (https://10.1.1.126:8443)
- UniFi devices reconnected (check controller UI)
- All network devices managed and healthy
Important Notes:
- UniFi Devices: Will briefly lose controller connection during migration
- Devices will auto-reconnect when controller comes back online
- No network outage (devices continue forwarding traffic)
- Management features temporarily unavailable during migration
Rollback: Restart iridium VM on ESXi if issues
Downtime: ~30-45 minutes (UniFi management only)
Duration: 1-2 hours with full validation
2.3: Migrate “docker” VM (Media Automation Stack) - UPDATED ⭐ HIGH PRIORITY
Complexity: HIGH - Multiple dependencies (NFS, Traefik, Cloudflare, Docker network)
Risk: Media management offline during migration
Critical Dependencies:
- NFS mount for media (10.1.1.150)
- Cloudflare API credentials for Traefik SSL
- Docker volumes for all container configs
- Environment variables (.env file)
Pre-Migration Checklist:
- Docker compose file backed up
- .env file backed up (contains Cloudflare credentials, domain)
- All container config volumes backed up
- NFS mount tested on Proxmox (done in Phase 1.5)
- Cloudflare API key documented
Migration Steps:
1. Stop all Docker containers gracefully:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32 "cd /home/luke/docker && docker-compose down"
2. Shut down VM on ESXi:
ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.120 "vim-cmd vmsvc/power.off <VMID>"
3. Export VM disk to Proxmox: (Same process as Pi-hole - export VMDK, copy to Proxmox, import)
4. Create VM on Proxmox:
# On proxmox-02 (untagged management network - omit the VLAN tag on the NIC):
qm create 102 --name docker-media --memory 4096 --cores 2 --net0 virtio,bridge=vmbr0
qm importdisk 102 /var/lib/vz/images/docker.vmdk local-zfs
qm set 102 --scsi0 local-zfs:vm-102-disk-0
qm set 102 --boot order=scsi0
qm set 102 --ostype l26
5. Start VM and reconfigure NFS mount:
qm start 102
# SSH into VM
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32
# Verify /etc/fstab has NFS mount:
grep 10.1.1.150 /etc/fstab
# Should show:
# 10.1.1.150:/volume1/datastore/media /mnt/media nfs4 defaults 0 0
# Mount NFS (should auto-mount from fstab, but verify):
mount -a
df -h | grep 10.1.1.150
# Verify media files accessible:
ls /mnt/media/Movies
ls /mnt/media/TV
6. Verify Docker Compose and Environment:
# Verify docker-compose.yml exists:
head -20 /home/luke/docker/docker-compose.yml
# Verify .env file exists (contains Cloudflare credentials):
grep CLOUDFLARE /home/luke/docker/.env
# Should show:
# CLOUDFLARE_EMAIL=your@email.com
# CLOUDFLARE_API_KEY=your_api_key
# DOMAINNAME=your.domain.com
7. Start Docker containers:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32 "cd /home/luke/docker && docker-compose up -d"
# Monitor container startup:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32 "docker ps -a"
# Check logs for any errors:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32 "cd /home/luke/docker && docker-compose logs --tail=50"
8. Validate ALL services:
Traefik (Reverse Proxy):
# Check Traefik is running and has SSL certs:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32 "docker logs traefik 2>&1 | grep -i certificate"
# Verify Traefik web UI accessible:
curl -k https://traefik.${DOMAINNAME}
Radarr (Movies):
# Verify Radarr is accessible:
curl -I https://cyan.${DOMAINNAME}
# Verify NFS mount visible in Radarr:
# Radarr UI → Settings → Media Management → Root Folders should show /mnt/media/Movies
Sonarr (TV):
# Verify Sonarr is accessible:
curl -I https://teal.${DOMAINNAME}
# Verify NFS mount visible in Sonarr:
# Sonarr UI → Settings → Media Management → Root Folders should show /mnt/media/TV
SABnzbd (Downloads):
# Verify SABnzbd is running:
curl -I http://10.1.1.32:2099
# Verify download directories:
# SABnzbd UI → Config → Folders should show /downloads and /incomplete-downloads
Portainer:
# Access Portainer web UI:
# http://10.1.1.32:9000
Uptime Kuma:
curl -I https://status.${DOMAINNAME}
9. Test Complete Data Flow:
- Add test movie request in Ombi/Overseerr
- Verify Radarr picks up request
- Verify SABnzbd can download (or start download)
- Verify files saved to NFS mount (/mnt/media/Downloads)
- Verify Radarr can move to /mnt/media/Movies
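Before walking the request → download → library flow manually, a quick scripted check confirms the whole stack came back up. The container names below come from this plan's stack; match them to your compose file.

```shell
# Sketch: verify every expected container is running after docker-compose up -d.
check_containers() {
  local running="$1"; shift     # pass in the output of: docker ps --format '{{.Names}}'
  local missing=""
  for c in "$@"; do
    echo "$running" | grep -qx "$c" || missing="$missing $c"
  done
  [ -z "$missing" ] && { echo "all containers up"; return 0; }
  echo "NOT RUNNING:$missing"
  return 1
}

# On the docker-media VM:
# check_containers "$(docker ps --format '{{.Names}}')" traefik radarr sonarr sabnzbd portainer
```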
Validation Checklist:
- Docker VM boots on Proxmox
- NFS mount working (/mnt/media accessible)
- All Docker containers running (docker ps shows all as “Up”)
- Traefik reverse proxy working (SSL certs valid)
- Radarr accessible and can see /mnt/media/Movies
- Sonarr accessible and can see /mnt/media/TV
- SABnzbd accessible and can download
- Ombi/Overseerr accessible for user requests
- Portainer accessible for Docker management
- Uptime Kuma monitoring working
- Cloudflare SSL certificates auto-renewing (check Traefik logs)
Rollback: Restart docker VM on ESXi, docker-compose up -d
Downtime: ~30-60 minutes
Duration: 2-3 hours with full validation
2.4: Migrate “platinum” VM (Plex Media Server) - UPDATED ⭐ CRITICAL
Complexity: HIGHEST - Requires iGPU passthrough + NFS mount
Risk: Media streaming offline, hardware transcoding must work
Critical Dependencies:
- Intel UHD 630 iGPU passthrough (for Quick Sync transcoding)
- NFS mount for media libraries (10.1.1.150)
- Plex database (local on VM)
Pre-Migration Checklist:
- Plex database backed up (Phase 0)
- iGPU passthrough verified working on Proxmox Host 2 (Phase 1.4)
- NFS mount tested on Proxmox (Phase 1.5)
- Current transcoding settings documented
Migration Steps:
1. Document current Plex settings:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "grep -i transcode '/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Preferences.xml'"
# Note settings:
# - HardwareAcceleratedCodecs (should be enabled)
# - TranscoderH264BackgroundPreset
# - TranscoderTempDirectory
2. Stop Plex service gracefully:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "sudo systemctl stop plexmediaserver"
# Wait 30 seconds for graceful shutdown
sleep 30
3. Final Plex database backup:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 \
  "sudo tar czf /tmp/plex-final-backup.tar.gz \
  '/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/'"
scp -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125:/tmp/plex-final-backup.tar.gz $BACKUP_DIR/
4. Shut down VM on ESXi:
ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.120 "vim-cmd vmsvc/power.off <VMID>"
5. Export VM disk to Proxmox: (Same process - export VMDK, copy to Proxmox)
6. Create VM on Proxmox with iGPU passthrough:
# On proxmox-02 (untagged management network - omit the VLAN tag on the NIC):
qm create 103 --name plex --memory 8192 --cores 4 --net0 virtio,bridge=vmbr0
qm importdisk 103 /var/lib/vz/images/platinum.vmdk local-zfs
qm set 103 --scsi0 local-zfs:vm-103-disk-0
qm set 103 --boot order=scsi0
qm set 103 --ostype l26
# CRITICAL: Set machine type to q35 (required for PCIe passthrough):
qm set 103 --machine q35
# Add Intel iGPU passthrough:
# First, identify iGPU PCI address:
lspci -nnk | grep -i vga
# Should show: 00:02.0 VGA compatible controller: Intel Corporation ... [8086:3e9b]
# Add PCI device to VM:
qm set 103 --hostpci0 00:02.0,pcie=1,rombar=0
7. Start VM and verify boot:
qm start 103
# Monitor boot via console (if needed):
qm terminal 103
# Wait 60 seconds for boot, then SSH:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125
8. Verify iGPU is visible in guest OS:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "lspci | grep -i vga"
# Should show: Intel Corporation CoffeeLake-H GT2 [UHD Graphics 630]
# Verify /dev/dri devices exist:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "ls -l /dev/dri"
# Should show: renderD128, card0
# Install/verify vainfo (Intel GPU tools):
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "sudo apt install -y vainfo"
# Verify Intel Quick Sync capabilities:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "vainfo"
# Should show supported encode/decode profiles (H.264, HEVC, etc.)
9. Verify NFS mount:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "df -h | grep 10.1.1.150"
# Should show: 10.1.1.150:/volume1/datastore/media mounted on /mnt/media
# Verify media files accessible:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "ls /mnt/media/Movies | head -10"
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "ls /mnt/media/TV | head -10"
10. Start Plex and verify hardware transcoding:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "sudo systemctl start plexmediaserver"
# Wait 30 seconds for Plex to start:
sleep 30
# Verify Plex is running:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "sudo systemctl status plexmediaserver"
# Access Plex web UI:
# http://10.1.1.125:32400/web
# Verify libraries are visible (media from NFS mount)
11. Test hardware transcoding:
- Access Plex web UI
- Go to Settings → Transcoder
- Verify “Use hardware acceleration when available” is enabled
- Verify “Hardware transcoding device” shows Intel iGPU
- Play a video that requires transcoding (adjust quality to force transcode)
- Check transcoding session:
# While video is playing, check transcode session:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "ps aux | grep 'Plex Transcoder'"
# Should show transcoder process
# Verify hardware transcoding in Plex dashboard:
# Settings → Status → Now Playing
# Should show "(hw)" next to video codec if using hardware transcode
# Verify low CPU usage (hardware transcoding offloads to iGPU):
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "top -bn1 | head -20"
# CPU usage should be <20% during a 4K transcode if hw acceleration is working
12. Validate Plex Media Scanner:
# Trigger manual library scan:
# Plex UI → Libraries → [Library Name] → Scan Library Files
# Verify scan works (can read NFS mount)
# Check Plex logs if scan fails:
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 \
  "tail -100 '/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Logs/Plex Media Scanner.log'"
Validation Checklist:
- Plex VM boots on Proxmox
- Intel iGPU visible in guest OS (lspci shows iGPU)
- /dev/dri devices exist (renderD128, card0)
- vainfo shows Intel Quick Sync encode/decode profiles
- NFS mount working (/mnt/media accessible)
- All Plex libraries visible in UI
- Plex can scan media files from NFS
- Hardware transcoding enabled in Plex settings
- Test transcode session shows “(hw)” indicator
- CPU usage low during transcode (<20% for 4K)
- Plex accessible from local network
- Plex accessible from internet (if remote access configured)
Troubleshooting iGPU Passthrough: If iGPU not visible or hardware transcoding not working:
- Verify IOMMU enabled: dmesg | grep -i iommu
- Verify i915 driver blacklisted on host: lsmod | grep i915 (should be empty)
- Verify vfio-pci driver loaded: lspci -nnk | grep -A3 VGA
- Check VM machine type is q35: qm config 103 | grep machine
- Check permissions on /dev/dri in guest: ls -l /dev/dri
- Add Plex user to render group in guest: sudo usermod -aG render plex
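The first two host-side checks above can be condensed into one pass/fail report. This sketch takes command output as arguments (so the logic itself is testable); feed it live data as shown in the trailing comment.

```shell
# Sketch: pass/fail report for the host-side passthrough prerequisites.
check_passthrough() {
  local lsmod_out="$1" dmesg_out="$2" rc=0
  if echo "$lsmod_out" | grep -q '^i915'; then
    echo "FAIL: i915 is loaded on the host (blacklist it)"; rc=1
  else
    echo "OK:   i915 not loaded on the host"
  fi
  if echo "$dmesg_out" | grep -qi 'iommu'; then
    echo "OK:   IOMMU messages present"
  else
    echo "WARN: no IOMMU messages - check kernel cmdline"; rc=1
  fi
  return $rc
}

# On proxmox-02:
# check_passthrough "$(lsmod)" "$(dmesg | grep -i -e DMAR -e IOMMU)"
```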
Rollback: Restart Plex VM on ESXi
Downtime: ~1-2 hours
Duration: 3-4 hours with full testing
2.5: Migrate “jarnetfw” VM (Palo Alto Firewall) - UPDATED ⭐ CRITICAL
Complexity: CRITICAL - Network outage affects ALL services
Risk: HIGHEST - Inter-VLAN routing down during migration
Timing: OFF-HOURS / PLANNED MAINTENANCE WINDOW REQUIRED
Critical Requirements:
- Document ALL interface → VLAN mappings
- Export Palo Alto configuration
- Plan communication (network outage window)
- Keep ESXi Host 1 available for emergency rollback
Pre-Migration Checklist:
- ALL other VMs successfully migrated to Proxmox (Pi-hole, Docker, Plex)
- Palo Alto config exported (done in Phase 0)
- Interface mappings documented
- Firewall rules screenshot
- NAT policies screenshot
- Scheduled maintenance window communicated
- Rollback plan ready (ESXi Host 1 kept online for 48 hours)
Interface Mapping Documentation (CRITICAL): Before migration, document EXACT interface mappings:
# On Palo Alto (via CLI or web UI), document:
# Interface | VLAN | IP Address | Role
# --------- | ---- | ---------- | ----
# ethernet1/1 | VLAN 300 | 10.1.30.1/24 | Public DMZ
# ethernet1/2 | VLAN 50 | 10.1.50.1/24 | Lab Network
# ethernet1/3 | VLAN 0 | 10.1.1.103/24 | Management
# ethernet1/4 | VLAN 4095 | Trunk | Internal
# (Example - document YOUR actual mappings)
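Interface order matters when recreating this VM: Proxmox NIC numbers must line up with the Palo Alto interface numbers. A small dry-run generator keeps the qm commands consistent with your table. The VMID and tags below are illustrative; note Proxmox only accepts VLAN tags 1-4094, so an untagged (management) NIC simply gets no tag option.

```shell
# Sketch: turn the documented interface->VLAN table into qm set commands.
gen_fw_nics() {
  local vmid="$1"; shift
  local i=0
  for tag in "$@"; do
    if [ "$tag" = "untagged" ]; then
      echo "qm set ${vmid} --net${i} virtio,bridge=vmbr0"
    else
      echo "qm set ${vmid} --net${i} virtio,bridge=vmbr0,tag=${tag}"
    fi
    i=$((i + 1))
  done
}

# net0 = management (untagged), net1 = VLAN 300, net2 = VLAN 50:
gen_fw_nics 105 untagged 300 50
```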
Migration Steps:
1. Final configuration export:
# Via Palo Alto web UI:
# Device → Setup → Operations → Export named configuration snapshot
# Save to: $BACKUP_DIR/paloalto-final-config-$(date +%Y%m%d).xml
2. Announce maintenance window:
- Network outage expected: 30-60 minutes
- All inter-VLAN traffic will be down
- Internet access will be down
- Schedule during lowest usage period
3. Shut down Palo Alto VM (network outage begins):
ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.120 "vim-cmd vmsvc/power.off <VMID>"
# Network outage begins NOW
4. Export VM disk to Proxmox: (Same process - but work quickly, network is down)
5. Create VM on Proxmox with MULTIPLE network interfaces:
# On proxmox-02 (use VMID 105 - 104 is already taken by plex-support):
qm create 105 --name paloalto-fw --memory 7168 --cores 4 --ostype other
qm importdisk 105 /var/lib/vz/images/jarnetfw.vmdk local-zfs
qm set 105 --scsi0 local-zfs:vm-105-disk-0
qm set 105 --boot order=scsi0
# CRITICAL: Add network interfaces with correct VLAN tags
# Match your documented interface mappings!
# Management interface (untagged - Proxmox VLAN tags must be 1-4094, so omit the tag):
qm set 105 --net0 virtio,bridge=vmbr0
# Public interface (VLAN 300):
qm set 105 --net1 virtio,bridge=vmbr0,tag=300
# Lab interface (VLAN 50):
qm set 105 --net2 virtio,bridge=vmbr0,tag=50
# Trunk interface (pass tagged VLANs through with the trunks option instead of tag=4095):
qm set 105 --net3 'virtio,bridge=vmbr0,trunks=50;300'
# Add more interfaces as needed to match your config
6. Start Palo Alto VM:
qm start 105
# Wait 2-3 minutes for Palo Alto to boot (slow boot)
sleep 180
7. Verify management interface accessible:
# Try to ping Palo Alto management IP:
ping -c 5 10.1.1.103
# Access web UI:
# https://10.1.1.103
# Login and verify
8. Verify ALL interfaces are UP:
# Via Palo Alto web UI:
# Network → Interfaces
# Verify all ethernet interfaces show "up" status
# Verify VLAN assignments match documented mappings
9. Test inter-VLAN routing:
# From a client on the management network, ping a device on VLAN 50:
ping 10.1.50.x
# From a client on VLAN 50, ping internet:
ping 8.8.8.8
# Test each VLAN can reach:
# - Other VLANs (if policy allows)
# - Internet (if NAT configured)
# - Gateway (Palo Alto interface IP)
10. Verify firewall rules working:
# Via Palo Alto web UI:
# Monitor → Traffic
# Generate test traffic and verify rules are being hit
# Verify NAT working (if configured):
# Monitor → Session Browser
# Check outbound sessions show NAT translation
11. Verify all migrated VMs can communicate:
- Plex VM can reach internet (for metadata, posters)
- Docker VM can reach internet (for container updates, Cloudflare API)
- Pi-hole can reach internet (for DNS resolution)
- All VMs can reach NFS server (10.1.1.150)
- Client devices can reach all services
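The communication checks above can be run as a one-shot reachability sweep once the firewall is back. The IPs are taken from this plan; run it from a management-network client and investigate any FAIL line.

```shell
# Sketch: ping sweep of the core services after the firewall cutover.
for target in 10.1.1.1 10.1.1.35 10.1.1.32 10.1.1.125 10.1.1.126 10.1.1.150 8.8.8.8; do
  if ping -c1 -W2 "$target" >/dev/null 2>&1; then
    echo "OK   $target"
  else
    echo "FAIL $target"
  fi
done
```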
Validation Checklist:
- Palo Alto VM boots on Proxmox
- Management web UI accessible (https://10.1.1.103)
- ALL network interfaces show “up” status
- VLAN tags correctly assigned to interfaces
- Inter-VLAN routing working (test each VLAN pair)
- Internet access working from all VLANs
- NAT policies working (if configured)
- Firewall rules working (monitor traffic logs)
- All previously migrated VMs still accessible
- No network errors in Palo Alto logs
Emergency Rollback Plan: If Palo Alto migration fails and network is down:
- DO NOT DELETE ESXi HOST 1 YET
- Power on ESXi Host 1
- Start jarnetfw VM on ESXi
- Wait 2-3 minutes for boot
- Network should restore
- Investigate Proxmox issue before retry
Downtime: ~30-60 minutes (network outage)
Duration: 2-3 hours with full validation
Recommendation: Do NOT proceed to Phase 3 until Palo Alto is stable for 24-48 hours
Phase 3: Install Proxmox on Host 1
Prerequisites:
- ALL critical VMs running successfully on Proxmox Host 2 for 24-48 hours
- Plex, Docker, Pi-hole, Palo Alto all stable and validated
- No network issues
- No service degradation
Steps: Same as Phase 1, but for ghost-esxi-01 (10.1.1.120)
- Hostname: proxmox-01
- IP: 10.1.1.120/24
- Same network config, iGPU passthrough setup, NFS testing
Duration: 3-4 hours
Phase 4: Create 3-Node Proxmox Cluster
Prerequisites:
- Both NUCs running Proxmox successfully
- Stable network connectivity between all 3 nodes
- All VMs operational on proxmox-02
Steps: (Same as original plan)
- Initialize cluster on proxmox-01
- Join proxmox-02 to cluster
- Join pve-staging to cluster
- Verify 3-node cluster
- Test VM migration between nodes
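The steps above map to PVE's pvecm tool; this dry-run sketch prints the per-node commands. The cluster name "homelab" is an assumption; node names and the first node's IP come from this plan.

```shell
# Dry-run sketch: 3-node cluster bring-up with pvecm.
first_node_ip=10.1.1.120
echo "on proxmox-01:   pvecm create homelab"
for node in proxmox-02 pve-staging; do
  echo "on ${node}:   pvecm add ${first_node_ip}"
done
echo "on any node:    pvecm status    # expect 3 nodes, quorate"
```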
Application Considerations:
- Before migrating VMs between nodes: Stop the VM, migrate, then start
- For Plex: Test iGPU passthrough on destination node first
- For Docker: Verify NFS mount on destination node first
Duration: 2-3 hours
Phase 5: Rebalance Workloads
5.1: Migrate Plex to Node 1 (Recommended)
Why: Free up resources on Node 2 for Home Assistant/Frigate
Steps:
- Stop Plex VM on proxmox-02
- Migrate VM to proxmox-01 (via Proxmox UI or qm migrate)
- Verify iGPU passthrough still works on Node 1
- Verify NFS mount still works
- Test hardware transcoding
- Start Plex and validate
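Because the VM uses PCI passthrough, it must be migrated offline. A sketch of a guard for that, assuming `qm status` prints `status: stopped`; VMID 201 is a placeholder for whatever ID the Plex VM received:

```shell
# Hypothetical guard: refuse to migrate unless the VM is actually stopped.
is_stopped() {
  grep -q '^status: stopped$'
}
# Intended use on proxmox-02 (VMID 201 is a placeholder for the Plex VM):
#   qm stop 201
#   qm status 201 | is_stopped && qm migrate 201 proxmox-01
#   # then on proxmox-01:
#   qm start 201
#   qm config 201 | grep hostpci    # confirm the iGPU passthrough entry survived
```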
Duration: 1-2 hours
5.2: Migrate Home Assistant + Frigate from Staging to Node 2
Complexity: HIGH - Coral TPU USB passthrough required
Prerequisites:
- Plex migrated to Node 1 (or sufficient resources on Node 2)
- USB controller passthrough tested on Node 2
USB Passthrough Preparation:
# On proxmox-02:
lsusb
# Identify Coral TPU: Bus 002 Device 003: ID 1a6e:089a Global Unichip Corp.
# Identify USB controller:
lspci | grep USB
# Note PCI address (e.g., 00:14.0)
Migration Steps:
- Stop home-sec VM on staging:
# On pve-staging:
qm stop 103
- Backup VM (via Proxmox backup):
vzdump 103 --storage local --mode stop
- Migrate to Node 2:
# Method 1: Restore from backup:
qmrestore /var/lib/vz/dump/vzdump-qemu-103-*.vma.zst 203 --storage local-zfs
# OR Method 2: If cluster created, use qm migrate:
qm migrate 103 proxmox-02
- Reconfigure USB passthrough on Node 2:
# On proxmox-02: update VM config to pass through the USB controller:
qm set 203 --hostpci0 00:14.0
# OR pass through the specific USB device:
qm set 203 --usb0 host=1a6e:089a
- Start VM and verify Coral TPU:
qm start 203
# SSH into VM:
ssh -i ~/.ssh/esxi_migration_rsa ubuntu@10.1.1.208
# Verify Coral TPU visible:
lsusb | grep "Global Unichip"
# Should show: Bus 002 Device 003: ID 1a6e:089a Global Unichip Corp.
# Verify Frigate detects Coral:
docker logs frigate 2>&1 | grep -i coral
# Should show: "Coral detected"
- Validate Home Assistant and Frigate:
- Home Assistant web UI accessible (http://10.1.1.208:8123)
- Frigate web UI accessible
- Frigate detects Coral TPU (check Frigate logs)
- Camera streams visible
- Object detection working (person, car, etc.)
- Recordings working
Duration: 2-3 hours with testing
Phase 6: Final Cleanup and Validation
6.1: VM Cleanup
- Remove decommissioned VMs from Proxmox inventory
- Remove old ESXi VMs from ESXi hosts (if keeping ESXi as backup)
- Clean up test VMs
6.2: Documentation Updates
- Update network documentation with new VM IDs
- Document final cluster architecture
- Update credentials document with any new passwords
- Document lessons learned
6.3: Backup Configuration
# Backup Proxmox cluster config:
tar czf /root/proxmox-cluster-backup-$(date +%Y%m%d).tar.gz /etc/pve
# Copy to external storage:
scp /root/proxmox-cluster-backup-*.tar.gz user@backup-server:/backups/
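If this backup is scheduled (e.g. via cron), old archives should be pruned so the local disk does not fill. A sketch matching the filename pattern used above; the 30-day retention is an arbitrary example:

```shell
# Hypothetical retention helper for the archives created above.
prune_backups() {  # prune_backups DIR DAYS
  find "$1" -maxdepth 1 -name 'proxmox-cluster-backup-*.tar.gz' -mtime +"$2" -delete
}
# e.g. keep ~30 days locally (the scp copy above is the real safety net):
#   prune_backups /root 30
```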
6.4: Final Application Validation
Complete Service Test:
- Media Pipeline End-to-End Test:
- User requests movie via Overseerr
- Radarr searches and sends to SABnzbd
- SABnzbd downloads to NFS (/mnt/media/Downloads)
- Radarr imports to /mnt/media/Movies
- Plex scans and adds movie
- User plays movie with hardware transcoding
- Expected Duration: 5-30 minutes (depending on download speed)
- Network Services:
- Pi-hole DNS working (test from client: nslookup google.com 10.1.1.35)
- Palo Alto inter-VLAN routing working
- All VLANs can reach internet
- Firewall rules enforced
- Smart Home:
- Home Assistant responsive
- Frigate object detection working
- Camera recordings saving
- Coral TPU inference working (check Frigate stats)
- Reverse Proxy:
- Traefik SSL certificates valid
- All services accessible via domain names
- Cloudflare DNS-01 challenge working
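Certificate validity can be checked from the command line instead of a browser. A sketch assuming openssl and GNU `date` are available; `media.example.com` is a placeholder for one of the Traefik-served domains:

```shell
# Hypothetical helper: whole days until expiry, given `openssl x509 -enddate` output.
days_left() {
  end="$(cut -d= -f2-)"
  echo $(( ( $(date -d "$end" +%s) - $(date +%s) ) / 86400 ))
}
# Check a Traefik-served domain (placeholder hostname):
#   echo | openssl s_client -connect media.example.com:443 -servername media.example.com 2>/dev/null \
#     | openssl x509 -noout -enddate | days_left
```

A result under ~30 days suggests the Cloudflare DNS-01 renewal is not working.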
6.5: Monitoring Setup
- Configure Uptime Kuma monitoring for all services
- Set up Proxmox email alerts (optional)
- Configure backup schedules in Proxmox
Risk Assessment & Mitigation - Application Layer
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| NFS mount fails on Proxmox | Low | Critical | Test NFS in Phase 1.5 before VM migration |
| Plex iGPU passthrough fails | Medium | High | Test on staging first; keep ESXi available for rollback |
| Docker containers fail to start | Medium | High | Backup docker-compose.yml and .env; test individually |
| Traefik SSL certificates fail | Medium | Medium | Verify Cloudflare API key; manual cert generation possible |
| Coral TPU passthrough fails | Medium | High | Keep Home Assistant on staging until validated |
| Palo Alto network config wrong | Low | Critical | Document ALL interface mappings; test each VLAN |
| Plex database corruption | Very Low | High | Multiple backups before/during migration |
| SABnzbd loses download queue | Low | Medium | Export queue before migration; can re-add manually |
Application-Specific Rollback Plans
Plex Rollback
If Plex fails on Proxmox:
- Stop Plex VM on Proxmox
- Start original platinum (Plex) VM on ESXi Host 1
- Restore Plex database from backup (if needed)
- Users can resume streaming immediately
Docker Stack Rollback
If Docker stack fails:
- Stop docker VM on Proxmox
- Start original docker VM on ESXi Host 1
- Run docker-compose up -d
- Services restored within 5 minutes
Palo Alto Rollback
If network fails:
- Shut down Palo Alto VM on Proxmox
- Start jarnetfw VM on ESXi Host 1
- Network restored within 2-3 minutes
Timeline Estimate - Application-Focused
| Week | Phase | Activities | Time | Critical Path |
|---|---|---|---|---|
| 1 | Phase 0 | Backups (Plex DB, Docker configs, Pi-hole) | 3-4 hours | Pre-req for all |
| 2 | Phase 1 | Install Proxmox on Host 2, test NFS | 4-5 hours | NFS test critical |
| 3 | Phase 2.1 | Migrate Pi-hole | 1-2 hours | Test migration process |
| 4 | Phase 2.2 | Migrate Docker stack (test Traefik/NFS) | 3-4 hours | Complex, high risk |
| 5 | Phase 2.3 | Migrate Plex (test iGPU + NFS) | 3-4 hours | Highest complexity |
| 6 | Phase 2.4 | Migrate Palo Alto (MAINTENANCE WINDOW) | 2-3 hours | Network outage |
| 7 | Validation | Monitor all services for stability | Ongoing | 1 week stability |
| 8 | Phase 3 | Install Proxmox on Host 1 | 3-4 hours | - |
| 9 | Phase 4 | Create cluster, migrate Plex to Node 1 | 3-4 hours | - |
| 10 | Phase 5 | Migrate Home Assistant/Frigate to Node 2 | 3-4 hours | Coral TPU test |
| 11 | Phase 6 | Final validation and cleanup | 2-3 hours | End-to-end test |
Total: ~30-40 hours over 11 weeks (comfortable pace)
Fast-Track: 4-5 weekends (~25-30 hours total)
Critical Success Factors - Application Layer
Must-Have Before Starting:
- ✅ NFS accessible from Proxmox - Test in Phase 1.5
- ✅ Cloudflare API credentials documented - Needed for Traefik SSL
- ✅ Plex database backed up - Multiple backups
- ✅ Docker compose and .env files backed up - Critical for stack restore
- ✅ Palo Alto config exported - Network restoration depends on this
- ✅ iGPU passthrough working on Proxmox - Plex depends on this
Validation Gates (Do Not Proceed Until Complete):
- After Phase 1: NFS mount working on Proxmox
- After Phase 2.2: All Docker containers running, Traefik SSL working
- After Phase 2.3: Plex hardware transcoding working with iGPU
- After Phase 2.4: Network stable for 24-48 hours, all VLANs working
- After Phase 5.2: Coral TPU working, Frigate object detection confirmed
Next Steps
- Review this updated plan and confirm:
- VM renaming strategy acceptable
- Downtime windows identified for Palo Alto migration
- Cloudflare API credentials available
- Understand NFS dependency (no media migration needed)
- Answer remaining questions:
- Decommission server-2019, xsoar, home-security VMs (all offline)?
- Preferred cluster name and node naming convention?
- UniFi network devices inventory (for validation after migration)
- Order hardware:
- 2x 2TB NVMe drives (if doing hardware upgrade)
- Schedule maintenance windows:
- Palo Alto migration (30-60 min network outage)
- Plex migration (1-2 hour streaming outage)
Once approved, we can begin Phase 0 preparation!