ESXi to Proxmox Migration Plan - UPDATED WITH APPLICATION LAYER

Created: 2025-12-28 | Updated: 2025-12-28 (Application Layer Integration) | Status: Ready for Execution


Executive Summary

Current Situation:

  • 2x ESXi hosts (NUC9i9QNX) with production workloads
  • 1x Proxmox staging server already running (Home Assistant + Frigate with Coral TPU)
  • Complete application stack: Plex, media automation (Radarr/Sonarr/SABnzbd), Pi-hole DNS, Palo Alto firewall
  • Critical External Dependency: NFS storage at 10.1.1.150 (27TB, 21TB used) - ALL media stored here

End Goal:

  • 3-node Proxmox VE cluster with HA capability
  • All applications migrated with zero data loss
  • Hardware passthrough working (iGPU for Plex, Coral TPU for Frigate)
  • NFS mounts reconfigured on all VMs
  • Traefik reverse proxy with Cloudflare SSL working

Key Application Insights:

  • VM Naming Clarified: “platinum” (10.1.1.125) = Plex, “iridium” (10.1.1.126) = Support services
  • No Media Migration Needed: All media on NFS (10.1.1.150), only config/database migrations
  • Docker Stack: Complete media automation on single VM with Traefik reverse proxy
  • Service Dependencies: Mapped complete data flow from user request → Plex streaming
  • Network Management: UniFi Controller on iridium VM manages network infrastructure

Migration Architecture - Application View

End-State Application Distribution

Node 1: proxmox-01 (10.1.1.120)

Production Workloads:

  • Plex Media Server (platinum VM → renamed “plex”)

    • Intel iGPU passthrough for Quick Sync transcoding
    • NFS mount: 10.1.1.150:/volume1/datastore/media → /mnt/media
    • Database: Local on VM (~500MB)
  • Plex Support Services (iridium VM → renamed “plex-support”)

    • Tautulli (Plex monitoring and statistics)
    • Cloudflared (remote access tunnel)
    • UniFi Controller (network management)
    • No iGPU needed (can reclaim passthrough)
  • Media Automation Stack (docker VM → renamed “docker-media”)

    • Radarr, Sonarr, SABnzbd (media management)
    • Traefik reverse proxy (Cloudflare SSL)
    • Ombi, Overseerr (user requests)
    • Portainer, Watchtower, Uptime Kuma
    • NFS mount: 10.1.1.150:/volume1/datastore/media → /mnt/media
  • Pi-hole DNS (pihole VM)

    • Network-wide DNS and ad-blocking
    • Critical: All network clients depend on this
  • Palo Alto Firewall (jarnetfw VM)

    • Inter-VLAN routing
    • NAT, security policies
    • Critical: Network outage if offline

Node 2: proxmox-02 (10.1.1.121)

Production Workloads:

  • Home Assistant + Frigate (home-sec VM, migrated from staging)
    • USB controller passthrough for Coral TPU
    • Docker containers: Home Assistant, Frigate, Mosquitto
    • Local storage for Frigate recordings

Spare Capacity:

  • Room for growth and lab VMs

Node 3: pve-staging (10.1.1.123)

Role: HA Witness + Light Workloads

  • Quorum node for 3-node cluster
  • K8s lab, templates, Docker services

Pre-Migration Decisions Required

1. Hardware Upgrade Timing ⚠️ CRITICAL DECISION

RECOMMENDATION: Install 2TB NVMe drives BEFORE migration (Option A)

  • Clean Proxmox install on larger drives
  • No storage migration later
  • Order drives now, install before Phase 1

2. Proxmox Storage Backend

RECOMMENDATION: ZFS for NUCs (better for VMs, snapshots, replication)

  • Built-in compression and checksums
  • Native VM snapshot support
  • Better for Plex database and Docker volumes

3. VM Renaming

Current naming is confusing because ESXi VM names no longer match hostnames or roles. Recommend renaming during migration:

  • platinum (10.1.1.125) → “plex” (Plex Media Server)
  • iridium (10.1.1.126) → “plex-support” (Tautulli, Cloudflared, UniFi Controller)
  • docker (10.1.1.32) → “docker-media” (Media automation stack)
  • pihole (10.1.1.35) → “pihole” (keep same)
  • jarnetfw (10.1.1.103) → “paloalto-fw” (keep functionality same)

4. VMs to Decommission

Confirm these can be deleted (all currently offline on Host 2):

  • server-2019 (old Blue Iris host - replaced by Frigate)
  • home-security (old VM - replaced by Frigate on Proxmox)
  • xsoar (purpose unknown, offline)
  • win11-sse, win-10 (lab VMs, offline)

Phase 0: Preparation & Backups

0.1: Hardware Preparation

  • Order 2x 2TB WD Blue SN580 NVMe drives
  • Create Proxmox VE 8.x bootable USB installer
  • Prepare external backup storage (USB drive or network backup location)

0.2: Pre-Migration Backups - CRITICAL

Backup Script

Create and run this backup script BEFORE any migration:

#!/bin/bash
# Pre-Migration Backup Script
# Run this BEFORE starting migration

BACKUP_DIR="/tmp/esxi-migration-backup-$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

echo "=== Starting Pre-Migration Backups ==="
echo "Backup directory: $BACKUP_DIR"

# 1. Plex Database Backup
echo "1. Backing up Plex database..."
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 \
  "sudo tar czf /tmp/plex-backup.tar.gz \
  '/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Plug-in Support/Databases/' \
  '/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Preferences.xml'"
scp -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125:/tmp/plex-backup.tar.gz $BACKUP_DIR/
echo "✓ Plex backup complete"

# 2. Docker Compose + Env Files
echo "2. Backing up Docker compose and configs..."
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32 \
  "tar czf /tmp/docker-backup.tar.gz /home/luke/docker/"
scp -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32:/tmp/docker-backup.tar.gz $BACKUP_DIR/
echo "✓ Docker backup complete"

# 3. Pi-hole Config
echo "3. Backing up Pi-hole configuration..."
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.35 \
  "sudo tar czf /tmp/pihole-backup.tar.gz /etc/pihole/"
scp -i ~/.ssh/esxi_migration_rsa luke@10.1.1.35:/tmp/pihole-backup.tar.gz $BACKUP_DIR/
echo "✓ Pi-hole backup complete"

# 4. Iridium VM (Tautulli, Cloudflared, UniFi)
echo "4. Backing up iridium services (Tautulli, UniFi, Cloudflared)..."
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126 \
  "sudo tar czf /tmp/iridium-backup.tar.gz /config /data 2>/dev/null || sudo tar czf /tmp/iridium-backup.tar.gz /opt /etc/cloudflared 2>/dev/null || echo 'Partial backup'"
scp -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126:/tmp/iridium-backup.tar.gz $BACKUP_DIR/
echo "✓ Iridium backup complete (Tautulli, UniFi, Cloudflared configs)"

# 5. Home Assistant Backup
echo "5. Backing up Home Assistant..."
# NOTE: the 'ha' CLI exists only on HA OS/Supervised installs; on a plain
# container install, tar the Home Assistant config directory instead.
ssh -i ~/.ssh/esxi_migration_rsa ubuntu@10.1.1.208 \
  "docker exec homeassistant ha backups new --name pre-migration"
echo "✓ Home Assistant backup created (stored on the HA host - copy it off before decommissioning anything)"

# 6. List all VMs for reference
echo "6. Documenting VM inventory..."
ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.120 \
  "vim-cmd vmsvc/getallvms" > $BACKUP_DIR/esxi-host1-vms.txt
ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.121 \
  "vim-cmd vmsvc/getallvms" > $BACKUP_DIR/esxi-host2-vms.txt
echo "✓ VM inventory documented"

# 7. Export ESXi network configs
echo "7. Exporting ESXi network configurations..."
ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.120 \
  "esxcli network vswitch standard list" > $BACKUP_DIR/esxi-host1-network.txt
ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.121 \
  "esxcli network vswitch standard list" > $BACKUP_DIR/esxi-host2-network.txt
echo "✓ Network configs exported"

echo ""
echo "=== Backup Complete ==="
echo "All backups saved to: $BACKUP_DIR"
echo ""
echo "Next steps:"
echo "1. Review backups in $BACKUP_DIR"
echo "2. Copy to external storage: cp -r $BACKUP_DIR /path/to/external/drive/"
echo "3. Verify Palo Alto firewall config is exported via web UI"
echo "4. Document current IP addresses and credentials"
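Before copying the directory to external storage, it is worth failing fast on any archive that is missing, zero-byte, or truncated. A minimal verification sketch (the archive names match the ones created by the script above):

```shell
# Flag any backup archive that is missing, empty, or not a valid gzip stream.
verify_backups() {
  local dir=$1 f rc=0
  shift
  for f in "$@"; do
    if [ ! -s "$dir/$f" ]; then
      echo "MISSING/EMPTY: $f"; rc=1
    elif ! gzip -t "$dir/$f" 2>/dev/null; then
      echo "CORRUPT: $f"; rc=1
    else
      echo "OK: $f"
    fi
  done
  return $rc
}

# Example:
# verify_backups "$BACKUP_DIR" plex-backup.tar.gz docker-backup.tar.gz \
#   pihole-backup.tar.gz iridium-backup.tar.gz
```

Run it before and after copying to external storage; a non-zero exit means do not proceed with the migration.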

Palo Alto Firewall Backup (Manual)

  • Log into Palo Alto web UI (https://10.1.1.103)
  • Export configuration: Device → Setup → Operations → Export named configuration
  • Save to $BACKUP_DIR/paloalto-config.xml
  • Screenshot all firewall rules (Policies → Security)
  • Screenshot NAT policies (Policies → NAT)
  • Document interface → VLAN mappings:
    Example mapping:
    ethernet1/1 → VLAN 300 (Public)
    ethernet1/2 → VLAN 50 (Lab)
    ethernet1/3 → VLAN 0 (Management)

NFS Mount Verification

# Verify NFS is accessible from current VMs
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "df -h | grep 10.1.1.150"
ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32 "df -h | grep 10.1.1.150"

# Test NFS from Proxmox staging (verify Proxmox can access NFS)
ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.123 "mkdir -p /mnt/test && mount -t nfs4 10.1.1.150:/volume1/datastore/media /mnt/test && ls /mnt/test && umount /mnt/test"

0.3: Environment Variables & Credentials Documentation

Create a secure document with:

  • Cloudflare API key (for Traefik)
  • Cloudflare email
  • Domain name (${DOMAINNAME} from docker-compose.yml)
  • SABnzbd Usenet credentials
  • Palo Alto firewall admin credentials
  • All VM root/admin passwords
  • NFS server credentials (if any)

0.4: Copy External Dependencies

  • Download docker-compose.yml from docker VM:
    scp -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32:/home/luke/docker/docker-compose.yml $BACKUP_DIR/
  • Download .env file (if exists):
    scp -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32:/home/luke/docker/.env $BACKUP_DIR/ || echo "No .env file"
  • Download Traefik configs:
    scp -r -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32:/home/luke/docker/traefik/ $BACKUP_DIR/

Duration: 2-3 hours
Validation: Verify all backups exist and are not empty before proceeding


Phase 1: Install Proxmox on Host 2

1.1: Pre-Install Checks

  • All Phase 0 backups completed and verified
  • 2TB NVMe drive physically installed (if doing hardware upgrade)
  • Proxmox USB installer prepared
  • All offline VMs on Host 2 confirmed safe to delete

1.2: Proxmox Installation

Same as original plan - install Proxmox on ghost-esx-02 (10.1.1.121):

  • Hostname: proxmox-02
  • IP: 10.1.1.121/24
  • Gateway: 10.1.1.1
  • Filesystem: ZFS (RAID0) with compression

1.3: Network Configuration

Configure dual 10GbE bond with VLAN-aware bridge (same as original plan)
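For reference, a sketch of what that looks like in /etc/network/interfaces. The NIC names (enp2s0f0/enp2s0f1) are placeholders; check yours with `ip link` before applying:

```text
auto bond0
iface bond0 inet manual
    bond-slaves enp2s0f0 enp2s0f1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4

auto vmbr0
iface vmbr0 inet static
    address 10.1.1.121/24
    gateway 10.1.1.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
```

Note: bond-mode 802.3ad assumes the switch side has a matching LACP port-channel; apply with `ifreload -a` and keep console access handy in case the node drops off the network.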

1.4: Intel iGPU Passthrough Configuration

(Same as original plan - enable IOMMU, load VFIO, blacklist i915)
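For quick reference, the host-side pieces that sketch usually amounts to (file names are conventions, not requirements; adjust to your setup):

```text
# /etc/default/grub -- enable IOMMU, then run: update-grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# /etc/modules -- load VFIO at boot
vfio
vfio_iommu_type1
vfio_pci

# /etc/modprobe.d/blacklist-igpu.conf -- keep the host off the iGPU,
# then run: update-initramfs -u && reboot
blacklist i915
```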

1.5: NFS Storage Testing - NEW

Critical: Test NFS mount BEFORE migrating VMs

# On proxmox-02:
# Install NFS client
apt install -y nfs-common

# Test NFS mount
mkdir -p /mnt/nfs-test
mount -t nfs4 10.1.1.150:/volume1/datastore/media /mnt/nfs-test

# Verify mount
df -h | grep 10.1.1.150
ls -lah /mnt/nfs-test

# Check read/write permissions
touch /mnt/nfs-test/proxmox-write-test.txt
rm /mnt/nfs-test/proxmox-write-test.txt

# Unmount test
umount /mnt/nfs-test

Expected Results:

  • NFS mount successful
  • Can read existing media files
  • Can create/delete test files
  • ~27TB total, ~21TB used
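With the share already roughly 78% full (21TB of 27TB), it is worth scripting a capacity check so post-migration downloads do not silently fill the volume. A small parser over the `df -h` line for the mount (the 90% threshold is an arbitrary example):

```shell
# Extract the Use% column (field 5) from a `df -h` line for the NFS mount.
nfs_usage_pct() {
  awk '{ gsub(/%/, "", $5); print $5 }' <<< "$1"
}

# Example:
# line=$(df -h | grep 10.1.1.150)
# [ "$(nfs_usage_pct "$line")" -lt 90 ] || echo "WARNING: NFS volume above 90%"
```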

Validation Checklist:

  • Proxmox web UI accessible at https://10.1.1.121:8006
  • SSH access working
  • Network connectivity (ping 10.1.1.1, 8.8.8.8)
  • Intel iGPU shows as available for passthrough
  • NFS mount working with read/write access
  • Test VM boots successfully

Duration: 3-4 hours


Phase 2: Migrate VMs from Host 1 to Proxmox Host 2

Migration Strategy: Incremental approach, lowest to highest risk

2.1: Test Migration - “pihole” VM (DNS Service) - UPDATED

Why First:

  • Relatively simple (single application, no storage dependencies)
  • Can use DNS fallback (8.8.8.8) during migration
  • Good test of migration process

Pre-Migration:

# Update DHCP to use fallback DNS temporarily (on Palo Alto or DHCP server)
# Add secondary DNS: 8.8.8.8

# Backup Pi-hole config (already done in Phase 0)
# Verify backup exists
ls -lh $BACKUP_DIR/pihole-backup.tar.gz

Migration Steps:

  1. Stop Pi-hole VM on ESXi:

    ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.120 "vim-cmd vmsvc/power.off <VMID>"
  2. Export VM disk:

    # On ESXi host, clone to a thin, self-contained copy (vmkfstools writes a
    # descriptor file plus a -flat data file):
    ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.120 "vmkfstools -i /vmfs/volumes/<datastore>/pihole/pihole.vmdk -d thin /vmfs/volumes/<datastore>/pihole/pihole-export.vmdk"
    
    # Copy both export files to Proxmox:
    scp -i ~/.ssh/esxi_migration_rsa "root@10.1.1.120:/vmfs/volumes/<datastore>/pihole/pihole-export*.vmdk" /var/lib/vz/images/
  3. Create VM on Proxmox:

    # On proxmox-02 (omit the VLAN tag for the untagged/native management
    # network - Proxmox only accepts tags 1-4094, so tag=0 is rejected):
    qm create 101 --name pihole --memory 1024 --cores 1 --net0 virtio,bridge=vmbr0
    
    # Import disk (qm importdisk reads the descriptor file)
    qm importdisk 101 /var/lib/vz/images/pihole-export.vmdk local-zfs
    qm set 101 --scsi0 local-zfs:vm-101-disk-0
    qm set 101 --boot order=scsi0
    qm set 101 --ostype l26
  4. Start VM and validate:

    qm start 101
    
    # Wait 30 seconds, then test
    ping 10.1.1.35
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.35
    
    # Verify Pi-hole running
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.35 "pihole status"
    
    # Test DNS resolution
    nslookup google.com 10.1.1.35
  5. Revert DHCP to use Pi-hole as primary DNS
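The fixed "wait 30 seconds" in step 4 can be replaced with polling, so validation starts as soon as the service actually answers. A sketch using bash's built-in /dev/tcp (no extra packages needed):

```shell
# Poll a TCP port until it accepts connections or the timeout (seconds) expires.
wait_for_port() {
  local host=$1 port=$2 deadline=$(( $(date +%s) + ${3:-60} ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    (exec 3<> "/dev/tcp/${host}/${port}") 2>/dev/null && return 0
    sleep 2
  done
  return 1
}

# Example: wait up to 120s for Pi-hole's web UI, then run the DNS checks
# wait_for_port 10.1.1.35 80 120 && nslookup google.com 10.1.1.35
```

The same helper works for the later migrations (Tautulli 8181, UniFi 8443, Plex 32400).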

Validation:

  • Pi-hole VM boots on Proxmox
  • Network connectivity (ping gateway, internet)
  • Pi-hole web UI accessible (http://10.1.1.35/admin)
  • DNS queries working
  • Ad blocking functional

Rollback: Restart Pi-hole VM on ESXi if issues
Downtime: ~15-20 minutes
Duration: 1-2 hours with testing


2.2: Migrate “iridium” VM (Plex Support Services) - UPDATED ⭐ MEDIUM PRIORITY

Why Second:

  • Important services but not critical infrastructure
  • Supports Plex (should be migrated before Plex itself)
  • UniFi Controller manages network devices
  • Good complexity test without critical dependencies

Applications:

  • Tautulli (Plex monitoring)
  • Cloudflared (remote access tunnel)
  • UniFi Controller (network management)

Pre-Migration Checklist:

  • Tautulli/UniFi/Cloudflared configs backed up (Phase 0)
  • Document Cloudflare tunnel token
  • Note UniFi devices (will temporarily lose management)

Migration Steps:

  1. Document current configuration:

    # Check what's running:
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126 "ps aux | grep -E 'tautulli|cloudflare|unifi' | grep -v grep"
    
    # Check Tautulli port:
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126 "curl -I http://localhost:8181"
    
    # Document Cloudflare tunnel token (visible in process list)
  2. Stop services gracefully (if possible):

    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126 "sudo s6-svc -d /run/s6-rc/servicedirs/svc-tautulli || echo 'Manual stop failed, VM shutdown will stop services'"
  3. Shut down VM on ESXi:

    ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.120 "vim-cmd vmsvc/power.off <VMID>"
  4. Export VM disk to Proxmox: (Same process as Pi-hole - export VMDK, copy to Proxmox, import)

  5. Create VM on Proxmox:

    # On proxmox-02:
    qm create 104 --name plex-support --memory 8192 --cores 4 --net0 virtio,bridge=vmbr0
    qm importdisk 104 /var/lib/vz/images/iridium.vmdk local-zfs
    qm set 104 --scsi0 local-zfs:vm-104-disk-0
    qm set 104 --boot order=scsi0
    qm set 104 --ostype l26
  6. Start VM and verify services:

    qm start 104
    
    # Wait 60 seconds for boot
    sleep 60
    
    # SSH into VM
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126
  7. Validate Tautulli:

    # Check Tautulli is running:
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126 "ps aux | grep tautulli | grep -v grep"
    
    # Access Tautulli web UI:
    curl -I http://10.1.1.126:8181
    # Or open in browser: http://10.1.1.126:8181
    
    # Verify Tautulli can connect to Plex (10.1.1.125):
    # Check Tautulli UI → Settings → Plex Media Server
  8. Validate Cloudflared:

    # Check Cloudflared tunnel is running:
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126 "ps aux | grep cloudflared | grep -v grep"
    
    # Test remote access (if configured):
    # Try accessing services via Cloudflare tunnel URL
  9. Validate UniFi Controller:

    # Check UniFi Controller is running:
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.126 "ps aux | grep unifi | grep -v grep"
    
    # Access UniFi Controller web UI:
    # https://10.1.1.126:8443
    
    # Verify UniFi devices are reconnecting:
    # UniFi UI → Devices → Check all devices show "Connected"
    # (May take 2-5 minutes for devices to reconnect)

Validation Checklist:

  • iridium VM boots on Proxmox
  • Network connectivity (ping gateway, internet)
  • Tautulli web UI accessible (http://10.1.1.126:8181)
  • Tautulli can connect to Plex server (10.1.1.125)
  • Cloudflared tunnel running (check process)
  • Remote access working (if configured)
  • UniFi Controller web UI accessible (https://10.1.1.126:8443)
  • UniFi devices reconnected (check controller UI)
  • All network devices managed and healthy

Important Notes:

  • UniFi Devices: Will briefly lose controller connection during migration
  • Devices will auto-reconnect when controller comes back online
  • No network outage (devices continue forwarding traffic)
  • Management features temporarily unavailable during migration

Rollback: Restart iridium VM on ESXi if issues
Downtime: ~30-45 minutes (UniFi management only)
Duration: 1-2 hours with full validation


2.3: Migrate “docker” VM (Media Automation Stack) - UPDATED ⭐ HIGH PRIORITY

Complexity: HIGH - Multiple dependencies (NFS, Traefik, Cloudflare, Docker network)
Risk: Media management offline during migration

Critical Dependencies:

  • NFS mount for media (10.1.1.150)
  • Cloudflare API credentials for Traefik SSL
  • Docker volumes for all container configs
  • Environment variables (.env file)

Pre-Migration Checklist:

  • Docker compose file backed up
  • .env file backed up (contains Cloudflare credentials, domain)
  • All container config volumes backed up
  • NFS mount tested on Proxmox (done in Phase 1.5)
  • Cloudflare API key documented

Migration Steps:

  1. Stop all Docker containers gracefully:

    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32 "cd /home/luke/docker && docker-compose down"
  2. Shut down VM on ESXi:

    ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.120 "vim-cmd vmsvc/power.off <VMID>"
  3. Export VM disk to Proxmox: (Same process as Pi-hole - export VMDK, copy to Proxmox, import)

  4. Create VM on Proxmox:

    # On proxmox-02:
    qm create 102 --name docker-media --memory 4096 --cores 2 --net0 virtio,bridge=vmbr0
    qm importdisk 102 /var/lib/vz/images/docker.vmdk local-zfs
    qm set 102 --scsi0 local-zfs:vm-102-disk-0
    qm set 102 --boot order=scsi0
    qm set 102 --ostype l26
  5. Start VM and reconfigure NFS mount:

    qm start 102
    
    # SSH into VM
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32
    
    # Verify /etc/fstab has NFS mount:
    cat /etc/fstab | grep 10.1.1.150
    
    # Should show:
    # 10.1.1.150:/volume1/datastore/media /mnt/media nfs4 defaults 0 0
    
    # Mount NFS (should auto-mount from fstab, but verify):
    mount -a
    df -h | grep 10.1.1.150
    
    # Verify media files accessible:
    ls /mnt/media/Movies
    ls /mnt/media/TV
  6. Verify Docker Compose and Environment:

    # Verify docker-compose.yml exists:
    cat /home/luke/docker/docker-compose.yml | head -20
    
    # Verify .env file exists (contains Cloudflare credentials):
    cat /home/luke/docker/.env | grep CLOUDFLARE
    
    # Should show:
    # CLOUDFLARE_EMAIL=your@email.com
    # CLOUDFLARE_API_KEY=your_api_key
    # DOMAINNAME=your.domain.com
  7. Start Docker containers:

    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32 "cd /home/luke/docker && docker-compose up -d"
    
    # Monitor container startup:
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32 "docker ps -a"
    
    # Check logs for any errors:
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32 "docker-compose logs -f --tail=50"
  8. Validate ALL services:

    Traefik (Reverse Proxy):

    # Check Traefik is running and has SSL certs:
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.32 "docker logs traefik 2>&1 | grep -i certificate"
    
    # Verify Traefik web UI accessible:
    curl -k https://traefik.${DOMAINNAME}

    Radarr (Movies):

    # Verify Radarr is accessible:
    curl -I https://cyan.${DOMAINNAME}
    
    # Verify NFS mount visible in Radarr:
    # Access Radarr UI → Settings → Media Management → Root Folders
    # Should show: /mnt/media/Movies

    Sonarr (TV):

    # Verify Sonarr is accessible:
    curl -I https://teal.${DOMAINNAME}
    
    # Verify NFS mount visible in Sonarr:
    # Access Sonarr UI → Settings → Media Management → Root Folders
    # Should show: /mnt/media/TV

    SABnzbd (Downloads):

    # Verify SABnzbd is running:
    curl -I http://10.1.1.32:2099
    
    # Verify download directories:
    # Access SABnzbd UI → Config → Folders
    # Should show: /downloads, /incomplete-downloads

    Portainer:

    # Access Portainer web UI:
    # http://10.1.1.32:9000

    Uptime Kuma:

    curl -I https://status.${DOMAINNAME}
  9. Test Complete Data Flow:

    • Add test movie request in Ombi/Overseerr
    • Verify Radarr picks up request
    • Verify SABnzbd can download (or start download)
    • Verify files saved to NFS mount (/mnt/media/Downloads)
    • Verify Radarr can move to /mnt/media/Movies
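The bare `defaults` entry shown in step 5 works, but NFS mount options matter once downloads write through this mount. A hardened fstab variant worth considering (the options are suggestions, not requirements of the stack):

```text
# /etc/fstab
# hard    -> block I/O until the server responds (no silent corruption of
#            in-flight downloads if the NAS reboots)
# bg      -> retry the mount in the background if the NAS boots slower than the VM
# _netdev -> wait for networking before attempting the mount at boot
10.1.1.150:/volume1/datastore/media  /mnt/media  nfs4  hard,bg,noatime,_netdev  0  0
```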

Validation Checklist:

  • Docker VM boots on Proxmox
  • NFS mount working (/mnt/media accessible)
  • All Docker containers running (docker ps shows all as “Up”)
  • Traefik reverse proxy working (SSL certs valid)
  • Radarr accessible and can see /mnt/media/Movies
  • Sonarr accessible and can see /mnt/media/TV
  • SABnzbd accessible and can download
  • Ombi/Overseerr accessible for user requests
  • Portainer accessible for Docker management
  • Uptime Kuma monitoring working
  • Cloudflare SSL certificates auto-renewing (check Traefik logs)

Rollback: Restart docker VM on ESXi, docker-compose up -d
Downtime: ~30-60 minutes
Duration: 2-3 hours with full validation


2.4: Migrate “platinum” VM (Plex Media Server) - UPDATED ⭐ CRITICAL

Complexity: HIGHEST - Requires iGPU passthrough + NFS mount
Risk: Media streaming offline, hardware transcoding must work

Critical Dependencies:

  • Intel UHD 630 iGPU passthrough (for Quick Sync transcoding)
  • NFS mount for media libraries (10.1.1.150)
  • Plex database (local on VM)

Pre-Migration Checklist:

  • Plex database backed up (Phase 0)
  • iGPU passthrough verified working on Proxmox Host 2 (Phase 1.4)
  • NFS mount tested on Proxmox (Phase 1.5)
  • Current transcoding settings documented

Migration Steps:

  1. Document current Plex settings:

    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "cat '/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Preferences.xml' | grep -i transcode"
    
    # Note settings:
    # - HardwareAcceleratedCodecs (should be enabled)
    # - TranscoderH264BackgroundPreset
    # - TranscoderTempDirectory
  2. Stop Plex service gracefully:

    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "sudo systemctl stop plexmediaserver"
    
    # Wait 30 seconds for graceful shutdown
    sleep 30
  3. Final Plex database backup:

    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 \
      "sudo tar czf /tmp/plex-final-backup.tar.gz \
      '/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/'"
    scp -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125:/tmp/plex-final-backup.tar.gz $BACKUP_DIR/
  4. Shut down VM on ESXi:

    ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.120 "vim-cmd vmsvc/power.off <VMID>"
  5. Export VM disk to Proxmox: (Same process - export VMDK, copy to Proxmox)

  6. Create VM on Proxmox with iGPU passthrough:

    # On proxmox-02 (untagged management network - Proxmox rejects tag=0):
    qm create 103 --name plex --memory 8192 --cores 4 --net0 virtio,bridge=vmbr0
    qm importdisk 103 /var/lib/vz/images/platinum.vmdk local-zfs
    qm set 103 --scsi0 local-zfs:vm-103-disk-0
    qm set 103 --boot order=scsi0
    qm set 103 --ostype l26
    
    # CRITICAL: Set machine type to q35 (required for PCIe passthrough):
    qm set 103 --machine q35
    
    # Add Intel iGPU passthrough:
    # First, identify iGPU PCI address:
    lspci -nnk | grep -i vga
    # Should show: 00:02.0 VGA compatible controller: Intel Corporation ... [8086:3e9b]
    
    # Add PCI device to VM:
    qm set 103 --hostpci0 00:02.0,pcie=1,rombar=0
  7. Start VM and verify boot:

    qm start 103
    
    # Monitor boot via the Proxmox web console (VM 103 -> Console).
    # (qm terminal 103 only works if the VM has a serial console configured.)
    
    # Wait 60 seconds for boot, then SSH:
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125
  8. Verify iGPU is visible in guest OS:

    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "lspci | grep -i vga"
    # Should show: Intel Corporation CoffeeLake-H GT2 [UHD Graphics 630]
    
    # Verify /dev/dri devices exist:
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "ls -l /dev/dri"
    # Should show: renderD128, card0
    
    # Install/verify vainfo (Intel GPU tools):
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "sudo apt install -y vainfo"
    
    # Verify Intel Quick Sync capabilities:
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "vainfo"
    # Should show supported encode/decode profiles (H.264, HEVC, etc.)
  9. Verify NFS mount:

    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "df -h | grep 10.1.1.150"
    # Should show: 10.1.1.150:/volume1/datastore/media mounted on /mnt/media
    
    # Verify media files accessible:
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "ls /mnt/media/Movies | head -10"
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "ls /mnt/media/TV | head -10"
  10. Start Plex and verify hardware transcoding:

    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "sudo systemctl start plexmediaserver"
    
    # Wait 30 seconds for Plex to start:
    sleep 30
    
    # Verify Plex is running:
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "sudo systemctl status plexmediaserver"
    
    # Access Plex web UI:
    # http://10.1.1.125:32400/web
    
    # Verify libraries are visible (media from NFS mount)
  11. Test hardware transcoding:

    • Access Plex web UI
    • Go to Settings → Transcoder
    • Verify “Use hardware acceleration when available” is enabled
    • Verify “Hardware transcoding device” shows Intel iGPU
    • Play a video that requires transcoding (adjust quality to force transcode)
    • Check transcoding session:
      # While video is playing, check transcode session:
      ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "ps aux | grep 'Plex Transcoder'"
      # Should show transcoder process
      
      # Verify hardware transcoding in Plex dashboard:
      # Settings → Status → Now Playing
      # Should show "(hw)" next to video codec if using hardware transcode
      
      # Verify low CPU usage (hardware transcoding offloads to iGPU):
      ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "top -bn1 | head -20"
      # CPU usage should be <20% during 4K transcode if hw acceleration working
  12. Validate Plex Media Scanner:

    # Trigger manual library scan:
    # Plex UI → Libraries → [Library Name] → Scan Library Files
    
    # Verify scan works (can read NFS mount)
    # Check Plex logs if scan fails:
    ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 \
      "tail -100 '/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Logs/Plex Media Scanner.log'"
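The "low CPU usage" check in step 11 can be made objective by summing the transcoder processes' %CPU column from `ps aux` (the 20% guideline above is the threshold to compare against):

```shell
# Sum the %CPU field (column 3 of `ps aux`) across Plex Transcoder processes.
# The [P] bracket trick stops the pipeline's own command line from matching.
transcoder_cpu() {
  awk '/[P]lex Transcoder/ { cpu += $3 } END { printf "%.0f\n", cpu + 0 }'
}

# Example (run while a transcode session is active):
# ssh -i ~/.ssh/esxi_migration_rsa luke@10.1.1.125 "ps aux" | transcoder_cpu
```

A high total here with the "(hw)" indicator missing in the dashboard usually means Plex fell back to software transcoding.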

Validation Checklist:

  • Plex VM boots on Proxmox
  • Intel iGPU visible in guest OS (lspci shows iGPU)
  • /dev/dri devices exist (renderD128, card0)
  • vainfo shows Intel Quick Sync encode/decode profiles
  • NFS mount working (/mnt/media accessible)
  • All Plex libraries visible in UI
  • Plex can scan media files from NFS
  • Hardware transcoding enabled in Plex settings
  • Test transcode session shows “(hw)” indicator
  • CPU usage low during transcode (<20% for 4K)
  • Plex accessible from local network
  • Plex accessible from internet (if remote access configured)

Troubleshooting iGPU Passthrough: If iGPU not visible or hardware transcoding not working:

  1. Verify IOMMU enabled: dmesg | grep -i iommu
  2. Verify i915 driver blacklisted on host: lsmod | grep i915 (should be empty)
  3. Verify vfio-pci driver loaded: lspci -nnk | grep -A3 VGA
  4. Check VM machine type is q35: qm config 103 | grep machine
  5. Check permissions on /dev/dri in guest: ls -l /dev/dri
  6. Add Plex user to render group in guest: sudo usermod -aG render plex
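Those six checks are easy to fold into a single pass/fail sweep on the Proxmox host. A generic helper (the example commands are the ones listed above):

```shell
# Run a command silently and report PASS/FAIL with a label.
run_check() {
  local label=$1; shift
  if "$@" > /dev/null 2>&1; then
    echo "PASS: $label"
  else
    echo "FAIL: $label"
  fi
}

# Example sweep on proxmox-02 (sketch):
# run_check "IOMMU enabled"      sh -c 'dmesg | grep -qi iommu'
# run_check "i915 not loaded"    sh -c '! lsmod | grep -q i915'
# run_check "VM 103 machine=q35" sh -c 'qm config 103 | grep -q q35'
```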

Rollback: Restart Plex VM on ESXi
Downtime: ~1-2 hours
Duration: 3-4 hours with full testing


2.5: Migrate “jarnetfw” VM (Palo Alto Firewall) - UPDATED ⭐ CRITICAL

Complexity: CRITICAL - Network outage affects ALL services
Risk: HIGHEST - Inter-VLAN routing down during migration
Timing: OFF-HOURS / PLANNED MAINTENANCE WINDOW REQUIRED

Critical Requirements:

  • Document ALL interface → VLAN mappings
  • Export Palo Alto configuration
  • Plan communication (network outage window)
  • Keep ESXi Host 1 available for emergency rollback

Pre-Migration Checklist:

  • ALL other VMs successfully migrated to Proxmox (Pi-hole, Docker, Plex)
  • Palo Alto config exported (done in Phase 0)
  • Interface mappings documented
  • Firewall rules screenshot
  • NAT policies screenshot
  • Scheduled maintenance window communicated
  • Rollback plan ready (ESXi Host 1 kept online for 48 hours)

Interface Mapping Documentation (CRITICAL): Before migration, document EXACT interface mappings:

# On Palo Alto (via CLI or web UI), document:
# Interface | VLAN | IP Address | Role
# --------- | ---- | ---------- | ----
# ethernet1/1 | VLAN 300 | 10.1.300.1/24 | Public DMZ
# ethernet1/2 | VLAN 50 | 10.1.50.1/24 | Lab Network
# ethernet1/3 | VLAN 0 | 10.1.1.103/24 | Management
# ethernet1/4 | VLAN 4095 | Trunk | Internal
# (Example - document YOUR actual mappings)
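One gotcha when translating the mapping: ESXi uses VLAN 4095 to mean "trunk all VLANs", but Proxmox only accepts tags 1-4094 (a trunk is expressed by omitting the tag on a VLAN-aware bridge). A tiny validator for the tags you carry over:

```shell
# Return success only for VLAN tags Proxmox will accept (1-4094).
# ESXi's 0 (native) and 4095 (all VLANs) have no direct tag equivalent:
# configure those vNICs without a tag instead.
valid_pve_tag() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 4094 ]
}

# Example:
# valid_pve_tag 300  && echo "tag=300 ok"
# valid_pve_tag 4095 || echo "4095: use an untagged trunk port instead"
```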

Migration Steps:

  1. Final configuration export:

    # Via Palo Alto web UI:
    # Device → Setup → Operations → Export named configuration snapshot
    # Save to: $BACKUP_DIR/paloalto-final-config-$(date +%Y%m%d).xml
  2. Announce maintenance window:

    • Network outage expected: 30-60 minutes
    • All inter-VLAN traffic will be down
    • Internet access will be down
    • Schedule during lowest usage period
  3. Shut down Palo Alto VM (network outage begins):

    ssh -i ~/.ssh/esxi_migration_rsa root@10.1.1.120 "vim-cmd vmsvc/power.off <VMID>"
    # Network outage begins NOW
  4. Export VM disk to Proxmox: (Same process - but work quickly, network is down)

  5. Create VM on Proxmox with MULTIPLE network interfaces:

    # On proxmox-02 (VMID 104 is already taken by plex-support, so use 105):
    qm create 105 --name paloalto-fw --memory 7168 --cores 4 --ostype other
    qm importdisk 105 /var/lib/vz/images/jarnetfw.vmdk local-zfs
    qm set 105 --scsi0 local-zfs:vm-105-disk-0
    qm set 105 --boot order=scsi0
    
    # CRITICAL: Add network interfaces with correct VLAN tags
    # Match your documented interface mappings!
    
    # Management interface (untagged/native VLAN - Proxmox rejects tag=0):
    qm set 105 --net0 virtio,bridge=vmbr0
    
    # Public interface (VLAN 300):
    qm set 105 --net1 virtio,bridge=vmbr0,tag=300
    
    # Lab interface (VLAN 50):
    qm set 105 --net2 virtio,bridge=vmbr0,tag=50
    
    # Trunk interface (ESXi's VLAN 4095 means "all VLANs"; on a VLAN-aware
    # Proxmox bridge, omit the tag to pass the full trunk):
    qm set 105 --net3 virtio,bridge=vmbr0
    
    # Add more interfaces as needed to match your config
  6. Start Palo Alto VM:

    qm start 105
    
    # Wait 2-3 minutes for Palo Alto to boot (slow boot)
    sleep 180
  7. Verify management interface accessible:

    # Try to ping Palo Alto management IP:
    ping -c 5 10.1.1.103
    
    # Access web UI:
    # https://10.1.1.103
    # Login and verify
  8. Verify ALL interfaces are UP:

    # Via Palo Alto web UI:
    # Network → Interfaces
    # Verify all ethernet interfaces show "up" status
    
    # Verify VLAN assignments match documented mappings
  9. Test inter-VLAN routing:

    # From a client on VLAN 0, ping a device on VLAN 50:
    ping 10.1.50.x
    
    # From a client on VLAN 50, ping internet:
    ping 8.8.8.8
    
    # Test each VLAN can reach:
    # - Other VLANs (if policy allows)
    # - Internet (if NAT configured)
    # - Gateway (Palo Alto interface IP)
  10. Verify firewall rules working:

    # Via Palo Alto web UI:
    # Monitor → Traffic
    # Generate test traffic and verify rules are being hit
    
    # Verify NAT working (if configured):
    # Monitor → Session Browser
    # Check outbound sessions show NAT translation
  11. Verify all migrated VMs can communicate:

    • Plex VM can reach internet (for metadata, posters)
    • Docker VM can reach internet (for container updates, Cloudflare API)
    • Pi-hole can reach internet (for DNS resolution)
    • All VMs can reach NFS server (10.1.1.150)
    • Client devices can reach all services

Validation Checklist:

  • Palo Alto VM boots on Proxmox
  • Management web UI accessible (https://10.1.1.103)
  • ALL network interfaces show “up” status
  • VLAN tags correctly assigned to interfaces
  • Inter-VLAN routing working (test each VLAN pair)
  • Internet access working from all VLANs
  • NAT policies working (if configured)
  • Firewall rules working (monitor traffic logs)
  • All previously migrated VMs still accessible
  • No network errors in Palo Alto logs

Emergency Rollback Plan: If Palo Alto migration fails and network is down:

  1. DO NOT DELETE ESXi HOST 1 YET
  2. Power on ESXi Host 1
  3. Start jarnetfw VM on ESXi
  4. Wait 2-3 minutes for boot
  5. Network should restore
  6. Investigate Proxmox issue before retry

Downtime: ~30-60 minutes (network outage)
Duration: 2-3 hours with full validation
Recommendation: Do NOT proceed to Phase 3 until Palo Alto is stable for 24-48 hours


Phase 3: Install Proxmox on Host 1

Prerequisites:

  • ALL critical VMs running successfully on Proxmox Host 2 for 24-48 hours
  • Plex, Docker, Pi-hole, Palo Alto all stable and validated
  • No network issues
  • No service degradation

Steps: Same as Phase 1, but for ghost-esxi-01 (10.1.1.120)

  • Hostname: proxmox-01
  • IP: 10.1.1.120/24
  • Same network config, iGPU passthrough setup, NFS testing

Duration: 3-4 hours


Phase 4: Create 3-Node Proxmox Cluster

Prerequisites:

  • Both NUCs running Proxmox successfully
  • Stable network connectivity between all 3 nodes
  • All VMs operational on proxmox-02

Steps: (Same as original plan)

  1. Initialize cluster on proxmox-01
  2. Join proxmox-02 to cluster
  3. Join pve-staging to cluster
  4. Verify 3-node cluster
  5. Test VM migration between nodes

Application Considerations:

  • Before migrating VMs between nodes: Stop the VM, migrate, then start
  • For Plex: Test iGPU passthrough on destination node first
  • For Docker: Verify NFS mount on destination node first

Duration: 2-3 hours


Phase 5: Rebalance Workloads

Why: Free up resources on Node 2 for Home Assistant/Frigate

5.1: Migrate Plex from Node 2 to Node 1

Steps:

  1. Stop Plex VM on proxmox-02
  2. Migrate VM to proxmox-01 (via Proxmox UI or qm migrate)
  3. Verify iGPU passthrough still works on Node 1
  4. Verify NFS mount still works
  5. Test hardware transcoding
  6. Start Plex and validate

Duration: 1-2 hours

5.2: Migrate Home Assistant + Frigate from Staging to Node 2

Complexity: HIGH - Coral TPU USB passthrough required

Prerequisites:

  • Plex migrated to Node 1 (or sufficient resources on Node 2)
  • USB controller passthrough tested on Node 2

USB Passthrough Preparation:

# On proxmox-02:
lsusb
# Identify Coral TPU: Bus 002 Device 003: ID 1a6e:089a Global Unichip Corp.

# Identify USB controller:
lspci | grep USB
# Note PCI address (e.g., 00:14.0)
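As a convenience, the vendor:product ID can be extracted from lsusb and dropped straight into the qm argument used later. This assumes the Coral reports as "Global Unichip Corp." as shown above, and uses VMID 203 from the migration steps below.

```shell
# Print the qm passthrough command for the Coral, if present.
coral_id=$(lsusb | awk '/Global Unichip/ {print $6; exit}')
if [ -n "$coral_id" ]; then
  echo "qm set 203 --usb0 host=${coral_id}"
else
  echo "Coral TPU not detected on this host" >&2
fi
```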

Migration Steps:

  1. Stop home-sec VM on staging:

    # On pve-staging:
    qm stop 103
  2. Backup VM (via Proxmox backup):

    vzdump 103 --storage local --mode stop
  3. Migrate to Node 2:

    # Method 1: Copy the backup to Node 2, then restore with a new VMID
    # (run the scp from pve-staging; qmrestore runs on proxmox-02):
    scp /var/lib/vz/dump/vzdump-qemu-103-*.vma.zst root@proxmox-02:/var/lib/vz/dump/
    qmrestore /var/lib/vz/dump/vzdump-qemu-103-*.vma.zst 203 --storage local-zfs

    # OR Method 2: If the cluster already exists, migrate directly
    # (VM keeps ID 103 - adjust the VMID in the steps below accordingly):
    qm migrate 103 proxmox-02
  4. Reconfigure USB passthrough on Node 2:

    # On proxmox-02:
    # Option A: pass through the whole USB controller (most reliable for
    # the Coral, but the host loses access to every port on it):
    qm set 203 --hostpci0 00:14.0

    # Option B: pass through just the Coral by vendor:product ID:
    qm set 203 --usb0 host=1a6e:089a
  5. Start VM and verify Coral TPU:

    qm start 203
    
    # SSH into VM:
    ssh -i ~/.ssh/esxi_migration_rsa ubuntu@10.1.1.208
    
    # Verify Coral TPU visible:
    lsusb | grep "Global Unichip"
    # Should show: Bus 002 Device 003: ID 1a6e:089a Global Unichip Corp.
    
    # Verify Frigate detects Coral:
    docker logs frigate 2>&1 | grep -i coral
    # Should show: "Coral detected"
  6. Validate Home Assistant and Frigate:

    • Home Assistant web UI accessible (http://10.1.1.208:8123)
    • Frigate web UI accessible
    • Frigate detects Coral TPU (check Frigate logs)
    • Camera streams visible
    • Object detection working (person, car, etc.)
    • Recordings working

Duration: 2-3 hours with testing


Phase 6: Final Cleanup and Validation

6.1: VM Cleanup

  • Remove decommissioned VMs from Proxmox inventory
  • Remove old ESXi VMs from ESXi hosts (if keeping ESXi as backup)
  • Clean up test VMs

6.2: Documentation Updates

  • Update network documentation with new VM IDs
  • Document final cluster architecture
  • Update credentials document with any new passwords
  • Document lessons learned

6.3: Backup Configuration

# Backup Proxmox cluster config:
tar czf /root/proxmox-cluster-backup-$(date +%Y%m%d).tar.gz /etc/pve

# Copy to external storage:
scp /root/proxmox-cluster-backup-*.tar.gz user@backup-server:/backups/
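An optional sanity check confirms the newest archive is actually readable before relying on it (filename pattern as above):

```shell
# Verify the newest cluster-config backup is a readable tar archive.
backup=$(ls -t /root/proxmox-cluster-backup-*.tar.gz 2>/dev/null | head -1)
if [ -n "$backup" ] && tar tzf "$backup" >/dev/null 2>&1; then
  echo "backup OK: $backup"
else
  echo "WARNING: no readable backup archive found"
fi
```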

6.4: Final Application Validation

Complete Service Test:

  1. Media Pipeline End-to-End Test:

    • User requests movie via Overseerr
    • Radarr searches and sends to SABnzbd
    • SABnzbd downloads to NFS (/mnt/media/Downloads)
    • Radarr imports to /mnt/media/Movies
    • Plex scans and adds movie
    • User plays movie with hardware transcoding
    • Expected Duration: 5-30 minutes (depending on download speed)
  2. Network Services:

    • Pi-hole DNS working (test from client: nslookup google.com 10.1.1.35)
    • Palo Alto inter-VLAN routing working
    • All VLANs can reach internet
    • Firewall rules enforced
  3. Smart Home:

    • Home Assistant responsive
    • Frigate object detection working
    • Camera recordings saving
    • Coral TPU inference working (check Frigate stats)
  4. Reverse Proxy:

    • Traefik SSL certificates valid
    • All services accessible via domain names
    • Cloudflare DNS-01 challenge working
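The certificate checks above can be done from the command line with openssl; the hostnames below are placeholders - substitute the real domains Traefik serves.

```shell
# Inspect the served certificate (issuer + expiry) for each proxied host.
# plex.example.com / sonarr.example.com are placeholder hostnames.
for host in plex.example.com sonarr.example.com; do
  echo | openssl s_client -connect "$host:443" -servername "$host" 2>/dev/null \
    | openssl x509 -noout -issuer -enddate 2>/dev/null \
    || echo "no certificate retrieved from $host"
done
```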

6.5: Monitoring Setup

  • Configure Uptime Kuma monitoring for all services
  • Set up Proxmox email alerts (optional)
  • Configure backup schedules in Proxmox
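Until Uptime Kuma monitors are configured, a stopgap curl probe covers the key HTTP endpoints. The URLs below are examples drawn from this plan (Home Assistant, Palo Alto web UI); extend the list with your other services.

```shell
# Stopgap HTTP availability probe until Uptime Kuma monitors exist.
for url in http://10.1.1.208:8123 https://10.1.1.103; do
  code=$(curl -ks -o /dev/null -w '%{http_code}' --max-time 5 "$url" 2>/dev/null)
  echo "$url -> HTTP ${code:-000}"
done
```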

Risk Assessment & Mitigation - Application Layer

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| NFS mount fails on Proxmox | Low | Critical | Test NFS in Phase 1.5 before VM migration |
| Plex iGPU passthrough fails | Medium | High | Test on staging first; keep ESXi available for rollback |
| Docker containers fail to start | Medium | High | Backup docker-compose.yml and .env; test individually |
| Traefik SSL certificates fail | Medium | Medium | Verify Cloudflare API key; manual cert generation possible |
| Coral TPU passthrough fails | Medium | High | Keep Home Assistant on staging until validated |
| Palo Alto network config wrong | Low | Critical | Document ALL interface mappings; test each VLAN |
| Plex database corruption | Very Low | High | Multiple backups before/during migration |
| SABnzbd loses download queue | Low | Medium | Export queue before migration; can re-add manually |

Application-Specific Rollback Plans

Plex Rollback

If Plex fails on Proxmox:

  1. Stop Plex VM on Proxmox
  2. Start original iridium VM on ESXi Host 1
  3. Restore Plex database from backup (if needed)
  4. Users can resume streaming immediately

Docker Stack Rollback

If Docker stack fails:

  1. Stop docker VM on Proxmox
  2. Start original docker VM on ESXi Host 1
  3. Run docker-compose up -d
  4. Services restored within 5 minutes

Palo Alto Rollback

If network fails:

  1. Shut down Palo Alto VM on Proxmox
  2. Start jarnetfw VM on ESXi Host 1
  3. Network restored within 2-3 minutes

Timeline Estimate - Application-Focused

| Week | Phase | Activities | Time | Critical Path |
|------|-------|------------|------|---------------|
| 1 | Phase 0 | Backups (Plex DB, Docker configs, Pi-hole) | 3-4 hours | Pre-req for all |
| 2 | Phase 1 | Install Proxmox on Host 2, test NFS | 4-5 hours | NFS test critical |
| 3 | Phase 2.1 | Migrate Pi-hole | 1-2 hours | Test migration process |
| 4 | Phase 2.2 | Migrate Docker stack (test Traefik/NFS) | 3-4 hours | Complex, high risk |
| 5 | Phase 2.3 | Migrate Plex (test iGPU + NFS) | 3-4 hours | Highest complexity |
| 6 | Phase 2.4 | Migrate Palo Alto (MAINTENANCE WINDOW) | 2-3 hours | Network outage |
| 7 | Validation | Monitor all services for stability | Ongoing | 1 week stability |
| 8 | Phase 3 | Install Proxmox on Host 1 | 3-4 hours | - |
| 9 | Phase 4 | Create cluster, migrate Plex to Node 1 | 3-4 hours | - |
| 10 | Phase 5 | Migrate Home Assistant/Frigate to Node 2 | 3-4 hours | Coral TPU test |
| 11 | Phase 6 | Final validation and cleanup | 2-3 hours | End-to-end test |

Total: ~30-40 hours over 11 weeks (comfortable pace)
Fast-Track: 4-5 weekends (~25-30 hours total)


Critical Success Factors - Application Layer

Must-Have Before Starting:

  1. NFS accessible from Proxmox - Test in Phase 1.5
  2. Cloudflare API credentials documented - Needed for Traefik SSL
  3. Plex database backed up - Multiple backups
  4. Docker compose and .env files backed up - Critical for stack restore
  5. Palo Alto config exported - Network restoration depends on this
  6. iGPU passthrough working on Proxmox - Plex depends on this

Validation Gates (Do Not Proceed Until Complete):

  • After Phase 1: NFS mount working on Proxmox
  • After Phase 2.2: All Docker containers running, Traefik SSL working
  • After Phase 2.3: Plex hardware transcoding working with iGPU
  • After Phase 2.4: Network stable for 24-48 hours, all VLANs working
  • After Phase 5.2: Coral TPU working, Frigate object detection confirmed

Next Steps

  1. Review this updated plan and confirm:

    • VM renaming strategy acceptable
    • Downtime windows identified for Palo Alto migration
    • Cloudflare API credentials available
    • Understand NFS dependency (no media migration needed)
  2. Answer remaining questions:

    • Decommission server-2019, xsoar, home-security VMs (all offline)?
    • Preferred cluster name and node naming convention?
    • UniFi network devices inventory (for validation after migration)
  3. Order hardware:

    • 2x 2TB NVMe drives (if doing hardware upgrade)
  4. Schedule maintenance windows:

    • Palo Alto migration (30-60 min network outage)
    • Plex migration (1-2 hour streaming outage)

Once approved, we can begin Phase 0 preparation!