ESXi to Proxmox Migration Plan - REVISED

Created: 2025-12-28 | Revised: 2025-12-28 | Status: Draft - Ready for User Review


Executive Summary

Current Situation:

  • 2x ESXi hosts (NUC9i9QNX) with production workloads
  • 1x Proxmox staging server already running (Home Assistant + Frigate with Coral TPU)
  • Blue Iris → Frigate migration already complete ✅
  • Proxmox platform validated ✅

End Goal:

  • 3-node Proxmox VE cluster with HA capability
  • Nodes 1 & 2 (NUCs): Production workhorses
  • Node 3 (staging): quorum tie-breaker + light workloads (a full cluster member, so no separate QDevice is required)

Migration Approach:

  • Install Proxmox on Host 2 first (all VMs offline - lowest risk)
  • Migrate critical VMs from Host 1 → Proxmox Host 2
  • Install Proxmox on Host 1
  • Create 3-node cluster
  • Migrate Home Assistant/Frigate from staging → NUC
  • Rebalance workloads

Migration Architecture

End-State Configuration

Node 1: proxmox-01 (was ghost-esxi-01, 10.1.1.120)

Hardware:

  • Intel NUC9i9QNX
  • 8C/16T i9-9980HK
  • 64GB RAM
  • 2TB NVMe (upgrade from 1TB)
  • Dual 10GbE + 1GbE
  • Intel UHD 630 iGPU (passthrough capable)

Workloads:

  • Plex (iGPU passthrough for Quick Sync)
  • Docker stack (Radarr, Sonarr, SABnzbd, etc.)
  • Pi-hole
  • Palo Alto firewall
  • Lab VMs (as needed)

Node 2: proxmox-02 (was ghost-esx-02, 10.1.1.121)

Hardware:

  • Intel NUC9i9QNX
  • 8C/16T i9-9980HK
  • 64GB RAM
  • 2TB NVMe (upgrade from 1TB)
  • Dual 10GbE + 1GbE
  • Intel UHD 630 iGPU (passthrough capable)

Workloads:

  • Home Assistant + Frigate (migrated from staging)
  • Spare capacity for growth
  • Lab VMs (as needed)

Node 3: pve-staging (remains, 10.1.1.123)

Hardware:

  • Intel Core i5-8400T (6 cores)
  • 32GB RAM
  • ~900GB storage (LVM-Thin)
  • Single 1GbE
  • USB controller for Coral TPU (until Frigate migrates)

Role:

  • Quorum tie-breaker for the 3-node cluster (full member; no separate QDevice needed)
  • Docker services
  • K8s lab
  • Templates
  • Temporary home for Frigate until NUC migration

Cluster Topology:

┌─────────────────┐       ┌─────────────────┐       ┌─────────────────┐
│  proxmox-01     │◄─────►│  proxmox-02     │◄─────►│  pve-staging    │
│  10.1.1.120     │       │  10.1.1.121     │       │  10.1.1.123     │
│  NUC (Prod)     │       │  NUC (Prod)     │       │  Witness node   │
│  HA Member      │       │  HA Member      │       │  Quorum + light │
└─────────────────┘       └─────────────────┘       └─────────────────┘
         ▲                         ▲                         ▲
         └─────────────────────────┴─────────────────────────┘
                        10GbE Network

Pre-Migration Decisions Required

1. Hardware Upgrade Timing ⚠️ CRITICAL DECISION

Option A: Install 2TB drives BEFORE migration (RECOMMENDED)

  • ✅ Clean Proxmox install on larger drives
  • ✅ No need to resize/migrate storage later
  • ✅ More headroom during migration
  • ❌ Adds time/complexity upfront
  • Timeline: Order drives now, install before starting

Option B: Upgrade DURING migration (Hybrid)

  • Install 2TB in Host 2 when wiping for Proxmox
  • Keep 1TB in Host 1 temporarily
  • Upgrade Host 1 drive later (requires re-migration)
  • ⚠️ Inconsistent storage capacity during migration

Option C: Upgrade AFTER migration

  • ❌ More complex - requires VM migration again
  • ❌ Less space during critical migration phase
  • ❌ Not recommended

RECOMMENDATION: Option A - Install 2TB drives first

2. Proxmox Storage Backend

Option A: ZFS (single disk)

  • ✅ Built-in snapshots and compression
  • ✅ Data integrity (checksums)
  • ✅ Better VM performance
  • ✅ Native replication support
  • ❌ ~5-10% overhead for single disk
  • RECOMMENDED for your use case

Option B: LVM-Thin (like your staging server)

  • ✅ Familiar (already using on staging)
  • ✅ Slightly more usable space
  • ✅ Thin provisioning
  • ❌ No native compression
  • ❌ Fewer snapshot features
  • Alternative if you want consistency with staging

RECOMMENDATION: ZFS for NUCs, keep LVM-Thin on staging
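If ARC memory pressure ever becomes a concern (see the risk table later in this plan), the cache can be capped. A minimal sketch, assuming the installer's default rpool and an 8 GiB cap (size to taste; by default ZFS may claim up to half of RAM):

    # Cap the ZFS ARC so VMs keep most of the 64GB RAM
    echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
    update-initramfs -u -k all
    # Takes effect after reboot; check with: arc_summary | grep -i "arc size"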

3. Network Configuration

Proxmox vmbr0 Configuration (both NUCs):

  • Bond: 2x 10GbE in balance-alb or LACP (if switch supports)
  • VLAN-aware bridge: Yes
  • VLANs: untagged/native (mgmt), 50 (lab), 300 (public); the VLAN-aware bridge carries all tags, replacing ESXi's VLAN 4095 trunk port groups
  • MTU: 9000 (jumbo frames, matching current ESXi)

Proxmox vmbr1 Configuration (optional):

  • Dedicate a NIC to Corosync/migration traffic - either the 1GbE, or one 10GbE link instead of bonding both (the NUCs only have two 10GbE ports)
  • Low latency, isolated from VM traffic (Corosync is latency-sensitive, not bandwidth-hungry)
  • RECOMMENDED for cluster stability

Migration Phases

Phase 0: Preparation (1-2 weeks)

Hardware:

  • Order 2x 2TB WD Blue SN580 NVMe drives
  • Create Proxmox VE 8.x bootable USB installer
  • Prepare backup storage for critical VMs

Validation:

  • Test iGPU passthrough on pve-staging (if possible - different CPU though)
  • Document current Plex transcoding settings
  • Document Palo Alto firewall configuration
  • Export all ESXi VM configurations
  • Take screenshots of ESXi network/storage configs

Information Gathering:

  • Clarify “iridium” VM purpose (currently unknown)
  • Confirm server-2019 and xsoar VMs can be deleted/offline
  • Identify acceptable downtime windows for critical services
  • Verify NFS backup (10.1.1.150) is not needed

Backups:

  • Export critical VM disk images (Plex, Docker, Palo Alto)
  • Backup Plex database externally
  • Document all VM IP addresses and network configs
  • Save Docker compose files / container configs
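A minimal sketch for capturing the ESXi-side inventory before anything is wiped, assuming SSH is enabled on both hosts (the destination host and path are placeholders):

    # On each ESXi host:
    vim-cmd vmsvc/getallvms > /tmp/vm-inventory.txt
    esxcli network vswitch standard list > /tmp/vswitch-config.txt
    esxcli network ip interface ipv4 get > /tmp/vmk-ips.txt
    esxcli storage filesystem list > /tmp/datastores.txt
    # Copy the results off-host:
    scp /tmp/*.txt user@backup-host:/backups/esxi-configs/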

Phase 1: Install Proxmox on Host 2 (ghost-esx-02)

Why Host 2 First:

  • ✅ All VMs currently powered off (zero downtime)
  • ✅ Larger storage (2.79TB vs 931GB) - better for receiving migrated VMs
  • ✅ Lower risk - no critical services running

Steps:

  1. Pre-install backup (if needed):

    • home-security VM is replaced by Frigate (can delete)
    • server-2019: Likely old Blue Iris host (confirm, then delete)
    • xsoar, win11-sse, win-10: Confirm these can be offline permanently
  2. Install new 2TB NVMe drive (if doing Option A)

  3. Boot Proxmox installer:

    • Hostname: proxmox-02 (or pve-02)
    • IP: 10.1.1.121/24 (keep same IP)
    • Gateway: 10.1.1.1
    • DNS: 10.1.1.1 (or current DNS server)
    • Disk: Select 2TB NVMe
    • Filesystem: ZFS (RAID0) - single disk with compression
    • Country/Timezone: Set appropriately
    • Root password: Secure password
  4. Post-install configuration:

    # Update system
    apt update && apt full-upgrade -y
    
    # Configure repositories (no subscription): disable the enterprise repo
    sed -i 's/^deb/#deb/' /etc/apt/sources.list.d/pve-enterprise.list
    # Add the no-subscription repo
    echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > /etc/apt/sources.list.d/pve-no-subscription.list
    apt update
    
    # Install useful packages
    apt install -y vim htop iotop tmux
  5. Configure networking:

    # Edit /etc/network/interfaces
    # Create vmbr0 with dual 10GbE bond + VLAN-aware
    # Create vmbr1 for migration/Corosync (optional but recommended)

    Example /etc/network/interfaces:

    auto lo
    iface lo inet loopback
    
    # 1GbE - emergency access only
    auto eno1
    iface eno1 inet manual
    
    # 10GbE bond for VM traffic
    auto bond0
    iface bond0 inet manual
        bond-slaves enp1s0f0 enp1s0f1   # adjust to the actual NIC names (ip link)
        bond-miimon 100
        bond-mode balance-alb
        # for LACP, use bond-mode 802.3ad with bond-xmit-hash-policy layer2+3
        # (xmit-hash-policy has no effect in balance-alb mode)
        mtu 9000
    
    # VLAN-aware bridge for VMs
    auto vmbr0
    iface vmbr0 inet static
        address 10.1.1.121/24
        gateway 10.1.1.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        mtu 9000
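    Apply and verify (a sketch; ifupdown2 ships with Proxmox VE 8, so no reboot is needed, and the jumbo-frame ping assumes the switch path is MTU 9000 end to end):

    ifreload -a
    ip -d link show bond0 | grep -e mode -e mtu
    bridge vlan show
    ping -M do -s 8972 10.1.1.1   # 8972 bytes payload + 28 bytes headers = 9000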
  6. Configure Intel iGPU for passthrough:

    # Enable IOMMU
    vi /etc/default/grub
    # Add to GRUB_CMDLINE_LINUX_DEFAULT:
    # intel_iommu=on iommu=pt
    
    update-grub
    
    # Add VFIO modules
    vi /etc/modules
    # Add:
    # vfio
    # vfio_iommu_type1
    # vfio_pci
    # (vfio_virqfd is only needed on kernels < 6.2; it is built into vfio on PVE 8)
    
    # Blacklist i915 driver (iGPU)
    echo "blacklist i915" >> /etc/modprobe.d/blacklist.conf
    
    # Update initramfs
    update-initramfs -u -k all
    
    # Reboot
    reboot
  7. Verify iGPU is available for passthrough:

    # Confirm IOMMU is active
    dmesg | grep -e DMAR -e IOMMU
    
    lspci -nnk | grep -iA3 vga
    # The UHD 630 entry should show "Kernel driver in use: vfio-pci"
  8. Test VM creation:

    • Create small test VM
    • Verify networking works
    • Test VLAN tagging
    • Validate storage performance
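One quick way to cover the "validate storage performance" item: pveperf ships with Proxmox (the /rpool/data mountpoint assumes the installer's default ZFS layout):

    pveperf /rpool/data
    # Compare FSYNCS/SECOND and buffered reads against pve-staging as a baseline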

Validation Checklist:

  • Proxmox web UI accessible at https://10.1.1.121:8006
  • SSH access working
  • Network connectivity (ping gateway, internet)
  • Intel iGPU shows as available for passthrough (lspci)
  • Storage pool visible and healthy
  • Test VM boots and has network connectivity

Duration: 2-4 hours


Phase 2: Migrate VMs from Host 1 to Proxmox Host 2

Migration Order (lowest to highest risk):

2.1: Test Migration - “iridium” (Low Risk, Unknown Purpose)

  • Export from ESXi via OVF or disk copy
  • Import to Proxmox Host 2 (see the sketch after this list)
  • Test boot and functionality
  • Validate migration process
  • Downtime: ~15-30 minutes
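A sketch of the export/import flow for this and the following migrations, assuming SSH is enabled on the ESXi host (datastore paths, VM ID 200, and the sizing are illustrative):

    # On the ESXi host: copy the descriptor and flat VMDK to Proxmox Host 2
    scp /vmfs/volumes/datastore1/iridium/iridium.vmdk \
        /vmfs/volumes/datastore1/iridium/iridium-flat.vmdk \
        root@10.1.1.121:/tmp/
    
    # On proxmox-02: create an empty VM shell, then import the disk
    qm create 200 --name iridium --memory 4096 --cores 2 \
        --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci
    qm importdisk 200 /tmp/iridium.vmdk local-zfs
    qm set 200 --scsi0 local-zfs:vm-200-disk-0 --boot order=scsi0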

2.2: Pi-hole (Medium Risk, DNS Service)

  • Risk: DNS disruption during migration
  • Mitigation: Update DHCP to use backup DNS (8.8.8.8) temporarily
  • Export VM from ESXi
  • Import to Proxmox Host 2
  • Reconfigure network (static IP, VLAN tag if needed)
  • Test DNS resolution
  • Downtime: ~15-20 minutes

2.3: Docker Stack (High Risk, Media Services)

  • Risk: Radarr, Sonarr, SABnzbd, Overseerr, etc. offline
  • Impact: Downloads/media management paused
  • Export VM from ESXi (may be large if Docker volumes are on VM disk)
  • Import to Proxmox Host 2
  • Start VM and verify all containers come up
  • Test media stack functionality
  • Downtime: ~30-60 minutes

2.4: Plex Server “xeon” (HIGH RISK, iGPU Passthrough Required)

  • Risk: Media streaming offline, iGPU passthrough must work
  • Complexity: HIGHEST - hardware passthrough critical
  • Prerequisites:
    • Verify iGPU passthrough working on Proxmox Host 2
    • Backup Plex database to external storage
    • Document current transcoding settings

Migration Steps:

  1. Prepare Plex for migration:

    • Stop Plex service on ESXi VM
    • Backup Plex database: /var/lib/plexmediaserver/Library/Application Support/Plex Media Server/
    • Note transcoder settings (Hardware acceleration: Quick Sync)
  2. Export VM:

    • Shut down “xeon” VM on ESXi
    • Export VMDK to Proxmox Host 2 via SCP/NFS
  3. Import and configure on Proxmox:

    • Create new VM on Proxmox (match CPU/RAM: 4 vCPU, 8GB RAM)
    • Machine type: q35 (required for PCIe passthrough)
    • Import disk image
    • Add iGPU passthrough:
      • Add PCI device: Intel UHD 630 (00:02.0)
      • Enable “All Functions” and “PCI-Express”; “Primary GPU” is optional (leave it unchecked to keep the Proxmox console usable)
    • Network: Configure bridge with appropriate VLAN
  4. Boot and validate:

    • Start VM
    • Check iGPU is visible in guest OS: lspci | grep VGA
    • Install/update Intel Graphics drivers in Ubuntu
    • Start Plex
    • Verify hardware transcoding: Play media and check transcoder shows “(hw)”
  5. Test transcoding:

    • Play 4K video and verify Quick Sync is being used
    • Check CPU usage (should be low with hw transcoding)
    • Verify vainfo shows available encode/decode profiles

Downtime: ~1-2 hours
Rollback: Keep ESXi Host 1 bootable until validated
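A sketch of the database backup and the post-migration transcode check, assuming a Debian/Ubuntu guest, Plex's default Linux paths, and a placeholder backup host:

    # Inside the Plex VM on ESXi, before export:
    sudo systemctl stop plexmediaserver
    sudo tar czf /tmp/plex-db-$(date +%F).tgz \
        "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/"
    scp /tmp/plex-db-*.tgz user@backup-host:/backups/
    
    # Inside the migrated VM on Proxmox, to confirm Quick Sync is usable:
    sudo apt install -y vainfo intel-media-va-driver-non-free
    vainfo | grep -i -e h264 -e hevc   # encode/decode profiles should be listed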

2.5: Palo Alto Firewall “jarnetfw” (CRITICAL - LAST)

  • Risk: CRITICAL - Network outage for all VLANs
  • Impact: All inter-VLAN routing down
  • Migration Window: Off-hours/planned outage required

Prerequisites:

  • ALL other VMs successfully migrated and validated
  • Document complete firewall configuration
  • Export Palo Alto config backup
  • Plan communication to users (if applicable)

Migration Steps:

  1. Backup current state:

    • Export Palo Alto configuration via web UI
    • Screenshot all firewall rules/NAT/routing
    • Document interface → VLAN mappings
  2. Export VM:

    • Shut down jarnetfw VM (network outage begins)
    • Export VMDK to Proxmox Host 2
  3. Import and configure:

    • Create VM on Proxmox (4 vCPU, 7GB RAM)
    • Import disk
    • Critical: Map network interfaces correctly
      • Match VLAN tags to ESXi port groups
      • VM network interface 1 → vmbr0.300 (Public)
      • VM network interface 2 → vmbr0.50 (Lab)
      • etc.
  4. Boot and validate:

    • Start VM
    • Check Palo Alto web UI is accessible
    • Verify all interfaces are UP
    • Test inter-VLAN routing
    • Test internet connectivity from each VLAN
    • Verify firewall rules are working

Downtime: ~30-60 minutes (network outage)
Rollback Plan: If the migration fails, revert to ESXi Host 1 (keep it available for 24-48 hours)
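A sketch of the NIC mapping on the Proxmox side (VM ID 210 and the netX-to-VLAN ordering are illustrative; VM-Series treats the first NIC as management, so match the order to your documented ESXi port groups):

    qm set 210 --net0 virtio,bridge=vmbr0           # untagged / management
    qm set 210 --net1 virtio,bridge=vmbr0,tag=300   # Public
    qm set 210 --net2 virtio,bridge=vmbr0,tag=50    # Lab
    qm config 210 | grep ^net                       # confirm the assignments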


Phase 3: Install Proxmox on Host 1 (ghost-esxi-01)

Why After Host 2:

  • All critical VMs now running on Proxmox Host 2
  • Host 1 can be wiped with confidence
  • Lower pressure/risk

Steps:

  1. Final validation:

    • Verify ALL migrated VMs running successfully on Host 2
    • Confirm Plex transcoding working
    • Confirm Palo Alto firewall routing working
    • Confirm Docker services accessible
  2. Export remaining VMs (if needed):

    • Win-11, Win7-Victim (lab VMs) - only if you want to keep them
  3. Install new 2TB NVMe drive (if not done yet)

  4. Install Proxmox VE:

    • Same process as Host 2
    • Hostname: proxmox-01 (or pve-01)
    • IP: 10.1.1.120/24
    • Gateway: 10.1.1.1
    • Filesystem: ZFS (RAID0)
  5. Configure networking:

    • Match Host 2 configuration (dual 10GbE bond, VLAN-aware bridge)
    • MTU 9000 for jumbo frames
  6. Configure iGPU passthrough:

    • Same steps as Host 2
    • Enable IOMMU, load VFIO modules, blacklist i915
  7. Test and validate:

    • Create test VM
    • Verify iGPU available for passthrough
    • Verify network connectivity

Duration: 2-4 hours


Phase 4: Create 3-Node Proxmox Cluster

Prerequisites:

  • Both NUCs running Proxmox successfully
  • All critical VMs operational on Host 2
  • Stable network connectivity between all 3 nodes

Steps:

  1. Initialize cluster on Node 1:

    # On proxmox-01 (10.1.1.120):
    pvecm create homelab-cluster
    
    # Verify cluster status
    pvecm status
  2. Join Node 2 to cluster:

    # On proxmox-02 (10.1.1.121):
    pvecm add 10.1.1.120
    
    # Enter root password for proxmox-01
    # Wait for join to complete
  3. Join Node 3 (staging) to cluster:

    # On pve-staging (10.1.1.123):
    pvecm add 10.1.1.120
    
    # Enter root password
    # Wait for join to complete
  4. Verify 3-node cluster:

    # On any node:
    pvecm status
    pvecm nodes
    
    # Should show all 3 nodes online
    # Quorum should be 2/3
  5. Configure node priorities (optional but recommended):

    • Set staging node as lower priority for resource allocation
    • Configure HA groups if desired
  6. Test cluster:

    • View all nodes in web UI
    • Test VM migration between Node 1 and Node 2
    • Verify quorum works if one node goes down
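A quick way to exercise the quorum test in the last step (reboot pve-staging as the test subject, since nothing critical runs there yet):

    # On a surviving node while the third is down:
    pvecm status
    # Expect "Quorate: Yes" with 2 of 3 votes; the cluster stays writable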

Notes on 3-Node Cluster:

  • Quorum: Need 2/3 nodes online for cluster to function
  • HA: Can survive 1 node failure
  • No QDevice needed with 3 nodes (odd number provides quorum)
  • Staging node: Can have lower specs, only needs to vote for quorum

Duration: 1-2 hours


Phase 5: Migrate Plex from Node 2 to Node 1

Why:

  • Free up resources on Node 2 for Home Assistant migration
  • Better workload distribution
  • Node 1 has iGPU, ideal for Plex

Steps:

  1. Stop Plex VM on Node 2
  2. Migrate VM to Node 1 via Proxmox (offline migration; VMs with PCIe passthrough cannot be live-migrated)
  3. Verify iGPU passthrough still works on Node 1
  4. Test Plex transcoding
  5. Start Plex and validate

Duration: 30-60 minutes


Phase 6: Migrate Home Assistant + Frigate to Node 2

Why:

  • Move production workload from staging to more powerful NUC
  • Free up staging server for witness role + light workloads
  • Better long-term architecture

Challenges:

  • Coral TPU: Staging has USB controller passthrough (PCI 00:14)
    • NUCs may have different PCI topology
    • May need to pass through entire USB controller or individual USB port
  • Static IP: Home Assistant has static IP (10.1.1.208)
  • Uptime: High-priority service (CCTV)

Steps:

  1. Prepare Node 2 for USB passthrough:

    # On proxmox-02:
    lsusb
    # Identify Coral TPU device
    
    # Find USB controller PCI ID
    lspci | grep USB
    
    # Configure USB controller for passthrough (similar to staging)
    # OR use USB device ID passthrough (easier but less stable)
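    # Alternative sketch: pass the Coral by USB vendor:product ID instead of the
    # whole controller (simpler, but re-plugs can drop the device). The Coral USB
    # Accelerator enumerates as 1a6e:089a before firmware load and 18d1:9302 once
    # initialized - verify with lsusb; the VM ID is illustrative:
    # qm set 103 --usb0 host=18d1:9302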
  2. Stop home-sec VM on staging:

    # On pve-staging:
    qm stop 103
  3. Export and migrate VM:

    # Method 1: Backup/Restore via Proxmox
    vzdump 103 --storage local --mode stop
    # Copy backup to Node 2
    # Restore on Node 2
    
    # Method 2: qm migrate (if the cluster already exists; offline only,
    # since USB/PCI passthrough rules out --online)
    qm migrate 103 proxmox-02
  4. Reconfigure VM on Node 2:

    • Update PCI passthrough to match Node 2’s USB controller PCI ID
    • Verify network configuration (static IP 10.1.1.208)
    • Ensure cloud-init settings preserved
  5. Boot and validate:

    • Start VM on Node 2
    • Check Coral TPU is visible: lsusb in guest OS
    • Check Home Assistant accessible via web UI
    • Verify Frigate detects Coral TPU:
      • Check Frigate logs
      • Verify “Coral” detector is active
      • Test object detection on camera feeds
  6. Monitor for 24-48 hours:

    • Verify camera streams stable
    • Verify object detection working
    • Check for any USB passthrough issues

Downtime: ~30-60 minutes
Fallback: Can revert to the staging server if issues arise

Duration: 2-3 hours with testing


Phase 7: Final Rebalancing and Cleanup

Workload Distribution Review:

| Node | VMs | vCPU Total | RAM Total | Notes |
|------|-----|------------|-----------|-------|
| proxmox-01 | Plex, Docker, Pi-hole, Palo Alto | ~11 vCPU | ~20GB | iGPU for Plex |
| proxmox-02 | Home Assistant/Frigate | ~4 vCPU | ~8GB | USB for Coral TPU, room to grow |
| pve-staging | Docker-host-1, K8s lab, templates | ~4 vCPU active | ~8GB active | Witness + dev/test |

Cleanup Tasks:

  • Delete old ESXi VMs from inventory (if safe)
  • Remove home-security, server-2019 VMs from ESXi host 2
  • Clean up iridium VM if purpose unknown/no longer needed
  • Update documentation with final IP addresses
  • Update DNS records if any changed
  • Configure Proxmox backups (PBS or external; see the sketch after this list)
  • Test HA failover (optional)
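A minimal one-shot vzdump sketch to validate the backup path before scheduling a recurring job under Datacenter → Backup (the storage name, archive name, and restore VM ID are placeholders):

    # Back up every VM on this node with snapshot mode and zstd compression:
    vzdump --all --mode snapshot --compress zstd --storage backup-nfs
    # Spot-check a restore:
    qmrestore /mnt/pve/backup-nfs/dump/vzdump-qemu-200-*.vma.zst 300 --storage local-zfs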

Risk Assessment & Mitigation

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| iGPU passthrough fails on Proxmox | Medium | High | Test on staging first; rollback to ESXi if needed |
| Coral TPU passthrough fails on NUC | Medium | High | Keep Home Assistant on staging until validated |
| Palo Alto firewall migration fails | Low | Critical | Thorough testing; off-hours migration; keep ESXi Host 1 available for rollback |
| Network misconfiguration breaks VLANs | Medium | High | Document all VLAN mappings; test each VLAN post-migration |
| Data loss during VM export/import | Very Low | Critical | Backup all VMs before migration; verify checksums |
| Cluster split-brain | Low | Medium | 3 nodes provide quorum; monitor cluster health |
| Storage performance degradation | Low | Medium | Benchmark ZFS before/after; tune ZFS ARC if needed |

Rollback Plans

Phase 2 Rollback (VMs on Proxmox Host 2)

  • If any VM migration fails: Keep ESXi Host 1 running
  • If Plex iGPU fails: Revert Plex to ESXi Host 1
  • If Palo Alto fails: Emergency reboot of ESXi Host 1 to restore networking

Phase 6 Rollback (Home Assistant/Frigate)

  • If Coral TPU fails on Node 2: Migrate VM back to staging server
  • Fallback: Staging server remains operational during migration

Complete Rollback

  • Worst case: Both NUCs available, can reinstall ESXi if total failure
  • Likelihood: Very low - incremental approach minimizes risk

Timeline Estimate

Conservative Timeline (Recommended):

| Week | Phase | Activities | Time |
|------|-------|------------|------|
| 1 | Phase 0 | Hardware prep, backups, planning | 4-6 hours |
| 2 | Phase 1 | Install Proxmox on Host 2, configure networking | 4-6 hours |
| 3 | Phases 2.1-2.3 | Migrate iridium, Pi-hole, Docker | 2-3 hours |
| 4 | Phase 2.4 | Migrate Plex (iGPU passthrough) | 2-4 hours |
| 5 | Phase 2.5 | Migrate Palo Alto (critical - careful!) | 2-3 hours |
| 6 | Phase 3 | Install Proxmox on Host 1 | 3-4 hours |
| 7 | Phases 4-5 | Create cluster, migrate Plex back to Node 1 | 2-3 hours |
| 8 | Phase 6 | Migrate Home Assistant/Frigate to Node 2 | 3-4 hours |
| 9 | Phase 7 | Final rebalancing, testing, documentation | 2-3 hours |

Total: ~25-36 hours over 9 weeks (comfortable pace with validation)

Fast-Track: Could compress to 3-4 weekends (~20-25 hours total) if comfortable with risk


Questions for User - Action Items

Before proceeding, please answer:

Critical Decisions

  1. NVMe upgrade timing: Order 2TB drives now and install before migration? (RECOMMENDED: Yes)
  2. Storage backend: ZFS or LVM-Thin for NUC nodes? (RECOMMENDED: ZFS)
  3. Downtime windows: What times are acceptable for Plex/Palo Alto/Home Assistant outages?

VM Clarifications

  1. “iridium” VM: What is this for? Can it be offline temporarily?
  2. “server-2019” VM: Old Blue Iris host? Can we delete it?
  3. “xsoar” VM: Keep or decommission?
  4. NFS backup (10.1.1.150): Still in use? Purpose?

Nice-to-Have

  1. Cluster name: “homelab-cluster” or different name?
  2. Node naming: proxmox-01/02 or different convention?
  3. HA preferences: Do you want automatic HA failover or manual?

Next Steps

Once you answer the questions above, I can:

  1. ✅ Create detailed step-by-step runbooks for each phase
  2. ✅ Generate network configuration templates
  3. ✅ Document iGPU passthrough configuration scripts
  4. ✅ Build VM migration checklists
  5. ✅ Create validation/testing scripts
  6. ✅ Design backup strategy for Proxmox cluster

Let’s align on the critical decisions, then we can start Phase 0!