We had a host in a 3-node vSAN cluster (vCenter is 7.0.0d and all hosts are at ESXi 7.0b) that stopped responding while its VMs continued to run. After some troubleshooting, we got it responding again, and I then moved the VMs off and rebooted it for good measure. The host came back up after the reboot, but vSAN is now having issues…
Specifically, vSAN Health is reporting that “EPD Status” under vSAN Daemon Liveness is “Abnormal”. Everything else is reporting as “Healthy”.
When SSHed into the host, running /etc/init.d/epd status gives a response of:
epd is not running
CLOMD is running, however.
If I try starting epd using /etc/init.d/epd start, I get:
INIT: EPD uses a ramdisk for the db file
INIT: No persistent storage found to backup the DB into.
Thinking one of my disks might be full, I tried checking the disk space. Very oddly, running df -h (or, really, any variant of the df command) gives:
VmFileSystem: Slow refresh failed: Cannot open volume: /vmfs/volumes/5efa1a50-890ef4b7-dce3-001b21baacac
Error when running esxcli, return status was: 1
Error getting data for filesystem on ‘/vmfs/volumes/5efa1a50-890ef4b7-dce3-001b21baacac’: Cannot open volume: /vmfs/volumes/5efa1a50-890ef4b7-dce3-001b21baacac, skipping.
I never actually get to see the free space of the physical disks.
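Since every df error above points at the same path, it may help to reduce that output to the distinct volumes that cannot be opened. Below is a minimal sketch that does this with sed. The function name failed_volumes is my own (it is not an ESXi tool), and I am assuming the error lines always follow the "Cannot open volume: /vmfs/volumes/<uuid>" pattern shown above.

```shell
# Sketch only: failed_volumes is a hypothetical helper, not part of ESXi.
# It reads df's error text on stdin and prints each volume path that
# appears after "Cannot open volume:" exactly once.
failed_volumes() {
    # -n with the p flag prints only the lines where the substitution matched.
    # The character class stops at the first character that is not part of the UUID.
    sed -n 's#.*Cannot open volume: \(/vmfs/volumes/[0-9a-f-]*\).*#\1#p' | sort -u
}

# Usage on the host (2>&1 in case the errors are written to stderr):
#   df -h 2>&1 | failed_volumes
```

The UUID at the end of the printed path can then be compared with the UUID column of esxcli storage filesystem list to see which datastore or partition it belongs to, assuming that command still responds on the affected host.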
Running vdf -h gives a very long output (attached), but I notice that it lists no scratch partition.
Any thoughts? Is epd running as a separate service new in vSphere 7? If so, that would explain why I cannot find much troubleshooting information via Google.