I’ve got a big (2.5TB) legacy server in an old data center that I need to move to our new data center. I have almost no knowledge about the infrastructure in our old data center and it seems that there is no working backup there (I wasn’t working here when that data center was set up and nothing is documented).
The old data center runs vCenter 5.5.0 with 2 ESX hosts; the new data center uses vCenter 7 with 4 ESX hosts. I planned on using Veeam Backup to move the server from the old data center to the new one (back up from the old data center, restore in the new one). I am using Veeam because the vCenter in the old data center is so old that I have trouble getting the VMware Standalone Converter to both read from the old data center and copy to the new one (SSL issues, for example).
I’ve got a 1 Gbit/s connection between the two data centers. One weekend I started a backup while the server was running, hoping I could do incremental backups before finally migrating the server to the new data center. That approach failed horribly.
During the Veeam backup job the legacy server stopped responding. Upon further inspection I noticed that the HA cluster, for whatever reason, decided to fail the VM over to the other ESX node. After a few minutes it failed back again, and it kept bouncing between the two hosts because the disk was now corrupt and the server kept freezing during the boot procedure. I am still not sure what caused the corruption; maybe there was a pre-existing condition with the VMware disk files.
It took us quite a while to figure out what needed to be done to get the server running again, but once it was back up we had the following disk files lying around:
What caught my eye is that even though vCenter no longer shows any snapshots, there is still a SERVERNAME_2-000001.vmdk lying around, suggesting a snapshot of disk SERVERNAME_2. But that is not the case: it is the actual operating system disk, whereas SERVERNAME_2.vmdk is the disk of an application data partition. It is actually referenced in the VMX file:
```
scsi0.virtualDev = "lsilogic"
scsi0.present = "TRUE"
scsi0:0.deviceType = "scsi-hardDisk"
scsi0:0.fileName = "SERVERNAME_2-000001.vmdk"
scsi0:0.present = "TRUE"
scsi0:0.redo = ""
scsi0.pciSlotNumber = "16"
scsi0:1.deviceType = "scsi-hardDisk"
scsi0:1.fileName = "SERVERNAME.vmdk"
scsi0:1.ctkEnabled = "TRUE"
scsi0:1.present = "TRUE"
scsi0:1.redo = ""
sched.scsi0:1.throughputCap = "off"
sched.scsi0:1.shares = "normal"
scsi0:2.deviceType = "scsi-hardDisk"
scsi0:2.fileName = "SERVERNAME_1.vmdk"
scsi0:2.ctkEnabled = "TRUE"
scsi0:2.present = "TRUE"
scsi0:2.redo = ""
scsi0:3.deviceType = "scsi-hardDisk"
scsi0:3.fileName = "SERVERNAME_2.vmdk"
scsi0:3.ctkEnabled = "TRUE"
scsi0:3.present = "TRUE"
scsi0:3.redo = ""
```
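As far as I understand, a genuine delta/snapshot disk descriptor contains parentCID and parentFileNameHint entries pointing at its base disk, while a base disk has parentCID=ffffffff and no hint. This is a small sketch I put together to check what SERVERNAME_2-000001.vmdk really is (the file path is a placeholder; the idea is to copy the descriptor off the datastore first, e.g. via the datastore browser, and point the script at the copy):

```python
# Sketch: inspect a VMDK descriptor to see whether it is a delta (snapshot)
# disk or a base disk. A delta descriptor carries a parentFileNameHint line
# pointing at its parent; a base disk has parentCID=ffffffff and no hint.
import os
import re

def parse_vmdk_descriptor(text):
    """Return the key=value settings from a VMDK descriptor file."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        m = re.match(r'(\w[\w.]*)\s*=\s*"?([^"]*)"?', line)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings

def is_delta_disk(settings):
    """A descriptor with a parent link is a snapshot/delta disk."""
    return ("parentFileNameHint" in settings
            or settings.get("parentCID", "ffffffff").lower() != "ffffffff")

if __name__ == "__main__":
    # Placeholder path: adjust to wherever the descriptor copy ends up.
    path = "SERVERNAME_2-000001.vmdk"
    if os.path.exists(path):
        with open(path) as f:
            settings = parse_vmdk_descriptor(f.read())
        print("delta disk" if is_delta_disk(settings) else "base disk")
```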
vCenter tells me that the server needs a disk consolidation, but I’m afraid to run it, fearing that it might actually try to merge SERVERNAME_2-000001 and SERVERNAME_2 together, and that this may have been the problem to begin with (when the Veeam backup job started).
I have no idea whether the names were already wrong from the start or whether that is a result of the failed backup job. I’ve read a few KB articles, and some suggest creating a new snapshot and then deleting it, but that seems too risky to me on this server, as I don’t have a backup. For the same reason, I don’t want to run the consolidate option in the snapshots menu.
How does vCenter / ESX actually detect that a consolidation is needed? Is it just based on the file names? If I were to rename SERVERNAME_2-000001 to SERVERNAME_3 and then update the VMX file, would that work and make the vCenter warning go away?
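Concretely, the rename I have in mind would leave the VMX entry looking like this (assuming I also rename the underlying data/flat files to match):

```
scsi0:0.fileName = "SERVERNAME_3.vmdk"
```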
Any help is greatly appreciated.