Hi,
In my lab, which runs a vSphere 7.0.1 environment, I installed NetApp VSC 9.7.1, which brings the VASA provider for VVOLs, to test them on my NetApp storage connected via NFS.
The installation worked without issues, and so did the migration from the NFS 4.1 datastores to the VVOL datastore.
Then I noticed a problem with the VCSA: VCSA on a VVOL? Is this supported?
After a bit more testing I noticed that this is a general vMotion problem whenever the VM is on a VVOL datastore. When the same VM is on an NFS 4.1 datastore, I have no issue.
I first opened a case with NetApp, but they didn’t find anything in the logs and told me to open a case with VMware as well. I opened one there a week ago and provided all the logs, but it took them a very long time to decide which department should work on the case. Yesterday I spoke with support for the first time, and today we finished the tests, which confirmed that the problem occurs only during a normal vMotion and only when the VM is on a VVOL. The engineer will now forward the case to the storage team.
While I’m waiting for support, I thought I would write down my issue here; perhaps someone else has had a similar problem…
Problem:
A vMotion from one “host” to another “host” gets stuck at 85%, and the VM freezes for about 30 seconds; then the vMotion continues and finishes, and the VM is running again.
Here are some log excerpts:
Log of the VM from source-host:
2020-10-14T15:52:14.200Z| vmx| W003: VMX has left the building: 0.
VMKernel from source-host:
2020-10-14T15:52:14.251Z cpu4:2105329)VVol: VVolRemoveDev:7163: Unlinking (VVOL_OBJTYPE_VMDK) VVol device rfc4122.80207299-548e-459c-bc0c-4d45318cfae2
2020-10-14T15:52:14.332Z cpu18:2099869)VVol: VVolRemoveDev:7163: Unlinking (VVOL_OBJTYPE_CONFIG) VVol device rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0
The VM left the source host at 17:52:14 local time (15:52:14Z in the logs), so it must have started within the same second on the destination…
Log of the VM from the destination-host:
2020-10-14T15:52:14.190Z| vcpu-0| I005: Transitioned vmx/execState/val to poweredOn
2020-10-14T15:52:14.191Z| vcpu-0| I005: MigrateSetState: Transitioning from state 12 to 0.
2020-10-14T15:52:54.205Z| vmx| I005: DiskUpgradeMultiwriter: Upgraded open disk 'scsi0:0' from multiwriter.
There is a large gap in this log between second 14 and second 54 (15:52:14 to 15:52:54) with no messages at all.
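To find such silent periods without reading the whole file by eye, a small Python sketch like the following could help. It assumes that every relevant line starts with an ISO timestamp followed by `|`, as in the vmware.log lines above. The function names `parse_ts` and `find_gaps` and the 5-second threshold are just my own choices for illustration.

```python
from datetime import datetime, timedelta

def parse_ts(line):
    """Parse the leading timestamp of a vmware.log line (the text before the first '|')."""
    stamp = line.split("|", 1)[0].strip()
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")

def find_gaps(lines, threshold=timedelta(seconds=5)):
    """Return (previous_ts, next_ts, gap) for consecutive entries further apart than threshold."""
    gaps = []
    prev = None
    for line in lines:
        try:
            ts = parse_ts(line)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps

# The three destination-host VM log lines quoted above.
sample = [
    "2020-10-14T15:52:14.190Z| vcpu-0| I005: Transitioned vmx/execState/val to poweredOn",
    "2020-10-14T15:52:14.191Z| vcpu-0| I005: MigrateSetState: Transitioning from state 12 to 0.",
    "2020-10-14T15:52:54.205Z| vmx| I005: DiskUpgradeMultiwriter: Upgraded open disk 'scsi0:0' from multiwriter.",
]

for start, end, gap in find_gaps(sample):
    print(f"{start.time()} -> {end.time()}: {gap.total_seconds():.3f} s without a log entry")
```

For the three lines above this reports a single gap of roughly 40 seconds. To run it against a full log, pass the open file object instead of `sample`.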
VMKernel from the destination-host:
2020-10-14T15:52:12.956Z cpu3:2103898)VVol: VVolMakeDev:6740: Creating a device for rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0 (Type VVOL_OBJTYPE_CONFIG)
2020-10-14T15:52:13.264Z cpu16:2103911)VVol: VVolMakeDev:6740: Creating a device for rfc4122.80207299-548e-459c-bc0c-4d45318cfae2 (Type VVOL_OBJTYPE_UNKNOWN)
2020-10-14T15:52:14.190Z cpu25:2103920)Hbr: 3731: Migration end received (worldID=2103906) (migrateType=1) (event=1) (isSource=0) (sharedConfig=1)
2020-10-14T15:52:14.191Z cpu8:2103915)VMotion: 3230: 8288837917254555216 D: VMotion bandwidth in last 1s: 27 MB/s,
2020-10-14T15:52:14.194Z cpu3:2103923)Swap: vm 2103906: 5135: Finish swapping in migration swap file. (faulted 0 pages). Success.
2020-10-14T15:52:44.200Z cpu25:2103905)NFSLock: 3302: lock .lck-1c7bdce900000000 expired: counter prev 584 3fc5805f-1e9c2009-3763-ac1f6bc58788 : curr 584 3fc5805f-1e9c2009-3763-ac1f6bc58788 (loop count 3)
This NFSLock message is the one I’m wondering about…
Hostd from the destination-host:
2020-10-14T15:52:13.138Z verbose hostd[2099792] [Originator@6876 sub=Vigor.Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 – Web-Server.vmx] VMotion destination started; powering on
2020-10-14T15:52:13.213Z info hostd[2100209] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 – Web-Server.vmx] VigorMigrateNotifyCb:: hostlog state changed from emigrating to none
2020-10-14T15:52:54.219Z verbose hostd[2100094] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 – Web-Server.vmx] VMotionStatusCb [8288837917254555216]: Succeeded
2020-10-14T15:52:54.219Z verbose hostd[2100094] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 – Web-Server.vmx] VMotionStatusCb: Firing ResolveCb
2020-10-14T15:52:54.219Z info hostd[2100094] [Originator@6876 sub=Vcsvc.VMotionDst.8288837917254555216] ResolveCb: VMX reports needsUnregister = false for migrateType MIGRATE_TYPE_VMOTION
2020-10-14T15:52:54.219Z info hostd[2100094] [Originator@6876 sub=Vcsvc.VMotionDst.8288837917254555216] ResolveCb: Succeeded
2020-10-14T15:52:54.220Z info hostd[2100094] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 – Web-Server.vmx] Disk access enabled.
2020-10-14T15:52:54.221Z info hostd[2100094] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 – Web-Server.vmx] State Transition (VM_STATE_IMMIGRATING -> VM_STATE_ON)
2020-10-14T15:52:54.225Z info hostd[2100094] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 – Web-Server.vmx] Send config update invoked
The same gap shows up here. I’m also wondering about the message “Disk access enabled” at second 54: why does it come so late?
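To line up the events from the different logs, the timestamps can be put on one timeline relative to the “Migration end received” message. The following Python sketch does that. The timestamps are copied from the excerpts above; the labels and the function name `offsets` are only my own shorthand.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Timestamps copied from the destination-host excerpts in this post; labels are shorthand.
events = [
    ("vmkernel: Migration end received", "2020-10-14T15:52:14.190Z"),
    ("vmkernel: Finish swapping in migration swap file", "2020-10-14T15:52:14.194Z"),
    ("vmkernel: NFSLock lock expired", "2020-10-14T15:52:44.200Z"),
    ("vmware.log: DiskUpgradeMultiwriter", "2020-10-14T15:52:54.205Z"),
    ("hostd: Disk access enabled", "2020-10-14T15:52:54.220Z"),
]

def offsets(events):
    """Return (label, seconds since the first event) for each event, in the given order."""
    base = datetime.strptime(events[0][1], FMT)
    return [
        (label, (datetime.strptime(ts, FMT) - base).total_seconds())
        for label, ts in events
    ]

for label, secs in offsets(events):
    print(f"+{secs:7.3f} s  {label}")
```

On this timeline the NFSLock expiry is logged about 30 seconds after the migration end, and the disk messages follow about 10 seconds after that. This only shows the timing; it does not prove that the lock is the cause.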
The main question: what happens between second 14 and second 54, and how can I fix it?
Kind regards
Stefan