vMotion issue when VM on a VVOL Datastore

Hi,

 

i installed im my lab with a vSphere 7.0.1 enviroment the NetApp VSC 9.7.1 wich brings the VASA provider for VVOLs to thest them on my NetApp Storage connected via NFS.

 

Installation works without issues and even the migration from the NFS 4.1 datastores to the VVOL datastore.

 

Then i noticed a problem with the VCSA: VCSA on a VVOL? Is this supported?

 

After a little bit more testing i noticed that this is a general vMotion problem when the VM is on a VVOL datastore. When the same VM is on a FS 4.1 datastore i have no issue.

 

I opened a case first at NetApp, but they didin’t found something in the logs and told me to open a case at vmware too. There i opened one week ago a case and provided all logs. But they took a very long time to decide which department should work on the case. Yesterday i talked first time with the support and today we finished the tests to be clear, yes the problem is only during a normal vMotion and only when the VM is on a VVOL. So he will forward the case to the storage guys.

 

While i’m waiting for the support, i think i write down here my issue perhaps someone esle had a similar problem…

 

Problem:

 

vMotion from one “host” to a other “host” stuck at 85% and then VM freezes for about 30 sec, then the vMotion continues and finishes. Then the VM is running again.

 

Here i have some log parts:

 

Log of the VM from source-host:

2020-10-14T15:52:14.200Z| vmx| W003: VMX has left the building: 0.

 

VMKernel from source-host:

2020-10-14T15:52:14.251Z cpu4:2105329)VVol: VVolRemoveDev:7163: Unlinking (VVOL_OBJTYPE_VMDK) VVol device rfc4122.80207299-548e-459c-bc0c-4d45318cfae2

2020-10-14T15:52:14.332Z cpu18:2099869)VVol: VVolRemoveDev:7163: Unlinking (VVOL_OBJTYPE_CONFIG) VVol device rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0

 

The VM has left the host at 17:52:14, so it must be started in the same sec on the destination…

 

Log of the VM from the destination-host:

2020-10-14T15:52:14.190Z| vcpu-0| I005: Transitioned vmx/execState/val to poweredOn

2020-10-14T15:52:14.191Z| vcpu-0| I005: MigrateSetState: Transitioning from state 12 to 0.

2020-10-14T15:52:54.205Z| vmx| I005: DiskUpgradeMultiwriter: Upgraded open disk ‘scsi0:0’ from multiwriter.

 

Here is a large gap between the sec 14 and 54 in the log, there is no message.

 

VMKernel from the destination-host:

2020-10-14T15:52:12.956Z cpu3:2103898)VVol: VVolMakeDev:6740: Creating a device for rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0 (Type VVOL_OBJTYPE_CONFIG)

2020-10-14T15:52:13.264Z cpu16:2103911)VVol: VVolMakeDev:6740: Creating a device for rfc4122.80207299-548e-459c-bc0c-4d45318cfae2 (Type VVOL_OBJTYPE_UNKNOWN)

2020-10-14T15:52:14.190Z cpu25:2103920)Hbr: 3731: Migration end received (worldID=2103906) (migrateType=1) (event=1) (isSource=0) (sharedConfig=1)

2020-10-14T15:52:14.191Z cpu8:2103915)VMotion: 3230: 8288837917254555216 D: VMotion bandwidth in last 1s: 27 MB/s,

2020-10-14T15:52:14.194Z cpu3:2103923)Swap: vm 2103906: 5135: Finish swapping in migration swap file. (faulted 0 pages). Success.

2020-10-14T15:52:44.200Z cpu25:2103905)NFSLock: 3302: lock .lck-1c7bdce900000000 expired: counter prev 584 3fc5805f-1e9c2009-3763-ac1f6bc58788 : curr 584 3fc5805f-1e9c2009-3763-ac1f6bc58788 (loop count 3)

 

This message i’m wondering about…

 

Hostd from the destination-host:

2020-10-14T15:52:13.138Z verbose hostd[2099792] [Originator@6876 sub=Vigor.Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 – Web-Server.vmx] VMotion destination started; powering on

2020-10-14T15:52:13.213Z info hostd[2100209] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 – Web-Server.vmx] VigorMigrateNotifyCb:: hostlog state changed from emigrating to none

2020-10-14T15:52:54.219Z verbose hostd[2100094] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 – Web-Server.vmx] VMotionStatusCb [8288837917254555216]: Succeeded

2020-10-14T15:52:54.219Z verbose hostd[2100094] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 – Web-Server.vmx] VMotionStatusCb: Firing ResolveCb

2020-10-14T15:52:54.219Z info hostd[2100094] [Originator@6876 sub=Vcsvc.VMotionDst.8288837917254555216] ResolveCb: VMX reports needsUnregister = false for migrateType MIGRATE_TYPE_VMOTION

2020-10-14T15:52:54.219Z info hostd[2100094] [Originator@6876 sub=Vcsvc.VMotionDst.8288837917254555216] ResolveCb: Succeeded

2020-10-14T15:52:54.220Z info hostd[2100094] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 – Web-Server.vmx] Disk access enabled.

2020-10-14T15:52:54.221Z info hostd[2100094] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 – Web-Server.vmx] State Transition (VM_STATE_IMMIGRATING -> VM_STATE_ON)

2020-10-14T15:52:54.225Z info hostd[2100094] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 – Web-Server.vmx] Send config update invoked

 

Here the same gap. Here i’m wondering abut the message „Disk access enabled“ in the sec 54, why so late?

 

The main question, what happens between the sec 14 und 54 and how to fix that?

 

Kind regards

Stefan

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.