Created attachment 1358114 [details]
engin.log and screen shot

Description of problem:
Sometimes a situation occurs in which an HA VM with a lease cannot be started (screenshot and engine.log are attached).

Version-Release number of selected component (if applicable):
rhvm-4.2.0-0.5.master.el7.noarch

How reproducible:
80%

Steps to Reproduce:
1. Create a VM with a SCSI disk.
2. Enable HA with the lease on scsi_0. Wait until the task is completed.
3. Then try to run the VM.

Actual results:
Error: Cannot run VM. Invalid VM lease. Please note that it may take few minutes to create the lease.

Expected results:
The VM runs.

Additional info:
In engine.log (attached) we see:

START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{vmId='76185518-a843-4ea5-83cf-3758068a241d'}), log id: 265fc1b6
2017-11-23 11:06:36,211+02 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-20) [d766f754-ffa6-4ffa-9f0c-46d12c7c5524] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 265fc1b6
2017-11-23 11:06:36,282+02 WARN [org.ovirt.engine.core.bll.RunVmCommand] (default task-20) [d766f754-ffa6-4ffa-9f0c-46d12c7c5524] Validation of action 'RunVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_INVALID_VM_LEASE
2017-11-23 11:06:36,284+02 INFO [org.ovirt.engine.core.bll.RunVmCommand] (default task-20) [d766f754-ffa6-4ffa-9f0c-46d12c7c5524] Lock freed to object 'EngineLock:{exclusiveLocks='[76185518-a843-4ea5-83cf-3758068a241d=VM]', sharedLocks=''}'
Tal, anything to improve around lease allocation?
About the actual creation? I really doubt it; it should be a quick operation. The task polling takes most of the time, I guess.
So why does the message say "Please note that it may take few minutes to create the lease."? :) Without a way to detect that the lease is still being created, it is problematic not to lock the VM. I suppose it's a similar situation as with disks, but there you have the ImageLocked state you can check for individual disks; here you have nothing to look at. If it is indeed quick, we can just lock the VM for the duration. Or should we provide a different way to check whether the lease is ready?
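For what it's worth, a minimal sketch of the ImageLocked-style alternative suggested above. Every name here (VmLeaseStatus, canRunWithLease) is hypothetical, not existing engine code:

// Hypothetical sketch, not actual ovirt-engine code: a lease-readiness state
// analogous to ImageLocked for disks, so RunVm could tell "still creating"
// apart from "creation failed". All names are made up for illustration.
public final class VmLeaseReadiness {

    public enum VmLeaseStatus {
        NONE,      // no lease configured for the VM
        CREATING,  // AddVmLease still running; RunVm should wait or warn
        READY,     // lease created and persisted; RunVm may proceed
        FAILED     // creation failed; show a distinct error, not "invalid lease"
    }

    // RunVm validation could consult this instead of a bare null check.
    public static boolean canRunWithLease(VmLeaseStatus status) {
        return status == VmLeaseStatus.NONE || status == VmLeaseStatus.READY;
    }
}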
The bug is not about the timing; sometimes after adding the lease the VM can never be started. I've just reproduced this behavior:

1. The VM is up. Add HA with the nfs_0 lease. Stop the VM.
2. Try to run it - I get the message that it takes time to add the lease, but actually there are no tasks in progress. This VM will not run no matter how long you wait.

I learned that there is a command to see the leases on the host, and I actually can't see the added lease there:

sanlock client status
daemon 13448199-e9e0-4368-8763-6a2df3fedc9c.cougar05.s
p -1 helper
p -1 listener
p -1 status
s 19cc5ef8-f4ea-463a-b6fd-025c624dfbbf:2:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_Compute__NFS_GE_compute-ge-3_nfs__1/19cc5ef8-f4ea-463a-b6fd-025c624dfbbf/dom_md/ids:0
s 26d8f98f-ff9c-444e-bb9c-b60c84b5dc10:2:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_QE__images_GEs_GE__compute__3/26d8f98f-ff9c-444e-bb9c-b60c84b5dc10/dom_md/ids:0
s ee3ad6ec-c3f6-4801-997a-b8027d99837b:2:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_Compute__NFS_GE_compute-ge-3_nfs__0/ee3ad6ec-c3f6-4801-997a-b8027d99837b/dom_md/ids:0
s c0d9097d-09a3-476f-9209-8853ba1205e9:2:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_Compute__NFS_GE_compute-ge-3_nfs__2/c0d9097d-09a3-476f-9209-8853ba1205e9/dom_md/ids:0
s 585d4e5d-2c4d-4c3a-9683-50db5d87a4cd:2:/dev/585d4e5d-2c4d-4c3a-9683-50db5d87a4cd/ids:0
s 946c3c7b-5175-4833-9419-a3eb124c6171:2:/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com\:_virt__local__ge1__volume__0/946c3c7b-5175-4833-9419-a3eb124c6171/dom_md/ids:0

From the engine log:

2017-11-30 10:51:43,597+02 INFO [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-3) [dfc758d7-3235-4c65-8d54-0ccb33766ad7] Lock Acquired to object 'EngineLock:{exclusiveLocks='[vm_from_templ1=VM_NAME]', sharedLocks='[76185518-a843-4ea5-83cf-3758068a241d=VM]'}'
2017-11-30 10:51:43,679+02 INFO [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-3) [dfc758d7-3235-4c65-8d54-0ccb33766ad7] Running command: UpdateVmCommand internal: false. Entities affected : ID: 76185518-a843-4ea5-83cf-3758068a241d Type: VMAction group EDIT_VM_PROPERTIES with role type USER
2017-11-30 10:51:43,785+02 INFO [org.ovirt.engine.core.bll.UpdateRngDeviceCommand] (default task-3) [7789d8f0] Running command: UpdateRngDeviceCommand internal: true. Entities affected : ID: 76185518-a843-4ea5-83cf-3758068a241d Type: VMAction group EDIT_VM_PROPERTIES with role type USER
2017-11-30 10:51:43,998+02 INFO [org.ovirt.engine.core.bll.UpdateGraphicsDeviceCommand] (default task-3) [53ef1c0c] Running command: UpdateGraphicsDeviceCommand internal: true. Entities affected : ID: 76185518-a843-4ea5-83cf-3758068a241d Type: VMAction group EDIT_VM_PROPERTIES with role type USER
2017-11-30 10:51:44,031+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-3) [53ef1c0c] EVENT_ID: USER_UPDATE_VM(35), VM vm_from_templ1 configuration was updated by admin@internal-authz.
2017-11-30 10:51:44,039+02 INFO [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-3) [53ef1c0c] Lock freed to object 'EngineLock:{exclusiveLocks='[vm_from_templ1=VM_NAME]', sharedLocks='[76185518-a843-4ea5-83cf-3758068a241d=VM]'}'
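If anyone wants to script that host-side check, here is a minimal sketch (my own helper, not part of oVirt or sanlock tooling) that runs the same `sanlock client status` command and reports whether any line mentions the VM ID. Note that held VM leases appear as resource ("r") lines named after the VM ID, while a lease that exists on disk but is not held by a running VM may not appear here at all:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helper, not part of oVirt: runs `sanlock client status` on the
// host (requires sanlock installed and sufficient privileges) and reports
// whether any lockspace or resource line mentions the given VM ID.
public final class SanlockLeaseCheck {

    public static boolean vmLeaseVisible(String vmId) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("sanlock", "client", "status").start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.contains(vmId)) {
                    return true;
                }
            }
        }
        p.waitFor();
        return false;
    }

    public static void main(String[] args) throws Exception {
        // VM ID taken from the engine log above
        System.out.println(vmLeaseVisible("76185518-a843-4ea5-83cf-3758068a241d"));
    }
}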
The root cause of this bug and of https://bugzilla.redhat.com/show_bug.cgi?id=1507214 seems to be the same. The issue in both bugs is that when a VM update starts while the VM status is switching, or the status is not UP / DOWN, the creation of the lease does not take place, even though the engine sets the selected storage domain as the lease holder. Then, when trying to run the VM, there is a validation that checks that the lease info, which should be initialized at the end of the AddVmLease command, is not null. This validation fails, and the error presented is that the lease is invalid and the VM cannot run. So I suggest setting this bug as dependent on / a duplicate of bug https://bugzilla.redhat.com/show_bug.cgi?id=1507214.
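To make the failure mode concrete, here is a minimal sketch of the kind of null check described above (assumed names, not the real ovirt-engine validation code):

import java.util.UUID;

// Illustrative sketch of the failing validation described above; class and
// field names are assumptions, not the actual RunVmCommand code. The engine
// has already stored a lease storage domain for the VM, but AddVmLease never
// populated the lease info, so the null check below fails on every run attempt.
public final class RunVmLeaseValidationSketch {

    // Minimal stand-in for the two VM fields relevant to the check.
    record VmLeaseState(UUID leaseStorageDomainId, String leaseInfo) {}

    static String validateLease(VmLeaseState vm) {
        if (vm.leaseStorageDomainId() == null) {
            return "VALID"; // no lease requested, nothing to validate
        }
        if (vm.leaseInfo() == null) {
            // The branch this bug hits: domain set by UpdateVm, lease never
            // created, so the VM can never pass validation.
            return "ACTION_TYPE_FAILED_INVALID_VM_LEASE";
        }
        return "VALID";
    }

    public static void main(String[] args) {
        VmLeaseState broken = new VmLeaseState(UUID.randomUUID(), null);
        System.out.println(validateLease(broken)); // the failure seen in the log
    }
}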
Polina, did you stop the VM right after the VM update?
Hi Eyal, here is the scenario where it is reproduced quite easily:

1. Run the VM, open the Edit window, check HA and choose the nfs_0 lease (one of three NFS leases: nfs_0, nfs_1, nfs_2). As a result you get the "Pending VM changes" dialog; click OK in it.
2. Power off the VM, then Run it again - the VM will never run. You get the window:

Error while executing action: golden_env_mixed_virtio_2: Cannot run VM. Invalid VM lease. Please note that it may take few minutes to create the lease.
Verified in ovirt-engine-4.2.1.1-0.1.el7.noarch.
This bugzilla is included in the oVirt 4.1.9 release, published on Jan 24th 2018. Since the problem described in this bug report should be resolved in that release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.