Description of problem:
After upgrading from RHV 4.1 to 4.2, HA VMs fail to start, reporting "Invalid VM lease. Please note that it may take few minutes to create the lease."

Version-Release number of selected component (if applicable):
RHV 4.2.7

How reproducible:
100% at this customer site

Steps to Reproduce:
1. Have HA VMs running with a VM lease
2. Upgrade from RHV 4.1 to 4.2
3. Try to reboot the VMs

Actual results:
"VMNAME: Cannot run VM. Invalid VM lease. Please note that it may take few minutes to create the lease."

Expected results:
VMs start normally after the RHV upgrade.

Additional info:
An HA VM with a VM lease should normally have a non-null lease_sd_id field in the vm_static table and a non-null lease_info field in the vm_dynamic table of the RHV database. In this case, the lease_info field in vm_dynamic was empty.
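For triage, here is a minimal sketch of a script that lists affected VMs, i.e. those with a lease storage domain configured in vm_static but no stored lease info in vm_dynamic. The database name, credentials, and the vm_guid join column are assumptions based on the field names above; adjust for your deployment.

#!/usr/bin/env python3
# Minimal sketch: list HA VMs that have a lease storage domain configured
# (vm_static.lease_sd_id) but no stored lease info (vm_dynamic.lease_info).
# Database name, user, host, and the vm_guid join column are assumptions.
import psycopg2

conn = psycopg2.connect(dbname="engine", user="engine", host="localhost")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT s.vm_name
          FROM vm_static s
          JOIN vm_dynamic d ON d.vm_guid = s.vm_guid
         WHERE s.lease_sd_id IS NOT NULL
           AND (d.lease_info IS NULL OR d.lease_info = '')
    """)
    for (vm_name,) in cur.fetchall():
        print(vm_name)
conn.close()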
logs?
I am one of the affected customers; an ovirt-log-collector report and a few sosreports are attached to case 02272652.
A simple workaround until a proper fix is introduced: for every HA VM with a lease that has not restarted since the upgrade to 4.2 (i.e., those with a pending configuration change), change the storage domain that the lease is configured on.

If you only have one storage domain, you can instead disable the lease and re-enable it once the first configuration change has finished.

For the devs: if any additional logs are needed, feel free to reach out to me directly.
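In case anyone wants to script the workaround, here is a minimal sketch using the Python ovirt-engine-sdk4. The engine URL, credentials, VM name, and target storage domain ID are placeholders:

# Minimal sketch of the workaround using ovirt-engine-sdk4: point the VM
# lease at a different storage domain so the engine recreates the lease.
# URL, credentials, VM name, and the target domain ID are placeholders.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url="https://engine.example.com/ovirt-engine/api",
    username="admin@internal",
    password="password",
    insecure=True,  # or pass ca_file=... in production
)
vms_service = connection.system_service().vms_service()
vm = vms_service.list(search="name=myvm")[0]
vm_service = vms_service.vm_service(vm.id)

# Re-point the lease to another storage domain. With a single domain,
# update with an empty StorageDomainLease() first to drop the lease,
# then set it back once the removal has completed.
vm_service.update(
    types.Vm(
        lease=types.StorageDomainLease(
            storage_domain=types.StorageDomain(id="TARGET-SD-UUID"),
        ),
    ),
)
connection.close()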
Re-targeting to 4.3.1 since it is missing a patch, an acked blocker flag, or both
probably related to the change early in 4.2 - https://gerrit.ovirt.org/#/c/86504/
(In reply to Michal Skrivanek from comment #11)
> probably related to the change early in 4.2 -
> https://gerrit.ovirt.org/#/c/86504/

I'll still be the one to blame, yet I think it is more likely a consequence of https://gerrit.ovirt.org/#/c/79226/. In 4.1, the lease_info was not stored on the engine side. In 4.2 and above, we expect the lease_info to be stored in the database; VMs with a lease that were created in 4.1 will lack it, so we should probably either recreate the lease or fetch the lease_info from the host.
*** Bug 1697313 has been marked as a duplicate of this bug. ***
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops
(In reply to RHV Bugzilla Automation and Verification Bot from comment #34)
> WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:
>
> [Found non-acked flags: '{}', ]
>
> For more info please contact: rhv-devops

Who handles these flags?
As mentioned before, this is pm_ack and qa_ack
(In reply to Ryan Barry from comment #36)
> As mentioned before, this is pm_ack and qa_ack

Could we set the qa flags accordingly?
(In reply to Ryan Barry from comment #36)
> As mentioned before, this is pm_ack and qa_ack

Can we set the PM flags accordingly?
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops
Hi,

The bug is still reproducible.

I installed baseurl=http://bob.eng.lab.tlv.redhat.com/builds/4.1/rhv-4.1.11-3/ (rhel-7.5), then created and ran an HA VM with a lease.
Then I updated RHV to http://bob-dr.lab.eng.brq.redhat.com/builds/4.2/rhv-4.2.13-2 (rhel-7.6).
The VMs are up, but after a shutdown such a VM could not be started:

2020-02-24 13:03:57,014+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-2) [1182d179-9513-4bd4-a23d-8a2b3f14e9b0] EVENT_ID: USER_FAILED_RUN_VM(54), Failed to run VM vm_4.1_with_lease due to a failed validation: [Cannot run VM. Invalid VM lease. Please note that it may take few minutes to create the lease.] (User: admin@internal-authz).
2020-02-24 13:03:57,014+02 WARN [org.ovirt.engine.core.bll.RunVmCommand] (default task-2) [1182d179-9513-4bd4-a23d-8a2b3f14e9b0] Validation of action 'RunVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_INVALID_VM_LEASE

The setup is available now. Please let me know if you would like to see it.
(In reply to Polina from comment #45)
> The bug is still reproducible.
> [...]
> The setup is available now. Please let me know if you would like to see it.

I checked Polina's environment. She is testing with version 4.2.8.9-0.1.el7ev, but we did not backport the fix to 4.2. The fix was backported to the 4.3 branch [1].

Please retest the upgrade with version 4.3.5 or later.

Thank you.

[1] https://gerrit.ovirt.org/#/c/100549/
After the upgrade from 4.1 to 4.3, the HA VMs with leases start and reboot successfully.
Such an upgrade goes through 4.2, i.e., 4.1 -> 4.2 -> 4.3.
So, for the VMs that were created in 4.1 and rebooted in 4.3, everything is correct.
If the VMs were rebooted in the middle, on 4.2, and failed there, such VMs cannot be started in 4.3 either.

Please confirm that this verifies the fix.
(In reply to Polina from comment #48)
> After the upgrade from 4.1 to 4.3, the HA VMs with leases start and reboot successfully.
> [...]
> Please confirm that this verifies the fix.

Yes, VMs upgraded from 4.1 to 4.2 will fail in 4.2. Only when we upgrade further to 4.3 will the engine be backward compatible. So yes, this verifies the functionality.
Keeping this summary here:

Cause: When the lease info data was moved from the vm_static to the vm_dynamic DB table, there was no consideration that upgrading from 4.1 to later versions would leave the lease info data empty even when a lease Storage Domain ID had been specified. This caused validation to fail when the VM was launched, so the VM could no longer run without the user resetting the lease Storage Domain ID.

Consequence: HA VMs with lease Storage Domain IDs fail validation when run.

Fix: The validation was removed, and the lease information data is now regenerated automatically when a lease Storage Domain ID is set but the lease information data is missing. The VM therefore has the lease information data it needs to run.
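To illustrate the shape of that fix (a hedged sketch only; the actual engine code is Java, and all names here, including the regenerate_lease helper, are illustrative):

# Hedged pseudocode of the fix described above; regenerate_lease is a
# hypothetical helper standing in for the engine's lease-creation flow.
def ensure_vm_lease(vm):
    if vm.lease_sd_id is None:
        return  # no lease configured, nothing to do
    if vm.lease_info is None:
        # Previously: validation failed here and the VM could not run.
        # Fixed behavior: regenerate the lease info on the lease domain.
        vm.lease_info = regenerate_lease(vm.vm_id, vm.lease_sd_id)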
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:3246