Bug 1659574
| Summary: | Highly Available (HA) VMs with a VM lease failed to start after a 4.1 to 4.3 upgrade. |
|---|---|
| Product: | Red Hat Enterprise Virtualization Manager |
| Reporter: | Frank DeLorey <fdelorey> |
| Component: | vdsm |
| Assignee: | Steven Rosenberg <srosenbe> |
| Status: | CLOSED ERRATA |
| QA Contact: | Polina <pagranat> |
| Severity: | high |
| Docs Contact: | |
| Priority: | unspecified |
| Version: | 4.2.7 |
| CC: | ahadas, fdelorey, jortialc, klaas, lleistne, lrotenbe, lsurette, mavital, michal.skrivanek, mjankula, mkalinin, mtessun, pagranat, pvilayat, srevivo, srosenbe, ycui |
| Target Milestone: | ovirt-4.4.0 |
| Keywords: | ZStream |
| Target Release: | 4.3.0 |
| Flags: | lsvaty: testing_plan_complete- |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | |
| Fixed In Version: | |
| Doc Type: | Bug Fix |
| Doc Text: | Previously, after upgrading RHV 4.1 to a later version, high-availability virtual machines (HA VMs) failed validation and did not run. To run the VMs, the user had to reset the lease Storage Domain ID. The current release fixes this issue: it removes the validation and regenerates the lease information data when the lease Storage Domain ID is set. After upgrading RHV 4.1, HA VMs with lease Storage Domain IDs run. |
| Story Points: | --- |
| Clone Of: | |
| : | 1716951 (view as bug list) |
| Environment: | |
| Last Closed: | 2020-08-04 13:26:25 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | Virt |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Bug Depends On: | |
| Bug Blocks: | 1703275, 1716951 |
Description (Frank DeLorey, 2018-12-14 17:09:59 UTC)
logs?

I am one of the affected customers; ovirt-log-collector and a few sos reports are attached to case 02272652.

A simple workaround, until a proper fix is introduced, is to change the storage domain that the lease is configured on for every HA VM with a lease that has not restarted since the upgrade to 4.2 (i.e., those with a pending configuration). If you have only one storage domain, you can instead disable the lease and re-enable it after the first configuration change has finished. For the devs: if any additional logs are needed, feel free to reach out to me directly.

Re-targeting to 4.3.1 since it is missing a patch, an acked blocker flag, or both.

Probably related to the change early in 4.2: https://gerrit.ovirt.org/#/c/86504/

(In reply to Michal Skrivanek from comment #11)
> probably related to the change early in 4.2 -
> https://gerrit.ovirt.org/#/c/86504/

I'll still be the one to blame, yet I think it is most likely a consequence of https://gerrit.ovirt.org/#/c/79226/. In 4.1 the lease_info was not stored on the engine side. In 4.2 and above, we expect the lease_info to be stored in the database; VMs with a lease that were created in 4.1 will lack it, so we should probably either recreate the lease or fetch the lease_info from the host.

*** Bug 1697313 has been marked as a duplicate of this bug. ***

sync2jira

WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:
[Found non-acked flags: '{}', ]
For more info please contact: rhv-devops
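The root cause described above (from 4.2 on, the engine expects `lease_info` to be stored in its database, but VMs created in 4.1 lack it) and the proposed workaround can be sketched with a minimal, self-contained Python model. This is an illustrative simulation only, not actual oVirt engine code; the record fields and the regeneration helper are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VmRecord:
    """Illustrative stand-in for the engine's stored view of a VM."""
    name: str
    lease_sd_id: Optional[str]   # storage domain chosen for the HA lease
    lease_info: Optional[dict]   # persisted by the engine in 4.2+; None for VMs created in 4.1


def validate_lease_4_2(vm: VmRecord) -> bool:
    # 4.2-style validation: a lease storage domain without stored lease_info
    # fails (the ACTION_TYPE_FAILED_INVALID_VM_LEASE case from the logs).
    if vm.lease_sd_id is None:
        return True
    return vm.lease_info is not None


def apply_workaround(vm: VmRecord, sd_id: str) -> None:
    # Model of the workaround: re-setting the lease storage domain makes the
    # engine store fresh lease_info for the VM (values here are made up).
    vm.lease_sd_id = sd_id
    vm.lease_info = {"sd_id": sd_id, "offset": 3145728}


vm = VmRecord("vm_4.1_with_lease", lease_sd_id="sd-old", lease_info=None)
print(validate_lease_4_2(vm))   # False: the upgraded VM fails validation
apply_workaround(vm, "sd-new")
print(validate_lease_4_2(vm))   # True: lease_info has been regenerated
```

This also shows why disabling and re-enabling the lease works when only one storage domain exists: either way, the configuration change forces the engine to write new lease_info.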
(In reply to RHV Bugzilla Automation and Verification Bot from comment #34)
> WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:
> [Found non-acked flags: '{}', ]
> For more info please contact: rhv-devops

Who handles these flags?

As mentioned before, this is pm_ack and qa_ack.

(In reply to Ryan Barry from comment #36)
> As mentioned before, this is pm_ack and qa_ack

Could we set the qa flags accordingly?

(In reply to Ryan Barry from comment #36)
> As mentioned before, this is pm_ack and qa_ack

Can we set the PM flags accordingly?

WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:
[Found non-acked flags: '{}', ]
For more info please contact: rhv-devops
Hi,

The bug is still reproducible.

I installed baseurl=http://bob.eng.lab.tlv.redhat.com/builds/4.1/rhv-4.1.11-3/ (rhel-7.5), then created and ran an HA VM with a lease. I then updated RHV to http://bob-dr.lab.eng.brq.redhat.com/builds/4.2/rhv-4.2.13-2 (rhel-7.6). The VMs are up, but after shutdown such a VM could not be started:

2020-02-24 13:03:57,014+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-2) [1182d179-9513-4bd4-a23d-8a2b3f14e9b0] EVENT_ID: USER_FAILED_RUN_VM(54), Failed to run VM vm_4.1_with_lease due to a failed validation: [Cannot run VM. Invalid VM lease. Please note that it may take few minutes to create the lease.] (User: admin@internal-authz).
2020-02-24 13:03:57,014+02 WARN [org.ovirt.engine.core.bll.RunVmCommand] (default task-2) [1182d179-9513-4bd4-a23d-8a2b3f14e9b0] Validation of action 'RunVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_INVALID_VM_LEASE

The setup is available now. Please let me know if you would like to see it.

(In reply to Polina from comment #45)

I checked Polina's environment. She is testing with version 4.2.8.9-0.1.el7ev, but we did not backport the fix to 4.2. The fix was backported to branch 4.3 [1].

Please retest the upgrade with version 4.3.5 or later.

Thank you.

[1] https://gerrit.ovirt.org/#/c/100549/

(In reply to Steven Rosenberg from comment #46)

After the upgrade from 4.1 to 4.3, the HA VMs with leases start and reboot successfully. Such an upgrade is performed through 4.2, i.e. 4.1 -> 4.2 -> 4.3. So for VMs that were created in 4.1 and rebooted in 4.3, everything is correct. If the VMs were rebooted in the middle, on 4.2, and failed there, such VMs could not be started in 4.3 either.

Please confirm that this verifies the fix.

(In reply to Polina from comment #48)

Yes, VMs upgraded from 4.1 to 4.2 will fail in 4.2. Only when we upgrade further to 4.3 will the engine be backward compatible. So yes, this verifies the functionality.

Keeping this summary here:

Cause: When the lease info data was moved from the VM Static to the VM Dynamic DB table, there was no consideration that upgrading from 4.1 to later versions would leave the lease info data empty when a lease Storage Domain ID had been specified. This caused validation to fail when the VM was launched, so the VM could no longer run without the user resetting the lease Storage Domain ID.

Consequence: HA VMs with lease Storage Domain IDs fail to execute when run.

Fix: The validation was removed, and the lease information data is now regenerated automatically whenever the lease Storage Domain ID is set but the lease information data is missing. The VM therefore has the lease information data it needs to run.
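The Cause/Consequence/Fix summary above can be sketched as a small before/after Python model: the pre-fix path rejects a VM whose lease Storage Domain ID is set but whose lease info is missing, while the fixed path regenerates the missing lease info and runs the VM. This is a hedged simulation of the behavior change, not the actual engine code; the function names and the `fetch_lease_info` helper are illustrative assumptions.

```python
from typing import Optional, Tuple


def fetch_lease_info(sd_id: str, vm_id: str) -> dict:
    # Stand-in for the engine asking the storage layer for the lease record
    # on the given storage domain; the returned values are made up.
    return {"sd_id": sd_id, "vm_id": vm_id, "offset": 0}


def run_vm_4_2(lease_sd_id: Optional[str], lease_info: Optional[dict]) -> str:
    # Pre-fix behavior: hard validation failure for 4.1-era VMs whose
    # lease_info was never persisted in the engine database.
    if lease_sd_id is not None and lease_info is None:
        return "ACTION_TYPE_FAILED_INVALID_VM_LEASE"
    return "RUNNING"


def run_vm_fixed(lease_sd_id: Optional[str], lease_info: Optional[dict],
                 vm_id: str) -> Tuple[str, Optional[dict]]:
    # Post-fix behavior: instead of failing, regenerate (and persist)
    # the missing lease info, then let the VM run.
    if lease_sd_id is not None and lease_info is None:
        lease_info = fetch_lease_info(lease_sd_id, vm_id)
    return "RUNNING", lease_info


print(run_vm_4_2("sd-1", None))            # ACTION_TYPE_FAILED_INVALID_VM_LEASE
status, info = run_vm_fixed("sd-1", None, "vm1")
print(status)                              # RUNNING
```

This also matches the verification notes: a VM that failed while still on 4.2 (pre-fix path) stays broken until the engine running the fixed path touches its lease info again.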
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with the resolution ERRATA. For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3246