Bug 1659574 - Highly Available (HA) VMs with a VM lease failed to start after a 4.1 to 4.3 upgrade.
Summary: Highly Available (HA) VMs with a VM lease failed to start after a 4.1 to 4.3 upgrade.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.2.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.4.0
Target Release: 4.3.0
Assignee: Steven Rosenberg
QA Contact: Polina
URL:
Whiteboard:
Duplicates: 1697313 (view as bug list)
Depends On:
Blocks: gss_rhv_4_3_4 1716951
 
Reported: 2018-12-14 17:09 UTC by Frank DeLorey
Modified: 2022-03-13 16:31 UTC
CC List: 17 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, after upgrading RHV 4.1 to a later version, high-availability virtual machines (HA VMs) failed validation and did not run. To run the VMs, the user had to reset the lease Storage Domain ID. The current release fixes this issue: It removes the validation and regenerates the lease information data when the lease Storage Domain ID is set. After upgrading RHV 4.1, HA VMs with lease Storage Domain IDs run.
Clone Of:
: 1716951 (view as bug list)
Environment:
Last Closed: 2020-08-04 13:26:25 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-


Attachments: none


Links
Red Hat Issue Tracker RHV-36746 (last updated 2021-09-09 15:33:13 UTC)
Red Hat Knowledge Base (Solution) 3487811 (last updated 2018-12-14 17:09:58 UTC)
Red Hat Product Errata RHEA-2020:3246 (last updated 2020-08-04 13:26:51 UTC)
oVirt gerrit 100409, MERGED: engine: Upgrade HA VM lease failure during Launch (last updated 2021-01-05 10:22:04 UTC)
oVirt gerrit 100549, MERGED: engine: Upgrade HA VM lease failure during Launch (last updated 2021-01-05 10:22:43 UTC)

Description Frank DeLorey 2018-12-14 17:09:59 UTC
Description of problem:

After upgrading RHV from 4.1 to 4.2, HA VMs fail to start, reporting "Invalid VM lease. Please note that it may take few minutes to create the lease."

Version-Release number of selected component (if applicable):

RHV 4.2.7

How reproducible:

100% at this customer site

Steps to Reproduce:
1. Have HA VMs running with a VM Lease
2. Update from RHV 4.1 to 4.2
3. Try to reboot the VMs

Actual results:

"VMNAME: Cannot run VM. Invalid VM lease. Please note that it may take few minutes to create the lease."


Expected results:

VMs should start just fine after an upgrade of RHV.


Additional info:



An HA VM that has a VM lease should normally have a non-null lease_sd_id field in the vm_static table and a non-null lease_info field in the vm_dynamic table in the RHV database.

In this case, the lease_info field in the vm_dynamic table in the RHV database was empty.
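
A minimal sketch to spot affected VMs directly in the engine database follows; the database name, user, host, and password are placeholders, and the exact column names are assumptions based on the fields described above, not taken from this bug:

# find_vms_with_missing_lease_info.py -- illustrative sketch only, not part of the product
# Lists HA VMs that have a lease storage domain configured (vm_static.lease_sd_id)
# but no stored lease details (vm_dynamic.lease_info).
import psycopg2

conn = psycopg2.connect(dbname="engine", user="engine",
                        host="localhost", password="***")  # placeholder credentials
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT s.vm_guid, s.vm_name, s.lease_sd_id
        FROM vm_static s
        JOIN vm_dynamic d ON d.vm_guid = s.vm_guid
        WHERE s.lease_sd_id IS NOT NULL
          AND d.lease_info IS NULL
    """)
    for vm_guid, vm_name, lease_sd_id in cur.fetchall():
        print("%s (%s): lease SD %s, lease_info missing" % (vm_name, vm_guid, lease_sd_id))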

Comment 3 Michal Skrivanek 2018-12-16 06:39:34 UTC
logs?

Comment 4 Klaas Demter 2018-12-16 10:41:39 UTC
I am one of the affected customers; ovirt-log-collector output and a few sos reports are attached to case 02272652.

Comment 5 Arik 2018-12-16 15:59:15 UTC
A simple workaround, until a proper fix is introduced, could be to change the storage domain that the lease is configured on for all the HA VMs with a lease that have not been restarted since the upgrade to 4.2 (i.e., those with a pending configuration).
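
For reference, a minimal sketch of this workaround with the Python SDK (ovirt-engine-sdk4); the engine URL, credentials, VM name, and target storage domain name are placeholders, not values from this bug:

# change_lease_storage_domain.py -- sketch of the workaround described above
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url="https://engine.example.com/ovirt-engine/api",  # placeholder
    username="admin@internal",
    password="***",
    ca_file="ca.pem",
)
vms_service = connection.system_service().vms_service()
sds_service = connection.system_service().storage_domains_service()

vm = vms_service.list(search="name=my_ha_vm")[0]
target_sd = sds_service.list(search="name=other_data_domain")[0]

# Point the VM lease at a different storage domain; applying this configuration
# change makes the engine create a new lease and store fresh lease_info.
vms_service.vm_service(vm.id).update(
    types.Vm(
        lease=types.StorageDomainLease(
            storage_domain=types.StorageDomain(id=target_sd.id),
        ),
    ),
)
connection.close()

Any data domain other than the one the lease is currently configured on should do; the point is only to trigger a lease configuration change.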

Comment 6 Klaas Demter 2018-12-17 06:43:50 UTC
If you only have one storage domain, you can also just disable the lease and re-enable it after the first configuration change has finished.

For the devs: If any additional logs are needed feel free to reach out to me directly.

Comment 7 Ryan Barry 2019-01-21 14:54:06 UTC
Re-targeting to 4.3.1 since it is missing a patch, an acked blocker flag, or both

Comment 11 Michal Skrivanek 2019-03-26 12:52:26 UTC
probably related to the change early in 4.2 - https://gerrit.ovirt.org/#/c/86504/

Comment 12 Arik 2019-03-26 13:24:37 UTC
(In reply to Michal Skrivanek from comment #11)
> probably related to the change early in 4.2 -
> https://gerrit.ovirt.org/#/c/86504/

I'll still be the one to blame, but I think it is most likely a consequence of https://gerrit.ovirt.org/#/c/79226/.
In 4.1, the lease_info was not stored on the engine side.
In 4.2 and above, we expect the lease_info to be stored in the database; VMs with a lease that were created in 4.1 will lack it, so we should probably either recreate the lease or fetch the lease_info from the host.
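
In other words, the suggested direction amounts to something like the following illustrative pseudocode (Python only for readability; the actual fix lives in the ovirt-engine Java code linked from the gerrit entries above, and the helper named below is hypothetical):

def ensure_lease_info(vm):
    # The VM got a lease storage domain in 4.1, but lease_info was never persisted
    # on the engine side (the engine only started storing it in 4.2).
    if vm.lease_sd_id is not None and vm.lease_info is None:
        # Instead of failing run validation, recreate the lease on the configured
        # storage domain (or fetch the existing lease details from the host) and
        # persist the result.
        vm.lease_info = regenerate_or_fetch_lease(vm.vm_guid, vm.lease_sd_id)  # hypothetical helper
    return vm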

Comment 13 Ryan Barry 2019-04-08 17:34:23 UTC
*** Bug 1697313 has been marked as a duplicate of this bug. ***

Comment 24 Daniel Gur 2019-08-28 13:12:50 UTC
sync2jira

Comment 25 Daniel Gur 2019-08-28 13:17:02 UTC
sync2jira

Comment 26 RHV bug bot 2019-10-22 17:25:49 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 27 RHV bug bot 2019-10-22 17:39:40 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 28 RHV bug bot 2019-10-22 17:46:51 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 29 RHV bug bot 2019-10-22 18:02:40 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 33 RHV bug bot 2019-11-19 11:53:05 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 34 RHV bug bot 2019-11-19 12:03:05 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 35 Steven Rosenberg 2019-11-20 13:49:16 UTC
(In reply to RHV Bugzilla Automation and Verification Bot from comment #34)
> WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following
> reason:
> 
> [Found non-acked flags: '{}', ]
> 
> For more info please contact: rhv-devops

Who handles these flags?

Comment 36 Ryan Barry 2019-11-20 14:47:40 UTC
As mentioned before, this is pm_ack and qa_ack

Comment 37 Steven Rosenberg 2019-11-20 17:04:00 UTC
(In reply to Ryan Barry from comment #36)
> As mentioned before, this is pm_ack and qa_ack

Could we set the qa flags accordingly?

Comment 38 Steven Rosenberg 2019-11-21 08:20:41 UTC
(In reply to Ryan Barry from comment #36)
> As mentioned before, this is pm_ack and qa_ack

Can we set the PM flags accordingly?

Comment 40 RHV bug bot 2019-12-13 13:17:35 UTC
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 41 RHV bug bot 2019-12-20 17:46:44 UTC
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 42 RHV bug bot 2020-01-08 14:50:14 UTC
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 43 RHV bug bot 2020-01-08 15:20:37 UTC
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 44 RHV bug bot 2020-01-24 19:51:57 UTC
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 45 Polina 2020-02-24 11:18:36 UTC
Hi,
 
The bug is still reproducible.

I installed from baseurl=http://bob.eng.lab.tlv.redhat.com/builds/4.1/rhv-4.1.11-3/ (RHEL 7.5), then created and ran an HA VM with a lease.
Then I updated RHV to http://bob-dr.lab.eng.brq.redhat.com/builds/4.2/rhv-4.2.13-2 (RHEL 7.6).
The VMs are up, but after shutdown such a VM could not be started.

2020-02-24 13:03:57,014+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-2) [1182d179-9513-4bd4-a23d-8a2b3f14e9b0] EVENT_ID: USER_FAILED_RUN_VM(54), Failed to run VM vm_4.1_with_lease due to a failed validation: [Cannot run VM. Invalid VM lease. Please note that it may take few minutes to create the lease.] (User: admin@internal-authz).
2020-02-24 13:03:57,014+02 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (default task-2) [1182d179-9513-4bd4-a23d-8a2b3f14e9b0] Validation of action 'RunVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_INVALID_VM_LEASE

The setup is available now. Please let me know if you would like to see it.

Comment 46 Steven Rosenberg 2020-02-24 16:03:53 UTC
(In reply to Polina from comment #45)
> Hi,
>  
> The bug is still reproducible.
> 
> I've installed the
> baseurl=http://bob.eng.lab.tlv.redhat.com/builds/4.1/rhv-4.1.11-3/ (rhel-7.5
> ) , created and run the HA VM with lease.
> Then update the RHV to
> http://bob-dr.lab.eng.brq.redhat.com/builds/4.2/rhv-4.2.13-2 (rhel-7.6).
> The VMs are up. but after shutdown such VM could not be started.
> 
> 2020-02-24 13:03:57,014+02 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (default task-2) [1182d179-9513-4bd4-a23d-8a2b3f14e9b0] EVENT_ID:
> USER_FAILED_RUN_VM(54), Failed to run VM vm_4.1_with_lease due to a failed
> validation: [Cannot run VM. Invalid VM lease. Please note that it may take
> few minutes to create the lease.] (User: admin@internal-authz).
> 2020-02-24 13:03:57,014+02 WARN  [org.ovirt.engine.core.bll.RunVmCommand]
> (default task-2) [1182d179-9513-4bd4-a23d-8a2b3f14e9b0] Validation of action
> 'RunVm' failed for user admin@internal-authz. Reasons:
> VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_INVALID_VM_LEASE
> 
> The setup is available now. Please let me know if you would like to see.

I checked Polina's environment. She is testing with version 4.2.8.9-0.1.el7ev, but we did not backport the fix to 4.2. The fix was backported to the 4.3 branch [1].

Please retest the upgrade with version 4.3.5 or later.

Thank you.

[1] https://gerrit.ovirt.org/#/c/100549/

Comment 48 Polina 2020-02-26 10:45:42 UTC
After the upgrade from 4.1 to 4.3, the HA VMs with leases are started and rebooted successfully.
Such an upgrade is performed through 4.2, i.e., 4.1 -> 4.2 -> 4.3.
So, for the VMs that were created in 4.1 and rebooted in 4.3, everything is correct.
If the VMs were rebooted in the middle, on 4.2, and failed there, such VMs could not be started in 4.3 either.

Please confirm that this verifies the fix.

Comment 49 Steven Rosenberg 2020-02-26 11:53:52 UTC
(In reply to Polina from comment #48)
> after upgrade from 4.1 to 4.3 the HA VMs with leases are started and
> rebooted successfully. 
> Such an upgrade is performed through the 4.2, like 4.1 -> 4.2 -> 4.3.
> So, for the VMs that were created in 4.1 and rebooted in 4.3, everything is
> correct.
> If the VMs were rebooted in the middle on 4.2 and failed there, such VMs
> could not be started also in 4.3.
> 
> Please confirm that this verifies the fix.

Yes, VMs upgraded from 4.1 to 4.2 will fail in 4.2. Only when we upgrade further to 4.3 will the engine be backward compatible.

So yes, this verifies the functionality.

Comment 50 Rolfe Dlugy-Hegwer 2020-03-03 01:25:01 UTC
Keeping this summary here:

Cause: When the lease info data was moved from the VM Static to the VM Dynamic DB table, there was no consideration that upgrading from 4.1 to later versions would leave the lease info data empty when a lease Storage Domain ID had been specified. This caused the validation to fail when the VM was launched, so the VM could no longer run without the user resetting the lease Storage Domain ID.

Consequence: HA VMs with lease Storage Domain IDs fail to run.

Fix: The validation was removed, and the lease information data is now regenerated automatically when the lease Storage Domain ID is set but the lease information data is missing. The VM therefore has the lease information data it needs to run.

Comment 57 errata-xmlrpc 2020-08-04 13:26:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3246

