Bug 1949475

Summary: If pivot failed during live merge, top volume is left illegal, requires manual fix if vm is stopped
Product: [oVirt] vdsm
Reporter: Nir Soffer <nsoffer>
Component: General
Assignee: Roman Bednář <rbednar>
Status: CLOSED CURRENTRELEASE
QA Contact: sshmulev
Severity: high
Priority: unspecified
Version: 4.40.60.3
CC: ahadas, bcholler, bugs, eshames, gveitmic, pkrempa, rbednar, sfishbai
Target Milestone: ovirt-4.5.0
Keywords: ZStream
Target Release: 4.50.0.3
Flags: pm-rhel: ovirt-4.5?
Hardware: Unspecified
OS: Unspecified
Fixed In Version: vdsm-4.50.0.3
Doc Type: If docs needed, set a value
Last Closed: 2022-04-20 06:33:59 UTC
Type: Bug
oVirt Team: Storage

Description Nir Soffer 2021-04-14 11:39:40 UTC
Description of problem:

During live merge, when libvirt reports that the block commit job is ready
for pivot, vdsm changes the top volume to ILLEGAL before trying to pivot to
the base volume.

Changing the top volume to ILLEGAL is required to avoid data corruption in
case the pivot succeeded but vdsm was killed before it could update the
metadata on storage. After a successful pivot, the VM uses the base volume
instead of the top volume, and new data may be written to the base volume.
If the VM is then started from the top volume, the filesystem on the top
volume is likely to be corrupted, because the top volume is an overlay on a
base volume that has changed underneath it.

However, if the pivot failed (for example, bug 1945635) and the VM is then
stopped, starting the VM again will fail and require a manual fix of the top
volume metadata. This is likely to lead to downtime and a support case.

When the pivot fails, we know that the VM is still using the top volume, so
there is no reason to keep the top volume ILLEGAL.

Change the pivot flow to restore the top volume's LEGAL state.

If the pivot failed:
- Get the current chain from libvirt.
- If the top volume is still in the chain, set it to LEGAL.

If storage becomes inaccessible at this point, restoring the volume's LEGAL
state will fail. The volume will then be fixed on the next pivot attempt.
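The proposed flow can be sketched in Python, vdsm's implementation language. This is a minimal, self-contained sketch under stated assumptions, not vdsm's actual code: `Legality`, `PivotFailed`, `StorageUnavailable`, `Volume`, `restore_if_still_used` and `finish_live_merge` are hypothetical names, and the libvirt chain query and the pivot call are passed in as plain callables rather than real libvirt APIs.

```python
import logging
from enum import Enum

log = logging.getLogger("live-merge-sketch")


class Legality(Enum):
    LEGAL = "LEGAL"
    ILLEGAL = "ILLEGAL"


class PivotFailed(Exception):
    """Hypothetical: the pivot to the base volume failed."""


class StorageUnavailable(Exception):
    """Hypothetical: volume metadata could not be written to storage."""


class Volume:
    """Hypothetical stand-in for a volume whose legality is kept in storage metadata."""

    def __init__(self, vol_id):
        self.vol_id = vol_id
        self.legality = Legality.LEGAL
        self.storage_ok = True  # set to False to simulate inaccessible storage

    def set_legality(self, legality):
        if not self.storage_ok:
            raise StorageUnavailable(self.vol_id)
        self.legality = legality


def restore_if_still_used(top, current_chain):
    # Ask libvirt (here: a callable) which volumes the VM is really using.
    if top.vol_id not in current_chain():
        # Top is gone from the chain, so the pivot did happen: keep it ILLEGAL.
        return
    try:
        top.set_legality(Legality.LEGAL)
    except StorageUnavailable:
        # Leave top ILLEGAL; the next pivot attempt is expected to fix it.
        log.warning("Cannot restore %s to LEGAL: storage unavailable", top.vol_id)


def finish_live_merge(top, current_chain, pivot):
    # Mark top ILLEGAL before pivoting, so a vdsm crash after a successful
    # pivot can never leave the VM startable from the stale top volume.
    top.set_legality(Legality.ILLEGAL)
    try:
        pivot()
    except PivotFailed:
        restore_if_still_used(top, current_chain)
        raise  # the caller still sees the failure and may retry the pivot
```

Checking the chain, instead of trusting the pivot error alone, keeps the volume ILLEGAL when libvirt reports a failure even though the top volume has already left the chain. On a successful pivot the sketch leaves the top volume ILLEGAL; removing it and updating the remaining metadata belong to the rest of the merge flow, which is not shown.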

Comment 1 Evelina Shames 2021-07-19 06:16:37 UTC
Hi Roman/Nir, please provide a clear verification flow.
Thanks.

Comment 2 Nir Soffer 2021-07-26 09:55:16 UTC
I'm not sure we have a way to reproduce this issue. This happened in the
past due to a bug in libvirt, and since that bug was fixed, it should not
happen again.

Simulating this in a real system requires a way to inject errors into libvirt
or qemu. Peter, do we have such a capability?

Comment 3 Peter Krempa 2021-07-26 10:09:30 UTC
No, in this instance it's not possible to simulate the outcome caused by the bug. The job completed properly, but libvirt then emitted the wrong state, and since the bug is now fixed, our APIs cannot reproduce that.

Comment 5 sshmulev 2022-03-06 09:25:21 UTC
Verified with the tier 1-3 automated regression tests related to live merge.
No failures related to this bug were detected in the live merge tests.

Versions:
vdsm	vdsm-4.50.0.5-1.el8ev.x86_64
ovirt-engine	ovirt-engine-4.5.0-582.gd548206.185.el8ev.noarch
libvirt	libvirt-8.0.0-2.module+el8.6.0+14025+ca131e0a.x86_64

Comment 7 Sandro Bonazzola 2022-04-20 06:33:59 UTC
This bugzilla is included in oVirt 4.5.0 release, published on April 20th 2022.

Since the problem described in this bug report should be resolved in oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.