Description of problem: During live merge, when libvirt reports that the block commit block job is ready for pivot, vdsm changes the top volume to ILLEGAL before trying to pivot to the base volume. Changing the top volume to ILLEGAL is required to avoid data corruption in case the pivot succeeded but vdsm was killed before it could update the metadata on storage. After a successful pivot, the VM is using the base volume instead of the top volume, and new data may be written to the base volume. If you start the VM from the top volume, the filesystem on the top volume is likely to be corrupted.

However, if the pivot failed (for example bug 1945635) and the VM is stopped, starting the VM again will fail and require a manual fix of the top volume metadata. This is likely to lead to downtime and require support.

When a pivot fails, we know that the VM is still using the top volume, so there is no reason to keep the top volume ILLEGAL. Change the pivot flow to restore the top volume's LEGAL state. If the pivot failed:
- Get the current chain from libvirt.
- If the top volume is still in the chain, set the top volume to LEGAL.

If storage becomes inaccessible at this point, restoring the volume's LEGAL state will fail. The volume will be fixed on the next pivot attempt. A sketch of this recovery flow follows.
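For illustration only, here is a minimal Python sketch of the recovery flow described above, under the assumption of a VM object exposing libvirt and storage helpers. The names query_backing_chain, set_volume_legality, and StorageError are hypothetical placeholders, not the actual vdsm API:

    import logging

    log = logging.getLogger("virt.livemerge")

    # Hypothetical volume legality states for this sketch.
    LEGAL = "LEGAL"
    ILLEGAL = "ILLEGAL"


    class StorageError(Exception):
        """Raised when volume metadata cannot be updated on storage."""


    def recover_top_volume(vm, drive, top_vol_id):
        """
        Called when pivot fails: the VM is still using the top volume,
        so there is no reason to keep it ILLEGAL.
        """
        # Get the current chain from libvirt (hypothetical helper).
        chain = vm.query_backing_chain(drive)

        # If the top volume is still in the chain, the pivot did not take
        # effect, so the top volume can safely be marked LEGAL again.
        if top_vol_id in chain:
            try:
                # Hypothetical helper that updates volume metadata on storage.
                vm.set_volume_legality(drive, top_vol_id, LEGAL)
            except StorageError:
                # If storage became inaccessible, restoring the LEGAL state
                # fails here; the volume will be fixed on the next pivot
                # attempt.
                log.warning(
                    "Failed to restore LEGAL state of volume %s; it will "
                    "be fixed on the next pivot attempt", top_vol_id)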
Hi Roman/Nir, please provide a clear verification flow. Thanks.
I'm not sure we have a way to reproduce this issue. It happened in the past due to a bug in libvirt, and since that bug was fixed it should never happen again. Simulating this in a real system requires a way to inject errors in libvirt or qemu. Peter, do we have such a capability?
No, in this instance it's not possible to simulate the outcome, since it was caused by the bug. The issue was that the job completed properly, but libvirt then emitted the wrong state afterwards, so our APIs can't simulate that now that the bug is fixed.
Verified with automation regression tests of tiers 1-3 related to live merge. Didn't detect any failures in the live merge tests related to this bug.

Versions:
vdsm: vdsm-4.50.0.5-1.el8ev.x86_64
ovirt-engine: ovirt-engine-4.5.0-582.gd548206.185.el8ev.noarch
libvirt: libvirt-8.0.0-2.module+el8.6.0+14025+ca131e0a.x86_64
This bugzilla is included in the oVirt 4.5.0 release, published on April 20th 2022. Since the problem described in this bug report should be resolved in the oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.