Bug 2141371

Summary: Incorrect image chain when deleting an intermediate snapshot
Product: Red Hat Enterprise Virtualization Manager
Reporter: Juan Orti <jortialc>
Component: vdsm
Assignee: Albert Esteve <aesteve>
Status: CLOSED ERRATA
QA Contact: sshmulev
Severity: high
Priority: high
Version: 4.5.2
CC: aesteve, ahadas, aperotti, bcholler, dfodor, emarcus, lsurette, michal.skrivanek, mzamazal, pelauter, sbonazzo, srevivo, ycui
Target Milestone: ovirt-4.5.3-async
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: vdsm-4.50.3.6
Doc Type: Bug Fix
Doc Text: Previously, stale bitmaps in the base image during a cold or live internal merge caused the operation to fail. In this release, the merge operation succeeds.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-01-11 11:25:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1541529

Description Juan Orti 2022-11-09 16:04:28 UTC
Description of problem:
When deleting an intermediate snapshot, the engine tries to synchronize an image chain in which the deleted volume still exists.

Version-Release number of selected component (if applicable):
NOTE: This environment has hotfix for bug 2123141
ovirt-engine-4.5.2.5-0.2.el8ev.noarch

How reproducible:
Happened once in a customer environment; I have not been able to reproduce it locally.

Steps to Reproduce:
1. A VM on block-based storage has two snapshots, snap1 and snap2. The image chain looks like:

1111-1111-1111-1111 [snap1] <- 2222-2222-2222-2222 [snap2] <- 3333-3333-3333-3333 [Active VM]

2. Delete the oldest snapshot snap1
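
The chain update that should follow these steps can be modeled as below. This is an illustrative Python sketch only, not VDSM's actual imageSyncVolumeChain implementation; the function name and structure are assumptions:

```python
# Illustrative model (not VDSM code): after volume 2222... is merged into
# its base 1111..., the merged volume must be dropped from the chain and
# the child volume relinked to the base.

def sync_volume_chain(chain, merged, base):
    """Return the expected chain after `merged` is merged into `base`.

    chain: volume UUIDs ordered base-first, e.g. [base, merged, active].
    """
    if merged not in chain or base not in chain:
        raise ValueError("merged/base volume not in chain")
    # Dropping the merged volume leaves the child linked directly to the base.
    return [vol for vol in chain if vol != merged]

chain = ["1111-1111-1111-1111", "2222-2222-2222-2222", "3333-3333-3333-3333"]
expected = sync_volume_chain(chain,
                             merged="2222-2222-2222-2222",
                             base="1111-1111-1111-1111")
# Correct result: ["1111-1111-1111-1111", "3333-3333-3333-3333"].
# In this bug, the chain passed to imageSyncVolumeChain still contained 2222...
assert expected == ["1111-1111-1111-1111", "3333-3333-3333-3333"]
```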

Actual results:
- Volume 2222-2222-2222-2222 has been merged into the base volume 1111-1111-1111-1111. That's OK.
- The qcow2 volume 3333-3333-3333-3333 has 1111-1111-1111-1111 as its backing file. That's OK.
- imageSyncVolumeChain is called with volume 2222-2222-2222-2222 still present in the image chain; it is unclear why.

As a result, volume 2222-2222-2222-2222 remains in the chain in the LV tags, SD metadata, and database, even though its contents have already been merged and the qcow2 volume rebased onto the base volume.

In this state, all future snapshot operations fail.

Expected results:
Correct image chain after the snapshot deletion.

Additional info:

Comment 26 Arik 2022-12-13 08:37:00 UTC
missing backports

Comment 34 sshmulev 2022-12-22 08:33:26 UTC
Verified.

Verifications steps:
1. Create a VM from a template and add a disk in qcow2 format to the running VM:
size: 10g
allocation: thin
storage domain: FC or iSCSI

2. Create a snapshot (snap1)
3. Fill the disk with random data: dd if=/dev/urandom bs=1M count=2555 of=/dev/sda oflag=direct conv=fsync
4. Create another snapshot (snap2)
5. Add stale bitmaps to the base volume (snap1):

for i in $(seq 70); do
    qemu-img bitmap --add /dev/<vg_name>/<lv_name> stale-bitmap-$i
done

6. In engine UI, delete snapshot snap1
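
As a rough model of why step 5 exercises the fix, the sketch below treats the merge as pruning stale bitmaps from the base volume before committing. This is an illustrative Python sketch only; prune_stale_bitmaps and backup-bitmap are hypothetical names, not VDSM's actual logic:

```python
# Illustrative sketch (not VDSM code): before the fix, leftover ("stale")
# bitmaps on the base volume made the internal merge fail; the fixed flow
# effectively drops them and keeps only bitmaps that are still valid.

def prune_stale_bitmaps(base_bitmaps, valid_bitmaps):
    """Return the bitmaps to keep on the base volume; anything not in
    valid_bitmaps is treated as stale and dropped before the merge."""
    return [b for b in base_bitmaps if b in valid_bitmaps]

# Mirrors step 5: seq 70 creates stale-bitmap-1 .. stale-bitmap-70.
stale = [f"stale-bitmap-{i}" for i in range(1, 71)]
base_bitmaps = stale + ["backup-bitmap"]  # "backup-bitmap" is hypothetical
kept = prune_stale_bitmaps(base_bitmaps, valid_bitmaps={"backup-bitmap"})
assert kept == ["backup-bitmap"]
```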

Expected results:
Merge operation succeeds, without errors.

Actual results: as expected.


Versions:
Engine-4.5.3.6-0.zstream.20221207085812.gitdecf5699b99.el8
vdsm-4.50.3.6-1.el8ev

Comment 36 errata-xmlrpc 2023-01-11 11:25:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: RHV 4.4 SP1 [ovirt-4.5.3-3] security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0074