Bug 2141371

Summary: Incorrect image chain when deleting an intermediate snapshot
Product: Red Hat Enterprise Virtualization Manager
Reporter: Juan Orti <jortialc>
Component: vdsm
Assignee: Albert Esteve <aesteve>
Status: CLOSED ERRATA
QA Contact: sshmulev
Severity: high
Priority: high
Version: 4.5.2
CC: aesteve, ahadas, aperotti, bcholler, dfodor, emarcus, lsurette, michal.skrivanek, mzamazal, pelauter, sbonazzo, srevivo, ycui
Target Milestone: ovirt-4.5.3-async
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: vdsm-4.50.3.6
Doc Type: Bug Fix
Doc Text: Previously, stale bitmaps in the base image during a cold or live internal merge caused the operation to fail. In this release, the merge operation succeeds.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-01-11 11:25:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1541529

Description Juan Orti 2022-11-09 16:04:28 UTC
Description of problem:
When deleting an intermediate snapshot, the engine tries to synchronize an image chain in which the deleted volume still exists.

Version-Release number of selected component (if applicable):
NOTE: This environment has hotfix for bug 2123141
ovirt-engine-4.5.2.5-0.2.el8ev.noarch

How reproducible:
Happened once in a customer environment; I have not been able to reproduce it locally.

Steps to Reproduce:
1. A VM on block-based storage has two snapshots, snap1 and snap2. The image chain looks like:

1111-1111-1111-1111 [snap1] <- 2222-2222-2222-2222 [snap2] <- 3333-3333-3333-3333 [Active VM]

2. Delete the oldest snapshot snap1
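
The chain update that should follow these steps can be modeled as below. This is an illustrative Python sketch only, not VDSM's actual imageSyncVolumeChain implementation; the function name and structure are assumptions:

```python
# Illustrative model (not VDSM code): after volume 2222... is merged into
# its base 1111..., the merged volume must be dropped from the chain and
# the child volume relinked to the base.

def sync_volume_chain(chain, merged, base):
    """Return the expected chain after `merged` is merged into `base`.

    chain: volume UUIDs ordered base-first, e.g. [base, merged, active].
    """
    if merged not in chain or base not in chain:
        raise ValueError("merged/base volume not in chain")
    # Dropping the merged volume leaves the child linked directly to the base.
    return [vol for vol in chain if vol != merged]

chain = ["1111-1111-1111-1111", "2222-2222-2222-2222", "3333-3333-3333-3333"]
expected = sync_volume_chain(chain,
                             merged="2222-2222-2222-2222",
                             base="1111-1111-1111-1111")
# Correct result: ["1111-1111-1111-1111", "3333-3333-3333-3333"].
# In this bug, the chain passed to imageSyncVolumeChain still contained 2222...
assert expected == ["1111-1111-1111-1111", "3333-3333-3333-3333"]
```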

Actual results:
- Volume 2222-2222-2222-2222 has been merged into the base volume 1111-1111-1111-1111. That's OK.
- The qcow2 volume 3333-3333-3333-3333 has 1111-1111-1111-1111 as its backing file. That's OK.
- imageSyncVolumeChain is called with volume 2222-2222-2222-2222 still present in the image chain; it is unclear why.

As a result, volume 2222-2222-2222-2222 remains in the chain in the LV tags, SD metadata, and database, even though its contents have already been merged and the qcow2 volume rebased onto the base volume.

In this state, all future snapshot operations fail.

Expected results:
Correct image chain after the snapshot deletion.

Additional info:

Comment 26 Arik 2022-12-13 08:37:00 UTC
missing backports

Comment 34 sshmulev 2022-12-22 08:33:26 UTC
Verified.

Verifications steps:
1. Create a VM from a template and add a disk in qcow2 format to the running VM:
size: 10g
allocation: thin
storage domain: FC or iSCSI

2. Create a snapshot (snap1)
3. Fill the disk with random data: dd if=/dev/urandom bs=1M count=2555 of=/dev/sda oflag=direct conv=fsync
4. Create another snapshot (snap2)
5. Add stale bitmaps to the base volume (snap1):

for i in $(seq 70); do
    qemu-img bitmap --add /dev/<vg_name>/<lv_name> stale-bitmap-$i
done

6. In engine UI, delete snapshot snap1
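
As a rough model of why step 5 exercises the fix, the sketch below treats the merge as pruning stale bitmaps from the base volume before committing. This is an illustrative Python sketch only; prune_stale_bitmaps and backup-bitmap are hypothetical names, not VDSM's actual logic:

```python
# Illustrative sketch (not VDSM code): before the fix, leftover ("stale")
# bitmaps on the base volume made the internal merge fail; the fixed flow
# effectively drops them and keeps only bitmaps that are still valid.

def prune_stale_bitmaps(base_bitmaps, valid_bitmaps):
    """Return the bitmaps to keep on the base volume; anything not in
    valid_bitmaps is treated as stale and dropped before the merge."""
    return [b for b in base_bitmaps if b in valid_bitmaps]

# Mirrors step 5: seq 70 creates stale-bitmap-1 .. stale-bitmap-70.
stale = [f"stale-bitmap-{i}" for i in range(1, 71)]
base_bitmaps = stale + ["backup-bitmap"]  # "backup-bitmap" is hypothetical
kept = prune_stale_bitmaps(base_bitmaps, valid_bitmaps={"backup-bitmap"})
assert kept == ["backup-bitmap"]
```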

Expected results:
Merge operation succeeds, without errors.

Actual results: as expected.


Versions:
Engine-4.5.3.6-0.zstream.20221207085812.gitdecf5699b99.el8
vdsm-4.50.3.6-1.el8ev

Comment 36 errata-xmlrpc 2023-01-11 11:25:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: RHV 4.4 SP1 [ovirt-4.5.3-3] security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0074