Bug 1196072 - Failed to auto shrink qcow block volumes on merge
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assignee: Adam Litke
QA Contact: Ori Gofen
URL:
Whiteboard:
Depends On:
Blocks: oVirt_3.5.2_tracker 1197441 1199815
 
Reported: 2015-02-25 09:20 UTC by Raz Tamir
Modified: 2016-05-26 01:50 UTC (History)
13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1199815
Environment:
Last Closed: 2016-03-09 19:32:04 UTC
oVirt Team: Storage
Target Upstream Version:


Attachments (Terms of Use)
vdsm and engine logs (106.37 KB, application/x-gzip)
2015-02-25 09:20 UTC, Raz Tamir


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:0362 normal SHIPPED_LIVE vdsm 3.6.0 bug fix and enhancement update 2016-03-09 23:49:32 UTC
oVirt gerrit 38355 master MERGED qemu-img: Handle image end offset on different lines of output Never

Description Raz Tamir 2015-02-25 09:20:54 UTC
Created attachment 995038 [details]
vdsm and engine logs

Description of problem:
Setup: 1 VM with a thin provisioned, block-based boot disk of 6 GB (3 GB used by the OS).
After creating a snapshot of this disk, the actual disk size should be 4 GB.
After deleting the snapshot, I expect the size of this disk to shrink back to 3 GB.


Version-Release number of selected component (if applicable):
vt13.11
el7.1


How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a thin provisioned 6 GB disk + OS.
2. Create a snapshot (while the VM is up).
3. Delete the snapshot (while the VM is down).

Actual results:
The actual size doesn't change.


Expected results:
The actual disk size returns to the size it was before the snapshot was made.


Additional info:

Comment 1 Aharon Canan 2015-02-25 09:28:06 UTC

Another way to reproduce - 

All steps while VM is down.
Create a 1G block disk, take a disk snapshot - actual size is 2G.
Delete the snapshot - actual size remains 2G instead of shrinking to 1G.

Comment 2 Allon Mureinik 2015-02-25 16:02:56 UTC
(In reply to Aharon Canan from comment #1)
> 
> Another way to reproduce - 
> 
> All steps while VM is down.
> Create a 1G block disk, take a disk snapshot - actual size is 2G.
> Delete the snapshot - actual size remains 2G instead of shrinking to 1G.
On the storage or just in the database?

Comment 3 Aharon Canan 2015-02-25 16:16:10 UTC
(In reply to Allon Mureinik from comment #2)
> (In reply to Aharon Canan from comment #1)
> > 
> > Another way to reproduce - 
> > 
> > All steps while VM is down.
> > Create a 1G block disk, take a disk snapshot - actual size is 2G.
> > Delete the snapshot - actual size remains 2G instead of shrinking to 1G.
> On the storage or just in the database?

On the storage - verified with vdsClient, lvs, etc.

Comment 4 Allon Mureinik 2015-02-25 16:21:18 UTC
Adam, can you take a look please?
Seems like a possible 3.5.1 blocker.

Thanks!

Comment 5 Adam Litke 2015-02-26 16:04:19 UTC
I don't think this is a bug:

According to your description the following has occurred:
1. Create VM on a sparse block volume
2. VM is writing to disk and the volume is extended twice (to 3G)
3. A snapshot is created (VM continues to write to disk causing cow allocation inside the new leaf volume).  Not enough data is written to require an extension of the leaf volume beyond the initial 1G allocation.
4. The VM is stopped
5. Delete the snapshot:

The underlying operation is a cold merge and requires data from the snapshot volume (size=3G) to be merged into the current leaf (size=1G).  This requires the leaf to be extended to 4G before starting the merge.  After the merge, we check whether the leaf can be reduced at all.  This is done by running the qemu-img check command on the leaf volume.  If the amount of unallocated space in the volume is large enough, the volume size can be reduced.

I suspect that the VM continued writing data into the snapshot so once merged, we needed to keep the size at 4G.  In order to verify this hypothesis, I'd need to see the vdsm.log for the SPM host during the same time frame to check for the call to qemu-img check.
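[Editor's note] The sizing logic described in this comment can be sketched as follows. This is a minimal illustration of the arithmetic only, not vdsm's actual code; the 128M extent size is an assumption taken from comment 9, and the function names are hypothetical:

```python
# Hypothetical sketch of the cold-merge sizing described above.
# EXTENT is assumed to be 128M (see comment 9); vdsm's real value may differ.
EXTENT = 128 * 2**20

def merge_target_size(snap_size, leaf_size):
    # Before the merge, the leaf must be large enough to hold both its own
    # data and everything merged in from the snapshot volume.
    return snap_size + leaf_size

def optimal_size(image_end_offset, extent=EXTENT):
    # After the merge, shrink the leaf to the allocated size reported by
    # 'qemu-img check' (the image end offset), rounded up to a full extent.
    return -(-image_end_offset // extent) * extent
```

For example, merging a 3G snapshot into a 1G leaf first extends the leaf to 4G; if only 3G turns out to be allocated afterwards, the leaf can be shrunk back to 3G.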

Please reopen if you disagree with my analysis.

Comment 6 Raz Tamir 2015-03-01 09:13:08 UTC
Hi Adam,
The description is as follows:
1. Create VM on a sparse block volume
2. A snapshot is created
4. The VM is stopped
5. Delete the snapshot

There are no writes during the whole process.

Comment 7 Adam Litke 2015-03-03 19:50:27 UTC
After the snapshot is merged, we call shrinkToOptimalSize() on the remaining volume.  This calls qemu-img check on the volume to determine the amount of allocated space.  Due to a parse error when checking the output of qemu-img, the shrink does not proceed.  http://gerrit.ovirt.org/38355 updates the parser to handle the variance in output that I have observed.
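[Editor's note] The fix can be illustrated with a sketch like the following. This is an illustrative reimplementation, not the actual gerrit 38355 patch: it scans every line of the qemu-img check output for the "Image end offset" field instead of assuming the field sits at a fixed position:

```python
import re

def parse_image_end_offset(check_output):
    """Find the 'Image end offset' field anywhere in the output of
    'qemu-img check', since qemu-img may emit it on different lines
    depending on version and on what else it reports."""
    for line in check_output.splitlines():
        m = re.match(r"Image end offset:\s*(\d+)", line.strip())
        if m:
            return int(m.group(1))
    raise ValueError("'Image end offset' not found in qemu-img check output")
```

With a scan like this, extra lines before the field (error summaries, allocation statistics) no longer break the parse.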

Comment 9 Ori Gofen 2015-05-03 11:57:28 UTC
Verified on 3.6 master; the volume did grow after the merge, but with an extent size of 128M.

Comment 12 errata-xmlrpc 2016-03-09 19:32:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html

