Bug 1196072

Summary: Failed to auto shrink qcow block volumes on merge
Product: Red Hat Enterprise Virtualization Manager Reporter: Raz Tamir <ratamir>
Component: vdsm Assignee: Adam Litke <alitke>
Status: CLOSED ERRATA QA Contact: Ori Gofen <ogofen>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.5.0 CC: acanan, alitke, amureini, bazulay, gklein, lpeer, lsurette, nsoffer, ratamir, sbonazzo, yeylon, ykaul, ylavi
Target Milestone: ovirt-3.6.0-rc Keywords: Regression, Reopened, ZStream
Target Release: 3.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1199815 (view as bug list) Environment:
Last Closed: 2016-03-09 19:32:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1186161, 1197441, 1199815    
Attachments:
Description Flags
vdsm and engine logs none

Description Raz Tamir 2015-02-25 09:20:54 UTC
Created attachment 995038
vdsm and engine logs

Description of problem:
Setup with 1 VM and a thin-provisioned block boot disk of 6 GB (3 GB used by the OS).
After creating a snapshot of this disk, the actual disk size should be 4 GB.
After deleting the snapshot, I expect the size of this disk to shrink back to 3 GB.


Version-Release number of selected component (if applicable):
vt13.11
el7.1


How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a thin-provisioned 6 GB disk and an OS installed.
2. Create a snapshot (while the VM is up).
3. Delete the snapshot (while the VM is down).

Actual results:
The actual size does not change.


Expected results:
The actual disk size returns to what it was before the snapshots were made.


Additional info:

Comment 1 Aharon Canan 2015-02-25 09:28:06 UTC

Another way to reproduce - 

All steps while VM is down.
Create 1G block disk, take disk snapshot - Actual size is 2G
Delete the snapshot - Actual size remains 2G instead of shrinking to 1G.

Comment 2 Allon Mureinik 2015-02-25 16:02:56 UTC
(In reply to Aharon Canan from comment #1)
> 
> Another way to reproduce - 
> 
> All steps while VM is down.
> Create 1G block disk, take disk snapshot - Actual size is 2G
> Delete the snapshot - Actual size remains 2G instead of shrinking to 1G.
On the storage or just in the database?

Comment 3 Aharon Canan 2015-02-25 16:16:10 UTC
(In reply to Allon Mureinik from comment #2)
> (In reply to Aharon Canan from comment #1)
> > 
> > Another way to reproduce - 
> > 
> > All steps while VM is down.
> > Create 1G block disk, take disk snapshot - Actual size is 2G
> > Delete the snapshot - Actual size remains 2G instead of shrinking to 1G.
> On the storage or just in the database?

vdsClient, lvs, etc.
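
For reference, a minimal sketch of how the actual size could be read on the storage side. It parses the two-column text that a command such as `lvs --noheadings --nosuffix --units b -o lv_name,lv_size` is expected to print. The helper name and the sample output are made up for illustration and are not taken from the attached logs.

```python
GIB = 1024 ** 3


def parse_lvs_sizes(output):
    """Map LV name -> size in bytes from two-column `lvs` output."""
    sizes = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        name, size = fields
        sizes[name] = int(size)
    return sizes


# Hypothetical output: one LV still at 2G after the snapshot was deleted.
sample = """
  example-volume 2147483648
"""
sizes = parse_lvs_sizes(sample)
print({name: size // GIB for name, size in sizes.items()})
```

Comparing this value before the snapshot, after the snapshot, and after the delete shows whether the size changed on the storage itself rather than only in the database.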

Comment 4 Allon Mureinik 2015-02-25 16:21:18 UTC
Adam, can you take a look please?
Seems like a possible 3.5.1 blocker.

Thanks!

Comment 5 Adam Litke 2015-02-26 16:04:19 UTC
I don't think this is a bug:

According to your description the following has occurred:
1. Create VM on a sparse block volume
2. VM is writing to disk and the volume is extended twice (to 3G)
3. A snapshot is created (VM continues to write to disk causing cow allocation inside the new leaf volume).  Not enough data is written to require an extension of the leaf volume beyond the initial 1G allocation.
4. The VM is stopped
5. Delete the snapshot:

The underlying operation is a cold merge and requires data from the snapshot volume (size=3G) to be merged into the current leaf (size=1G).  This requires the leaf to be extended to 4G before starting the merge.  After the merge, we check if the leaf can be reduced at all.  This is done by using the qemu-img check command on the leaf volume.  If the amount of unallocated space in the volume is great enough, then the volume size can be reduced.

I suspect that the VM continued writing data into the snapshot so once merged, we needed to keep the size at 4G.  In order to verify this hypothesis, I'd need to see the vdsm.log for the SPM host during the same time frame to check for the call to qemu-img check.

Please reopen if you disagree with my analysis.
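
The sizing logic described in this comment can be sketched roughly as follows. This illustrates the arithmetic only and is not vdsm code: the function names, the 1G chunk size, and the rounding rule are assumptions.

```python
GIB = 1024 ** 3
CHUNK = 1 * GIB  # assumed allocation granularity, for illustration only


def size_before_merge(leaf_size, snapshot_size):
    """Worst case: all data of the snapshot volume ends up in the leaf."""
    return leaf_size + snapshot_size


def size_after_shrink(current_size, allocated_end, chunk=CHUNK):
    """Round the allocated end up to a whole chunk; never grow the volume."""
    needed = -(-allocated_end // chunk) * chunk  # ceiling division
    return min(current_size, max(needed, chunk))


# Numbers from the scenario above: leaf of 1G, snapshot volume of 3G.
extended = size_before_merge(leaf_size=1 * GIB, snapshot_size=3 * GIB)
print(extended // GIB)  # 4

# If only 3G is actually allocated after the merge, 1G can be given back.
print(size_after_shrink(extended, allocated_end=3 * GIB) // GIB)  # 3
```

Under these assumptions the volume stays at 4G only if the merged data really extends past 3G, which is the hypothesis the SPM log would confirm or rule out.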

Comment 6 Raz Tamir 2015-03-01 09:13:08 UTC
Hi Adam,
The description is as follows:
1. Create VM on a sparse block volume
2. A snapshot is created
4. The VM is stopped
5. Delete the snapshot

There are no writes during the whole process.

Comment 7 Adam Litke 2015-03-03 19:50:27 UTC
After the snapshot is merged, we call shrinkToOptimalSize() on the remaining volume.  This calls qemu-img check on the volume to determine the amount of allocated space.  Due to a parse error when checking the output of qemu-img, the shrink does not proceed.  http://gerrit.ovirt.org/38355 updates the parser to handle the variance in output that I have observed.
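
The exact output variance is not quoted in this report. As an illustration of a parser that does not depend on line position, the sketch below searches the whole `qemu-img check` output for the "Image end offset" line. It is not the patch from the Gerrit change; the function name is hypothetical and the sample strings are made up.

```python
import re

# Match the offset line wherever it appears in the output.
_END_OFFSET = re.compile(r"^Image end offset:\s*(\d+)\s*$", re.MULTILINE)


def parse_image_end_offset(output):
    """Return the image end offset in bytes, or raise ValueError."""
    match = _END_OFFSET.search(output)
    if match is None:
        raise ValueError("no 'Image end offset' line in qemu-img check output")
    return int(match.group(1))


# Two illustrative outputs that differ in the number of lines printed.
clean = "No errors were found on the image.\nImage end offset: 262144\n"
noisy = (
    "No errors were found on the image.\n"
    "16/16384 = 0.10% allocated, 0.00% fragmented, 0.00% compressed clusters\n"
    "Image end offset: 262144\n"
)
print(parse_image_end_offset(clean), parse_image_end_offset(noisy))
```

Raising an explicit error when the line is missing makes a skipped shrink visible in the log instead of failing silently.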

Comment 9 Ori Gofen 2015-05-03 11:57:28 UTC
verified on 3.6 master, volume did grow after merge but with an extent size of 128M

Comment 12 errata-xmlrpc 2016-03-09 19:32:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html