Bug 1707932

Summary: [downstream clone - 4.3.4] Moving disk results in wrong SIZE/CAP key in the volume metadata
Product: Red Hat Enterprise Virtualization Manager
Reporter: RHV bug bot <rhv-bugzilla-bot>
Component: vdsm
Assignee: Vojtech Juranek <vjuranek>
Status: CLOSED ERRATA
QA Contact: Shir Fishbain <sfishbai>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 4.2.8
CC: aefrat, bcholler, eshenitz, fkust, frolland, gveitmic, jinjli, lsurette, mkalinin, nsoffer, pvilayat, Rhev-m-bugs, rhodain, royoung, srevivo, tnisan, ycui
Target Milestone: ovirt-4.3.4
Keywords: ZStream
Target Release: 4.3.1
Flags: lsvaty: testing_plan_complete-
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: vdsm-4.30.15
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1700623
Clones: 1707934 (view as bug list)
Environment:
Last Closed: 2019-06-20 14:48:41 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1700623
Bug Blocks: 1707934

Description RHV bug bot 2019-05-08 17:32:48 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1700623 +++
======================================================================

Description of problem:

Moving a disk from storage domain A to storage domain B results in a wrong SIZE key in the volume metadata on B if the volume has previously been extended.

Before moving
=============

# lvs -o +tags| grep b9fd9e73-32d3-473a-8cb5-d113602f76e1 | awk -F ' ' '{print $1,$2,$4,$5}'
359c2ea7-0a73-4296-8109-b799d9bfbd08 51e44de8-2fc0-4e99-8860-6820ff023108 1.00g IU_b9fd9e73-32d3-473a-8cb5-d113602f76e1,MD_23,PU_5f478dfb-78bb-4217-ad63-6927dab7cc90
5f478dfb-78bb-4217-ad63-6927dab7cc90 51e44de8-2fc0-4e99-8860-6820ff023108 5.00g IU_b9fd9e73-32d3-473a-8cb5-d113602f76e1,MD_22,PU_00000000-0000-0000-0000-000000000000

# dd status=none if=/dev/51e44de8-2fc0-4e99-8860-6820ff023108/metadata count=1 bs=512 skip=22 | grep -a SIZE
SIZE=10485760

# dd status=none if=/dev/51e44de8-2fc0-4e99-8860-6820ff023108/metadata count=1 bs=512 skip=23 | grep -a SIZE
SIZE=20971520
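
(For reference: the SIZE values above appear to be in 512-byte sectors, which lines up with the 5 GiB base disk and the 10 GiB size after the extend. A quick check in Python, under that assumption:

>>> 10485760 * 512 / 1024**3   # MD_22, the parent volume
5.0
>>> 20971520 * 512 / 1024**3   # MD_23, the extended leaf volume
10.0
)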

After moving
============

# lvs -o +tags| grep b9fd9e73-32d3-473a-8cb5-d113602f76e1 | awk -F ' ' '{print $1,$2,$4,$5}'
359c2ea7-0a73-4296-8109-b799d9bfbd08 43c67df7-2293-4756-9aa3-de09d67d7050 1.00g IU_b9fd9e73-32d3-473a-8cb5-d113602f76e1,MD_95,PU_5f478dfb-78bb-4217-ad63-6927dab7cc90
5f478dfb-78bb-4217-ad63-6927dab7cc90 43c67df7-2293-4756-9aa3-de09d67d7050 5.00g IU_b9fd9e73-32d3-473a-8cb5-d113602f76e1,MD_93,PU_00000000-0000-0000-0000-000000000000

# dd status=none if=/dev/43c67df7-2293-4756-9aa3-de09d67d7050/metadata count=1 bs=512 skip=93 | grep -a SIZE
SIZE=10485760

# dd status=none if=/dev/43c67df7-2293-4756-9aa3-de09d67d7050/metadata count=1 bs=512 skip=95 | grep -a SIZE
SIZE=10485760       <----------------------- wrong

The SIZE key in the metadata went from 20971520 on SRC SD to 10485760 (same as parent).

Taken together with BZ1700189, the severity of this is urgent.

Version-Release number of selected component (if applicable):
vdsm-4.20.47-1.el7ev
rhvm-4.2.8.5-0.1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create VM with 5GB disk
2. Snapshot it
3. Extend disk by 5GB
4. Move this to another SD

Additional info:
* Also happens on LIVE STORAGE MIGRATION
* The entire chain gets the wrong size, not just the leaf (see the sketch below for spot-checking the metadata slots).
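
A minimal sketch for spot-checking a volume's SIZE key after the move, equivalent to the dd | grep commands above (it assumes, as those commands do, that metadata slot MD_<n> from the lvs tags sits at offset n * 512 in the domain's metadata LV):

def read_size_key(sd_uuid, md_slot):
    """Return the SIZE value (in 512-byte sectors) from a metadata slot."""
    path = "/dev/%s/metadata" % sd_uuid
    with open(path, "rb") as f:
        f.seek(md_slot * 512)        # one 512-byte block per slot
        block = f.read(512)
    for line in block.decode("ascii", "replace").splitlines():
        if line.startswith("SIZE="):
            return int(line.split("=", 1)[1])
    return None

# e.g. read_size_key("43c67df7-2293-4756-9aa3-de09d67d7050", 95) -> 10485760 (the wrong value shown above)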

(Originally by Germano Veit Michel)

Comment 1 RHV bug bot 2019-05-08 17:32:51 UTC
Note: this was block storage to block storage

(Originally by Germano Veit Michel)

Comment 2 RHV bug bot 2019-05-08 17:32:53 UTC
The createVolume command on the DST SD looks right; it is not yet clear why the metadata is wrong.

2019-04-17 09:58:20,359+1000 INFO  (jsonrpc/2) [vdsm.api] START createVolume(sdUUID=u'43c67df7-2293-4756-9aa3-de09d67d7050', spUUID=u'da42e5a5-f6f7-49b4-8256-2adf690ddf4c', imgUUID=u'b9fd9e73-32d3-473a-8cb5-d113602f76e1', size=u'10737418240', volFormat=4, preallocate=2, diskType=u'DATA', volUUID=u'359c2ea7-0a73-4296-8109-b799d9bfbd08', desc=None, srcImgUUID=u'b9fd9e73-32d3-473a-8cb5-d113602f76e1', srcVolUUID=u'5f478dfb-78bb-4217-ad63-6927dab7cc90', initialSize=u'976128931') from=::ffff:10.64.24.161,49332, flow_id=23cc02dc-502c-4d33-9271-3f5b6b89a69a, task_id=c2e90abb-fa9c-415d-b9f7-e9d13520971d (api:46)
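
(For reference, the size argument here appears to be in bytes and matches the correct value on the source domain:

>>> 10737418240 // 512
20971520

i.e. the engine requested the full 10 GiB, so the wrong metadata is not caused by the request itself.)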

(Originally by Germano Veit Michel)

Comment 5 RHV bug bot 2019-05-08 17:32:59 UTC
The issue is this line in volume.py:
 
1148                 # Override the size with the size of the parent
1149                 size = volParent.getSize()

When creating a volume with a parent volume, vdsm silently overrides the size sent
by the engine.

The code was added in

commit 8a0236a2fdf4e81f9b73e9279606053797e14753
Author: Federico Simoncelli <fsimonce>
Date:   Tue Apr 17 18:33:51 2012 +0000

    Unify the volume creation code in volume.create
    
    This patch lays out the principles of the create volume flow (unified
    both for block and file storage domains).
    
    Signed-off-by: Federico Simoncelli <fsimonce>
    Change-Id: I0e44da32351a420f0536505985586b24ded81a2a
    Reviewed-on: http://gerrit.ovirt.org/3627
    Reviewed-by: Allon Mureinik <amureini>
    Reviewed-by: Ayal Baron <abaron>

The review does not exist on gerrit, and there is no information explaining why vdsm
needs to silently override the size sent by the engine and use the parent size.
Maybe this was needed in the past to work around some engine bug or an issue in
another vdsm flow.
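
(With the values from the description: the engine asked createVolume for size=10737418240 bytes, i.e. 20971520 sectors, but volParent.getSize() returns the parent's 5 GiB, i.e. 10485760, presumably in the same sector units, and that is exactly the wrong SIZE that ends up in the new leaf's metadata.)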

So it seems that creating a volume chain with different sizes was always broken.

I think we need to:
- remove this override
- check whether removing it breaks some other flow - it may break snapshot creation if
  the engine sends the wrong size; maybe this code "fixes" such a case.
- verify the metadata size when preparing an existing volume, and fix inconsistencies
  between the qcow2 virtual size and the volume size (a sketch of this idea follows)
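
A minimal sketch of what the last point could look like, not the actual fix that was later posted; the read_metadata_size/write_metadata_size helpers are hypothetical stand-ins for the real volume metadata accessors, and qemu-img info is used to get the qcow2 virtual size in bytes:

import json
import subprocess

SECTOR = 512

def qcow2_virtual_size(path):
    # "qemu-img info --output json" reports the virtual size in bytes
    out = subprocess.check_output(
        ["qemu-img", "info", "--output", "json", path])
    return json.loads(out)["virtual-size"]

def verify_size_on_prepare(vol_path, read_metadata_size, write_metadata_size):
    # read_metadata_size/write_metadata_size are hypothetical helpers.
    meta_size = read_metadata_size(vol_path)               # sectors
    actual_size = qcow2_virtual_size(vol_path) // SECTOR   # sectors
    if meta_size != actual_size:
        # Same spirit as the existing handling of zero metadata size:
        # trust the image and repair the metadata.
        write_metadata_size(vol_path, actual_size)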

(Originally by Nir Soffer)

Comment 8 RHV bug bot 2019-05-08 17:33:05 UTC
We have 2 patches in review:

- https://gerrit.ovirt.org/c/99539/ - this fixes the root cause: creating volumes
  with bad metadata.

- https://gerrit.ovirt.org/c/99541 - this currently fails to prepare a volume with
  bad metadata, so it would prevent corruption of the image when creating a snapshot,
  but it will fail to start a VM or move a disk with such a volume. I think we can
  fix bad metadata when preparing a volume, since we already do this for the special
  zero metadata size case.

Both patches are small and simple, so a backport to 4.2 should be possible. Once this
is fixed upstream we can evaluate the backport to 4.2.

(Originally by Nir Soffer)

Comment 13 Nir Soffer 2019-05-09 16:55:10 UTC
Removing master patches, only 4.3 patches should be attached here.

Comment 14 RHV bug bot 2019-05-16 15:29:12 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Project 'vdsm'/Component 'ovirt-engine' mismatch]

For more info please contact: rhv-devops

Comment 16 Shir Fishbain 2019-05-26 16:05:06 UTC
Verified.

The SIZE key in the metadata is the same as the parent's.
The metadata disk size didn't change as a result of the move action.

ovirt-engine-4.3.4.1-0.1.el7.noarch
vdsm-4.30.16-3.el7ev.x86_64

Comment 18 errata-xmlrpc 2019-06-20 14:48:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1567

Comment 19 Daniel Gur 2019-08-28 13:12:58 UTC
sync2jira

Comment 20 Daniel Gur 2019-08-28 13:17:10 UTC
sync2jira