Bug 1707934 - [downstream clone - 4.2.10] [downstream clone - 4.3.4] Moving disk results in wrong SIZE/CAP key in the volume metadata
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.2.8
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ovirt-4.2.10
Assignee: Vojtech Juranek
QA Contact: Yosi Ben Shimon
URL:
Whiteboard:
Depends On: 1700623 1707932
Blocks:
 
Reported: 2019-05-08 17:36 UTC by RHV bug bot
Modified: 2020-08-03 15:35 UTC (History)
CC: 16 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1707932
Environment:
Last Closed: 2019-05-23 11:31:44 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 4065541 0 Troubleshoot None After creating snapshot the VM paused with IO Error, the disk changed size and seems corrupted. 2019-05-08 17:37:21 UTC
Red Hat Product Errata RHBA-2019:1261 0 None None None 2019-05-23 11:31:46 UTC
oVirt gerrit 99834 0 ovirt-4.2 MERGED volume: remove size override when creating a volume with parent 2020-10-28 12:50:51 UTC
oVirt gerrit 99835 0 ovirt-4.2 MERGED volume: Repair volume metadata capacity 2020-10-28 12:51:06 UTC
oVirt gerrit 99838 0 ovirt-4.2 MERGED hsm: Repair volume capacity when preparing an image 2020-10-28 12:50:52 UTC
oVirt gerrit 99891 0 ovirt-4.2 MERGED volume: Validate volume size when creating snapshot 2020-10-28 12:51:07 UTC
oVirt gerrit 100007 0 ovirt-4.2 MERGED image: Prepare copyCollapsed for repairing capacity 2020-10-28 12:50:51 UTC
oVirt gerrit 100011 0 ovirt-4.2 ABANDONED image: Prepare copyCollapsed for repairing capacity 2020-10-28 12:50:51 UTC

Description RHV bug bot 2019-05-08 17:36:51 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1707932 +++
======================================================================

+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1700623 +++
======================================================================

Description of problem:

Moving a disk from storage domain A to storage domain B results in a wrong SIZE key in the volume metadata on B if the volume has previously been extended.

Before moving
=============

# lvs -o +tags| grep b9fd9e73-32d3-473a-8cb5-d113602f76e1 | awk -F ' ' '{print $1,$2,$4,$5}'
359c2ea7-0a73-4296-8109-b799d9bfbd08 51e44de8-2fc0-4e99-8860-6820ff023108 1.00g IU_b9fd9e73-32d3-473a-8cb5-d113602f76e1,MD_23,PU_5f478dfb-78bb-4217-ad63-6927dab7cc90
5f478dfb-78bb-4217-ad63-6927dab7cc90 51e44de8-2fc0-4e99-8860-6820ff023108 5.00g IU_b9fd9e73-32d3-473a-8cb5-d113602f76e1,MD_22,PU_00000000-0000-0000-0000-000000000000

# dd status=none if=/dev/51e44de8-2fc0-4e99-8860-6820ff023108/metadata count=1 bs=512 skip=22 | grep -a SIZE
SIZE=10485760

# dd status=none if=/dev/51e44de8-2fc0-4e99-8860-6820ff023108/metadata count=1 bs=512 skip=23 | grep -a SIZE
SIZE=20971520

After moving
============

# lvs -o +tags| grep b9fd9e73-32d3-473a-8cb5-d113602f76e1 | awk -F ' ' '{print $1,$2,$4,$5}'
359c2ea7-0a73-4296-8109-b799d9bfbd08 43c67df7-2293-4756-9aa3-de09d67d7050 1.00g IU_b9fd9e73-32d3-473a-8cb5-d113602f76e1,MD_95,PU_5f478dfb-78bb-4217-ad63-6927dab7cc90
5f478dfb-78bb-4217-ad63-6927dab7cc90 43c67df7-2293-4756-9aa3-de09d67d7050 5.00g IU_b9fd9e73-32d3-473a-8cb5-d113602f76e1,MD_93,PU_00000000-0000-0000-0000-000000000000

# dd status=none if=/dev/43c67df7-2293-4756-9aa3-de09d67d7050/metadata count=1 bs=512 skip=93 | grep -a SIZE
SIZE=10485760

# dd status=none if=/dev/43c67df7-2293-4756-9aa3-de09d67d7050/metadata count=1 bs=512 skip=95 | grep -a SIZE
SIZE=10485760       <----------------------- wrong

The SIZE key in the metadata went from 20971520 on the source SD to 10485760 on the destination SD (the same value as the parent).
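
For context, the SIZE key in this metadata format is counted in 512-byte blocks, so the values above map directly onto the disk sizes in this report. A quick sanity check of the numbers (plain arithmetic, no vdsm internals assumed):

SECTOR = 512
GIB = 1024 ** 3

for label, sectors in [("parent (MD_22 / MD_93)", 10485760),
                       ("leaf before move (MD_23)", 20971520),
                       ("leaf after move (MD_95)", 10485760)]:
    print("%s: %d blocks = %d GiB" % (label, sectors, sectors * SECTOR // GIB))

# parent (MD_22 / MD_93): 10485760 blocks = 5 GiB
# leaf before move (MD_23): 20971520 blocks = 10 GiB
# leaf after move (MD_95): 10485760 blocks = 5 GiB   <-- reverted to the parent size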

Taken together with BZ 1700189, the severity of this is urgent.

Version-Release number of selected component (if applicable):
vdsm-4.20.47-1.el7ev
rhvm-4.2.8.5-0.1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a 5 GB disk
2. Snapshot it
3. Extend the disk by 5 GB
4. Move the disk to another SD (a scripted version of these steps is sketched below)
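
For completeness, the same steps as a minimal script using the oVirt Python SDK (ovirtsdk4); the engine URL, credentials, VM name and target storage domain name are hypothetical placeholders:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Hypothetical engine credentials and object names; adjust for your setup.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=testvm')[0]      # 1. VM with a 5 GB disk
vm_service = vms_service.vm_service(vm.id)

# 2. Snapshot it
vm_service.snapshots_service().add(
    types.Snapshot(description='before extend', persist_memorystate=False))

# 3. Extend the disk by 5 GB (new provisioned size: 10 GiB)
attachments_service = vm_service.disk_attachments_service()
attachment = attachments_service.list()[0]
attachments_service.attachment_service(attachment.id).update(
    types.DiskAttachment(disk=types.Disk(provisioned_size=10 * 1024**3)))

# 4. Move the disk to another storage domain
disk_service = connection.system_service().disks_service().disk_service(attachment.disk.id)
disk_service.move(storage_domain=types.StorageDomain(name='target_domain'))

connection.close()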

Additional info:
* Also happens with LIVE STORAGE MIGRATION
* The entire chain gets the wrong size, not just the leaf.

(Originally by Germano Veit Michel)

(Originally by rhv-bugzilla-bot)

Comment 1 RHV bug bot 2019-05-08 17:36:53 UTC
Note: this was block storage to block storage

(Originally by Germano Veit Michel)

(Originally by rhv-bugzilla-bot)

Comment 2 RHV bug bot 2019-05-08 17:36:55 UTC
The createVolume command on the DST SD looks right; it is not yet clear why the metadata is wrong.

2019-04-17 09:58:20,359+1000 INFO  (jsonrpc/2) [vdsm.api] START createVolume(sdUUID=u'43c67df7-2293-4756-9aa3-de09d67d7050', spUUID=u'da42e5a5-f6f7-49b4-8256-2adf690ddf4c', imgUUID=u'b9fd9e73-32d3-473a-8cb5-d113602f76e1', size=u'10737418240', volFormat=4, preallocate=2, diskType=u'DATA', volUUID=u'359c2ea7-0a73-4296-8109-b799d9bfbd08', desc=None, srcImgUUID=u'b9fd9e73-32d3-473a-8cb5-d113602f76e1', srcVolUUID=u'5f478dfb-78bb-4217-ad63-6927dab7cc90', initialSize=u'976128931') from=::ffff:10.64.24.161,49332, flow_id=23cc02dc-502c-4d33-9271-3f5b6b89a69a, task_id=c2e90abb-fa9c-415d-b9f7-e9d13520971d (api:46)

(Originally by Germano Veit Michel)

(Originally by rhv-bugzilla-bot)

Comment 5 RHV bug bot 2019-05-08 17:37:01 UTC
The issue is this line in volume.py:
 
1148                 # Override the size with the size of the parent
1149                 size = volParent.getSize()

When creating a volume with a parent volume, vdsm silently overrides the size sent
by the engine.
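
In outline, the problematic pattern and the direction the attached patches take (a simplified sketch of the flow, not the actual vdsm code; only volParent and size come from the excerpt above, the exception type is illustrative):

# Before the fix: the size requested by the engine is silently discarded.
if volParent is not None:
    size = volParent.getSize()      # parent size wins, the extended size is lost

# Direction of the fix: keep the requested size and validate it instead of
# overriding it - a child volume must not be smaller than its parent.
if volParent is not None and size < volParent.getSize():
    raise se.InvalidParameterException("size", size)   # illustrative error type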

The code was added in

commit 8a0236a2fdf4e81f9b73e9279606053797e14753
Author: Federico Simoncelli <fsimonce>
Date:   Tue Apr 17 18:33:51 2012 +0000

    Unify the volume creation code in volume.create
    
    This patch lays out the principles of the create volume flow (unified
    both for block and file storage domains).
    
    Signed-off-by: Federico Simoncelli <fsimonce>
    Change-Id: I0e44da32351a420f0536505985586b24ded81a2a
    Reviewed-on: http://gerrit.ovirt.org/3627
    Reviewed-by: Allon Mureinik <amureini>
    Reviewed-by: Ayal Baron <abaron>

The review does not exist on gerrit, and there is no info explaining why vdsm
needs to silently override the size sent by the engine and use the parent size.
Maybe this was needed in the past to work around some engine bug or an issue in
another vdsm flow.

So it seems that creating a volume chain with different sizes was always broken.

I think we need to:
- remove this override
- check whether removing it breaks some other flow - it may break snapshot creation
  if the engine sends the wrong size; maybe this code "fixes" such a case.
- verify the metadata size when preparing an existing volume, and fix inconsistencies
  between the qcow2 virtual size and the volume size (see the sketch below)
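
The third point could look roughly like this (a minimal sketch assuming the qcow2 virtual size is read with qemu-img; the helper names are illustrative, not the actual vdsm API):

import json
import subprocess

def qcow2_virtual_size(path):
    """Return the virtual size (bytes) recorded in the qcow2 header."""
    out = subprocess.check_output(["qemu-img", "info", "--output=json", path])
    return json.loads(out)["virtual-size"]

def repaired_capacity(md_capacity, volume_path):
    """Illustrative check used when preparing a COW volume: prefer the qcow2
    virtual size over an inconsistent metadata capacity."""
    virtual_size = qcow2_virtual_size(volume_path)
    if md_capacity != virtual_size:
        # In vdsm this value would be written back to the volume metadata.
        return virtual_size
    return md_capacity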

(Originally by Nir Soffer)

(Originally by rhv-bugzilla-bot)

Comment 8 RHV bug bot 2019-05-08 17:37:06 UTC
We have 2 patches in review:

- https://gerrit.ovirt.org/c/99539/ - this fixes the root cause: creating volumes
  with bad metadata.

- https://gerrit.ovirt.org/c/99541 - this currently fails to prepare a volume with
  bad metadata, so it would prevent corruption of the image when creating a snapshot,
  but it will fail when starting a VM or moving a disk with such a volume. I think we
  can fix bad metadata when preparing a volume, since we already do this for the
  special zero metadata size.

Both patches are small and simple, so a backport to 4.2 should be possible. Once this
is fixed upstream, we can evaluate the backport to 4.2.

(Originally by Nir Soffer)

(Originally by rhv-bugzilla-bot)

Comment 12 Nir Soffer 2019-05-09 16:56:38 UTC
Removing master and 4.3 patches, only 4.2 patches should be attached here.

Comment 14 Yosi Ben Shimon 2019-05-22 13:02:28 UTC
Tested using:
ovirt-engine-4.2.8.7-0.1.el7ev.noarch

Tried according to the steps in the description with:
1. iscsi -> iscsi (move disk to other domain - same type)
2. iscsi -> nfs (move disk to other domain - other type)

No failures or corruptions were found.

The volume capacity was as expected after extending it.

For example (2 GiB in this case):

vdsm-client Volume getInfo storagepoolID=d51f09b5-3534-4fc5-bbeb-796172274255 storagedomainID=5e27bc90-38ba-417e-bcc7-e019223d5127 imageID=b2375de8-1b11-4f96-939e-837f5181cb8d volumeID=6127ad83-fb18-4459-a000-a2a8adf1e610
{
    "status": "OK", 
    "lease": {
        "path": "/dev/5e27bc90-38ba-417e-bcc7-e019223d5127/leases", 
        "owners": [], 
        "version": null, 
        "offset": 113246208
    }, 
    "domain": "5e27bc90-38ba-417e-bcc7-e019223d5127", 
    "capacity": "2147483648", 
    "voltype": "LEAF", 
    "description": "None", 
    "parent": "d10ef7b4-af9b-4f1a-bad7-2385a2ea1824", 
    "format": "COW", 
    "generation": 1, 
    "image": "b2375de8-1b11-4f96-939e-837f5181cb8d", 
    "uuid": "6127ad83-fb18-4459-a000-a2a8adf1e610", 
    "disktype": "DATA", 
    "legality": "LEGAL", 
    "mtime": "0", 
    "apparentsize": "1073741824", 
    "truesize": "1073741824", 
    "type": "SPARSE", 
    "children": [], 
    "pool": "", 
    "ctime": "1558528919"
}


Moving to VERIFIED

Comment 16 errata-xmlrpc 2019-05-23 11:31:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1261

Comment 17 Daniel Gur 2019-08-28 13:13:11 UTC
sync2jira

Comment 18 Daniel Gur 2019-08-28 13:17:24 UTC
sync2jira

