Bug 1450674 - [downstream clone - 4.1.3] RHV-M is not verifying the storage domain free space before running live merge
Summary: [downstream clone - 4.1.3] RHV-M is not verifying the storage domain free space before running live merge
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.0.6
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.1.3
Assignee: Benny Zlotnik
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On: 1414472
Blocks:
 
Reported: 2017-05-14 14:21 UTC by rhev-integ
Modified: 2020-08-13 09:11 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1414472
Environment:
Last Closed: 2017-07-06 07:30:42 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1692 0 normal SHIPPED_LIVE Red Hat Virtualization Manager (ovirt-engine) 4.1.3 2017-07-06 11:24:35 UTC
oVirt gerrit 75096 0 master MERGED merge: validate available space before merge 2017-05-14 14:21:51 UTC
oVirt gerrit 76612 0 ovirt-engine-4.1 MERGED core: Refactor SubchainInfo 2017-05-17 13:00:49 UTC
oVirt gerrit 76613 0 ovirt-engine-4.1 MERGED merge: validate available space before merge 2017-05-22 08:58:45 UTC

Description rhev-integ 2017-05-14 14:21:17 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1414472 +++
======================================================================

Description of problem:

During a live merge operation, RHV-M does not check whether the required space is available on the storage domain. The merge command is sent to vdsm and fails while extending the base image.

Log analysis in my test environment after replicating the customer issue is given below.

Merge operation started 

jsonrpc.Executor/6::DEBUG::2017-01-18 05:49:43,657::__init__::529::jsonrpc.JsonRpcServer::(_handle_request) Calling 'VM.merge' in bridge with {u'topVolUUID': u'aded98c2-703d-4476-80d9-75bacedf00b3', u'vmID': u'421ea8a9-64df-bb0c-3d9d-8530d4ee1a46', u'drive': {u'poolID': u'00000001-0001-0001-0001-000000000149', u'volumeID': u'aded98c2-703d-4476-80d9-75bacedf00b3', u'domainID': u'67731f56-7950-4113-9e02-83304885eb92', u'imageID': u'8de814f8-317d-433e-97db-a8198a60883e'}, u'bandwidth': u'0', u'jobUUID': u'b656d43b-7470-4264-a904-2048562ef83f', u'baseVolUUID': u'5ac93096-a959-411f-971d-d1501a9ebfec'}

Extending the base image failed.

38b4fe81-fe69-4053-823d-22f16b149e5e::DEBUG::2017-01-18 05:49:45,784::lvm::298::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /usr/sbin/lvm lvextend --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/360014056bb17902b2654030a6331582c|/dev/mapper/360014058b3aa2e04ee343e988d5d3808|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --autobackup n --size 18560m 67731f56-7950-4113-9e02-83304885eb92/5ac93096-a959-411f-971d-d1501a9ebfec (cwd None)
38b4fe81-fe69-4053-823d-22f16b149e5e::ERROR::2017-01-18 05:49:45,869::storage_mailbox::174::Storage.SPM.Messages.Extend::(processRequest) processRequest: Exception caught while trying to extend volume: 5ac93096-a959-411f-971d-d1501a9ebfec in domain: 67731f56-7950-4113-9e02-83304885eb92
VolumeGroupSizeError: Volume Group not big enough: ('67731f56-7950-4113-9e02-83304885eb92/5ac93096-a959-411f-971d-d1501a9ebfec 18560 > 8576 (MiB)',)

a2027abd-74d2-49e6-89ff-feb387b229c1::DEBUG::2017-01-18 05:49:47,006::vm::1042::virt.vm::(__verifyVolumeExtension) vmId=`421ea8a9-64df-bb0c-3d9d-8530d4ee1a46`::Verifying extension for volume 5ac93096-a959-411f-971d-d1501a9ebfec, requested size 19461570560, current size 2281701376
a2027abd-74d2-49e6-89ff-feb387b229c1::ERROR::2017-01-18 05:49:47,006::task::868::Storage.TaskManager.Task::(_setError) Task=`318cd7f7-cf95-4afa-b5dd-13cc339e61bb`::Unexpected error
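
As a sanity check, the figures in these lines are consistent (values copied from the log above):

# Python: tie the log figures together
requested_bytes = 19461570560              # requested size from __verifyVolumeExtension
print(requested_bytes // 2**20)            # 18560 -> matches 'lvextend --size 18560m'
print(requested_bytes // 2**20 > 8576)     # True  -> more than the 8576 MiB free in the VG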

Thread-19180::INFO::2017-01-18 05:49:44,579::vm::4889::virt.vm::(tryPivot) vmId=`421ea8a9-64df-bb0c-3d9d-8530d4ee1a46`::Requesting pivot to complete active layer commit (job b656d43b-7470-4264-a904-2048562ef83f)
Thread-19215::INFO::2017-01-18 05:53:41,758::vm::4925::virt.vm::(run) vmId=`421ea8a9-64df-bb0c-3d9d-8530d4ee1a46`::Synchronizing volume chain after live merge (job b656d43b-7470-4264-a904-2048562ef83f)
Thread-19215::DEBUG::2017-01-18 05:53:41,788::vm::4725::virt.vm::(_syncVolumeChain) vmId=`421ea8a9-64df-bb0c-3d9d-8530d4ee1a46`::vdsm chain: ['aded98c2-703d-4476-80d9-75bacedf00b3', '5ac93096-a959-411f-971d-d1501a9ebfec'], libvirt chain: ['5ac93096-a959-411f-971d-d1501a9ebfec', 'aded98c2-703d-4476-80d9-75bacedf00b3']

Because of this, the leaf image is marked as illegal in the storage domain metadata. If we then shut down the VM, it cannot be started again and fails with the error "Bad volume specification". We also cannot delete the snapshot offline after increasing the storage domain space, because the image is marked as illegal in the storage domain metadata.

Also, the event / engine log only shows "Failed to delete snapshot 'test-snap' for VM 'RHEL7Gold'.", which gives the end user no hint about the reason for the failure.


Version-Release number of selected component (if applicable):

rhevm-4.0.6.3-0.1.el7ev.noarch

How reproducible:

100%

Steps to Reproduce:

1. Create a thin provisioned disk for a VM and then create a snapshot for this disk

2. Do a large write within the VM (for example with the dd command) so that the leaf image extends up to the total disk size (a rough equivalent is sketched after the steps).

3. Fill the storage domain so that it does not have enough free space to merge the images.
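
For step 2, any large sequential write works. Below is a rough guest-side Python equivalent of the dd approach (a sketch only; the file path inside the guest is just an example):

# Fill the guest filesystem until ENOSPC, forcing the thin leaf volume to extend
# up to the full virtual disk size (equivalent to dd if=/dev/zero of=<file>).
block = b"\0" * (1 << 20)                                 # 1 MiB of zeros
with open("/mnt/data/fill.bin", "wb", buffering=0) as f:  # example path inside the guest
    try:
        while True:
            f.write(block)
    except OSError:                                       # ENOSPC once the filesystem is full
        pass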

Actual results:

The merge operation is started without verifying the free space in the storage domain.

Expected results:

RHV-M should not allow the merge operation if the storage domain does not have enough free space to perform it.
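
The sketch below illustrates the kind of check this asks for. It is only a minimal Python illustration with hypothetical names, not the actual ovirt-engine change (see the linked gerrit patches "merge: validate available space before merge"):

MIB = 1 << 20

def merge_space_shortfall_mib(base_target_size, base_allocated_size, domain_free_bytes):
    # How much the base volume may still need to grow for the merge, compared
    # with the free space the storage domain reports; 0 means the merge can run.
    growth = max(0, base_target_size - base_allocated_size)
    return max(0, growth - domain_free_bytes) // MIB

# Numbers from the log above: the base volume was asked to grow from 2281701376
# to 19461570560 bytes while the volume group had only 8576 MiB free.
shortfall = merge_space_shortfall_mib(19461570560, 2281701376, 8576 * MIB)
if shortfall:
    print("Block the live merge: %d MiB short on the storage domain" % shortfall)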

Additional info:

(Originally by Nijin Ashok)

Comment 1 rhev-integ 2017-05-14 14:21:27 UTC
From the customer's view: the virtual machine had two disk images and a snapshot
was taken. Then the storage more or less ran out of disk space. Deleting the
snapshots on some other VMs led to an error (when they had only one large disk
but too little free disk space). Deleting the snapshot on this VM with two disks
(first small, second large) did not lead to an error, but failed. So some
checking happens, but it is not complete or consistent. Nijin from GSS helped to
get this solved; the result is this bug.

Cross-filed case 01777160 on the Red Hat customer portal.

(Originally by redhat-bugzilla)

Comment 6 Kevin Alon Goldblatt 2017-06-05 11:55:03 UTC
Verified with the following code:
--------------------------------------
ovirt-engine-4.1.3-0.1.el7.noarch
rhevm-4.1.3-0.1.el7.noarch
vdsm-4.19.16-1.el7ev.x86_64

Verified with the following scenario:
--------------------------------------
1. Create a VM with 2 disks and start the VM
2. Create snapshots snap1, snap2, snap3
3. Write data to one of the disks until full.
4. Delete snap2 >>> An error is reported informing the user that the snapshot cannot be deleted



Moving to VERIFIED!

Comment 8 errata-xmlrpc 2017-07-06 07:30:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1692

