Bug 1414472

Summary: RHV-M is not verifying the storage domain free space before running live merge
Product: Red Hat Enterprise Virtualization Manager
Reporter: nijin ashok <nashok>
Component: ovirt-engine
Assignee: Benny Zlotnik <bzlotnik>
Status: CLOSED ERRATA
QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: high
Docs Contact:
Priority: high
Version: 4.0.6
CC: klaas, lsurette, ratamir, rbalakri, redhat-bugzilla, Rhev-m-bugs, robert.scheck, srevivo, tnisan, ykaul, ylavi
Target Milestone: ovirt-4.2.0
Keywords: ZStream
Target Release: 4.2.0
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1450674 (view as bug list)
Environment:
Last Closed: 2018-05-15 17:40:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1450674

Description nijin ashok 2017-01-18 15:34:27 UTC
Description of problem:

During a live merge operation, RHV-M does not check whether the required space is available on the storage domain. The merge command is sent to vdsm and fails during the extension of the base image.

Log analysis in my test environment after replicating the customer issue is given below.

Merge operation started 

jsonrpc.Executor/6::DEBUG::2017-01-18 05:49:43,657::__init__::529::jsonrpc.JsonRpcServer::(_handle_request) Calling 'VM.merge' in bridge with {u'topVolUUID': u'aded98c2-703d-4476-80d9-75bacedf00b3', u'vmID': u'421ea8a9-64df-bb0c-3d9d-8530d4ee1a46', u'drive': {u'poolID': u'00000001-0001-0001-0001-000000000149', u'volumeID': u'aded98c2-703d-4476-80d9-75bacedf00b3', u'domainID': u'67731f56-7950-4113-9e02-83304885eb92', u'imageID': u'8de814f8-317d-433e-97db-a8198a60883e'}, u'bandwidth': u'0', u'jobUUID': u'b656d43b-7470-4264-a904-2048562ef83f', u'baseVolUUID': u'5ac93096-a959-411f-971d-d1501a9ebfec'}

Extending the base image failed.

38b4fe81-fe69-4053-823d-22f16b149e5e::DEBUG::2017-01-18 05:49:45,784::lvm::298::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /usr/sbin/lvm lvextend --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/360014056bb17902b2654030a6331582c|/dev/mapper/360014058b3aa2e04ee343e988d5d3808|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --autobackup n --size 18560m 67731f56-7950-4113-9e02-83304885eb92/5ac93096-a959-411f-971d-d1501a9ebfec (cwd None)
38b4fe81-fe69-4053-823d-22f16b149e5e::ERROR::2017-01-18 05:49:45,869::storage_mailbox::174::Storage.SPM.Messages.Extend::(processRequest) processRequest: Exception caught while trying to extend volume: 5ac93096-a959-411f-971d-d1501a9ebfec in domain: 67731f56-7950-4113-9e02-83304885eb92
VolumeGroupSizeError: Volume Group not big enough: ('67731f56-7950-4113-9e02-83304885eb92/5ac93096-a959-411f-971d-d1501a9ebfec 18560 > 8576 (MiB)',)

a2027abd-74d2-49e6-89ff-feb387b229c1::DEBUG::2017-01-18 05:49:47,006::vm::1042::virt.vm::(__verifyVolumeExtension) vmId=`421ea8a9-64df-bb0c-3d9d-8530d4ee1a46`::Verifying extension for volume 5ac93096-a959-411f-971d-d1501a9ebfec, requested size 19461570560, current size 2281701376
a2027abd-74d2-49e6-89ff-feb387b229c1::ERROR::2017-01-18 05:49:47,006::task::868::Storage.TaskManager.Task::(_setError) Task=`318cd7f7-cf95-4afa-b5dd-13cc339e61bb`::Unexpected error

Thread-19180::INFO::2017-01-18 05:49:44,579::vm::4889::virt.vm::(tryPivot) vmId=`421ea8a9-64df-bb0c-3d9d-8530d4ee1a46`::Requesting pivot to complete active layer commit (job b656d43b-7470-4264-a904-2048562ef83f)
Thread-19215::INFO::2017-01-18 05:53:41,758::vm::4925::virt.vm::(run) vmId=`421ea8a9-64df-bb0c-3d9d-8530d4ee1a46`::Synchronizing volume chain after live merge (job b656d43b-7470-4264-a904-2048562ef83f)
Thread-19215::DEBUG::2017-01-18 05:53:41,788::vm::4725::virt.vm::(_syncVolumeChain) vmId=`421ea8a9-64df-bb0c-3d9d-8530d4ee1a46`::vdsm chain: ['aded98c2-703d-4476-80d9-75bacedf00b3', '5ac93096-a959-411f-971d-d1501a9ebfec'], libvirt chain: ['5ac93096-a959-411f-971d-d1501a9ebfec', 'aded98c2-703d-4476-80d9-75bacedf00b3']
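
To put the figures from these logs side by side, here is a rough, illustrative calculation based only on the numbers above (not vdsm code):

# Rough arithmetic tying the log excerpts above together (illustrative only).
MiB = 1024 * 1024

requested_bytes = 19461570560   # size requested by __verifyVolumeExtension
current_bytes = 2281701376      # current size of the base volume
vg_free_mib = 8576              # free space cited by VolumeGroupSizeError

requested_mib = requested_bytes // MiB    # 18560 MiB, matching 'lvextend --size 18560m'
current_mib = current_bytes // MiB        # 2176 MiB
growth_mib = requested_mib - current_mib  # ~16384 MiB of additional space is needed

print("base volume: %d MiB -> %d MiB (needs %d MiB more)" % (current_mib, requested_mib, growth_mib))
print("volume group free space: %d MiB -> the extension cannot succeed" % vg_free_mib)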

Because of this, the leaf image is marked as illegal in the storage domain metadata. As a result, if the VM is shut down it cannot be started again, and it fails with the error "Bad volume specification". The snapshot also cannot be deleted offline after increasing the storage domain space, because the image is marked as illegal in the storage domain metadata.
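
To make the failure mode concrete, here is a hypothetical Python sketch of the gating behaviour described above (this is not the actual vdsm code; the function and field names are made up for illustration):

# Hypothetical sketch of the behaviour described above; not vdsm code.
ILLEGAL = "ILLEGAL"

def prepare_volume_chain(chain):
    """Starting the VM prepares its volume chain; an ILLEGAL leaf is refused,
    which surfaces to the user as 'Bad volume specification'."""
    leaf = chain[-1]
    if leaf["legality"] == ILLEGAL:
        raise RuntimeError("Bad volume specification: %s is ILLEGAL" % leaf["uuid"])

def offline_merge(base, top):
    """Offline snapshot deletion is refused for the same reason while the
    volume is still flagged ILLEGAL after the failed live merge."""
    if top["legality"] == ILLEGAL:
        raise RuntimeError("Cannot merge ILLEGAL volume %s" % top["uuid"])

chain = [
    {"uuid": "5ac93096-a959-411f-971d-d1501a9ebfec", "legality": "LEGAL"},   # base
    {"uuid": "aded98c2-703d-4476-80d9-75bacedf00b3", "legality": ILLEGAL},   # leaf
]
prepare_volume_chain(chain)  # raises -> the VM cannot be started again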

In addition, the event / engine log only shows "Failed to delete snapshot 'test-snap' for VM 'RHEL7Gold'.", which gives the end user no hint about the reason for the failure.


Version-Release number of selected component (if applicable):

rhevm-4.0.6.3-0.1.el7ev.noarch

How reproducible:

100%

Steps to Reproduce:

1. Create a thin-provisioned disk for a VM and then create a snapshot of this disk.

2. Run a write operation with the dd command inside the VM so that the leaf image extends up to the total disk size (see the sketch after these steps).

3. Fill the storage domain so that it does not have enough free space to merge the images.
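
For step 2, a minimal in-guest sketch that is roughly equivalent to a dd of random data is shown below; the /fill.bin path and the 16 GiB target are assumptions and should be adjusted to the disk's virtual size.

# In-guest helper for step 2 (roughly equivalent to
# 'dd if=/dev/urandom of=/fill.bin bs=1M'); path and target size are assumptions.
import os

CHUNK = 1024 * 1024            # 1 MiB of random data per write
TARGET = 16 * 1024 * CHUNK     # stop after ~16 GiB or when the filesystem is full

written = 0
try:
    with open("/fill.bin", "wb") as f:
        while written < TARGET:
            f.write(os.urandom(CHUNK))
            written += CHUNK
except OSError:
    pass  # filesystem full -- fine for this reproducer
print("wrote ~%d MiB" % (written // CHUNK))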

Actual results:

The merge operation is started without verifying that the storage domain has enough free space.

Expected results:

RHV-M should not allow the merge operation if the storage domain does not have enough free space for it.
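
A minimal sketch of the kind of pre-merge validation being requested is shown below (illustrative only; the actual check added to ovirt-engine and its exact space estimate may differ, and the top volume size used in the example is an assumption):

# Illustrative pre-merge free-space check; not the actual ovirt-engine logic.

def required_merge_space_mib(base_actual_mib, base_virtual_mib, top_actual_mib):
    """Worst-case assumption for this sketch: the base volume may have to grow
    by the top volume's data, but never beyond its own virtual size."""
    target = min(base_actual_mib + top_actual_mib, base_virtual_mib)
    return max(target - base_actual_mib, 0)

def validate_live_merge(domain_free_mib, base_actual_mib, base_virtual_mib, top_actual_mib):
    needed = required_merge_space_mib(base_actual_mib, base_virtual_mib, top_actual_mib)
    if domain_free_mib < needed:
        raise ValueError("Not enough free space for the merge: need %d MiB, have %d MiB"
                         % (needed, domain_free_mib))

# Figures taken from the logs above (the top volume's actual size is assumed):
validate_live_merge(domain_free_mib=8576, base_actual_mib=2176,
                    base_virtual_mib=18560, top_actual_mib=16384)  # raises ValueError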

Additional info:

Comment 1 Robert Scheck 2017-01-20 19:11:48 UTC
From the customer's view: the virtual machine had two disk images and a snapshot
was taken. Then the storage more or less ran out of disk space. Deleting the
snapshots of some other VMs led to an error (when they had only one large
disk but too little free disk space). Deleting the snapshot of this VM with
two disks (the first small, the second large) did not lead to an error, but
failed. So some checking happens, but it is not complete or not consistent.
Nijin from GSS helped to get this solved; the result is this bug.

Cross-filed case 01777160 on the Red Hat customer portal.

Comment 5 Allon Mureinik 2017-06-13 09:26:46 UTC
All the patches here were merged a long time ago.

Comment 6 Kevin Alon Goldblatt 2017-07-02 16:20:29 UTC
Verified with the following code:
--------------------------------------------
rpm -q output:
ovirt-engine-4.2.0-0.0.master.20170621095718.git8901d14.el7.centos.noarch
vdsm-4.20.1-66.git228c7be.el7.centos.x86_64


Verified with the following scenario:
-------------------------------------------
Steps to Reproduce:

1. Create a thin-provisioned disk for a VM and then create a snapshot of this disk.
2. Run a write operation with the dd command inside the VM so that the leaf image extends up to the total disk size.
3. Fill the storage domain so that it does not have enough free space to merge the images.

Actual results:
An error is generated indicating that there is not enough free space in the storage domain for the merge operation.

Comment 9 errata-xmlrpc 2018-05-15 17:40:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488

Comment 10 Franta Kust 2019-05-16 13:08:28 UTC
BZ<2>Jira Resync