Bug 1414472 - RHV-M is not verifying the storage domain free space before running live merge
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.0.6
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.2.0
Target Release: 4.2.0
Assigned To: Benny Zlotnik
QA Contact: Kevin Alon Goldblatt
Keywords: ZStream
Depends On:
Blocks: 1450674
Reported: 2017-01-18 10:34 EST by nijin ashok
Modified: 2018-05-15 13:42 EDT
CC: 12 users

See Also:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1450674
Environment:
Last Closed: 2018-05-15 13:40:52 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 75096 master MERGED merge: validate available space before merge 2017-05-09 04:52 EDT
oVirt gerrit 76612 ovirt-engine-4.1 MERGED core: Refactor SubchainInfo 2017-05-17 09:00 EDT
oVirt gerrit 76613 ovirt-engine-4.1 MERGED merge: validate available space before merge 2017-05-22 04:58 EDT
Red Hat Product Errata RHEA-2018:1488 None None None 2018-05-15 13:42 EDT

Description nijin ashok 2017-01-18 10:34:27 EST
Description of problem:

During a live merge operation, RHV-M does not check whether the required space is available on the storage domain. The merge commands are sent to vdsm and fail during the extension of the base image.

Log analysis from my test environment, after reproducing the customer issue, is given below.

Merge operation started 

jsonrpc.Executor/6::DEBUG::2017-01-18 05:49:43,657::__init__::529::jsonrpc.JsonRpcServer::(_handle_request) Calling 'VM.merge' in bridge with {u'topVolUUID': u'aded98c2-703d-4476-80d9-75bacedf00b3', u'vmID': u'421ea8a9-64df-bb0c-3d9d-8530d4ee1a46', u'drive': {u'poolID': u'00000001-0001-0001-0001-000000000149', u'volumeID': u'aded98c2-703d-4476-80d9-75bacedf00b3', u'domainID': u'67731f56-7950-4113-9e02-83304885eb92', u'imageID': u'8de814f8-317d-433e-97db-a8198a60883e'}, u'bandwidth': u'0', u'jobUUID': u'b656d43b-7470-4264-a904-2048562ef83f', u'baseVolUUID': u'5ac93096-a959-411f-971d-d1501a9ebfec'}

Extending the base image failed.

38b4fe81-fe69-4053-823d-22f16b149e5e::DEBUG::2017-01-18 05:49:45,784::lvm::298::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /usr/sbin/lvm lvextend --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/360014056bb17902b2654030a6331582c|/dev/mapper/360014058b3aa2e04ee343e988d5d3808|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --autobackup n --size 18560m 67731f56-7950-4113-9e02-83304885eb92/5ac93096-a959-411f-971d-d1501a9ebfec (cwd None)
38b4fe81-fe69-4053-823d-22f16b149e5e::ERROR::2017-01-18 05:49:45,869::storage_mailbox::174::Storage.SPM.Messages.Extend::(processRequest) processRequest: Exception caught while trying to extend volume: 5ac93096-a959-411f-971d-d1501a9ebfec in domain: 67731f56-7950-4113-9e02-83304885eb92
VolumeGroupSizeError: Volume Group not big enough: ('67731f56-7950-4113-9e02-83304885eb92/5ac93096-a959-411f-971d-d1501a9ebfec 18560 > 8576 (MiB)',)

a2027abd-74d2-49e6-89ff-feb387b229c1::DEBUG::2017-01-18 05:49:47,006::vm::1042::virt.vm::(__verifyVolumeExtension) vmId=`421ea8a9-64df-bb0c-3d9d-8530d4ee1a46`::Verifying extension for volume 5ac93096-a959-411f-971d-d1501a9ebfec, requested size 19461570560, current size 2281701376
a2027abd-74d2-49e6-89ff-feb387b229c1::ERROR::2017-01-18 05:49:47,006::task::868::Storage.TaskManager.Task::(_setError) Task=`318cd7f7-cf95-4afa-b5dd-13cc339e61bb`::Unexpected error

Thread-19180::INFO::2017-01-18 05:49:44,579::vm::4889::virt.vm::(tryPivot) vmId=`421ea8a9-64df-bb0c-3d9d-8530d4ee1a46`::Requesting pivot to complete active layer commit (job b656d43b-7470-4264-a904-2048562ef83f)
Thread-19215::INFO::2017-01-18 05:53:41,758::vm::4925::virt.vm::(run) vmId=`421ea8a9-64df-bb0c-3d9d-8530d4ee1a46`::Synchronizing volume chain after live merge (job b656d43b-7470-4264-a904-2048562ef83f)
Thread-19215::DEBUG::2017-01-18 05:53:41,788::vm::4725::virt.vm::(_syncVolumeChain) vmId=`421ea8a9-64df-bb0c-3d9d-8530d4ee1a46`::vdsm chain: ['aded98c2-703d-4476-80d9-75bacedf00b3', '5ac93096-a959-411f-971d-d1501a9ebfec'], libvirt chain: ['5ac93096-a959-411f-971d-d1501a9ebfec', 'aded98c2-703d-4476-80d9-75bacedf00b3']

Because of this, the leaf image is marked as illegal in the storage domain metadata. Hence, if we shut down the VM, we will not be able to start it again; it fails with the error "Bad volume specification". We also cannot delete the snapshot offline after increasing the storage domain space, because the image is marked as illegal in the storage domain metadata.

Also, the event / engine log only shows "Failed to delete snapshot 'test-snap' for VM 'RHEL7Gold'.", which gives the end user no hint about the reason for the failure.
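
For reference, the figures in the extension failure above line up as follows. This is just a plain-Python restatement of the numbers copied from the logs, not vdsm's actual accounting:

MiB = 1024 * 1024

requested_bytes = 19461570560   # requested size for the base volume (from the log)
current_bytes   = 2281701376    # current size of the base volume (from the log)
vg_free_mib     = 8576          # free space reported for the volume group, in MiB

requested_mib = requested_bytes // MiB                  # 18560 MiB, matching the error text
growth_mib    = requested_mib - current_bytes // MiB    # 16384 MiB of additional space

print(requested_mib, growth_mib, vg_free_mib)
# Both the absolute request (18560 MiB) and the growth it implies (16384 MiB)
# exceed the 8576 MiB left in the volume group, so the lvextend request fails.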


Version-Release number of selected component (if applicable):

rhevm-4.0.6.3-0.1.el7ev.noarch

How reproducible:

100%

Steps to Reproduce:

1. Create a thin-provisioned disk for a VM and then create a snapshot of this disk.

2. Do a write operation with the dd command inside the VM so that the leaf image extends up to the total disk size (a rough equivalent is sketched after this list).

3. Fill the storage domain so that it does not have enough free space to merge the images.
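
A rough sketch of step 2: the reporter used dd inside the guest; this Python equivalent only illustrates the idea, and the target path and sizes are arbitrary:

import os

TARGET = "/var/tmp/fill.bin"    # any path on the snapshotted disk (illustrative)
BLOCK  = 4 * 1024 * 1024        # write 4 MiB at a time
TOTAL  = 20 * 1024 ** 3         # roughly the virtual size of the thin disk

written = 0
with open(TARGET, "wb") as f:
    while written < TOTAL:
        f.write(os.urandom(BLOCK))   # non-zero data forces real allocation in the leaf
        written += BLOCK
    f.flush()
    os.fsync(f.fileno())
print("wrote %d MiB" % (written // (1024 * 1024)))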

Actual results:

The merge operation is started without verifying the free space in the storage domain.

Expected results:

RHV-M should not allow the merge operation if the storage domain does not have enough free space for the merge.
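
The fix tracked in the oVirt gerrit patches listed under External Trackers ("merge: validate available space before merge") adds such a validation to ovirt-engine. As a rough Python sketch of the idea only (the real check lives in the engine's Java code; the helper below and its inputs are hypothetical):

def can_live_merge(base_virtual_size, base_actual_size, top_actual_size, domain_free_space):
    """Return True if the storage domain can absorb the growth a live merge
    may require. Worst case, the base volume has to grow by the top volume's
    data, capped at its own virtual size (qcow2 metadata overhead ignored)."""
    worst_case_base_size = min(base_virtual_size, base_actual_size + top_actual_size)
    required_growth = max(0, worst_case_base_size - base_actual_size)
    return domain_free_space >= required_growth

# With the figures from the logs (base at ~2176 MiB actual, ~18560 MiB requested,
# only 8576 MiB free), such a check would refuse the merge up front instead of
# letting it fail inside lvextend.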

Additional info:
Comment 1 Robert Scheck 2017-01-20 14:11:48 EST
From the customer's view: the virtual machine had two disk images and a snapshot
was taken. Then the storage more or less ran out of disk space. Deleting the
snapshots on some other VMs led to an error (when they had only one large disk
but too little free disk space). Deleting the snapshot on this VM with two disks
(the first small, the second large) did not lead to an error, but still failed.
So some checking happens, but it is not complete or not consistent. Nijin from
GSS helped to get this solved; the result is this bug.

Cross-filed case 01777160 on the Red Hat customer portal.
Comment 5 Allon Mureinik 2017-06-13 05:26:46 EDT
All the patches here were merged a long time ago.
Comment 6 Kevin Alon Goldblatt 2017-07-02 12:20:29 EDT
Verified with the following code:
--------------------------------------------
ovirt-engine-4.2.0-0.0.master.20170621095718.git8901d14.el7.centos.noarch
vdsm-4.20.1-66.git228c7be.el7.centos.x86_64


Verified with the following scenario:
-------------------------------------------
Steps to Reproduce:

1. Create a thin-provisioned disk for a VM and then create a snapshot of this disk.
2. Do a write operation with the dd command inside the VM so that the leaf image extends up to the total disk size.
3. Fill the storage domain so that it does not have enough free space to merge the images.

Actual results:
An error is generated indicating that the storage domain does not have enough free space for the merge operation.
Comment 9 errata-xmlrpc 2018-05-15 13:40:52 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488
