+++ This bug is a downstream clone. The original bug is: +++
+++ bug 1414472 +++

======================================================================

Description of problem:

During a live merge operation, RHV-M does not check whether the required space is available on the storage domain. The merge commands are sent to vdsm and fail during the extension of the base image. Log analysis in my test environment after replicating the customer issue is given below.

Merge operation started:

jsonrpc.Executor/6::DEBUG::2017-01-18 05:49:43,657::__init__::529::jsonrpc.JsonRpcServer::(_handle_request) Calling 'VM.merge' in bridge with {u'topVolUUID': u'aded98c2-703d-4476-80d9-75bacedf00b3', u'vmID': u'421ea8a9-64df-bb0c-3d9d-8530d4ee1a46', u'drive': {u'poolID': u'00000001-0001-0001-0001-000000000149', u'volumeID': u'aded98c2-703d-4476-80d9-75bacedf00b3', u'domainID': u'67731f56-7950-4113-9e02-83304885eb92', u'imageID': u'8de814f8-317d-433e-97db-a8198a60883e'}, u'bandwidth': u'0', u'jobUUID': u'b656d43b-7470-4264-a904-2048562ef83f', u'baseVolUUID': u'5ac93096-a959-411f-971d-d1501a9ebfec'}

Extending the base image failed:

38b4fe81-fe69-4053-823d-22f16b149e5e::DEBUG::2017-01-18 05:49:45,784::lvm::298::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /usr/sbin/lvm lvextend --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/360014056bb17902b2654030a6331582c|/dev/mapper/360014058b3aa2e04ee343e988d5d3808|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --autobackup n --size 18560m 67731f56-7950-4113-9e02-83304885eb92/5ac93096-a959-411f-971d-d1501a9ebfec (cwd None)

38b4fe81-fe69-4053-823d-22f16b149e5e::ERROR::2017-01-18 05:49:45,869::storage_mailbox::174::Storage.SPM.Messages.Extend::(processRequest) processRequest: Exception caught while trying to extend volume: 5ac93096-a959-411f-971d-d1501a9ebfec in domain: 67731f56-7950-4113-9e02-83304885eb92
VolumeGroupSizeError: Volume Group not big enough: ('67731f56-7950-4113-9e02-83304885eb92/5ac93096-a959-411f-971d-d1501a9ebfec 18560 > 8576 (MiB)',)

a2027abd-74d2-49e6-89ff-feb387b229c1::DEBUG::2017-01-18 05:49:47,006::vm::1042::virt.vm::(__verifyVolumeExtension) vmId=`421ea8a9-64df-bb0c-3d9d-8530d4ee1a46`::Verifying extension for volume 5ac93096-a959-411f-971d-d1501a9ebfec, requested size 19461570560, current size 2281701376

a2027abd-74d2-49e6-89ff-feb387b229c1::ERROR::2017-01-18 05:49:47,006::task::868::Storage.TaskManager.Task::(_setError) Task=`318cd7f7-cf95-4afa-b5dd-13cc339e61bb`::Unexpected error

Thread-19180::INFO::2017-01-18 05:49:44,579::vm::4889::virt.vm::(tryPivot) vmId=`421ea8a9-64df-bb0c-3d9d-8530d4ee1a46`::Requesting pivot to complete active layer commit (job b656d43b-7470-4264-a904-2048562ef83f)

Thread-19215::INFO::2017-01-18 05:53:41,758::vm::4925::virt.vm::(run) vmId=`421ea8a9-64df-bb0c-3d9d-8530d4ee1a46`::Synchronizing volume chain after live merge (job b656d43b-7470-4264-a904-2048562ef83f)

Thread-19215::DEBUG::2017-01-18 05:53:41,788::vm::4725::virt.vm::(_syncVolumeChain) vmId=`421ea8a9-64df-bb0c-3d9d-8530d4ee1a46`::vdsm chain: ['aded98c2-703d-4476-80d9-75bacedf00b3', '5ac93096-a959-411f-971d-d1501a9ebfec'], libvirt chain: ['5ac93096-a959-411f-971d-d1501a9ebfec', 'aded98c2-703d-4476-80d9-75bacedf00b3']

Because of this, the leaf image will be marked as illegal in the storage domain metadata.
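To make the shortfall behind the VolumeGroupSizeError concrete, the numbers from the log can be put side by side: the live merge requires the base volume to grow to roughly the disk's full virtual size, while the volume group has much less free space. The following is a minimal sketch of that arithmetic only (illustration, not vdsm code); all values are copied from the log excerpts above.

--------------------------------------
# Illustration only: the arithmetic behind the VolumeGroupSizeError above.
# All values are taken from the log excerpts.

MIB = 1024 * 1024

requested_size = 19461570560   # bytes: size the base volume must grow to for the merge
current_size   = 2281701376    # bytes: current size of the base volume
vg_free        = 8576          # MiB: free space reported in the VolumeGroupSizeError

requested_mib = requested_size // MIB                     # 18560 MiB, matches "lvextend --size 18560m"
needed_mib    = (requested_size - current_size) // MIB    # 16384 MiB of additional extents

print("requested size: %d MiB, additional space needed: %d MiB" % (requested_mib, needed_mib))
print("VG free space:  %d MiB -> extension %s" % (vg_free, "fits" if needed_mib <= vg_free else "cannot fit"))
--------------------------------------

Because the lvextend request cannot be satisfied, the merge fails partway through, which is what leaves the leaf volume flagged as illegal and causes the symptoms described next.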
Hence, with the leaf marked illegal, if we shut down the VM we will not be able to start it again; it fails with the error "Bad volume specification". We also cannot delete the snapshot offline after increasing the storage domain space, because the image is marked as illegal in the storage domain metadata. In addition, the event/engine log only shows "Failed to delete snapshot 'test-snap' for VM 'RHEL7Gold'.", which gives the end user no hint about the reason for the failure.

Version-Release number of selected component (if applicable):
rhevm-4.0.6.3-0.1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a thin provisioned disk for a VM and then create a snapshot of this disk.
2. Perform a write operation with the dd command inside the VM so that the leaf image extends up to the total disk size.
3. Fill the storage domain so that it no longer has enough free space to merge the images.

Actual results:
The merge operation is started without verifying the free space in the storage domain.

Expected results:
RHV-M should not allow the merge operation if the storage domain does not have enough free space for the merge (a minimal sketch of such a check follows below).

Additional info:

(Originally by Nijin Ashok)
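For reference, the kind of pre-merge validation requested above could look roughly like the sketch below. This is an illustration only; the function and parameter names (validate_merge_space, merge_required_space, domain_free_space, etc.) are hypothetical and are not the actual ovirt-engine or vdsm API.

--------------------------------------
# Hypothetical pre-merge free-space check of the kind requested in "Expected results".
# None of these names are real oVirt/vdsm APIs.

class InsufficientSpaceError(Exception):
    pass

def merge_required_space(base_virtual_size, base_actual_size):
    """Worst case: the base volume may have to grow to the disk's full virtual size."""
    return max(base_virtual_size - base_actual_size, 0)

def validate_merge_space(domain_free_space, base_virtual_size, base_actual_size):
    """Refuse to start a live merge if the storage domain cannot hold the
    required extension of the base volume (all values in bytes)."""
    needed = merge_required_space(base_virtual_size, base_actual_size)
    if needed > domain_free_space:
        raise InsufficientSpaceError(
            "Cannot merge: %d bytes needed, only %d bytes free on the domain"
            % (needed, domain_free_space))

# Example with the values from the log: the merge would be refused up front
# instead of failing halfway and leaving the leaf volume illegal.
try:
    validate_merge_space(
        domain_free_space=8576 * 1024**2,   # 8576 MiB free (from the error)
        base_virtual_size=19461570560,      # requested size from the log
        base_actual_size=2281701376)        # current size from the log
except InsufficientSpaceError as e:
    print(e)
--------------------------------------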
From the customer's view: The virtual machine had two disk images, and a snapshot was taken. Then the storage more or less ran out of disk space. Deleting snapshots on some other VMs (which had only one large disk but too little free disk space) led to an error. Deleting the snapshot on this VM with two disks (first small, second large) did not lead to an error, but it failed anyway. So some checking happens, but it is not complete or not consistent. Nijin from GSS helped to get this resolved; the result is this bug. Cross-filed case 01777160 on the Red Hat customer portal.

(Originally by redhat-bugzilla)
Verified with the following code:
--------------------------------------
ovirt-engine-4.1.3-0.1.el7.noarch
rhevm-4.1.3-0.1.el7.noarch
vdsm-4.19.16-1.el7ev.x86_64

Verified with the following scenario:
--------------------------------------
1. Create a VM with 2 disks and start the VM.
2. Create snapshots snap1, snap2, snap3.
3. Write data to one of the disks until it is full.
4. Delete snap2.
>>> An error is reported informing the user that the snapshot cannot be deleted.

Moving to VERIFIED!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1692