Description of problem:

If a block job fails on the libvirt side, vdsm keeps running the live merge cleanup thread until the VM goes down.

A merge operation starts at the backend to merge volumes ded1883f and be4a4d85:

2017-10-12 14:56:21,244+0530 INFO (jsonrpc/6) [vdsm.api] START merge(driveSpec={u'poolID': u'00000001-0001-0001-0001-000000000311', u'volumeID': u'ded1883f-94de-4510-8ddd-7ecc432e4009', u'domainID': u'8257cf14-d88d-4e4e-998c-9f8976dac2a2', u'imageID': u'7455de38-1df1-4acd-b07c-9dc2138aafb3'}, baseVolUUID=u'be4a4d85-d7e6-4725-b7f5-90c9d935c336', topVolUUID=u'ded1883f-94de-4510-8ddd-7ecc432e4009', bandwidth=u'0', jobUUID=u'26702cd2-6666-4eca-86dd-ac072f0c4478') from=::ffff:10.74.130.115,58346, flow_id=cafd88cf-d091-4a0d-95aa-645fcc714955 (api:46)

The block job terminates on the libvirt side and vdsm detects this correctly:

2017-10-12 14:56:49,824+0530 DEBUG (periodic/0) [virt.vm] (vmId='c1ed19b5-e876-4ec0-8067-837bed214d48') Checking job 26702cd2-6666-4eca-86dd-ac072f0c4478 (vm:4760)
2017-10-12 14:56:49,840+0530 INFO (periodic/0) [virt.vm] (vmId='c1ed19b5-e876-4ec0-8067-837bed214d48') Libvirt job 26702cd2-6666-4eca-86dd-ac072f0c4478 was terminated (vm:4799)

The chain is detected correctly in both libvirt and vdsm:

2017-10-12 14:56:50,043+0530 DEBUG (merge/c1ed19b5) [virt.vm] (vmId='c1ed19b5-e876-4ec0-8067-837bed214d48') vdsm chain: ['ded1883f-94de-4510-8ddd-7ecc432e4009', 'be4a4d85-d7e6-4725-b7f5-90c9d935c336'], libvirt chain: ['be4a4d85-d7e6-4725-b7f5-90c9d935c336', 'ded1883f-94de-4510-8ddd-7ecc432e4009'] (vm:5008)
2017-10-12 14:56:50,043+0530 INFO (merge/c1ed19b5) [vdsm.api] START imageSyncVolumeChain(sdUUID=u'8257cf14-d88d-4e4e-998c-9f8976dac2a2', imgUUID=u'7455de38-1df1-4acd-b07c-9dc2138aafb3', volUUID=u'ded1883f-94de-4510-8ddd-7ecc432e4009', newChain=['be4a4d85-d7e6-4725-b7f5-90c9d935c336', 'ded1883f-94de-4510-8ddd-7ecc432e4009']) from=internal, task_id=b67ea7bd-0073-4977-9612-bdb0d303906a (api:46)
2017-10-12 14:56:50,142+0530 INFO (merge/c1ed19b5) [storage.Image] Current chain=be4a4d85-d7e6-4725-b7f5-90c9d935c336 < ded1883f-94de-4510-8ddd-7ecc432e4009 (top) (image:1266)
2017-10-12 14:56:50,144+0530 INFO (merge/c1ed19b5) [vdsm.api] FINISH imageSyncVolumeChain return=None from=internal, task_id=b67ea7bd-0073-4977-9612-bdb0d303906a (api:52)

However, after syncVolumeChain, teardown_top_volume is executed, and it fails because the image is still in use by the VM:
    self.vm._syncVolumeChain(self.drive)
    if self.doPivot:
        self.vm.enableDriveMonitor()
    self.teardown_top_volume()

2017-10-12 14:56:55,007+0530 DEBUG (merge/c1ed19b5) [storage.Misc.excCmd] FAILED: <err> = ' Logical volume 8257cf14-d88d-4e4e-998c-9f8976dac2a2/ded1883f-94de-4510-8ddd-7ecc432e4009 in use.\n'; <rc> = 5 (commands:93)
2017-10-12 14:56:55,009+0530 ERROR (merge/c1ed19b5) [root] Unhandled exception (utils:374)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 368, in wrapper
    return f(*a, **kw)
  File "/usr/share/vdsm/virt/vm.py", line 5287, in run
    self.teardown_top_volume()
  File "/usr/share/vdsm/virt/vm.py", line 5270, in teardown_top_volume
    self.job['topVolume'])
  File "/usr/share/vdsm/storage/blockSD.py", line 918, in teardownVolume
    lvm.deactivateLVs(self.sdUUID, [volUUID])
  File "/usr/share/vdsm/storage/lvm.py", line 1304, in deactivateLVs
    _setLVAvailability(vgName, toDeactivate, "n")
  File "/usr/share/vdsm/storage/lvm.py", line 843, in _setLVAvailability
    raise error(str(e))
CannotDeactivateLogicalVolume: Cannot deactivate Logical Volume: ('General Storage Exception: ("5 [] [\' Logical volume 8257cf14-d88d-4e4e-998c-9f8976dac2a2/ded1883f-94de-4510-8ddd-7ecc432e4009 in use.\']\\n8257cf14-d88d-4e4e-998c-9f8976dac2a2/[u\'ded1883f-94de-4510-8ddd-7ecc432e4009\']",)',)
2017-10-12 14:56:55,028+0530 ERROR (merge/c1ed19b5) [root] Unhandled exception (utils:374)

I think we should only do teardown_top_volume if doPivot is set (see the sketch in the additional info below). vdsm retries this until the VM goes down:

2017-10-16 06:01:10,915+0530 DEBUG (periodic/3) [virt.vm] (vmId='c1ed19b5-e876-4ec0-8067-837bed214d48') Checking job 26702cd2-6666-4eca-86dd-ac072f0c4478 (vm:4760)
2017-10-16 06:01:10,915+0530 INFO (periodic/3) [virt.vm] (vmId='c1ed19b5-e876-4ec0-8067-837bed214d48') Previous job 26702cd2-6666-4eca-86dd-ac072f0c4478 cleanup thread failed, retrying (vm:4817)

Version-Release number of selected component (if applicable):
rhevm-4.1.6.2-0.1.el7.noarch
vdsm-api-4.19.31-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Start a live merge.
2. During the live merge, abort the block job manually from libvirt (see the sketch in the additional info below).

Actual results:
vdsm does not perform the cleanup correctly if the job failed in libvirt.

Expected results:
vdsm should detect the failure correctly and repair the chains.

Additional info:
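For step 2 of the reproducer, one way to abort the block job manually is through libvirt's blockJobAbort API. A minimal sketch, assuming a VM named 'test-vm' with the merge running on disk 'vda' (both names are placeholders for illustration):

import libvirt

# Connect to the libvirtd instance that runs the VM.
conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('test-vm')  # hypothetical VM name

# Abort the running block job on the disk. With flags=0 the call blocks
# until the job is cancelled; equivalent to `virsh blockjob test-vm vda --abort`.
dom.blockJobAbort('vda', 0)

conn.close()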
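A minimal sketch of the change suggested above, using only the names from the vm.py fragment quoted at the top of this comment (a suggestion for review, not a tested patch):

    self.vm._syncVolumeChain(self.drive)
    if self.doPivot:
        self.vm.enableDriveMonitor()
        # Deactivate the top LV only after a successful pivot: once qemu has
        # switched to the base volume, the VM no longer holds the top volume
        # open, so deactivating its LV cannot fail with
        # "Logical volume ... in use".
        self.teardown_top_volume()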
--------------------------------------
Tested with the following code:
----------------------------------------
ovirt-engine-4.2.0-0.0.master.20171112130303.git8bc889c.el7.centos.noarch
vdsm-4.20.6-62.gitd3023e4.el7.centos.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. Start a live merge.
2. During the live merge, abort the block job manually from libvirt.

Actual results:
vdsm detected the libvirt job failure and no error is shown in the vdsm log.

Moving to VERIFIED!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:1489