Description of problem:

When using the Backup API, the following sequence takes place:

1) Snapshot Disk X of VM A
2) Hotplug the snapshot of Disk X (transient) to VM B
3) VM B backs up the disk
4) Hotunplug the snapshot of Disk X from VM B
5) Tear down Disk X
6) Remove the snapshot of Disk X

If VM A and VM B are on the same host, the teardown fails because the LVs of Disk X are still in use by VM A.

See step 4 (hotunplug):

2018-10-30 15:28:34,291+1000 INFO (jsonrpc/4) [api.virt] START hotunplugDisk(params={u'xml': u'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><hotunplug><devices><disk><alias name="ua-788b33b4-83cf-4093-a5e5-a4401008fc03"/></disk></devices></hotunplug>', u'vmId': u'39df439f-f608-4e40-add3-643fd4802f5a'}) from=::ffff:192.168.100.253,45974, flow_id=1fdf83be-21b6-4817-9ff2-7e254375f1ec, vmId=39df439f-f608-4e40-add3-643fd4802f5a (api:46)

Then step 5 (teardown):

2018-10-30 15:28:34,395+1000 INFO (jsonrpc/4) [vdsm.api] START teardownImage(sdUUID='f7eeca0e-b360-4d88-959a-1e0e0730f846', spUUID='f18d59c0-d67d-11e8-830f-52540015c1ff', imgUUID='788b33b4-83cf-4093-a5e5-a4401008fc03', volUUID=None) from=::ffff:192.168.100.253,45974, flow_id=1fdf83be-21b6-4817-9ff2-7e254375f1ec, task_id=9d94660a-0be5-4ad7-9ab4-ecb2ce19f304 (api:46)

But the LVs are in use by a VM running on the host (the VM being backed up):

2018-10-30 15:28:44,161+1000 ERROR (jsonrpc/4) [storage.TaskManager.Task] (Task='9d94660a-0be5-4ad7-9ab4-ecb2ce19f304') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in teardownImage
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3233, in teardownImage
    dom.deactivateImage(imgUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1302, in deactivateImage
    lvm.deactivateLVs(self.sdUUID, volUUIDs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 1303, in deactivateLVs
    _setLVAvailability(vgName, toDeactivate, "n")
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 834, in _setLVAvailability
    raise error(str(e))
CannotDeactivateLogicalVolume: Cannot deactivate Logical Volume: ('General Storage Exception: ("5 [] [\' Logical volume f7eeca0e-b360-4d88-959a-1e0e0730f846/f7473300-006d-43ce-b4de-f4ab82358ca2 in use.\', \' Logical volume f7eeca0e-b360-4d88-959a-1e0e0730f846/0b207436-50b1-460a-bb68-cf97f45cf042 in use.\']\\nf7eeca0e-b360-4d88-959a-1e0e0730f846/[\'f7473300-006d-43ce-b4de-f4ab82358ca2\', \'0b207436-50b1-460a-bb68-cf97f45cf042\']",)',)

Version-Release number of selected component (if applicable):
rhvm-4.2.6.4-0.1.el7ev.noarch
vdsm-4.20.39.1-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
Use the script below on block storage. The backup agent VM does not need to be on block storage, only the target VM of the backup does.
https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/vm_backup.py
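For reference, the "in use" condition that makes lvchange fail here is visible in the lv_attr field reported by `lvs -o lv_attr`: the sixth character is the device-open state, 'o' when the LV is held open (e.g. by a running VM) and '-' when it is closed. A minimal sketch of checking that flag before attempting deactivation (the helper names are hypothetical, not part of vdsm):

```python
# Hypothetical helpers (illustration only, not vdsm code): decide from LVM's
# lv_attr string whether a logical volume can be deactivated. In lv_attr,
# character index 5 is the device-open state: 'o' = open/in use, '-' = closed.

def lv_is_open(lv_attr):
    """Return True if the lv_attr string marks the LV as open (in use)."""
    return len(lv_attr) >= 6 and lv_attr[5] == 'o'


def safe_to_deactivate(lv_attrs):
    """Given {lv_name: lv_attr}, return the LV names that are not open."""
    return [name for name, attr in lv_attrs.items() if not lv_is_open(attr)]


# Example: in the failure above, both snapshot LVs of Disk X are still open
# because VM A is running on the same host, so neither can be deactivated.
attrs = {
    "f7473300-006d-43ce-b4de-f4ab82358ca2": "-wi-ao----",  # active and open
    "0b207436-50b1-460a-bb68-cf97f45cf042": "-wi-ao----",  # active and open
}
print(safe_to_deactivate(attrs))  # prints []
```

As comment 6 below notes, vdsm does not perform such a check at this stage; the sketch only illustrates why the lvchange -an call returns "Logical volume ... in use."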
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
This does not look like a bug to me. The user tried to tear down a disk that is in use by another VM on the same host. The operation succeeded on the engine side, as it should. The failure to deactivate the LV is acceptable because the LV is in use by another VM on that host; no failure was shown to the user, only a log entry about the teardown failure in the VDSM log.

The alternatives for handling this issue are:
1) Hide the exception in the VDSM log -> I think this is wrong, because we need to know about it.
2) Prevent the teardown from happening in this specific case:
   a) In the engine -> not possible, since the teardown is done as part of the hotunplug operation and we are not going to change that flow.
   b) In VDSM -> not possible, because we don't have the data needed for this validation at that stage in VDSM.

I recommend closing this bug as NOTABUG.
(In reply to Eyal Shenitzky from comment #6)
> no failure was shown to the user, only a log entry about the teardown
> failure in the VDSM log.

Yes, the user does not see any failure. But it does not seem right to me to try to deactivate this LV, and as a consequence fill the VDSM logs with exceptions during intensive VM backups. Still, I understand this would require significant changes. Once you decide what can or cannot be done, I'll update the article and the customer about it.
Closing as NOTABUG per comment #6. I don't see any impact aside from the exception in the logs, which is logged legitimately, and attempting to change this behavior might lead to severe regressions.