Description of problem:

When using the Backup API, the following sequence takes place:

1) Snapshot Disk X of VM A
2) Hotplug the snapshot of Disk X (transient) to VM B
3) VM B backs up the disk
4) Hotunplug the snapshot of Disk X from VM B
5) Tear down Disk X
6) Remove the snapshot of Disk X

If VM A and VM B are on the same host, the teardown fails because the LVs of Disk X are still in use by VM A.

See step 4 (hotunplug):

2018-10-30 15:28:34,291+1000 INFO (jsonrpc/4) [api.virt] START hotunplugDisk(params={u'xml': u'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><hotunplug><devices><disk><alias name="ua-788b33b4-83cf-4093-a5e5-a4401008fc03"/></disk></devices></hotunplug>', u'vmId': u'39df439f-f608-4e40-add3-643fd4802f5a'}) from=::ffff:192.168.100.253,45974, flow_id=1fdf83be-21b6-4817-9ff2-7e254375f1ec, vmId=39df439f-f608-4e40-add3-643fd4802f5a (api:46)

Then step 5 (teardown):

2018-10-30 15:28:34,395+1000 INFO (jsonrpc/4) [vdsm.api] START teardownImage(sdUUID='f7eeca0e-b360-4d88-959a-1e0e0730f846', spUUID='f18d59c0-d67d-11e8-830f-52540015c1ff', imgUUID='788b33b4-83cf-4093-a5e5-a4401008fc03', volUUID=None) from=::ffff:192.168.100.253,45974, flow_id=1fdf83be-21b6-4817-9ff2-7e254375f1ec, task_id=9d94660a-0be5-4ad7-9ab4-ecb2ce19f304 (api:46)

But the LVs are in use by a VM running on the host (the VM being backed up):

2018-10-30 15:28:44,161+1000 ERROR (jsonrpc/4) [storage.TaskManager.Task] (Task='9d94660a-0be5-4ad7-9ab4-ecb2ce19f304') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in teardownImage
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3233, in teardownImage
    dom.deactivateImage(imgUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1302, in deactivateImage
    lvm.deactivateLVs(self.sdUUID, volUUIDs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 1303, in deactivateLVs
    _setLVAvailability(vgName, toDeactivate, "n")
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 834, in _setLVAvailability
    raise error(str(e))
CannotDeactivateLogicalVolume: Cannot deactivate Logical Volume: ('General Storage Exception: ("5 [] [\' Logical volume f7eeca0e-b360-4d88-959a-1e0e0730f846/f7473300-006d-43ce-b4de-f4ab82358ca2 in use.\', \' Logical volume f7eeca0e-b360-4d88-959a-1e0e0730f846/0b207436-50b1-460a-bb68-cf97f45cf042 in use.\']\\nf7eeca0e-b360-4d88-959a-1e0e0730f846/[\'f7473300-006d-43ce-b4de-f4ab82358ca2\', \'0b207436-50b1-460a-bb68-cf97f45cf042\']",)',)

Version-Release number of selected component (if applicable):
rhvm-4.2.6.4-0.1.el7ev.noarch
vdsm-4.20.39.1-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
Use the script below on block storage. The backup agent VM does not need to be on block storage, only the target VM of the backup does.
https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/vm_backup.py
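For reference, the "in use" condition that makes lvchange fail here is visible in the lv_attr field reported by `lvs -o lv_attr`: the sixth character is the device-open state, 'o' when the LV is held open (e.g. by a running VM) and '-' when it is closed. A minimal sketch of checking that flag before attempting deactivation (the helper names are hypothetical, not part of vdsm):

```python
# Hypothetical helpers (illustration only, not vdsm code): decide from LVM's
# lv_attr string whether a logical volume can be deactivated. In lv_attr,
# character index 5 is the device-open state: 'o' = open/in use, '-' = closed.

def lv_is_open(lv_attr):
    """Return True if the lv_attr string marks the LV as open (in use)."""
    return len(lv_attr) >= 6 and lv_attr[5] == 'o'


def safe_to_deactivate(lv_attrs):
    """Given {lv_name: lv_attr}, return the LV names that are not open."""
    return [name for name, attr in lv_attrs.items() if not lv_is_open(attr)]


# Example: in the failure above, both snapshot LVs of Disk X are still open
# because VM A is running on the same host, so neither can be deactivated.
attrs = {
    "f7473300-006d-43ce-b4de-f4ab82358ca2": "-wi-ao----",  # active and open
    "0b207436-50b1-460a-bb68-cf97f45cf042": "-wi-ao----",  # active and open
}
print(safe_to_deactivate(attrs))  # prints []
```

As comment 6 below notes, vdsm does not perform such a check at this stage; the sketch only illustrates why the lvchange -an call returns "Logical volume ... in use."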
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
This does not look like a bug to me. The user tried to tear down a disk that is in use by another VM on the same host. The operation succeeded on the engine side, as it should. The failure to deactivate the LV is acceptable because the LV is in use by another VM on that host; no failure was shown to the user, only a log entry about the teardown failure in the VDSM log.

The alternatives for handling this issue are:
1) Hide the exception in the VDSM log -> I think this is wrong, because we need to know about it.
2) Prevent the teardown from happening in this specific case:
   a) In the engine -> not possible, since the teardown is done as part of the hotunplug operation and we are not going to change that flow.
   b) In VDSM -> not possible, because we don't have the data needed for this validation at that stage in VDSM.

I recommend closing this bug as NOTABUG.
(In reply to Eyal Shenitzky from comment #6)
> no failure was shown to the user, only a log entry about the teardown
> failure in the VDSM log.

Yes, the user does not see any failure. But it does not seem right to me to try to deactivate this LV, and as a consequence fill the VDSM logs with exceptions during intensive VM backups. Still, I understand this would require significant changes. Once you decide what can or cannot be done, I'll update the article and the customer about it.
Closing as NOTABUG per comment #6. I don't see any impact aside from the exception in the logs, which is logged legitimately, and attempting to change this behavior might lead to severe regressions.