Bug 1028045 - [engine-backend] trying to power-off a VM twice should not cause spurious errors in vdsm
Summary: [engine-backend] trying to power-off a VM twice should not cause spurious errors in vdsm
Keywords:
Status: CLOSED DUPLICATE of bug 912390
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.3.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 3.5.0
Assignee: Francesco Romani
QA Contact:
URL:
Whiteboard: virt
Depends On:
Blocks:
 
Reported: 2013-11-07 14:33 UTC by Elad
Modified: 2014-04-11 06:29 UTC (History)
10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-04-11 06:29:26 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:


Attachments
engine and vdsm logs (5.46 MB, application/x-gzip)
2013-11-07 14:33 UTC, Elad

Description Elad 2013-11-07 14:33:47 UTC
Created attachment 821172
engine and vdsm logs

Description of problem:
When I try to power off a VM a second time, immediately after the first attempt, the engine sends DestroyVmVDSCommand to VDSM twice, when it is supposed to block the second attempt with a CanDoAction failure.

Version-Release number of selected component (if applicable):
rhevm-3.3.0-0.31.beta1.el6ev.noarch

How reproducible:
Reproducible when the VM has a disk snapshot attached to it from another VM, via the new backup-API feature. This setup is needed because with an attached disk snapshot the power-off action takes longer, which leaves enough time to trigger the stop action more than once from the UI.

Steps to Reproduce:
On a block pool:
1) Create a VM with a disk attached and create a snapshot
2) Attach the disk snapshot to a second VM using the REST API and run the new VM
3) Power off the new VM and immediately power it off again
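The double power-off in step 3 can be driven from a script instead of the UI. A minimal sketch of building the REST call, assuming the 3.3-era oVirt REST path layout; the base URL and VM id below are placeholders, not values from this bug:

```python
# Sketch of the repro flow against the oVirt REST API (3.3-era paths).
# The base URL and vm_id are placeholder values.

def stop_action_request(base_url, vm_id):
    """Build the URL and XML body for the VM 'stop' (power-off) action."""
    url = "%s/api/vms/%s/stop" % (base_url.rstrip("/"), vm_id)
    body = "<action/>"  # an empty action element triggers the stop
    return url, body

# POSTing this twice in quick succession reproduces the double
# DestroyVmVDSCommand described below, e.g. with python-requests:
#   requests.post(url, data=body, auth=(user, password),
#                 headers={"Content-Type": "application/xml"})
url, body = stop_action_request("https://rhevm.example.com",
                                "c7df45a6-5dff-491e-9967-7097c49d7ae7")
```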


Actual results:

Engine sends DestroyVmVDSCommand to VDSM twice and gets a failure from VDSM:

2013-11-07 16:16:25,686 INFO  [org.ovirt.engine.core.bll.StopVmCommand] (pool-4-thread-45) [64ba6b30] Running command: StopVmCommand internal: false. Entities affected :  ID: c7df45a6-5dff-491e-9967-7097c49d7ae7 Type: VM
2013-11-07 16:16:25,690 INFO  [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (pool-4-thread-45) [64ba6b30] START, DestroyVmVDSCommand(HostName = nott-vds1, HostId = 599ac308-ee01-48bf-9da9-b1c33750d77b, vmId=c7df45a6-5dff-491e-9967-7097c49d7ae7, force=false, secondsToWait=0, gracefully=false), log id: 1095c218
2013-11-07 16:16:25,695 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (pool-4-thread-45) [64ba6b30] START, DestroyVDSCommand(HostName = nott-vds1, HostId = 599ac308-ee01-48bf-9da9-b1c33750d77b, vmId=c7df45a6-5dff-491e-9967-7097c49d7ae7, force=false, secondsToWait=0, gracefully=false), log id: 62f43c19
2013-11-07 16:16:45,360 ERROR [org.ovirt.engine.core.bll.gluster.tasks.GlusterTasksService] (DefaultQuartzScheduler_Worker-57) [10ed2d08] No up server in cluster
2013-11-07 16:16:51,988 INFO  [org.ovirt.engine.core.bll.StopVmCommand] (pool-4-thread-49) [24dbedbd] Running command: StopVmCommand internal: false. Entities affected :  ID: c7df45a6-5dff-491e-9967-7097c49d7ae7 Type: VM
2013-11-07 16:16:51,992 INFO  [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (pool-4-thread-49) [24dbedbd] START, DestroyVmVDSCommand(HostName = nott-vds1, HostId = 599ac308-ee01-48bf-9da9-b1c33750d77b, vmId=c7df45a6-5dff-491e-9967-7097c49d7ae7, force=false, secondsToWait=0, gracefully=false), log id: 7ef71809
2013-11-07 16:17:25,754 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (pool-4-thread-45) [64ba6b30] FINISH, DestroyVDSCommand, log id: 62f43c19
2013-11-07 16:17:25,806 INFO  [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (pool-4-thread-45) [64ba6b30] FINISH, DestroyVmVDSCommand, return: Down, log id: 1095c218
2013-11-07 16:17:25,822 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-45) [64ba6b30] Correlation ID: 64ba6b30, Job ID: 9939d55f-9845-4aa0-be70-2e5a96b749fb, Call Stack: null, Custom Event ID: -1, Message: VM iscsi2-3 powered off by admin@internal (Host: nott-vds1).
2013-11-07 16:17:25,862 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (pool-4-thread-49) [24dbedbd] START, DestroyVDSCommand(HostName = nott-vds1, HostId = 599ac308-ee01-48bf-9da9-b1c33750d77b, vmId=c7df45a6-5dff-491e-9967-7097c49d7ae7, force=false, secondsToWait=0, gracefully=false), log id: 7c5f64bc
2013-11-07 16:17:25,876 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (pool-4-thread-49) [24dbedbd] Failed in DestroyVDS method
2013-11-07 16:17:25,877 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (pool-4-thread-49) [24dbedbd] Error code noVM and error message VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine does not exist


VDSM fails to deactivate the volumes on the second teardown:

Thread-61544::ERROR::2013-11-07 16:17:00,502::task::850::TaskManager.Task::(_setError) Task=`ad7e491a-d29d-43a2-96bf-c7a015d8b45a`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3301, in teardownImage
    dom.deactivateImage(imgUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 999, in deactivateImage
    lvm.deactivateLVs(self.sdUUID, volUUIDs)
  File "/usr/share/vdsm/storage/lvm.py", line 1139, in deactivateLVs
    _setLVAvailability(vgName, toDeactivate, "n")
  File "/usr/share/vdsm/storage/lvm.py", line 793, in _setLVAvailability
    raise error(str(e))
CannotDeactivateLogicalVolume: Cannot deactivate Logical Volume: ('General Storage Exception: ("5 [] [\'  Logical volume fdc629ee-f335-49f0-ba18-627298358070/a4cd2319-685b-44c2-b91a-85151c45e21a in use.\', \'  Logical volume fdc629ee-f335-49f0-ba18-627298358070/7893d167-69a3-4fba-b179-393fa7507c07 in use.\', \'  Logical volume fdc629ee-f335-49f0-ba18-627298358070/927eeb62-e0b3-43cf-a9fb-2b341e244672 in use.\']\\nfdc629ee-f335-49f0-ba18-627298358070/[\'927eeb62-e0b3-43cf-a9fb-2b341e244672\', \'a4cd2319-685b-44c2-b91a-85151c45e21a\', \'7893d167-69a3-4fba-b179-393fa7507c07\']",)',)
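The storage-side error above is the second teardownImage trying to deactivate logical volumes that the still-running first teardown holds open, so lvm reports every LV as "in use". A caller could recognize this benign double-teardown pattern before escalating; a hypothetical sketch (the helper name and parsing are illustrative, not vdsm code):

```python
# Hypothetical helper: decide whether an LV-deactivation failure is the
# benign "every LV still in use" case seen on a double teardown, as
# opposed to a real storage error. Parsing is illustrative only.

def is_in_use_failure(stderr_lines):
    """True if every failed LV line reports 'in use'."""
    failures = [l for l in stderr_lines if "Logical volume" in l]
    return bool(failures) and all("in use" in l for l in failures)

# Applied to the messages quoted in the traceback above:
benign = is_in_use_failure([
    "  Logical volume fdc629ee-f335-49f0-ba18-627298358070/a4cd2319-685b-44c2-b91a-85151c45e21a in use.",
    "  Logical volume fdc629ee-f335-49f0-ba18-627298358070/7893d167-69a3-4fba-b179-393fa7507c07 in use.",
])
```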


Expected results:
Engine should not send the second DestroyVmVDSCommand to VDSM. The user should be blocked from powering off the VM a second time.

Additional info:
logs

Comment 1 Itamar Heim 2013-11-08 08:55:25 UTC
I'm not sure I agree. I think the engine should be allowed to send the command to VDSM more than once; the first one may have been cancelled by the user, for example.
The engine could perhaps behave more gracefully with respect to the exception, though.
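One way to "behave more gracefully" is to treat VDSM's noVM error on a destroy as idempotent success: the desired end state (VM gone) already holds. The real engine is Java; this is a hypothetical Python sketch, with only the "noVM" code name taken from the log above:

```python
# Hypothetical sketch: map a vdsm destroy response to a user-facing
# outcome, treating "noVM" as benign instead of an error. The response
# shape and helper name are illustrative, not actual engine code.

NO_VM_ERROR = "noVM"  # code name quoted in the engine log above

def destroy_result(vdsm_response):
    """Classify a vdsm destroy response."""
    status = vdsm_response.get("status", {})
    if status.get("code_name") == NO_VM_ERROR:
        # The VM is already gone on this host: the second power-off did
        # nothing, but the desired state holds. Log quietly, no ERROR.
        return "already_down"
    if status.get("code") == 0:
        return "destroyed"
    return "error"
```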

Comment 2 Michal Skrivanek 2013-11-13 12:44:57 UTC
If it didn't cause any other problem, I'd also prefer not to "fix" it, as potential problems with tracking the progress of shutdown in vdsm would have much more serious consequences.

Comment 3 Michal Skrivanek 2013-11-13 12:46:22 UTC
Changing title, keeping open for cleaning up the flow.

Comment 5 Michal Skrivanek 2014-01-31 09:20:50 UTC
possibly (to be) addressed by Martin's http://gerrit.ovirt.org/#/c/23457/ ?

Comment 6 Martin Betak 2014-02-17 11:02:04 UTC
http://gerrit.ovirt.org/#/c/23457/ only refactors the existing graceful shutdown and reboot commands. Destroy is handled separately and is untouched by this patch.

Comment 7 Michal Skrivanek 2014-03-25 10:30:14 UTC
It's just the log; decreasing severity.

Comment 8 Francesco Romani 2014-03-27 15:44:13 UTC
taking the bug, will improve the logging

Comment 9 Francesco Romani 2014-04-09 15:50:05 UTC
after deeper inspection:

* refactoring/cleanup of the destroy flow is already planned for https://bugzilla.redhat.com/show_bug.cgi?id=912390

* unless I'm missing something, I don't see any logging to improve (or add) here in the virt flow.

* the reporter quoted an error in the storage subsystem (which I think should _not_ be removed)

For those reasons I'm considering closing this bug, possibly marking it as a duplicate of bug 912390. Michal, do you agree?

Comment 10 Michal Skrivanek 2014-04-11 06:29:26 UTC
Makes sense. Logging is fine as long as we show the user something non-scary. I don't care much about the vdsm logs, but in the UI we shouldn't report anything if we find the VM already down. To differentiate, though, we may need an extra check at the engine level to verify the VM is not running somewhere else. (I'm thinking of a race at the end of migration: before the state is updated in the UI, one could send power-off to the source host, where the VM no longer runs.)
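The extra engine-level check described in the comment above can be sketched as pure logic: on a "noVM" reply from the target host, report "already down" only if no other host still runs the VM; otherwise the VM likely migrated and the power-off should be retried there. All names here are hypothetical:

```python
# Hypothetical sketch of the race guard: a "noVM" reply from the host we
# targeted does not prove the VM is down -- it may have just migrated.

def classify_no_vm(target_host, host_to_running_vms, vm_id):
    """Decide what a 'noVM' reply from target_host means for vm_id.

    host_to_running_vms maps each host name to the set of VM ids it
    currently reports as running (as seen by the engine's monitoring).
    """
    for host, vms in host_to_running_vms.items():
        if host != target_host and vm_id in vms:
            # End-of-migration race: the VM moved before the UI state
            # updated. Retry the power-off on the host that runs it.
            return ("running_elsewhere", host)
    # No host runs the VM: safe to report it as down, quietly.
    return ("down", None)
```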

*** This bug has been marked as a duplicate of bug 912390 ***
