Created attachment 821172 [details]
engine and vdsm logs

Description of problem:
When I try to power off a VM for the second time, right after the first, the engine sends DestroyVmVDSCommand to VDSM twice, when it should block me with a CanDoAction failure.

Version-Release number of selected component (if applicable):
rhevm-3.3.0-0.31.beta1.el6ev.noarch

How reproducible:
Reproducible when the VM has a disk snapshot attached to it from another VM, as part of the new backup-API feature. This matters because with an attached disk snapshot the power-off action takes longer, so the UI lets us stop the VM more than once.

Steps to Reproduce:
On a block pool:
1) Create a VM with a disk attached and create a snapshot.
2) Attach the disk snapshot to a second VM using the REST API and run the new VM.
3) Power off the new VM, then power it off again immediately.

Actual results:
The engine sends DestroyVmVDSCommand to VDSM twice and gets a failure from VDSM:

2013-11-07 16:16:25,686 INFO [org.ovirt.engine.core.bll.StopVmCommand] (pool-4-thread-45) [64ba6b30] Running command: StopVmCommand internal: false. Entities affected : ID: c7df45a6-5dff-491e-9967-7097c49d7ae7 Type: VM
2013-11-07 16:16:25,690 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (pool-4-thread-45) [64ba6b30] START, DestroyVmVDSCommand(HostName = nott-vds1, HostId = 599ac308-ee01-48bf-9da9-b1c33750d77b, vmId=c7df45a6-5dff-491e-9967-7097c49d7ae7, force=false, secondsToWait=0, gracefully=false), log id: 1095c218
2013-11-07 16:16:25,695 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (pool-4-thread-45) [64ba6b30] START, DestroyVDSCommand(HostName = nott-vds1, HostId = 599ac308-ee01-48bf-9da9-b1c33750d77b, vmId=c7df45a6-5dff-491e-9967-7097c49d7ae7, force=false, secondsToWait=0, gracefully=false), log id: 62f43c19
2013-11-07 16:16:45,360 ERROR [org.ovirt.engine.core.bll.gluster.tasks.GlusterTasksService] (DefaultQuartzScheduler_Worker-57) [10ed2d08] No up server in cluster
2013-11-07 16:16:51,988 INFO [org.ovirt.engine.core.bll.StopVmCommand] (pool-4-thread-49) [24dbedbd] Running command: StopVmCommand internal: false. Entities affected : ID: c7df45a6-5dff-491e-9967-7097c49d7ae7 Type: VM
2013-11-07 16:16:51,992 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (pool-4-thread-49) [24dbedbd] START, DestroyVmVDSCommand(HostName = nott-vds1, HostId = 599ac308-ee01-48bf-9da9-b1c33750d77b, vmId=c7df45a6-5dff-491e-9967-7097c49d7ae7, force=false, secondsToWait=0, gracefully=false), log id: 7ef71809
2013-11-07 16:17:25,754 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (pool-4-thread-45) [64ba6b30] FINISH, DestroyVDSCommand, log id: 62f43c19
2013-11-07 16:17:25,806 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (pool-4-thread-45) [64ba6b30] FINISH, DestroyVmVDSCommand, return: Down, log id: 1095c218
2013-11-07 16:17:25,822 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-45) [64ba6b30] Correlation ID: 64ba6b30, Job ID: 9939d55f-9845-4aa0-be70-2e5a96b749fb, Call Stack: null, Custom Event ID: -1, Message: VM iscsi2-3 powered off by admin@internal (Host: nott-vds1).
2013-11-07 16:17:25,862 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (pool-4-thread-49) [24dbedbd] START, DestroyVDSCommand(HostName = nott-vds1, HostId = 599ac308-ee01-48bf-9da9-b1c33750d77b, vmId=c7df45a6-5dff-491e-9967-7097c49d7ae7, force=false, secondsToWait=0, gracefully=false), log id: 7c5f64bc
2013-11-07 16:17:25,876 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (pool-4-thread-49) [24dbedbd] Failed in DestroyVDS method
2013-11-07 16:17:25,877 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (pool-4-thread-49) [24dbedbd] Error code noVM and error message VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine does not exist

Failure to deactivate the volume for the second time on VDSM:

Thread-61544::ERROR::2013-11-07 16:17:00,502::task::850::TaskManager.Task::(_setError) Task=`ad7e491a-d29d-43a2-96bf-c7a015d8b45a`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3301, in teardownImage
    dom.teardownImage(imgUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 999, in deactivateImage
    lvm.deactivateLVs(self.sdUUID, volUUIDs)
  File "/usr/share/vdsm/storage/lvm.py", line 1139, in deactivateLVs
    _setLVAvailability(vgName, toDeactivate, "n")
  File "/usr/share/vdsm/storage/lvm.py", line 793, in _setLVAvailability
    raise error(str(e))
CannotDeactivateLogicalVolume: Cannot deactivate Logical Volume: ('General Storage Exception: ("5 [] [\' Logical volume fdc629ee-f335-49f0-ba18-627298358070/a4cd2319-685b-44c2-b91a-85151c45e21a in use.\', \' Logical volume fdc629ee-f335-49f0-ba18-627298358070/7893d167-69a3-4fba-b179-393fa7507c07 in use.\', \' Logical volume fdc629ee-f335-49f0-ba18-627298358070/927eeb62-e0b3-43cf-a9fb-2b341e244672 in use.\']\\nfdc629ee-f335-49f0-ba18-627298358070/[\'927eeb62-e0b3-43cf-a9fb-2b341e244672\', \'a4cd2319-685b-44c2-b91a-85151c45e21a\', \'7893d167-69a3-4fba-b179-393fa7507c07\']",)',)

Expected results:
The engine should not send the second DestroyVmVDSCommand to VDSM; the user should be blocked from powering off the VM a second time.

Additional info: logs
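The expected CanDoAction-style guard can be sketched as follows. This is a minimal illustration in Python, not the actual engine code (which is Java); the class and method names here are hypothetical.

```python
import threading


class VmPowerOffGuard:
    """Reject a power-off for a VM that already has one in flight.

    Hypothetical sketch of the CanDoAction check the reporter expects;
    the real engine would key this off its command-lock infrastructure.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = set()

    def try_start(self, vm_id):
        """Return True if the power-off may proceed, False if one is already running."""
        with self._lock:
            if vm_id in self._in_flight:
                return False  # CanDoAction failure: destroy already in progress
            self._in_flight.add(vm_id)
            return True

    def finish(self, vm_id):
        """Call when DestroyVmVDSCommand completes, whether it succeeded or failed."""
        with self._lock:
            self._in_flight.discard(vm_id)


guard = VmPowerOffGuard()
vm = "c7df45a6-5dff-491e-9967-7097c49d7ae7"
assert guard.try_start(vm) is True    # first power-off proceeds
assert guard.try_start(vm) is False   # second one is blocked while in flight
guard.finish(vm)
assert guard.try_start(vm) is True    # allowed again once the first completes
```

With such a guard the second click in the UI would fail validation instead of reaching VDSM.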
I'm not sure I agree. I think the engine should be allowed to send the command to VDSM more than once; the first one might have been cancelled by the user, for example. The engine could perhaps handle the resulting exception more gracefully.
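The "more graceful" handling suggested here could look roughly like this, sketched in Python for brevity (the real engine is Java; the error code noVM is taken from the log above, while the function and return values are made up for illustration):

```python
NO_VM = "noVM"  # VDSM error code seen in the log: "Virtual machine does not exist"


def interpret_destroy_result(error_code, message):
    """Map a VDSM destroy response to the engine's view of the outcome.

    A 'noVM' error on a repeated destroy means the VM is already gone,
    which is exactly the state the user asked for, so it is treated as
    success instead of being surfaced as a scary error.
    """
    if error_code is None:
        return ("down", message or "VM powered off")
    if error_code == NO_VM:
        return ("down", "VM was already destroyed; nothing to do")
    # Any other error code is a real failure worth reporting.
    return ("error", message)
```

For example, `interpret_destroy_result("noVM", "Virtual machine does not exist")` would yield a quiet "down" result rather than the ERROR lines quoted in the report.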
If it doesn't cause any other problem, I'd also prefer not to "fix" it: potential problems with tracking the progress of shutdown in VDSM would have much more serious consequences.
Changing title, keeping open for cleaning up the flow.
possibly (to be) addressed by Martin's http://gerrit.ovirt.org/#/c/23457/ ?
http://gerrit.ovirt.org/#/c/23457/ only refactors the existing Graceful shutdown and reboot commands. Destroy is handled separately and untouched by this patch.
It's just the log; decreasing severity.
Taking the bug; will improve the logging.
After deeper inspection:
* Refactoring/cleanup of the destroy flow is already planned for https://bugzilla.redhat.com/show_bug.cgi?id=912390
* Unless I'm missing something, I don't see logging to be improved (or added) here on the virt flow.
* The reporter quoted an error from the storage subsystem (which I think should _not_ be removed).
For these reasons I'm considering closing this bug, possibly marking it as a duplicate of 912390. Michal, do you agree with this?
Makes sense. Logging is fine as long as we show the user something non-scary. I don't care much about the VDSM logs, but in the UI we shouldn't report anything if we find the VM down. To differentiate, though, we may need an extra check at the engine level to verify that the VM is not running somewhere else (I'm thinking of a race at the end of migration: before the state is updated in the UI, one could send power-off to the source host, where the VM no longer runs).

*** This bug has been marked as a duplicate of bug 912390 ***
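The extra engine-level check described in the last comment can be sketched as below. This is a hypothetical Python illustration; in the real engine this decision would consult the runtime VM monitoring data, and all names here are made up.

```python
def resolve_no_vm(vm_id, target_host, vm_locations):
    """Decide what a 'noVM' destroy reply from target_host actually means.

    vm_locations: mapping of host name -> set of VM ids the engine
    currently believes are running there (a stand-in for the engine's
    host monitoring state).
    """
    other_hosts = [host for host, vms in vm_locations.items()
                   if host != target_host and vm_id in vms]
    if other_hosts:
        # Migration race: the VM moved away; power-off must target the new host.
        return ("running_elsewhere", other_hosts[0])
    # VM is down everywhere: safe to report it as powered off, quietly.
    return ("down", None)
```

For instance, if a migration finished just before the power-off, the VM would show up on the destination host and the engine could retry there instead of reporting a confusing error for the source host.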