Bug 806938 - [ovirt] [vdsm] call destroy returns True although vm is not destroyed which creates split brain
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
: ---
Assignee: Saveliev Peter
QA Contact:
URL:
Whiteboard: virt
Depends On:
Blocks:
 
Reported: 2012-03-26 14:36 UTC by Haim
Modified: 2014-01-13 00:51 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-04-17 10:57:15 UTC
oVirt Team: ---
Embargoed:



Description Haim 2012-03-26 14:36:59 UTC
Description of problem:

Potential split brain when running the following scenario:

- run a VM
- cause the VM to enter the paused state
- using web-admin, stop the VM (or, using the API, call destroy)
- the destroy call returns with code "done", but the qemu process keeps on living on the source host.

Analysis: most likely a libvirt bug.

The following code was taken from vdsm/libvirtvm.py. In the logs, I see that we enter the exception handler: libvirt reports that the domain is not found, so we assume it is not running, log the exception, and set the VM down, but qemu keeps on running.

1970     def releaseVm(self):
1971         """
1972         Stop VM and release all resources
1973         """
1974         with self._releaseLock:
1975             if self._released:
1976                 return {'status': doneCode}
1977 
1978             self.log.info('Release VM resources')
1979             self.lastStatus = 'Powering down'
1980             try:
1981                 if self._vmStats:
1982                     self._vmStats.stop()
1983                 if self.guestAgent:
1984                     self.guestAgent.stop()
1985                 if self._dom:
1986                     self._dom.destroy()
1987             except libvirt.libvirtError, e:
1988                 if e.get_error_code() == libvirt.VIR_ERR_NO_DOMAIN:
1989                     self.log.warning("libvirt domain not found", exc_info=True)
1990                 else:
1991                     self.log.warn("VM %s is not running", self.conf['vmId'])
1992 
1993             self.cif.ksmMonitor.adjust()
1994             self._cleanup()
1995 
1996             self.cif.irs.inappropriateDevices(self.id)
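
One way to avoid reporting success while qemu is still alive is to check the process itself before returning. The following is only a minimal sketch of that idea, not the vdsm fix: it assumes the caller already knows the qemu pid, and `NoDomainError`, `pid_alive`, `release_vm` and the `STILL_RUNNING` status are hypothetical names introduced for illustration (`NoDomainError` stands in for a libvirt error carrying VIR_ERR_NO_DOMAIN).

```python
import os

# Success status as it appears in the vmDestroy reply in the logs.
DONE = {'status': {'code': 0, 'message': 'Machine destroyed'}}
# Hypothetical error status; the real vdsm error codes are not shown in this report.
STILL_RUNNING = {'status': {'code': 1, 'message': 'qemu process still running'}}


class NoDomainError(Exception):
    """Stand-in for libvirt.libvirtError carrying VIR_ERR_NO_DOMAIN."""


def pid_alive(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def release_vm(dom, qemu_pid):
    """Destroy the domain, but report success only if qemu is really gone."""
    try:
        dom.destroy()
    except NoDomainError:
        # libvirt claims the domain does not exist; do not trust that blindly.
        pass
    if pid_alive(qemu_pid):
        return STILL_RUNNING
    return DONE
```

A real implementation would probably have to wait or retry for a short time, since a process may need a moment to exit after a successful destroy, and it would need a reliable way to learn the qemu pid in the first place.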


Logs:


Thread-5161::DEBUG::2012-03-26 14:23:34,417::BindingXMLRPC::869::vds::(wrapper) client [10.35.97.30]::call vmDestroy with ('9e669b36-cf3b-4cef-81c7-5cd5a522bfcc',) {} flowID [38746fd3]
Thread-5161::INFO::2012-03-26 14:23:34,417::API::300::vds::(destroy) vmContainerLock acquired by vm 9e669b36-cf3b-4cef-81c7-5cd5a522bfcc
Thread-5161::DEBUG::2012-03-26 14:23:34,417::libvirtvm::2016::vm.Vm::(destroy) vmId=`9e669b36-cf3b-4cef-81c7-5cd5a522bfcc`::destroy Called
Thread-5161::INFO::2012-03-26 14:23:34,418::libvirtvm::1978::vm.Vm::(releaseVm) vmId=`9e669b36-cf3b-4cef-81c7-5cd5a522bfcc`::Release VM resources
Thread-5161::WARNING::2012-03-26 14:23:34,418::vm::327::vm.Vm::(_set_lastStatus) vmId=`9e669b36-cf3b-4cef-81c7-5cd5a522bfcc`::trying to set state to Powering down when already Down
Thread-5161::DEBUG::2012-03-26 14:23:34,418::utils::336::vm.Vm::(stop) vmId=`9e669b36-cf3b-4cef-81c7-5cd5a522bfcc`::Stop statistics collection
Thread-5161::DEBUG::2012-03-26 14:23:34,419::vmChannels::152::vds::(unregister) Delete fileno 22 from listener.
Thread-5161::WARNING::2012-03-26 14:23:34,421::libvirtvm::1989::vm.Vm::(releaseVm) vmId=`9e669b36-cf3b-4cef-81c7-5cd5a522bfcc`::libvirt domain not found
Traceback (most recent call last):
  File "/usr/share/vdsm/libvirtvm.py", line 1986, in releaseVm
    self._dom.destroy()
  File "/usr/share/vdsm/libvirtvm.py", line 490, in f
    ret = attr(*args, **kwargs)
  File "/usr/share/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 658, in destroy
    if ret == -1: raise libvirtError ('virDomainDestroy() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid '9e669b36-cf3b-4cef-81c7-5cd5a522bfcc'
Thread-5161::DEBUG::2012-03-26 14:23:34,421::utils::602::Storage.Misc.excCmd::(execCmd) '/usr/bin/sudo -n /sbin/service ksmtuned retune' (cwd None)
Thread-5161::DEBUG::2012-03-26 14:23:34,456::utils::602::Storage.Misc.excCmd::(execCmd) FAILED: <err> = 'Unknown operation retune\n'; <rc> = 1
Thread-5161::DEBUG::2012-03-26 14:23:34,457::vmChannels::152::vds::(unregister) Delete fileno 22 from listener.
Thread-5161::DEBUG::2012-03-26 14:23:34,458::task::588::TaskManager.Task::(_updateState) Task=`b2c09668-9f5d-4660-b2e9-a41c0619fdf4`::moving from state init -> state preparing
Thread-5161::INFO::2012-03-26 14:23:34,458::logUtils::37::dispatcher::(wrapper) Run and protect: inappropriateDevices(thiefId='9e669b36-cf3b-4cef-81c7-5cd5a522bfcc')
Thread-5161::INFO::2012-03-26 14:23:34,461::logUtils::39::dispatcher::(wrapper) Run and protect: inappropriateDevices, Return response: None
Thread-5161::DEBUG::2012-03-26 14:23:34,461::task::1172::TaskManager.Task::(prepare) Task=`b2c09668-9f5d-4660-b2e9-a41c0619fdf4`::finished: None
Thread-5161::DEBUG::2012-03-26 14:23:34,462::task::588::TaskManager.Task::(_updateState) Task=`b2c09668-9f5d-4660-b2e9-a41c0619fdf4`::moving from state preparing -> state finished
Thread-5161::DEBUG::2012-03-26 14:23:34,462::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-5161::DEBUG::2012-03-26 14:23:34,462::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-5161::DEBUG::2012-03-26 14:23:34,463::task::978::TaskManager.Task::(_decref) Task=`b2c09668-9f5d-4660-b2e9-a41c0619fdf4`::ref 0 aborting False
Thread-5161::DEBUG::2012-03-26 14:23:34,463::libvirtvm::2011::vm.Vm::(deleteVm) vmId=`9e669b36-cf3b-4cef-81c7-5cd5a522bfcc`::Total desktops after destroy of 9e669b36-cf3b-4cef-81c7-5cd5a522bfcc is 0
Thread-5161::DEBUG::2012-03-26 14:23:34,464::BindingXMLRPC::875::vds::(wrapper) return vmDestroy with {'status': {'message': 'Machine destroyed', 'code': 0}}
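
To pick the relevant entries out of a longer vdsm.log, the lines above can be split on their `::` separators. This is a rough sketch that assumes every record follows the layout seen in this excerpt (thread, level, timestamp, module, line number, logger, then `(function) message`); `LogRecord`, `parse_line` and `warnings_for` are hypothetical helper names, not part of vdsm.

```python
import re
from collections import namedtuple

LogRecord = namedtuple(
    'LogRecord', 'thread level timestamp module lineno logger func message')

# The last field looks like "(releaseVm) vmId=`...`::libvirt domain not found".
_FUNC_RE = re.compile(r'^\((\w+)\) (.*)$', re.DOTALL)


def parse_line(line):
    """Parse one log line; return None for continuation lines such as tracebacks."""
    parts = line.rstrip('\n').split('::', 6)
    if len(parts) != 7 or not parts[4].isdigit():
        return None
    match = _FUNC_RE.match(parts[6])
    if match is None:
        return None
    thread, level, timestamp, module, lineno, logger = parts[:6]
    return LogRecord(thread, level, timestamp, module, int(lineno), logger,
                     match.group(1), match.group(2))


def warnings_for(lines, vm_id):
    """Yield WARNING records whose message mentions the given VM id."""
    for line in lines:
        record = parse_line(line)
        if record and record.level == 'WARNING' and vm_id in record.message:
            yield record
```

Run over the excerpt above with the VM id 9e669b36-cf3b-4cef-81c7-5cd5a522bfcc, this would surface the two WARNING records (from `_set_lastStatus` and `releaseVm`) that precede the successful vmDestroy reply.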

Comment 1 Saveliev Peter 2013-04-15 11:07:40 UTC
I cannot reproduce it on libvirt build 0.10.2-4.3, so installing updates may help. Whether it does depends on which version you are using now.

Can you please specify the versions of:

vdsm
libvirt
qemu

that you used when you hit this issue?
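
For reference, the requested versions could be collected on an RPM-based host with something like the sketch below. It assumes `rpm` is on the PATH and that the packages are named `vdsm`, `libvirt` and `qemu-kvm`; the package names and the `installed_versions` helper are assumptions, so adjust them to the host in question.

```python
import subprocess

# Assumed RPM package names; qemu in particular may be packaged
# under a different name (for example "qemu") on some distributions.
PACKAGES = ('vdsm', 'libvirt', 'qemu-kvm')


def installed_versions(packages=PACKAGES, run=subprocess.run):
    """Return {package: 'name-version-release.arch' or None} using `rpm -q`."""
    versions = {}
    for name in packages:
        try:
            result = run(['rpm', '-q', name], stdout=subprocess.PIPE,
                         stderr=subprocess.DEVNULL, universal_newlines=True)
        except FileNotFoundError:
            versions[name] = None  # rpm itself is not available on this host
            continue
        # rpm -q exits non-zero when the package is not installed.
        versions[name] = result.stdout.strip() if result.returncode == 0 else None
    return versions


if __name__ == '__main__':
    for package, version in installed_versions().items():
        print(package, version or 'not installed')
```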

