Description of problem:

Potential split brain when running the following scenario:
- run a VM
- cause the VM to enter the paused state
- using the web admin, stop the VM (or, using the API, call destroy)

The destroy call returns with code "done", but the qemu process keeps running on the source host.

Analysis: there is a good chance this is a libvirt bug. The following code was taken from vdsm/libvirtvm.py. In the logs we can see that we enter the exception handler, i.e. libvirt reports that the domain is not running, so we log the exception and mark the VM as down, yet qemu keeps running.

1970     def releaseVm(self):
1971         """
1972         Stop VM and release all resources
1973         """
1974         with self._releaseLock:
1975             if self._released:
1976                 return {'status': doneCode}
1977
1978             self.log.info('Release VM resources')
1979             self.lastStatus = 'Powering down'
1980             try:
1981                 if self._vmStats:
1982                     self._vmStats.stop()
1983                 if self.guestAgent:
1984                     self.guestAgent.stop()
1985                 if self._dom:
1986                     self._dom.destroy()
1987             except libvirt.libvirtError, e:
1988                 if e.get_error_code() == libvirt.VIR_ERR_NO_DOMAIN:
1989                     self.log.warning("libvirt domain not found", exc_info=True)
1990                 else:
1991                     self.log.warn("VM %s is not running", self.conf['vmId'])
1992
1993             self.cif.ksmMonitor.adjust()
1994             self._cleanup()
1995
1996             self.cif.irs.inappropriateDevices(self.id)

Logs:

Thread-5161::DEBUG::2012-03-26 14:23:34,417::BindingXMLRPC::869::vds::(wrapper) client [10.35.97.30]::call vmDestroy with ('9e669b36-cf3b-4cef-81c7-5cd5a522bfcc',) {} flowID [38746fd3]
Thread-5161::INFO::2012-03-26 14:23:34,417::API::300::vds::(destroy) vmContainerLock acquired by vm 9e669b36-cf3b-4cef-81c7-5cd5a522bfcc
Thread-5161::DEBUG::2012-03-26 14:23:34,417::libvirtvm::2016::vm.Vm::(destroy) vmId=`9e669b36-cf3b-4cef-81c7-5cd5a522bfcc`::destroy Called
Thread-5161::INFO::2012-03-26 14:23:34,418::libvirtvm::1978::vm.Vm::(releaseVm) vmId=`9e669b36-cf3b-4cef-81c7-5cd5a522bfcc`::Release VM resources
Thread-5161::WARNING::2012-03-26 14:23:34,418::vm::327::vm.Vm::(_set_lastStatus) vmId=`9e669b36-cf3b-4cef-81c7-5cd5a522bfcc`::trying to set state to Powering down when already Down
Thread-5161::DEBUG::2012-03-26 14:23:34,418::utils::336::vm.Vm::(stop) vmId=`9e669b36-cf3b-4cef-81c7-5cd5a522bfcc`::Stop statistics collection
Thread-5161::DEBUG::2012-03-26 14:23:34,419::vmChannels::152::vds::(unregister) Delete fileno 22 from listener.
Thread-5161::WARNING::2012-03-26 14:23:34,421::libvirtvm::1989::vm.Vm::(releaseVm) vmId=`9e669b36-cf3b-4cef-81c7-5cd5a522bfcc`::libvirt domain not found
Traceback (most recent call last):
  File "/usr/share/vdsm/libvirtvm.py", line 1986, in releaseVm
    self._dom.destroy()
  File "/usr/share/vdsm/libvirtvm.py", line 490, in f
    ret = attr(*args, **kwargs)
  File "/usr/share/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 658, in destroy
    if ret == -1: raise libvirtError ('virDomainDestroy() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid '9e669b36-cf3b-4cef-81c7-5cd5a522bfcc'
Thread-5161::DEBUG::2012-03-26 14:23:34,421::utils::602::Storage.Misc.excCmd::(execCmd) '/usr/bin/sudo -n /sbin/service ksmtuned retune' (cwd None)
Thread-5161::DEBUG::2012-03-26 14:23:34,456::utils::602::Storage.Misc.excCmd::(execCmd) FAILED: <err> = 'Unknown operation retune\n'; <rc> = 1
Thread-5161::DEBUG::2012-03-26 14:23:34,457::vmChannels::152::vds::(unregister) Delete fileno 22 from listener.
Thread-5161::DEBUG::2012-03-26 14:23:34,458::task::588::TaskManager.Task::(_updateState) Task=`b2c09668-9f5d-4660-b2e9-a41c0619fdf4`::moving from state init -> state preparing
Thread-5161::INFO::2012-03-26 14:23:34,458::logUtils::37::dispatcher::(wrapper) Run and protect: inappropriateDevices(thiefId='9e669b36-cf3b-4cef-81c7-5cd5a522bfcc')
Thread-5161::INFO::2012-03-26 14:23:34,461::logUtils::39::dispatcher::(wrapper) Run and protect: inappropriateDevices, Return response: None
Thread-5161::DEBUG::2012-03-26 14:23:34,461::task::1172::TaskManager.Task::(prepare) Task=`b2c09668-9f5d-4660-b2e9-a41c0619fdf4`::finished: None
Thread-5161::DEBUG::2012-03-26 14:23:34,462::task::588::TaskManager.Task::(_updateState) Task=`b2c09668-9f5d-4660-b2e9-a41c0619fdf4`::moving from state preparing -> state finished
Thread-5161::DEBUG::2012-03-26 14:23:34,462::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-5161::DEBUG::2012-03-26 14:23:34,462::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-5161::DEBUG::2012-03-26 14:23:34,463::task::978::TaskManager.Task::(_decref) Task=`b2c09668-9f5d-4660-b2e9-a41c0619fdf4`::ref 0 aborting False
Thread-5161::DEBUG::2012-03-26 14:23:34,463::libvirtvm::2011::vm.Vm::(deleteVm) vmId=`9e669b36-cf3b-4cef-81c7-5cd5a522bfcc`::Total desktops after destroy of 9e669b36-cf3b-4cef-81c7-5cd5a522bfcc is 0
Thread-5161::DEBUG::2012-03-26 14:23:34,464::BindingXMLRPC::875::vds::(wrapper) return vmDestroy with {'status': {'message': 'Machine destroyed', 'code': 0}}
I cannot reproduce this on libvirt build 0.10.2-4.3, so updating may help; whether it does depends on the versions you are currently running. Can you please specify the versions of vdsm, libvirt, and qemu you were using when you hit this issue?