Scenario: Host with vdsm-4.9.6-44.0.el6_3.x86_64 was upgraded to vdsm-4.9.6-44.0.el6_3. (yum update vdsm). After upgrade vdsm daemon not responding to getVdsCapabilities. Getting connection refused from the daemon. VDSMD process is alive and service vdsmd status shows that vdsmd is running. libvird is also running. Full vdsm log attached. here is the exception: ---------------------- Thread-17::INFO::2012-11-27 13:33:44,073::vm::604::vm.Vm::(_startUnderlyingVm) vmId=`098418b0-f484-4f2a-a371-f4ad1f4248da`::Skipping errors on recovery Traceback (most recent call last): File "/usr/share/vdsm/vm.py", line 597, in _startUnderlyingVm self._run() File "/usr/share/vdsm/libvirtvm.py", line 1396, in _run self._connection.lookupByUUIDString(self.id), File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper ret = f(*args, **kwargs) File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2682, in lookupByUUIDString if ret is None:raise libvirtError('virDomainLookupByUUIDString() failed', conn=self) libvirtError: Cannot write data: Broken pipe Thread-21::DEBUG::2012-11-27 13:33:44,073::task::588::TaskManager.Task::(_updateState) Task=`15d0ea2e-7c71-4568-a1c8-c5565ceae0d5`::moving from state init -> state preparing Thread-14::DEBUG::2012-11-27 13:33:44,073::task::588::TaskManager.Task::(_updateState) Task=`8341ada6-d1ba-4e9e-92cf-4c70e1d5bf48`::moving from state preparing -> state finished Thread-19::INFO::2012-11-27 13:33:44,074::vm::604::vm.Vm::(_startUnderlyingVm) vmId=`6ac9d011-0c0c-4945-a367-74fe5df5fd8d`::Skipping errors on recovery Traceback (most recent call last): File "/usr/share/vdsm/vm.py", line 597, in _startUnderlyingVm self._run() File "/usr/share/vdsm/libvirtvm.py", line 1396, in _run self._connection.lookupByUUIDString(self.id), File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper ret = f(*args, **kwargs) File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2682, in lookupByUUIDString if ret is None:raise libvirtError('virDomainLookupByUUIDString() failed', conn=self)
Created attachment 653402 [details] vdsm log
yes. sorry. typo :( Changed the title .
From initial investigation this looks like a race between vdsm starting up and restarting itself (due to libvirt restart after libvirt being upgraded). reboot / kill -9 should solve the issue. Since this is during upgrade no VMs should be running on the machine anyway.
commit 9384951815875b72ae52d82f43dc610de2301e8a Author: Federico Simoncelli <fsimonce> Date: Wed Nov 28 12:27:56 2012 -0500 thread: daemonize HSM_MailMonitor and storageRefresh In order to successfully complete the prepareForShutdown command some additional threads have been switched to daemon. http://gerrit.ovirt.org/#/c/9559/
looks like the above patch will not handle the upgrade scenario (in the old code the threads are not daemonized) Fede ?
(In reply to comment #11) > looks like the above patch will not handle the upgrade scenario (in the old > code the threads are not daemonized) > > Fede ? That's what I thought too at the beginning (and that's why I didn't switch the bz to post earlier). Looking at the logs instead we see: MainThread::INFO::2012-11-27 13:33:26,510::vdsm::70::vds::(run) I am the actual vdsm 4.9-44.0 [...] Thread-21::ERROR::2012-11-27 13:33:44,060::libvirtconnection::93::vds::(wrapper) connection to libvirt broken. taking vdsm down. [...] MainThread::INFO::2012-11-27 13:33:44,897::vdsm::75::vds::(run) VDSM main thread ended. Waiting for 9 other threads... MainThread::INFO::2012-11-27 13:33:44,898::vdsm::78::vds::(run) <_MainThread(MainThread, started 140425021429504)> MainThread::INFO::2012-11-27 13:33:44,898::vdsm::78::vds::(run) <Thread(libvirtEventLoop, started daemon 140424934115072)> MainThread::INFO::2012-11-27 13:33:44,898::vdsm::78::vds::(run) <WorkerThread(Thread-64, started daemon 140423955740416)> MainThread::INFO::2012-11-27 13:33:44,899::vdsm::78::vds::(run) <WorkerThread(Thread-67, started daemon 140423903291136)> MainThread::INFO::2012-11-27 13:33:44,899::vdsm::78::vds::(run) <WorkerThread(Thread-65, started daemon 140424383543040)> MainThread::INFO::2012-11-27 13:33:44,899::vdsm::78::vds::(run) <Thread(Thread-43, started daemon 140423485978368)> MainThread::INFO::2012-11-27 13:33:44,900::vdsm::78::vds::(run) <WorkerThread(Thread-66, started daemon 140423913780992)> MainThread::INFO::2012-11-27 13:33:44,900::vdsm::78::vds::(run) <WorkerThread(Thread-68, started daemon 140423464998656)> MainThread::INFO::2012-11-27 13:33:44,900::vdsm::78::vds::(run) <KsmMonitorThread(KsmMonitor, started daemon 140423945250560)> MainThread::INFO::2012-11-27 13:33:44,901::vdsm::78::vds::(run) <Thread(Thread-12, started 140424373053184)> This means that the vdsm shutting down (and stuck) was the new one.
si25. vdsm-4.9.6-44.2.el6_3.x86_64. was unable to reproduce.
sf6. vdsm-4.10.2-7.0.el6ev.x86_64 . looks ok.
This bug is currently attached to errata RHBA-2012:14332. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag. Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information: * Cause: What actions or circumstances cause this bug to present. * Consequence: What happens when the bug presents. * Fix: What was done to fix the bug. * Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore') Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug. For further details on the Cause, Consequence, Fix, Result format please refer to: https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes Thanks in advance.
Doc Text is correct.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0886.html