Bug 732914
Summary: [vdsm][libvirtconnection] vdsm does not recover when it fails to connect to libvirtd upon startup.

| Field | Value | Field | Value |
|---|---|---|---|
| Product | Red Hat Enterprise Linux 6 | Reporter | David Naori <dnaori> |
| Component | vdsm | Assignee | Federico Simoncelli <fsimonce> |
| Status | CLOSED ERRATA | QA Contact | David Naori <dnaori> |
| Severity | high | Docs Contact | |
| Priority | unspecified | CC | abaron, bazulay, dnaori, fsimonce, hateya, iheim, ilvovsky, mgoldboi, ykaul |
| Version | 6.1 | Target Milestone | rc |
| Target Release | --- | Hardware | x86_64 |
| OS | Linux | Whiteboard | |
| Fixed In Version | vdsm-4.9-97.el6 | Doc Type | Bug Fix |
| Doc Text | | Story Points | --- |
| Clone Of | | Environment | |
| Last Closed | 2011-12-06 07:25:32 UTC | Type | --- |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | Attachments | |
David, would the author of http://gerrit.usersys.redhat.com/699 be nice to add "libvirtError: Failed to connect socket" to the errors expected to kill vdsm?

Yeylon, why did you add the blocker flag? Is it urgent to touch for 3.0?

(In reply to comment #2)
> David, would the author of http://gerrit.usersys.redhat.com/699 be nice to add
> "libvirtError: Failed to connect socket" to the errors expected to kill vdsm?

I'm afraid this is not the case here; the call is not in the try/except block:

    if not conn:
        conn = libvirt.openAuth('qemu:///system', auth, 0)

* I tried to put it in a try/except block and call prepareForShutdown if it fails. It does not do the job in this case.

David, are you sure? This works for me:

commit 067f769de4df00cf4015e82acd16c1319938a14f
Author: Federico Simoncelli <fsimonce>
Date: Mon Aug 29 11:01:16 2011 +0000

    BZ#732914 VDSM must exit if libvirt is not running

    Change-Id: I673184b8e5d765a9397f3fc14a70f7c31b907b3e

http://gerrit.usersys.redhat.com/861

The problem here is quite tricky. When we issue prepareForShutdown at startup and vdsm was previously connected to a pool, there are no stoppable threads running yet, but storageRefresh is already trying to reconnect to the pool:

    threading.Thread(target=storageRefresh).start()  (hsm.py:192)

When it eventually succeeds, it starts new non-daemon threads, which are then never stopped. I have already tried several solutions but have not found a definitive one.

commit dc34ed11fe964fe2cdcc89e4df7f7f96cb639332
Author: Federico Simoncelli <fsimonce>
Date: Mon Sep 5 16:25:53 2011 +0000

    BZ#732914 Check libvirt connection on startup

    Change-Id: I913acefd3d41bc34e831783687f287d92c7aa282

http://gerrit.usersys.redhat.com/896

Clearing the needinfo flag since it's fixed already.
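The approach discussed in the comments above (wrap the initial libvirt.openAuth call, run a shutdown routine, and exit when libvirtd is unreachable) can be sketched roughly as follows. This is a minimal illustration and not the code from either gerrit change: `ConnectError`, `connect_or_exit`, `open_conn` and `on_failure` are hypothetical names, and the connection opener is injected so that the sketch runs without libvirt installed.

```python
import sys


class ConnectError(Exception):
    """Stand-in for libvirt.libvirtError in this sketch (assumption)."""


def connect_or_exit(open_conn, on_failure, exit_code=1):
    """Try to open the hypervisor connection once at startup.

    open_conn  -- callable returning a connection; hypothetical hook that
                  plays the role of libvirt.openAuth in the traceback
    on_failure -- callable run before exiting; hypothetical hook standing
                  in for a shutdown routine such as prepareForShutdown
    """
    try:
        return open_conn()
    except ConnectError as err:
        # Report the failure, clean up, then terminate the process
        # instead of staying in an "initializing" state forever.
        sys.stderr.write("cannot reach libvirt: %s\n" % err)
        on_failure()
        raise SystemExit(exit_code)
```

As the thread-related comment suggests, a check like this presumably only helps if it runs before any background thread (for example the one running storageRefresh) has been started; otherwise those threads can keep the process alive after the main thread has given up.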
MainThread::INFO::2011-09-19 21:56:01,703::vdsm::71::vds::(run) I am the actual vdsm 4.9-100
MainThread::ERROR::2011-09-19 21:56:01,896::vdsm::74::vds::(run) Traceback (most recent call last):
  File "/usr/share/vdsm//vdsm", line 72, in run
    serve_clients(log)
  File "/usr/share/vdsm//vdsm", line 40, in serve_clients
    cif = clientIF.clientIF(log)
  File "/usr/share/vdsm/clientIF.py", line 92, in __init__
    self._libvirt = libvirtconnection.get()
  File "/usr/share/vdsm/libvirtconnection.py", line 94, in get
    conn = libvirt.openAuth('qemu:///system', auth, 0)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 102, in openAuth
    if ret is None:raise libvirtError('virConnectOpenAuth() failed')
libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
MainThread::INFO::2011-09-19 21:56:01,896::vdsm::76::vds::(run) VDSM main thread ended. Waiting for 1 other threads...
MainThread::INFO::2011-09-19 21:56:01,896::vdsm::79::vds::(run) <_MainThread(MainThread, started 140116261517056)>
MainThread::INFO::2011-09-19 21:56:01,896::vdsm::79::vds::(run) <Thread(libvirtEventLoop, started daemon 140116177499904)>
MainThread::INFO::2011-09-19 21:56:01,965::vdsm::71::vds::(run) I am the actual vdsm 4.9-100

Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html
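In the verification log above, the only thread left besides the main thread is marked "daemon", which is consistent with the process being able to exit and restart. A small sketch of how live threads can be sorted into those that block interpreter exit and those that do not is given below; `split_threads` is a hypothetical helper written for illustration, not a vdsm function.

```python
import threading


def split_threads():
    """Return (blocking, daemon) name lists for all live threads
    other than the calling thread.

    Non-daemon threads keep the Python process alive after the main
    thread returns; daemon threads do not.
    """
    current = threading.current_thread()
    blocking, daemon = [], []
    for t in threading.enumerate():
        if t is current:
            continue
        if t.daemon:
            daemon.append(t.name)
        else:
            blocking.append(t.name)
    return blocking, daemon
```

If `blocking` is non-empty at shutdown time, those threads need to be stopped or joined explicitly, which matches the difficulty described earlier for threads started after prepareForShutdown has already run.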
Created attachment 519562 [details]
vdsm log

Description of problem:
When vdsm fails to connect to libvirtd upon startup (libvirtd is not running at that moment), it stays forever in "recovering from crash or initializing" and does not take itself down.

clientIFinit::ERROR::2011-08-24 02:11:20,634::clientIF::938::vds::(_recoverExistingVms) Vm's recovery failed
Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 903, in _recoverExistingVms
    vdsmVms = self.getVDSMVms()
  File "/usr/share/vdsm/clientIF.py", line 964, in getVDSMVms
    conn = libvirtconnection.get(self)
  File "/usr/share/vdsm/libvirtconnection.py", line 106, in get
    conn = libvirt.openAuth('qemu:///system', auth, 0)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 102, in openAuth
    if ret is None:raise libvirtError('virConnectOpenAuth() failed')
libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

Version-Release number of selected component (if applicable):
vdsm-4.9-95.el6.x86_64
libvirt-0.9.4-4.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
(On a host with running VMs)
1. `/etc/init.d/vdsmd restart && initctl stop libvirtd`

Actual results:

Expected results:

Additional info: