Bug 1097645

| Summary: | vdsm.libvirtconnection kills TUI upon libvirt restart. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | cshao <cshao> |
| Component: | ovirt-node-plugin-vdsm | Assignee: | Douglas Schilling Landgraf <dougsland> |
| Status: | CLOSED ERRATA | QA Contact: | Pavel Stehlik <pstehlik> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.4.0 | CC: | asegurap, bazulay, cpelland, cshao, danken, fdeutsch, gklein, gouyang, guasun, hadong, hateya, huiwa, iheim, jboggs, leiwang, mtayer, pstehlik, rbarry, tpoitras, yaniwang, ycui |
| Target Milestone: | --- | | |
| Target Release: | 3.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | infra | | |
| Fixed In Version: | ovirt-node-plugin-vdsm-0.1.1-20.el6 | Doc Type: | Bug Fix |
| Doc Text: | Previously, a hypervisor registered to Red Hat Enterprise Virtualization Manager would encounter a broken pipe if the cursor was moved over the oVirt Engine menu after the first registration, causing the hypervisor to log out suddenly. The issue has been corrected, and the sudden logouts no longer occur. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-06-09 14:26:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Chen, does the hypervisor really get unregistered in RHEV-M, or does it only appear in the TUI as if the hypervisor gets unregistered?

(In reply to Fabian Deutsch from comment #1)
> Chen,
>
> does the hypervisor get really unregistered in RHEV-M or does it only appear
> in the TUI as if the hypervisor get's unregistered?

I misunderstood the problem. The setup TUI quits after registration.

Created attachment 895427 [details]
video for bug

Quote from Chen's video:

libvir: XML-RPC error : Cannot write data: Broken pipe
libvir: XML-RPC error : internal error client socket is closed
taking calling process down.

Which comes from here:
http://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=lib/vdsm/libvirtconnection.py;hb=HEAD#l119

Maybe the vdsm lib has a problem because libvirt got restarted and its pipe got closed. The question is whether we can make a call which does not kill the calling process.

Mooli, can you say if we can make a call to vdsm.netinfo.networks() in a way that the host process won't be terminated?

(In reply to shaochen from comment #0)
> Created attachment 895408 [details]
> log.tar.gz
>
> Description of problem:
> Register Hypervisor to RHEV-M, and then move the cursor to oVirt Engine
> menu, the hypervisor will logout suddenly. It is only occurs on the first
> registration.
>
> Version-Release number of selected component (if applicable):
> rhev-hypervisor6-6.5-20140513.0
> ovirt-node-plugin-vdsm-0.1.1-19.el6ev.noarch
> ovirt-node-3.0.1-18.el6_5.10.noarch
> rhevm av 9.1
>
> How reproducible:
> 100%
>
> Steps to Reproduce:
> 1. Install rhev-hypervisor6-6.5-20140513.0
> 2. Approve it and change status to up.
> 3. Move cursor to oVirt Engine menu

Reproduced

@Fabian, yes, it is possible, you can just monkeypatch libvirtconnection.get:
from functools import partial

from vdsm import netinfo  # This you already had

# Let's monkey patch
_libvirtconn_get = netinfo.libvirtconnection.get
netinfo.libvirtconnection.get = partial(_libvirtconn_get,
                                        killOnFailure=False)
This will raise a libvirtError instead of killing the process.
vdsm's utils also has some monkeypatch utility if you want.
You can see it in action:
In [31]: netinfo.libvirtconnection.get = partial(_netinfo_libvirt_get, killOnFailure=False)
In [32]: netinfo.networks()
Out[32]: {'other': {'bridge': u'other', 'bridged': True}}
In [33]: netinfo.networks()
Out[33]: {'other': {'bridge': u'other', 'bridged': True}}
In [34]: netinfo.networks()
Out[34]: {'other': {'bridge': u'other', 'bridged': True}}
Here I stop libvirtd on the machine:
In [35]: netinfo.networks()
libvir: XML-RPC error : Cannot write data: Broken pipe
libvir: XML-RPC error : internal error client socket is closed
No handlers could be found for logger "root"
---------------------------------------------------------------------------
libvirtError Traceback (most recent call last)
/root/<ipython console> in <module>()
/usr/lib64/python2.6/site-packages/vdsm/netinfo.pyc in networks()
110 nets = {}
111 conn = libvirtconnection.get()
--> 112 allNets = ((net, net.name()) for net in conn.listAllNetworks(0))
113 for net, netname in allNets:
114 if netname.startswith(LIBVIRT_NET_PREFIX):
/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.pyc in wrapper(*args, **kwargs)
108 if edom in EDOMAINS and ecode in ECODES:
109 try:
--> 110 __connections.get(id(target)).pingLibvirt()
111 except libvirt.libvirtError as e:
112 edom = e.get_error_domain()
/usr/lib64/python2.6/site-packages/libvirt.pyc in getLibVersion(self)
3387 """Returns the libvirt version of the connection host """
3388 ret = libvirtmod.virConnectGetLibVersion(self._o)
-> 3389 if ret == -1: raise libvirtError ('virConnectGetLibVersion() failed', conn=self)
3390 return ret
3391
libvirtError: internal error client socket is closed
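In a consumer such as the TUI, the monkeypatched call can then be handled like any other libvirt failure. A minimal sketch, assuming the monkeypatch above has already been applied; the function name, retry count and delay are illustrative, not the actual plugin code:

import time

import libvirt
from vdsm import netinfo

def networks_with_retry(retries=3, delay=2):
    # With killOnFailure=False, a dead libvirt connection raises
    # libvirt.libvirtError here instead of terminating the caller.
    for attempt in range(retries):
        try:
            return netinfo.networks()
        except libvirt.libvirtError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)  # give libvirtd a moment to come back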
Hey Antoni, thanks for the detailed explanation.

IMHO the default should rather be an exception instead of killing the caller - or killing anyone. The caller itself can then decide what to do when the exception is raised.

@Fabian: I tend to agree.
@Douglas: Maybe you can ask infra if the change of default behavior is okay with them and reassign the bug to them.

(In reply to Fabian Deutsch from comment #7)
> IMHO the default should rather be an exception instead of killing the caller
> - or killing anyone.
> The caller itself can then decide what to do when the exception is raised.

That would have been more polite; but Vdsm uses libvirt in so many circumstances that it can be difficult to ensure. Everything changes when non-Vdsm processes begin to use vdsm libraries. We should have an explicit global libvirtconnection.KILL_CALLER=False. Vdsm could set it to True on startup.

Hi,

Thanks, guys, for all the information. If I use the patch below and add a try/except in ovirt-node-plugin-vdsm to retry collecting the network data after the exception, it works. Please let me know your thoughts.

libvirtconnection: retry to establish connection
http://gerrit.ovirt.org/#/c/27754/

Thanks!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0673.html
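As an aside, the explicit switch Dan suggests could look roughly like the following. This is a hypothetical sketch; KILL_CALLER and call_libvirt are made-up names, not the actual lib/vdsm/libvirtconnection.py code or the content of the gerrit change:

import os
import signal

import libvirt

# Hypothetical module-level default: library consumers get exceptions,
# while the vdsm daemon would set this to True on startup to keep the
# current fail-fast behaviour.
KILL_CALLER = False

def call_libvirt(func, *args, **kwargs):
    # Illustrative wrapper around a libvirt call.
    try:
        return func(*args, **kwargs)
    except libvirt.libvirtError:
        if KILL_CALLER:
            # today's vdsm behaviour: take the calling process down
            os.kill(os.getpid(), signal.SIGTERM)
        raise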
Created attachment 895408 [details]
log.tar.gz

Description of problem:
Register the Hypervisor to RHEV-M, then move the cursor to the oVirt Engine menu; the hypervisor will log out suddenly. This only occurs on the first registration.

Version-Release number of selected component (if applicable):
rhev-hypervisor6-6.5-20140513.0
ovirt-node-plugin-vdsm-0.1.1-19.el6ev.noarch
ovirt-node-3.0.1-18.el6_5.10.noarch
rhevm av 9.1

How reproducible:
100%

Steps to Reproduce:
1. Install rhev-hypervisor6-6.5-20140513.0
2. Approve it and change status to up.
3. Move cursor to oVirt Engine menu

Actual results:
1. The hypervisor logs out suddenly when the cursor is moved to the oVirt Engine menu.
2. This only occurs on the first registration.

Expected results:
The cursor can be moved to the oVirt Engine menu without a logout.

Additional info: