Created attachment 895408 (log.tar.gz)

Description of problem:
Register the hypervisor to RHEV-M, then move the cursor to the oVirt Engine menu: the hypervisor logs out suddenly. This only occurs on the first registration.

Version-Release number of selected component (if applicable):
rhev-hypervisor6-6.5-20140513.0
ovirt-node-plugin-vdsm-0.1.1-19.el6ev.noarch
ovirt-node-3.0.1-18.el6_5.10.noarch
rhevm av 9.1

How reproducible:
100%

Steps to Reproduce:
1. Install rhev-hypervisor6-6.5-20140513.0.
2. Approve it and change its status to Up.
3. Move the cursor to the oVirt Engine menu.

Actual results:
1. The hypervisor logs out suddenly when the cursor moves to the oVirt Engine menu.
2. This only occurs on the first registration.

Expected results:
The cursor can move to the oVirt Engine menu without logging out.

Additional info:
Chen,

does the hypervisor really get unregistered in RHEV-M, or does it only appear in the TUI as if the hypervisor gets unregistered?
(In reply to Fabian Deutsch from comment #1)
> Chen,
>
> does the hypervisor really get unregistered in RHEV-M, or does it only appear
> in the TUI as if the hypervisor gets unregistered?

I misunderstood the problem. The setup TUI quits after registration.
Created attachment 895427 (video for bug)
Quote from Chen's video:

libvir: XML-RPC error : Cannot write data: Broken pipe
libvir: XML-RPC error : internal error client socket is closed
taking calling process down.

Which comes from here:
http://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=lib/vdsm/libvirtconnection.py;hb=HEAD#l119

Maybe the vdsm library has a problem because libvirt got restarted and its pipe got closed. The question is whether we can make a call that does not kill the calling process.

Mooli, can you say whether we can call vdsm.netinfo.networks() in a way that does not terminate the host process?
(In reply to shaochen from comment #0)
> Steps to Reproduce:
> 1. Install rhev-hypervisor6-6.5-20140513.0
> 2. Approve it and change status to up.
> 3. Move cursor to oVirt Engine menu

Reproduced.
@Fabian, yes, it is possible. You can just monkeypatch libvirtconnection.get:

    from functools import partial
    from vdsm import netinfo  # This you already had

    # Let's monkeypatch
    _libvirtconn_get = netinfo.libvirtconnection.get
    netinfo.libvirtconnection.get = partial(_libvirtconn_get, killOnFailure=False)

This will raise a libvirtError instead of killing the process. vdsm utils also has a monkeypatch utility if you want to use it.

You can see it in action:

In [31]: netinfo.libvirtconnection.get = partial(_netinfo_libvirt_get, killOnFailure=False)

In [32]: netinfo.networks()
Out[32]: {'other': {'bridge': u'other', 'bridged': True}}

In [33]: netinfo.networks()
Out[33]: {'other': {'bridge': u'other', 'bridged': True}}

In [34]: netinfo.networks()
Out[34]: {'other': {'bridge': u'other', 'bridged': True}}

Here I stop libvirtd on the machine:

In [35]: netinfo.networks()
libvir: XML-RPC error : Cannot write data: Broken pipe
libvir: XML-RPC error : internal error client socket is closed
No handlers could be found for logger "root"
---------------------------------------------------------------------------
libvirtError                              Traceback (most recent call last)

/root/<ipython console> in <module>()

/usr/lib64/python2.6/site-packages/vdsm/netinfo.pyc in networks()
    110     nets = {}
    111     conn = libvirtconnection.get()
--> 112     allNets = ((net, net.name()) for net in conn.listAllNetworks(0))
    113     for net, netname in allNets:
    114         if netname.startswith(LIBVIRT_NET_PREFIX):

/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.pyc in wrapper(*args, **kwargs)
    108             if edom in EDOMAINS and ecode in ECODES:
    109                 try:
--> 110                     __connections.get(id(target)).pingLibvirt()
    111                 except libvirt.libvirtError as e:
    112                     edom = e.get_error_domain()

/usr/lib64/python2.6/site-packages/libvirt.pyc in getLibVersion(self)
   3387         """Returns the libvirt version of the connection host """
   3388         ret = libvirtmod.virConnectGetLibVersion(self._o)
-> 3389         if ret == -1: raise libvirtError ('virConnectGetLibVersion() failed', conn=self)
   3390         return ret
   3391

libvirtError: internal error client socket is closed
Hey Antoni, thanks for the detailed explanation.

IMHO the default should be to raise an exception rather than kill the caller (or anyone else). The caller can then decide what to do when the exception is raised.
@Fabian: I tend to agree.
@Douglas: Maybe you can ask infra whether changing the default behavior is okay with them, and reassign the bug to them.
(In reply to Fabian Deutsch from comment #7)
> IMHO the default should rather be an exception instead of killing the caller
> - or killing anyone.
> The caller itself can then decide what to do when the exception is raised.

That would have been more polite, but Vdsm uses libvirt in so many circumstances that it can be difficult to ensure.

Everything changes when non-Vdsm processes begin to use vdsm libraries. We should have an explicit global libvirtconnection.KILL_CALLER=False; Vdsm could set it to True on startup.
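A minimal sketch of the proposed global switch, assuming a module-level KILL_CALLER flag that the connection wrapper consults on failure. KILL_CALLER, ConnectionLostError, call_libvirt() and broken_call() are hypothetical names for illustration, not vdsm's actual API, and sending SIGTERM to itself is one plausible way to "take the calling process down", not necessarily what vdsm does.

```python
import os
import signal

# Hypothetical module-level switch, as proposed above. Library users
# (e.g. ovirt-node-plugin-vdsm) get the safe default; Vdsm itself would
# set it to True once at startup.
KILL_CALLER = False


class ConnectionLostError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""


def _terminate_self():
    # One way a daemon can take itself down so a supervisor restarts it.
    os.kill(os.getpid(), signal.SIGTERM)


def call_libvirt(func, *args, **kwargs):
    """Run func; on a lost connection either kill the caller or re-raise."""
    try:
        return func(*args, **kwargs)
    except ConnectionLostError:
        if KILL_CALLER:
            _terminate_self()
        # If SIGTERM is handled rather than fatal (or the flag is off),
        # the caller still sees the original exception.
        raise


def broken_call():
    # Hypothetical call that fails the way listAllNetworks() did above.
    raise ConnectionLostError("internal error client socket is closed")
```

With the default of False, call_libvirt(broken_call) simply raises ConnectionLostError for the caller to handle, while successful calls pass their return value through unchanged.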
Hi,

Thanks guys for all the information. If I apply the patch below and add a try/except in ovirt-node-plugin-vdsm that retries collecting the network data after the exception, it works. Please let me know your thoughts.

libvirtconnection: retry to establish connection
http://gerrit.ovirt.org/#/c/27754/

Thanks!
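The plugin-side half of this fix (a try/except that retries collecting the network data) might look roughly like the sketch below. It assumes get_networks is vdsm.netinfo.networks with the process-killing behaviour already disabled; collect_networks(), LibvirtConnectionError, retries and delay are hypothetical, and the actual code in ovirt-node-plugin-vdsm may differ.

```python
import time


class LibvirtConnectionError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""


def collect_networks(get_networks, retries=3, delay=1.0,
                     retry_on=(LibvirtConnectionError,)):
    """Call get_networks(), retrying if the libvirt connection was lost.

    In the plugin, get_networks would be vdsm.netinfo.networks (after the
    killOnFailure=False change); here it is any zero-argument callable.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return get_networks()
        except retry_on as e:
            last_error = e
            # libvirtd may still be restarting; wait before trying again.
            if attempt < retries - 1:
                time.sleep(delay)
    # Every attempt failed: let the TUI decide how to report it.
    raise last_error
```

A short delay between attempts gives libvirtd time to come back after the restart that registration triggers; if it never does, the last error is re-raised instead of the TUI silently exiting.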
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0673.html