Bug 1097645 - vdsm.libvirtconnection kills TUI upon libvirt restart.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-node-plugin-vdsm
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 3.4.0
Assignee: Douglas Schilling Landgraf
QA Contact: Pavel Stehlik
URL:
Whiteboard: infra
Depends On:
Blocks:
 
Reported: 2014-05-14 08:53 UTC by cshao
Modified: 2016-02-10 19:03 UTC
CC List: 21 users

Fixed In Version: ovirt-node-plugin-vdsm-0.1.1-20.el6
Doc Type: Bug Fix
Doc Text:
Previously, a hypervisor registered to Red Hat Enterprise Virtualization Manager would encounter a broken pipe if the cursor was moved over the oVirt Engine menu after the first registration. This caused the hypervisor TUI to log the user out suddenly. The issue has now been corrected, and the sudden logouts no longer occur.
Clone Of:
Environment:
Last Closed: 2014-06-09 14:26:38 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
log.tar.gz (20.83 KB, application/x-gzip) - 2014-05-14 08:53 UTC, cshao
video for bug (1.53 MB, video/ogg) - 2014-05-14 09:52 UTC, cshao


Links
Red Hat Product Errata RHBA-2014:0673 (normal, SHIPPED_LIVE): ovirt-node-plugin-vdsm bug fix and enhancement update - 2014-06-09 18:24:50 UTC
oVirt gerrit 27779 (master, MERGED): engine_page: replace netinfo to xmlrpc - Never
oVirt gerrit 27788 (node-3.0, MERGED): engine_page: replace netinfo to xmlrpc - Never

Description cshao 2014-05-14 08:53:20 UTC
Created attachment 895408 [details]
log.tar.gz

Description of problem:
Register the hypervisor to RHEV-M, then move the cursor to the oVirt Engine menu; the hypervisor will log out suddenly. This occurs only on the first registration.

Version-Release number of selected component (if applicable):
rhev-hypervisor6-6.5-20140513.0
ovirt-node-plugin-vdsm-0.1.1-19.el6ev.noarch
ovirt-node-3.0.1-18.el6_5.10.noarch
rhevm av 9.1

How reproducible:
100%

Steps to Reproduce:
1. Install rhev-hypervisor6-6.5-20140513.0
2. Approve it and wait for its status to change to Up.
3. Move the cursor to the oVirt Engine menu.

Actual results:
1. The hypervisor logs out suddenly when the cursor is moved to the oVirt Engine menu.
2. This occurs only on the first registration.

Expected results:
The cursor can move to the oVirt Engine menu without the hypervisor logging out.

Additional info:

Comment 1 Fabian Deutsch 2014-05-14 09:04:01 UTC
Chen,

does the hypervisor really get unregistered in RHEV-M, or does it only appear in the TUI as if the hypervisor gets unregistered?

Comment 2 Fabian Deutsch 2014-05-14 09:08:08 UTC
(In reply to Fabian Deutsch from comment #1)
> Chen,
> 
> does the hypervisor really get unregistered in RHEV-M, or does it only
> appear in the TUI as if the hypervisor gets unregistered?

I misunderstood the problem.

The setup TUI quits after registration.

Comment 3 cshao 2014-05-14 09:52:28 UTC
Created attachment 895427 [details]
video for bug

Comment 4 Fabian Deutsch 2014-05-14 10:01:11 UTC
Quote from Chen's video:

libvir: XML-RPC error : Cannot write data: Broken pipe
libvir: XML-RPC error : internal error client socket is closed
taking calling process down.

Which comes from here:
http://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=lib/vdsm/libvirtconnection.py;hb=HEAD#l119

Maybe the vdsm lib has a problem because libvirt got restarted and its pipe got closed.

The question is whether we can make a call which does not kill the calling process.
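
For reference, the kill behavior at that line looks roughly like this. This is a simplified sketch reconstructed from the log output above and the traceback in comment 6, not the verbatim vdsm source; the names wrap_method and conn are mine, while getLibVersion() and the log messages come from the thread:

    # Simplified sketch of the wrapper in lib/vdsm/libvirtconnection.py
    # (reconstruction for illustration, not the verbatim source):
    import os
    import logging

    import libvirt

    log = logging.getLogger(__name__)

    def wrap_method(f, conn, killOnFailure=True):
        def wrapper(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except libvirt.libvirtError:
                try:
                    # Ping libvirt to check whether the connection itself
                    # died (the real code calls pingLibvirt(), which ends
                    # up in getLibVersion(), as the traceback shows).
                    conn.getLibVersion()
                except libvirt.libvirtError:
                    log.error('connection to libvirt broken.')
                    if killOnFailure:
                        log.error('taking calling process down.')
                        os._exit(1)  # this is what kills the TUI
                raise
        return wrapper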


Mooli,
can you say whether we can call vdsm.netinfo.networks() in a way that the host process won't be terminated?

Comment 5 Douglas Schilling Landgraf 2014-05-14 23:53:29 UTC
(In reply to shaochen from comment #0)
> Created attachment 895408 [details]
> log.tar.gz
> 
> Description of problem:
> Register the hypervisor to RHEV-M, then move the cursor to the oVirt Engine
> menu; the hypervisor will log out suddenly. This occurs only on the first
> registration.
> 
> Version-Release number of selected component (if applicable):
> rhev-hypervisor6-6.5-20140513.0
> ovirt-node-plugin-vdsm-0.1.1-19.el6ev.noarch
> ovirt-node-3.0.1-18.el6_5.10.noarch
> rhevm av 9.1
> 
> How reproducible:
> 100%
> 
> Steps to Reproduce:
> 1. Install rhev-hypervisor6-6.5-20140513.0
> 2. Approve it and wait for its status to change to Up.
> 3. Move the cursor to the oVirt Engine menu.
> 

Reproduced

Comment 6 Antoni Segura Puimedon 2014-05-15 13:30:07 UTC
@Fabian: yes, it is possible; you can just monkeypatch libvirtconnection.get:

    from functools import partial

    from vdsm import netinfo  # This you already had

    # Let's monkey patch
    _libvirtconn_get = netinfo.libvirtconnection.get
    netinfo.libvirtconnection.get = partial(_libvirtconn_get,
                                            killOnFailure=False)

This will raise a libvirtError instead of killing the process.
vdsm utils also has a monkeypatch utility if you want.

You can see it in action:

    In [31]: netinfo.libvirtconnection.get = partial(_libvirtconn_get, killOnFailure=False)

    In [32]: netinfo.networks()
    Out[32]: {'other': {'bridge': u'other', 'bridged': True}}

    In [33]: netinfo.networks()
    Out[33]: {'other': {'bridge': u'other', 'bridged': True}}

    In [34]: netinfo.networks()
    Out[34]: {'other': {'bridge': u'other', 'bridged': True}}

Here I stop libvirtd on the machine:

    In [35]: netinfo.networks()
    libvir: XML-RPC error : Cannot write data: Broken pipe
    libvir: XML-RPC error : internal error client socket is closed
    No handlers could be found for logger "root"
    ---------------------------------------------------------------------------
    libvirtError                              Traceback (most recent call last)

    /root/<ipython console> in <module>()

    /usr/lib64/python2.6/site-packages/vdsm/netinfo.pyc in networks()
        110     nets = {}
        111     conn = libvirtconnection.get()
    --> 112     allNets = ((net, net.name()) for net in conn.listAllNetworks(0))
        113     for net, netname in allNets:
        114         if netname.startswith(LIBVIRT_NET_PREFIX):

    /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.pyc in wrapper(*args, **kwargs)
        108                 if edom in EDOMAINS and ecode in ECODES:
        109                     try:
    --> 110                         __connections.get(id(target)).pingLibvirt()
        111                     except libvirt.libvirtError as e:
        112                         edom = e.get_error_domain()

    /usr/lib64/python2.6/site-packages/libvirt.pyc in getLibVersion(self)
       3387         """Returns the libvirt version of the connection host """
       3388         ret = libvirtmod.virConnectGetLibVersion(self._o)
    -> 3389         if ret == -1: raise libvirtError ('virConnectGetLibVersion() failed', conn=self)
       3390         return ret
       3391 

    libvirtError: internal error client socket is closed

Comment 7 Fabian Deutsch 2014-05-15 14:37:15 UTC
Hey Antoni,

thanks for the detailed explanation.

IMHO the default should rather be an exception instead of killing the caller - or killing anyone.
The caller itself can then decide what to do when the exception is raised.

Comment 8 Antoni Segura Puimedon 2014-05-15 15:07:29 UTC
@Fabian: I tend to agree.

@Douglas: Maybe you can ask infra if the change of default behavior is okay with them and reassign the bug to them.

Comment 9 Dan Kenigsberg 2014-05-15 16:10:42 UTC
(In reply to Fabian Deutsch from comment #7)
> IMHO the default should rather be an exception instead of killing the caller
> - or killing anyone.
> The caller itself can then decide what to do when the exception is raised.

That would have been more polite, but Vdsm uses libvirt in so many circumstances that it can be difficult to ensure.

Everything changes when non-Vdsm processes begin to use vdsm libraries.

We should have an explicit global libvirtconnection.KILL_CALLER=False. Vdsm could set it to True on startup.
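
Something along these lines (a sketch of the suggestion only; KILL_CALLER is the name proposed above, and _on_broken_connection is a placeholder, neither being an existing vdsm symbol):

    # Sketch of the proposed module-level flag in libvirtconnection:
    import os
    import logging

    log = logging.getLogger(__name__)

    # Polite default: library users get a libvirtError they can handle.
    KILL_CALLER = False

    def _on_broken_connection():
        # Reached once the libvirt connection is confirmed dead.
        log.error('connection to libvirt broken.')
        if KILL_CALLER:
            # Only vdsmd itself would opt in at startup:
            #     from vdsm import libvirtconnection
            #     libvirtconnection.KILL_CALLER = True
            log.error('taking calling process down.')
            os._exit(1)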

Comment 10 Douglas Schilling Landgraf 2014-05-16 06:29:42 UTC
Hi,

Thanks guys for all information. 

If I use the patch below and add a try/except in ovirt-node-plugin-vdsm to retry collecting the network data after the exception, it works. Please let me know your thoughts.

libvirtconnection: retry to establish connection
http://gerrit.ovirt.org/#/c/27754/ 
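
The plugin-side retry could look roughly like this (a hypothetical helper for illustration; networks_with_retry is not the actual ovirt-node-plugin-vdsm patch):

    # Hypothetical retry around netinfo.networks(); with the patch above,
    # the next call re-establishes the libvirt connection after a
    # libvirtd restart instead of killing the TUI:
    import libvirt

    from vdsm import netinfo

    def networks_with_retry(retries=2):
        for attempt in range(retries):
            try:
                return netinfo.networks()
            except libvirt.libvirtError:
                if attempt == retries - 1:
                    raise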

Thanks!

Comment 13 errata-xmlrpc 2014-06-09 14:26:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0673.html

