Bug 1097645

| Summary: | vdsm.libvirtconnection kills TUI upon libvirt restart. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | cshao <cshao> |
| Component: | ovirt-node-plugin-vdsm | Assignee: | Douglas Schilling Landgraf <dougsland> |
| Status: | CLOSED ERRATA | QA Contact: | Pavel Stehlik <pstehlik> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.4.0 | CC: | asegurap, bazulay, cpelland, cshao, danken, fdeutsch, gklein, gouyang, guasun, hadong, hateya, huiwa, iheim, jboggs, leiwang, mtayer, pstehlik, rbarry, tpoitras, yaniwang, ycui |
| Target Milestone: | --- | | |
| Target Release: | 3.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | infra | | |
| Fixed In Version: | ovirt-node-plugin-vdsm-0.1.1-20.el6 | Doc Type: | Bug Fix |
| Doc Text: | Previously, a hypervisor registered to Red Hat Enterprise Virtualization Manager would encounter a broken pipe if the cursor was moved over the oVirt Engine menu after the first registration, causing the hypervisor to log out suddenly. The issue has been corrected, and the sudden logouts no longer occur. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-06-09 14:26:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Chen, does the hypervisor really get unregistered in RHEV-M, or does it only appear in the TUI as if the hypervisor gets unregistered?

(In reply to Fabian Deutsch from comment #1)
> Chen,
>
> does the hypervisor get really unregistered in RHEV-M or does it only appear
> in the TUI as if the hypervisor get's unregistered?

I misunderstood the problem. The setup TUI quits after registration.

Created attachment 895427 [details]
video for bug

Quote from Chen's video:

libvir: XML-RPC error : Cannot write data: Broken pipe
libvir: XML-RPC error : internal error client socket is closed
taking calling process down.

Which comes from here:
http://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=lib/vdsm/libvirtconnection.py;hb=HEAD#l119

Maybe the vdsm lib has a problem because libvirt got restarted and its pipe got closed. The question is whether we can make a call which does not kill the calling process.

Mooli, can you say if we can make a call to vdsm.netinfo.networks() in a way that the host process won't be terminated?

(In reply to shaochen from comment #0)
> Created attachment 895408 [details]
> log.tar.gz
>
> Description of problem:
> Register Hypervisor to RHEV-M, and then move the cursor to oVirt Engine
> menu, the hypervisor will logout suddenly. It is only occurs on the first
> registration.
>
> Version-Release number of selected component (if applicable):
> rhev-hypervisor6-6.5-20140513.0
> ovirt-node-plugin-vdsm-0.1.1-19.el6ev.noarch
> ovirt-node-3.0.1-18.el6_5.10.noarch
> rhevm av 9.1
>
> How reproducible:
> 100%
>
> Steps to Reproduce:
> 1. Install rhev-hypervisor6-6.5-20140513.0
> 2. Approve it and change status to up.
> 3. Move cursor to oVirt Engine menu

Reproduced

@Fabian, yes, it is possible, you can just monkeypatch libvirtconnection.get:
from functools import partial

from vdsm import netinfo  # This you already had

# Let's monkey patch
_libvirtconn_get = netinfo.libvirtconnection.get
netinfo.libvirtconnection.get = partial(_libvirtconn_get,
                                        killOnFailure=False)
This will raise a libvirtError instead of killing the process.
vdsm's utils also has some monkeypatch utility if you want.
You can see it in action:
In [31]: netinfo.libvirtconnection.get = partial(_netinfo_libvirt_get, killOnFailure=False)
In [32]: netinfo.networks()
Out[32]: {'other': {'bridge': u'other', 'bridged': True}}
In [33]: netinfo.networks()
Out[33]: {'other': {'bridge': u'other', 'bridged': True}}
In [34]: netinfo.networks()
Out[34]: {'other': {'bridge': u'other', 'bridged': True}}
Here I stop libvirtd on the machine:
In [35]: netinfo.networks()
libvir: XML-RPC error : Cannot write data: Broken pipe
libvir: XML-RPC error : internal error client socket is closed
No handlers could be found for logger "root"
---------------------------------------------------------------------------
libvirtError Traceback (most recent call last)
/root/<ipython console> in <module>()
/usr/lib64/python2.6/site-packages/vdsm/netinfo.pyc in networks()
110 nets = {}
111 conn = libvirtconnection.get()
--> 112 allNets = ((net, net.name()) for net in conn.listAllNetworks(0))
113 for net, netname in allNets:
114 if netname.startswith(LIBVIRT_NET_PREFIX):
/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.pyc in wrapper(*args, **kwargs)
108 if edom in EDOMAINS and ecode in ECODES:
109 try:
--> 110 __connections.get(id(target)).pingLibvirt()
111 except libvirt.libvirtError as e:
112 edom = e.get_error_domain()
/usr/lib64/python2.6/site-packages/libvirt.pyc in getLibVersion(self)
3387 """Returns the libvirt version of the connection host """
3388 ret = libvirtmod.virConnectGetLibVersion(self._o)
-> 3389 if ret == -1: raise libvirtError ('virConnectGetLibVersion() failed', conn=self)
3390 return ret
3391
libvirtError: internal error client socket is closed
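In a consumer such as the TUI, the monkeypatched call can then be handled like any other libvirt failure. A minimal sketch, assuming the monkeypatch above has already been applied; the function name, retry count and delay are illustrative, not the actual plugin code:

import time

import libvirt
from vdsm import netinfo

def networks_with_retry(retries=3, delay=2):
    # With killOnFailure=False, a dead libvirt connection raises
    # libvirt.libvirtError here instead of terminating the caller.
    for attempt in range(retries):
        try:
            return netinfo.networks()
        except libvirt.libvirtError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)  # give libvirtd a moment to come back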
Hey Antoni, thanks for the detailed explanation.

IMHO the default should rather be an exception instead of killing the caller - or killing anyone. The caller itself can then decide what to do when the exception is raised.

@Fabian: I tend to agree.
@Douglas: Maybe you can ask infra if the change of default behavior is okay with them and reassign the bug to them.

(In reply to Fabian Deutsch from comment #7)
> IMHO the default should rather be an exception instead of killing the caller
> - or killing anyone.
> The caller itself can then decide what to do when the exception is raised.

That would have been more polite; but Vdsm uses libvirt in so many circumstances that it can be difficult to ensure. Everything changes when non-Vdsm processes begin to use vdsm libraries. We should have an explicit global libvirtconnection.KILL_CALLER=False. Vdsm could set it to True on startup.

Hi,

Thanks, guys, for all the information. If I use the patch below and add a try/except in ovirt-node-plugin-vdsm to retry collecting the network data after the exception, it works. Please let me know your thoughts.

libvirtconnection: retry to establish connection
http://gerrit.ovirt.org/#/c/27754/

Thanks!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0673.html
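As an aside, the explicit switch Dan suggests could look roughly like the following. This is a hypothetical sketch; KILL_CALLER and call_libvirt are made-up names, not the actual lib/vdsm/libvirtconnection.py code or the content of the gerrit change:

import os
import signal

import libvirt

# Hypothetical module-level default: library consumers get exceptions,
# while the vdsm daemon would set this to True on startup to keep the
# current fail-fast behaviour.
KILL_CALLER = False

def call_libvirt(func, *args, **kwargs):
    # Illustrative wrapper around a libvirt call.
    try:
        return func(*args, **kwargs)
    except libvirt.libvirtError:
        if KILL_CALLER:
            # today's vdsm behaviour: take the calling process down
            os.kill(os.getpid(), signal.SIGTERM)
        raise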
Created attachment 895408 [details]
log.tar.gz

Description of problem:
Register the Hypervisor to RHEV-M, then move the cursor to the oVirt Engine menu; the hypervisor will log out suddenly. This only occurs on the first registration.

Version-Release number of selected component (if applicable):
rhev-hypervisor6-6.5-20140513.0
ovirt-node-plugin-vdsm-0.1.1-19.el6ev.noarch
ovirt-node-3.0.1-18.el6_5.10.noarch
rhevm av 9.1

How reproducible:
100%

Steps to Reproduce:
1. Install rhev-hypervisor6-6.5-20140513.0
2. Approve it and change status to up.
3. Move cursor to oVirt Engine menu

Actual results:
1. The hypervisor logs out suddenly when the cursor is moved to the oVirt Engine menu.
2. This only occurs on the first registration.

Expected results:
The cursor can be moved to the oVirt Engine menu without a logout.

Additional info: