Bug 1209401 - [RHEV-H] vdsm with predefined bonds is down after upgrade from 3.5.0 to 3.5.1
Summary: [RHEV-H] vdsm with predefined bonds is down after upgrade from 3.5.0 to 3.5.1
Keywords:
Status: CLOSED DUPLICATE of bug 1205711
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
urgent
urgent
Target Milestone: ---
: 3.5.1
Assignee: Fabian Deutsch
QA Contact: Aharon Canan
URL:
Whiteboard: network
Depends On:
Blocks: 1193058
Reported: 2015-04-07 09:55 UTC by cshao
Modified: 2016-02-10 19:54 UTC
CC List: 21 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-04-21 13:11:41 UTC
oVirt Team: Network
Target Upstream Version:
Embargoed:


Attachments
nic-info (181.08 KB, image/png)
2015-04-07 09:55 UTC, cshao
7.0 upgrade to 7.1 failed (5.28 MB, application/x-gzip)
2015-04-07 09:57 UTC, cshao
logs apr8 (92.93 KB, application/x-gzip)
2015-04-08 17:58 UTC, Douglas Schilling Landgraf


Links
System        ID     Private  Priority  Status  Summary  Last Updated
oVirt gerrit  39995  0        None      None    None     Never

Description cshao 2015-04-07 09:55:03 UTC
Created attachment 1011676
nic-info

Description of problem:
The host went to non-operational state after the upgrade from 7.0 to 7.1, and the RHEV-M bridge disappeared.

Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.0-20150127
rhev-hypervisor7-7.1-20150402.0.el7ev
ovirt-node-3.2.2-3.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install the RHEV-H GA build (rhev-hypervisor7-7.0-20150127) with a [bond + VLAN] network.
2. Add the RHEV-H host to RHEV-M via the RHEV-M UI.
3. Put the host into maintenance.
4. Upgrade to rhev-hypervisor7-7.1-20150402.0.el7ev.


Actual results:
1. RHEV-M side: The host went to non-operational state; ifcfg-rhevm disappeared.
2. RHEV-H side:
   1) Networking shows as "Unknown"
   2) Bond status shows as "Unconfigured"
   3) The RHEV-M bridge disappeared.

Expected results:
The RHEV-H 7.1 host is UP after the upgrade from RHEV-H 7.0 GA.

Additional info:

2015-04-07 04:25:54,631       INFO Effective changes {'nics': 'bond1'}
2015-04-07 04:25:55,718      ERROR An error appeared in the UI: UnknownNicError("Unknown network interface: 'bond1'",)
2015-04-07 04:25:55,718       INFO Exception:
Traceback (most recent call last):

Comment 1 cshao 2015-04-07 09:57:50 UTC
Created attachment 1011677
7.0 upgrade to 7.1 failed

Comment 2 cshao 2015-04-07 10:15:42 UTC
Adding the vdsm and RHEV-M version info here:
vdsm-4.16.13-1.el7ev.x86_64
RHEVM vt14.2 (3.5.1-0.3.el6ev)

Comment 4 Douglas Schilling Landgraf 2015-04-07 20:18:18 UTC
Hello shaochen,

(In reply to shaochen from comment #0)
> Created attachment 1011676
> nic-info
> 
> Description of problem:
> Host gone to non-operational state after upgrade from 7.0 to 7.1, and RHEV-M
> bridge disappeared.
> 
> Version-Release number of selected component (if applicable):
> rhev-hypervisor7-7.0-20150127
> rhev-hypervisor7-7.1-20150402.0.el7ev
> ovirt-node-3.2.2-3.el7.noarch
> 
> How reproducible:
> 100%
> 
> Steps to Reproduce:
> 1. Install RHEV-H GA build(rhev-hypervisor7-7.0-20150127) with [bond+ vlan]
> network, 

Could you please describe, step by step, how you configured your bond+VLAN settings so I can replicate this locally? For example, was it via the TUI?

Also, I would appreciate it if you could provide the data below:

Before the upgrade (after installation, registration, and approval of rhev-hypervisor7-7.0-20150127 in RHEV-M, with the host UP)
===============================================================================
* The output of ls -la /config/etc/sysconfig/network-scripts/*
* The output of ls -la /etc/sysconfig/network-scripts/*
* Is ifcfg-rhevm persisted? Does the host keep ifcfg-rhevm across reboots?
* The output of ifconfig -a

Thanks!
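Part of the data requested above (which ifcfg files exist live and which have a persisted copy) can be collected with a short script. This is a minimal Python 3 sketch, assuming only that ovirt-node keeps persisted copies under /config at the mirrored path, as the ls outputs elsewhere in this bug show; `persistence_report` is a hypothetical helper, not part of ovirt-node or vdsm.

```python
import os

LIVE_DIR = "/etc/sysconfig/network-scripts"
PERSIST_ROOT = "/config"  # ovirt-node's persisted-file root, as seen in this bug


def persistence_report(live_dir=LIVE_DIR, persist_root=PERSIST_ROOT):
    """Map each live ifcfg-* file name to True if a persisted copy exists.

    Hypothetical helper: it only checks that a file exists at the mirrored
    path under persist_root; it does not compare contents.
    """
    persisted_dir = os.path.join(persist_root, live_dir.lstrip("/"))
    report = {}
    for name in sorted(os.listdir(live_dir)):
        if name.startswith("ifcfg-"):
            report[name] = os.path.isfile(os.path.join(persisted_dir, name))
    return report


if __name__ == "__main__" and os.path.isdir(LIVE_DIR):
    for name, persisted in sorted(persistence_report().items()):
        print("%s: %s" % (name, "persisted" if persisted else "NOT persisted"))
```

Judging from the listings in comment 7, running this on that host after registration would be expected to report ifcfg-lo as persisted and ifcfg-ens3 and ifcfg-rhevm as not persisted.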

Comment 5 cshao 2015-04-08 03:31:04 UTC
I hit the same issue when upgrading from
6.6 for 3.5.0 (rhev-hypervisor6-6.6-20150128.0) to the latest 6.6 for 3.5.1 (rhev-hypervisor6-6.6-20150402.0).

Test steps:
1. Install 6.6 for 3.5.0 (rhev-hypervisor6-6.6-20150128.0).
2. Configure a [bond + VLAN] network via the TUI.
3. Add the RHEV-H host to RHEV-M via the RHEV-M UI.
4. Put the host into maintenance.
5. Upgrade to the latest 6.6 for 3.5.1 (rhev-hypervisor6-6.6-20150402.0).

Test result:
1. RHEV-M side: The host went to non-operational state; ifcfg-rhevm disappeared.
2. RHEV-H side:
   1) Networking shows as "Unknown"
   2) Bond status shows as "Unconfigured"
   3) The RHEV-M bridge disappeared.

I will reply to comment #4 ASAP.

Thanks!

Comment 7 Douglas Schilling Landgraf 2015-04-08 17:50:05 UTC
Thanks shaochen, that helps.

I could reproduce the original report using a different scenario as well:

- Install RHEV-H 20150127.0

  # cat /etc/redhat-release 
  Red Hat Enterprise Virtualization Hypervisor 7.0 (20150127.0.el7ev)

  # rpm -qa | grep -i vdsm
  vdsm-xmlrpc-4.16.8.1-6.el7ev.noarch
  vdsm-python-4.16.8.1-6.el7ev.noarch
  vdsm-cli-4.16.8.1-6.el7ev.noarch
  vdsm-yajsonrpc-4.16.8.1-6.el7ev.noarch
  vdsm-4.16.8.1-6.el7ev.x86_64
  vdsm-hook-ethtool-options-4.16.8.1-6.el7ev.noarch
  ovirt-node-plugin-vdsm-0.2.0-18.el7ev.noarch
  vdsm-python-zombiereaper-4.16.8.1-6.el7ev.noarch
  vdsm-jsonrpc-4.16.8.1-6.el7ev.noarch
  vdsm-reg-4.16.8.1-6.el7ev.noarch
  vdsm-hook-vhostmd-4.16.8.1-6.el7ev.noarch


- Set up the NIC to use DHCP via the network TUI
- Register to RHEV-M via the TUI (or run /usr/share/vdsm-reg/vdsm-reg-setup from the shell)
- Approve the host in the RHEV-M web admin

* The host will be UP, and the network settings will be available in /etc/sysconfig/network-scripts/ but not persisted.

Data from tests
=======================

#1 Settings after configuring the network via the ovirt-node TUI

# ls -la /etc/sysconfig/network-scripts/ifcfg-*
-rw-r--r--. 1 root root 137 Apr  8 17:06 /etc/sysconfig/network-scripts/ifcfg-ens3
-rw-r--r--. 1 root root  64 Apr  8 17:06 /etc/sysconfig/network-scripts/ifcfg-lo

# cat /config/etc/sysconfig/network-scripts/ifcfg-ens3 
BOOTPROTO="dhcp"
DEVICE="ens3"
HWADDR="52:54:00:da:98:4e"
IPV6INIT="no"
IPV6_AUTOCONF="no"
NM_CONTROLLED="no"
ONBOOT="yes"
PEERNTP="yes"

# cat /config/etc/sysconfig/network-scripts/ifcfg-lo 
DEVICE="lo"
IPADDR="127.0.0.1"
NETMASK="255.0.0.0"
ONBOOT="yes"

Both files are persisted correctly:
****************************************************
# ls /config/etc/sysconfig/network-scripts/
ifcfg-ens3  ifcfg-lo


Registered the node to RHEV-M
=====================================
The host will be pending approval in the RHEV-M web admin; however, vdsm now owns the network settings on the node:

# cat /etc/sysconfig/network-scripts/ifcfg-ens3 
# Generated by VDSM version 4.16.8.1-6.el7ev
DEVICE=ens3
HWADDR=52:54:00:da:98:4e
BRIDGE=rhevm
ONBOOT=yes
NM_CONTROLLED=no
IPV6_AUTOCONF=no
PEERNTP=yes
IPV6INIT=no

# cat /etc/sysconfig/network-scripts/ifcfg-rhevm 
# Generated by VDSM version 4.16.8.1-6.el7ev
DEVICE=rhevm
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
BOOTPROTO=dhcp
DEFROUTE=yes
NM_CONTROLLED=no
IPV6_AUTOCONF=no
PEERNTP=yes
IPV6INIT=no
HOTPLUG=no

However, neither ifcfg-ens3 nor ifcfg-rhevm is persisted any longer:
# ls /config/etc/sysconfig/network-scripts/
ifcfg-lo
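The TUI-written ifcfg-ens3 and the vdsm-written ifcfg-ens3 above differ in format as well as content: the TUI quotes values (BOOTPROTO="dhcp"), while vdsm writes unquoted values under a `# Generated by VDSM version ...` header. A minimal Python 3 sketch for comparing the two, assuming that header is a usable ownership marker (an assumption drawn only from the files shown here); `parse_ifcfg` and `is_vdsm_owned` are hypothetical helpers. It uses `shlex` to strip shell-style quoting, which is adequate for simple KEY=value lines like these but is not a full ifcfg parser.

```python
import shlex

VDSM_HEADER = "# Generated by VDSM version"


def parse_ifcfg(text):
    """Parse simple KEY=value ifcfg lines into a dict, ignoring comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split strips shell-style quoting: '"dhcp"' -> ['dhcp']
        result[key.strip()] = " ".join(shlex.split(raw))
    return result


def is_vdsm_owned(text):
    """True if the file starts with the header vdsm writes (see listings above)."""
    return text.lstrip().startswith(VDSM_HEADER)


if __name__ == "__main__":
    # Abridged versions of the two ifcfg-ens3 files shown in this comment.
    tui_ens3 = 'BOOTPROTO="dhcp"\nDEVICE="ens3"\nONBOOT="yes"\n'
    vdsm_ens3 = ("# Generated by VDSM version 4.16.8.1-6.el7ev\n"
                 "DEVICE=ens3\nBRIDGE=rhevm\nONBOOT=yes\n")
    before, after = parse_ifcfg(tui_ens3), parse_ifcfg(vdsm_ens3)
    # Keys present in only one version: BOOTPROTO moved to the rhevm bridge,
    # BRIDGE was added by vdsm.
    print(sorted(set(before) ^ set(after)))  # ['BOOTPROTO', 'BRIDGE']
    print(is_vdsm_owned(tui_ens3), is_vdsm_owned(vdsm_ens3))  # False True
```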

The persist command in ovirt-node is also working; for example, in the same scenario I ran persist on ifcfg-rhevm:

# ls /config/etc/sysconfig/network-scripts/
ifcfg-lo
# persist /etc/sysconfig/network-scripts/ifcfg-rhevm
# ls /config/etc/sysconfig/network-scripts/ifcfg-rhevm 

Additional data
=================
vdsm-reg.log

MainThread::DEBUG::2015-04-08 17:27:37,754::deployUtil::487::root::Bridge rhevm not found, need to create it.
MainThread::DEBUG::2015-04-08 17:27:37,754::vdsm-reg-setup::94::root::renameBridge begin.
MainThread::DEBUG::2015-04-08 17:27:37,754::deployUtil::1015::root::makeBridge begin.
MainThread::DEBUG::2015-04-08 17:27:37,755::deployUtil::438::root::_getMGTIface: read host name: 192.168.122.70
MainThread::DEBUG::2015-04-08 17:27:37,755::deployUtil::446::root::_getMGTIface: using host name 192.168.122.70 strIP= 192.168.122.70
MainThread::DEBUG::2015-04-08 17:27:37,755::deployUtil::453::root::_getMGTIface IP=192.168.122.70 strIface=ens3
MainThread::DEBUG::2015-04-08 17:27:37,756::deployUtil::1059::root::makeBridge found the following bridge paramaters: ['BOOTPROTO=dhcp', 'IPV6INIT=no', 'IPV6_AUTOCONF=no', 'ONBOOT=yes', 'PEERNTP=yes']
MainThread::DEBUG::2015-04-08 17:27:37,760::deployUtil::140::root::['/usr/share/vdsm/addNetwork', 'rhevm', '', '', 'ens3', 'BOOTPROTO=dhcp', 'IPV6INIT=no', 'IPV6_AUTOCONF=no', 'ONBOOT=yes', 'PEERNTP=yes', 'blockingdhcp=true']
MainThread::DEBUG::2015-04-08 17:27:42,537::deployUtil::149::root::
MainThread::DEBUG::2015-04-08 17:27:42,538::deployUtil::150::root::libvirt: Network Driver error : Network not found: no network with matching name 'vdsm-rhevm'

MainThread::DEBUG::2015-04-08 17:27:42,538::deployUtil::140::root::['/usr/share/vdsm/get-conf-item', '/etc/vdsm/vdsm.conf', 'vars', 'net_persistence', 'ifcfg']
MainThread::DEBUG::2015-04-08 17:27:42,555::deployUtil::149::root::unified
MainThread::DEBUG::2015-04-08 17:27:42,555::deployUtil::150::root::
MainThread::DEBUG::2015-04-08 17:27:42,556::deployUtil::140::root::['/usr/share/vdsm/vdsm-store-net-config', 'unified']
MainThread::DEBUG::2015-04-08 17:27:43,869::deployUtil::149::root::
MainThread::DEBUG::2015-04-08 17:27:43,869::deployUtil::150::root::
MainThread::DEBUG::2015-04-08 17:27:43,869::deployUtil::1144::root::makeBridge return.
MainThread::DEBUG::2015-04-08 17:27:43,870::deployUtil::140::root::['/usr/share/vdsm/vdsm-store-net-config']
MainThread::DEBUG::2015-04-08 17:27:43,883::deployUtil::149::root::
MainThread::DEBUG::2015-04-08 17:27:43,883::deployUtil::150::root::
MainThread::ERROR::2015-04-08 17:27:43,883::vdsm-reg-setup::124::root::renameBridge: failed to chmod bridge file
MainThread::DEBUG::2015-04-08 17:27:43,883::vdsm-reg-setup::126::root::renameBridge return.
MainThread::DEBUG::2015-04-08 17:27:43,884::vdsm-reg-setup::238::root::execute: after renameBridge: False
MainThread::DEBUG::2015-04-08 17:27:43,884::vdsm-reg-setup::316::root::Registration status: False

Please note that the first registration failed after running /usr/share/vdsm/vdsm-store-net-config: /var/lib/vdsm/netconfback is empty, so in the end the chmod operation could not be executed and the file was not even persisted.
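When reading a vdsm-reg.log like the one above, the useful parts are the ERROR record and the final registration status. A minimal Python 3 sketch for pulling them out, assuming every record has the seven `::`-separated fields seen in this excerpt (thread, level, timestamp, module, line number, logger, message); `parse_vdsm_reg_log` is a hypothetical helper, not part of vdsm-reg.

```python
from collections import namedtuple

Record = namedtuple("Record", "thread level timestamp module lineno logger message")


def parse_vdsm_reg_log(lines):
    """Yield Record tuples for lines in vdsm-reg.log's '::'-separated format."""
    for line in lines:
        # maxsplit=6 keeps any '::' inside the message text intact
        fields = line.rstrip("\n").split("::", 6)
        if len(fields) != 7:
            continue  # blank or otherwise unparseable lines
        yield Record(*fields)


if __name__ == "__main__":
    sample = [
        "MainThread::ERROR::2015-04-08 17:27:43,883::vdsm-reg-setup::124::root::"
        "renameBridge: failed to chmod bridge file",
        "MainThread::DEBUG::2015-04-08 17:27:43,884::vdsm-reg-setup::316::root::"
        "Registration status: False",
    ]
    errors = [r for r in parse_vdsm_reg_log(sample) if r.level == "ERROR"]
    # prints: vdsm-reg-setup 124 renameBridge: failed to chmod bridge file
    print(errors[0].module, errors[0].lineno, errors[0].message)
```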


More details from vdsm-reg-setup.in
==========================================
    SCRIPT_NAME_SAVE = "vdsm-store-net-config"

    # Rename existing bridge
    fReturn = deployUtil.makeBridge(self.vdcName, self.vdsmDir)
    if not fReturn:
        logging.error("renameBridge Failed to rename existing bridge!")

    # Persist changes
    if fReturn:
        try:
            out, err, ret = deployUtil._logExec(
                [os.path.join(self.vdsmDir, SCRIPT_NAME_SAVE)])
            <snip>
            os.chmod(
                (
                    "/config/etc/sysconfig/network-scripts/ifcfg-" +
                    MGT_BRIDGE_NAME
                ),
                0o644
            )
        except:
            fReturn = False
            logging.error("renameBridge: failed to chmod bridge file")
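The excerpt above explains the ERROR line in the log: os.chmod raises when the persisted file under /config does not exist, and the bare `except:` turns that into fReturn = False with a generic message that hides the errno. A minimal Python 3 sketch of the same step that reports the underlying error, assuming only the path layout shown; `chmod_persisted_bridge` is a hypothetical name, MGT_BRIDGE_NAME is assumed to be "rhevm" as in this bug, and catching OSError instead of using a bare except is a suggestion, not what vdsm-reg-setup does.

```python
import errno
import logging
import os

MGT_BRIDGE_NAME = "rhevm"  # assumed: the management bridge name in this bug


def chmod_persisted_bridge(config_root="/config", bridge=MGT_BRIDGE_NAME):
    """Set 0644 on the persisted ifcfg file for `bridge`; return True on success."""
    path = os.path.join(config_root, "etc/sysconfig/network-scripts",
                        "ifcfg-" + bridge)
    try:
        os.chmod(path, 0o644)
    except OSError as e:
        # ENOENT means the store step never persisted the file, which is
        # consistent with the empty /var/lib/vdsm/netconfback noted above.
        if e.errno == errno.ENOENT:
            logging.error("renameBridge: %s was never persisted", path)
        else:
            logging.error("renameBridge: failed to chmod %s: %s", path, e)
        return False
    return True
```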


From deployUtil.py
=======================
    # Add bridge
    if fReturn:
        try:
            lstBridgeOptions.append('blockingdhcp=true')
            out, err, ret = _logExec([os.path.join(vdsmDir, SCRIPT_NAME_ADD),
                                      bridgeName, vlan, bonding, nic]
                                     + lstBridgeOptions)
            if ret:
                raise Exception('Failed to add bridge')

    # Save current config by removing the undo files:
    try:
        if fReturn:
            if fIsOvirt:
                out, err, ret = _logExec(
                    [os.path.join(vdsmDir, SCRIPT_NAME_GET_CONFIG),
                     P_VDSM_CONF, 'vars', 'net_persistence', 'ifcfg'])
                if ret:
                    raise Exception('Failed to retrieve vdsm persistence '
                                    'mode. Stderr: %s' % err)

                net_persistence = out.strip()
                out, err, ret = _logExec(
                    [os.path.join(vdsmDir, SCRIPT_NAME_STORE_NET_CONFIG),
                     net_persistence])
                if ret:
                    raise Exception('Failed to persist vdsm networking '
                                    'configuration. Stderr: %s' % err)
            else:
                setSafeVdsmNetworkConfig() 
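On ovirt-node, the branch above first asks vdsm which persistence mode is configured and then passes it to vdsm-store-net-config; in the log, get-conf-item printed `unified`. A minimal Python 3 sketch of that lookup using the standard `configparser` module in place of the get-conf-item helper, assuming its arguments are (file, section, option, default) as the call suggests; `net_persistence_mode` and `store_command` are hypothetical, and the sketch only builds the command line rather than running it.

```python
import configparser
import os

VDSM_DIR = "/usr/share/vdsm"


def net_persistence_mode(conf_path="/etc/vdsm/vdsm.conf", default="ifcfg"):
    """Return [vars] net_persistence from vdsm.conf, or `default` if unset."""
    parser = configparser.ConfigParser()
    parser.read(conf_path)  # a missing file is silently skipped
    # fallback also covers a missing [vars] section
    return parser.get("vars", "net_persistence", fallback=default).strip()


def store_command(mode, vdsm_dir=VDSM_DIR):
    """Build the argv for the store step, as seen in the vdsm-reg.log above."""
    return [os.path.join(vdsm_dir, "vdsm-store-net-config"), mode]
```

With the mode from the log, store_command("unified") yields the same argv that vdsm-reg.log records for the store step.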


From this perspective, ovirt-node is working as expected. I am moving this to the vdsm component so the vdsm network team can review the bug. Please let me know if you have any patch or test so I can help quickly.

Comment 8 Douglas Schilling Landgraf 2015-04-08 17:58:11 UTC
Created attachment 1012344
logs apr8

Comment 9 Lior Vernia 2015-04-09 12:51:33 UTC

*** This bug has been marked as a duplicate of bug 1209486 ***

Comment 10 Fabian Deutsch 2015-04-16 12:56:09 UTC
Re-opening this because we are not sure whether ifcfg-rhevm must be persisted, so we cannot mark it as a dupe of bug 1209486.

Comment 11 Fabian Deutsch 2015-04-20 13:39:58 UTC
In RHEV 3.5.0, network configuration files that were created by parties other than vdsm were unpersisted once the host was registered to Engine.
This meant that devices like bonds and bridges did not come up, because the configuration for either the device itself or a required device (i.e. a slave) was unpersisted and no longer available.

This had several effects:
- Devices appeared as unconfigured in the TUI
- libvirtd did not come up because it found no device to bind to, so vdsmd did not come up either
- because vdsmd did not come up, no other devices were configured
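The failure chain described above can be modelled as a dependency check: a device comes up only if its own ifcfg file and those of every device it requires are available. A minimal Python 3 sketch, assuming a hand-written dependency map; the device names (eth0, eth1, bond1, bond1.100) and the helper `devices_down` are hypothetical and only illustrate a bond+VLAN setup like the one in this report.

```python
def devices_down(requires, available):
    """Return the set of devices that cannot come up.

    requires:  dict mapping device -> list of devices it needs (e.g. slaves)
    available: set of devices whose ifcfg file is still present
    """
    down = set()

    def is_down(dev, seen=()):
        if dev in down:
            return True
        # Missing config, or a dependency cycle, means the device stays down.
        if dev not in available or dev in seen:
            down.add(dev)
            return True
        if any(is_down(dep, seen + (dev,)) for dep in requires.get(dev, [])):
            down.add(dev)
            return True
        return False

    all_devices = set(requires)
    for deps in requires.values():
        all_devices.update(deps)
    for dev in sorted(all_devices):
        is_down(dev)
    return down


if __name__ == "__main__":
    requires = {"rhevm": ["bond1.100"], "bond1.100": ["bond1"],
                "bond1": ["eth0", "eth1"]}
    # ifcfg-bond1 was unpersisted; everything stacked on it fails too.
    available = {"eth0", "eth1", "bond1.100", "rhevm"}
    print(sorted(devices_down(requires, available)))  # ['bond1', 'bond1.100', 'rhevm']
```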

Devices which were created in Engine (and thus owned by vdsm) were not affected.

This bug cannot be fixed, because we cannot bring back the unpersisted configuration files.

RHEV 3.5.1 has the logic to also persist config files which were not created by vdsm.
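As described in this comment, the difference between the two releases comes down to which ifcfg files stay persisted after registration. A minimal Python 3 sketch contrasting the two policies, assuming a file counts as vdsm-created when it starts with the `# Generated by VDSM version` header seen earlier in this bug; `files_to_keep_persisted` and its policy labels are hypothetical and do not reproduce the actual 3.5.0 or 3.5.1 code.

```python
VDSM_HEADER = "# Generated by VDSM version"


def files_to_keep_persisted(files, policy):
    """Return the sorted names of ifcfg files that remain persisted.

    files:  dict mapping file name -> file contents
    policy: "3.5.0" keeps only vdsm-generated files (the behaviour described
            in this bug); "3.5.1" also keeps files written by other tools.
    """
    if policy == "3.5.0":
        return sorted(name for name, text in files.items()
                      if text.lstrip().startswith(VDSM_HEADER))
    if policy == "3.5.1":
        return sorted(files)
    raise ValueError("unknown policy: %r" % policy)


if __name__ == "__main__":
    files = {
        "ifcfg-bond1": 'DEVICE="bond1"\n',  # written by the TUI
        "ifcfg-rhevm": "# Generated by VDSM version 4.16.8.1-6.el7ev\nDEVICE=rhevm\n",
    }
    print(files_to_keep_persisted(files, "3.5.0"))  # ['ifcfg-rhevm']
    print(files_to_keep_persisted(files, "3.5.1"))  # ['ifcfg-bond1', 'ifcfg-rhevm']
```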

Comment 14 Yaniv Lavi 2015-04-21 13:11:41 UTC

*** This bug has been marked as a duplicate of bug 1205711 ***

