Bug 1255533

Summary: Hosts losing ifcfg-eth0 networking in 20150603.0.el6ev
Product: Red Hat Enterprise Virtualization Manager Reporter: Robert McSwain <rmcswain>
Component: rhev-hypervisor Assignee: Fabian Deutsch <fdeutsch>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Chaofeng Wu <cwu>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.4.5 CC: amureini, cshao, cwu, ecohen, gklein, huiwa, ibarkan, leiwang, lsurette, mburman, pstehlik, rmcswain, ycui, yeylon, ylavi
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: network
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-10-21 09:16:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Network RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Robert McSwain 2015-08-20 20:32:48 UTC
Description of problem:
After updating the firmware on all hosts and installing fresh hypervisors, the hosts are losing network connectivity. net_persistence = ifcfg and migration_timeout = 600 were added to every host. After rebooting, ifcfg-eth0 was observed to be missing.

Version-Release number of selected component (if applicable):
RHEV H 20150603.0.el6ev

How reproducible:


Steps to Reproduce:
1. Set net_persistence = ifcfg and migration_timeout = 600 on the host
2. Configure ifcfg-eth0 in the admin TUI of RHEV-H
3. Reboot
4. Observe that ifcfg-eth0 is missing from both the TUI and /etc/sysconfig/network-scripts
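
The check in the last step could be scripted roughly as follows. This is only a minimal sketch: the helper name missing_ifcfg and the list of NIC names are assumptions made for illustration, and the demo runs against a temporary directory rather than a real RHEV-H host.

```python
import os
import tempfile


def missing_ifcfg(nics, scripts_dir="/etc/sysconfig/network-scripts"):
    """Return the NIC names that have no ifcfg-<nic> file in scripts_dir.

    Hypothetical helper, not part of RHEV-H or VDSM.
    """
    return [
        nic for nic in nics
        if not os.path.isfile(os.path.join(scripts_dir, "ifcfg-" + nic))
    ]


if __name__ == "__main__":
    # Demonstrate against a throwaway directory instead of a real host:
    # only ifcfg-bond0 exists, so eth0 is reported as missing.
    with tempfile.TemporaryDirectory() as demo_dir:
        open(os.path.join(demo_dir, "ifcfg-bond0"), "w").close()
        print(missing_ifcfg(["eth0", "bond0"], scripts_dir=demo_dir))
```

Running it once before and once after the reboot, with the same NIC list, would show which files disappeared.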

Actual results:
Networking is missing

Expected results:
All network devices are configured as they were before the reboot.

Additional info:
blade2nonetwork1.PNG (26 KB) 
screenshot inside admin showing network configured after a reboot

blade2nonetwork2.PNG (19 KB) 
screenshot showing config of eth0 after reboot

blade2nonetwork3.PNG (18 KB) 
screenshot showing missing ifcfg-eth0 after reboot

Comment 2 Fabian Deutsch 2015-09-02 17:39:05 UTC
From the description it looks like these are the symptoms around network persistence.

Ido, does this look like the networking issue we fix in 3.5.4?

Comment 3 Ido Barkan 2015-09-22 07:31:17 UTC
Hi, there are tons of logs here, which makes this challenging. I found the last setupNetworks command in /hp-enc1-blade2-2015081913151439990114/var/log/vdsm/supervds.log

I guess this is the call from the TUI since it configures the management network.
2015-08-13 21:37:50,835::api::631::setupNetworks::(setupNetworks) Setting up network according to configuration: networks:{'rhevm': {'vlan': '319', 'ipaddr': '10.10.0.84', 'bonding': 'bond0', 'netmask': '255.255.0.0', 'STP': 'no', 'bridged': 'true', 'gateway': '10.10.101.13', 'defaultRoute': True}}, bondings:{}, options:{'connectivityCheck': 'true', 'connectivityTimeout': 120}

... and then during the execution vdsm writes the updated ifcfg-eth0

2015-08-13 21:37:51,607::ifcfg::550::root::(writeConfFile) Writing to file /etc/sysconfig/network-scripts/ifcfg-eth0 configuration:
# Generated by VDSM version 4.16.20-1.el6ev
DEVICE=eth0
HWADDR=00:17:a4:77:00:1e
MASTER=bond0
SLAVE=yes
ONBOOT=yes
MTU=1500
NM_CONTROLLED=no

There is no reference to ifcfg-eth0 later in the log.
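
For anyone repeating this search on another host's logs, a rough sketch of the check follows. It assumes that every log entry starts with a timestamp in the format shown above, followed by "::". The function name mentions_after is made up for illustration, and the sample lines are shortened copies of the excerpts in this comment.

```python
from datetime import datetime


def mentions_after(lines, needle, after):
    """Yield log lines containing needle that are timestamped later than `after`.

    Hypothetical helper; assumes 'YYYY-MM-DD HH:MM:SS,mmm::' line prefixes.
    """
    for line in lines:
        stamp = line.split("::", 1)[0]
        try:
            when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        except ValueError:
            continue  # continuation line without a timestamp, e.g. file contents
        if when > after and needle in line:
            yield line


if __name__ == "__main__":
    sample = [
        "2015-08-13 21:37:51,607::ifcfg::550::root::(writeConfFile) "
        "Writing to file /etc/sysconfig/network-scripts/ifcfg-eth0 configuration:",
        "DEVICE=eth0",
        "2015-08-13 21:41:49,497::api::631::setupNetworks::(setupNetworks) "
        "Setting up network according to configuration: ...",
    ]
    last_write = datetime(2015, 8, 13, 21, 37, 51, 607000)
    # An empty list means nothing touched ifcfg-eth0 after the last write.
    print(list(mentions_after(sample, "ifcfg-eth0", last_write)))
```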

Later, I also see another call to setupNetworks, which may hint that this is the wrong server:

2015-08-13 21:41:49,497::api::631::setupNetworks::(setupNetworks) Setting up network according to configuration: networks:{'iSCSI_1': {'nic': 'eth8', 'netmask': '255.255.255.0', 'ipaddr': '172.16.5.2', 'bridged': 'true', 'STP': 'no'}, 'iSCSI_2': {'nic': 'eth9', 'netmask': '255.255.255.0', 'ipaddr': '172.16.3.13', 'bridged': 'true', 'STP': 'no'}, 'RAILS_205': {'bonding': 'bond0', 'vlan': '205', 'STP': 'no', 'bridged': 'true'}, 'RAILS_204': {'bonding': 'bond0', 'vlan': '204', 'STP': 'no', 'bridged': 'true'}, 'RAILS_220': {'bonding': 'bond0', 'vlan': '220', 'STP': 'no', 'bridged': 'true'}, 'VLAN300': {'bonding': 'bond0', 'vlan': '300', 'STP': 'no', 'bridged': 'true'}, 'VLAN902': {'vlan': '902', 'ipaddr': '172.16.2.28', 'bonding': 'bond0', 'netmask': '255.255.255.0', 'STP': 'no', 'bridged': 'true'}, 'P2000A': {'nic': 'eth2', 'netmask': '255.255.255.224', 'ipaddr': '172.16.6.7', 'bridged': 'true', 'STP': 'no'}, 'P2000B': {'nic': 'eth3', 'netmask': '255.255.255.224', 'ipaddr': '172.16.6.67', 'bridged': 'true', 'STP': 'no'}, 'zimbra_private': {'bonding': 'bond0', 'vlan': '4001', 'STP': 'no', 'bridged': 'true'}, 'web_private': {'bonding': 'bond0', 'vlan': '4000', 'STP': 'no', 'bridged': 'true'}, 'moodle_private': {'bonding': 'bond0', 'vlan': '4002', 'STP': 'no', 'bridged': 'true'}, 'RAILS_233': {'bonding': 'bond0', 'vlan': '233', 'STP': 'no', 'bridged': 'true'}, 'RAILS_232': {'bonding': 'bond0', 'vlan': '232', 'STP': 'no', 'bridged': 'true'}}, bondings:{}, options:{'connectivityCheck': 'true', 'connectivityTimeout': 120}
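
To compare the two setupNetworks calls more easily, the networks section of such a log line can be turned into a dictionary. The following is only a sketch: it assumes the line has the "networks:{...}, bondings:" shape seen in the two entries above, and the function names parse_networks and networks_on are invented for this example.

```python
import ast
import re


def parse_networks(log_line):
    """Extract the networks:{...} section of a setupNetworks log line as a dict.

    Hypothetical helper; relies on the section being a Python-style dict literal.
    """
    match = re.search(r"networks:(\{.*\}), bondings:", log_line)
    if match is None:
        raise ValueError("no networks:{...} section found")
    return ast.literal_eval(match.group(1))


def networks_on(networks, key, value):
    """Return the sorted names of networks whose attribute `key` equals `value`."""
    return sorted(
        name for name, attrs in networks.items() if attrs.get(key) == value
    )


if __name__ == "__main__":
    # The first setupNetworks entry quoted in this comment (message part only).
    line = (
        "Setting up network according to configuration: "
        "networks:{'rhevm': {'vlan': '319', 'ipaddr': '10.10.0.84', "
        "'bonding': 'bond0', 'netmask': '255.255.0.0', 'STP': 'no', "
        "'bridged': 'true', 'gateway': '10.10.101.13', 'defaultRoute': True}}, "
        "bondings:{}, options:{'connectivityCheck': 'true', "
        "'connectivityTimeout': 120}"
    )
    nets = parse_networks(line)
    print(networks_on(nets, "bonding", "bond0"))
```

The same call with key "nic" lists the networks attached directly to a given interface, which makes it quicker to see whether a call belongs to the host in question.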

Robert, maybe I am looking at the wrong log file? Can you point me to a more relevant log?

Comment 4 Robert McSwain 2015-10-07 14:35:25 UTC
The customer is testing out RHEL 7.1 hosts, so I've asked whether there's a need to keep this bug open. Leaving the NEEDINFO on myself for now while we await answers.

Comment 5 Yaniv Lavi 2015-10-21 09:16:39 UTC
Please reopen when you can provide the info.