Created attachment 925160 [details]
logs

Description of problem:
Adding a host to the engine changes the host's ifcfg file from ONBOOT="yes" to ONBOOT="no", which results in loss of connectivity after a host reboot or "service network restart".

before

[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
BOOTPROTO="dhcp"
HWADDR="00:1A:4A:51:B2:06"
IPV6INIT="no"
MTU="1500"
NM_CONTROLLED="yes"
ONBOOT="yes"
TYPE="Ethernet"
UUID="c3453e86-3362-403f-be72-90bb3e4462c5"

after

[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Generated by VDSM version 4.16.1-6.gita4a4614.el6
DEVICE=eth0
ONBOOT=no
HWADDR=00:1a:4a:51:b2:06
BRIDGE=ovirtmgmt
MTU=1500
NM_CONTROLLED=no

[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
# Generated by VDSM version 4.16.1-6.gita4a4614.el6
DEVICE=ovirtmgmt
ONBOOT=no
TYPE=Bridge
DELAY=0
STP=off
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
HOTPLUG=no

Version-Release number of selected component (if applicable):
vdsm-4.16.1-6.gita4a4614.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. install a new host
2. add the host to the engine via the engine GUI

Actual results:
ONBOOT=no

Expected results:
ONBOOT should always be "yes" for the management bridge (if used) and its underlying device(s). I can hardly imagine a situation where a user would want the management interface to stay down.

Additional info:
MainProcess|Thread-13::INFO::2014-08-08 12:47:37,470::api::299::root::(addNetwork) Adding network ovirtmgmt with vlan=None, bonding=None, nics=['eth0'], bondingOptions=None, mtu=1500, bridged=True, defaultRoute=True, options={'bootproto': 'dhcp', 'STP': 'no', 'implicitBonding': True}
MainProcess|Thread-13::DEBUG::2014-08-08 12:47:37,471::ifcfg::541::root::(writeConfFile) Writing to file /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt configuration:
# Generated by VDSM version 4.16.1-6.gita4a4614.el6
DEVICE=ovirtmgmt
ONBOOT=no
TYPE=Bridge
DELAY=0
STP=off
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
HOTPLUG=no
MainProcess|Thread-13::DEBUG::2014-08-08 12:47:37,575::utils::738::root::(execCmd) /sbin/ifdown ovirtmgmt (cwd None)
MainProcess|Thread-13::DEBUG::2014-08-08 12:47:37,837::utils::758::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-13::DEBUG::2014-08-08 12:47:37,838::utils::738::root::(execCmd) /sbin/ip route show to 0.0.0.0/0 table all (cwd None)
MainProcess|Thread-13::DEBUG::2014-08-08 12:47:37,841::utils::758::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-13::DEBUG::2014-08-08 12:47:37,852::ifcfg::374::root::(_atomicBackup) Backed up /etc/sysconfig/network-scripts/ifcfg-eth0
MainProcess|Thread-13::DEBUG::2014-08-08 12:47:37,852::ifcfg::541::root::(writeConfFile) Writing to file /etc/sysconfig/network-scripts/ifcfg-eth0 configuration:
# Generated by VDSM version 4.16.1-6.gita4a4614.el6
DEVICE=eth0
ONBOOT=no
HWADDR=00:1a:4a:51:b2:06
BRIDGE=ovirtmgmt
MTU=1500
NM_CONTROLLED=no
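Until a fixed VDSM is available, an affected host can be checked before a reboot or network restart. A minimal sketch (the `find_onboot_no` helper name is mine, not part of VDSM; it only flags files that carry VDSM's "Generated by VDSM" header, so hand-written scripts are left alone):

```shell
# find_onboot_no DIR: list VDSM-generated ifcfg files under DIR
# that contain ONBOOT=no and would therefore stay down at boot.
find_onboot_no() {
    for f in "$1"/ifcfg-*; do
        [ -f "$f" ] || continue
        # Only consider files VDSM itself wrote.
        grep -q '^# Generated by VDSM' "$f" || continue
        if grep -q '^ONBOOT=no' "$f"; then
            echo "$f"
        fi
    done
}

# Example on a real host:
#   find_onboot_no /etc/sysconfig/network-scripts
```

Any file it lists can then be edited back to ONBOOT=yes before rebooting, as described in the comments below.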
*** Bug 1129385 has been marked as a duplicate of this bug. ***
Verified after a fresh host was added to the engine:

[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Generated by VDSM version 4.16.2-1.gite8cba75.el6
DEVICE=eth0
HWADDR=00:1a:4a:51:b2:09
BRIDGE=ovirtmgmt
ONBOOT=yes
MTU=1500
NM_CONTROLLED=no
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
# Generated by VDSM version 4.16.2-1.gite8cba75.el6
DEVICE=ovirtmgmt
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
HOTPLUG=no
It seems there is a problem when this is combined with a self-hosted engine and a bond.

The host was freshly installed over em1 with the following ifcfg:

[root@dell-r210ii-06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-em1
DEVICE="em1"
BOOTPROTO="dhcp"
IPV6INIT="no"
ONBOOT="yes"

# ifcfg manually edited as follows
[root@dell-r210ii-06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-em1
DEVICE="em1"
BOOTPROTO="none"
IPV6INIT="no"
ONBOOT="yes"

cat > /etc/sysconfig/network-scripts/ifcfg-bond0 << EOF
DEVICE=bond0
ONBOOT=yes
BONDING_OPTS='mode=active-backup miimon=150'
NM_CONTROLLED=no
IPADDR=10.34.67.40
NETMASK=255.255.255.224
GATEWAY=10.34.67.62
EOF

cat > /etc/sysconfig/network-scripts/ifcfg-p1p1 << EOF
DEVICE=p1p1
ONBOOT=yes
MASTER=bond0
SLAVE=yes
NM_CONTROLLED=no
EOF

cat > /etc/sysconfig/network-scripts/ifcfg-p1p2 << EOF
DEVICE=p1p2
ONBOOT=yes
MASTER=bond0
SLAVE=yes
NM_CONTROLLED=no
EOF

[root@dell-r210ii-06 ~]# service network restart
[root@dell-r210ii-06 ~]# ip r
10.34.67.32/27 dev bond0  proto kernel  scope link  src 10.34.67.40
169.254.0.0/16 dev em1  scope link  metric 1004
169.254.0.0/16 dev bond0  scope link  metric 1006
default via 10.34.67.62 dev bond0

[root@dell-r210ii-06 ~]# yum install ovirt-hosted-engine-setup -y
[root@dell-r210ii-06 ~]# hosted-engine --deploy

# somehow it ends up with ONBOOT=no
[root@dell-r210ii-06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-p1p1
# Generated by VDSM version 4.16.2-1.gite8cba75.el6
DEVICE=p1p1
HWADDR=90:e2:ba:04:28:c0
MASTER=bond0
SLAVE=yes
ONBOOT=no
NM_CONTROLLED=no

[root@dell-r210ii-06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-p1p2
# Generated by VDSM version 4.16.2-1.gite8cba75.el6
DEVICE=p1p2
HWADDR=90:e2:ba:04:28:c1
MASTER=bond0
SLAVE=yes
ONBOOT=no
NM_CONTROLLED=no

[root@dell-r210ii-06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
# Generated by VDSM version 4.16.2-1.gite8cba75.el6
DEVICE=bond0
BONDING_OPTS='mode=active-backup miimon=150'
ONBOOT=no
NM_CONTROLLED=no
HOTPLUG=no
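The manual bond wiring above can be sanity-checked after a hosted-engine deploy. A hedged sketch (the `ifcfg_get` and `check_slave` helper names are mine, not part of any oVirt tooling; the check simply verifies that a slave ifcfg points at the bond and is set to come up at boot):

```shell
# ifcfg_get FILE KEY: print the value of KEY in an ifcfg file,
# with surrounding quotes stripped (ONBOOT="yes" and ONBOOT=yes both work).
ifcfg_get() {
    sed -n "s/^$2=//p" "$1" | tr -d '"' | head -n 1
}

# check_slave FILE BOND: succeed only if FILE describes a slave of BOND
# that will be brought up at boot (ONBOOT=yes).
check_slave() {
    [ "$(ifcfg_get "$1" MASTER)" = "$2" ] &&
    [ "$(ifcfg_get "$1" SLAVE)"  = "yes" ] &&
    [ "$(ifcfg_get "$1" ONBOOT)" = "yes" ]
}

# Example on a real host:
#   check_slave /etc/sysconfig/network-scripts/ifcfg-p1p1 bond0 || echo "p1p1 broken"
```

Run against the files above, such a check would have flagged ifcfg-p1p1 and ifcfg-p1p2 immediately after the deploy.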
Created attachment 930759 [details]
logs_self_hosted_engine
Antoni, can you check comment #4?
I can reproduce the problem when installing a new RHEL6 host.

Pre-installation status: two ethernet devices (em1 and em2) bonded in LACP mode (bond0), statically configured in the network scripts (NetworkManager disabled). Bond configuration:

DEVICE=bond0
BONDING_OPTS='mode=4 miimon=100'
ONBOOT=yes
IPADDR=10.34.73.80
NETMASK=255.255.255.192
GATEWAY=10.34.73.126
BOOTPROTO=none
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no

The connection to the host is lost at the point where the vdsm daemon starts. The status after the failed installation is the following: the network scripts stayed the same except that ONBOOT=yes changed to ONBOOT=no on the ethernet devices (em1 and em2) and on the bond0 device; moreover, the static IP configuration (IPADDR, NETMASK, GATEWAY) was deleted from the bond0 network script.

I then manually edited the network scripts back to ONBOOT=yes, stopped vdsm and reinstalled the host, which was successful.

I am attaching the whole vdsm and supervdsm host logs; be aware that I created more logical networks (vm_dynamic, display and display_wanem) after the successful host installation.
@Toni, the problematic part seems to be here. Maybe I overlooked something, but from a quick look it does not seem to generate ifcfg-rhevm:

MainThread::DEBUG::2014-09-05 14:13:22,875::api::625::root::(setupNetworks) Validating configuration
MainThread::DEBUG::2014-09-05 14:13:22,877::api::637::setupNetworks::(setupNetworks) Applying...
MainThread::DEBUG::2014-09-05 14:13:22,877::netconfpersistence::134::root::(_getConfigs) Non-existing config set.
MainThread::DEBUG::2014-09-05 14:13:22,877::utils::738::root::(execCmd) /sbin/ip route show to 0.0.0.0/0 table all (cwd None)
MainThread::DEBUG::2014-09-05 14:13:22,879::utils::758::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainThread::DEBUG::2014-09-05 14:13:22,883::api::531::_handleBondings::(_handleBondings) Editing bond Bond(bond0: [Nic(em1), Nic(em2)]) with options mode=4 miimon=100
MainThread::DEBUG::2014-09-05 14:13:22,884::ifcfg::541::root::(writeConfFile) Writing to file /etc/sysconfig/network-scripts/ifcfg-bond0 configuration:
# Generated by VDSM version 4.16.3-2.el6
DEVICE=bond0
BONDING_OPTS='mode=4 miimon=100'
ONBOOT=no
NM_CONTROLLED=no
HOTPLUG=no
What is happening is that when vdsm first starts, the init common script runs "restore nets" and, for some reason, the system in comment #8 and comment #9 shows that it has a definition for bond0 in /var/run/vdsm/netconf/bonds/bond0.

Now, the only thing we know of that writes to that location is the setupNetworks flow. So we have to find out what ran setupNetworks for the bond before vdsm ever ran (since this setupNetworks call appears neither in the attached supervdsm.log nor in vdsm.log).
@mkrcmari: Do you think you could try to reproduce again and, just before installing vdsm and before starting it for the first time, check the contents of /var/run/vdsm/netconf/{nets,bonds}?
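For reference, something like the following dumps whatever is persisted under that directory (the `dump_netconf` helper name is mine; the default path is the one quoted above, and it degrades gracefully when a subdirectory is absent or empty):

```shell
# dump_netconf [BASE]: print every persisted network and bond definition
# under BASE (default: VDSM's running network config directory).
dump_netconf() {
    base=${1:-/var/run/vdsm/netconf}
    for sub in nets bonds; do
        dir=$base/$sub
        if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
            for f in "$dir"/*; do
                echo "== $f =="
                cat "$f"
            done
        else
            echo "$sub: empty"
        fi
    done
}
```

If `bonds` turns out to be non-empty before vdsm has ever started, that would confirm something other than vdsm's own setupNetworks wrote the bond0 definition.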
Works in vt5.

before

[root@dell-r210ii-05 ~]# cat /etc/sysconfig/network-scripts/ifcfg-em1
DEVICE="em1"
BOOTPROTO="dhcp"
HWADDR="D0:67:E5:F0:7F:02"
IPV6INIT="no"
MTU="1500"
NM_CONTROLLED="yes"
ONBOOT="yes"
TYPE="Ethernet"
UUID="6ffbe414-3922-4482-b793-3766058dcedb"

after

[root@dell-r210ii-05 ~]# cat /etc/sysconfig/network-scripts/ifcfg-em1
# Generated by VDSM version 4.16.6-1.el6ev
DEVICE=em1
HWADDR=d0:67:e5:f0:7f:02
BRIDGE=rhevm
ONBOOT=yes
MTU=1500
NM_CONTROLLED=no
oVirt 3.5 has been released and should include the fix for this issue.
Still witnessed on 3.5.1.1-1.el6 with CentOS 6.6 hosts and vdsm-4.16.10-8.

Could someone tell us in which precise version it is fixed?
(In reply to Nicolas Ecarnot from comment #17)
> Still witnessed on 3.5.1.1-1.el6 and CentOS 6.6 hosts and vdsm-4.16.10-8.
>
> May someone tell us at which precise version it is fixed?

I see Fixed In Version: 4.16.4.

Can you please check using engine 3.5.4.2 / vdsm 4.16.26 and open a new bug against version 3.5.4 if the problem is still there?
(In reply to Sandro Bonazzola from comment #18)
> Can you please check using engine 3.5.4.2 / vdsm 4.16.26 and open a new bug
> against version 3.5.4 if there?

Sandro,

We indeed plan to upgrade to 3.5.something in the near future, but these are sensitive production systems, so I can't promise this will be done in the next week.

I'll send an answer here when done.

Thank you.