+++ This bug was initially created as a clone of Bug #1203422 +++

Description of problem:
After rebooting, it seems that 'vdsm-tool restore-nets' changes the configuration of the ovirtmgmt interface from "ONBOOT=yes" to "ONBOOT=no"

In our environment, that interface is the primary NIC ("em1"), and is not on a VLAN, nor is it used as a VM network. When initially adding the interface, it keeps the "ONBOOT" setting, but after rebooting once, it goes back to "ONBOOT=no"

# Generated by VDSM version 4.16.10-8.gitc937927.el6
DEVICE=em1
HWADDR=b8:2a:72:de:05:fe
ONBOOT=no
IPADDR=10.227.178.131
NETMASK=255.255.255.128
BOOTPROTO=none
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no

Version-Release number of selected component (if applicable):
oVirt Engine Version: 3.5.1.1-1.el6
vdsm-4.16.10-8.gitc937927.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install oVirt 3.5.1 and add a host
2. Set the ovirtmgmt interface to not be a VM network
3. Add other VLAN-tagged VM networks
4. Reboot

Actual results:
After running 'vdsm-tool restore-nets' or rebooting, /etc/sysconfig/network-scripts/ifcfg-em1 switches from "ONBOOT=yes" to "ONBOOT=no"

Expected results:
I expect VDSM not to alter the "ONBOOT" setting.

Additional info:
This causes cascade failures for other services when the primary interface is not reactivated after booting. Right now, I have Ansible ensuring it gets set to "yes", but that's less than ideal, especially if we need to reboot twice in succession for some reason.

--- Additional comment from Matt R on 2015-03-18 15:10:43 EDT ---

I should add, this is not a strict duplicate of #1128140 - the notes in that bug indicate it's an issue with self-hosted engines. Our engine is not self-hosted in this instance.
--- Additional comment from Dan Kenigsberg on 2015-03-19 06:55:29 EDT ---

Setting ONBOOT=no was an intentional step on the route of leaving ifcfg files behind and moving to what we call "unified persistence", where ovirt-owned networks are stored under /var/lib/vdsm/persistence/netconf. When we merge https://gerrit.ovirt.org/29441/ the network would be started earlier.

Would you please share more details on your cascade of failures, so we can understand whether they would be solved by setting up an independent vdsm-network service?

--- Additional comment from Matt R on 2015-03-19 08:59:08 EDT ---

Hi Dan,

The biggest thing in my setup is that each oVirt node is also a gluster storage node. Glusterd starts before VDSMD, and it fails when the network isn't active. This then causes the storage domain to fail, as well.

Aside from that, it also causes nslcd and autofs to not load properly, which prevents logging in with a non-admin account once the machine does finish booting.

Is there any reason to not start the vdsm daemon earlier in the boot process (say, the same as the 'network' service) until that merge is released? Or should gluster be started before vdsmd, causing a sort of catch-22 scenario?

--- Additional comment from Dan Kenigsberg on 2015-03-19 09:12:47 EDT ---

Vdsm requires libvirtd, which requires network. The motivation for the vdsm-network service is to break this vicious circle.

--- Additional comment from Matt R on 2015-03-19 09:36:09 EDT ---

Is it possible to emulate the new merge by setting up a new init script that only does a "vdsm-tool restore-nets", and stick that up earlier in the boot sequence?

--- Additional comment from Dan Kenigsberg on 2015-03-19 12:43:23 EDT ---

I'm so sorry, Matt. Only now do I notice that you are using el6, where the referred patch has no effect at all. Your idea of extending it to el6 makes sense, but may need to wait a bit more (unless you post the patch).
--- Additional comment from Yaniv Dary on 2015-03-30 08:39:04 EDT ---

any updates on this one?

--- Additional comment from Ido Barkan on 2015-03-30 10:33:44 EDT ---

We have investigated this and came to the conclusion that a quick solution cannot be implemented. VDSM currently needs libvirt, and restoring networks leverages VDSM code. Since libvirt is network dependent, restoring networks can unfortunately take place only long after the network service and libvirt are up. This is true for both el6 and el7.

In the current unified persistence mode, VDSM tries to prevent the network service from configuring all of its networks, only for VDSM to tear them down while restoring them (hence the ONBOOT=no).

This leaves us with 2 main choices:
1. try and drop our dependency on libvirt. This is not a trivial task.
2. revert back to the old ifcfg persistence mode (this will require a downgrade path for the next release).

There are a few more dirty hacks that we might pull, but we need to think more carefully and do some more analysis before we continue.

--- Additional comment from Matt R on 2015-03-30 12:06:10 EDT ---

Thanks for everyone's help. For the time being, I'm resorting to what might be the dirtiest hack. I've edited /etc/init.d/network to include the lines:

# Fix for VDSM bug
/usr/bin/perl -pi -e 's/^ONBOOT.*$/ONBOOT=yes/g' /etc/sysconfig/network-scripts/ifcfg-em1

after it sources the 'functions' file. Additionally, I have our cf management (ansible) fixing the file periodically as well. Seems like this is a Catch-22 situation.

--- Additional comment from Yaniv Dary on 2015-03-31 04:33:30 EDT ---

I'm moving the fix for this to 3.5.3, since it is quite a complex fix and will not fit the schedule.
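For readers who would rather not patch /etc/init.d/network with perl, the ONBOOT workaround Matt R describes above can be sketched in Python. This is only an illustration of the same substitution, not part of VDSM or of any fix discussed in this bug; the function names force_onboot_yes and fix_ifcfg_file are hypothetical.

```python
import re


def force_onboot_yes(ifcfg_text):
    """Return ifcfg content with every line starting with ONBOOT set to ONBOOT=yes.

    Mirrors the perl substitution s/^ONBOOT.*$/ONBOOT=yes/ applied line by line.
    """
    return re.sub(r"^ONBOOT.*$", "ONBOOT=yes", ifcfg_text, flags=re.MULTILINE)


def fix_ifcfg_file(path):
    """Rewrite the file at `path` in place; return True if it was modified.

    Hypothetical helper: the path is whatever ifcfg file you need to fix,
    for example /etc/sysconfig/network-scripts/ifcfg-em1 from this report.
    """
    with open(path) as f:
        original = f.read()
    fixed = force_onboot_yes(original)
    if fixed == original:
        return False
    with open(path, "w") as f:
        f.write(fixed)
    return True
```

Like the perl one-liner, this only papers over the symptom: VDSM will write ONBOOT=no again the next time it persists the network, so the substitution has to run on every boot before the network service brings interfaces up.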
--- Additional comment from Sandro Bonazzola on 2015-04-08 05:27:38 EDT ---

Removing from the tracker as per comment #10

--- Additional comment from Dan Kenigsberg on 2015-04-24 05:24:19 EDT ---

--- Additional comment from Dan Kenigsberg on 2015-06-08 06:08:20 EDT ---

--- Additional comment from Dan Kenigsberg on 2015-06-08 06:19:41 EDT ---

--- Additional comment from Yaniv Dary on 2015-06-23 11:59:06 EDT ---

What is the status of this?

--- Additional comment from Ido Barkan on 2015-06-24 03:09:01 EDT ---

Late stages of code review. Should be pushed soon, but will need QA attention.

--- Additional comment from Ido Barkan on 2015-06-29 07:03:56 EDT ---

for QA:
The following verifications must be done in order to release this safely:

Axes:
1. persistence = "unified"/"ifcfg" - ifcfg is to test for regression.
2. rhel/rhev-h
3. scenarios:
   a. upgrade from 3.4.x to 3.5.4.
   b. upgrade from 3.5.x to 3.5.4.
   c. "selective restoration" (only supported with persistence=unified):
      o. setup networks.
      o. setSafeNetworkConfig
      o. manually change (some or all) networks (such as changing IP, bonding options etc.)
      o. reboot
      o. verify that VDSM restores *only* the networks you have 'sabotaged' two steps before.

* all scenarios must include a complex network setup which involves bonds (including custom bonding options), VLAN devices and with/without bridges

--- Additional comment from Barak on 2015-07-06 08:10:14 EDT ---

Danken, what was the scenario that failed?

--- Additional comment from Michael Burman on 2015-07-06 09:07:28 EDT ---

Scenarios failed: (RHEL 7.1)
Upgrade 3.5.3 >> 3.5.4
Upgrade 3.4.5 >> 3.5.4
Both with bonds.

On the first boot after upgrade, vdsm doesn't see the slaves of the bond and recognizes this as a change, and likewise for the network attached to the bond.
Although a manual change was made to the bond via ifcfg-bond0 before the upgrade (from bond mode=4 to bond mode=1), on the second and third reboots vdsm still recognizes a change in bond0, because the slaves of the bond weren't up in time. So on every reboot vdsm will touch the bond and the network attached to it.

MainThread::INFO::2015-07-02 15:21:12,243::vdsm-restore-net-config::163::root::(_find_changed_or_missing) bond0 is different or missing from persistent configuration. current: {'nics': [], 'options': ''}, persisted: {u'nics': [u'ens1f0', u'ens1f1'], u'options': u'miimon=100 mode=4'}
MainThread::INFO::2015-07-02 15:21:12,243::vdsm-restore-net-config::163::root::(_find_changed_or_missing) net_lb is different or missing from persistent configuration. current: None, persisted: {u'bondingOptions': u'mode=4 miimon=100', u'mtu': '1500', u'bonding': u'bond0', 'bootproto': 'none', 'stp': False, u'bridged': True, 'defaultRoute': False}

- Couldn't test RHEV-H, because the latest rhev-h builds include only vdsm-4.16.20, and the fix was done on vdsm-4.16.21

--- Additional comment from Fabian Deutsch on 2015-07-07 05:46:02 EDT ---

I tested vdsm-4.16.21 on RHEV-H and was still seeing this issue.

--- Additional comment from Ido Barkan on 2015-07-14 01:56:37 EDT ---

Quite a few bugs were found thanks to QE during the integration process:
1. https://gerrit.ovirt.org/#/c/43507/
2. https://gerrit.ovirt.org/#/c/43222/
3. https://gerrit.ovirt.org/#/c/43238/
4. https://gerrit.ovirt.org/#/c/43512/
5. https://gerrit.ovirt.org/#/c/43382/

--- Additional comment from Michael Burman on 2015-07-15 01:10:35 EDT ---

Not sure why this bug was pushed to ON_QA. We still don't have a full fix for this; a new underlying bug was discovered yesterday when upgrading rhel 6.7 from vdsm 3.5.3 >> 3.5.4. This bug can't be verified at this point, there are more tests that need to be done here.
vdsm is still touching the bond and the network attached to it on every boot, although no change has been made. Ido, feel free to add any comment, thanks.

--- Additional comment from Ido Barkan on 2015-07-19 01:24:29 EDT ---

AFAIC all known problems are solved and merged, a lot of them thanks to QE helping on pre-integration. This can be moved to ON_QA, and if it is blocked on another bug, let it be so.

--- Additional comment from Michael Burman on 2015-07-29 08:29:08 EDT ---

RHEL 6.7 and 7.1 tested and verified on vt16.3 --> vdsm-4.16.23-1.
RHEV-H latest (vdsm-4.16.23-1) 6.7 and 7.1 failed QA. Working with DEV on the fix.

RHEL tests:

1) Basic flow and testing:
Base tests for rhel 6.7 and 7.1 - clean servers with vdsm-4.16.23-1 installed, network configurations via Setup Networks. Testing ifcfg-* generated by vdsm with ONBOOT=yes: restarting the network service doesn't break the network configuration on the server, and neither do reboots.
PASS

2) Red Hat Enterprise Linux Server release 6.7 (Santiago)
vdsm-4.14.18-7.el6ev.x86_64(3.4.5) >> vdsm-4.16.23-1.el6ev.x86_64(3.5.4)

[root@pink-vds2 yum.repos.d]# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;     8000.000000000000       no
gggg            8000.00215e3fdb2e       no              eth1.145
rhevm           8000.00215e3fdb2c       no              eth0
t1              8000.001b21d0bb4a       no              bond0
virbr0          8000.525400dc8e40       yes             virbr0-nic

- upgrade to >> vdsm-4.16.23-1.el6ev.x86_64
[root@pink-vds2 ~]# vi /etc/sysconfig/network-scripts/ifcfg-bond0
change bond mode to mode=1
- rebooted -

MainThread::INFO::2015-07-21 10:40:21,584::netconfpersistence::86::root::(setBonding) Adding bond0({'nics': ['eth2', 'eth3'], 'options': 'miimon=100 mode=1'})
MainThread::INFO::2015-07-21 10:40:21,584::vdsm-restore-net-config::220::root::(_find_changed_or_missing) bond0 is different or missing from persistent configuration.
current: {'nics': ['eth2', 'eth3'], 'options': 'miimon=100 mode=1'}, persisted: {u'nics': [u'eth2', u'eth3'], u'options': u'miimon=100 mode=4'}
MainThread::INFO::2015-07-21 10:40:21,584::vdsm-restore-net-config::224::root::(_find_changed_or_missing) gggg was not changed since last time it was persisted, skipping restoration.
MainThread::INFO::2015-07-21 10:40:21,584::vdsm-restore-net-config::224::root::(_find_changed_or_missing) rhevm was not changed since last time it was persisted, skipping restoration.
MainThread::INFO::2015-07-21 10:40:21,584::vdsm-restore-net-config::224::root::(_find_changed_or_missing) t1 was not changed since last time it was persisted, skipping restoration.
MainThread::DEBUG::2015-07-21 10:40:21,585::vdsm-restore-net-config::91::root::(unified_restoration) Calling setupNetworks with networks ({}) and bond ({'bond0': {u'nics': [u'eth2', u'eth3'], u'options': u'mode=4 miimon=100'}}).

* vdsm touches only the change made to the bond mode and restores it to mode=4
- second reboot - vdsm is not touching anything. No change was made. Server is up and all network configurations are OK.
PASS

3) Red Hat Enterprise Linux Server release 6.7 (Santiago)
vdsm-4.16.16-1.el6ev.x86_64(3.5.3) >> vdsm-4.16.23-1.el6ev.x86_64(3.5.4)

[root@pink-vds2 yum.repos.d]# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;     8000.000000000000       no
rhevm           8000.00215e3fdb2c       no              eth0
t1              8000.001b21d0bb4a       no              bond0
t2              8000.00215e3fdb2e       no              eth1.151
virbr0          8000.525400deea43       yes             virbr0-nic

- vim /etc/sysconfig/network-scripts/ifcfg-bond0
changed bond mode to mode=1
upgrade >> vdsm-4.16.23-1.el7ev.x86_64(3.5.4)
- first reboot after upgrade

MainThread::INFO::2015-07-21 13:35:38,051::netconfpersistence::75::root::(setNetwork) Adding network rhevm({'nic': 'eth0', 'mtu': '1500', 'bootproto': 'dhcp', 'stp': False, 'bridged': True, 'defaultRoute': True})
MainThread::INFO::2015-07-21 13:35:38,053::netconfpersistence::75::root::(setNetwork) Adding network t2({'nic': 'eth1', 'vlan': '151', 'mtu': '1500', 'bootproto': 'none', 'stp': False, 'bridged': True, 'defaultRoute': False})
MainThread::INFO::2015-07-21 13:35:38,053::netconfpersistence::75::root::(setNetwork) Adding network t1({'mtu': '1500', 'bonding': 'bond0', 'bootproto': 'none', 'stp': False, 'bridged': True, 'defaultRoute': False})
MainThread::INFO::2015-07-21 13:35:38,053::netconfpersistence::86::root::(setBonding) Adding bond0({'nics': ['eth2', 'eth3'], 'options': 'miimon=100 mode=1'})
MainThread::INFO::2015-07-21 13:35:38,053::vdsm-restore-net-config::220::root::(_find_changed_or_missing) bond0 is different or missing from persistent configuration. current: {'nics': ['eth2', 'eth3'], 'options': 'miimon=100 mode=1'}, persisted: {u'nics': [u'eth2', u'eth3'], u'options': u'miimon=100 mode=4'}
MainThread::INFO::2015-07-21 13:35:38,054::vdsm-restore-net-config::224::root::(_find_changed_or_missing) rhevm was not changed since last time it was persisted, skipping restoration.
MainThread::INFO::2015-07-21 13:35:38,054::vdsm-restore-net-config::224::root::(_find_changed_or_missing) t2 was not changed since last time it was persisted, skipping restoration.
MainThread::INFO::2015-07-21 13:35:38,054::vdsm-restore-net-config::224::root::(_find_changed_or_missing) t1 was not changed since last time it was persisted, skipping restoration.
MainThread::DEBUG::2015-07-21 13:35:38,054::vdsm-restore-net-config::91::root::(unified_restoration) Calling setupNetworks with networks ({}) and bond ({'bond0': {u'nics': [u'eth2', u'eth3'], u'options': u'mode=4 miimon=100'}}).

*vdsm touching only the change made on the bond mode and restoring it to mode=4
- second reboot - vdsm is not touching anything. no change was done. Server is up and all network configurations are OK.
PASS

4) Red Hat Enterprise Linux Server release 7.1 (Maipo)
vdsm-4.16.16-1.el7ev.x86_64(3.5.3) >> vdsm-4.16.23-1.el7ev.x86_64(3.5.4)

[root@navy-vds1 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;     8000.000000000000       no
rhevm           8000.00145edd0924       no              enp4s0
t1              8000.001018244afc       no              bond0
t2              8000.00145edd0926       no              enp6s0.151

- vim /etc/sysconfig/network-scripts/ifcfg-bond0
changed bond mode to mode=1
- upgrade >> vdsm-4.16.23-1.el7ev.x86_64
- first reboot after upgrade:

MainThread::INFO::2015-07-21 14:28:16,100::netconfpersistence::75::root::(setNetwork) Adding network rhevm({'nic': 'enp4s0', 'mtu': '1500', 'bootproto': 'dhcp', 'stp': False, 'bridged': True, 'defaultRoute': True})
MainThread::INFO::2015-07-21 14:28:16,101::netconfpersistence::75::root::(setNetwork) Adding network t2({'nic': 'enp6s0', 'vlan': '151', 'mtu': '1500', 'bootproto': 'none', 'stp': False, 'bridged': True, 'defaultRoute': False})
MainThread::INFO::2015-07-21 14:28:16,101::netconfpersistence::75::root::(setNetwork) Adding network t1({'mtu': '1500', 'bonding': 'bond0', 'bootproto': 'none', 'stp': False, 'bridged': True, 'defaultRoute': False})
MainThread::INFO::2015-07-21 14:28:16,101::netconfpersistence::86::root::(setBonding) Adding bond0({'nics': ['ens2f0', 'ens2f1'], 'options': 'miimon=100 mode=1'})
MainThread::INFO::2015-07-21 14:28:16,102::vdsm-restore-net-config::220::root::(_find_changed_or_missing) bond0 is different or missing from persistent configuration. current: {'nics': ['ens2f0', 'ens2f1'], 'options': 'miimon=100 mode=1'}, persisted: {u'nics': [u'ens2f0', u'ens2f1'], u'options': u'miimon=100 mode=4'}
MainThread::INFO::2015-07-21 14:28:16,102::vdsm-restore-net-config::224::root::(_find_changed_or_missing) rhevm was not changed since last time it was persisted, skipping restoration.
MainThread::INFO::2015-07-21 14:28:16,102::vdsm-restore-net-config::224::root::(_find_changed_or_missing) t2 was not changed since last time it was persisted, skipping restoration.
MainThread::INFO::2015-07-21 14:28:16,102::vdsm-restore-net-config::224::root::(_find_changed_or_missing) t1 was not changed since last time it was persisted, skipping restoration.
MainThread::DEBUG::2015-07-21 14:28:16,102::vdsm-restore-net-config::91::root::(unified_restoration) Calling setupNetworks with networks ({}) and bond ({'bond0': {u'nics': [u'ens2f0', u'ens2f1'], u'options': u'mode=4 miimon=100'}}).

*vdsm touching only the change made on the bond mode and restoring it to mode=4
- second reboot - vdsm is not touching anything. no change was done. Server is up and all network configurations are OK.
PASS

* All rhev-h scenarios failed.
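The "selective restoration" behaviour verified in the scenarios above (restore only what differs from the persisted configuration, skip everything else) can be sketched roughly as follows. This is not VDSM's actual _find_changed_or_missing code, only an illustration of the idea. The names normalize and find_changed_or_missing are hypothetical, and treating the order of bonding options as insignificant is an assumption made for this sketch.

```python
def normalize(config):
    """Return a comparable copy of a network or bond config dict.

    Assumption for this sketch: the order of space-separated bonding
    options and the order of slave NICs do not matter.
    """
    normalized = dict(config)
    if "options" in normalized:
        normalized["options"] = frozenset(normalized["options"].split())
    if "nics" in normalized:
        normalized["nics"] = frozenset(normalized["nics"])
    return normalized


def find_changed_or_missing(persisted, running):
    """Return the persisted entries that are absent or different at runtime."""
    to_restore = {}
    for name, wanted in persisted.items():
        current = running.get(name)
        if current is None or normalize(current) != normalize(wanted):
            to_restore[name] = wanted
    return to_restore


# Values taken from the scenario logs above: the bond was manually
# switched to mode=1 while the persisted configuration says mode=4.
persisted_bonds = {"bond0": {"nics": ["eth2", "eth3"], "options": "mode=4 miimon=100"}}
running_bonds = {"bond0": {"nics": ["eth2", "eth3"], "options": "miimon=100 mode=1"}}
print(find_changed_or_missing(persisted_bonds, running_bonds))
```

With these inputs only bond0 is selected for restoration, matching the "first reboot" logs. Once the running bond is back at mode=4 the function returns an empty dict, which corresponds to the "second reboot - vdsm is not touching anything" result. The earlier failure on RHEL 7.1, where the running state was reported as {'nics': [], 'options': ''} because the slaves were not up in time, would make any comparison of this kind report a change on every boot.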
*** Bug 1249396 has been marked as a duplicate of this bug. ***