The same problem exists in downstream vdsm.

+++ This bug was initially created as a clone of Bug #1154399 +++

Description of problem:

I configured an oVirt node with an ovirtmgmt interface. It is UP in ovirt-engine, but when I reboot the node it boots and after a few seconds I lose connectivity. I connect to the node via IPMI, and in /etc/sysconfig/network-scripts/ there are no ifcfg-bond0.X and ifcfg-ovirtmgmt files. The vdsm daemon also does not start:

service vdsmd start
libvirtd start/running, process 6113
vdsm: Running mkdirs
vdsm: Running configure_coredump
vdsm: Running configure_vdsm_logs
vdsm: Running run_init_hooks
vdsm: Running check_is_configured
libvirt is already configured for vdsm
vdsm: Running validate_configuration
SUCCESS: ssl configured to true. No conflicts
vdsm: Running prepare_transient_repository
vdsm: Running syslog_available
vdsm: Running nwfilter
vdsm: Running dummybr
vdsm: Running load_needed_modules
vdsm: Running tune_system
vdsm: Running test_space
vdsm: Running test_lo
vdsm: Running unified_network_persistence_upgrade
vdsm: Running restore_nets
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm-restore-net-config", line 137, in <module>
    restore()
  File "/usr/share/vdsm/vdsm-restore-net-config", line 123, in restore
    unified_restoration()
  File "/usr/share/vdsm/vdsm-restore-net-config", line 66, in unified_restoration
    persistentConfig.bonds)
  File "/usr/share/vdsm/vdsm-restore-net-config", line 91, in _filter_nets_bonds
    bonds[bond]['nics'], net)
KeyError: u'bond0'
vdsm: stopped during execute restore_nets task (task returned with error code 1).
vdsm start [FAILED]

vdsm starts only if I manually restore the network configuration, delete /var/lib/vdsm/persistence/netconf and create the nets_restored file in /var/lib/vdsm.

Version-Release number of selected component (if applicable):

rpm -qa | grep vdsm
vdsm-python-4.16.7-1.gitdb83943.el6.noarch
vdsm-jsonrpc-4.16.7-1.gitdb83943.el6.noarch
vdsm-python-zombiereaper-4.16.7-1.gitdb83943.el6.noarch
vdsm-xmlrpc-4.16.7-1.gitdb83943.el6.noarch
vdsm-yajsonrpc-4.16.7-1.gitdb83943.el6.noarch
vdsm-4.16.7-1.gitdb83943.el6.x86_64
vdsm-cli-4.16.7-1.gitdb83943.el6.noarch

cat /etc/redhat-release
CentOS release 6.5 (Final)

Steps to Reproduce:
1. Configure the network manually in /etc/sysconfig/network-scripts/
2. Add the node to ovirt-engine
3. Reboot the node

Actual results:
Connectivity to the node is lost; the vdsm scripts reset the network configuration.

Expected results:
Normal reboot.

Additional info:

--- Additional comment from Aleksandr on 2014-10-20 02:12:17 EDT ---

I found the problem and how to solve it manually. I created the bond interface by hand before installing oVirt, and created the ovirtmgmt interface manually too. After I install oVirt on the node, the VDSM scripts create folders with the network configuration under /var/lib/vdsm/persistence/netconf/. There is a "nets" folder with the configuration of the ovirtmgmt interface, but there is no "bonds" folder with the bonding configs. After creating this folder with the configuration for the bond0 interface, everything works and the node reboots normally.

--- Additional comment from Dan Kenigsberg on 2014-10-25 20:55:59 EDT ---

Toni, could you take a look? I thought http://gerrit.ovirt.org/32769 should have fixed that.

--- Additional comment from Antoni Segura Puimedon on 2014-10-27 03:52:13 EDT ---

@Aleksandr: Does the /etc/sysconfig/network-scripts/ifcfg-bond0 you manually create have any of these headers:

- '# Generated by VDSM version'
- '# automatically generated by vdsm'

If that is the case, it will be removed at every boot.
If that is not the case, are you calling 'persist /etc/sysconfig/network-scripts/ifcfg-bond0' on the command line after creating the file?

vdsm only persists the networks and bonds it creates itself, and since ifcfg-bond0 was created by you, it assumes (rightly or wrongly) that the file will be there on boot. There are three ways to go about this:

- Create the bond with vdsClient, like so:
  vdsClient -s 0 setupNetworks bondings='{bond11:{nics:p1p3+p1p4}}'
  # Then create the network over it (which will also persist the bond in
  # /var/lib/vdsm/persistence/netconf/bonds)
- Use the node persistence directly:
  persist /etc/sysconfig/network-scripts/ifcfg-bond0
- Code: somehow detect that device configuration we depend on is not persisted, and handle it as the upgrade script to unified persistence does.

--- Additional comment from Aleksandr on 2014-10-27 04:11:43 EDT ---

(In reply to Antoni Segura Puimedon from comment #3)
> @Aleksandr: Does the /etc/sysconfig/network-scripts/ifcfg-bond0 you manually
> create have any of these headers:
>
> - '# Generated by VDSM version'
> - '# automatically generated by vdsm'
>
> If that is the case, it will be removed at every boot. If that is not the
> case, are you calling 'persist /etc/sysconfig/network-scripts/ifcfg-bond0'
> on the command line after creating the file?

/etc/sysconfig/network-scripts/ifcfg-bond0 does not have such a header. I created it manually before installing oVirt on this node.

--- Additional comment from Dan Kenigsberg on 2014-10-29 10:08:58 EDT ---

And when you run

  persist /etc/sysconfig/network-scripts/ifcfg-bond0

does the problem go away? If so, it is not a bug. Manual creation requires manual persistence.

--- Additional comment from Dan Kenigsberg on 2014-11-17 11:25:48 EST ---

(In reply to Dan Kenigsberg from comment #5)
> does the problem go away?

Please reopen if this is not the case.

--- Additional comment from Dan Kenigsberg on 2015-02-07 12:20:56 EST ---

We have heard more reports about our failure to revive bonds that were created outside Vdsm but are required by its networks. Since this is a common use case, particularly for hosted engine, it may require extreme measures such as consuming these bonds and making them ours.
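(Note for anyone hitting the same KeyError on boot: the options from comment #3 boil down to roughly the shell sequence below. This is only a sketch, not the merged fix; the bond and nic names are examples and must be adapted to the real topology.)

# Check what vdsm has actually persisted; the failure mode in this bug is a
# populated nets/ directory with no matching entry under bonds/.
ls /var/lib/vdsm/persistence/netconf/nets /var/lib/vdsm/persistence/netconf/bonds

# Option A: let vdsm create (and therefore persist) the bond itself.
vdsClient -s 0 setupNetworks bondings='{bond0:{nics:p1p3+p1p4}}'

# Option B (oVirt Node only): persist the hand-written ifcfg file across reboots.
persist /etc/sysconfig/network-scripts/ifcfg-bond0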
*** Bug 1194267 has been marked as a duplicate of this bug. ***
Pavel, what's `chkconfig | grep network` on the affected hosts? I heard a report that `chkconfig network on` makes the problem go away. Could you verify that?
(In reply to Dan Kenigsberg from comment #2)
> Pavel, what's `chkconfig | grep network` on the affected hosts?
> I heard a report that `chkconfig network on` makes the problem go away.
> Could you verify that?

It seems to be active already:

network         0:off   1:off   2:on    3:on    4:on    5:on    6:off

On a RHEL host we can work around this by using ifcfg as the persistence store; in that case vdsm does not complain about manually created bond devices.
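(For reference, a minimal sketch of the ifcfg-persistence workaround mentioned above is below. It assumes the switch is the net_persistence option in the [vars] section of /etc/vdsm/vdsm.conf; verify the option name against the vdsm version installed on the host before applying it.)

# Merge into an existing [vars] section if vdsm.conf already has one,
# rather than appending a duplicate section.
cat >> /etc/vdsm/vdsm.conf <<'EOF'
[vars]
net_persistence = ifcfg
EOF

# Restart vdsm so the new persistence mode takes effect.
service vdsmd restart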
@Pavel, could you grant me access to your machine? I would like to check the logs, versions, etc. It would then be great if we could set the machine up in its pre-upgrade state and try the upgrade again with some changes or extra logging.
Sorry, Roman - I did not refresh my browser and did not see your recent update when I moved this bug to MODIFIED. I do not understand your report. This bug is about ifcfg-bond* files not existing upon upgrade. Is this the case with your reproduction? Can you attach the post-boot supervdsm.log? There may well be more issues regarding network upgrade on the node; I'm not sure that what you describe is the problem I'm trying to solve.
I suspect that the other problems we see are related to the fact that ovirt-node restarts networking while vdsm is starting up and performing network config upgrade. Hence this bug can go back to ON_QA.
Hi Yaniv,

Please help sort out the Fixed In Version and target release for this BZ; we definitely have some mess here.

Thank you,
Please provide info needed in comment #23.
The backported patch hasn't passed QA; see bug 1205711: https://bugzilla.redhat.com/show_bug.cgi?id=1205711#c10
Any updates on this and the QA process since last week?
Yes, please see the clone of this bug for the updates. Petr, can this bug get moved to MODIFIED as well?
I hope the info request directed at me was a mistake...
Relevant patches are merged.
How is this BZ ON_QA?

Do we have a RHEV-H 3.6.0 build?
(In reply to Michael Burman from comment #32)
> How is this BZ ON_QA?
>
> Do we have a RHEV-H 3.6.0 build?

This affects RHEL as well; a RHEV-H build should arrive soon.
Can we verify this on RHEL, or do we need to wait for the 3.6 RHEV-H? If we need to wait for RHEV-H, please remove ON_QA from the bug.
(In reply to Meni Yakove from comment #34)
> Can we verify this on RHEL, or do we need to wait for the 3.6 RHEV-H?
> If we need to wait for RHEV-H, please remove ON_QA from the bug.

You need to VERIFY on both.
Verified on 3.6.0.3-0.1.el6 with:
- Red Hat Enterprise Virtualization Hypervisor release 7.2 (20151104.0.el7ev)
- vdsm-4.17.10.1-0.el7ev.noarch
- ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html