Description of problem:
One of the two identical networks is missing from the host after reboot.

Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.1-20150603.0.iso

How reproducible:
100%

Steps to Reproduce:
1. Create 2 RHEV networks using the UI (no IP addresses assigned)
2. Attach the networks to the hypervisor and activate it
3. Put the host into maintenance and reboot
4. Activate the host

Actual results:
Only one NIC is up

Expected results:
Both NICs should be up

Additional info:
This is a use case for FCoE multipathing
Created attachment 1049731 [details] var log content
(In reply to Dan Kenigsberg from comment #8#BZ1237212)
> Is this issue a 3.5.0 regression? If so, I guess it is yet another
> consequence of Bug 1203422 (which is to be fixed in 3.5.4).
>
> Both ifcfg-em1 and ifcfg-em2 have ONBOOT=no, and are taken up by vdsm too
> late (after fcoe have failed to start on top them).
>
> # Generated by VDSM version 4.16.20-1.el7ev
> DEVICE=em1
> HWADDR=XXXX
> ONBOOT=no
> MTU=9000
> DEFROUTE=no
> NM_CONTROLLED=no
>
> However, I don't understand where the symmetry between em1 and em2 breaks.

Hi Dan,
I have reproduced the issue at home without even enabling fcoe/lldpad. The only problem I can see is that the source-based route is the same for both networks...
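For anyone comparing the two ifcfg files by hand, here is a minimal sketch (Python 3) of how the quoted file could be parsed to check the ONBOOT setting. The helper names `parse_ifcfg` and `onboot_disabled` are hypothetical and are not part of vdsm; the sample text is the ifcfg-em1 content quoted above.

```python
# Hypothetical helpers for inspecting ifcfg files; not vdsm code.

SAMPLE_IFCFG = """\
# Generated by VDSM version 4.16.20-1.el7ev
DEVICE=em1
HWADDR=XXXX
ONBOOT=no
MTU=9000
DEFROUTE=no
NM_CONTROLLED=no
"""


def parse_ifcfg(text):
    """Return the KEY=VALUE pairs of an ifcfg file as a dict.

    Blank lines, comment lines and lines without '=' are skipped.
    Surrounding double quotes around a value are removed.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def onboot_disabled(settings):
    """True when the file explicitly sets ONBOOT=no (case-insensitive)."""
    return settings.get("ONBOOT", "").lower() == "no"


if __name__ == "__main__":
    cfg = parse_ifcfg(SAMPLE_IFCFG)
    print(cfg["DEVICE"], "ONBOOT disabled:", onboot_disabled(cfg))
```

Running the same check over ifcfg-em1 and ifcfg-em2 would show whether the two files really are symmetric, which is the open question in the quoted comment.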
It's weird. I did more tests. Seems like the network is missed randomly.
Pavel, can you attach the ifcfg files after reboot?
Created attachment 1049827 [details]
ifcfg files after reboot

NOTE:ifcfg-fabric2 and ifcfg-ens9 appears later than fabric1 and ens8.
(In reply to Pavel Zhukov from comment #5)
>
> NOTE:ifcfg-fabric2 and ifcfg-ens9 appears later than fabric1 and ens8.

Could you rephrase that? Does the network fabric2 come up LATE, or not at all?

In the logs I find

MainThread::ERROR::2015-07-03 12:43:37,492::__init__::53::root::(__exit__) Failed rollback transaction last known good network. ERR=%s
Traceback (most recent call last):
  File "/usr/share/vdsm/network/api.py", line 694, in setupNetworks
  File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 833, in updateDevices
  File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 739, in get
  File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 565, in _bridgeinfo
  File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 177, in ports
OSError: [Errno 2] No such file or directory: '/sys/class/net/fabric2/brif'

which suggests that we have a race between the two lines

    addNetwork(network, configurator=configurator,
               implicitBonding=True, _netinfo=_netinfo, **d)
    _netinfo.updateDevices()  # Things like a bond mtu can change

Apparently, addNetwork returns before the bridge device exists in the host's kernel. Somehow, this race shows up during boot (when the host is busy doing other things?).

When bug 1203422 is fixed, this would become less of an issue (as we would commonly not call addNetwork on boot). The race should be understood and fixed, regardless.
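To illustrate the kind of race described above, here is a minimal sketch (Python 3, whereas the traceback is from Python 2.7) of polling for a sysfs path with a timeout before reading it. This is not the vdsm fix and not vdsm code; `wait_for_path` is a hypothetical helper, and the timeout and interval values are arbitrary assumptions.

```python
import os
import time


def wait_for_path(path, timeout=5.0, interval=0.1):
    """Poll until `path` exists or `timeout` seconds have elapsed.

    Returns True if the path appeared in time, False otherwise.
    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Path taken from the OSError in the traceback above.
    brif = "/sys/class/net/fabric2/brif"
    if wait_for_path(brif, timeout=2.0):
        print("ports:", os.listdir(brif))
    else:
        print("bridge directory did not appear:", brif)
```

Polling only hides the symptom. It does not explain why addNetwork returns before the kernel device exists, which is the part that still needs to be understood.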
(In reply to Dan Kenigsberg from comment #6)
> (In reply to Pavel Zhukov from comment #5)
> >
> > NOTE:ifcfg-fabric2 and ifcfg-ens9 appears later than fabric1 and ens8.
>
> Could you rephrase that? Does the network fabric2 come up LATE, or not at
> all?

Not at all. It comes up after reboot.
Reproduced again.

Steps to reproduce:
1) Install RHEVH with 3 interfaces
2) Configure the 1st interface as management
3) Add two bridgeless "dummy" networks (no gateway, no IPs)
4) Configure the host as an fcoe client https://access.redhat.com/solutions/1268183
5) Activate the host
6) Reboot the host

Actual result:
One of the two network interfaces is down, at random (one of them or neither one of them)

The RHEL6-based hypervisor works fine. Only RHEL7.1 is affected.
Dan, The issue is reproduced without bridges at all. I think the summary is not correct...
Created attachment 1053085 [details] logs
Created attachment 1053086 [details] vdsm persistent
in ovirt.log

Jul 15 14:31:35 Hardware virtualization detected
Restarting network (via systemctl): [ OK ]

while messages have

Jul 15 14:30:47 rhevh7 systemd: Starting Virtual Desktop Server Manager network restoration...
Jul 15 14:31:47 rhevh7 systemd: Failed to start Virtual Desktop Server Manager network restoration.

at the very same time. This cannot work. We must make sure that ovirt restarts its network well before it allows vdsm-network to run.

Fabian, I think we've seen this before. Do you recall?
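The overlap can be checked with a few lines of Python 3. This is only a sketch using the three timestamps quoted above. The year 2015 is an assumption taken from the dates elsewhere in this bug, `parse_stamp` is a hypothetical helper, and `%b` assumes an English/C locale. The ovirt.log timestamp belongs to the line just before the restart message, so it is treated as approximate.

```python
from datetime import datetime

ASSUMED_YEAR = "2015"  # syslog-style stamps carry no year; assumed from this bug's dates


def parse_stamp(stamp):
    """Parse a 'Jul 15 14:31:35' style timestamp (hypothetical helper)."""
    return datetime.strptime(ASSUMED_YEAR + " " + stamp, "%Y %b %d %H:%M:%S")


# Timestamps copied from the log excerpts in the comment above.
vdsm_net_start = parse_stamp("Jul 15 14:30:47")   # systemd: Starting ... network restoration
vdsm_net_failed = parse_stamp("Jul 15 14:31:47")  # systemd: Failed to start ...
ovirt_restart = parse_stamp("Jul 15 14:31:35")    # ovirt.log, around "Restarting network"

# The network restart overlaps if it falls inside the vdsm-network start window.
overlaps = vdsm_net_start <= ovirt_restart <= vdsm_net_failed
window_seconds = (vdsm_net_failed - vdsm_net_start).total_seconds()

if __name__ == "__main__":
    print("vdsm-network window (s):", window_seconds)
    print("ovirt network restart inside window:", overlaps)
```

With these inputs the restart logged by ovirt falls inside the window in which vdsm-network was starting, which matches the observation that both ran at the same time.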
ovirt#43781 fixed the bug for me. Tested using the same system as in https://bugzilla.redhat.com/show_bug.cgi?id=1240921#c10
Verified on - 3.6.1.1-0.1.el6 with:
vdsm-4.17.12-0.el7ev.noarch
Red Hat Enterprise Virtualization Hypervisor (Beta) release 7.2 (20151201.2.el7ev)
ovirt-node-3.6.0-0.23.20151201git5eed7af.el7ev.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0378.html