Created attachment 1059935 [details]
Logs

Description of problem:
The first Setup Networks on a rhev-h 6.7 host upgraded from 3.5.3 to 3.5.4 ends up with libvirtd and vdsmd not running after reboot.

There is a contradiction between /etc/udev/rules.d/70-persistent-net.rules and /etc/udev/rules.d/71-persistent-node-net.rules on the server that fails: the two files assign different names to the same MAC addresses. On the other hand, on the server where VDSM starts correctly, both files match (consist of the same rules).

The consequence of this rules mismatch is a udev failure to rename the interfaces:

[root@navy-vds1 tmp]# ip l
.
.
.
3: rename3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:14:5e:dd:09:26 brd ff:ff:ff:ff:ff:ff
.
.

From /var/log/messages:

Aug 6 09:03:01 localhost udevd-work[4537]: error changing netif name 'eth3' to 'eth1': Device or resource busy

Since vdsm (on el6) tries to bring up interfaces that are not already up (and eth3 is one of them), it fails to do so and crashes during bootstrap.

Also attached are the ifcfg files from both servers; they are identical apart from the MAC addresses, showing that the hosts have the same network configuration.

Version-Release number of selected component (if applicable):
RHEV Hypervisor - 6.7 - 20150804.0.el6ev
Upgrade from rhev-h 3.5.3 >> 3.5.4 (6.7)
vdsm-4.16.13.1-1.el6ev >> vdsm-4.16.24-2.el6ev.x86_64
RHEV Hypervisor - 6.6 - 20150512.0.el6ev >> rhev-hypervisor6-6.7-20150804.0.el6ev

How reproducible:
100% on Vendor: IBM, IBM System x3550 -[797842G]-

Steps to Reproduce:
1. Install rhev-h 6.7 20150512.0.el6ev
2. Add the server to rhev-m and configure some networks via Setup Networks
3. Upgrade to the latest rhev-h 6.7
4. After the first reboot, perform a change via Setup Networks and reboot

Actual results:
libvirtd and vdsmd are not running. The host got an IP, but stays in a non-operational state in rhev-m because vdsmd is not running.

Expected results:
libvirtd and vdsmd should run after reboot

Additional info:
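For reference, a minimal sketch (not from the original report) of one way to spot the mismatch on an affected host, assuming the standard persistent-net rule format with ATTR{address} and NAME fields:

  for f in /etc/udev/rules.d/70-persistent-net.rules \
           /etc/udev/rules.d/71-persistent-node-net.rules; do
      echo "== $f =="
      # extract MAC -> interface-name pairs from each rules file
      sed -n 's/.*ATTR{address}=="\([^"]*\)".*NAME="\([^"]*\)".*/\1 \2/p' "$f" | sort
  done

Any MAC address that maps to different names in the two outputs is a candidate for the "error changing netif name ... Device or resource busy" failure above.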
Created attachment 1059936 [details] ifcfgs
After upgrading, the udev rules in 70-persistent-net.rules and in 71-persistent-node-net.rules are different. The udev in 6.7 assigns MACs to names differently.
To me it looks as if the "virtual" NICs (i.e. bonds) have completely different MACs on 6.7. What does "ip l" on 6.6 and 6.7 say? Ying, have you seen this before?
A suggestion from Dan is to wait for udev to settle after we mount the persisted files and before we start the network. But I'm not 100% sure about the cause yet.
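To illustrate the suggestion, a minimal sketch (hypothetical placement; the actual hook in the ovirt-node boot sequence may differ) of waiting for udev before the network is started:

  # ... right after the persisted udev rules have been mounted ...
  udevadm settle --timeout=30   # wait for pending netif rename events to finish
  service network start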
Created attachment 1060069 [details] the host rules
Created attachment 1060070 [details] the persisted rules
I tried the following steps three times, but cannot reproduce this bug:
1. PXE install rhevh-6.6-20150512.0.el6ev.iso, configure net1, then register to RHEV-M 3.5.4
2. On the RHEV-M web portal the host status is up; create a bond0 from eth0 and eth2, create network testnet1 and drag it to bond0, create network testnet2 and drag it to eth3, save.
3. All the networks are up; reboot the system, then upgrade from rhevh-6.6-20150512.0.el6ev.iso to rhev-hypervisor6-6.7-20150804.0.iso.
4. After the upgrade succeeds and the system is up, break bond0, then create a bond0 from eth0 and eth3, drag testnet2 to bond0, drag testnet1 to eth2, save.
5. All the networks are up; reboot the system.
6. After the system is up, check the vdsmd and libvirtd service status; both are running.
I also haven't been able to reproduce this
This could be a dupe of bug 1228043. Can we get information about the network hardware?
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 11)
06:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 11)
14:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
14:01.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
With the new ISO http://brewweb.devel.redhat.com/brew/taskinfo?taskID=9657587 I get the same error. I also tried the upgrade without the bond and got the same error.
Created attachment 1060969 [details] sosreport
Meni, when encountering this bug, can you still roll back to the previous (BACKUP) RHEV-H (select it in the GRUB menu when booting the machine)?
Reproduced in a VLAN environment. Rewriting the steps from comment 9:
1. PXE install rhevh-6.6-20150512.0.el6ev.iso, configure eth1 with VLAN tag 20, then register to RHEV-M 3.5.4
2. On the RHEV-M web portal the host status is up; create a bond0 from eth0 and eth2, create network testnet1 and drag it to bond0, create network testnet2 and drag it to eth3, save.
3. All the networks are up; reboot the system, then upgrade from rhevh-6.6-20150512.0.el6ev.iso to rhev-hypervisor6-6.7-20150804.0.iso.
4. After the upgrade succeeds and the system is up, break bond0, then create a bond0 from eth0 and eth3, drag testnet2 to bond0, drag testnet1 to eth2, save.
5. All the networks are up; reboot the system.
6. Reboot the system more than twice, then check the vdsmd and libvirtd service status; neither is running.
7. Roll back to the previous rhevh 6.6; sometimes vdsmd and libvirtd are running, sometimes they are not.

How reproducible:
80%

Additional info:
Some network configuration, including rhevm, testnet2 and bond0, is missing; please find the details in the attachments.

Reply to comment 20:
I rolled back to the previous RHEV-H and found that sometimes the vdsmd and libvirtd services are running correctly and sometimes they are not, with the same symptom regarding the network configuration.
Created attachment 1061160 [details] libvirt error during system reboot
Created attachment 1061161 [details] libvirt connection error on TUI
Created attachment 1061162 [details] sosreport after install rhevh6.6
Created attachment 1061163 [details] sosreport upgrade to rhevh6.7 before reboot
Created attachment 1061164 [details] sosreport, vdsmd and libvirtd are running after rhevh6.7 reboot
> How reproducible:
> 80%
> Addition info:
> Some network configuration include rhevm, testnet2, bond0 are missing,
> please find the details in attachements.

Chaofeng, the configuration files being lost after the RHEV-H reboot is a bit different from the bug in the description. Let's file a new bug for that defect. Thanks.
(In reply to Ying Cui from comment #28)
> > How reproducible:
> > 80%
> > Addition info:
> > Some network configuration include rhevm, testnet2, bond0 are missing,
> > please find the details in attachements.
>
> Chaofeng, the configuration files are lost after RHEV-H reboot, this is a
> bit different from description's bug. let's file a new bug for that defect.
> Thanks.

New bug to track the lost-network-configurations issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1252268
*** Bug 1252268 has been marked as a duplicate of this bug. ***
Verified on 3.6.0.3-0.1.el6 and:
- Red Hat Enterprise Virtualization Hypervisor release 7.2 (20151104.0.el7ev)
- vdsm-4.17.10.1-0.el7ev.noarch
- ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
- libvirt-1.2.17-13.el7.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0378.html