Bug 1251040 - First Setup Networks on a rhev-h 6.7 3.5.4 upgraded from 3.5.3 ends up with libvirtd and vdsmd not running after reboot
Summary: First Setup Networks on a rhev-h 6.7 3.5.4 upgraded from 3.5.3 ends up with libvirtd and vdsmd not running after reboot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-node
Version: 3.5.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assignee: Anatoly Litovsky
QA Contact: Michael Burman
URL:
Whiteboard:
Depends On:
Blocks: 1254469
 
Reported: 2015-08-06 12:57 UTC by Michael Burman
Modified: 2016-03-09 14:34 UTC
CC: 14 users

Fixed In Version: ovirt-node-3.3.0-0.4.20150906git14a6024.el7ev
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1254469 (view as bug list)
Environment:
Last Closed: 2016-03-09 14:34:22 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:


Attachments
Logs (435.31 KB, application/x-gzip), 2015-08-06 12:57 UTC, Michael Burman
ifcfgs (526 bytes, application/x-gzip), 2015-08-06 12:58 UTC, Michael Burman
the host rules (909 bytes, text/plain), 2015-08-06 20:06 UTC, Anatoly Litovsky
the persisted rules (909 bytes, text/plain), 2015-08-06 20:06 UTC, Anatoly Litovsky
sosreport (7.15 MB, application/x-xz), 2015-08-10 08:57 UTC, Anatoly Litovsky
libvirt error during system reboot (489.70 KB, image/png), 2015-08-10 17:01 UTC, Chaofeng Wu
libvirt connection error on TUI (446.08 KB, image/png), 2015-08-10 17:02 UTC, Chaofeng Wu
sosreport after install rhevh6.6 (6.21 MB, application/x-xz), 2015-08-10 17:03 UTC, Chaofeng Wu
sosreport upgrade to rhevh6.7 before reboot (6.84 MB, application/x-xz), 2015-08-10 17:04 UTC, Chaofeng Wu
sosreport: vdsmd and libvirtd are running after rhevh6.7 reboot (6.85 MB, application/x-xz), 2015-08-10 17:07 UTC, Chaofeng Wu


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1228043 0 medium CLOSED When user upgrades rhev-h, the device name is changed and the network settings are lost. 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1232338 1 high CLOSED [BLOCKED] - consume fix for bug 1241712 - [was: RHEV-H 7 - The hostname is lost somewhere in the process (also prevents... 2023-09-14 03:00:44 UTC
Red Hat Bugzilla 1255849 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Product Errata RHBA-2016:0378 0 normal SHIPPED_LIVE ovirt-node bug fix and enhancement update for RHEV 3.6 2016-03-09 19:06:36 UTC
oVirt gerrit 44574 0 None None None Never
oVirt gerrit 44615 0 ovirt-3.5 ABANDONED init: Move udev rules reloading prior to network start Never
oVirt gerrit 44616 0 ovirt-3.5 ABANDONED init: Trigger a change instead of an add Never
oVirt gerrit 44621 0 master ABANDONED init: Trigger a change instead of an add Never
oVirt gerrit 44685 0 ovirt-3.5 MERGED Move udev rules reloading prior to network start Never

Internal Links: 1228043 1232338 1255849

Description Michael Burman 2015-08-06 12:57:58 UTC
Created attachment 1059935 [details]
Logs

Description of problem:
First Setup Networks on a rhev-h 6.7 3.5.4 upgraded from 3.5.3 ends up with libvirtd and vdsmd not running after reboot.

We have a contradiction between /etc/udev/rules.d/70-persistent-net.rules and /etc/udev/rules.d/71-persistent-node-net.rules on the server that fails: the two files assign different names to the same MAC addresses.
On the other hand, on the server where VDSM starts correctly, both files match (consist of the same rules).
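
For illustration, such a mismatch would look roughly like this (hypothetical rule lines in the RHEL 6 persistent-net syntax; the MAC address and names below are made up, not taken from the attached files):

# 70-persistent-net.rules
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:11:22:33:44:55", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"
# 71-persistent-node-net.rules
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:11:22:33:44:55", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"

Whichever rule fires second tries to rename the NIC to a name that is already taken, so the kernel refuses the rename and the interface is left with a temporary renameN name, as seen below.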

The consequence of such a rules mismatch is a udev failure to rename the interfaces:

[root@navy-vds1 tmp]# ip l
...
3: rename3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:14:5e:dd:09:26 brd ff:ff:ff:ff:ff:ff
...

From /var/log/messages:

Aug  6 09:03:01 localhost udevd-work[4537]: error changing netif name 'eth3' to 'eth1': Device or resource busy

Since vdsm (in el6) tries to load interfaces that are not already up (and eth3 is one of them), it fails to do so and crashes during bootstrap.
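
A quick way to spot such a mismatch on an affected host (a hypothetical diagnostic, not part of the original report) is to diff the MAC-to-name pairs extracted from both rule files:

# any line appearing on only one side is a MAC that the two rule sets name differently
diff <(grep -o 'ATTR{address}=="[^"]*".*NAME="[^"]*"' /etc/udev/rules.d/70-persistent-net.rules | sort) \
     <(grep -o 'ATTR{address}=="[^"]*".*NAME="[^"]*"' /etc/udev/rules.d/71-persistent-node-net.rules | sort)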

Also attached are the ifcfg files from both servers; they are identical apart from the MAC addresses, showing that the hosts have the same network configuration.

Version-Release number of selected component (if applicable):
RHEV Hypervisor - 6.7 - 20150804.0.el6ev
upgrade from rhev-h 3.5.3 to 3.5.4 (6.7): vdsm-4.16.13.1-1.el6ev >> vdsm-4.16.24-2.el6ev.x86_64
RHEV Hypervisor - 6.6 - 20150512.0.el6ev >> rhev-hypervisor6-6.7-20150804.0.el6ev

How reproducible:
100% on Vendor: IBM
IBM System x3550 -[797842G]-


Steps to Reproduce:
1. Install rhev-h 6.6 20150512.0.el6ev
2. Add server to rhev-m and configure some networks via Setup Networks
3. Upgrade to latest rhev-h 6.7
4. After first reboot, perform a change via Setup networks and reboot

Actual results:
libvirtd and vdsmd are not running; the host got an IP, but stays in a non-operational state in rhev-m because vdsmd is not running.

Expected results:
libvirtd and vdsmd should run after reboot

Additional info:

Comment 1 Michael Burman 2015-08-06 12:58:35 UTC
Created attachment 1059936 [details]
ifcfgs

Comment 2 Anatoly Litovsky 2015-08-06 15:26:42 UTC
After upgrading, the udev rules in 70-persistent-net.rules
and in 71-persistent-node-net.rules are different.

The udev in 6.7 assigns MACs to names differently.

Comment 3 Fabian Deutsch 2015-08-06 19:44:14 UTC
To me it looks as if the "virtual" NICs (i.e. bonds) have completely different MACs on 6.7.

What does "ip l" on 6.6 and 6.7 say?

Ying, have you seen this before?

Comment 4 Fabian Deutsch 2015-08-06 19:59:20 UTC
A suggestion from Dan is to wait for udev to settle after we mount the persisted files and before we start the network.

But, I'm not 100% sure about the cause yet.
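
A minimal sketch of that ordering (udevadm and service are standard el6 tooling; the exact hook point in the ovirt-node init scripts is an assumption, and the trigger-as-change step mirrors the later gerrit patches linked above):

# after the persisted files (including 71-persistent-node-net.rules) are mounted:
udevadm control --reload-rules                         # pick up the freshly mounted rules
udevadm trigger --subsystem-match=net --action=change  # re-evaluate NIC naming without re-adding devices
udevadm settle                                         # block until all rename events are processed
service network start                                  # only then bring the interfaces up

The fix that was eventually merged (gerrit 44685 in the links above) went in this direction, moving the udev rules reloading prior to network start.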

Comment 5 Anatoly Litovsky 2015-08-06 20:06:06 UTC
Created attachment 1060069 [details]
the host rules

Comment 6 Anatoly Litovsky 2015-08-06 20:06:32 UTC
Created attachment 1060070 [details]
the persisted rules

Comment 9 Chaofeng Wu 2015-08-07 08:48:41 UTC
I tried the following steps three times, but could not reproduce this bug:

1. PXE install rhevh-6.6-20150512.0.el6ev.iso, configure net1, then register to RHEV-M 3.5.4.
2. On the RHEV-M web portal, the host status is up; then create eth0 and eth2 as bond0, create network testnet1 and drag it to bond0, create network testnet2 and drag it to eth3, and save.
3. All the networks are up; then reboot the system and upgrade from rhevh-6.6-20150512.0.el6ev.iso to rhev-hypervisor6-6.7-20150804.0.iso.
4. After the upgrade succeeds and the system is up, break bond0, then create eth0 and eth3 as bond0, drag testnet2 to bond0, drag testnet1 to eth2, and save.
5. All the networks are up; then reboot the system.
6. After the system is up, check the vdsmd and libvirtd service status; all of them are running.

Comment 10 Ryan Barry 2015-08-07 14:13:52 UTC
I also haven't been able to reproduce this

Comment 16 Fabian Deutsch 2015-08-07 18:45:21 UTC
This could be a dupe of bug 1228043 - can we get information about the network hardware?

Comment 17 Meni Yakove 2015-08-09 07:38:47 UTC
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 11)
06:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 11)
14:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
14:01.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)

Comment 18 Meni Yakove 2015-08-09 14:05:21 UTC
With the new ISO http://brewweb.devel.redhat.com/brew/taskinfo?taskID=9657587 I get the same error.

Tried the upgrade without the bond and got the same error.

Comment 19 Anatoly Litovsky 2015-08-10 08:57:28 UTC
Created attachment 1060969 [details]
sosreport

Comment 20 Fabian Deutsch 2015-08-10 11:06:23 UTC
Meni, when encountering this bug, can you still roll back to the previous (BACKUP) RHEV-H (select it in the grub menu when booting the machine)?

Comment 22 Chaofeng Wu 2015-08-10 17:00:34 UTC
Reproduced on a VLAN env!

Rewriting the steps from comment 9:
1. PXE install rhevh-6.6-20150512.0.el6ev.iso, configure eth1 with VLAN tag 20, then register to RHEV-M 3.5.4.
2. On the RHEV-M web portal, the host status is up; then create eth0 and eth2 as bond0, create network testnet1 and drag it to bond0, create network testnet2 and drag it to eth3, and save.
3. All the networks are up; then reboot the system and upgrade from rhevh-6.6-20150512.0.el6ev.iso to rhev-hypervisor6-6.7-20150804.0.iso.
4. After the upgrade succeeds and the system is up, break bond0, then create eth0 and eth3 as bond0, drag testnet2 to bond0, drag testnet1 to eth2, and save.
5. All the networks are up; then reboot the system.
6. Reboot the system more than twice and check the vdsmd and libvirtd service status. None of them are running.
7. Roll back to the previous rhevh6.6; sometimes vdsmd and libvirtd are running, sometimes they are not.

How reproducible:
80%

Additional info:
Some network configurations, including rhevm, testnet2, and bond0, are missing; please find the details in the attachments.

Reply to comment 20:
I rolled back to the previous RHEV-H and found that sometimes the vdsmd and libvirtd services are running correctly and sometimes they are not, with the same symptom in the network configuration.

Comment 23 Chaofeng Wu 2015-08-10 17:01:44 UTC
Created attachment 1061160 [details]
libvirt error during system reboot

Comment 24 Chaofeng Wu 2015-08-10 17:02:24 UTC
Created attachment 1061161 [details]
libvirt connection error on TUI

Comment 25 Chaofeng Wu 2015-08-10 17:03:37 UTC
Created attachment 1061162 [details]
sosreport after install rhevh6.6

Comment 26 Chaofeng Wu 2015-08-10 17:04:59 UTC
Created attachment 1061163 [details]
sosreport upgrade to rhevh6.7 before reboot

Comment 27 Chaofeng Wu 2015-08-10 17:07:01 UTC
Created attachment 1061164 [details]
sosreport: vdsmd and libvirtd are running after rhevh6.7 reboot

Comment 28 Ying Cui 2015-08-11 04:57:23 UTC
> How reproducible:
> 80%
> Additional info:
> Some network configurations, including rhevm, testnet2, and bond0, are
> missing; please find the details in the attachments.

Chaofeng, the configuration files being lost after the RHEV-H reboot is a bit different from the bug in the description; let's file a new bug for that defect. Thanks.

Comment 29 Chaofeng Wu 2015-08-11 06:27:33 UTC
(In reply to Ying Cui from comment #28)
> > How reproducible:
> > 80%
> > Additional info:
> > Some network configurations, including rhevm, testnet2, and bond0, are
> > missing; please find the details in the attachments.
> 
> Chaofeng, the configuration files being lost after the RHEV-H reboot is a
> bit different from the bug in the description; let's file a new bug for
> that defect. Thanks.

New bug to track the lost network configurations issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1252268

Comment 32 Ido Barkan 2015-08-11 14:15:56 UTC
*** Bug 1252268 has been marked as a duplicate of this bug. ***

Comment 39 Michael Burman 2015-11-08 09:05:46 UTC
Verified on - 3.6.0.3-0.1.el6 and:
- Red Hat Enterprise Virtualization Hypervisor release 7.2 (20151104.0.el7ev)
- vdsm-4.17.10.1-0.el7ev.noarch
- ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
- libvirt-1.2.17-13.el7.x86_64

Comment 41 errata-xmlrpc 2016-03-09 14:34:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0378.html

