Bug 1194068 - vdsm-3.5 network conf upgrade fails, due to `service network restart` by node
Summary: vdsm-3.5 network conf upgrade fails, due to `service network restart` by node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: rhev-hypervisor
Version: 3.5.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assignee: Fabian Deutsch
QA Contact: cshao
URL:
Whiteboard:
Duplicates: 1195989 (view as bug list)
Depends On: 1270177
Blocks: 1206536
 
Reported: 2015-02-18 22:59 UTC by Robert McSwain
Modified: 2021-08-30 12:38 UTC
CC List: 33 users

Fixed In Version: ovirt-node-3.3.0-0.10.20150928gite7ee3f1
Doc Type: Bug Fix
Doc Text:
Previously, a race between services during boot prevented network configuration from upgrading correctly. The risk for the race has now been reduced significantly to allow the upgrade of the network configuration to complete correctly.
Clone Of:
Clones: 1206536 (view as bug list)
Environment:
Last Closed: 2016-03-09 14:46:16 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:
ylavi: Triaged+


Attachments (Terms of Use)
bond-issue.tag.gz (6.35 MB, application/x-gzip), 2015-03-24 08:17 UTC, cshao
network-fixed.png (177.09 KB, image/png), 2015-03-24 09:55 UTC, cshao


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-43254 0 None None None 2021-08-30 12:38:03 UTC
Red Hat Knowledge Base (Solution) 1359783 0 None None None Never
Red Hat Product Errata RHSA-2016:0379 0 normal SHIPPED_LIVE Important: rhev-hypervisor security, bug fix and enhancement update 2016-03-09 19:10:28 UTC
oVirt gerrit 38979 0 master MERGED init: Move the on-boot hook to the end of post 2021-02-11 16:01:54 UTC
oVirt gerrit 39212 0 None MERGED logger: replace .debug statement for .warning 2021-02-11 16:01:54 UTC
oVirt gerrit 39248 0 master MERGED init: Move on-boot hooks to a reachable place 2021-02-11 16:01:54 UTC
oVirt gerrit 39263 0 ovirt-3.5 MERGED init: Move the on-boot hook to the end of post 2021-02-11 16:01:55 UTC
oVirt gerrit 39264 0 ovirt-3.5 MERGED init: Move on-boot hooks to a reachable place 2021-02-11 16:01:55 UTC

Description Robert McSwain 2015-02-18 22:59:44 UTC
Description of problem:
After being upgraded to boot image 20150128 and rebooted, the network interfaces on RHEV hypervisors are set to ONBOOT=no.

Version-Release number of selected component (if applicable):
RHEV Hypervisor 20150128


Steps to Reproduce:
RHEV-H: We were fully updated to build image 20150123 before starting the update to build image 20150128. After installing via RHEV-M, ONBOOT was set to no.

RHEV-M: We were running 3.4.5 prior to the upgrade to 3.5. In case it matters, this RHEV-M system was originally installed in April 2012 using 3.0.

Actual results:
ONBOOT parameter set to no

Expected results:
ONBOOT stays as the previously set value of yes

Additional info:

Comment 2 Fabian Deutsch 2015-02-19 08:55:37 UTC
Looking at the log files, I do not see ONBOOT=no at first glance.
Nevertheless, node does not set ONBOOT=no (only ONBOOT=yes), and because the problem appears after registration, I tend to suspect vdsm.

Dan, does this look like a vdsm issue?

Comment 4 cshao 2015-02-25 06:27:46 UTC
RHEV-H QE can't reproduce this bug with the following steps:

Test version:
RHEV-H 6.6-20150123.2.el6ev
RHEV-H 6.6-20150128.0.el6ev
RHEVM vt13.11 (3.5.0-0.32.el6ev)
I noticed your RHEV-M system was originally installed in April 2012 using 3.0, but we have no such environment, so I just used the latest RHEVM 3.5 for testing.

Test steps:
1. Install RHEV-H 6.6-20150123.2.el6ev.
2. Register to RHEVM and set to up.
3. Upgrade RHEV-H 6.6-20150123.2.el6ev to RHEV-H 6.6-20150128.0.el6ev via RHEVM.
4. Check ONBOOT parameter

Test result:
1. Upgrade via RHEVM can succeed.
2. RHEV-H comes up, and the ONBOOT parameter still shows "yes".

Hi Rmcswain,

Could you please provide detailed steps to reproduce this bug? Can you reproduce this issue 100% of the time?

Thanks!

Comment 5 Mike Fagan 2015-02-26 17:34:39 UTC
We're having the same problem. I'm upgrading hypervisors from 6.5-20140603.2.el6ev to 6.6-20150128.0.el6ev in 3.4.1, in preparation for upgrading to 3.5.

After the upgrade, all networks except the RHEVM network are marked ONBOOT=no, where they were not previously configured that way. The rest of the network config (IP addresses, bonds, etc.) is unchanged from before the upgrade, and running ifup on those interfaces activates them.

Going to the network configuration for the host in RHEV-M and clicking verify and save doesn't change anything.

I've manually gone into the /etc/sysconfig/network-scripts files and changed them all back to ONBOOT=yes.

This cluster was originally built on 3.4 around 8 months ago.

Comment 6 Fabian Deutsch 2015-02-26 19:27:37 UTC
Can someone please provide the contents of /config and /var/lib/vdsm?

Comment 7 Dan Kenigsberg 2015-02-27 12:16:22 UTC
rhev-3.5 hosts intentionally set ONBOOT=no on all networks, except for the management one (rhevm).

This is done in order to move away from ifcfg-based persistence of networking to unified persistence. The network service is expected to start only the management network; other networks are to be started by vdsm seconds later.

I understand that vdsm failed to do that for one reason or another. Could you provide the output of `vdsClient -s 0 getVdsCaps` after boot to verify that? Please also provide vdsm.log and supervdsm.log from the time of the upgrade; I'm trying to download the whole sosreport you provided, but the download is slow and failing.
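
For illustration, a minimal sketch of what a non-management ifcfg file looks like under this scheme; the device name and bridge are hypothetical:

    # /etc/sysconfig/network-scripts/ifcfg-eth1 (illustrative)
    DEVICE=eth1
    # The network service skips this NIC at boot;
    # vdsm brings it up from its own persisted configuration.
    ONBOOT=no
    BRIDGE=vmdata

Only the ifcfg file of the management (rhevm) network keeps ONBOOT=yes.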

Comment 8 Dan Kenigsberg 2015-02-27 12:22:47 UTC
As a workaround, set net_persistence=ifcfg in /etc/vdsm/vdsm.conf and restart vdsmd. After that, vdsm will keep ONBOOT=yes in the ifcfg files. Please report whether this works for the customer at hand.
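
A minimal sketch of this workaround as shell commands, assuming net_persistence belongs in the [vars] section of vdsm.conf:

    # Append the setting and restart vdsm (assumes the [vars] section
    # is not already present in /etc/vdsm/vdsm.conf).
    cat >> /etc/vdsm/vdsm.conf <<'EOF'
    [vars]
    net_persistence = ifcfg
    EOF
    service vdsmd restart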

Comment 9 Fabian Deutsch 2015-02-27 12:58:38 UTC
According to comment 7, this bug is related to vdsm, so I am changing the component and raising the priority, because it breaks updates.

Comment 10 cshao 2015-02-28 03:33:22 UTC
Cancelling needinfo because the current component is vdsm.

Comment 20 Dan Kenigsberg 2015-03-09 16:08:54 UTC
Mike, ONBOOT=no does not surprise me. I *am* surprised that Vdsm does not set up networking properly shortly afterwards. Could you provide your supervdsm.log?

Comment 22 Marina Kalinin 2015-03-09 20:56:45 UTC
I just tried this upgrade with a host that has the rhevm logical network on a mode 1 bond, WITHOUT vlan tagging, and my upgrade was successful.
-> It seems that vlan tags are what causes the problem.
See also:
https://bugzilla.redhat.com/show_bug.cgi?id=1154399#c19

Comment 23 Marina Kalinin 2015-03-17 20:54:07 UTC
I am wondering if this bug is a duplicate of this:
https://bugzilla.redhat.com/show_bug.cgi?id=1194553

Comment 24 Yaniv Lavi 2015-03-18 12:49:00 UTC
*** Bug 1195989 has been marked as a duplicate of this bug. ***

Comment 25 Roman Hodain 2015-03-20 09:19:47 UTC
Hi Fabian,

Can you please check with Danken whether the ifcfg scripts should be set to ONBOOT=yes, given that vdsm handles the networks on its own and sets them up properly?

The problem is the ovirt-post service:


/usr/libexec/ovirt-init-functions.sh:
   1365 start_ovirt_post() {
...
   1458         # Small hack to fix https://bugzilla.redhat.com/show_bug.cgi?id=805313
   1459         service network restart 2>/dev/null

This effectively resets the networking according to the ifcfg scripts.

Comment 26 Fabian Deutsch 2015-03-20 13:23:51 UTC
The /etc/rc* dirs contain:

98ovirt-post
99vdsmd

Thus ovirt-post always starts before vdsmd.

But we have a hook in ovirt-node-plugin-vdsm that starts vdsmd during the run of ovirt-post, and the `service network restart` also happens during ovirt-post.
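
One way to confirm this ordering on a SysV-init host (runlevel 3 and the S-prefix start links are assumed here):

    ls /etc/rc3.d/ | grep -E 'ovirt-post|vdsmd'
    # Expected output (illustrative):
    # S98ovirt-post
    # S99vdsmd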

Comment 27 Fabian Deutsch 2015-03-20 13:25:59 UTC
But if vdsm takes care to unpersist all ifcfg-* files, then there shouldn't be any ifcfg-* files around.

Comment 28 Fabian Deutsch 2015-03-20 13:57:21 UTC
According to Dan, mburman could reproduce this bug with the following steps:

1. Install RHEV-H 6.5
2. Configure networking with:
   network + bridge + vlan + bond via the node TUI
3. Upgrade to RHEV-H 6.6

Comment 29 Dan Kenigsberg 2015-03-20 14:52:00 UTC
(In reply to Fabian Deutsch from comment #27)
> But if vdsm takes care to unpersist all ifcfg-* files, then there shouldn't
> be any ifcfg-* files around.

Vdsm unpersists ifcfg-* files only in ovirt-3.5, after (what vdsm considers) a successful upgrade. The upgrade itself depends upon all network services being up. We attempted to make sure this is the case in bug 1174611. I did not anticipate that ovirt-node might restart the network service while vdsm is starting up.

Comment 30 cshao 2015-03-24 07:16:06 UTC
I can reproduce this issue with the steps below:

Test version:
RHEV-H 6.5-20141017.0.el6ev
RHEV-H 6.6-20150128.0.el6ev
RHEVM vt14.1 (3.5.1-0.2.el6ev)

Test steps:
1. Install RHEV-H 6.5-20141017.0.el6ev
2. Configure networking with: bond + vlan via the node TUI
3. Register to RHEVM
4. Set host to maintenance mode and upgrade to RHEV-H 6.6-20150128.0.el6ev via RHEVM.
5. Check the network configuration after reboot.

Test result:
The host went to a non-operational state, and the bond disappeared.
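
For step 5, commands along these lines can inspect the state after reboot; the bond0 name and the ifcfg path are the usual defaults and are assumed here:

    # Does the kernel still know about the bond?
    cat /proc/net/bonding/bond0
    # Which interfaces are set to start at boot?
    grep -H ONBOOT /etc/sysconfig/network-scripts/ifcfg-*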

Comment 31 cshao 2015-03-24 08:17:30 UTC
Created attachment 1005729 [details]
bond-issue.tag.gz

Comment 34 cshao 2015-03-24 09:55:35 UTC
Created attachment 1005772 [details]
network-fixed.png

Comment 42 Robert McSwain 2015-04-20 13:44:51 UTC
Any updates on this?

Comment 43 Ying Cui 2015-04-21 02:33:37 UTC
(In reply to Robert McSwain from comment #42)
> Any updates on this?

This bug was already cloned to 3.5.z (see comment 41); bug #1206536 is verified.

Comment 50 cshao 2015-10-10 06:31:40 UTC
This bug is blocked by bug 1270177; I will verify it after bug 1270177 is fixed.

Comment 51 cshao 2015-12-16 03:41:22 UTC
Test version:
rhev-hypervisor7-7.2-20151201.2
rhev-hypervisor7-7.2-20151210.1
rhevm-3.6.1.3-0.1.el6

Test steps:
1. Install RHEV-H rhev-hypervisor7-7.2-20151201.2
2. Configure networking with: bond + vlan via the node TUI
3. Register to RHEVM
4. Set host to maintenance mode and upgrade to rhev-hypervisor7-7.2-20151210.1 via RHEVM.
5. Check the network configuration after reboot.
6. Activate the host on the RHEV-M side.

Test result:
RHEV-H comes up after the upgrade.

So the bug is fixed; changing bug status to VERIFIED.

Comment 53 errata-xmlrpc 2016-03-09 14:46:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-0379.html

