Bug 1443347 - Adding rhvh-4.1-20170417.0 to engine failed with bond(active+backup) configured by cockpit
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: urgent
Target Milestone: ovirt-4.2.0
Target Release: ---
Assignee: Edward Haas
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On: 1420708 1472965
Blocks: 1463218
Reported: 2017-04-19 06:35 UTC by dguo
Modified: 2019-05-16 13:06 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1463218
Environment:
Last Closed: 2018-05-15 17:51:25 UTC
oVirt Team: Network
Target Upstream Version:
Embargoed:


Attachments
engine log (403.66 KB, text/plain), 2017-04-19 06:38 UTC, dguo
network-scripts (88.08 KB, application/x-gzip), 2017-04-19 06:50 UTC, dguo
vdsm logs and deploy log (39.53 KB, application/x-gzip), 2017-04-19 06:53 UTC, dguo
network-scripts in previous build 0403 (44.24 KB, application/x-gzip), 2017-04-19 08:24 UTC, dguo
vdsm.log, hosted-engine.log, ifcfg files (47.15 KB, application/x-bzip), 2017-04-20 02:43 UTC, Yihui Zhao
All files of 04017 (144.11 KB, application/x-gzip), 2017-04-20 03:04 UTC, dguo
All files of 0403 (139.64 KB, application/x-gzip), 2017-04-20 03:05 UTC, dguo


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2018:1489 | 0 | None | None | None | 2018-05-15 17:53:40 UTC
oVirt gerrit | 77933 | 0 | master | MERGED | net,static: Configure NM with slaves-order=name | 2017-06-19 05:52:50 UTC
oVirt gerrit | 78362 | 0 | ovirt-4.1 | MERGED | net,static: Configure NM with slaves-order=name | 2017-06-23 05:06:35 UTC

Description dguo 2017-04-19 06:35:21 UTC
Description of problem:
Adding rhvh-4.1-20170417.0 to engine failed with bond(active+backup) configured by cockpit

Version-Release number of selected component (if applicable):
Red Hat Virtualization Manager Version: 4.1.1.8-0.1.el7
redhat-virtualization-host-4.1-20170417.0.x86_64
imgbased-0.9.23-0.1.el7ev.noarch
vdsm-4.19.10.1-1.el7ev.x86_64
cockpit-ovirt-dashboard-0.10.7-0.0.17.el7ev.noarch
cockpit-system-135-4.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install a rhvh4.1
2. Configure bond0(active+backup) via cockpit on rhvh4.1
3. Add this host to engine4.1

Actual results:
1. After step #3, adding the host failed. During the installation process the IP address changed, and after the add failed the IP address disappeared.

Expected results:
1. After step#3, the host can be added successfully

Additional info:
1. Regression: no such issue in the previous build.
2. Also tested with VLAN over bond configured by cockpit; adding failed as well.
3. A host whose bond was configured manually via ifcfg files can be added successfully.

Comment 1 dguo 2017-04-19 06:38:08 UTC
Created attachment 1272497 [details]
engine log

Comment 2 Ryan Barry 2017-04-19 06:39:07 UTC
Is this the same version of rhvm?

Can you grab the engine log and the generated ifcfg files from the previous RHVH build and this one?

The problem is either engine or platform cockpit, but this information is needed for root cause analysis

Comment 3 dguo 2017-04-19 06:50:48 UTC
Created attachment 1272500 [details]
network-scripts

Comment 4 dguo 2017-04-19 06:53:47 UTC
Created attachment 1272501 [details]
vdsm logs and deploy log

Comment 5 cshao 2017-04-19 07:00:40 UTC
NOTE for Regression & Testblocker:
No such issue on the previous version (redhat-virtualization-host-4.1-20170413.), and this bug will block the test scenario of adding RHVH to engine with a bond configured.

Comment 6 Ryan Barry 2017-04-19 07:08:05 UTC
Same engine version? This is critical.

Absolutely nothing changed in RHVH which would affect this (in general, but especially from 0413 to 0417). If the interface comes up properly in Cockpit, I'd also expect engine to work.

Comment 7 dguo 2017-04-19 08:22:53 UTC
(In reply to Ryan Barry from comment #6)

Ryan,

> Same engine version? This is critical.
Yes, same version rhvm-4.1.1.8-0.1.el7

> 
> Absolutely nothing changed in RHVH which would affect this (in general, but
> especially from 0413 to 0417). If the interface comes up properly in
> Cockpit, I'd also expect engine to work.

Correction to comment 5: the RHVH version should be rhvh-4.1-20170403, not 0413. Since 0413 was a respin, I did not test this bond scenario there, so the issue was first found in 0417.

Note that there is one big change between 0403 and 0417: the cockpit version.

On rhvh-4.1-20170403:
cockpit-shell-126-1.el7.noarch
cockpit-ovirt-dashboard-0.10.7-0.0.16.el7ev.noarch

On rhvh-4.1-20170417:
cockpit-ovirt-dashboard-0.10.7-0.0.17.el7ev.noarch
cockpit-system-135-4.el7.noarch

There is also a bug fix for a network issue in Cockpit 132, which might be related:
https://bugzilla.redhat.com/show_bug.cgi?id=1395108
https://bugzilla.redhat.com/show_bug.cgi?id=1420708

Comment 8 dguo 2017-04-19 08:24:57 UTC
Created attachment 1272520 [details]
network-scripts in previous build 0403

Comment 9 Ryan Barry 2017-04-19 13:07:48 UTC
(In reply to dguo from comment #7)
> https://bugzilla.redhat.com/show_bug.cgi?id=1395108
> https://bugzilla.redhat.com/show_bug.cgi?id=1420708

Since this actually works until an attempt to register to engine is made, I expect that Cockpit is actually working here, and the problem is some confusion in the ifcfg scripts, but I'm looking

Comment 10 Ryan Barry 2017-04-19 13:27:33 UTC
It appears that host-deploy is not adding the vlan to ovirtmgmt.

This makes comparison difficult, though, since the previous ifcfg scripts do not contain a VLAN config. Can you please attach new ifcfgs with a matching config?

If "network-scripts.after_add" is without a vlan (ifcfg-bond0 has no vlan config here), then the attachment is more confusing, since before_add has a vlan...

Comment 11 Ryan Barry 2017-04-19 13:34:47 UTC
engine.log and vdsm.log both have messages about SSL handshake errors rather than 'no route to host', so networking is probably up.

Can you please provide the following:

Configure a system with a bond OR bond+vlan, but keep the configuration the same:

ifcfg files 0403 before and after add
ifcfg files 0417 before and after add

host-deploy, vdsm, and engine logs from the failed addition

Comment 12 Yihui Zhao 2017-04-20 02:43:39 UTC
Created attachment 1272833 [details]
vdsm.log, hosted-engine.log, ifcfg files

Comment 13 Yihui Zhao 2017-04-20 02:47:18 UTC
When deploying HE over a bond (or bond+vlan), the bond's IP changed during the deployment.

Uploaded vdsm.log, hosted-engine.log, and the ifcfg files (before setting up bond0, after setting up bond0, and after the HE deploy failed).

Attachment : https://bugzilla.redhat.com/attachment.cgi?id=1272833

Comment 14 Yihui Zhao 2017-04-20 03:02:41 UTC
(In reply to Yihui Zhao from comment #13)
> Deploy the HE with bond(bond+vlan) during the bond's ip changed.
> 
> Upload the vdsm.log , hosted-engine.log, ifcfg files(before setup bond0),
> ifcfg files(setup bond0), ifcfg files(deploy HE failed).
> 
> Attachment : https://bugzilla.redhat.com/attachment.cgi?id=1272833

So, the bug will also block HE testing (HE with bond or bond+vlan).

Comment 15 dguo 2017-04-20 03:04:21 UTC
Created attachment 1272840 [details]
All files of 04017

Comment 16 dguo 2017-04-20 03:05:01 UTC
Created attachment 1272841 [details]
All files of 0403

Comment 17 dguo 2017-04-20 03:10:00 UTC
(In reply to Ryan Barry from comment #11)
> engine.log and vdsm.log both have messages about SSL handshake errors rather
> than 'no route to host', so networking is probably up.
> 
> Can you please provide the following:
> 
> Configure a system with a bond OR bond+vlan, but keep the configuration the
> same:
> 
> ifcfg files 0403 before and after add
> ifcfg files 0417 before and after add
> 
> host-deploy, vdsm, and engine logs from the failed addition

Ryan, I have attached all the required files, sorted into 0403 and 0417.

Comment 18 dguo 2017-04-20 03:39:06 UTC
In all tests done on 0417, we observed the following:
1. Create bond0 over em1 + em2 (em1 was set as the master slave). bond0 got em2's MAC, and its IP was 10.73.131.184.
2. Add the host over bond0. During the installation, bond0's MAC changed to em1's, and its IP became 10.73.131.65.
3. After the add failed, bond0's IP disappeared.

But in the tests done on 0403:
1. bond0 got the MAC of em1 (master), and its IP was 10.73.131.65.
2. Add the host over bond0. The MAC did not change, and the IP was always 10.73.131.65.
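
One way to read these observations: a Linux bond by default takes the MAC address of the first interface enslaved to it, so the enslavement order decides which DHCP lease the bond ends up with. The sketch below only illustrates that reasoning; it is not vdsm or NetworkManager code. The MAC addresses and kernel interface indexes are hypothetical placeholders, and treating the 0417 behavior as "order by kernel index" is an assumption. Only the interface names and IP addresses come from this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nic:
    name: str
    ifindex: int  # hypothetical kernel interface index
    mac: str      # hypothetical MAC address


def bond_mac(slaves, order):
    """Return the MAC a bond would inherit: that of the first slave enslaved.

    order="name" enslaves by interface name (the initscripts order);
    order="index" enslaves by kernel interface index (an assumption here).
    """
    if order == "name":
        key = lambda nic: nic.name
    elif order == "index":
        key = lambda nic: nic.ifindex
    else:
        raise ValueError(f"unknown order: {order!r}")
    return min(slaves, key=key).mac


# Hypothetical values: em2 happens to have the lower kernel index.
em1 = Nic("em1", ifindex=3, mac="00:00:5e:00:53:01")
em2 = Nic("em2", ifindex=2, mac="00:00:5e:00:53:02")

# DHCP leases seen in this report, keyed by the hypothetical MACs above.
leases = {em1.mac: "10.73.131.65", em2.mac: "10.73.131.184"}

for order in ("name", "index"):
    mac = bond_mac([em1, em2], order)
    print(order, mac, leases[mac])
```

Under these assumptions, ordering by name yields em1's MAC and 10.73.131.65 (as seen on 0403), while ordering by index yields em2's MAC and 10.73.131.184 (as seen on 0417 before the add).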

Comment 20 Ryan Barry 2017-04-25 15:06:07 UTC
Reassigning to vdsm for tracking.

The cause of this seems to be a known problem with NM/cockpit changing IPs if the active MAC changes. There are workarounds for this.

Comment 21 Edward Haas 2017-06-07 11:20:23 UTC
The proposed patch (https://gerrit.ovirt.org/77933) should be suitable for RHVH, since VDSM is already installed on it together with the NM configuration file.

Note that the NM configuration that enables adding slaves to a bond in order of the slave names (the same order initscripts uses) will be available in RHEL 7.4, with NM version 1.8.
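
For illustration, the setting named in the patch title is the `slaves-order` key in the `[main]` section of NetworkManager's configuration. The following is a minimal sketch that renders such a snippet; the drop-in file name is a hypothetical placeholder, and the file vdsm actually ships may differ in name and content.

```python
import configparser
import io

# Hypothetical drop-in path; not taken from the vdsm patch.
DROPIN_PATH = "/etc/NetworkManager/conf.d/example-slaves-order.conf"


def render_slaves_order_conf(order="name"):
    """Render a NetworkManager config snippet that sets slaves-order in [main]."""
    parser = configparser.ConfigParser()
    parser["main"] = {"slaves-order": order}
    buf = io.StringIO()
    # Write key=value without spaces around the delimiter.
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


print(f"# would be written to {DROPIN_PATH}")
print(render_slaves_order_conf())
```

The sketch only prints the snippet rather than writing it, so it can be run safely on any machine.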

Comment 23 cshao 2017-07-05 02:49:58 UTC
Moving to MODIFIED status because no 4.2 build is available to verify this bug.

Comment 25 dguo 2017-11-03 07:49:05 UTC
Verified on build rhvh-4.2-0.20171102.0+1 over a bond with no MAC address specified in cockpit.

Test version:
vdsm-4.20.6-1.el7ev.x86_64
rhvh-4.2-0.20171102.0+1
rhvm: 4.2.0-0.4.master.el7
NetworkManager-1.8.0-11.el7_4.x86_64

Test steps:
1. Install rhvh via pxe
2. Log in to cockpit and open the Network page
3. Set up a DHCP bond (active+backup mode) over two NICs; do not specify a MAC address
4. Add rhvh to engine over the bond

Actual result:
1. After step #4, rhvh was added to engine successfully; status is Up

Additional info:
1. If a MAC address is specified in cockpit while configuring the bond, the add can fail; this is tracked in bug 1422430.

Comment 27 Ryan Barry 2018-03-23 00:55:03 UTC
Edward, can you please check that there are no regressions here?

See https://bugzilla.redhat.com/show_bug.cgi?id=1556666#c14

If a MAC is specified, everything goes haywire on reboot: the IP changes, NM reports that ifcfg files are removed and NM is reloaded, etc.

This only happens on the first reboot, and everything works (with the changed IP) every time after that.

See https://bugzilla.redhat.com/show_bug.cgi?id=1556666#c8 for relevant entries

Comment 28 Edward Haas 2018-03-23 06:24:15 UTC
We have covered several bond related issues in 4.2, some have been triggered by RHEL 7.5.
This BZ has been verified on 4.2 while the new BZ is on 4.1.
If I'm not mistaken, the target version for this BZ is 4.2 anyway.

I'm guessing that some new point has been touched.
I would suggest checking if this problem exists on 4.2.
If this is a 4.1 only problem, then we can assume it was resolved in 4.2 and can negotiate backporting some changes.
Here is one that smells related: https://gerrit.ovirt.org/#/c/83399
But there are probably some others as well.

Comment 34 errata-xmlrpc 2018-05-15 17:51:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1489

Comment 35 Franta Kust 2019-05-16 13:06:06 UTC
BZ<2>Jira Resync

