Bug 1491561 - Failed to add host to engine via dhcp bond(active+backup) configured by NM during anaconda installation
Summary: Failed to add host to engine via dhcp bond(active+backup) configured by NM during anaconda installation
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.19.30
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ovirt-4.1.7
Target Release: 4.19.35
Assignee: Edward Haas
QA Contact: dguo
URL:
Whiteboard:
Depends On: 1463218
Blocks:
 
Reported: 2017-09-14 07:26 UTC by dguo
Modified: 2017-11-13 12:29 UTC
CC List: 17 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-13 12:29:08 UTC
oVirt Team: Network
Embargoed:
rule-engine: ovirt-4.1+
rule-engine: blocker+


Attachments
All the logs including engine.log vdsm.log host-deploy.log ifcfg-files (226.38 KB, application/x-gzip)
2017-09-14 07:26 UTC, dguo

Description dguo 2017-09-14 07:26:22 UTC
Created attachment 1325786 [details]
All the logs including engine.log vdsm.log host-deploy.log ifcfg-files

Description of problem:
Adding the host to the engine over a DHCP bond failed; the bond was configured during the anaconda installation.


Version-Release number of selected component (if applicable):
redhat-release-virtualization-host-4.1-6.0.el7.x86_64
vdsm-4.19.31-1.el7ev.x86_64
Red Hat Virtualization Manager Version: 4.1.6.2-0.1.el7

How reproducible:
100%

Steps to Reproduce:
1. Install rhvh via anaconda
2. On the network page, add a DHCP bond (active-backup) containing two slaves (em1 and em2); see the kickstart sketch below for a non-interactive equivalent
3. After the installation, add rhvh to engine over this bond
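
For reference, an equivalent bond can be set up non-interactively from a kickstart file (a sketch only; the reproduction above used the interactive anaconda network spoke, and the miimon value is an assumed example, not taken from the affected host):

network --device=bond0 --bondslaves=em1,em2 --bondopts=mode=active-backup,miimon=100 --bootproto=dhcp --activate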

Actual results:
1. After step #3, configuring the management network on the host failed, and the bond's IP address disappeared

Expected results:
1. After step #3, the host is added to the engine successfully

Additional info:
1. This issue cannot be reproduced with a static bond configured during anaconda
2. This might be a regression, since the problem was fixed after vdsm-4.19.1; please refer to bug 1400784.

Comment 1 Tomas Jelinek 2017-09-14 09:36:55 UTC
sounds like network?

Comment 2 Dan Kenigsberg 2017-09-14 20:42:24 UTC
Isn't this precisely bug 1463218, Edy ?

dguo, I don't suppose you wanted to refer to Bug 1400784 - Clarify the scenario in section 4.4. Accessing a CIFS share with SSSD.

Comment 3 Red Hat Bugzilla Rules Engine 2017-09-14 20:42:31 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 4 dguo 2017-09-15 01:44:51 UTC
(In reply to Dan Kenigsberg from comment #2)
> Isn't this precisely bug 1463218, Edy ?
> 
> dguo, I don't suppose you wanted to refer to Bug 1400784 - Clarify the
> scenario in section 4.4. Accessing a CIFS share with SSSD.

Sorry, it's Bug 1400874 - Failed to add rhvh to engine via dhcp or static bond configured by NM during anaconda

Comment 5 Edward Haas 2017-09-17 05:49:43 UTC
> Isn't this precisely bug 1463218, Edy ?
It may be, but it requires more investigation to confirm.

I know of at least 3 issues at the moment:
- An NM bug that affects bonds (it should be on its way to the z-stream, from what I could understand). This is the intended fix for BZ#1463218.
- The default NM bond slave ordering is not the one we use once VDSM is installed. Therefore, if the bond is set up before the NM config file is updated, the slave order may differ from the one after VDSM is deployed.
- If the MAC address of the bond is defined by configuration, VDSM ignores it; therefore, the MAC selected for the bond may be different once VDSM is deployed. (We have an RFE for this one.)

Repeating the same check we had here will help clarify things:
https://bugzilla.redhat.com/show_bug.cgi?id=1463218#c35
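
For convenience, the bond state can be captured before and after host deployment with a few commands (a sketch; the interface and file names are taken from this report and may differ on other hosts):

cat /proc/net/bonding/bond0              # active slave, slave order, per-slave MACs
cat /sys/class/net/bond0/bonding/slaves  # kernel's view of the slave ordering
cat /sys/class/net/bond0/address         # MAC address currently used by the bond
grep -iE 'BONDING_OPTS|MACADDR' /etc/sysconfig/network-scripts/ifcfg-bond0

Comparing these snapshots should show whether the slave order or the bond MAC changes once VDSM takes ownership of the configuration.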

Comment 6 dguo 2017-09-21 07:51:53 UTC
Before adding the host to the engine:
[root@dell-per515-01 ~]# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 256
    link/ether 00:c0:dd:20:13:e8 brd ff:ff:ff:ff:ff:ff
3: em1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 08:9e:01:63:2c:b2 brd ff:ff:ff:ff:ff:ff
4: em2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 08:9e:01:63:2c:b2 brd ff:ff:ff:ff:ff:ff
5: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:1b:21:a6:3d:7a brd ff:ff:ff:ff:ff:ff
6: p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:1b:21:a6:3d:7b brd ff:ff:ff:ff:ff:ff
7: p2p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:1b:21:a6:64:6c brd ff:ff:ff:ff:ff:ff
8: p2p2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 00:1b:21:a6:64:6d brd ff:ff:ff:ff:ff:ff
9: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 08:9e:01:63:2c:b2 brd ff:ff:ff:ff:ff:ff
    inet 10.73.73.17/22 brd 10.73.75.255 scope global dynamic bond0
       valid_lft 40924sec preferred_lft 40924sec
    inet6 2620:52:0:4948:80fa:f283:f67e:d17a/64 scope global noprefixroute dynamic 
       valid_lft 2591946sec preferred_lft 604746sec
    inet6 fe80::2db1:4a61:1481:77a8/64 scope link 
       valid_lft forever preferred_lft forever


While the host is being installed (added to the engine) and before the 120s timeout, the IP addresses:
[root@localhost ~]# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 256
    link/ether 00:c0:dd:20:13:e8 brd ff:ff:ff:ff:ff:ff
3: em1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether f2:ad:61:24:5f:c4 brd ff:ff:ff:ff:ff:ff
4: em2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether f2:ad:61:24:5f:c4 brd ff:ff:ff:ff:ff:ff
5: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:1b:21:a6:3d:7a brd ff:ff:ff:ff:ff:ff
6: p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:1b:21:a6:3d:7b brd ff:ff:ff:ff:ff:ff
7: p2p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:1b:21:a6:64:6c brd ff:ff:ff:ff:ff:ff
8: p2p2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 00:1b:21:a6:64:6d brd ff:ff:ff:ff:ff:ff
9: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovirtmgmt state UP qlen 1000
    link/ether f2:ad:61:24:5f:c4 brd ff:ff:ff:ff:ff:ff
25: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 76:31:ee:d6:18:95 brd ff:ff:ff:ff:ff:ff
26: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether f2:ad:61:24:5f:c4 brd ff:ff:ff:ff:ff:ff
    inet 10.73.75.16/22 brd 10.73.75.255 scope global dynamic ovirtmgmt
       valid_lft 43105sec preferred_lft 43105sec
    inet6 2620:52:0:4948:f0ad:61ff:fe24:5fc4/64 scope global mngtmpaddr dynamic 
       valid_lft 2591906sec preferred_lft 604706sec
    inet6 fe80::f0ad:61ff:fe24:5fc4/64 scope link 
       valid_lft forever preferred_lft forever

In the end, the IP address disappeared.

It seems to be the same issue as bug 1463218.

Comment 7 Edward Haas 2017-10-15 09:27:40 UTC
The release containing the NM fix is expected on the 17th of Oct and should include NetworkManager-1.8.0-10.el7_4.

The first RHV-H image that includes it should resolve this BZ.

Comment 8 dguo 2017-10-25 10:38:22 UTC
Verified on rhvh-4.1-0.20171024.0+1

Test version:
Red Hat Virtualization Manager Version: 4.1.7.4-0.1.el7
[root@localhost ~]# rpm -q redhat-release-virtualization-host
redhat-release-virtualization-host-4.1-7.0.el7.x86_64
[root@localhost ~]# imgbase w
You are on rhvh-4.1-0.20171024.0+1
[root@localhost ~]# rpm -q NetworkManager
NetworkManager-1.8.0-11.el7_4.x86_64
[root@localhost ~]# rpm -q vdsm
vdsm-4.19.35-1.el7ev.x86_64

Test steps:
1. Install rhvh via anaconda
2. On the network page, add a DHCP bond (active-backup) containing two slaves (em1 and em2)
3. After the installation, add rhvh to engine over this bond

Actual result:
1. After step #3, RHVH was added to the engine successfully

