Bug 1447739 - Failed to add host to engine via bond+vlan configured by NM during anaconda
Summary: Failed to add host to engine via bond+vlan configured by NM during anaconda
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.0.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Dan Kenigsberg
QA Contact: dguo
URL:
Whiteboard:
Depends On: 1414323
Blocks:
Reported: 2017-05-03 15:50 UTC by Marina Kalinin
Modified: 2024-03-25 14:59 UTC
CC List: 22 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1414323
Environment:
Last Closed: 2017-05-10 09:11:53 UTC
oVirt Team: Network
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 71204 0 None None None 2017-05-03 15:50:51 UTC
oVirt gerrit 71570 0 None None None 2017-05-03 15:50:51 UTC

Description Marina Kalinin 2017-05-03 15:50:51 UTC
+++ This bug was initially created as a clone of Bug #1414323 +++

Description of problem:
Failed to add RHVH 4.1 to the engine over a bond+VLAN configured by NetworkManager during the Anaconda installation.

Version-Release number of selected component (if applicable):
Red Hat Virtualization Manager Version: 4.1.0-0.3.beta2.el7 
redhat-virtualization-host-4.1-20170116
imgbased-0.9.4-0.1.el7ev.noarch
vdsm-4.19.1-1.el7ev.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Install RHVH 4.1 via the Anaconda UI.
2. On the network page, first set up bond0 (IPv4 setting: Disabled; IPv6 setting: Ignore), then set up VLAN bond0.50 on top of bond0.
3. Reboot RHVH.
4. Log in to RHVH and set "VDSM/disableNetworkManager=bool:False" in /etc/ovirt-host-deploy.conf.d/90-ngn-do-not-keep-networkmanager.conf.
5. Add the host to the engine.
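Step 4 can be scripted. The sketch below is a hypothetical helper (set_disable_nm is our own name, not part of RHVH or ovirt-host-deploy) that rewrites the VDSM/disableNetworkManager line in place with GNU sed. It assumes the key already exists on its own line in the drop-in file and refuses to touch the file otherwise, rather than guessing where a new line should go.

```shell
#!/bin/sh
# Hypothetical helper (not part of RHVH): set VDSM/disableNetworkManager to
# VALUE in the drop-in FILE. Assumes the key already exists on its own line;
# relies on GNU sed's in-place mode (-i).
set_disable_nm() {
    value=$1   # e.g. bool:False
    file=$2    # e.g. the 90-ngn-do-not-keep-networkmanager.conf drop-in
    if ! grep -q '^VDSM/disableNetworkManager=' "$file"; then
        echo "VDSM/disableNetworkManager not found in $file" >&2
        return 1
    fi
    # Replace the whole key=value line, leaving every other line untouched.
    sed -i "s|^VDSM/disableNetworkManager=.*|VDSM/disableNetworkManager=${value}|" "$file"
}
```

Usage: `set_disable_nm bool:False /etc/ovirt-host-deploy.conf.d/90-ngn-do-not-keep-networkmanager.conf`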

Actual results:
1. After step #5, adding the host to the engine fails.

Expected results:
1. After step #5, the host can be added to the engine successfully.

Additional info:
1. Adding the host fails with both DHCP and static addressing on the VLAN (over the bond).

--- Additional comment from Edward Haas on 2017-01-18 07:25:37 EST ---

I know there was a problem with one of the vdsm-jsonrpc versions.
I think you should run with 1.3.6, please check what you run with there.

From VDSM side, there is nothing going on there: it never gets a setupNetwork command, and the current caps report includes the bond and vlan. The ifcfg files also show that they have never been acquired by VDSM (as expected, because no setupNetwork was issued).

--- Additional comment from dguo on 2017-01-18 07:41:46 EST ---

(In reply to Edward Haas from comment #1)
> I know there was a problem with one of the vdsm-jsonrpc versions.
> I think you should run with 1.3.6, please check what you run with there.
> 

Yes, I had noticed this version issue in another mail thread and have already replaced it with 1.3.6-1 on my test RHV-M.
[root@rhvm41-vlan50-1 ~]# rpm -qa|grep vdsm-jsonrpc
vdsm-jsonrpc-java-1.3.6-1.el7ev.noarch

> From VDSM side, there is nothing going on there: it never gets a
> setupNetwork command, and the current caps report includes the bond and
> vlan. The ifcfg files also show that they have never been acquired by VDSM
> (as expected, because no setupNetwork was issued).

However, all the other scenarios can be added to the engine successfully.
With a DHCP or static bond, or a DHCP or static VLAN, configured during Anaconda installation, the host is added successfully; only VLAN over bond fails.

--- Additional comment from Edward Haas on 2017-01-18 09:45:25 EST ---

(In reply to dguo from comment #2)

I have tried to go over the Engine logs but they seem to be out of sync with the vdsm logs (different time periods).

Could you please send synced logs and mention from what time to look in the logs?

--- Additional comment from shaochen on 2017-01-18 22:23:29 EST ---

huzhao,

Could you help provide the logs requested in #c3, since dguo is on PTO?

Thanks.

--- Additional comment from Huijuan Zhao on 2017-01-19 04:16 EST ---



--- Additional comment from Huijuan Zhao on 2017-01-19 04:39:10 EST ---

(In reply to Edward Haas from comment #3)
> (In reply to dguo from comment #2)
> 
> I have tried to go over the Engine logs but they seem to be out of sync with
> the vdsm logs (different time periods).
> 
> Could you please send synced logs and mention from what time to look in the
> logs?

Edward, see attachment "All logs in engine side and RHVH side".
For the logs on the engine side:
- engine.log, from 2017-01-19 03:44:32
- ovirt-host-deploy-20170119034458-192.168.50.138-52c3e735.log, timestamps in RHVH time, from 2017-01-19 08:44:34

--- Additional comment from Edward Haas on 2017-01-19 04:58:54 EST ---

(In reply to Huijuan Zhao from comment #6)
> Edward, see attachment "All logs in engine side and RHVH side".
> For the logs on the engine side:
> - engine.log, from 2017-01-19 03:44:32
> - ovirt-host-deploy-20170119034458-192.168.50.138-52c3e735.log, timestamps
> in RHVH time, from 2017-01-19 08:44:34

Do you mean that there is a 5 hours difference between the logs?
A record on Engine at 03:44 will show up on the host side at 08:44?

--- Additional comment from Huijuan Zhao on 2017-01-19 05:05:20 EST ---

(In reply to Edward Haas from comment #7)
> Do you mean that there is a 5 hours difference between the logs?
> A record on Engine at 03:44 will show up on the host side at 08:44?

Yes

--- Additional comment from Edward Haas on 2017-01-19 05:17:12 EST ---

(In reply to Huijuan Zhao from comment #6)

In these logs we do see the setupNetwork.
It shows that the (DHCP-based) address with which the host was added changed after the ovirtmgmt bridge was added. I guess the MAC changed; perhaps the bond swapped its MAC for the other slave's MAC.
(To check this, collect the output of "ip link" before adding the host and after it, within the 120 sec window, since we roll back after that.)

If this is the case, there is not much we can do: the bond MAC may change and we do not control it (doing so would be a full-blown RFE).
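A minimal sketch of collecting those "ip link" snapshots, assuming a POSIX shell with iproute2 on the host. capture_links is a hypothetical helper: it prints a UTC timestamp followed by `ip -o link show` output COUNT times, INTERVAL seconds apart, so that a MAC change inside the 120 sec window shows up between two consecutive snapshots.

```shell
#!/bin/sh
# Hypothetical helper: print COUNT timestamped snapshots of "ip -o link show",
# INTERVAL seconds apart, so a MAC change inside the 120 s window before the
# rollback appears between two consecutive snapshots.
capture_links() {
    count=$1      # number of snapshots to take
    interval=$2   # seconds to sleep between snapshots
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "=== $(date -u '+%Y-%m-%d %H:%M:%S') UTC ==="
        ip -o link show   # one line per interface, including its link/ether MAC
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `capture_links 30 5 > /var/tmp/links.log &`, started just before adding the host, covers about two and a half minutes. Running it from the console or under nohup is probably safer than from an SSH session over the management network, since that session may drop when the address changes.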

--- Additional comment from Edward Haas on 2017-01-19 05:19:24 EST ---

Here is the relevant line from supervdsm log:

sourceRoute::INFO::2017-01-19 08:45:58,252::sourceroute::76::root::(configure) Configuring gateway - ip: 192.168.50.144, network: 192.168.50.0/24, subnet: 255.255.255.0, gateway: 192.168.50.1, table: 3232248464, device: ovirtmgmt

(Original address was 192.168.50.138)
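As a sanity check on the log line above: the routing-table number 3232248464 is exactly 192.168.50.144 read as a 32-bit integer (192·2^24 + 168·2^16 + 50·2^8 + 144), whereas the original address 192.168.50.138 would give 3232248458. We have not checked the VDSM source, so "table ID = address as an integer" is an observation here, not a documented rule. A small POSIX shell helper (hypothetical name ip_to_int) does the conversion:

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted-quad IPv4 address to its unsigned
# 32-bit integer value. The subshell body keeps the IFS change local.
ip_to_int() (
    IFS=.          # split "a.b.c.d" on dots
    set -- $1      # $1..$4 now hold the four octets
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)
```

`ip_to_int 192.168.50.144` prints 3232248464, matching the table number in the supervdsm log.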

--- Additional comment from Huijuan Zhao on 2017-01-19 11:57:09 EST ---

(In reply to Edward Haas from comment #9)

Yes, the MACs of bond0.50 and bond0 change while the host is being added to the engine.

1. Before adding the host to the engine
# ip link
4: eno3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:4f brd ff:ff:ff:ff:ff:ff
5: eno4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:4f brd ff:ff:ff:ff:ff:ff
11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:4f brd ff:ff:ff:ff:ff:ff
12: bond0.50@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:4f brd ff:ff:ff:ff:ff:ff

2. Intermediate state while the host is being added to the engine
# ip link
4: eno3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:4f brd ff:ff:ff:ff:ff:ff
5: eno4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:4f brd ff:ff:ff:ff:ff:ff
11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:4f brd ff:ff:ff:ff:ff:ff
12: bond0.50@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovirtmgmt state UP mode DEFAULT qlen 1000
    link/ether 12:fb:c4:7c:bd:08 brd ff:ff:ff:ff:ff:ff
13: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 7e:c2:98:de:0d:37 brd ff:ff:ff:ff:ff:ff
14: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 12:fb:c4:7c:bd:08 brd ff:ff:ff:ff:ff:ff

3. Final output after adding the host to the engine failed
# ip link
4: eno3: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:4f brd ff:ff:ff:ff:ff:ff
5: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:50 brd ff:ff:ff:ff:ff:ff
13: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 7e:c2:98:de:0d:37 brd ff:ff:ff:ff:ff:ff
15: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT qlen 1000
    link/ether ca:73:47:3c:e2:21 brd ff:ff:ff:ff:ff:ff
16: bond0.50@bond0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT qlen 1000
    link/ether ca:73:47:3c:e2:21 brd ff:ff:ff:ff:ff:ff

--- Additional comment from Edward Haas on 2017-01-19 13:48:15 EST ---

(In reply to Huijuan Zhao from comment #11)
> (In reply to Edward Haas from comment #9)
> 
> Yes, the MACs of bond0.50 and bond0 change while the host is being added to the engine.
> 

I am guessing that the middle state was captured during the 120 sec after the setupNetwork was applied and before the rollback.

It looks like only the VLAN MAC has changed: originally it was the same as the bond's MAC, but after setupNetworks it changed and no longer matches the bond's MAC.
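This observation can be checked mechanically against pasted "ip link" output. The sketch below uses two hypothetical helpers, link_macs and compare_vlan_mac, built on awk. It assumes the two-line-per-interface layout shown in the dumps above and strips the "@parent" suffix from VLAN names, so "bond0.50@bond0" is looked up as "bond0.50".

```shell
#!/bin/sh
# Hypothetical helpers for comparing MACs in pasted "ip link" output.
# Assumes the two-line layout: "N: name: <FLAGS> ..." followed by
# "    link/ether MAC brd ...".

# Print "<ifname> <mac>" for every interface that has a link/ether line.
link_macs() {
    awk '
        /^[0-9]+: / {
            name = $2
            sub(/:$/, "", name)    # "bond0:" -> "bond0"
            sub(/@.*/, "", name)   # "bond0.50@bond0" -> "bond0.50"
            next
        }
        $1 == "link/ether" && name != "" { print name, $2; name = "" }
    '
}

# Read "ip link" output on stdin; print "same", "different" or "missing"
# depending on whether VLAN and PARENT share a MAC address.
compare_vlan_mac() {
    vlan=$1
    parent=$2
    macs=$(link_macs)
    vmac=$(printf '%s\n' "$macs" | awk -v d="$vlan" '$1 == d { print $2 }')
    pmac=$(printf '%s\n' "$macs" | awk -v d="$parent" '$1 == d { print $2 }')
    if [ -z "$vmac" ] || [ -z "$pmac" ]; then
        echo missing
    elif [ "$vmac" = "$pmac" ]; then
        echo same
    else
        echo different
    fi
}
```

On a live host this would be `ip link | compare_vlan_mac bond0.50 bond0`. Fed the "before" dump it prints "same", and fed the intermediate dump it prints "different".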

--- Additional comment from Edward Haas on 2017-01-24 03:38:18 EST ---

It seems that this is related to the order of the actions taken to set up the device hierarchy.
With NM, the order and the end result seem to be consistent: all addresses are the same.

When doing the same with the ip tool, the inconsistency due to the step order can be seen:


All mac addresses are identical:

ip link add dummy_88 type dummy
ip link add dummy_99 type dummy
ip link add bond99 type bond
ip link set dummy_88 master bond99
ip link set dummy_99 master bond99
ip link add link bond99 name bond99.101 type vlan id 101


VLAN mac address is different from all the rest:

ip link add dummy_88 type dummy
ip link add dummy_99 type dummy
ip link add bond99 type bond
ip link add link bond99 name bond99.101 type vlan id 101
ip link set dummy_88 master bond99
ip link set dummy_99 master bond99


We need to investigate how to ensure that the VLAN interface is created after the slaves have been enslaved to the bond.
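To see which outcome one of the two `ip` orderings above produced, one option is to compare the MACs directly in sysfs after running it (as root, on a scratch host). vlan_mac_matches_parent is a hypothetical helper: it reads /sys/class/net/<dev>/address and returns 0 when the VLAN and its parent share a MAC, 1 when they differ, and 2 if either device is missing. SYSFS_NET is an override added only so the function can be exercised without real devices.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if VLAN ($1) and PARENT ($2) report the same
# MAC in sysfs, 1 if they differ, 2 if either device cannot be read.
# SYSFS_NET is a test-only override; it defaults to /sys/class/net.
vlan_mac_matches_parent() {
    net=${SYSFS_NET:-/sys/class/net}
    vlan_mac=$(cat "$net/$1/address") || return 2
    parent_mac=$(cat "$net/$2/address") || return 2
    [ "$vlan_mac" = "$parent_mac" ]
}
```

For example, `vlan_mac_matches_parent bond99.101 bond99 && echo same || echo different`. Based on the observation above, the first ordering would be expected to report "same" and the second "different".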

--- Additional comment from Edward Haas on 2017-01-25 03:35:33 EST ---

Unfortunately, I could not reproduce it.
Could you reproduce it on a VM and share the image with us?

In the original comment, it was mentioned that this fails with a static IP as well. The logs covered the DHCP scenario, which we identified as related to the MAC address change, but I do not expect a MAC change to have the same effect with a static IP.
Could you please also add logs for the static IP scenario, just to check if this is the same problem or not. Please also add the 'ip link' output as you did last time so we can see if this is the same mac address issue.

--- Additional comment from Edward Haas on 2017-01-26 02:42:41 EST ---

We seem to have managed to recreate this here with the help of mburman; for the moment we do not need the VM requested in comment 14.

--- Additional comment from  on 2017-02-27 01:34:22 EST ---

Just verified on rhvh-4.1-20170222.0 (vdsm-4.19.6-1.el7ev).

