Bug 1414323 - Failed to add host to engine via bond+vlan configured by NM during anaconda
Summary: Failed to add host to engine via bond+vlan configured by NM during anaconda
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: ---
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: urgent
Target Milestone: ovirt-4.1.1
Target Release: 4.19.5
Assignee: Edward Haas
QA Contact: dguo
URL:
Whiteboard:
Depends On:
Blocks: 1447739
 
Reported: 2017-01-18 09:57 UTC by dguo
Modified: 2019-04-28 14:17 UTC (History)
18 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1447739 (view as bug list)
Environment:
Last Closed: 2017-04-21 09:36:03 UTC
oVirt Team: Network
rule-engine: ovirt-4.1+
rule-engine: blocker+
ylavi: testing_plan_complete?


Attachments (Terms of Use)
all logs (917.48 KB, application/x-gzip)
2017-01-18 09:57 UTC, dguo
no flags Details
All logs in engine side and RHVH side (10.48 MB, application/x-gzip)
2017-01-19 09:16 UTC, Huijuan Zhao
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 71204 0 'None' MERGED net: vlan@bond config with ifcfg requires hwaddrs sync check 2020-09-23 09:57:50 UTC
oVirt gerrit 71570 0 'None' MERGED net: vlan@bond config with ifcfg requires hwaddrs sync check 2020-09-23 09:57:46 UTC

Description dguo 2017-01-18 09:57:37 UTC
Created attachment 1242108 [details]
all logs

Description of problem:
Failed to add rhvh4.1 to engine via the bond+vlan configured by NM during anaconda installation

Version-Release number of selected component (if applicable):
Red Hat Virtualization Manager Version: 4.1.0-0.3.beta2.el7 
redhat-virtualization-host-4.1-20170116
imgbased-0.9.4-0.1.el7ev.noarch
vdsm-4.19.1-1.el7ev.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Install a rhvh4.1 via anaconda UI
2. On the network page, first set up bond0 (IPv4 setting: disabled; IPv6 setting: ignore), then set up a VLAN bond0.50 on top of this bond0
3. Reboot the rhvh
4. Log in to the RHVH and set "VDSM/disableNetworkManager=bool:False" in /etc/ovirt-host-deploy.conf.d/90-ngn-do-not-keep-networkmanager.conf
5. Add host to engine
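Step 4 can be sketched as a one-line edit. The sketch below works on a temp copy so it runs outside an RHVH host (the assumed original content, bool:True, is illustrative; on a real host the target is the 90-ngn-do-not-keep-networkmanager.conf file above):

```shell
# Flip disableNetworkManager to False, demonstrated on a temp copy.
conf=$(mktemp)
echo 'VDSM/disableNetworkManager=bool:True' > "$conf"   # assumed original line
sed -i 's|^VDSM/disableNetworkManager=.*|VDSM/disableNetworkManager=bool:False|' "$conf"
result=$(cat "$conf")
echo "$result"
rm -f "$conf"
```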

Actual results:
1. After step #5, add host to engine failed

Expected results:
1. After step #5, the host can be added to engine successfully

Additional info:
1. Both DHCP and static VLAN (over bond) configurations failed

Comment 1 Edward Haas 2017-01-18 12:25:37 UTC
I know there was a problem with one of the vdsm-jsonrpc versions.
You should run with 1.3.6; please check which version you are running there.

From the VDSM side, there is nothing going on there: it never receives a setupNetwork command, and the current caps report includes the bond and VLAN. The ifcfg files also show that they have never been acquired by VDSM (as expected, because no setupNetwork was issued).

Comment 2 dguo 2017-01-18 12:41:46 UTC
(In reply to Edward Haas from comment #1)
> I know there was a problem with one of the vdsm-jsonrpc versions.
> I think you should run with 1.3.6, please check what you run with there.
> 

Yes, I noticed this version issue in another mail thread and have already updated to 1.3.6-1 on my test RHV-M.
[root@rhvm41-vlan50-1 ~]# rpm -qa|grep vdsm-jsonrpc
vdsm-jsonrpc-java-1.3.6-1.el7ev.noarch

> From VDSM side, there is nothing going on there, it never gets a
> setupNetwork command and current caps report include the bond and vlan. The
> ifcfg files also show that they have never been acquired by VDSM (as
> expected, because there was no setupNetwork issued).

However, all the other scenarios can be added to the engine successfully: DHCP and static bond, and DHCP and static VLAN configured during anaconda installation, all work. Only this VLAN over bond fails.

Comment 3 Edward Haas 2017-01-18 14:45:25 UTC
(In reply to dguo from comment #2)

I have tried to go over the Engine logs but they seem to be out of sync with the vdsm logs (different time periods).

Could you please send synced logs and mention from what time to look in the logs?

Comment 4 cshao 2017-01-19 03:23:29 UTC
huzhao,

Could you help provide the logs requested in comment 3, since dguo is on PTO?

Thanks.

Comment 5 Huijuan Zhao 2017-01-19 09:16:12 UTC
Created attachment 1242416 [details]
All logs in engine side and RHVH side

Comment 6 Huijuan Zhao 2017-01-19 09:39:10 UTC
(In reply to Edward Haas from comment #3)
> (In reply to dguo from comment #2)
> 
> I have tried to go over the Engine logs but they seem to be out of sync with
> the vdsm logs (different time periods).
> 
> Could you please send synced logs and mention from what time to look in the
> logs?

Edward, see attachment "All logs in engine side and RHVH side".
For the logs on the engine side:
- engine.log, from 2017-01-19 03:44:32
- ovirt-host-deploy-20170119034458-192.168.50.138-52c3e735.log, which records the RHVH time, from 2017-01-19 08:44:34

Comment 7 Edward Haas 2017-01-19 09:58:54 UTC
(In reply to Huijuan Zhao from comment #6)
> Edward, see attachment " All logs in engine side and RHVH side".
> For the logs in engine side:
> - engine.log, from 2017-01-19 03:44:32
> - ovirt-host-deploy-20170119034458-192.168.50.138-52c3e735.log, record the
> rhvh time, from 2017-01-19 08:44:34

Do you mean that there is a 5 hours difference between the logs?
A record on Engine at 03:44 will show up on the host side at 08:44?

Comment 8 Huijuan Zhao 2017-01-19 10:05:20 UTC
(In reply to Edward Haas from comment #7)
> Do you mean that there is a 5 hours difference between the logs?
> A record on Engine at 03:44 will show up on the host side at 08:44?

Yes

Comment 9 Edward Haas 2017-01-19 10:17:12 UTC
(In reply to Huijuan Zhao from comment #6)

In these logs we do see the setupNetwork.
It shows that the address with which the host was added (DHCP based) changed after the ovirtmgmt bridge was added. I guess the MAC changed; perhaps the bond swapped its MAC with the other slave's MAC.
(To check this, collect the output of "ip link" before adding the host and after, within the 120-second window, since after it we roll back.)

If this is the case, there is not much we can do: the bond MAC may change and we do not control it. (Controlling it would be a full-blown RFE.)
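The before/after comparison suggested above can be made mechanical by pulling the device-to-MAC pairs out of each `ip -o link` capture. A sketch, using a canned two-line capture in place of real output (on the host, generate the captures with `ip -o link > before.txt` and again after adding the host):

```shell
# Extract "device MAC" pairs from an 'ip -o link' style capture so two
# captures can be diffed. The capture below is a canned sample.
cap=$(mktemp)
cat > "$cap" <<'EOF'
11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 link/ether 08:94:ef:21:c0:4f brd ff:ff:ff:ff:ff:ff
12: bond0.50@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 link/ether 12:fb:c4:7c:bd:08 brd ff:ff:ff:ff:ff:ff
EOF
# For each line, print the device name (field 2) and the MAC that
# follows the "link/ether" keyword.
macs=$(awk '{for (i = 1; i <= NF; i++) if ($i == "link/ether") print $2, $(i + 1)}' "$cap")
echo "$macs"
rm -f "$cap"
```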

Comment 10 Edward Haas 2017-01-19 10:19:24 UTC
Here is the relevant line from supervdsm log:

sourceRoute::INFO::2017-01-19 08:45:58,252::sourceroute::76::root::(configure) Configuring gateway - ip: 192.168.50.144, network: 192.168.50.0/24, subnet: 255.255.255.0, gateway: 192.168.50.1, table: 3232248464, device: ovirtmgmt

(Original address was 192.168.50.138)
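As a side note, the table number in that log line appears to be the interface's IPv4 address rendered as an unsigned 32-bit integer (192.168.50.144 -> 3232248464). This is an observation from the log line, not a documented contract; a quick sketch of the conversion:

```shell
# Convert a dotted-quad IPv4 address to its 32-bit integer form, which
# matches the source-route table number in the supervdsm log above.
ip_to_int() {
    oldIFS=$IFS; IFS=.
    set -- $1
    IFS=$oldIFS
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}
table=$(ip_to_int 192.168.50.144)
echo "$table"   # 3232248464
```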

Comment 11 Huijuan Zhao 2017-01-19 16:57:09 UTC
(In reply to Edward Haas from comment #9)

Yes, the MACs of bond0.50 and bond0 changed while adding the host to the engine.

1. Before adding host to engine
# ip link
4: eno3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:4f brd ff:ff:ff:ff:ff:ff
5: eno4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:4f brd ff:ff:ff:ff:ff:ff
11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:4f brd ff:ff:ff:ff:ff:ff
12: bond0.50@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:4f brd ff:ff:ff:ff:ff:ff

2. Intermediate state while adding the host to the engine
# ip link
4: eno3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:4f brd ff:ff:ff:ff:ff:ff
5: eno4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:4f brd ff:ff:ff:ff:ff:ff
11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:4f brd ff:ff:ff:ff:ff:ff
12: bond0.50@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovirtmgmt state UP mode DEFAULT qlen 1000
    link/ether 12:fb:c4:7c:bd:08 brd ff:ff:ff:ff:ff:ff
13: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 7e:c2:98:de:0d:37 brd ff:ff:ff:ff:ff:ff
14: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 12:fb:c4:7c:bd:08 brd ff:ff:ff:ff:ff:ff

3. Final output after adding the host to the engine failed
# ip link
4: eno3: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:4f brd ff:ff:ff:ff:ff:ff
5: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether 08:94:ef:21:c0:50 brd ff:ff:ff:ff:ff:ff
13: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 7e:c2:98:de:0d:37 brd ff:ff:ff:ff:ff:ff
15: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT qlen 1000
    link/ether ca:73:47:3c:e2:21 brd ff:ff:ff:ff:ff:ff
16: bond0.50@bond0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT qlen 1000
    link/ether ca:73:47:3c:e2:21 brd ff:ff:ff:ff:ff:ff

Comment 12 Edward Haas 2017-01-19 18:48:15 UTC
(In reply to Huijuan Zhao from comment #11)
> (In reply to Edward Haas from comment #9)
> 
> Yes, the MAC of bond0.50 and bond0 is changed during adding host to engine.
> 

I am guessing the intermediate state is within the 120 seconds after setupNetwork was applied and before the rollback.

It looks like only the VLAN MAC changed: originally it was the same as the bond's own MAC, but after setupNetworks it is no longer the same as the bond's.
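The divergence can be detected by comparing the `address` attribute of the two devices under /sys/class/net. A minimal sketch, run here against a throwaway fake tree populated with the MACs from comment 11 so it does not require real bond devices:

```shell
# Compare bond and VLAN MACs as one would against /sys/class/net/<dev>/address,
# using a fake tree with the addresses observed in comment 11.
tmp=$(mktemp -d)
mkdir -p "$tmp/bond0" "$tmp/bond0.50"
echo '08:94:ef:21:c0:4f' > "$tmp/bond0/address"
echo '12:fb:c4:7c:bd:08' > "$tmp/bond0.50/address"  # the drifted VLAN MAC

mac_of() { cat "$1/address"; }
if [ "$(mac_of "$tmp/bond0")" = "$(mac_of "$tmp/bond0.50")" ]; then
    status="in sync"
else
    status="out of sync"
fi
echo "bond0 vs bond0.50: MACs $status"
rm -rf "$tmp"
```

On a real host the same check would read /sys/class/net/bond0/address and /sys/class/net/bond0.50/address directly.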

Comment 13 Edward Haas 2017-01-24 08:38:18 UTC
It seems this is related to the order of actions taken to set up the device hierarchy.
With NM, the order and the end result are consistent: all addresses end up the same.

When doing the same with the ip tool, the inconsistency caused by the step order can be seen:


All MAC addresses are identical:

ip link add dummy_88 type dummy
ip link add dummy_99 type dummy
ip link add bond99 type bond
ip link set dummy_88 master bond99
ip link set dummy_99 master bond99
ip link add link bond99 name bond99.101 type vlan id 101


The VLAN MAC address differs from all the rest:

ip link add dummy_88 type dummy
ip link add dummy_99 type dummy
ip link add bond99 type bond
ip link add link bond99 name bond99.101 type vlan id 101
ip link set dummy_88 master bond99
ip link set dummy_99 master bond99


We need to investigate how to ensure that the VLAN interface is created only after the slaves have been enslaved to the bond.

Comment 14 Edward Haas 2017-01-25 08:35:33 UTC
Unfortunately, I could not reproduce it.
Could you reproduce it on a VM and share the image with us?

In the original comment, it was mentioned that this fails with a static IP as well. The logs included the DHCP scenario and we identified it as related to the MAC address change, but I do not expect MAC changes to have the same effect with a static IP.
Could you please also add logs for the static IP scenario, to check whether it is the same problem. Please also add the 'ip link' output as you did last time so we can see whether it is the same MAC address issue.

Comment 15 Edward Haas 2017-01-26 07:42:41 UTC
We have managed to reproduce this here with mburman's help; for the moment we do not need the VM requested in comment 14.

Comment 16 dguo 2017-02-27 06:34:22 UTC
Just verified on rhvh-4.1-20170222.0 (vdsm-4.19.6-1.el7ev).

