Bug 1523661 - When adding the host over an existing bond-vlan, it loses network connection after reboot.
Summary: When adding the host over an existing bond-vlan, it loses network connection...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.20.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ovirt-4.2.1
Assignee: Edward Haas
QA Contact: Michael Burman
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-12-08 14:51 UTC by Bernhard Seidl
Modified: 2018-02-12 11:56 UTC
CC List: 4 users

Fixed In Version: upstream 4.2.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-12 11:56:14 UTC
oVirt Team: Network
Embargoed:
rule-engine: ovirt-4.2+


Attachments
Logfiles (1.13 MB, application/x-xz), 2017-12-08 16:18 UTC, Bernhard Seidl


Links
oVirt gerrit 85243 (master, MERGED): net: Overwrite bond ifcfg file on external bond acquire - last updated 2017-12-11 21:23:46 UTC

Description Bernhard Seidl 2017-12-08 14:51:20 UTC
Description of problem:
Cluster member loses network connection after reboot. The network is set up using bonding and VLANs.

Version-Release number of selected component (if applicable):
ovirt-node-ng-installer-master-2017120709.iso
4.2.1-0.0.master.20171206161426.git88e9120.el7.centos
Might also apply to 4.2.0 branch

How reproducible:


Steps to Reproduce:
1. Set up an oVirt node with bonding and one VLAN on top for management, on three nodes
2. Set up a self-hosted engine with Gluster
3. Complete cluster setup by adding hosts and storage domains
4. Set one of the hosts in maintenance mode
5. Reboot this host

Actual results:
Reboot completes and the host is not accessible via the network.

Expected results:
Reboot completes and the host is available again.

Additional info:
Node network:
em3 + em4 => bond0 => bond0.78 => ovirtmgmt (a configuration sketch follows below)
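For reference, a pre-deployment configuration matching this topology would look roughly like the following initscripts sketch (interface names and VLAN ID are taken from the line above; the bonding mode and boot protocol are illustrative assumptions, not values taken from this host):

# /etc/sysconfig/network-scripts/ifcfg-em3 (and similarly ifcfg-em4)
DEVICE=em3
MASTER=bond0
SLAVE=yes
ONBOOT=yes
NM_CONTROLLED=no

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BONDING_OPTS='mode=active-backup miimon=100'
ONBOOT=yes
BOOTPROTO=none
NM_CONTROLLED=no

# /etc/sysconfig/network-scripts/ifcfg-bond0.78
DEVICE=bond0.78
VLAN=yes
ONBOOT=yes
BOOTPROTO=dhcp
NM_CONTROLLED=no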

Comment 1 Dan Kenigsberg 2017-12-08 15:08:53 UTC
Please attach supervdsm and vdsm logs so we can look at the problem.
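The logs normally live under /var/log/vdsm/ on the host; assuming the default log locations, something like the following collects them for attachment:

# tar cJf vdsm-logs.tar.xz /var/log/vdsm/vdsm.log* /var/log/vdsm/supervdsm.log*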

Comment 2 Bernhard Seidl 2017-12-08 16:18:06 UTC
Created attachment 1364959 [details]
Logfiles

Comment 3 Edward Haas 2017-12-10 15:19:44 UTC
It seems that we are not correctly acquiring an external bond when it has a VLAN on top of it.
I'm not exactly sure why it causes a disconnection in the presented scenario, but fixing the acquisition should result in a more predictable setup after reboot.
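The fix tracked in gerrit 85243 ("net: Overwrite bond ifcfg file on external bond acquire") makes VDSM rewrite the bond's ifcfg file when it acquires an external bond. A rough way to check whether a bond has been acquired, assuming the standard initscripts paths, is to look for the VDSM-generated header and for the persisted configuration:

# grep -l 'Generated by VDSM' /etc/sysconfig/network-scripts/ifcfg-bond0*
# ls /var/lib/vdsm/persistence/netconf/bonds/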

Comment 4 Bernhard Seidl 2017-12-11 14:03:23 UTC
I just investigated a bit further. It seems that bond0 had been set down. After executing the following command, the connection is working again:

# ip link set dev bond0 up
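For reference, the bond's operational state before and after this workaround can be checked with, for example:

# cat /sys/class/net/bond0/operstate
# cat /proc/net/bonding/bond0    # shows the bonding mode, slaves and their link status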

Comment 5 Bernhard Seidl 2018-01-02 13:30:24 UTC
Just tested.

Version used: ovirt-node-ng-installer-master-2018010109

Result: Works, no errors

Comment 6 Michael Burman 2018-01-14 10:49:11 UTC
Verified on: vdsm-4.20.13-1.el7ev.x86_64, 4.2.1.1-0.1.el7 and cockpit-155-1.el7.x86_64

The scenario is:

1) Create bond1 with an IP, and bond1.162 on top of it with an IP as well (created via cockpit).
Both hosts have a default route before the host is added:
[root@silver-vdsb yum.repos.d]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.35.1x8.254   0.0.0.0         UG    300    0        0 bond1
0.0.0.0         10.35.1x9.254   0.0.0.0         UG    400    0        0 bond1.162
10.35.1x8.0     0.0.0.0         255.255.255.0   U     300    0        0 bond1
10.35.1x9.0     0.0.0.0         255.255.255.0   U     400    0        0 bond1.162

[root@silver-vdsb yum.repos.d]# ip -4 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
6: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    inet 10.35.1x8.x/24 brd 10.35.1x8.255 scope global dynamic bond1
       valid_lft 42379sec preferred_lft 42379sec
7: bond1.162@bond1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    inet 10.35.1x9.x/24 brd 10.35.1x9.255 scope global dynamic bond1.162
       valid_lft 42409sec preferred_lft 42409sec

[root@silver-vdsb yum.repos.d]# ping -I bond1 8.8.8.8
PING 8.8.8.8 (8.8.8.8) from 10.35.1x8.x bond1: 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=48 time=62.5 ms

[root@silver-vdsb yum.repos.d]# ping -I bond1.162 8.8.8.8
PING 8.8.8.8 (8.8.8.8) from 10.35.1x9.x bond1.162: 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=48 time=128 ms

2) Add the host to RHV on top of the VLAN bond bond1.162.
3) The ovirtmgmt network is configured on top of bond1.162:
[root@silver-vdsb ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;             8000.000000000000       no
ovirtmgmt             8000.001d096871c1       no              bond1.162

4) vdsm takes ownership of bond1:
[root@silver-vdsb ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond1
# Generated by VDSM version 4.20.13-1.el7ev
DEVICE=bond1
BONDING_OPTS='mode=1 miimon=100 primary=eno1'
MACADDR=00:1d:09:68:71:c1
ONBOOT=yes
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
[root@silver-vdsb ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond1.162 
# Generated by VDSM version 4.20.13-1.el7ev
DEVICE=bond1.162
VLAN=yes
BRIDGE=new-default
ONBOOT=yes
MTU=1500
DEFROUTE=no
NM_CONTROLLED=no
IPV6INIT=no

5) and persists the MAC address (hwaddr) in /var/lib/vdsm/persistence/netconf/bonds/bond1:
{
    "hwaddr": "00:1d:09:68:71:c1", 
    "nics": [
        "eno1", 
        "eno2"
    ], 
    "switch": "legacy", 
    "options": "mode=1 miimon=100 primary=eno1"

6) After adding the host, the default route is only via bond1.162, on which the management network is configured.
bond1 keeps its IP, but there is no default route via that interface after the host is added (the default route is set for the management network).

7) The configuration survives reboot.
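After the reboot, the expected state (based on the outputs above) can be confirmed roughly with:

# ip link show bond1        # should report state UP (the original failure left the bond down)
# ip route show default     # a single default route, for the management network, is expected
# brctl show ovirtmgmt      # bond1.162 should still be attached to the bridge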

Comment 7 Sandro Bonazzola 2018-02-12 11:56:14 UTC
This bug is included in the oVirt 4.2.1 release, published on Feb 12th 2018.

Since the problem described in this bug report should be
resolved in the oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

