Bug 1867392 - Failed to deploy openshift cluster on baremetal
Summary: Failed to deploy openshift cluster on baremetal
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Tim Rozet
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-08-09 14:59 UTC by Sebastian Scheinkman
Modified: 2021-04-05 17:46 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:26:34 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 1999 | 0 | None | closed | Bug 1867392: Fixes multiple NICs OVN/OVS config | 2020-11-11 17:34:16 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:26:53 UTC

Description Sebastian Scheinkman 2020-08-09 14:59:01 UTC
Description of problem:
Unable to deploy OpenShift version 4.6 on baremetal with the ovn-kubernetes SDN.


How reproducible:
100%

Steps to Reproduce:
1. Deploy an IPI OpenShift cluster based on the 4.6 release with ovn-kubernetes


Additional info:

The issue is related to the shared gateway mode PR that was merged.

There are two issues with the deployment.

Related PRs:
https://github.com/openshift/machine-config-operator/pull/1860
https://github.com/openshift/cluster-network-operator/pull/727

First one:
The ovs-configuration service waits for NetworkManager-wait-online.service, but on baremetal that service fails, so ovs-configuration is never started.

[systemd]
Failed Units: 1
  NetworkManager-wait-online.service


● ovs-configuration.service - Configures OVS with proper host networking configuration
   Loaded: loaded (/etc/systemd/system/ovs-configuration.service; enabled; vendor preset: enabled)
   Active: inactive (dead)

Aug 09 00:04:20 cnfdb3.clus2.t5g.lab.eng.bos.redhat.com systemd[1]: Dependency failed for Configures OVS with proper host networking configuration.
Aug 09 00:04:20 cnfdb3.clus2.t5g.lab.eng.bos.redhat.com systemd[1]: ovs-configuration.service: Job ovs-configuration.service/start failed with result 'dependency'.



I found the second issue after manually starting NetworkManager-wait-online.service (I think it fails at boot because the 30-second timeout is too low for baremetal machines with multiple NICs). When I then ran the "ovs-configuration" service, I lost the connection to the machine.
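
If the 30-second timeout really is the cause, one way to test that theory is a systemd drop-in that overrides the wait-online command with a longer timeout. The following is only a sketch under assumptions that are not in this report: the 120-second value, the drop-in file name, and that the stock unit runs `nm-online -s -q --timeout=30`. The helper name `write_wait_online_dropin` is hypothetical.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that raises the nm-online timeout.
# Assumed (not taken from the bug report): the default of 120 seconds,
# the file name 10-longer-timeout.conf, and the nm-online command line.
write_wait_online_dropin() {
    dropin_dir="${1:-/etc/systemd/system/NetworkManager-wait-online.service.d}"
    timeout_sec="${2:-120}"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/10-longer-timeout.conf" <<EOF
[Service]
# An empty ExecStart= clears the command inherited from the stock unit.
ExecStart=
ExecStart=/usr/bin/nm-online -s -q --timeout=${timeout_sec}
EOF
}
```

On a test node this could be followed by `systemctl daemon-reload` and a restart of NetworkManager-wait-online.service. On RHCOS such a file would typically be delivered through a MachineConfig rather than written by hand.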

Using a console connection, I found the following:
1.  # store old conn for bringing down later
    old_conn=$(nmcli --fields UUID,DEVICE conn show --active | grep ${iface} | awk '{print $1}')
    # bring down any old iface
    nmcli conn down $old_conn
 
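This takes down the baremetal network, because at startup all the NetworkManager connections share the same profile (a single UUID), so bringing that connection down deactivates every interface that uses it: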

NAME              UUID                                  TYPE      DEVICE     
Wired Connection  a2d82d7b-9cb8-43db-8035-d1e5672fa44b  ethernet  enp1s0f4u4 
Wired Connection  a2d82d7b-9cb8-43db-8035-d1e5672fa44b  ethernet  ens1f0     
Wired Connection  a2d82d7b-9cb8-43db-8035-d1e5672fa44b  ethernet  eno1  


cat /etc/NetworkManager/system-connections/default_connection.nmconnection 
[connection]
id=Wired Connection
uuid=a2d82d7b-9cb8-43db-8035-d1e5672fa44b
type=ethernet
multi-connect=3
permissions=

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

[proxy]


2. The default gateway interface (the one attached to br-ex) got a different IP address. Running "ip ro" showed routes on both br-ex and ens1f0; after I removed the ens1f0 route, I was able to SSH into the node again.
I think we should add commands to the ovs-configuration script that validate and, if needed, clean up the default interface.
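
For the first point, a script could check whether the UUID it is about to bring down is shared with other active devices before calling `nmcli conn down`. The sketch below is not the fix that was merged; the function name `devices_sharing_uuid` is hypothetical, and it assumes the two-column `UUID DEVICE` layout produced by the `nmcli --fields UUID,DEVICE conn show --active` call quoted above.

```shell
#!/bin/sh
# Sketch: read `nmcli --fields UUID,DEVICE conn show --active` output on
# stdin and print every other device whose active connection has the same
# UUID as the given interface. Exits non-zero if the interface is not listed.
devices_sharing_uuid() {
    awk -v want="$1" '
        NF < 2 { next }                      # ignore blank or partial lines
        NR == 1 && $1 == "UUID" { next }     # skip the header row
        { uuid[$2] = $1; order[++n] = $2 }   # remember UUID per device
        END {
            if (!(want in uuid)) exit 1
            for (i = 1; i <= n; i++)
                if (order[i] != want && uuid[order[i]] == uuid[want])
                    print order[i]
        }'
}

# Possible use inside a script (assumption, shown as a comment only):
#   others=$(nmcli --fields UUID,DEVICE conn show --active | devices_sharing_uuid "$iface")
#   if [ -n "$others" ]; then
#       nmcli device disconnect "$iface"   # affects only this device
#   else
#       nmcli conn down "$old_conn"
#   fi
```

With the three "Wired Connection" rows shown above and `ens1f0` as the interface, the function reports `enp1s0f4u4` and `eno1`, which are exactly the interfaces that lost their addresses when the shared connection was brought down.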

Comment 1 Sebastian Scheinkman 2020-08-09 15:12:50 UTC
More information:

[root@cnfdb3 ~]# vi /usr/local/bin/configure-ovs.sh
[root@cnfdb3 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 94:40:c9:25:40:14 brd ff:ff:ff:ff:ff:ff
    inet 172.22.0.143/24 brd 172.22.0.255 scope global dynamic noprefixroute eno1
       valid_lft 2038sec preferred_lft 2038sec
    inet6 fe80::9640:c9ff:fe25:4014/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 94:40:c9:25:40:15 brd ff:ff:ff:ff:ff:ff
4: eno3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 94:40:c9:25:40:16 brd ff:ff:ff:ff:ff:ff
5: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 48:df:37:c9:f9:54 brd ff:ff:ff:ff:ff:ff
    inet 10.19.32.132/26 brd 10.19.32.191 scope global dynamic noprefixroute ens1f0
       valid_lft 75559sec preferred_lft 75559sec
    inet6 2620:52:0:1342:4adf:37ff:fec9:f954/64 scope global dynamic noprefixroute 
       valid_lft 2591974sec preferred_lft 604774sec
    inet6 fe80::4adf:37ff:fec9:f954/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
6: eno4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 94:40:c9:25:40:17 brd ff:ff:ff:ff:ff:ff
7: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 48:df:37:c9:f9:55 brd ff:ff:ff:ff:ff:ff
8: eno5: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 48:df:37:c3:24:b0 brd ff:ff:ff:ff:ff:ff
9: eno6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 48:df:37:c3:24:b8 brd ff:ff:ff:ff:ff:ff
10: ens4f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 48:df:37:c9:f8:f8 brd ff:ff:ff:ff:ff:ff
11: ens4f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 48:df:37:c9:f8:f9 brd ff:ff:ff:ff:ff:ff
12: enp1s0f4u4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether 0a:cf:fb:9f:d4:a9 brd ff:ff:ff:ff:ff:ff
    inet 16.1.15.2/30 brd 16.1.15.3 scope global dynamic noprefixroute enp1s0f4u4
       valid_lft 8585959sec preferred_lft 8585959sec
    inet6 fe80::8cf:fbff:fe9f:d4a9/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
13: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 2e:7b:6f:f7:eb:6f brd ff:ff:ff:ff:ff:ff
14: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
    link/ether 86:de:ca:8a:26:17 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::84de:caff:fe8a:2617/64 scope link 
       valid_lft forever preferred_lft forever
15: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c2:bb:b8:d0:c9:41 brd ff:ff:ff:ff:ff:ff
[root@cnfdb3 ~]# 
[root@cnfdb3 ~]# /usr/local/bin/configure-ovs.sh
+ iface=
+ counter=0
+ '[' 0 -lt 12 ']'
++ ip -j route show default
++ jq -r '.[0].dev'
+ iface=ens1f0
+ [[ -n ens1f0 ]]
+ [[ ens1f0 != \n\u\l\l ]]
+ echo 'IPv4 Default gateway interface found: ens1f0'
IPv4 Default gateway interface found: ens1f0
+ break
+ '[' ens1f0 = br-ex ']'
+ '[' -z ens1f0 ']'
+ iface_mac=48:df:37:c9:f9:54
+ echo 'MAC address found for iface: ens1f0: 48:df:37:c9:f9:54'
MAC address found for iface: ens1f0: 48:df:37:c9:f9:54
++ ip -j link show ens1f0
++ jq -r '.[0].mtu'
+ iface_mtu=1500
+ [[ -z 1500 ]]
+ [[ 1500 == \n\u\l\l ]]
+ echo 'MTU found for iface: ens1f0: 1500'
MTU found for iface: ens1f0: 1500
+ nmcli connection show br-ex
+ nmcli c add type ovs-bridge conn.interface br-ex con-name br-ex 802-3-ethernet.mtu 1500 802-3-ethernet.cloned-mac-address 48:df:37:c9:f9:54
Connection 'br-ex' (7904e24f-15de-43d7-bf78-79607a5807d1) successfully added.
++ nmcli --fields UUID,DEVICE conn show --active
++ grep ens1f0
++ awk '{print $1}'
+ old_conn=a2d82d7b-9cb8-43db-8035-d1e5672fa44b
+ nmcli connection show ovs-port-phys0
+ nmcli c add type ovs-port conn.interface ens1f0 master br-ex con-name ovs-port-phys0
Connection 'ovs-port-phys0' (5ee7223a-1168-44a9-b05d-d64370d061b6) successfully added.
+ nmcli connection show ovs-port-br-ex
+ nmcli c add type ovs-port conn.interface br-ex master br-ex con-name ovs-port-br-ex
Connection 'ovs-port-br-ex' (6e911556-96b6-45ae-b88f-0e072e61376e) successfully added.
+ nmcli conn down a2d82d7b-9cb8-43db-8035-d1e5672fa44b
Connection 'Wired Connection' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/6)
Connection 'Wired Connection' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/2)
Connection 'Wired Connection' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/1)
+ nmcli connection show ovs-if-phys0
+ nmcli c add type 802-3-ethernet conn.interface ens1f0 master ovs-port-phys0 con-name ovs-if-phys0 connection.autoconnect-priority 100 802-3-ethernet.mtu 1500
[54088.854729] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[54088.892387] device ens1f0 entered promiscuous mode
Connection 'ovs-if-phys0' (06c758ef-bcec-4369-86b9-80ea135e8c73) successfully added.
+ nmcli connection show ovs-if-br-ex
+ nmcli c add type ovs-interface slave-type ovs-port conn.interface br-ex master ovs-port-br-ex con-name ovs-if-br-ex 802-3-ethernet.mtu 1500 802-3-ethernet.cloned-mac-address 48:df:37:c9:f9:54
Connection 'ovs-if-br-ex' (5e9d6584-ab58-4853-9b54-a2dd2da74c73) successfully added.
+ counter=0
+ '[' 0 -lt 5 ']'
+ sleep 5
[54088.961548] device br-ex entered promiscuous mode
[54089.446366] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[54089.471556] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[54089.983828] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[54090.498132] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[54091.000691] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[54091.573045] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[54092.087467] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[54092.636735] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
+ nmcli --fields GENERAL.STATE conn show ovs-if-br-ex
+ grep -i activated
GENERAL.STATE:                          activated
+ echo 'OVS successfully configured'
OVS successfully configured
+ ip a show br-ex
16: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 48:df:37:c9:f9:54 brd ff:ff:ff:ff:ff:ff
    inet 10.19.32.132/26 brd 10.19.32.191 scope global dynamic noprefixroute br-ex
       valid_lft 86398sec preferred_lft 86398sec
    inet6 2620:52:0:1342:99fa:7096:599a:2b5f/64 scope global dynamic noprefixroute 
       valid_lft 2591998sec preferred_lft 604798sec
    inet6 fe80::e56e:bb4c:1415:3078/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
+ exit 0


[root@cnfdb3 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 94:40:c9:25:40:14 brd ff:ff:ff:ff:ff:ff
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 94:40:c9:25:40:15 brd ff:ff:ff:ff:ff:ff
4: eno3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 94:40:c9:25:40:16 brd ff:ff:ff:ff:ff:ff
5: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether 48:df:37:c9:f9:54 brd ff:ff:ff:ff:ff:ff
    inet 10.19.32.142/26 scope global ens1f0
       valid_lft forever preferred_lft forever
6: eno4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 94:40:c9:25:40:17 brd ff:ff:ff:ff:ff:ff
7: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 48:df:37:c9:f9:55 brd ff:ff:ff:ff:ff:ff
8: eno5: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 48:df:37:c3:24:b0 brd ff:ff:ff:ff:ff:ff
9: eno6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 48:df:37:c3:24:b8 brd ff:ff:ff:ff:ff:ff
10: ens4f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 48:df:37:c9:f8:f8 brd ff:ff:ff:ff:ff:ff
11: ens4f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 48:df:37:c9:f8:f9 brd ff:ff:ff:ff:ff:ff
12: enp1s0f4u4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether 0a:cf:fb:9f:d4:a9 brd ff:ff:ff:ff:ff:ff
13: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 2e:7b:6f:f7:eb:6f brd ff:ff:ff:ff:ff:ff
14: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
    link/ether 86:de:ca:8a:26:17 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::84de:caff:fe8a:2617/64 scope link 
       valid_lft forever preferred_lft forever
15: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c2:bb:b8:d0:c9:41 brd ff:ff:ff:ff:ff:ff
16: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 48:df:37:c9:f9:54 brd ff:ff:ff:ff:ff:ff
    inet 10.19.32.132/26 brd 10.19.32.191 scope global dynamic noprefixroute br-ex
       valid_lft 86358sec preferred_lft 86358sec
    inet6 2620:52:0:1342:99fa:7096:599a:2b5f/64 scope global dynamic noprefixroute 
       valid_lft 2591958sec preferred_lft 604758sec
    inet6 fe80::e56e:bb4c:1415:3078/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever



ovs-if-phys0      06c758ef-bcec-4369-86b9-80ea135e8c73  ethernet       ens1f0 

[root@cnfdb3 ~]# nmcli c down 06c758ef-bcec-4369-86b9-80ea135e8c73
[54377.470658] device ens1f0 left promiscuous mode
Connection 'ovs-if-phys0' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/844)
[root@cnfdb3 ~]# [54377.713925] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)

[root@cnfdb3 ~]# nmcli c up 06c758ef-bcec-4369-86b9-80ea135e8c73
[54388.502278] device ens1f0 entered promiscuous mode
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/845)
[root@cnfdb3 ~]# [54388.689566] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
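
The second `ip a` listing above shows ens1f0 still holding 10.19.32.142/26 after it was attached to ovs-system, while br-ex holds 10.19.32.132/26. The description suggests validating and cleaning the default interface. A minimal sketch of the validation half follows, assuming the leftover route is a default route and that the text output of `ip route show default` is parsed; the function name `stale_default_route_devs` is hypothetical and this is not taken from configure-ovs.sh.

```shell
#!/bin/sh
# Sketch: read `ip route show default` output on stdin and print each device
# that carries a default route but is not the expected bridge (br-ex by
# default). Each device is printed once.
stale_default_route_devs() {
    bridge="${1:-br-ex}"
    awk -v bridge="$bridge" '
        $1 == "default" {
            for (i = 2; i < NF; i++)
                if ($i == "dev" && $(i + 1) != bridge && !seen[$(i + 1)]++)
                    print $(i + 1)
        }'
}

# Possible cleanup step (assumption, shown as a comment only):
#   ip route show default | stale_default_route_devs br-ex |
#       while read -r dev; do ip route del default dev "$dev"; done
```

Keeping detection separate from deletion lets the result be logged first, which seems safer on a node that can only be reached through the interface being modified.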

Comment 5 zhaozhanqi 2020-09-11 00:53:18 UTC
Moving this bug to VERIFIED, since IPI for baremetal is working well with OVN.

Comment 7 errata-xmlrpc 2020-10-27 16:26:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 8 W. Trevor King 2021-04-05 17:46:39 UTC
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1].  If you feel like this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475

