Bug 2040933 - OCP 4.10 nightly build will fail to install if multiple NICs are defined on KVM nodes
Summary: OCP 4.10 nightly build will fail to install if multiple NICs are defined on KVM nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: s390x
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Tim Rozet
QA Contact: Philip Chan
URL:
Whiteboard:
Depends On:
Blocks: 2009709 2092501
Reported: 2022-01-14 22:49 UTC by Philip Chan
Modified: 2022-08-10 10:42 UTC (History)
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Multiple default gateways defined cause OVN to pick the incorrect one.
Consequence: The ovnkube pod crashes and networking does not come up.
Fix: Find the default route based on the interface defined in the config.
Result: The correct default route is picked.
Clone Of:
Clones: 2092501
Environment:
Last Closed: 2022-08-10 10:42:31 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
master-0.kubelet.log (21.25 KB, text/plain), 2022-01-14 22:49 UTC, Philip Chan
bootkube.service.log (1.97 KB, text/plain), 2022-01-14 22:50 UTC, Philip Chan
bootstrap.full.journal.01182022.log (1.84 MB, text/plain), 2022-01-19 03:30 UTC, Philip Chan
master-0.full.journal.01182022.log (2.63 MB, text/plain), 2022-01-19 03:30 UTC, Philip Chan
must-gather-01182022.tar.gz (2.54 MB, application/gzip), 2022-01-19 03:31 UTC, Philip Chan


Links
Github openshift ovn-kubernetes pull 947 (Merged): Bug 2011525: [DownstreamMerge] Downstream merge 08-02-2022 (last updated 2022-02-14 19:06:56 UTC)
Github ovn-org ovn-kubernetes pull 2782 (open): Fixes finding default gateway for configured GW interface (last updated 2022-01-31 21:51:19 UTC)
Red Hat Issue Tracker MULTIARCH-2039 (last updated 2022-01-14 22:50:38 UTC)
Red Hat Product Errata RHSA-2022:5069 (last updated 2022-08-10 10:42:52 UTC)

Description Philip Chan 2022-01-14 22:49:11 UTC
Created attachment 1850871 [details]
master-0.kubelet.log

Description of problem:

An OCP 4.10 on Z cluster installation with OCP 4.10.0-0.nightly-s390x-2022-01-14-030142 and RHCOS build 410.84.202201132002-0 fails when two NICs are defined on the control plane nodes.  The master nodes start and have network connectivity.  However, the status of the master nodes shows NotReady, and their kubelet.service log shows the following error:

Jan 14 15:30:33 master-00.pok-106.ocptest.pok.stglabs.ibm.com hyperkube[2540]: E0114 15:30:33.585209    2540 pod_workers.go:918] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?" pod="openshift-multus/network-metrics-daemon-lq6jp" podUID=0ad24722-75d6-49f5-8ac1-e90f40095a7d

Since the control plane nodes do not completely come online, the worker nodes will fail to boot and install RHCOS.

I have verified that this installation failure occurs with the following 4.10 nightly builds:

4.10.0-0.nightly-s390x-2022-01-13-022003 & RHCOS 410.84.202201121602-0
4.10.0-0.nightly-s390x-2022-01-05-011736 & RHCOS 410.84.202201041402-0
4.10.0-0.nightly-s390x-2022-01-14-030142 & RHCOS 410.84.202201132002-0

I have performed the same installation successfully using the following 4.9 and older 4.10 nightly builds:

4.9.15 & RHCOS 4.9.0
4.9.14 & RHCOS 4.9.0
4.10.0-0.nightly-s390x-2021-12-09-171055 & RHCOS 410.84.202112062233-0
4.10.0-0.nightly-s390x-2021-12-10-233457 & RHCOS 410.84.202112091602-0

ALL the above OCP installations use a networkType of OVNKubernetes under the install-config.yaml.

However, if I change the networkType to OpenShiftSDN, the OCP installation succeeds with all OCP 4.10 nightly builds.

Version-Release number of selected component (if applicable):
1. OCP 4.10 nightly build 4.10.0-0.nightly-s390x-2022-01-14-030142
2. RHCOS build 410.84.202201132002-0

How reproducible:
Consistently reproducible.

Steps to Reproduce:
1. Attempt to install OCP 4.10 nightly build 4.10.0-0.nightly-s390x-2022-01-14-030142 with RHCOS 410.84.202201132002-0.
2. Start bootstrap, master(control planes), and worker(compute) nodes with multiple network interfaces defined.
3. For example, pass two --network parameters and two IP addresses for --extra-args within the virt-install command (see the sketch below).
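
A hedged sketch of the relevant virt-install options (the bridge/VLAN names, MAC addresses, and IP values below are taken from later comments in this bug and are specific to that lab environment):

```
# Two NICs: one on a Linux bridge, one as a macvtap/direct attachment, plus
# kernel arguments assigning an IP (and gateway) to each interface:
--network bridge=br-ocp,mac=52:54:00:da:70:41 \
--network direct,source=vlan508,source.mode=bridge,mac=52:54:00:5e:14:bb \
--extra-args "... ip=enc1:dhcp ip=9.12.23.50::9.12.23.1:24:bootstrap-0.pok-106.ocptest.pok.stglabs.ibm.com:enc2:none nameserver=192.168.79.1"
```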

Actual results:
Bootstrap and master (control plane) nodes will boot.  Master nodes will show a status of NotReady and worker (compute) nodes will fail to boot and install RHCOS.

Expected results:
The bootstrap, master (control plane), and worker (compute) nodes should all install the RHCOS build successfully and become Ready.

Additional info:
I have attached the logs from bootkube.service (bootstrap-0) and kubelet.service (master-0) for a failed installation that uses the OVNKubernetes networkType.

Comment 1 Philip Chan 2022-01-14 22:50:23 UTC
Created attachment 1850872 [details]
bootkube.service.log

Comment 2 Prashanth Sundararaman 2022-01-15 05:50:48 UTC
looks like the ovnkube-node pod is crashing:

Jan 14 15:30:02 master-00.pok-106.ocptest.pok.stglabs.ibm.com hyperkube[2540]: E0114 15:30:02.586987    2540 pod_workers.go:918] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"ovnkube-node\" with CrashLoopBackOff: \"back-off 1m20s restarting failed container=ovnkube-node pod=ovnkube-node-6tsx6_openshift-ovn-kubernetes(8a627578-1c69-47e7-9cff-5e3fac5a9886)\"" pod="openshift-ovn-kubernetes/ovnkube-node-6tsx6" podUID=8a627578-1c69-47e7-9cff-5e3fac5a9886

can you collect the must-gather for this?

Comment 3 madeel 2022-01-18 09:27:45 UTC
Phil, did you set the machineNetwork parameter in install-config.yaml? If it's not set, the bootstrap will have problems, which would lead to a master NotReady status. Could you attach the full journal logs from the bootstrap? The one that is attached is not complete.

Comment 4 Philip Chan 2022-01-19 03:29:11 UTC
Hi Muhammad, the last time we spoke, I did not have the machineNetwork parameter in my install-config.yaml.  This is a copy of the latest install-config.yaml I have attempted to install with using machineNetwork:

apiVersion: v1
baseDomain: "{{ cluster_base_domain }}"
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: {{ cluster_nodes['masters'].keys() | length }}
metadata:
  name: "{{ cluster_name }}"
networking:
  machineNetwork:
  - cidr: 10.0.0.0/24
  clusterNetworks:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '{{ ocp4_pull_secret | to_json }}'
sshKey: '{{ bastion_pubkey.content | b64decode }}'

I have tried all the following variations without any success:
  - cidr: 192.168.79.0/24
  - cidr: 172.18.231.0/24
  - cidr: 10.0.0.0/16 
  - cidr: 10.0.0.0/24
  - cidr: 192.0.0.0/24
  - cidr: 172.0.0.0/24

Each OCP cluster node has two NICs defined, residing on the 192.168.79.0 and 9.12.23.0 networks.  The primary network is the 192.168.79.0 network, which both DNS and HAProxy listen on.

Prashanth, Muhammad, I've captured new full journal logs from both bootstrap and master-0 with the above install-config.yaml setup and will attach here.

Comment 5 Philip Chan 2022-01-19 03:30:03 UTC
Created attachment 1851774 [details]
bootstrap.full.journal.01182022.log

Comment 6 Philip Chan 2022-01-19 03:30:47 UTC
Created attachment 1851775 [details]
master-0.full.journal.01182022.log

Comment 7 Philip Chan 2022-01-19 03:31:17 UTC
Created attachment 1851777 [details]
must-gather-01182022.tar.gz

Comment 8 Dan Li 2022-01-19 15:39:03 UTC
Setting "Blocker+" flag for the moment as the Z team mentioned that this bug is blocking them. Our team will evaluate the bug and re-evaluate the blocker status as necessary.

Comment 9 Prashanth Sundararaman 2022-01-19 17:28:00 UTC
tail of the ovnkube-node logs:

```
2022-01-19T02:42:23.473357305Z I0119 02:42:23.473352   16825 ovs.go:208] exec(10): stderr: ""
2022-01-19T02:42:23.473370604Z I0119 02:42:23.473361   16825 ovs.go:204] exec(11): /usr/bin/ovs-vsctl --timeout=15 get Interface 76fddd5a-ca6b-468e-beb4-2ac1d890f1e5 Type
2022-01-19T02:42:23.476381444Z I0119 02:42:23.476353   16825 ovs.go:207] exec(11): stdout: "system\n"
2022-01-19T02:42:23.476381444Z I0119 02:42:23.476368   16825 ovs.go:208] exec(11): stderr: ""
2022-01-19T02:42:23.476381444Z I0119 02:42:23.476378   16825 ovs.go:204] exec(12): /usr/bin/ovs-vsctl --timeout=15 get interface enc1 ofport
2022-01-19T02:42:23.479420502Z I0119 02:42:23.479394   16825 ovs.go:207] exec(12): stdout: "2\n"
2022-01-19T02:42:23.479420502Z I0119 02:42:23.479408   16825 ovs.go:208] exec(12): stderr: ""
2022-01-19T02:42:23.479420502Z I0119 02:42:23.479416   16825 ovs.go:204] exec(13): /usr/bin/ovs-vsctl --timeout=15 --if-exists get interface br-ex mac_in_use
2022-01-19T02:42:23.482684750Z I0119 02:42:23.482668   16825 ovs.go:207] exec(13): stdout: "\"52:54:00:d9:5b:d8\"\n"
2022-01-19T02:42:23.482684750Z I0119 02:42:23.482678   16825 ovs.go:208] exec(13): stderr: ""
2022-01-19T02:42:23.482695944Z I0119 02:42:23.482688   16825 ovs.go:204] exec(14): /usr/bin/ovs-vsctl --timeout=15 --if-exists get Open_vSwitch . external_ids:ovn-bridge-mappings
2022-01-19T02:42:23.485780641Z I0119 02:42:23.485753   16825 ovs.go:207] exec(14): stdout: "\"physnet:br-ex\"\n"
2022-01-19T02:42:23.485780641Z I0119 02:42:23.485767   16825 ovs.go:208] exec(14): stderr: ""
2022-01-19T02:42:23.485780641Z I0119 02:42:23.485775   16825 ovs.go:204] exec(15): /usr/bin/ovs-vsctl --timeout=15 set Open_vSwitch . external_ids:ovn-bridge-mappings=physnet:br-ex
2022-01-19T02:42:23.489066056Z I0119 02:42:23.489036   16825 ovs.go:207] exec(15): stdout: ""
2022-01-19T02:42:23.489066056Z I0119 02:42:23.489048   16825 ovs.go:208] exec(15): stderr: ""
2022-01-19T02:42:23.489066056Z I0119 02:42:23.489056   16825 ovs.go:204] exec(16): /usr/bin/ovs-vsctl --timeout=15 --if-exists get Open_vSwitch . external_ids:system-id
2022-01-19T02:42:23.492357268Z I0119 02:42:23.492340   16825 ovs.go:207] exec(16): stdout: "\"1857b02c-24f4-4ced-8b64-8f4a9d018a60\"\n"
2022-01-19T02:42:23.492357268Z I0119 02:42:23.492351   16825 ovs.go:208] exec(16): stderr: ""
2022-01-19T02:42:23.492366811Z I0119 02:42:23.492357   16825 ovs.go:204] exec(17): /usr/bin/ovs-appctl --timeout=15 dpif/show-dp-features br-ex
2022-01-19T02:42:23.494296714Z I0119 02:42:23.494238   16825 ovs.go:207] exec(17): stdout: "Masked set action: Yes\nTunnel push pop: No\nUfid: Yes\nTruncate action: Yes\nClone action: Yes\nSample nesting: 10\nConntrack eventmask: Yes\nConntrack clear: Yes\nMax dp_hash algorithm: 0\nCheck pkt length action: Yes\nConntrack timeout policy: Yes\nExplicit Drop action: No\nOptimized Balance TCP mode: No\nConntrack all-zero IP SNAT: Yes\nMax VLAN headers: 2\nMax MPLS depth: 3\nRecirc: Yes\nCT state: Yes\nCT zone: Yes\nCT mark: Yes\nCT label: Yes\nCT state NAT: Yes\nCT orig tuple: Yes\nCT orig tuple for IPv6: Yes\nIPv6 ND Extension: No\n"
2022-01-19T02:42:23.494296714Z I0119 02:42:23.494259   16825 ovs.go:208] exec(17): stderr: ""
2022-01-19T02:42:23.494567844Z F0119 02:42:23.494543   16825 ovnkube.go:133] unable to add OVN masquerade route to host, error: failed to add route for subnet 169.254.169.0/30 via gateway 9.12.23.1 with mtu 0: network is unreachable
```
I would defer to the networking team to look at this as I am not sure if shared gateway supports a 2nd NIC.

Comment 10 Tim Rozet 2022-01-19 21:54:37 UTC
So to summarize the network setup:
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: 2: enc2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]:     link/ether 52:54:00:5e:14:bb brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]:     inet 9.12.23.51/24 brd 9.12.23.255 scope global noprefixroute enc2
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]:        valid_lft forever preferred_lft forever
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: 3: enc1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]:     link/ether 52:54:00:da:70:41 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]:     inet 192.168.79.21/24 brd 192.168.79.255 scope global dynamic noprefixroute enc1
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]:        valid_lft 900sec preferred_lft 900sec
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]:     inet6 fe80::5054:ff:feda:7041/64 scope link tentative noprefixroute
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]:        valid_lft forever preferred_lft forever
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: + ip route show
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com nm-dispatcher[1299]: Error: Device '' not found.
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: default via 192.168.79.1 dev enc1 proto dhcp metric 100
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: default via 9.12.23.1 dev enc2 proto static metric 101

configure-ovs.sh picks enc1 to move onto the shared gateway bridge:
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: IPv4 Default gateway interface found: enc1

Then when ovnkube-node starts up, it will try to detect the available nexthops, and set the route mentioned for 169.254.169.0/30 towards that nexthop. This should have been 192.168.79.1, but instead it attempts the add via 9.12.23.1, which is reported as network unreachable for some reason.

The code that determines this nexthop is:
https://github.com/openshift/ovn-kubernetes/blob/master/go-controller/pkg/node/gateway_init.go#L141
https://github.com/openshift/ovn-kubernetes/blob/f568deaadf40638f36781d0a8c55c2ca28a9162e/go-controller/pkg/node/helper_linux.go#L21

Unfortunately it looks like this code is looking for any default route on the host, and just returning the first one it finds, rather than scoping to the device br-ex. However, this code has not changed between 4.9 and 4.10...so I don't understand how this worked before and not now.
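
For illustration, scoping the lookup to the gateway bridge device on an affected node would return only the intended nexthop (the sample output below is reconstructed from the routing tables shown later in comment 17):

```
# All default routes on the host; the first entry belongs to enc2, not br-ex:
ip route show default
#   default via 9.12.23.1 dev enc2 proto static metric 100
#   default via 192.168.79.1 dev br-ex proto dhcp metric 100

# Restricting the query to the shared gateway bridge yields the nexthop
# ovn-kube should actually use:
ip route show default dev br-ex
#   default via 192.168.79.1 proto dhcp metric 100
```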

Comment 11 Tim Rozet 2022-01-19 22:07:07 UTC
Philip, can I see the output of ip route show on the master-0 node while ovnkube-node is failing? I'm curious why you get network unreachable on the route add.

Also, do you have any of the must gathers and system journals for one of the successful runs?

Comment 12 Philip Chan 2022-01-19 22:54:53 UTC
Hi Tim,

We determined the problem was due to two default gateways defined in the routing table of each OCP node.  For example, on the master-0 node:

[core@master-0 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         9.12.23.1       0.0.0.0         UG    100    0        0 enc2
0.0.0.0         192.168.79.1    0.0.0.0         UG    100    0        0 br-ex
9.12.23.0       0.0.0.0         255.255.255.0   U     100    0        0 enc2
10.128.0.0      10.129.0.1      255.252.0.0     UG    0      0        0 ovn-k8s-mp0
10.129.0.0      0.0.0.0         255.255.254.0   U     0      0        0 ovn-k8s-mp0
169.254.169.3   10.129.0.1      255.255.255.255 UGH   0      0        0 ovn-k8s-mp0
192.168.79.0    0.0.0.0         255.255.255.0   U     100    0        0 br-ex

This caused the ovnkube-node pods to be in a CrashLoopBackOff status:

[root@bastion multiarch-upi-playbooks-master]# oc get pods -n openshift-ovn-kubernetes -o wide
NAME                   READY   STATUS             RESTARTS         AGE    IP           NODE                                           NOMINATED NODE   READINESS GATES
ovnkube-master-4bgn5   6/6     Running            1 (142m ago)     142m   9.12.23.54   master-1.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-master-9pmfh   6/6     Running            3 (141m ago)     141m   9.12.23.56   master-2.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-master-9wpm6   6/6     Running            0                142m   9.12.23.51   master-0.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-node-8hgqx     4/5     CrashLoopBackOff   19 (68m ago)     142m   9.12.23.56   master-2.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-node-hvxzj     4/5     CrashLoopBackOff   19 (68m ago)     142m   9.12.23.54   master-1.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-node-m8fr7     4/5     CrashLoopBackOff   16 (4m39s ago)   142m   9.12.23.51   master-0.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>

Once I manually omitted the 9.12.23.1 route from each of the master nodes, the ovnkube-node pods would begin running and the masters would report a Ready status.
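
For reference, this is roughly how the extra route can be dropped by hand (a temporary, illustrative workaround only; the interface and gateway names come from the routing table above):

```
# Remove the second default route that points at the 9.12.23.1 gateway on enc2,
# then confirm that only the 192.168.79.1 default remains:
sudo ip route del default via 9.12.23.1 dev enc2
ip route show default
```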

Just to clarify, for this particular OCP cluster, we were trying to set up two NICs.  For reference, my virt-install command would include the following kernel arguments for the IP parameters:

ip=enc1:dhcp ip=9.12.23.50::9.12.23.1:24:bootstrap-0.pok-106.ocptest.pok.stglabs.ibm.com:enc2:none nameserver=192.168.79.1

enc1 would request a 192.dot IP address from our DHCP Server 
enc2 would define a static 9.dot IP address

I have now removed the gw 9.12.23.1 from the static IP address parm - ip=9.12.23.50:::24:bootstrap-0.pok-106.ocptest.pok.stglabs.ibm.com:enc2:none - and successfully performed a new installation with 4.10.0-0.nightly-s390x-2022-01-14-030142.  The ovnkube-node pods now show the correct IPs on the 192.168.79.xxx network:

[root@bastion ~]# oc get pods -n openshift-ovn-kubernetes -o wide
NAME                   READY   STATUS    RESTARTS      AGE   IP              NODE                                           NOMINATED NODE   READINESS GATES
ovnkube-master-72t2f   6/6     Running   2 (48m ago)   58m   192.168.79.21   master-0.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-master-k82r6   6/6     Running   6 (56m ago)   58m   192.168.79.23   master-2.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-master-r7lhf   6/6     Running   6 (56m ago)   58m   192.168.79.22   master-1.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-node-5lkkf     5/5     Running   0             58m   192.168.79.22   master-1.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-node-f4tkz     5/5     Running   0             19m   192.168.79.25   worker-1.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-node-ggn2v     5/5     Running   0             58m   192.168.79.21   master-0.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-node-x6hsx     5/5     Running   0             19m   192.168.79.24   worker-0.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-node-z6jlf     5/5     Running   0             58m   192.168.79.23   master-2.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>

As noted in the above comments, we have successfully run with the same kernel arguments (two gateways) for OCP 4.9.x and older OCP 4.10 nightly builds (up to 4.10.0-0.nightly-s390x-2021-12-13-233722). It is only with the recent OCP 4.10 nightly builds from late December 2021 and January 2022 that this update to our virt-install command is required.  There must have been an update that forced this requirement?  I also have the following within my install-config.yaml:

networking:
  machineNetwork:
  - cidr: 192.168.79.0/24
  clusterNetworks:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16

...which should have isolated the OVN pods to use the 192.168.79.xxx network, but that does not seem to matter unless I make sure not to define the 9.dot gateway.  Please let me know your thoughts.

Thank you.

Comment 13 madeel 2022-01-20 09:04:48 UTC
When multiple default gateways are defined, NetworkManager or the Go libraries could randomly pick the wrong gateway to route primary network traffic. This was identified in the 4.9 multi-NIC testing and we added the following note to the OpenShift documentation:
"If the additional network gateway is different from the primary network gateway, the default gateway must be the primary network gateway. To configure the route for the additional network:
rd.route=20.20.20.0/24:20.20.20.254:enp2s0" 
https://docs.openshift.com/container-platform/4.9/installing/installing_ibm_z/installing-ibm-z.html#installation-user-infra-machines-routing-bonding_installing-ibm-z

Phil, in your environment you could also assign the IP address for the additional network via DHCP, which should also solve the problem. However, you don't need to define the following entry for additional networks:
host worker-11 {
        --OMITTED--
}
IP addresses could be assigned with:
subnet 192.168.80.0 netmask 255.255.255.0 {
        range 192.168.80.2 192.168.80.255;
        option routers 192.168.80.1;
        option subnet-mask 255.255.255.0;        
}
Note that the above will also add a default gateway, but with a higher metric. If that is not the case, then please remove the default gateway above and add the route with an rd.route static entry, as sketched below.
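
As a hedged example only, an rd.route static entry for the additional network in this environment could look like the following (same syntax as the documentation note quoted above; adjust the subnet, gateway, and interface to your setup):

```
# Static route for the additional 9.12.23.0/24 network via its gateway on enc2,
# added as a kernel argument instead of a second default gateway:
rd.route=9.12.23.0/24:9.12.23.1:enc2
```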

Comment 15 Tim Rozet 2022-01-20 15:05:13 UTC
There are multiple things to clarify about what's happening here:
1. It should be fine to have multiple default gateway routes. Network Manager will assign different metrics to each route of the same NIC type, which I can see happens in the original log:
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: default via 192.168.79.1 dev enc1 proto dhcp metric 100
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: default via 9.12.23.1 dev enc2 proto static metric 101

2. However, your output shows both metrics are 100 after configure-ovs moves the interface to br-ex. I believe this is resolved by: https://github.com/openshift/machine-config-operator/pull/2898
[core@master-0 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         9.12.23.1       0.0.0.0         UG    100    0        0 enc2
0.0.0.0         192.168.79.1    0.0.0.0         UG    100    0        0 br-ex

3. configure-ovs and kubelet will decide the node IP based on the lowest metric (see the sketch after this list). Can you retry your original deployment with the fix ^

4. I would still like to know why you got "network unreachable" on the route add. If you could share the "ip route show" output when it is broken, so I can see if there is a scope link route, that would be helpful.

5. Regardless of the previous points, there is still a bug in ovn-kube in how it chooses the default next hop. That isn't a regression though; it's been that way for several releases.
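
As a quick illustration of point 3 (addresses are from this bug's environment; 1.1.1.1 is just an arbitrary external destination), the lowest-metric default route decides which source address, and therefore which node IP, gets used:

```
# Ask the kernel which route and source address it would pick for an external host;
# expected output is along these lines when br-ex holds the lowest-metric default:
ip -4 route get 1.1.1.1
#   1.1.1.1 via 192.168.79.1 dev br-ex src 192.168.79.21
```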

Comment 16 Philip Chan 2022-01-20 16:35:41 UTC
Thank you Tim for clarifying all those key points.

I was in the process of setting up another OCP cluster the day before using a newer build - 4.10.0-0.nightly-s390x-2022-01-17-171822.  The configuration for this particular cluster was slightly different in that both networks were using a single DHCP server to obtain their IP addresses (192.168.79.dot and 192.168.80.dot).  Like Muhammad mentioned in comment 13, I was able to remove the default route from one of the networks (192.168.80.dot), so that each node had a single default route.  The OCP install then succeeded.

I will pick up a newer OCP build, which is currently 4.10.0-fc.2, and retry the original deployment with two gateways.  I'll report back my findings, including the information you asked for above.

Thanks.

Comment 17 Philip Chan 2022-01-20 17:36:34 UTC
I attempted an original install deployment using OCP build 4.10.0-fc.2 with RHCOS 410.84.202201190402-0. The original problem still occurs: the master nodes remain in the NotReady state.

The routing info from bootstrap-0 and master-0 nodes:

[core@bootstrap-00 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.79.1    0.0.0.0         UG    100    0        0 enc1
0.0.0.0         9.12.23.1       0.0.0.0         UG    101    0        0 enc2
9.12.23.0       0.0.0.0         255.255.255.0   U     101    0        0 enc2
192.168.79.0    0.0.0.0         255.255.255.0   U     100    0        0 enc1

[core@bootstrap-00 ~]$ ip route show
default via 192.168.79.1 dev enc1 proto dhcp metric 100
default via 9.12.23.1 dev enc2 proto static metric 101
9.12.23.0/24 dev enc2 proto kernel scope link src 9.12.23.50 metric 101
192.168.79.0/24 dev enc1 proto kernel scope link src 192.168.79.20 metric 100

[core@master-00 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         9.12.23.1       0.0.0.0         UG    100    0        0 enc2
0.0.0.0         192.168.79.1    0.0.0.0         UG    100    0        0 br-ex
9.12.23.0       0.0.0.0         255.255.255.0   U     100    0        0 enc2
192.168.79.0    0.0.0.0         255.255.255.0   U     100    0        0 br-ex

[core@master-00 ~]$ ip route show
default via 9.12.23.1 dev enc2 proto static metric 100
default via 192.168.79.1 dev br-ex proto dhcp metric 100
9.12.23.0/24 dev enc2 proto kernel scope link src 9.12.23.51 metric 100
10.128.0.0/14 via 10.130.0.1 dev ovn-k8s-mp0
10.130.0.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.130.0.2
169.254.169.3 via 10.130.0.1 dev ovn-k8s-mp0
192.168.79.0/24 dev br-ex proto kernel scope link src 192.168.79.21 metric 100

With the above, it looks like the metrics are properly set on the bootstrap only; the masters still remain at 100 for both networks - is that the problem?

Verified in the master log that enc1 (192.168.79.1) is the chosen shared gateway bridge interface:
Jan 20 17:13:23 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1321]: IPv4 Default gateway interface found: enc1

Then, viewing the OVN pod for master-0, we see that the network is unreachable when trying to go over the 9.12.23.1 gateway:
# oc describe pod ovnkube-node-tp87d -n openshift-ovn-kubernetes
...
I0120 17:27:59.480738   16406 ovs.go:204] exec(17): /usr/bin/ovs-appctl --timeout=15 dpif/show-dp-features br-ex
I0120 17:27:59.482776   16406 ovs.go:207] exec(17): stdout: "Masked set action: Yes\nTunnel push pop: No\nUfid: Yes\nTruncate action: Yes\nClone action: Yes\nSample nesting: 10\nConntrack eventmask: Yes\nConntrack clear: Yes\nMax dp_hash algorithm: 0\nCheck pkt length action: Yes\nConntrack timeout policy: Yes\nExplicit Drop action: No\nOptimized Balance TCP mode: No\nConntrack all-zero IP SNAT: Yes\nMax VLAN headers: 2\nMax MPLS depth: 3\nRecirc: Yes\nCT state: Yes\nCT zone: Yes\nCT mark: Yes\nCT label: Yes\nCT state NAT: Yes\nCT orig tuple: Yes\nCT orig tuple for IPv6: Yes\nIPv6 ND Extension: No\n"
I0120 17:27:59.482807   16406 ovs.go:208] exec(17): stderr: ""
F0120 17:27:59.483076   16406 ovnkube.go:133] unable to add OVN masquerade route to host, error: failed to add route for subnet 169.254.169.0/30 via gateway 9.12.23.1 with mtu 0: network is unreachable

Under my virt-install command, this is the syntax I'm using to define a bridge and macvtap network:

--network bridge=br-ocp,mac=52:54:00:da:70:41 \
--network direct,source=vlan508,source.mode=bridge,mac=52:54:00:5e:14:bb \

I'm not sure if that should matter or be a reason for concern, but as you stated, it seems to only pick the first one in the routing table list.

Comment 18 madeel 2022-01-25 14:47:40 UTC
Irrespective of this BZ, it is in general not a good idea to add multiple default gateways, because, as the name suggests, there should only be ONE default gateway in a system.
When we add multiple ip= kernel arguments, each with its own gateway, the kernel provides all of them as default gateways to NetworkManager, which cannot decide which route is the default or assign a higher metric value.

Similarly, when you add one DHCP-provided gateway and another static gateway, the system has no clue which one it should use for default routing despite the different metric values. It could be that the higher-metric route is the one serving your haproxy, named, etc.

However, the problem you're able to reproduce with OVN should be solved by the PR https://github.com/openshift/machine-config-operator/pull/2898. We will just have to wait until we get an OCP release with the fix. But again, it was just by chance that NM picked 192.168.79.1 as your default route and assigned it the lowest metric. If for some reason it had picked 9.12.23.1 as your default route with the lowest metric, not even the above PR would fix the installation failure.

Comment 20 Tim Rozet 2022-02-14 19:10:25 UTC
[core@master-00 ~]$ ip route show
default via 9.12.23.1 dev enc2 proto static metric 100
default via 192.168.79.1 dev br-ex proto dhcp metric 100
9.12.23.0/24 dev enc2 proto kernel scope link src 9.12.23.51 metric 100
10.128.0.0/14 via 10.130.0.1 dev ovn-k8s-mp0
10.130.0.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.130.0.2
169.254.169.3 via 10.130.0.1 dev ovn-k8s-mp0
192.168.79.0/24 dev br-ex proto kernel scope link src 192.168.79.21 metric 100

With the above, it looks like the metrics are properly set on bootstrap only, the masters still remain at 100 for both networks - is that the problem?

It should have been 49 if you had this fix: https://github.com/openshift/machine-config-operator/pull/2898/files#diff-afb45a3711a77d94f26471d9d94a7f7a03d931d9e72bdf849f2e26e2711d6fd7R125

Comment 22 Philip Chan 2022-02-15 23:16:34 UTC
Hi Tim,

I verified back with OCP build 4.10.0-fc.4 that the fix resolves the metric assignment.  This is what the routing table shows on the bootstrap, master, and worker nodes:

[core@bootstrap-00 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.79.1    0.0.0.0         UG    100    0        0 enc1
0.0.0.0         9.12.23.1       0.0.0.0         UG    101    0        0 enc2
9.12.23.0       0.0.0.0         255.255.255.0   U     101    0        0 enc2
192.168.79.0    0.0.0.0         255.255.255.0   U     100    0        0 enc1

[core@master-00 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.79.1    0.0.0.0         UG    49     0        0 br-ex
0.0.0.0         9.12.23.1       0.0.0.0         UG    100    0        0 enc2
9.12.23.0       0.0.0.0         255.255.255.0   U     100    0        0 enc2
10.128.0.0      10.130.0.1      255.252.0.0     UG    0      0        0 ovn-k8s-mp0
10.130.0.0      0.0.0.0         255.255.254.0   U     0      0        0 ovn-k8s-mp0
169.254.169.0   192.168.79.1    255.255.255.252 UG    0      0        0 br-ex
169.254.169.3   10.130.0.1      255.255.255.255 UGH   0      0        0 ovn-k8s-mp0
172.30.0.0      192.168.79.1    255.255.0.0     UG    0      0        0 br-ex
192.168.79.0    0.0.0.0         255.255.255.0   U     49     0        0 br-ex

[core@worker-00 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.79.1    0.0.0.0         UG    49     0        0 br-ex
0.0.0.0         9.12.23.1       0.0.0.0         UG    100    0        0 enc2
9.12.23.0       0.0.0.0         255.255.255.0   U     100    0        0 enc2
10.128.0.0      10.131.0.1      255.252.0.0     UG    0      0        0 ovn-k8s-mp0
10.131.0.0      0.0.0.0         255.255.254.0   U     0      0        0 ovn-k8s-mp0
169.254.169.0   192.168.79.1    255.255.255.252 UG    0      0        0 br-ex
169.254.169.3   10.131.0.1      255.255.255.255 UGH   0      0        0 ovn-k8s-mp0
172.30.0.0      192.168.79.1    255.255.0.0     UG    0      0        0 br-ex
192.168.79.0    0.0.0.0         255.255.255.0   U     49     0        0 br-ex

I was able to perform the installation successfully using the original deployment commands.

The resolution for this bug and https://bugzilla.redhat.com/show_bug.cgi?id=2046520 is a documentation update to clarify the default gateway requirements.

Comment 23 Anurag saxena 2022-02-22 17:38:50 UTC
@chanphil.com Would you be able to verify this bug? Thanks

Comment 24 Dan Li 2022-02-25 14:08:15 UTC
Making Comment 23 un-private as Phil may not be able to see private comments

Comment 26 Dan Li 2022-03-28 15:49:01 UTC
Hi @chanphil.com just checking to see if you are able to verify this per the above comment. Thank you!

Comment 27 Philip Chan 2022-04-21 21:47:15 UTC
Hi,

We re-verified our network tests using OCP 4.10.6 and RHCOS 4.10.3.  

When deploying the OCP nodes with our original network kernel parameters via virt-install - where we define two networks, each with its own default gateway, for a total of two default gateways - the OCP installation completes, and the following routing tables are present on the bootstrap, master, and worker nodes.

[core@bootstrap-00 ~]$ ip route show
default via 192.168.79.1 dev enc1 proto dhcp metric 100
default via 9.12.23.1 dev enc2 proto static metric 101
9.12.23.0/24 dev enc2 proto kernel scope link src 9.12.23.49 metric 101
192.168.79.0/24 dev enc1 proto kernel scope link src 192.168.79.20 metric 100
[core@bootstrap-00 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.79.1    0.0.0.0         UG    100    0        0 enc1
0.0.0.0         9.12.23.1       0.0.0.0         UG    101    0        0 enc2
9.12.23.0       0.0.0.0         255.255.255.0   U     101    0        0 enc2
192.168.79.0    0.0.0.0         255.255.255.0   U     100    0        0 enc1

[core@master-00 ~]$ ip route show
default via 192.168.79.1 dev br-ex proto dhcp metric 49
default via 9.12.23.1 dev enc2 proto static metric 100
9.12.23.0/24 dev enc2 proto kernel scope link src 9.12.23.50 metric 100
10.128.0.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.128.0.2
10.128.0.0/14 via 10.128.0.1 dev ovn-k8s-mp0
169.254.169.0/30 via 192.168.79.1 dev br-ex
169.254.169.3 via 10.128.0.1 dev ovn-k8s-mp0
172.30.0.0/16 via 192.168.79.1 dev br-ex mtu 1400
192.168.79.0/24 dev br-ex proto kernel scope link src 192.168.79.21 metric 49
[core@master-00 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.79.1    0.0.0.0         UG    49     0        0 br-ex
0.0.0.0         9.12.23.1       0.0.0.0         UG    100    0        0 enc2
9.12.23.0       0.0.0.0         255.255.255.0   U     100    0        0 enc2
10.128.0.0      0.0.0.0         255.255.254.0   U     0      0        0 ovn-k8s-mp0
10.128.0.0      10.128.0.1      255.252.0.0     UG    0      0        0 ovn-k8s-mp0
169.254.169.0   192.168.79.1    255.255.255.252 UG    0      0        0 br-ex
169.254.169.3   10.128.0.1      255.255.255.255 UGH   0      0        0 ovn-k8s-mp0
172.30.0.0      192.168.79.1    255.255.0.0     UG    0      0        0 br-ex
192.168.79.0    0.0.0.0         255.255.255.0   U     49     0        0 br-ex

[core@worker-00 ~]$ ip route show
default via 192.168.79.1 dev br-ex proto dhcp metric 49
default via 9.12.23.1 dev enc2 proto static metric 100
9.12.23.0/24 dev enc2 proto kernel scope link src 9.12.23.56 metric 100
10.128.0.0/14 via 10.131.0.1 dev ovn-k8s-mp0
10.131.0.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.131.0.2
169.254.169.0/30 via 192.168.79.1 dev br-ex
169.254.169.3 via 10.131.0.1 dev ovn-k8s-mp0
172.30.0.0/16 via 192.168.79.1 dev br-ex mtu 1400
192.168.79.0/24 dev br-ex proto kernel scope link src 192.168.79.24 metric 49
[core@worker-00 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.79.1    0.0.0.0         UG    49     0        0 br-ex
0.0.0.0         9.12.23.1       0.0.0.0         UG    100    0        0 enc2
9.12.23.0       0.0.0.0         255.255.255.0   U     100    0        0 enc2
10.128.0.0      10.131.0.1      255.252.0.0     UG    0      0        0 ovn-k8s-mp0
10.131.0.0      0.0.0.0         255.255.254.0   U     0      0        0 ovn-k8s-mp0
169.254.169.0   192.168.79.1    255.255.255.252 UG    0      0        0 br-ex
169.254.169.3   10.131.0.1      255.255.255.255 UGH   0      0        0 ovn-k8s-mp0
172.30.0.0      192.168.79.1    255.255.0.0     UG    0      0        0 br-ex
192.168.79.0    0.0.0.0         255.255.255.0   U     49     0        0 br-ex

As Muhammad noted in comment 18 above, it is not good practice to use multiple default gateways.  As a result, we suggested a minor update to the documentation, as noted in Bugzilla 2046520, which was accepted and made.

We then tested variations of assigning the IP addresses statically or through DHCP, but only assigning one default gateway as recommended (see the sketch below).  We verified that the correct default gateway and the network we wanted to use for the OVN network were selected, and the OCP cluster installed successfully.  We no longer encounter the issue as originally reported.  Thank you for the resolution.
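
For reference, a sketch of the single-default-gateway kernel arguments that worked for us (adapted from comment 12; the addresses and host name are specific to this environment):

```
# enc1 gets its address and the single default gateway via DHCP; enc2 gets a static
# address with the gateway field left empty, so only one default route is installed:
ip=enc1:dhcp ip=9.12.23.50:::24:bootstrap-0.pok-106.ocptest.pok.stglabs.ibm.com:enc2:none nameserver=192.168.79.1
```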

Comment 28 Dan Li 2022-04-28 14:19:41 UTC
Changing to Verified based on Phil's Comment 27

Comment 32 errata-xmlrpc 2022-08-10 10:42:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

