Created attachment 1850871 [details]
master-0.kubelet.log

Description of problem:

Installing OCP 4.10 on Z with OCP 4.10.0-0.nightly-s390x-2022-01-14-030142 and RHCOS build 410.84.202201132002-0 fails when two NICs are defined on the control plane nodes. The master nodes start and have network connectivity, but they report a NotReady status, and the kubelet.service log shows the following error:

Jan 14 15:30:33 master-00.pok-106.ocptest.pok.stglabs.ibm.com hyperkube[2540]: E0114 15:30:33.585209 2540 pod_workers.go:918] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?" pod="openshift-multus/network-metrics-daemon-lq6jp" podUID=0ad24722-75d6-49f5-8ac1-e90f40095a7d

Because the control plane nodes never fully come online, the worker nodes fail to boot and install RHCOS.

I have verified that this installation failure occurs with the following 4.10 nightly builds:
4.10.0-0.nightly-s390x-2022-01-13-022003 & RHCOS 410.84.202201121602-0
4.10.0-0.nightly-s390x-2022-01-05-011736 & RHCOS 410.84.202201041402-0
4.10.0-0.nightly-s390x-2022-01-14-030142 & RHCOS 410.84.202201132002-0

I have performed the same installation successfully using the following 4.9 and older 4.10 nightly builds:
4.9.15 & RHCOS 4.9.0
4.9.14 & RHCOS 4.9.0
4.10.0-0.nightly-s390x-2021-12-09-171055 & RHCOS 410.84.202112062233-0
4.10.0-0.nightly-s390x-2021-12-10-233457 & RHCOS 410.84.202112091602-0

All of the above OCP installations use a networkType of OVNKubernetes in install-config.yaml. However, if I change the networkType to OpenShiftSDN, the installation succeeds with all OCP 4.10 nightly builds.

Version-Release number of selected component (if applicable):
1. OCP 4.10 nightly build 4.10.0-0.nightly-s390x-2022-01-14-030142
2. RHCOS build 410.84.202201132002-0

How reproducible:
Consistently reproducible.

Steps to Reproduce:
1. Attempt to install OCP 4.10 nightly build 4.10.0-0.nightly-s390x-2022-01-14-030142 with RHCOS 410.84.202201132002-0.
2. Start the bootstrap, master (control plane), and worker (compute) nodes with multiple network interfaces defined.
3. For example, pass two --network parameters and two ip= kernel arguments via --extra-args in the virt-install command (see the sketch after this comment).

Actual results:
The bootstrap and master (control plane) nodes boot. The master nodes show a status of NotReady, and the worker (compute) nodes fail to boot and install RHCOS.

Expected results:
The bootstrap, master (control plane), and worker (compute) nodes should all install the RHCOS build successfully and become Ready.

Additional info:
I have attached the logs from bootstrap.service (bootstrap-0) and kubelet.service (master-0) for a failed installation that uses the OVNKubernetes networkType.
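A minimal sketch of the reproduction command, assembled from the --network and ip= values quoted later in this report; the bridge name, MAC addresses, IPs, and hostnames are copied from this environment purely for illustration, and the remaining disk/location options are omitted:

virt-install \
  --name bootstrap-0 \
  --network bridge=br-ocp,mac=52:54:00:da:70:41 \
  --network direct,source=vlan508,source.mode=bridge,mac=52:54:00:5e:14:bb \
  --extra-args "ip=enc1:dhcp ip=9.12.23.50::9.12.23.1:24:bootstrap-0.pok-106.ocptest.pok.stglabs.ibm.com:enc2:none nameserver=192.168.79.1" \
  ...

Note that the second ip= argument carries its own gateway (9.12.23.1), which is what produces the two default routes discussed in the comments below.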
Created attachment 1850872 [details] bootkube.service.log
Looks like the ovnkube-node pod is crashing:

Jan 14 15:30:02 master-00.pok-106.ocptest.pok.stglabs.ibm.com hyperkube[2540]: E0114 15:30:02.586987 2540 pod_workers.go:918] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"ovnkube-node\" with CrashLoopBackOff: \"back-off 1m20s restarting failed container=ovnkube-node pod=ovnkube-node-6tsx6_openshift-ovn-kubernetes(8a627578-1c69-47e7-9cff-5e3fac5a9886)\"" pod="openshift-ovn-kubernetes/ovnkube-node-6tsx6" podUID=8a627578-1c69-47e7-9cff-5e3fac5a9886

Can you collect the must-gather for this?
Phil, did you set the machineNetwork parameter in install-config.yaml? If it's not set, bootstrap will have problems, which would lead to the masters being NotReady. Could you attach the full journal logs from the bootstrap node? The one that is attached is not complete.
Hi Muhammad, when we last spoke I did not have the machineNetwork parameter in my install-config.yaml. This is a copy of the latest install-config.yaml I have attempted to install with, now using machineNetwork:

apiVersion: v1
baseDomain: "{{ cluster_base_domain }}"
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: {{ cluster_nodes['masters'].keys() | length }}
metadata:
  name: "{{ cluster_name }}"
networking:
  machineNetwork:
  - cidr: 10.0.0.0/24
  clusterNetworks:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '{{ ocp4_pull_secret | to_json }}'
sshKey: '{{ bastion_pubkey.content | b64decode }}'

I have tried all of the following machineNetwork variations without success:
- cidr: 192.168.79.0/24
- cidr: 172.18.231.0/24
- cidr: 10.0.0.0/16
- cidr: 10.0.0.0/24
- cidr: 192.0.0.0/24
- cidr: 172.0.0.0/24

Each OCP cluster node has two NICs, residing on the 192.168.79.0 and 9.12.23.0 networks. The primary network is 192.168.79.0, which both DNS and HAProxy listen on.

Prashanth, Muhammad, I've captured new full journal logs from both bootstrap and master-0 with the above install-config.yaml setup and will attach them here.
Created attachment 1851774 [details] bootstrap.full.journal.01182022.log
Created attachment 1851775 [details] master-0.full.journal.01182022.log
Created attachment 1851777 [details] must-gather-01182022.tar.gz
Setting "Blocker+" flag for the moment as the Z team mentioned that this bug is blocking them. Our team will evaluate the bug and re-evaluate the blocker status as necessary.
Tail of the ovnkube-node logs:

```
2022-01-19T02:42:23.473357305Z I0119 02:42:23.473352 16825 ovs.go:208] exec(10): stderr: ""
2022-01-19T02:42:23.473370604Z I0119 02:42:23.473361 16825 ovs.go:204] exec(11): /usr/bin/ovs-vsctl --timeout=15 get Interface 76fddd5a-ca6b-468e-beb4-2ac1d890f1e5 Type
2022-01-19T02:42:23.476381444Z I0119 02:42:23.476353 16825 ovs.go:207] exec(11): stdout: "system\n"
2022-01-19T02:42:23.476381444Z I0119 02:42:23.476368 16825 ovs.go:208] exec(11): stderr: ""
2022-01-19T02:42:23.476381444Z I0119 02:42:23.476378 16825 ovs.go:204] exec(12): /usr/bin/ovs-vsctl --timeout=15 get interface enc1 ofport
2022-01-19T02:42:23.479420502Z I0119 02:42:23.479394 16825 ovs.go:207] exec(12): stdout: "2\n"
2022-01-19T02:42:23.479420502Z I0119 02:42:23.479408 16825 ovs.go:208] exec(12): stderr: ""
2022-01-19T02:42:23.479420502Z I0119 02:42:23.479416 16825 ovs.go:204] exec(13): /usr/bin/ovs-vsctl --timeout=15 --if-exists get interface br-ex mac_in_use
2022-01-19T02:42:23.482684750Z I0119 02:42:23.482668 16825 ovs.go:207] exec(13): stdout: "\"52:54:00:d9:5b:d8\"\n"
2022-01-19T02:42:23.482684750Z I0119 02:42:23.482678 16825 ovs.go:208] exec(13): stderr: ""
2022-01-19T02:42:23.482695944Z I0119 02:42:23.482688 16825 ovs.go:204] exec(14): /usr/bin/ovs-vsctl --timeout=15 --if-exists get Open_vSwitch . external_ids:ovn-bridge-mappings
2022-01-19T02:42:23.485780641Z I0119 02:42:23.485753 16825 ovs.go:207] exec(14): stdout: "\"physnet:br-ex\"\n"
2022-01-19T02:42:23.485780641Z I0119 02:42:23.485767 16825 ovs.go:208] exec(14): stderr: ""
2022-01-19T02:42:23.485780641Z I0119 02:42:23.485775 16825 ovs.go:204] exec(15): /usr/bin/ovs-vsctl --timeout=15 set Open_vSwitch . external_ids:ovn-bridge-mappings=physnet:br-ex
2022-01-19T02:42:23.489066056Z I0119 02:42:23.489036 16825 ovs.go:207] exec(15): stdout: ""
2022-01-19T02:42:23.489066056Z I0119 02:42:23.489048 16825 ovs.go:208] exec(15): stderr: ""
2022-01-19T02:42:23.489066056Z I0119 02:42:23.489056 16825 ovs.go:204] exec(16): /usr/bin/ovs-vsctl --timeout=15 --if-exists get Open_vSwitch . external_ids:system-id
2022-01-19T02:42:23.492357268Z I0119 02:42:23.492340 16825 ovs.go:207] exec(16): stdout: "\"1857b02c-24f4-4ced-8b64-8f4a9d018a60\"\n"
2022-01-19T02:42:23.492357268Z I0119 02:42:23.492351 16825 ovs.go:208] exec(16): stderr: ""
2022-01-19T02:42:23.492366811Z I0119 02:42:23.492357 16825 ovs.go:204] exec(17): /usr/bin/ovs-appctl --timeout=15 dpif/show-dp-features br-ex
2022-01-19T02:42:23.494296714Z I0119 02:42:23.494238 16825 ovs.go:207] exec(17): stdout: "Masked set action: Yes\nTunnel push pop: No\nUfid: Yes\nTruncate action: Yes\nClone action: Yes\nSample nesting: 10\nConntrack eventmask: Yes\nConntrack clear: Yes\nMax dp_hash algorithm: 0\nCheck pkt length action: Yes\nConntrack timeout policy: Yes\nExplicit Drop action: No\nOptimized Balance TCP mode: No\nConntrack all-zero IP SNAT: Yes\nMax VLAN headers: 2\nMax MPLS depth: 3\nRecirc: Yes\nCT state: Yes\nCT zone: Yes\nCT mark: Yes\nCT label: Yes\nCT state NAT: Yes\nCT orig tuple: Yes\nCT orig tuple for IPv6: Yes\nIPv6 ND Extension: No\n"
2022-01-19T02:42:23.494296714Z I0119 02:42:23.494259 16825 ovs.go:208] exec(17): stderr: ""
2022-01-19T02:42:23.494567844Z F0119 02:42:23.494543 16825 ovnkube.go:133] unable to add OVN masquerade route to host, error: failed to add route for subnet 169.254.169.0/30 via gateway 9.12.23.1 with mtu 0: network is unreachable
```

I would defer to the networking team to look at this, as I am not sure whether shared gateway supports a second NIC.
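The fatal line is ovnkube-node trying to install its 169.254.169.0/30 masquerade route. A minimal sketch of why that particular route add fails, assuming (as the gateway init code suggests) the route is bound to the br-ex bridge; this is an illustration of the kernel behavior, not the exact command ovnkube runs:

# On the failing master, br-ex only carries 192.168.79.21/24, so 9.12.23.1 is not a
# directly reachable nexthop on that bridge and the kernel rejects the route add:
ip route add 169.254.169.0/30 via 9.12.23.1 dev br-ex      # fails: gateway not on br-ex's subnet

# The add only works with the gateway that actually sits on br-ex's subnet:
ip route add 169.254.169.0/30 via 192.168.79.1 dev br-ex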
So to summarize the network setup:

Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: 2: enc2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: link/ether 52:54:00:5e:14:bb brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: inet 9.12.23.51/24 brd 9.12.23.255 scope global noprefixroute enc2
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: valid_lft forever preferred_lft forever
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: 3: enc1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: link/ether 52:54:00:da:70:41 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: inet 192.168.79.21/24 brd 192.168.79.255 scope global dynamic noprefixroute enc1
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: valid_lft 900sec preferred_lft 900sec
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: inet6 fe80::5054:ff:feda:7041/64 scope link tentative noprefixroute
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: valid_lft forever preferred_lft forever
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: + ip route show
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com nm-dispatcher[1299]: Error: Device '' not found.
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: default via 192.168.79.1 dev enc1 proto dhcp metric 100
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: default via 9.12.23.1 dev enc2 proto static metric 101

configure-ovs.sh picks enc1 to move onto the shared gateway bridge:

Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: IPv4 Default gateway interface found: enc1

Then, when ovnkube-node starts up, it tries to detect the available nexthops and set the 169.254.169.0/30 route mentioned above towards that nexthop. This should have been 192.168.79.1, but instead it attempts 9.12.23.1, which is reported as network unreachable for some reason.

The code that determines this nexthop is:
https://github.com/openshift/ovn-kubernetes/blob/master/go-controller/pkg/node/gateway_init.go#L141
https://github.com/openshift/ovn-kubernetes/blob/f568deaadf40638f36781d0a8c55c2ca28a9162e/go-controller/pkg/node/helper_linux.go#L21

Unfortunately, it looks like this code is looking for any default route on the host and just returning the first one it finds, rather than scoping to the device br-ex (see the sketch after this comment). However, this code has not changed between 4.9 and 4.10, so I don't understand how this worked before and not now.
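A quick shell analogue of the difference, run on the failing master; this is only a sketch, since ovnkube-node does the equivalent through netlink in Go rather than by shelling out:

# What the current code effectively does: take the first default route on the host;
# on this node the static 9.12.23.1 entry is listed first
ip route show default

# What would return the intended nexthop: scope the lookup to the gateway bridge
ip route show default dev br-ex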
Phillip, can I see the output of ip route show on the master-0 node while ovnkube-node is failing? I'm curious why you get network unreachable on the route add. Also, do you have any of the must-gathers and system journals from one of the successful runs?
Hi Tim,

We determined the problem was due to two default gateways being defined in the routing table of each OCP node. For example, on the master-0 node:

[core@master-0 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         9.12.23.1       0.0.0.0         UG    100    0        0 enc2
0.0.0.0         192.168.79.1    0.0.0.0         UG    100    0        0 br-ex
9.12.23.0       0.0.0.0         255.255.255.0   U     100    0        0 enc2
10.128.0.0      10.129.0.1      255.252.0.0     UG    0      0        0 ovn-k8s-mp0
10.129.0.0      0.0.0.0         255.255.254.0   U     0      0        0 ovn-k8s-mp0
169.254.169.3   10.129.0.1      255.255.255.255 UGH   0      0        0 ovn-k8s-mp0
192.168.79.0    0.0.0.0         255.255.255.0   U     100    0        0 br-ex

This caused the ovnkube-node pods to be in CrashLoopBackOff:

[root@bastion multiarch-upi-playbooks-master]# oc get pods -n openshift-ovn-kubernetes -o wide
NAME                   READY   STATUS             RESTARTS         AGE    IP           NODE                                           NOMINATED NODE   READINESS GATES
ovnkube-master-4bgn5   6/6     Running            1 (142m ago)     142m   9.12.23.54   master-1.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-master-9pmfh   6/6     Running            3 (141m ago)     141m   9.12.23.56   master-2.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-master-9wpm6   6/6     Running            0                142m   9.12.23.51   master-0.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-node-8hgqx     4/5     CrashLoopBackOff   19 (68m ago)     142m   9.12.23.56   master-2.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-node-hvxzj     4/5     CrashLoopBackOff   19 (68m ago)     142m   9.12.23.54   master-1.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-node-m8fr7     4/5     CrashLoopBackOff   16 (4m39s ago)   142m   9.12.23.51   master-0.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>

Once I manually removed the 9.12.23.1 default route from each of the master nodes (see the sketch after this comment), the ovnkube-node pods began running and the masters reported a Ready status.

Just to clarify, for this particular OCP cluster we were trying to set up two NICs. For reference, my virt-install command included the following kernel arguments for the IP parameters:

ip=enc1:dhcp ip=9.12.23.50::9.12.23.1:24:bootstrap-0.pok-106.ocptest.pok.stglabs.ibm.com:enc2:none nameserver=192.168.79.1

enc1 requests a 192.dot IP address from our DHCP server; enc2 defines a static 9.dot IP address.

I have now removed the gateway 9.12.23.1 from the static IP address parameter:

ip=9.12.23.50:::24:bootstrap-0.pok-106.ocptest.pok.stglabs.ibm.com:enc2:none

and successfully performed a new installation with 4.10.0-0.nightly-s390x-2022-01-14-030142.
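For reference, a sketch of the manual workaround described above; the label selector on the delete is an assumption about the ovnkube-node DaemonSet labeling, and deleting the crashing pods by name works just as well:

# On each master, drop the second default route so only 192.168.79.1 remains
sudo ip route del default via 9.12.23.1 dev enc2

# Then let the crashing ovnkube-node pods restart so they re-detect the nexthop
oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-node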
The ovnkube-node pods now show the correct IPs on the 192.168.79.xxx network:

[root@bastion ~]# oc get pods -n openshift-ovn-kubernetes -o wide
NAME                   READY   STATUS    RESTARTS      AGE   IP              NODE                                           NOMINATED NODE   READINESS GATES
ovnkube-master-72t2f   6/6     Running   2 (48m ago)   58m   192.168.79.21   master-0.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-master-k82r6   6/6     Running   6 (56m ago)   58m   192.168.79.23   master-2.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-master-r7lhf   6/6     Running   6 (56m ago)   58m   192.168.79.22   master-1.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-node-5lkkf     5/5     Running   0             58m   192.168.79.22   master-1.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-node-f4tkz     5/5     Running   0             19m   192.168.79.25   worker-1.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-node-ggn2v     5/5     Running   0             58m   192.168.79.21   master-0.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-node-x6hsx     5/5     Running   0             19m   192.168.79.24   worker-0.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>
ovnkube-node-z6jlf     5/5     Running   0             58m   192.168.79.23   master-2.pok-106.ocptest.pok.stglabs.ibm.com   <none>           <none>

As noted in the comments above, we successfully ran with the same kernel arguments (two gateways) on OCP 4.9.x and on OCP 4.10 nightly builds up to 4.10.0-0.nightly-s390x-2021-12-13-233722. Only the recent OCP 4.10 nightly builds from late December 2021 and January 2022 require this change to our virt-install command. Was there an update that introduced this requirement?

I also have the following in my install-config.yaml:

networking:
  machineNetwork:
  - cidr: 192.168.79.0/24
  clusterNetworks:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16

...which should have isolated the OVN pods to the 192.168.79.xxx network, but that does not seem to matter unless I make sure not to define the 9.dot gateway. Please let me know your thoughts. Thank you.
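A simple way to confirm which address kubelet registered for each node after the change; this is a routine check added here for completeness, not a step from the original report:

# INTERNAL-IP should now come from the 192.168.79.0/24 machineNetwork on every node
oc get nodes -o wide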
When multiple default gateways are defined, NetworkManager or the Go libraries may pick the wrong gateway to route primary network traffic. This was identified during the 4.9 multi-NIC testing, and we added the following note to the OpenShift documentation:

"If the additional network gateway is different from the primary network gateway, the default gateway must be the primary network gateway. To configure the route for the additional network: rd.route=20.20.20.0/24:20.20.20.254:enp2s0"

https://docs.openshift.com/container-platform/4.9/installing/installing_ibm_z/installing-ibm-z.html#installation-user-infra-machines-routing-bonding_installing-ibm-z

Phil, in your environment you could also assign the IP address for the additional network via DHCP, which should also solve the problem. In that case you don't need to define the following entry for the additional network:

host worker-11 {
--OMITTED--
}

IP addresses can be assigned with:

subnet 192.168.80.0 netmask 255.255.255.0 {
  range 192.168.80.2 192.168.80.255;
  option routers 192.168.80.1;
  option subnet-mask 255.255.255.0;
}

Note that the above will also add a default gateway, but with a higher metric. If that is not the case, remove the default gateway above and add the route with a static rd.route entry instead (see the sketch after this comment).
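A sketch of what the kernel arguments look like with a single default gateway, following the documented rd.route approach; the 9.12.0.0/16 destination is an illustrative remote subnet, not a value taken from this environment:

ip=enc1:dhcp                                                                  # primary NIC, provides the only default gateway
ip=9.12.23.50:::24:bootstrap-0.pok-106.ocptest.pok.stglabs.ibm.com:enc2:none  # secondary NIC, static, no gateway
rd.route=9.12.0.0/16:9.12.23.1:enc2                                           # reach networks behind the secondary gateway without a second default route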
There are multiple things to clarify about what's happening here:

1. It should be fine to have multiple default gateway routes. NetworkManager will assign different metrics to each route of the same NIC type, which I can see happens in the original log:

Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: default via 192.168.79.1 dev enc1 proto dhcp metric 100
Jan 19 02:28:12 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1324]: default via 9.12.23.1 dev enc2 proto static metric 101

2. However, your output shows both metrics at 100 after configure-ovs moves the interface to br-ex. I believe this is resolved by https://github.com/openshift/machine-config-operator/pull/2898:

[core@master-0 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         9.12.23.1       0.0.0.0         UG    100    0        0 enc2
0.0.0.0         192.168.79.1    0.0.0.0         UG    100    0        0 br-ex

3. configure-ovs and kubelet will decide the node IP based on the lowest metric. Can you retry your original deployment with the fix ^ (see the check after this comment)?

4. I would still like to know why you got "network unreachable" on the route add. If you could share the "ip route show" output when it is broken, so I can see whether there is a scope link route, that would be helpful.

5. Regardless of the previous points, there is still a bug in ovn-kube in how it chooses the default next hop. That isn't a regression, though; it's been that way for several releases.
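A quick way to confirm the fix landed on a node before retrying the full deployment; this is a routine check, and the metric 49 is the value the later comments report for the br-ex route once the fix is in place:

# After configure-ovs has run, the br-ex default route should carry the lowest metric
ip route show default
# expected: default via 192.168.79.1 dev br-ex ... metric 49   (lower than the enc2 route)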
Thank you, Tim, for clarifying all of those key points.

I was in the process of setting up another OCP cluster the day before using a newer build, 4.10.0-0.nightly-s390x-2022-01-17-171822. The configuration for this cluster was slightly different in that both networks obtained their IP addresses from a single DHCP server (192.168.79.dot and 192.168.80.dot). As Muhammad mentioned in comment 13, I was able to remove the default route from one of the networks (192.168.80.dot) so that each node had a single default route, and the OCP install then succeeded.

I will pick up a newer OCP build, currently 4.10.0-fc.2, and retry the original deployment with two gateways. I'll report back my findings, including the information you asked for above. Thanks.
I attempted an original install deployment using OCP build 4.10.0-fc.2 with RHCOS 410.84.202201190402-0. The original problem occurs: the master nodes remain in the NotReady state.

The routing info from the bootstrap-0 and master-0 nodes:

[core@bootstrap-00 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.79.1    0.0.0.0         UG    100    0        0 enc1
0.0.0.0         9.12.23.1       0.0.0.0         UG    101    0        0 enc2
9.12.23.0       0.0.0.0         255.255.255.0   U     101    0        0 enc2
192.168.79.0    0.0.0.0         255.255.255.0   U     100    0        0 enc1

[core@bootstrap-00 ~]$ ip route show
default via 192.168.79.1 dev enc1 proto dhcp metric 100
default via 9.12.23.1 dev enc2 proto static metric 101
9.12.23.0/24 dev enc2 proto kernel scope link src 9.12.23.50 metric 101
192.168.79.0/24 dev enc1 proto kernel scope link src 192.168.79.20 metric 100

[core@master-00 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         9.12.23.1       0.0.0.0         UG    100    0        0 enc2
0.0.0.0         192.168.79.1    0.0.0.0         UG    100    0        0 br-ex
9.12.23.0       0.0.0.0         255.255.255.0   U     100    0        0 enc2
192.168.79.0    0.0.0.0         255.255.255.0   U     100    0        0 br-ex

[core@master-00 ~]$ ip route show
default via 9.12.23.1 dev enc2 proto static metric 100
default via 192.168.79.1 dev br-ex proto dhcp metric 100
9.12.23.0/24 dev enc2 proto kernel scope link src 9.12.23.51 metric 100
10.128.0.0/14 via 10.130.0.1 dev ovn-k8s-mp0
10.130.0.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.130.0.2
169.254.169.3 via 10.130.0.1 dev ovn-k8s-mp0
192.168.79.0/24 dev br-ex proto kernel scope link src 192.168.79.21 metric 100

With the above, it looks like the metrics are properly set on the bootstrap node only; on the masters both networks remain at metric 100 - is that the problem?

I verified in the master log that enc1 (192.168.79.1) is the chosen shared gateway bridge:

Jan 20 17:13:23 master-00.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1321]: IPv4 Default gateway interface found: enc1

Then, viewing the ovn pod for master-0, we see that the network is unreachable when trying to go over the 9.12.23.1 gateway:

# oc describe pod ovnkube-node-tp87d -n openshift-ovn-kubernetes
...
I0120 17:27:59.480738 16406 ovs.go:204] exec(17): /usr/bin/ovs-appctl --timeout=15 dpif/show-dp-features br-ex
I0120 17:27:59.482776 16406 ovs.go:207] exec(17): stdout: "Masked set action: Yes\nTunnel push pop: No\nUfid: Yes\nTruncate action: Yes\nClone action: Yes\nSample nesting: 10\nConntrack eventmask: Yes\nConntrack clear: Yes\nMax dp_hash algorithm: 0\nCheck pkt length action: Yes\nConntrack timeout policy: Yes\nExplicit Drop action: No\nOptimized Balance TCP mode: No\nConntrack all-zero IP SNAT: Yes\nMax VLAN headers: 2\nMax MPLS depth: 3\nRecirc: Yes\nCT state: Yes\nCT zone: Yes\nCT mark: Yes\nCT label: Yes\nCT state NAT: Yes\nCT orig tuple: Yes\nCT orig tuple for IPv6: Yes\nIPv6 ND Extension: No\n"
I0120 17:27:59.482807 16406 ovs.go:208] exec(17): stderr: ""
F0120 17:27:59.483076 16406 ovnkube.go:133] unable to add OVN masquerade route to host, error: failed to add route for subnet 169.254.169.0/30 via gateway 9.12.23.1 with mtu 0: network is unreachable

In my virt-install command, this is the syntax I'm using to define a bridge and a macvtap network:

--network bridge=br-ocp,mac=52:54:00:da:70:41 \
--network direct,source=vlan508,source.mode=bridge,mac=52:54:00:5e:14:bb \

I'm not sure whether that should matter or be a reason for concern, but as you stated, it seems to just pick the first one in the routing table list.
Irrespective of this BZ, in general it is not a good idea to add multiple default gateways: as its name suggests, there should be only ONE default gateway on a system. When we add multiple ip= kernel arguments, each with its own gateway, the kernel presents all of them to NetworkManager as default gateways, and NetworkManager cannot decide which route is the default or which should get a higher metric value. Similarly, when you add one DHCP-provided gateway and another static gateway, the system has no clue which one it should use for default routing, despite the different metric values. It may even be that the higher-metric route is the one carrying your haproxy, named, etc. services.

However, the problem you're able to reproduce with OVN should be solved by the PR https://github.com/openshift/machine-config-operator/pull/2898. We will just have to wait for an OCP release that contains the fix. But again, it was just by chance that NetworkManager picked 192.168.79.1 as your default route and assigned it the lowest metric. If for some reason it had picked 9.12.23.1 as your default route with the lowest metric, not even the above PR would fix the installation failure. (See the sketch after this comment for one way to pin that choice explicitly.)
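One way to make that choice deterministic rather than leaving it to chance is to tell NetworkManager that the secondary NIC must never provide the default route. This is a generic NetworkManager sketch assuming a connection profile named enc2; it is not a step from this bug's resolution:

# Prevent the secondary NIC from ever becoming the default route,
# and push any routes it does provide to a high metric
nmcli connection modify enc2 ipv4.never-default yes ipv4.route-metric 300
nmcli connection up enc2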
[core@master-00 ~]$ ip route show
default via 9.12.23.1 dev enc2 proto static metric 100
default via 192.168.79.1 dev br-ex proto dhcp metric 100
9.12.23.0/24 dev enc2 proto kernel scope link src 9.12.23.51 metric 100
10.128.0.0/14 via 10.130.0.1 dev ovn-k8s-mp0
10.130.0.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.130.0.2
169.254.169.3 via 10.130.0.1 dev ovn-k8s-mp0
192.168.79.0/24 dev br-ex proto kernel scope link src 192.168.79.21 metric 100

With the above, it looks like the metrics are properly set on bootstrap only; the masters still remain at 100 for both networks - is that the problem?

The br-ex metric should have been 49 if you had this fix:
https://github.com/openshift/machine-config-operator/pull/2898/files#diff-afb45a3711a77d94f26471d9d94a7f7a03d931d9e72bdf849f2e26e2711d6fd7R125
Hi Tim,

I verified with OCP build 4.10.0-fc.4 that the fix resolves the metric assignment. This is what shows in the routing tables for the bootstrap, master, and worker nodes:

[core@bootstrap-00 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.79.1    0.0.0.0         UG    100    0        0 enc1
0.0.0.0         9.12.23.1       0.0.0.0         UG    101    0        0 enc2
9.12.23.0       0.0.0.0         255.255.255.0   U     101    0        0 enc2
192.168.79.0    0.0.0.0         255.255.255.0   U     100    0        0 enc1

[core@master-00 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.79.1    0.0.0.0         UG    49     0        0 br-ex
0.0.0.0         9.12.23.1       0.0.0.0         UG    100    0        0 enc2
9.12.23.0       0.0.0.0         255.255.255.0   U     100    0        0 enc2
10.128.0.0      10.130.0.1      255.252.0.0     UG    0      0        0 ovn-k8s-mp0
10.130.0.0      0.0.0.0         255.255.254.0   U     0      0        0 ovn-k8s-mp0
169.254.169.0   192.168.79.1    255.255.255.252 UG    0      0        0 br-ex
169.254.169.3   10.130.0.1      255.255.255.255 UGH   0      0        0 ovn-k8s-mp0
172.30.0.0      192.168.79.1    255.255.0.0     UG    0      0        0 br-ex
192.168.79.0    0.0.0.0         255.255.255.0   U     49     0        0 br-ex

[core@worker-00 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.79.1    0.0.0.0         UG    49     0        0 br-ex
0.0.0.0         9.12.23.1       0.0.0.0         UG    100    0        0 enc2
9.12.23.0       0.0.0.0         255.255.255.0   U     100    0        0 enc2
10.128.0.0      10.131.0.1      255.252.0.0     UG    0      0        0 ovn-k8s-mp0
10.131.0.0      0.0.0.0         255.255.254.0   U     0      0        0 ovn-k8s-mp0
169.254.169.0   192.168.79.1    255.255.255.252 UG    0      0        0 br-ex
169.254.169.3   10.131.0.1      255.255.255.255 UGH   0      0        0 ovn-k8s-mp0
172.30.0.0      192.168.79.1    255.255.0.0     UG    0      0        0 br-ex
192.168.79.0    0.0.0.0         255.255.255.0   U     49     0        0 br-ex

I was able to perform the installation successfully using the original deployment commands. The resolution for this bug and https://bugzilla.redhat.com/show_bug.cgi?id=2046520 is a documentation update to clarify the default gateway requirements.
@chanphil.com Would you be able to verify this bug? Thanks
Making comment 23 un-private, as Phil may not be able to see private comments.
Hi @chanphil.com, just checking to see whether you are able to verify this per the above comment. Thank you!
Hi,

We re-verified our network tests using OCP 4.10.6 and RHCOS 4.10.3, deploying the OCP nodes with our original network kernel parameters via virt-install - that is, defining two networks, each with its own default gateway, for a total of two default gateways. The OCP installation completes, and the following routing tables are present on the bootstrap, master, and worker nodes.

[core@bootstrap-00 ~]$ ip route show
default via 192.168.79.1 dev enc1 proto dhcp metric 100
default via 9.12.23.1 dev enc2 proto static metric 101
9.12.23.0/24 dev enc2 proto kernel scope link src 9.12.23.49 metric 101
192.168.79.0/24 dev enc1 proto kernel scope link src 192.168.79.20 metric 100

[core@bootstrap-00 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.79.1    0.0.0.0         UG    100    0        0 enc1
0.0.0.0         9.12.23.1       0.0.0.0         UG    101    0        0 enc2
9.12.23.0       0.0.0.0         255.255.255.0   U     101    0        0 enc2
192.168.79.0    0.0.0.0         255.255.255.0   U     100    0        0 enc1

[core@master-00 ~]$ ip route show
default via 192.168.79.1 dev br-ex proto dhcp metric 49
default via 9.12.23.1 dev enc2 proto static metric 100
9.12.23.0/24 dev enc2 proto kernel scope link src 9.12.23.50 metric 100
10.128.0.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.128.0.2
10.128.0.0/14 via 10.128.0.1 dev ovn-k8s-mp0
169.254.169.0/30 via 192.168.79.1 dev br-ex
169.254.169.3 via 10.128.0.1 dev ovn-k8s-mp0
172.30.0.0/16 via 192.168.79.1 dev br-ex mtu 1400
192.168.79.0/24 dev br-ex proto kernel scope link src 192.168.79.21 metric 49

[core@master-00 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.79.1    0.0.0.0         UG    49     0        0 br-ex
0.0.0.0         9.12.23.1       0.0.0.0         UG    100    0        0 enc2
9.12.23.0       0.0.0.0         255.255.255.0   U     100    0        0 enc2
10.128.0.0      0.0.0.0         255.255.254.0   U     0      0        0 ovn-k8s-mp0
10.128.0.0      10.128.0.1      255.252.0.0     UG    0      0        0 ovn-k8s-mp0
169.254.169.0   192.168.79.1    255.255.255.252 UG    0      0        0 br-ex
169.254.169.3   10.128.0.1      255.255.255.255 UGH   0      0        0 ovn-k8s-mp0
172.30.0.0      192.168.79.1    255.255.0.0     UG    0      0        0 br-ex
192.168.79.0    0.0.0.0         255.255.255.0   U     49     0        0 br-ex

[core@worker-00 ~]$ ip route show
default via 192.168.79.1 dev br-ex proto dhcp metric 49
default via 9.12.23.1 dev enc2 proto static metric 100
9.12.23.0/24 dev enc2 proto kernel scope link src 9.12.23.56 metric 100
10.128.0.0/14 via 10.131.0.1 dev ovn-k8s-mp0
10.131.0.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.131.0.2
169.254.169.0/30 via 192.168.79.1 dev br-ex
169.254.169.3 via 10.131.0.1 dev ovn-k8s-mp0
172.30.0.0/16 via 192.168.79.1 dev br-ex mtu 1400
192.168.79.0/24 dev br-ex proto kernel scope link src 192.168.79.24 metric 49

[core@worker-00 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.79.1    0.0.0.0         UG    49     0        0 br-ex
0.0.0.0         9.12.23.1       0.0.0.0         UG    100    0        0 enc2
9.12.23.0       0.0.0.0         255.255.255.0   U     100    0        0 enc2
10.128.0.0      10.131.0.1      255.252.0.0     UG    0      0        0 ovn-k8s-mp0
10.131.0.0      0.0.0.0         255.255.254.0   U     0      0        0 ovn-k8s-mp0
169.254.169.0   192.168.79.1    255.255.255.252 UG    0      0        0 br-ex
169.254.169.3   10.131.0.1      255.255.255.255 UGH   0      0        0 ovn-k8s-mp0
172.30.0.0      192.168.79.1    255.255.0.0     UG    0      0        0 br-ex
192.168.79.0    0.0.0.0         255.255.255.0   U     49     0        0 br-ex

As Muhammad noted in comment 18 above, it is not good practice to use multiple default gateways. As a result, we suggested a minor update to the documentation, as noted in Bugzilla 2046520, which was accepted and made.
We then tested variations of assigning the IP addresses statically or through DHCP, but assigning only one default gateway as recommended. We verified that the correct default gateway and the network we wanted to use for the OVN network were selected, and the OCP cluster installed successfully. We no longer encounter the issue as originally reported. Thank you for the resolution.
Changing to Verified based on Phil's Comment 27
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069