Created attachment 1882427
must-gather and sosreports

Summary: [OVN] Multi-homed node - No Route From POD network to second NIC network

This is a blocker issue for customers/partners moving from 4.7.z to 4.8.z.

Description of problem:

When running RHOCP 4.8.39 in a three-node OpenShift compact cluster, a multi-homed node does not provide the correct routing path from the pod internal network to an external network through the second NIC on the node.

1) A pod on the pod network has IP address 10.128.0.45 and is located on master1.
2) We can't ping from the pod on master1 to master3's second NIC, which is 10.97.227.59/22.
3) In the following tests, at node level on master1 (where the pod is located) we can see:
   3a) An ICMP request 10.128.0.45 -> 10.97.227.59
   3b) An ICMP request (probably after some OVN routing) 10.0.128.253 -> 10.97.227.59 with "no reply"
   3c) This use case worked fine on OCP 4.7, but it changed when we moved to OCP 4.8.
4) Meanwhile, PODs can still ping any of them from the nodes themselves.

As shown in the tracepaths below, the issue seems to be related to the hop at 100.64.0.3, i.e. the join switch IP for the node assigned by logical_switch_manager. In 4.8.39 this switch IP appears as an additional hop. According to the tracepaths collected, this additional hop doesn't exist in 4.7.z.

Version-Release number of selected component (if applicable):
RHOCP 4.8.39

How reproducible:

Given the following three-node OpenShift compact cluster configuration, where we have added a second NIC to each node in the subnet 10.97.224.0/22:

- master1:
  - primary ip (MachineNetworkCIDR): 10.0.128.253/20
  - secondary ip: 10.97.226.192/22
- master2:
  - primary ip (MachineNetworkCIDR): 10.0.142.9/20
  - secondary ip: 10.97.224.75/22
- master3:
  - primary ip (MachineNetworkCIDR): 10.0.143.199/20
  - secondary ip: 10.97.227.59/22
- POD IP 10.128.0.45 on master1

$ oc get pods -o wide
NAME                                   READY   STATUS    RESTARTS   AGE   IP            NODE                           NOMINATED NODE   READINESS GATES
multitool-openshift-58d96959c4-6txzk   1/1     Running   0          16m   10.128.0.45   ip-10-0-128-253.ec2.internal   <none>           <none>

$ oc get nodes -o wide
NAME                           STATUS   ROLES           AGE   VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-128-253.ec2.internal   Ready    master,worker   7h    v1.21.8+ed4d8fd   10.0.128.253   <none>        Red Hat Enterprise Linux CoreOS 48.84.202204202010-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.21.6-3.rhaos4.8.git19780ee.2.el8
ip-10-0-142-9.ec2.internal     Ready    master,worker   7h    v1.21.8+ed4d8fd   10.0.142.9     <none>        Red Hat Enterprise Linux CoreOS 48.84.202204202010-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.21.6-3.rhaos4.8.git19780ee.2.el8
ip-10-0-143-199.ec2.internal   Ready    master,worker   7h    v1.21.8+ed4d8fd   10.0.143.199   <none>        Red Hat Enterprise Linux CoreOS 48.84.202204202010-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.21.6-3.rhaos4.8.git19780ee.2.el8

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.39    True        False         3h35m   Cluster version is 4.8.39

$ grep -Ri "switch IP for node" *
must-gather.local.2207980839212123534/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-55a4920ef38c7a5c86eda679c2083ef4f749cc9de0c82b2974d4e1313d4d4412/namespaces/openshift-ovn-kubernetes/pods/ovnkube-master-xt4mc/ovnkube-master/ovnkube-master/logs/current.log:2022-05-23T10:41:52.516091986Z I0523 10:41:52.516023 1 logical_switch_manager.go:345] Initializing and reserving the join switch IP for node: ip-10-0-128-253.ec2.internal to: [100.64.0.3/16]
must-gather.local.2207980839212123534/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-55a4920ef38c7a5c86eda679c2083ef4f749cc9de0c82b2974d4e1313d4d4412/namespaces/openshift-ovn-kubernetes/pods/ovnkube-master-xt4mc/ovnkube-master/ovnkube-master/logs/current.log:2022-05-23T10:41:52.519571164Z I0523 10:41:52.519506 1 logical_switch_manager.go:345] Initializing and reserving the join switch IP for node: ip-10-0-142-9.ec2.internal to: [100.64.0.2/16]
must-gather.local.2207980839212123534/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-55a4920ef38c7a5c86eda679c2083ef4f749cc9de0c82b2974d4e1313d4d4412/namespaces/openshift-ovn-kubernetes/pods/ovnkube-master-xt4mc/ovnkube-master/ovnkube-master/logs/current.log:2022-05-23T10:41:52.523332436Z I0523 10:41:52.523259 1 logical_switch_manager.go:345] Initializing and reserving the join switch IP for node: ip-10-0-143-199.ec2.internal to: [100.64.0.4/16]

Steps to Reproduce:

1) Test 1 - Not reachable - POD IP 10.128.0.45 on master1 can't ping master3's second NIC 10.97.227.59

/ $ ping -c 3 10.97.227.59
PING 10.97.227.59 (10.97.227.59) 56(84) bytes of data.

--- 10.97.227.59 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2035ms

/ $ tracepath -n 10.97.227.59
 1?: [LOCALHOST]     pmtu 8901
 1:  10.128.0.1      1.383ms asymm  2
 1:  10.128.0.1      1.222ms asymm  2
 2:  100.64.0.3      1.458ms asymm  3
 3:  no reply
(..)
28:  no reply
29:  no reply
30:  no reply
     Too many hops: pmtu 8901
     Resume: pmtu 8901
/ $

#########################################################################################################################

2) Test 2 - OK - POD IP 10.128.0.45 on master1 pings master1's second NIC 10.97.226.192

/ $ ping -c 3 10.97.226.192
PING 10.97.226.192 (10.97.226.192) 56(84) bytes of data.
64 bytes from 10.97.226.192: icmp_seq=1 ttl=64 time=0.108 ms
64 bytes from 10.97.226.192: icmp_seq=2 ttl=64 time=0.077 ms
64 bytes from 10.97.226.192: icmp_seq=3 ttl=64 time=0.093 ms

--- 10.97.226.192 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2042ms
rtt min/avg/max/mdev = 0.077/0.092/0.108/0.012 ms

/ $ tracepath -n 10.97.226.192
 1?: [LOCALHOST]     pmtu 8901
 1:  10.128.0.1      1.636ms asymm  2
 1:  10.128.0.1      1.059ms asymm  2
 2:  10.97.226.192   1.125ms reached
     Resume: pmtu 8901 hops 2 back 1
/ $

#########################################################################################################################

Actual results:

POD IP 10.128.0.45 on master1 can't ping master3's second NIC 10.97.227.59.

#########################################################################################################################

Expected results:

As previously observed in RHOCP 4.7.37 (results below), the pod on master1 (10.128.0.45) must be able to ping master3's second NIC 10.97.227.59.
In the tracepath, we should not see the switch IP for the node (assigned by the OVN logical_switch_manager).

############### RHOCP 4.7.37 #####################

- m1:
  - primary ip: 10.0.128.253/20
  - secondary ip: 10.97.226.192/22
- m2:
  - primary ip: 10.0.142.9/20
  - secondary ip: 10.97.227.59/22
- m3:
  - primary ip: 10.0.143.199/20
  - secondary ip: 10.97.224.75/22
- POD IP 10.130.0.57 on m3

$ oc get pods -o wide
NAME                                   READY   STATUS    RESTARTS   AGE   IP            NODE                           NOMINATED NODE   READINESS GATES
multitool-openshift-58d96959c4-lnzsx   1/1     Running   0          16s   10.130.0.57   ip-10-0-143-199.ec2.internal   <none>           <none>

#### OK - Test 1 - POD IP 10.130.0.57 on m3 pings m2's second NIC 10.97.227.59

$ ping -c 3 10.97.227.59
PING 10.97.227.59 (10.97.227.59) 56(84) bytes of data.
64 bytes from 10.97.227.59: icmp_seq=1 ttl=63 time=2.89 ms
64 bytes from 10.97.227.59: icmp_seq=2 ttl=63 time=0.658 ms
64 bytes from 10.97.227.59: icmp_seq=3 ttl=63 time=0.747 ms

--- 10.97.227.59 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2008ms
rtt min/avg/max/mdev = 0.658/1.431/2.888/1.030 ms

$ tracepath -n 10.97.227.59
 1?: [LOCALHOST]     pmtu 8901
 1:  10.130.0.1      1.422ms asymm  2
 1:  10.130.0.1      0.930ms asymm  2
 2:  10.130.0.2      0.942ms asymm  1
 3:  10.97.227.59    0.663ms reached
     Resume: pmtu 8901 hops 3 back 2
/ $

#### OK - Test 2 - POD IP 10.130.0.57 on m3 pings m3's second NIC 10.97.224.75

$ oc exec -it multitool-openshift-58d96959c4-lnzsx /bin/sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ $ ping -c 3 10.97.224.75
PING 10.97.224.75 (10.97.224.75) 56(84) bytes of data.
64 bytes from 10.97.224.75: icmp_seq=1 ttl=64 time=1.94 ms
64 bytes from 10.97.224.75: icmp_seq=2 ttl=64 time=0.594 ms
64 bytes from 10.97.224.75: icmp_seq=3 ttl=64 time=0.109 ms

--- 10.97.224.75 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2057ms
rtt min/avg/max/mdev = 0.109/0.880/1.937/0.773 ms

/ $ tracepath -n 10.97.224.75
 1?: [LOCALHOST]     pmtu 8901
 1:  10.130.0.1      1.498ms asymm  2
 1:  10.130.0.1      1.008ms asymm  2
 2:  10.97.224.75    0.983ms reached
     Resume: pmtu 8901 hops 2 back 1
/ $

#########################################################################################################################

Additional info:
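
As a sanity check on the addressing in the 4.8.39 cluster, the sketch below recomputes the networks implied by the addresses quoted in this report. It is only an illustration: the NODES table and the helper names (networks_of, same_second_nic_subnet) are hypothetical and not part of any OpenShift or OVN tooling, and the master-to-join-IP mapping is taken from the logical_switch_manager log lines above.

```python
import ipaddress

# Second NIC subnet as stated in the report.
SECOND_NIC_NET = ipaddress.ip_network("10.97.224.0/22")

# (primary, secondary, join switch IP) per node, copied from the report.
# The table itself is a hypothetical helper structure for this sketch.
NODES = {
    "master1": ("10.0.128.253/20", "10.97.226.192/22", "100.64.0.3/16"),
    "master2": ("10.0.142.9/20", "10.97.224.75/22", "100.64.0.2/16"),
    "master3": ("10.0.143.199/20", "10.97.227.59/22", "100.64.0.4/16"),
}


def networks_of(node):
    """Return the (primary, secondary, join) networks implied by a node's addresses."""
    return tuple(ipaddress.ip_interface(addr).network for addr in NODES[node])


def same_second_nic_subnet(node_a, node_b):
    """True if both nodes' secondary addresses fall in the stated second NIC subnet."""
    sec_a = networks_of(node_a)[1]
    sec_b = networks_of(node_b)[1]
    return sec_a == sec_b == SECOND_NIC_NET


if __name__ == "__main__":
    for name in NODES:
        primary, secondary, join = networks_of(name)
        print(f"{name}: primary={primary} secondary={secondary} join={join}")
    print("master1/master3 share second NIC subnet:",
          same_second_nic_subnet("master1", "master3"))
```

This only confirms that the addresses are mutually consistent: all secondary IPs sit in 10.97.224.0/22 and all join switch IPs in 100.64.0.0/16. It says nothing about how OVN actually routes the traffic.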