Bug 2089389 - [OVN] Multi-homed node - No Route From POD network to second NIC network
Summary: [OVN] Multi-homed node - No Route From POD network to second NIC network
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: obraunsh
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-05-23 14:40 UTC by Arthur Oliveira
Modified: 2022-10-14 08:04 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-10-11 13:17:03 UTC
Target Upstream Version:
Embargoed:


Attachments
must-gather and sosreports (84 bytes, text/plain)
2022-05-23 14:40 UTC, Arthur Oliveira


Links
System ID Private Priority Status Summary Last Updated
Github openshift openshift-docs issues 49580 0 None open [enterprise-4.8] Issue in file release_notes/ocp-4-8-release-notes.adoc 2022-10-14 08:04:11 UTC
Github openshift openshift-docs pull 51379 0 None open GH#49580: Adds known issue to the 4.8 release notes 2022-10-14 08:04:11 UTC
Red Hat Knowledge Base (Solution) 6962127 0 None None None 2022-10-14 08:04:11 UTC

Description Arthur Oliveira 2022-05-23 14:40:34 UTC
Created attachment 1882427
must-gather and sosreports

Summary: 
[OVN] Multi-homed node - No Route From POD network to second NIC network
This is a blocker issue for customers and partners moving from 4.7.z to 4.8.z.

Description of problem:
When running RHOCP 4.8.39 in a three-node OpenShift compact cluster, a multi-homed node does not provide a correct routing path from the pod internal network to an external network through the node's second NIC.

1) A pod on the pod network, located on master1, has IP address 10.128.0.45.
2) From that pod on master1, we cannot ping master3's second NIC, which is 10.97.227.59/22.
3) In the following tests, at node level on master1 (where the pod is located), we can see:
3a) An ICMP request 10.128.0.45 -> 10.97.227.59.
3b) An ICMP request 10.0.128.253 -> 10.97.227.59 (probably after some OVN routing), with "no reply".
3c) This use case worked fine on OCP 4.7; it stopped working after we moved to OCP 4.8.
4) Pings to any of these addresses still succeed when sent from the nodes themselves.

As listed below in the tracepaths, the issue seems to be related to the hop at 100.64.0.3, which is the join switch IP assigned to the node by logical_switch_manager.
In 4.8.39 this additional hop appears in the path. According to the tracepaths collected, it does not exist in 4.7.z.
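
To make the extra hop easier to spot, the hop addresses seen in the tracepaths can be labelled by the network they fall in. This is only a minimal sketch, assuming the join subnet is 100.64.0.0/16 (taken from the logical_switch_manager log entries in this report), the machine network is 10.0.128.0/20 and the second NIC network is 10.97.224.0/22 (both derived from the node addresses in this report), and the pod network is 10.128.0.0/14 (an assumption: the usual default clusterNetwork, not confirmed for this cluster). The names NETWORKS and classify_hop are hypothetical.

```python
import ipaddress

# Networks relevant to this report. The pod network CIDR is an assumption
# (common default); the others come from the addresses quoted in the report.
NETWORKS = {
    "pod network (assumed default)": ipaddress.ip_network("10.128.0.0/14"),
    "OVN join subnet": ipaddress.ip_network("100.64.0.0/16"),
    "machine network": ipaddress.ip_network("10.0.128.0/20"),
    "second NIC network": ipaddress.ip_network("10.97.224.0/22"),
}


def classify_hop(address):
    """Return the label of the first known network containing the address."""
    ip = ipaddress.ip_address(address)
    for label, network in NETWORKS.items():
        if ip in network:
            return label
    return "unknown"


# Hops that answered in the failing 4.8.39 tracepath.
for hop in ["10.128.0.1", "100.64.0.3"]:
    print(hop, "->", classify_hop(hop))
# 10.128.0.1 -> pod network (assumed default)
# 100.64.0.3 -> OVN join subnet
```

With these assumptions, the last hop that answers in the 4.8.39 tracepath is in the join subnet, while every hop in the 4.7.37 tracepaths is in either the pod network or the second NIC network.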

Version-Release number of selected component (if applicable): RHOCP 4.8.39 


How reproducible:

Given the following three-node OpenShift compact cluster configuration, we added a second NIC to each node in the subnet 10.97.224.0/22:

- master1:
     - primary ip (MachineNetworkCIDR): 10.0.128.253/20
     - secondary ip: 10.97.226.192/22
- master2:
     - primary ip (MachineNetworkCIDR): 10.0.142.9/20
     - secondary ip: 10.97.224.75/22
- master3:
     - primary ip (MachineNetworkCIDR): 10.0.143.199/20
     - secondary ip: 10.97.227.59/22

- POD IP 10.128.0.45 on master1
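
As a sanity check on the addressing above, the following Python sketch confirms that every secondary address falls in 10.97.224.0/22 and shows which network each primary address belongs to. It only uses the addresses listed in this report; the names SECOND_NIC_NET, NODES and check_nodes are hypothetical.

```python
import ipaddress

# Second NIC subnet as stated in this report.
SECOND_NIC_NET = ipaddress.ip_network("10.97.224.0/22")

# (primary, secondary) addresses per node, copied from the list above.
NODES = {
    "master1": ("10.0.128.253/20", "10.97.226.192/22"),
    "master2": ("10.0.142.9/20", "10.97.224.75/22"),
    "master3": ("10.0.143.199/20", "10.97.227.59/22"),
}


def check_nodes(nodes):
    """Return {node: (primary network, secondary is in SECOND_NIC_NET)}."""
    result = {}
    for name, (primary, secondary) in nodes.items():
        primary_net = ipaddress.ip_interface(primary).network
        secondary_net = ipaddress.ip_interface(secondary).network
        result[name] = (str(primary_net), secondary_net == SECOND_NIC_NET)
    return result


for name, (primary_net, ok) in check_nodes(NODES).items():
    print(name, primary_net, "second NIC in 10.97.224.0/22:", ok)
```

All three primary addresses resolve to 10.0.128.0/20 and all three secondary addresses to 10.97.224.0/22, so master1 and master3 share the second NIC subnet.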


$ oc get pods -o wide
NAME                                   READY   STATUS    RESTARTS   AGE   IP            NODE                           NOMINATED NODE   READINESS GATES
multitool-openshift-58d96959c4-6txzk   1/1     Running   0          16m   10.128.0.45   ip-10-0-128-253.ec2.internal   <none>           <none>

$ oc get nodes -o wide
NAME                           STATUS   ROLES           AGE   VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-128-253.ec2.internal   Ready    master,worker   7h    v1.21.8+ed4d8fd   10.0.128.253   <none>        Red Hat Enterprise Linux CoreOS 48.84.202204202010-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.21.6-3.rhaos4.8.git19780ee.2.el8
ip-10-0-142-9.ec2.internal     Ready    master,worker   7h    v1.21.8+ed4d8fd   10.0.142.9     <none>        Red Hat Enterprise Linux CoreOS 48.84.202204202010-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.21.6-3.rhaos4.8.git19780ee.2.el8
ip-10-0-143-199.ec2.internal   Ready    master,worker   7h    v1.21.8+ed4d8fd   10.0.143.199   <none>        Red Hat Enterprise Linux CoreOS 48.84.202204202010-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.21.6-3.rhaos4.8.git19780ee.2.el8

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.39    True        False         3h35m   Cluster version is 4.8.39

$ grep -Ri "switch IP for node" *
must-gather.local.2207980839212123534/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-55a4920ef38c7a5c86eda679c2083ef4f749cc9de0c82b2974d4e1313d4d4412/namespaces/openshift-ovn-kubernetes/pods/ovnkube-master-xt4mc/ovnkube-master/ovnkube-master/logs/current.log:2022-05-23T10:41:52.516091986Z I0523 10:41:52.516023       1 logical_switch_manager.go:345] Initializing and reserving the join switch IP for node: ip-10-0-128-253.ec2.internal to: [100.64.0.3/16]
must-gather.local.2207980839212123534/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-55a4920ef38c7a5c86eda679c2083ef4f749cc9de0c82b2974d4e1313d4d4412/namespaces/openshift-ovn-kubernetes/pods/ovnkube-master-xt4mc/ovnkube-master/ovnkube-master/logs/current.log:2022-05-23T10:41:52.519571164Z I0523 10:41:52.519506       1 logical_switch_manager.go:345] Initializing and reserving the join switch IP for node: ip-10-0-142-9.ec2.internal to: [100.64.0.2/16]
must-gather.local.2207980839212123534/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-55a4920ef38c7a5c86eda679c2083ef4f749cc9de0c82b2974d4e1313d4d4412/namespaces/openshift-ovn-kubernetes/pods/ovnkube-master-xt4mc/ovnkube-master/ovnkube-master/logs/current.log:2022-05-23T10:41:52.523332436Z I0523 10:41:52.523259       1 logical_switch_manager.go:345] Initializing and reserving the join switch IP for node: ip-10-0-143-199.ec2.internal to: [100.64.0.4/16]
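
The node-to-join-switch-IP mapping can also be pulled out of these log lines programmatically. A minimal Python sketch, assuming the message format is exactly as in the three lines above; the sample lines are trimmed to start at the klog header (the must-gather path prefix is dropped), and JOIN_IP_RE and join_switch_ips are hypothetical names.

```python
import re

# Matches the logical_switch_manager message quoted in this report.
JOIN_IP_RE = re.compile(
    r"reserving the join switch IP for node: (?P<node>\S+) to: \[(?P<ip>[^\]]+)\]"
)


def join_switch_ips(lines):
    """Map node name -> join switch IP (with prefix length) from log lines."""
    mapping = {}
    for line in lines:
        match = JOIN_IP_RE.search(line)
        if match:
            mapping[match.group("node")] = match.group("ip")
    return mapping


sample = [
    "I0523 10:41:52.516023       1 logical_switch_manager.go:345] Initializing and reserving the join switch IP for node: ip-10-0-128-253.ec2.internal to: [100.64.0.3/16]",
    "I0523 10:41:52.519506       1 logical_switch_manager.go:345] Initializing and reserving the join switch IP for node: ip-10-0-142-9.ec2.internal to: [100.64.0.2/16]",
    "I0523 10:41:52.523259       1 logical_switch_manager.go:345] Initializing and reserving the join switch IP for node: ip-10-0-143-199.ec2.internal to: [100.64.0.4/16]",
]

for node, ip in join_switch_ips(sample).items():
    print(node, ip)
```

For master1 (ip-10-0-128-253.ec2.internal) this yields 100.64.0.3/16, the same address that shows up as hop 2 in the failing tracepath.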


Steps to Reproduce:

1) Test 1 - Not reachable - POD IP 10.128.0.45 on master1 cannot ping master3's second NIC 10.97.227.59
/ $ ping -c 3 10.97.227.59
PING 10.97.227.59 (10.97.227.59) 56(84) bytes of data.

--- 10.97.227.59 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2035ms

/ $ tracepath -n 10.97.227.59
 1?: [LOCALHOST]                      pmtu 8901
 1:  10.128.0.1                                            1.383ms asymm  2 
 1:  10.128.0.1                                            1.222ms asymm  2 
 2:  100.64.0.3                                            1.458ms asymm  3 
 3:  no reply
(..)

28:  no reply
29:  no reply
30:  no reply
     Too many hops: pmtu 8901
     Resume: pmtu 8901 
/ $
#########################################################################################################################
2) Test 2 - OK - POD IP 10.128.0.45 on master1 can ping master1's second NIC 10.97.226.192
/ $ ping -c 3 10.97.226.192
PING 10.97.226.192 (10.97.226.192) 56(84) bytes of data.
64 bytes from 10.97.226.192: icmp_seq=1 ttl=64 time=0.108 ms
64 bytes from 10.97.226.192: icmp_seq=2 ttl=64 time=0.077 ms
64 bytes from 10.97.226.192: icmp_seq=3 ttl=64 time=0.093 ms

--- 10.97.226.192 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2042ms
rtt min/avg/max/mdev = 0.077/0.092/0.108/0.012 ms
/ $ tracepath -n 10.97.226.192
 1?: [LOCALHOST]                      pmtu 8901
 1:  10.128.0.1                                            1.636ms asymm  2 
 1:  10.128.0.1                                            1.059ms asymm  2 
 2:  10.97.226.192                                         1.125ms reached
     Resume: pmtu 8901 hops 2 back 1 
/ $ 
#########################################################################################################################
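
To compare tracepaths from several pods or clusters, the responding hops can be extracted from the `tracepath -n` output. This is a rough Python sketch that assumes the output layout shown in the tests above (a hop number, a colon, then either an address, `[LOCALHOST]`, or `no reply`); HOP_RE and responding_hops are hypothetical names.

```python
import ipaddress
import re

# Hop lines look like " 2:  100.64.0.3   1.458ms asymm  3" or " 1?: [LOCALHOST] ...".
HOP_RE = re.compile(r"^\s*(\d+)\??:\s+(\S+)")


def responding_hops(output):
    """Return the unique (hop number, address) pairs that answered, in order."""
    hops = []
    for line in output.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # e.g. "Too many hops" or "Resume" lines
        try:
            address = ipaddress.ip_address(match.group(2))
        except ValueError:
            continue  # "[LOCALHOST]" or "no reply"
        hop = (int(match.group(1)), str(address))
        if hop not in hops:
            hops.append(hop)
    return hops


failing_trace = """
 1?: [LOCALHOST]                      pmtu 8901
 1:  10.128.0.1                                            1.383ms asymm  2
 1:  10.128.0.1                                            1.222ms asymm  2
 2:  100.64.0.3                                            1.458ms asymm  3
 3:  no reply
"""

print(responding_hops(failing_trace))
# [(1, '10.128.0.1'), (2, '100.64.0.3')]
```

For Test 1 the last responding hop is 100.64.0.3, after which every hop reports no reply.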


Actual results:
POD IP 10.128.0.45 on master1 cannot ping master3's second NIC 10.97.227.59

#########################################################################################################################

Expected results:
As previously observed with the equivalent test in RHOCP 4.7.37, POD IP 10.128.0.45 on master1 must be able to ping master3's second NIC 10.97.227.59.
In the tracepath, we should not see the join switch IP for the node (assigned by OVN logical_switch_manager).

############### RHOCP 4.7.37 #####################
- m1:
     - primary ip: 10.0.128.253/20
     - secondary ip: 10.97.226.192/22
- m2:
     - primary ip: 10.0.142.9/20
     - secondary ip: 10.97.227.59/22
- m3:
     - primary ip: 10.0.143.199/20
     - secondary ip: 10.97.224.75/22

- POD IP 10.130.0.57  on m3 

$ oc get pods -o wide
NAME                                   READY   STATUS    RESTARTS   AGE   IP            NODE                           NOMINATED NODE   READINESS GATES
multitool-openshift-58d96959c4-lnzsx   1/1     Running   0          16s   10.130.0.57   ip-10-0-143-199.ec2.internal   <none>           <none>
[arolivei@arolivei aws]$ 

#### OK - Test 1 - POD IP 10.130.0.57 on m3 can ping m2's second NIC 10.97.227.59
$ ping -c 3 10.97.227.59  
PING 10.97.227.59 (10.97.227.59) 56(84) bytes of data.
64 bytes from 10.97.227.59: icmp_seq=1 ttl=63 time=2.89 ms
64 bytes from 10.97.227.59: icmp_seq=2 ttl=63 time=0.658 ms
64 bytes from 10.97.227.59: icmp_seq=3 ttl=63 time=0.747 ms

--- 10.97.227.59 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2008ms
rtt min/avg/max/mdev = 0.658/1.431/2.888/1.030 ms
$ tracepath -n 10.97.227.59
 1?: [LOCALHOST]                      pmtu 8901
 1:  10.130.0.1                                            1.422ms asymm  2 
 1:  10.130.0.1                                            0.930ms asymm  2 
 2:  10.130.0.2                                            0.942ms asymm  1 
 3:  10.97.227.59                                          0.663ms reached
     Resume: pmtu 8901 hops 3 back 2 
/ $ 

#### OK - Test 2 - POD IP 10.130.0.57 on m3 can ping m3's second NIC 10.97.224.75

$  oc exec -it multitool-openshift-58d96959c4-lnzsx /bin/sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ $ ping -c 3 10.97.224.75
PING 10.97.224.75 (10.97.224.75) 56(84) bytes of data.
64 bytes from 10.97.224.75: icmp_seq=1 ttl=64 time=1.94 ms
64 bytes from 10.97.224.75: icmp_seq=2 ttl=64 time=0.594 ms
64 bytes from 10.97.224.75: icmp_seq=3 ttl=64 time=0.109 ms

--- 10.97.224.75 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2057ms
rtt min/avg/max/mdev = 0.109/0.880/1.937/0.773 ms

/ $ tracepath -n 10.97.224.75
 1?: [LOCALHOST]                      pmtu 8901
 1:  10.130.0.1                                            1.498ms asymm  2 
 1:  10.130.0.1                                            1.008ms asymm  2 
 2:  10.97.224.75                                          0.983ms reached
     Resume: pmtu 8901 hops 2 back 1 
/ $ 
###############

######################################################################################################################### 
Additional info:

