1881882 – [RHCOS]Cannot access to the service's externalIP with egressIP from some pods

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	medium
Target Milestone:	---
Target Release:	4.8.z
Assignee:	Patryk Diak
QA Contact:	huirwang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-09-23 10:03 UTC by huirwang
Modified:	2022-10-05 07:40 UTC
CC List:	7 users
Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-09-14 20:38:55 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift ovn-kubernetes pull 1078	None	open	Bug 2070929: Downstream Merge: 04-05-2022	2022-05-04 20:02:55 UTC
Github	ovn-org ovn-kubernetes pull 2945	None	Merged	delete SNAT2NIP if pod.node == egressNodeServingPod	2022-05-04 20:02:55 UTC
Red Hat Product Errata	RHSA-2022:6308	None	None	None	2022-09-14 20:40:08 UTC

Description huirwang 2020-09-23 10:03:19 UTC

Description of problem:
Per the comment at https://bugzilla.redhat.com/show_bug.cgi?id=1762580#c93, I am filing this bug to track the issue on RHCOS.

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-09-21-030155.

How reproducible:
Always


Steps to Reproduce:
oc get clusternetwork
NAME      CLUSTER NETWORK   SERVICE NETWORK   PLUGIN NAME
default   10.128.0.0/14     172.30.0.0/16     redhat/openshift-ovs-multitenant

 oc get nodes -o wide
NAME                                STATUS   ROLES    AGE   VERSION           INTERNAL-IP      EXTERNAL-IP      
huir-bugverify-stwtm-master-0       Ready    master   89m   v1.19.0+7f9e863   136.144.52.231   136.144.52.231 
huir-bugverify-stwtm-master-1       Ready    master   89m   v1.19.0+7f9e863   136.144.52.236   136.144.52.236  
huir-bugverify-stwtm-master-2       Ready    master   89m   v1.19.0+7f9e863   136.144.52.230   136.144.52.230   
huir-bugverify-stwtm-worker-ch5fh   Ready    worker   74m   v1.19.0+7f9e863   136.144.52.248   136.144.52.248   
huir-bugverify-stwtm-worker-g4b5v   Ready    worker   74m   v1.19.0+7f9e863   136.144.52.240   136.144.52.240  

1. Patch node huir-bugverify-stwtm-master-0 with the egress IP.
2. Create project test-client with pods in it; the pods are located on huir-bugverify-stwtm-worker-g4b5v.
3. Patch the egress IP onto project test-client.

oc get pods -n test-client -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
hello-pod   1/1     Running   0          21m   10.128.2.11   huir-bugverify-stwtm-worker-g4b5v   <none>           <none>
oc get netnamespace test-client
NAME          NETID     EGRESS IPS
test-client   3995387   ["136.144.52.235"]
oc get hostsubnet
NAME                                HOST                                HOST IP          SUBNET          EGRESS CIDRS   EGRESS IPS
huir-bugverify-stwtm-master-0       huir-bugverify-stwtm-master-0       136.144.52.231   10.130.0.0/23                  ["136.144.52.235"]
huir-bugverify-stwtm-master-1       huir-bugverify-stwtm-master-1       136.144.52.236   10.128.0.0/23                  
huir-bugverify-stwtm-master-2       huir-bugverify-stwtm-master-2       136.144.52.230   10.129.0.0/23                  
huir-bugverify-stwtm-worker-ch5fh   huir-bugverify-stwtm-worker-ch5fh   136.144.52.248   10.131.0.0/23                  
huir-bugverify-stwtm-worker-g4b5v   huir-bugverify-stwtm-worker-g4b5v   136.144.52.240   10.128.2.0/23 
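
Steps 1 and 3 do not show the commands that were used. A minimal sketch, assuming the usual OpenShift SDN approach of merge-patching the egressIPs field on the hostsubnet and netnamespace objects (the same form of patch this report uses later to clear the egress IP). egress_patch is a hypothetical helper introduced here for illustration; it is not part of the original reproduction:

```shell
# Hypothetical helper: builds the JSON merge patch for an egressIPs list.
# The function body runs in a subshell, so its variables do not leak.
egress_patch() (
  ips=""
  for ip in "$@"; do
    ips="${ips:+$ips, }\"$ip\""
  done
  printf '{"egressIPs": [%s]}\n' "$ips"
)

# Only touch a cluster when the oc client is installed and logged in.
if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
  # Step 1 (assumed command): host the egress IP on master-0.
  oc patch hostsubnet huir-bugverify-stwtm-master-0 --type=merge \
    -p "$(egress_patch 136.144.52.235)"
  # Step 3 (assumed command): make project test-client use that egress IP.
  oc patch netnamespace test-client --type=merge \
    -p "$(egress_patch 136.144.52.235)"
fi
```

Calling egress_patch with no arguments yields an empty list, which is the patch used to remove the egress IP again.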


4. Create a service in project test-server and patch in an external IP. The external IP is the IP of node huir-bugverify-stwtm-master-1.

oc get svc -n test-server
NAME             TYPE        CLUSTER-IP    EXTERNAL-IP      PORT(S)     AGE
hello-service1   ClusterIP   172.30.0.22   136.144.52.236   27018/TCP   11m

oc get pods -n test-server -o wide
NAME          READY   STATUS    RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
hello-pod-1   1/1     Running   0          12m   10.131.0.29   huir-bugverify-stwtm-worker-ch5fh   <none>           <none>
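
The report does not show how the external IP was added in step 4. One possible way, sketched under the assumption that the service hello-service1 already exists and that the cluster's configuration permits this external IP, is a merge patch on spec.externalIPs. external_ip_patch is a hypothetical helper used only for this illustration:

```shell
# Hypothetical helper: builds a merge patch that sets spec.externalIPs.
# The function body runs in a subshell, so its variables do not leak.
external_ip_patch() (
  ips=""
  for ip in "$@"; do
    ips="${ips:+$ips, }\"$ip\""
  done
  printf '{"spec": {"externalIPs": [%s]}}\n' "$ips"
)

# Only touch a cluster when the oc client is installed and logged in.
if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
  # Assumed command: use the IP of huir-bugverify-stwtm-master-1 as external IP.
  oc patch svc hello-service1 -n test-server --type=merge \
    -p "$(external_ip_patch 136.144.52.236)"
fi
```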

5. Access the external IP from the pod in test-client; the connection times out:
oc rsh -n test-client hello-pod
/ # curl 136.144.52.236:27018 --connect-timeout 5
curl: (28) Connection timed out after 5000 milliseconds

6. After the egress IP is removed from the test-client project, access to the external IP succeeds:
oc rsh -n test-client hello-pod
/ # curl 136.144.52.236:27018
Hello OpenShift!
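
Steps 5 and 6 can be turned into a repeatable check. The sketch below is one way to do it and is not part of the original reproduction: describe_curl_status is a hypothetical helper, and the sketch assumes that oc rsh passes the remote curl exit status back to the caller. Exit status 28 is the curl timeout shown in step 5.

```shell
# Hypothetical helper: map a curl exit status to the outcomes seen in this report.
describe_curl_status() {
  case "$1" in
    0)  echo "reachable" ;;
    28) echo "timed out" ;;
    *)  echo "failed (curl exit status $1)" ;;
  esac
}

# Only touch a cluster when the oc client is installed and logged in.
if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
  status=0
  # Assumption: oc rsh propagates the exit status of the remote command.
  oc rsh -n test-client hello-pod \
    curl -s -o /dev/null --connect-timeout 5 136.144.52.236:27018 || status=$?
  describe_curl_status "$status"
fi
```

Running the check once with the egress IP set on test-client and once after removing it should distinguish the two behaviours described in steps 5 and 6.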

Actual results:


Expected results:


Additional info:

Comment 1 Ben Bennett 2020-09-23 13:04:25 UTC


*** This bug has been marked as a duplicate of bug 1762580 ***

Comment 3 Juan Luis de Sousa-Valadas 2020-09-23 13:54:04 UTC

Public comment about reopening: this bug specifically tracks the issue on RHCOS. BZ1762580 fixes it for RHEL, which is good enough for now because nobody has reported this issue on RHCOS.

Comment 5 Juan Luis de Sousa-Valadas 2020-10-02 11:10:19 UTC

Is it correct for this to be high priority? To the best of my knowledge, this isn't affecting any customer or QA.

Comment 6 Juan Luis de Sousa-Valadas 2021-03-08 14:28:49 UTC

Nobody has requested a fix for this, so I'm going to close it. Please reopen it if necessary.

Comment 10 huirwang 2022-05-07 06:43:26 UTC

Hi flaviof,

I reproduced this issue on 4.11.0-0.nightly-2022-05-06-180112, which includes the PR https://github.com/openshift/ovn-kubernetes/pull/1078.
This bug is somewhat similar to https://bugzilla.redhat.com/show_bug.cgi?id=2016534; the difference is that here two namespaces are involved, and it doesn't work.
Please let me know if you need a reproducer, and I can prepare one.

1. The externalIP service and its related pods:
oc get svc -n test
NAME             TYPE        CLUSTER-IP     EXTERNAL-IP      PORT(S)     AGE
hello-service1   ClusterIP   172.30.13.32   172.31.249.223   27018/TCP   36m

oc get pods -n test -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
hello-pod-1     1/1     Running   0          36m   10.128.2.35   huirwang-0507b-qkg46-worker-vxv59   <none>           <none>

2. Create another namespace with pods in it, then patch the egress IP onto the namespace and a node:
oc get pods -n test-client -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
hello-pod   1/1     Running   0          25m   10.131.0.18   huirwang-0507b-qkg46-worker-qctkn   <none>           <none>

oc get netnamespace test-client
NAME          NETID     EGRESS IPS
test-client   8373126   ["172.31.249.200"]

oc get hostsubnet
NAME                                HOST                                HOST IP          SUBNET          EGRESS CIDRS   EGRESS IPS
huirwang-0507b-qkg46-master-0       huirwang-0507b-qkg46-master-0       172.31.249.61    10.128.0.0/23                  ["172.31.249.200"]
huirwang-0507b-qkg46-master-1       huirwang-0507b-qkg46-master-1       172.31.249.223   10.130.0.0/23                  
huirwang-0507b-qkg46-master-2       huirwang-0507b-qkg46-master-2       172.31.249.54    10.129.0.0/23                  
huirwang-0507b-qkg46-worker-qctkn   huirwang-0507b-qkg46-worker-qctkn   172.31.249.158   10.131.0.0/23                  
huirwang-0507b-qkg46-worker-vxv59   huirwang-0507b-qkg46-worker-vxv59   172.31.249.26    10.128.2.0/23 

3. Access service hello-service1 via its external IP from pod hello-pod; the connection times out:
oc rsh -n test-client hello-pod
/ # curl 172.31.249.223:27018 --connect-timeout 5
curl: (28) Connection timed out after 5001 milliseconds

4. Remove the egress IP from namespace test-client; curl then works:
oc patch netnamespace test-client --type=merge -p '{"egressIPs": []}'
netnamespace.network.openshift.io/test-client patched
oc rsh -n test-client hello-pod
/ # curl 172.31.249.223:27018 --connect-timeout 5
Hello OpenShift!
/ #

Comment 20 errata-xmlrpc 2022-09-14 20:38:55 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.8.49 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6308
