Bug 1881882

Summary: [RHCOS] Cannot access the service's externalIP with egressIP from some pods
Product: OpenShift Container Platform
Reporter: huirwang
Component: Networking
Assignee: Patryk Diak <pdiak>
Networking sub component: openshift-sdn
QA Contact: huirwang
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: low
CC: anbhat, bbennett, bpickard, pdiak, surya, vlaad, zzhao
Version: 4.6
Keywords: Reopened
Target Milestone: ---
Target Release: 4.8.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-09-14 20:38:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description huirwang 2020-09-23 10:03:19 UTC
Description of problem:
Per the comment at https://bugzilla.redhat.com/show_bug.cgi?id=1762580#c93, this bug is filed to track the issue on RHCOS.

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-09-21-030155.

How reproducible:
Always


Steps to Reproduce:
oc get clusternetwork
NAME      CLUSTER NETWORK   SERVICE NETWORK   PLUGIN NAME
default   10.128.0.0/14     172.30.0.0/16     redhat/openshift-ovs-multitenant

 oc get nodes -o wide
NAME                                STATUS   ROLES    AGE   VERSION           INTERNAL-IP      EXTERNAL-IP      
huir-bugverify-stwtm-master-0       Ready    master   89m   v1.19.0+7f9e863   136.144.52.231   136.144.52.231 
huir-bugverify-stwtm-master-1       Ready    master   89m   v1.19.0+7f9e863   136.144.52.236   136.144.52.236  
huir-bugverify-stwtm-master-2       Ready    master   89m   v1.19.0+7f9e863   136.144.52.230   136.144.52.230   
huir-bugverify-stwtm-worker-ch5fh   Ready    worker   74m   v1.19.0+7f9e863   136.144.52.248   136.144.52.248   
huir-bugverify-stwtm-worker-g4b5v   Ready    worker   74m   v1.19.0+7f9e863   136.144.52.240   136.144.52.240  

1. Patch node huir-bugverify-stwtm-master-0 with the egress IP.

2. Create project test-client and a pod in it; the pod is scheduled on huir-bugverify-stwtm-worker-g4b5v.
3. Patch the egress IP onto project test-client.

oc get pods -n test-client -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
hello-pod   1/1     Running   0          21m   10.128.2.11   huir-bugverify-stwtm-worker-g4b5v   <none>           <none>
oc get netnamespace test-client
NAME          NETID     EGRESS IPS
test-client   3995387   ["136.144.52.235"]
oc get hostsubnet
NAME                                HOST                                HOST IP          SUBNET          EGRESS CIDRS   EGRESS IPS
huir-bugverify-stwtm-master-0       huir-bugverify-stwtm-master-0       136.144.52.231   10.130.0.0/23                  ["136.144.52.235"]
huir-bugverify-stwtm-master-1       huir-bugverify-stwtm-master-1       136.144.52.236   10.128.0.0/23                  
huir-bugverify-stwtm-master-2       huir-bugverify-stwtm-master-2       136.144.52.230   10.129.0.0/23                  
huir-bugverify-stwtm-worker-ch5fh   huir-bugverify-stwtm-worker-ch5fh   136.144.52.248   10.131.0.0/23                  
huir-bugverify-stwtm-worker-g4b5v   huir-bugverify-stwtm-worker-g4b5v   136.144.52.240   10.128.2.0/23 


4. Create a service in project test-server and patch it with an external IP. The external IP is the IP of node huir-bugverify-stwtm-master-1.

oc get svc -n test-server
NAME             TYPE        CLUSTER-IP    EXTERNAL-IP      PORT(S)     AGE
hello-service1   ClusterIP   172.30.0.22   136.144.52.236   27018/TCP   11m

oc get pods -n test-server -o wide
NAME          READY   STATUS    RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
hello-pod-1   1/1     Running   0          12m   10.131.0.29   huir-bugverify-stwtm-worker-ch5fh   <none>           <none>

5. Access the external IP from the pod in test-client.
oc rsh -n test-client hello-pod
/ # curl 136.144.52.236:27018 --connect-timeout 5
curl: (28) Connection timed out after 5000 milliseconds

6. If the egress IP is removed from the test-client project, access to the external IP succeeds.
oc rsh -n test-client hello-pod
/ # curl 136.144.52.236:27018
Hello OpenShift!
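
The reproduction steps above do not show the patch commands themselves. A minimal sketch of steps 1, 3, 4 and 6 follows, assuming the openshift-sdn HostSubnet and NetNamespace `egressIPs` fields (the NetNamespace form matches the command shown in comment 10) and the Service `spec.externalIPs` field. The `RUN` variable and the `egress_patch` helper are illustrative names, not part of the original report. By default the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: node, IPs, and object names come from the report;
# RUN and egress_patch are hypothetical helpers added for illustration.
EGRESS_NODE=huir-bugverify-stwtm-master-0
EGRESS_IP=136.144.52.235
EXTERNAL_IP=136.144.52.236

# Print the commands by default; export RUN= (empty) to execute them.
RUN=${RUN-echo}

# Build the JSON merge patch for the egressIPs field.
# Called without an argument, it produces a patch that clears the list.
egress_patch() {
    if [ -n "${1:-}" ]; then
        printf '{"egressIPs": ["%s"]}' "$1"
    else
        printf '{"egressIPs": []}'
    fi
}

# Step 1: assign the egress IP to the node's HostSubnet.
$RUN oc patch hostsubnet "$EGRESS_NODE" --type=merge -p "$(egress_patch "$EGRESS_IP")"

# Step 3: assign the same egress IP to the client project's NetNamespace.
$RUN oc patch netnamespace test-client --type=merge -p "$(egress_patch "$EGRESS_IP")"

# Step 4: expose the service on the external IP (the IP of master-1).
$RUN oc patch svc hello-service1 -n test-server --type=merge \
    -p "{\"spec\": {\"externalIPs\": [\"$EXTERNAL_IP\"]}}"

# Step 6: remove the egress IP from the project again.
$RUN oc patch netnamespace test-client --type=merge -p "$(egress_patch)"
```

Keeping the dry-run default makes it possible to review the generated patches before applying them to a cluster.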

Actual results:


Expected results:


Additional info:

Comment 1 Ben Bennett 2020-09-23 13:04:25 UTC

*** This bug has been marked as a duplicate of bug 1762580 ***

Comment 3 Juan Luis de Sousa-Valadas 2020-09-23 13:54:04 UTC
Public comment about reopening: This tracks specifically the issue in RHCOS. BZ1762580 fixes it for RHEL, which is good enough at the moment because nobody reported this issue on RHCOS.

Comment 5 Juan Luis de Sousa-Valadas 2020-10-02 11:10:19 UTC
Is it correct for this to be high priority? To the best of my knowledge, this isn't affecting any customer or QA.

Comment 6 Juan Luis de Sousa-Valadas 2021-03-08 14:28:49 UTC
Nobody has requested a fix for this, so I'm going to close it. Please reopen it if necessary.

Comment 10 huirwang 2022-05-07 06:43:26 UTC
Hi flaviof,

I reproduced this issue on 4.11.0-0.nightly-2022-05-06-180112, which includes the PR https://github.com/openshift/ovn-kubernetes/pull/1078.
This bug is somewhat similar to https://bugzilla.redhat.com/show_bug.cgi?id=2016534; the difference is that two namespaces are involved here, and access does not work.
Please let me know if you need a reproducer and I can prepare one.

1. Service with an externalIP and its related pods
oc get svc -n test
NAME             TYPE        CLUSTER-IP     EXTERNAL-IP      PORT(S)     AGE
hello-service1   ClusterIP   172.30.13.32   172.31.249.223   27018/TCP   36m

oc get pods -n test -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
hello-pod-1     1/1     Running   0          36m   10.128.2.35   huirwang-0507b-qkg46-worker-vxv59   <none>           <none>

2. Create another namespace with a pod in it, then patch the egress IP onto the namespace and a node.
oc get pods -n test-client -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
hello-pod   1/1     Running   0          25m   10.131.0.18   huirwang-0507b-qkg46-worker-qctkn   <none>           <none>

oc get netnamespace test-client
NAME          NETID     EGRESS IPS
test-client   8373126   ["172.31.249.200"]

oc get hostsubnet
NAME                                HOST                                HOST IP          SUBNET          EGRESS CIDRS   EGRESS IPS
huirwang-0507b-qkg46-master-0       huirwang-0507b-qkg46-master-0       172.31.249.61    10.128.0.0/23                  ["172.31.249.200"]
huirwang-0507b-qkg46-master-1       huirwang-0507b-qkg46-master-1       172.31.249.223   10.130.0.0/23                  
huirwang-0507b-qkg46-master-2       huirwang-0507b-qkg46-master-2       172.31.249.54    10.129.0.0/23                  
huirwang-0507b-qkg46-worker-qctkn   huirwang-0507b-qkg46-worker-qctkn   172.31.249.158   10.131.0.0/23                  
huirwang-0507b-qkg46-worker-vxv59   huirwang-0507b-qkg46-worker-vxv59   172.31.249.26    10.128.2.0/23 

3. From pod hello-pod, access service hello-service1 through its externalIP.
oc rsh -n test-client hello-pod
/ # curl 172.31.249.223:27018 --connect-timeout 5
curl: (28) Connection timed out after 5001 milliseconds

4. Remove the egress IP from namespace test-client; curl then works.
oc patch netnamespace test-client --type=merge -p  '{"egressIPs": []}'
netnamespace.network.openshift.io/test-client patched
oc rsh -n test-client hello-pod
/ # curl 172.31.249.223:27018 --connect-timeout 5
Hello OpenShift!
/ #
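
The pass/fail check in steps 3 and 4 can be scripted by looking at curl's exit status, since curl exits with 28 when the `--connect-timeout` limit is reached, which is the failure shown above. The following is a sketch that assumes `oc exec` passes the remote command's exit status back to the caller. `describe_curl_rc` and `check_external_ip` are hypothetical helper names, not part of any OpenShift tooling.

```shell
#!/bin/sh
# Map a curl exit status to a short verdict.
describe_curl_rc() {
    case "$1" in
        0)  echo "reachable" ;;
        28) echo "timed out" ;;
        *)  echo "failed (curl exit $1)" ;;
    esac
}

# Run curl inside a pod and print the verdict.
# Arguments: namespace, pod name, target as IP:port.
# Assumption: oc exec returns the exit status of the remote curl.
check_external_ip() {
    oc exec -n "$1" "$2" -- curl -s -o /dev/null --connect-timeout 5 "$3"
    describe_curl_rc "$?"
}

# Example call (requires cluster access), using the names from this comment:
# check_external_ip test-client hello-pod 172.31.249.223:27018
```

Running the check once with the egress IP set on the namespace and once after clearing it gives a quick way to confirm whether a given build is affected.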

Comment 20 errata-xmlrpc 2022-09-14 20:38:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.8.49 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6308