Bug 1936920

Summary: socket timeouts for webservice communication between pods
Product: OpenShift Container Platform Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Networking Assignee: Dan Winship <danw>
Networking sub component: openshift-sdn QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: aconstan, akuriyan, arghosh, bbennett, danw, dcbw, echaudro, fleitner, openshift-bugs-escalate, rcarrier, rjamadar
Version: 4.6   
Target Milestone: ---   
Target Release: 4.7.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: iptables rewriting rules
Consequence: A client that uses a fixed source port and tries to connect to a service both via the service IP and directly via a pod IP may run into problems with port conflicts.
Fix: An additional OVS rule was inserted to notice when this was occurring and do an extra SNAT to avoid the port conflict.
Result: Connections work.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-03-25 01:53:00 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1910378    
Bug Blocks: 1937547    
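
The port conflict described in the Doc Text can be illustrated with a small model. This is only a sketch under simplifying assumptions, not the openshift-sdn implementation: `FlowTuple`, `reply_tuple`, and the `DNAT` mapping are hypothetical names, and the addresses and ports are the ones that appear in the verification comment below. It shows that a connection via the service IP and a direct connection to the pod IP, both from the same fixed source port, expect the same reply tuple, and that moving the direct connection to a different source port (as the extra SNAT does) removes the clash.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class FlowTuple(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int


Endpoint = Tuple[str, int]


def reply_tuple(orig: FlowTuple,
                dnat: Optional[Dict[Endpoint, Endpoint]] = None) -> FlowTuple:
    """Return the tuple expected on reply packets for `orig`.

    `dnat` is a hypothetical mapping (service IP, port) -> (pod IP, port)
    standing in for the service rewrite; destinations not in it are unchanged.
    """
    dst, dport = (dnat or {}).get((orig.dst, orig.dport), (orig.dst, orig.dport))
    return FlowTuple(src=dst, sport=dport, dst=orig.src, dport=orig.sport)


# Values taken from the verification comment; the mapping itself is assumed.
CLIENT = "10.128.2.41"
POD = ("10.128.2.40", 8080)
SERVICE = ("172.30.146.95", 27017)
DNAT = {SERVICE: POD}

# Same fixed source port (30000) used for both connections.
via_service_reply = reply_tuple(FlowTuple(CLIENT, 30000, *SERVICE), DNAT)
direct_reply = reply_tuple(FlowTuple(CLIENT, 30000, *POD), DNAT)
print(via_service_reply == direct_reply)  # True: the reply tuples collide

# With the direct connection moved to another source port, the clash is gone.
snat_reply = reply_tuple(FlowTuple(CLIENT, 27611, *POD), DNAT)
print(snat_reply == via_service_reply)  # False
```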

Comment 2 zhaozhanqi 2021-03-11 06:21:20 UTC
Verified this bug on 4.7.0-0.nightly-2021-03-11-002149

1. Create a new project and create test podA with:

oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/networking/list_for_pods.json

oc get pod -o wide
NAME              READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
hello-pod2pgsv2   1/1     Running   0          9s    10.128.2.41   ip-10-0-171-150.us-east-2.compute.internal   <none>           <none>
test-rc-j4m4t     1/1     Running   0          65s   10.128.2.40   ip-10-0-171-150.us-east-2.compute.internal   <none>           <none>

 oc get svc
NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
test-service   ClusterIP   172.30.146.95   <none>        27017/TCP   89s

oc scale rc test-rc --replicas=1

2. Create another pod, podB, as a client on the same node.


3. rsh into podB and access podA via the service IP:

   nc 172.30.146.95 27017 -p 30000

4. Open another terminal and send a second request from podB to podA via the pod IP, using the same source port:
  
  nc 10.128.2.40 8080 -p 30000

5. rsh into the sdn pod on the same node and list the conntrack entries for podA:

$ oc rsh -n openshift-sdn sdn-4x279
Defaulting container name to sdn.
Use 'oc describe pod/sdn-4x279 -n openshift-sdn' to see all of the containers in this pod.
sh-4.4# conntrack -L | grep 10.128.2.40
tcp      6 431892 ESTABLISHED src=10.128.2.41 dst=172.30.146.95 sport=30000 dport=27017 src=10.128.2.40 dst=10.128.2.41 sport=8080 dport=30000 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp      6 431936 ESTABLISHED src=10.128.2.41 dst=10.128.2.40 sport=30000 dport=8080 src=10.128.2.40 dst=10.128.2.41 sport=8080 dport=27611 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 979 flow entries have been shown


6. Send another request from podB to podA via the pod IP, this time with a different source port:

 nc 10.128.2.40 8080 -p 30001

7. Repeat the check from step 5:

# conntrack -L | grep 10.128.2.40
tcp      6 431947 ESTABLISHED src=10.128.2.41 dst=172.30.146.95 sport=30000 dport=27017 src=10.128.2.40 dst=10.128.2.41 sport=8080 dport=30000 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp      6 431992 ESTABLISHED src=10.128.2.41 dst=10.128.2.40 sport=30000 dport=8080 src=10.128.2.40 dst=10.128.2.41 sport=8080 dport=27611 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp      6 431995 ESTABLISHED src=10.128.2.41 dst=10.128.2.40 sport=30001 dport=8080 src=10.128.2.40 dst=10.128.2.41 sport=8080 dport=30001 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 968 flow entries have been shown.
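
The conntrack check in steps 5 and 7 can also be done programmatically. The following is a minimal sketch that assumes each entry carries the original-direction tuple followed by the reply-direction tuple, as in the output above; `parse_entry` and `source_port_rewritten` are hypothetical helper names, and the sample lines are copied from step 7, truncated after `[ASSURED]`. An entry whose reply `dport` differs from its original `sport` had its source port rewritten.

```python
import re

# Matches the four address/port fields of a conntrack tuple.
FIELD = re.compile(r"\b(src|dst|sport|dport)=(\S+)")


def parse_entry(line):
    """Split one `conntrack -L` line into (original, reply) tuple dicts."""
    fields = FIELD.findall(line)
    if len(fields) < 8:
        raise ValueError("expected original and reply tuples: %r" % line)
    return dict(fields[:4]), dict(fields[4:8])


def source_port_rewritten(line):
    """True if replies go to a port other than the client's source port."""
    orig, reply = parse_entry(line)
    return orig["sport"] != reply["dport"]


SAMPLE = [
    "tcp      6 431947 ESTABLISHED src=10.128.2.41 dst=172.30.146.95 sport=30000 dport=27017 "
    "src=10.128.2.40 dst=10.128.2.41 sport=8080 dport=30000 [ASSURED]",
    "tcp      6 431992 ESTABLISHED src=10.128.2.41 dst=10.128.2.40 sport=30000 dport=8080 "
    "src=10.128.2.40 dst=10.128.2.41 sport=8080 dport=27611 [ASSURED]",
    "tcp      6 431995 ESTABLISHED src=10.128.2.41 dst=10.128.2.40 sport=30001 dport=8080 "
    "src=10.128.2.40 dst=10.128.2.41 sport=8080 dport=30001 [ASSURED]",
]

for entry in SAMPLE:
    orig, reply = parse_entry(entry)
    print(orig["dst"], orig["sport"], "->", reply["dport"],
          "rewritten" if source_port_rewritten(entry) else "unchanged")
```

Only the direct pod-IP connection that reuses source port 30000 is reported as rewritten (reply port 27611); the service-IP connection and the connection from port 30001 keep their original source ports.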

Comment 5 errata-xmlrpc 2021-03-25 01:53:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.3 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0821