Bug 1903408

Summary: NodePort externalTrafficPolicy does not work for ovn-kubernetes
Product: OpenShift Container Platform Reporter: shishika
Component: Networking Assignee: Andrew Stoycos <astoycos>
Networking sub component: ovn-kubernetes QA Contact: Anurag saxena <anusaxen>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: high CC: anbhat, bbennett, dcbw, djuran, mapandey, mateusz.bacal, mmasters, moddi, palonsor, suc, vpickard, zzhao
Version: 4.6   
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-10 16:02:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1927540    
Bug Blocks: 2060542, 2079517    

Description shishika 2020-12-02 02:41:20 UTC
Description of problem:

I have a customer who uses the ovn-kubernetes network provider and needs to use NodePort, but it doesn't work properly.

Although externalTrafficPolicy is set to Local, the Service behaves as if it were set to Cluster.

Version-Release number of selected component (if applicable):

4.6

How reproducible:

Always

Steps to Reproduce:

1. Use the ovn-kubernetes network provider 

$ oc describe Network.config.openshift.io cluster 
~~~
Status:
  Cluster Network:
    Cidr:               10.128.0.0/14
    Host Prefix:        23
  Cluster Network MTU:  8901
  Network Type:         OVNKubernetes
  Service Network:
    172.30.0.0/16
Events:  <none>
~~~

2. Prepare a Pod and a NodePort Service with externalTrafficPolicy set to Local

$ oc get po example -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP            NODE                                              NOMINATED NODE   READINESS GATES
example   1/1     Running   0          9h    10.128.3.37   ip-10-0-133-223.ap-northeast-1.compute.internal   <none>           <none>

$ oc describe svc example
Name:                     example
Namespace:                shishika01
Labels:                   app=hello-openshift
Annotations:              <none>
Selector:                 app=hello-openshift
Type:                     NodePort
IP:                       172.30.72.177
Port:                     <unset>  8080/TCP
TargetPort:               8080/TCP
NodePort:                 <unset>  31488/TCP
Endpoints:                10.128.3.37:8080
Session Affinity:         None
External Traffic Policy:  Local <-----
Events:                   <none>
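
For reference, a Service with the properties shown in the `oc describe svc example` output could be defined roughly as follows. This is only a sketch reconstructed from that output (name, namespace, label, selector, ports); the customer's actual manifest is not part of this report, and `print_service_manifest` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print a NodePort Service manifest matching the describe output above.
# All values are copied from the report; the real manifest may differ.
print_service_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: example
  namespace: shishika01
  labels:
    app: hello-openshift
spec:
  type: NodePort
  # Local: only nodes with a ready endpoint should accept NodePort traffic
  externalTrafficPolicy: Local
  selector:
    app: hello-openshift
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
      nodePort: 31488
EOF
}
```

It could be applied with `print_service_manifest | oc apply -f -`.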

3. Send requests from another node to the NodePort on nodes that do not host the Pod

$ oc debug node/ip-10-0-129-151.ap-northeast-1.compute.internal
sh-4.4# curl ip-10-0-131-240.ap-northeast-1.compute.internal:31488
Hello OpenShift!
sh-4.4# curl ip-10-0-189-124.ap-northeast-1.compute.internal:31488
Hello OpenShift!
sh-4.4# curl ip-10-0-209-113.ap-northeast-1.compute.internal:31488
Hello OpenShift!

Actual results:
The Service is reachable through the NodePort on nodes that do not host the Pod.

Expected results:
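The Service is reachable only through the NodePort on the node that hosts the Pod (ip-10-0-133-223.ap-northeast-1.compute.internal); requests to the NodePort on other nodes fail.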
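
The check in step 3 can be repeated across several nodes with a small loop. The following is a minimal sketch, not part of the original reproduction: `probe_nodeport` is a hypothetical helper, and the 5-second timeout is an arbitrary choice to cover the case where packets are dropped rather than refused.

```shell
#!/usr/bin/env bash
# Sketch: probe a NodePort on each given node and report which nodes answer.
# With externalTrafficPolicy: Local, only nodes that host a ready endpoint
# are expected to respond.
probe_nodeport() {
  local port="$1"; shift
  local node
  for node in "$@"; do
    # --max-time bounds the wait if the connection hangs instead of failing fast
    if curl --silent --max-time 5 --output /dev/null "http://${node}:${port}" 2>/dev/null; then
      echo "${node} reachable"
    else
      echo "${node} unreachable"
    fi
  done
}
```

Run it from a host that can resolve the node names, such as the debug shell used above, for example `probe_nodeport 31488 ip-10-0-131-240.ap-northeast-1.compute.internal ip-10-0-133-223.ap-northeast-1.compute.internal`. With the policy honored, only the node hosting the Pod should be reported as reachable.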

Additional info:

Comment 3 Andrew Stoycos 2020-12-22 17:33:35 UTC
I was able to recreate this on a local kind cluster and am currently investigating a fix.

Comment 6 Dan Williams 2021-03-25 18:13:58 UTC
@astoycos if we don't already have an OVN bug for this, can you make one, and then add a link to it (or to an existing bug) as a dependency here?

Comment 7 Andrew Stoycos 2021-03-25 18:29:45 UTC
ACK, just added the RFE.

Comment 8 Andrew Stoycos 2021-04-01 19:46:50 UTC
Upstream PR for the feature -> https://github.com/ovn-org/ovn-kubernetes/pull/2136

Comment 12 zhaozhanqi 2021-09-18 13:46:54 UTC
Verified this bug on 4.10.0-0.nightly-2021-09-17-190348



$ oc get svc -n z1
NAME        TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)           AGE
hello-pod   NodePort   172.30.110.103   <none>        27017:31999/TCP   5h5m


$ oc get pod -n z1 -o wide
NAME        READY   STATUS    RESTARTS   AGE    IP            NODE                                        NOMINATED NODE   READINESS GATES
hello-pod   1/1     Running   0          5h7m   10.128.2.31   ip-10-0-215-31.us-east-2.compute.internal   <none>           <none>



$ oc debug node/ip-10-0-142-22.us-east-2.compute.internal
W0918 21:44:28.556373   32479 warnings.go:70] would violate "latest" version of "baseline" PodSecurity profile: host namespaces (hostNetwork=true, hostPID=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true)
Starting pod/ip-10-0-142-22us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.142.22
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host

sh-4.4# curl ip-10-0-159-11.us-east-2.compute.internal:31999
curl: (7) Failed to connect to ip-10-0-159-11.us-east-2.compute.internal port 31999: Connection refused
sh-4.4# curl ip-10-0-182-179.us-east-2.compute.internal:31999
curl: (7) Failed to connect to ip-10-0-182-179.us-east-2.compute.internal port 31999: Connection refused
sh-4.4# curl ip-10-0-215-31.us-east-2.compute.internal:31999
Hello OpenShift!
sh-4.4#

Comment 15 msi_bacalm 2022-01-03 20:43:06 UTC
Will this be backported to earlier releases such as 4.7+?

If not, how can we achieve a similar result on OVN, that is, a NodePort with externalTrafficPolicy=Local so that the source IP is preserved?

Comment 17 Mike McKiernan 2022-01-20 16:15:05 UTC
Surya let me know that this BZ lifts a limitation that was added to the 4.9 and earlier release versions of the docs.

This PR is for 4.10 and removes the limitation:

https://github.com/openshift/openshift-docs/pull/40824

Please review by Jan 21 PM Eastern.

Comment 23 Mohamed Mahmoud 2022-02-17 16:17:09 UTC
*** Bug 2039971 has been marked as a duplicate of this bug. ***

Comment 25 errata-xmlrpc 2022-03-10 16:02:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056