Bug 1972287

Summary: [mlx5] traffic from Node port is not offloaded
Product: OpenShift Container Platform
Reporter: Alaa Hleihel (NVIDIA Mellanox) <ahleihel>
Component: Networking
Assignee: Tim Rozet <trozet>
Networking sub component: ovn-kubernetes
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: medium
CC: aconstan, adrianc, ahleihel, ctrautma, danw, dceara, jiji, lariel, mleitner, trozet
Version: 4.9
Target Milestone: ---
Target Release: 4.9.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-10-18 17:34:18 UTC
Type: Bug

Description Alaa Hleihel (NVIDIA Mellanox) 2021-06-15 15:29:23 UTC
Description of problem:

In Kubernetes with OVN-Kubernetes, when external traffic reaches a NodePort service backed by an endpoint on the pod network, the incoming traffic is sent to OVN via the patch port without passing through host CT zone 64000 on the physical bridge. However, return traffic originating from OVN is sent through CT zone 64000 before it goes out on the wire. As a result, the connection is never established in zone 64000, which prevents this type of traffic from being offloaded.
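
One way to look for this symptom on a node is to filter the connection-tracking table for entries in zone 64000 that never saw a reply. The sketch below is only an illustration and is not part of the proposed fix: the function name unreplied_in_zone is hypothetical, and it assumes the plain-text format that conntrack -L prints, as in the verification output later in this report.

```shell
# Hypothetical helper: read `conntrack -L` output on stdin and print the
# entries in the given zone that are still marked [UNREPLIED].
unreplied_in_zone() {
    awk -v z="zone=$1" '
        /\[UNREPLIED\]/ {
            # Look for an exact "zone=<id>" field on this entry.
            for (i = 1; i <= NF; i++)
                if ($i == z) { print; break }
        }'
}

# Usage on a node (assumes conntrack-tools is installed):
#   conntrack -L 2>/dev/null | unreplied_in_zone 64000
```

An empty result does not by itself prove that traffic is offloaded; it only shows that no half-open entries are currently tracked in that zone.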


Version-Release number of selected component (if applicable):
OVN-Kubernetes Master branch at 
https://github.com/ovn-org/ovn-kubernetes/commit/6fa30a689b6552d69633a262fef4dee6d7495db2



Proposed fix for this issue:
https://github.com/ovn-org/ovn-kubernetes/pull/2261 

Please review, we need your cooperation to solve this together.

Comment 2 Dan Winship 2021-06-18 13:11:27 UTC
I needed a more generic bug for the ovn-kube rebase (so I could make another bug depend on it for a different upstream bugfix), so I'm making this bug depend on that bug. If the intention was to fix this in 4.8.z then this bug can be moved to 4.8.z now since there's now a separate 4.9 bug for it to depend on.

Comment 3 Dan Winship 2021-06-21 13:26:56 UTC
(the bot glitched and failed to move this bug to MODIFIED)

Comment 6 zhaozhanqi 2021-06-22 12:25:13 UTC
Verified this bug on 4.9.0-0.nightly-2021-06-21-191858

1. Create a NodePort service by applying the following service and pod:

---
kind: Service
apiVersion: v1
metadata:
  name: hello-pod
  labels:
    name: hello-pod
spec:
  ports:
  - name: http
    protocol: TCP
    port: 27017
    nodePort: 30002
    targetPort: 8080
  type: NodePort
  selector:
    name: hello-pod

---
kind: Pod
apiVersion: v1
metadata:
  name: hello-pod
  labels:
    name: hello-pod
spec:
  containers:
  - name: hello-pod
    image: quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95
    ports:
    - containerPort: 8080

2. Access the NodePort service at <node IP>:30002:

$ oc debug node/ip-10-0-134-141.us-east-2.compute.internal
Creating debug namespace/openshift-debug-node-jngj9 ...
Starting pod/ip-10-0-134-141us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.134.141
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# curl 10.0.140.99:30002
Hello OpenShift!


3. Check conntrack for port 30002 on the worker:

sh-4.4# conntrack -L | grep 30002
tcp      6 110 TIME_WAIT src=10.0.134.141 dst=10.0.140.99 sport=37632 dport=30002 src=10.128.2.73 dst=10.0.134.141 sport=8080 dport=37632 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=23 use=1
conntrack v1.4.4 (conntrack-tools): 770 flow entries have been shown.
sh-4.4# 


I did not see any 'zone=64000' entry such as: 'tcp      6 117 FIN_WAIT src=10.73.116.62 dst=10.73.116.58 sport=30002 dport=52468 [UNREPLIED] src=10.73.116.58 dst=10.73.116.62 sport=52468 dport=30002 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=64000 use=1'
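
The manual grep above could also be turned into a small check that lists which conntrack zones hold an entry for the NodePort. This is a sketch under assumptions, not an official verification procedure: the function name zones_for_port is hypothetical, and it relies on the sport=/dport=/zone= fields appearing as in the output shown in this comment.

```shell
# Hypothetical helper: read `conntrack -L` output on stdin and print the
# distinct conntrack zones that hold an entry mentioning the given port.
zones_for_port() {
    awk -v p="$1" '
        {
            hit = 0; zone = ""
            for (i = 1; i <= NF; i++) {
                # Match the port in either direction of the tuple.
                if ($i == ("dport=" p) || $i == ("sport=" p)) hit = 1
                # "zone=" is 5 characters, so the id starts at index 6.
                if ($i ~ /^zone=/) zone = substr($i, 6)
            }
            if (hit && zone != "") print zone
        }' | sort -u
}

# Usage on a node (assumes conntrack-tools is installed):
#   if conntrack -L 2>/dev/null | zones_for_port 30002 | grep -qx 64000; then
#       echo "NodePort traffic still passes through zone 64000"
#   fi
```

Entries that carry no zone= field are skipped by this sketch, so it only reports zones that conntrack prints explicitly.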


@Alaa Could you help confirm whether these steps are enough to verify this bug?

Comment 7 Alaa Hleihel (NVIDIA Mellanox) 2021-06-22 12:45:56 UTC
(In reply to zhaozhanqi from comment #6)

Hi @adrianc, please advise.

Comment 8 Adrian Chiris 2021-06-27 06:30:11 UTC
Hi,
With this change, NodePort traffic whose endpoint is on the pod network should not go through zone 64000.

Comment 9 zhaozhanqi 2021-06-28 02:05:10 UTC
Yes, thanks for confirming. According to my testing above, no traffic goes through zone 64000.

Moving this bug to VERIFIED.

Comment 12 errata-xmlrpc 2021-10-18 17:34:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759