Bug 1972287 - [mlx5] traffic from Node port is not offloaded
Summary: [mlx5] traffic from Node port is not offloaded
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Tim Rozet
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-06-15 15:29 UTC by Alaa Hleihel (NVIDIA Mellanox)
Modified: 2021-10-18 17:34 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:34:18 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ovn-kubernetes pull 579 | 0 | None | open | Bug 1972287: 6-17-21 merge | 2021-06-17 21:07:13 UTC
Github | ovn-org ovn-kubernetes pull 2261 | 0 | None | closed | Shared Gateway Node Port Skip commit to CT-Zone 64000 | 2021-06-16 15:15:32 UTC
Red Hat Product Errata | RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:34:45 UTC

Description Alaa Hleihel (NVIDIA Mellanox) 2021-06-15 15:29:23 UTC
Description of problem:

In Kubernetes with OVN-Kubernetes, when an external client accesses a NodePort service backed by an endpoint that serves on the Pod network, incoming traffic is sent to OVN via the patch port without going through host CT zone 64000 on the physical bridge. However, return traffic originating from OVN is sent through CT zone 64000 before being sent to the wire. As a result, the connection is never established in zone 64000, which prevents offload of this type of traffic.
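
The asymmetry described above can be illustrated with a small toy model. This is only a sketch: `ZoneTracker` and `host_zone_state` are hypothetical names, the addresses are placeholders, and the logic does not reproduce the kernel's or OVS's actual connection tracking. It only assumes that a zone needs to see both directions of a flow before it treats the connection as established.

```python
from collections import defaultdict

HOST_ZONE = 64000  # host CT zone named in the description


class ZoneTracker:
    """Toy per-zone tracker (illustrative only, not real conntrack logic)."""

    def __init__(self):
        # zone -> {direction-independent flow key -> set of directions seen}
        self._seen = defaultdict(dict)

    def observe(self, zone, src, dst):
        """Record one packet direction (src -> dst) as seen in `zone`."""
        key = frozenset((src, dst))
        self._seen[zone].setdefault(key, set()).add((src, dst))

    def state(self, zone, a, b):
        """Return 'absent', 'unreplied' or 'established' for the flow a <-> b."""
        directions = self._seen[zone].get(frozenset((a, b)), set())
        return {0: "absent", 1: "unreplied", 2: "established"}[len(directions)]


def host_zone_state(reply_via_host_zone):
    """State of a NodePort flow in the host zone for one request/reply pair."""
    client = ("192.0.2.10", 40000)     # placeholder external client
    node_port = ("192.0.2.20", 30002)  # placeholder node IP and NodePort
    tracker = ZoneTracker()
    # The request goes to OVN via the patch port and bypasses the host zone,
    # so it is never observed there.
    if reply_via_host_zone:
        # Reported behavior: only the reply passes through the host zone.
        tracker.observe(HOST_ZONE, node_port, client)
    return tracker.state(HOST_ZONE, client, node_port)


print(host_zone_state(reply_via_host_zone=True))
print(host_zone_state(reply_via_host_zone=False))
```

With the reply routed through the host zone, the model reports a one-way entry that never becomes established; when the reply skips the zone as well, the zone holds no entry for the flow at all.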


Version-Release number of selected component (if applicable):
OVN-Kubernetes Master branch at 
https://github.com/ovn-org/ovn-kubernetes/commit/6fa30a689b6552d69633a262fef4dee6d7495db2



Proposed fix for this issue:
https://github.com/ovn-org/ovn-kubernetes/pull/2261 

Please review, we need your cooperation to solve this together.

Comment 2 Dan Winship 2021-06-18 13:11:27 UTC
I needed a more generic bug for the ovn-kube rebase (so I could make another bug depend on it for a different upstream bugfix), so I'm making this bug depend on that bug. If the intention was to fix this in 4.8.z then this bug can be moved to 4.8.z now since there's now a separate 4.9 bug for it to depend on.

Comment 3 Dan Winship 2021-06-21 13:26:56 UTC
(the bot glitched and failed to move this bug to MODIFIED)

Comment 6 zhaozhanqi 2021-06-22 12:25:13 UTC
Verified this bug on 4.9.0-0.nightly-2021-06-21-191858

Create one NodePort service.

1. Apply the following pod and NodePort service

---
kind: Service
apiVersion: v1
metadata:
  name: hello-pod
  labels:
    name: hello-pod
spec:
  ports:
  - name: http
    protocol: TCP
    port: 27017
    nodePort: 30002
    targetPort: 8080
  type: NodePort
  selector:
    name: hello-pod

---
kind: Pod
apiVersion: v1
metadata:
  name: hello-pod
  labels:
    name: hello-pod
spec:
  containers:
  - name: hello-pod
    image: quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95
    ports:
    - containerPort: 8080

2. Access the nodePort service by nodeip:30002

$ oc debug node/ip-10-0-134-141.us-east-2.compute.internal
Creating debug namespace/openshift-debug-node-jngj9 ...
Starting pod/ip-10-0-134-141us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.134.141
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# curl 10.0.140.99:30002
Hello OpenShift!


3. Check the output of conntrack -L | grep 30002 on the worker

sh-4.4# conntrack -L | grep 30002
tcp      6 110 TIME_WAIT src=10.0.134.141 dst=10.0.140.99 sport=37632 dport=30002 src=10.128.2.73 dst=10.0.134.141 sport=8080 dport=37632 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=23 use=1
conntrack v1.4.4 (conntrack-tools): 770 flow entries have been shown.
sh-4.4# 


Did not see any 'zone=64000' entry, such as 'tcp      6 117 FIN_WAIT src=10.73.116.62 dst=10.73.116.58 sport=30002 dport=52468 [UNREPLIED] src=10.73.116.58 dst=10.73.116.62 sport=52468 dport=30002 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=64000 use=1'


@Alaa Could you help confirm whether these steps are enough to verify this bug?
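
The manual check above could also be scripted. The following is a minimal sketch, assuming the `conntrack -L` output has already been captured as text; `entries_in_zone` is a hypothetical helper, and the two sample strings are the conntrack lines quoted in this comment.

```python
import re


def entries_in_zone(conntrack_output, port, zone=64000):
    """Return the conntrack lines that mention `port` and belong to `zone`."""
    port_re = re.compile(rf"\b[sd]port={port}\b")
    zone_re = re.compile(rf"\bzone={zone}\b")
    return [
        line
        for line in conntrack_output.splitlines()
        if port_re.search(line) and zone_re.search(line)
    ]


# Entry observed on the fixed build (zone=23 only).
OBSERVED = (
    "tcp      6 110 TIME_WAIT src=10.0.134.141 dst=10.0.140.99 sport=37632 "
    "dport=30002 src=10.128.2.73 dst=10.0.134.141 sport=8080 dport=37632 "
    "[ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=23 use=1"
)

# Example of the kind of entry that should no longer appear.
UNWANTED = (
    "tcp      6 117 FIN_WAIT src=10.73.116.62 dst=10.73.116.58 sport=30002 "
    "dport=52468 [UNREPLIED] src=10.73.116.58 dst=10.73.116.62 sport=52468 "
    "dport=30002 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=64000 use=1"
)

print(len(entries_in_zone(OBSERVED, 30002)))
print(len(entries_in_zone(UNWANTED, 30002)))
```

An empty result for port 30002 corresponds to the "did not see zone=64000" observation; any returned line would indicate that NodePort traffic still passes through the host zone.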

Comment 7 Alaa Hleihel (NVIDIA Mellanox) 2021-06-22 12:45:56 UTC
(In reply to zhaozhanqi from comment #6)

Hi @adrianc@nvidia.com , please advise.

Comment 8 Adrian Chiris 2021-06-27 06:30:11 UTC
Hi,
With this change, NodePort traffic whose endpoint is on the pod network should not go through zone 64000.

Comment 9 zhaozhanqi 2021-06-28 02:05:10 UTC
Yes, thanks for confirming. According to my testing above, no traffic goes through zone 64000.

Moving this bug to VERIFIED.

Comment 12 errata-xmlrpc 2021-10-18 17:34:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

