Bug 1975826 - ovn-kubernetes host directed traffic cannot be offloaded as CT zone 64000 is not established
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Tim Rozet
QA Contact: Ying Wang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-06-24 14:00 UTC by Alaa Hleihel (NVIDIA Mellanox)
Modified: 2022-03-10 16:04 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:04:21 UTC
Target Upstream Version:
Embargoed:
yingwang: needinfo-


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ovn-kubernetes pull 831 | 0 | None | Merged | Revert "[DownstreamMerge] Fix previous downstream merge" | 2021-12-01 16:14:55 UTC
Github | ovn-org ovn-kubernetes pull 2330 | 0 | None | None | None | 2021-08-17 19:46:41 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:04:33 UTC

Description Alaa Hleihel (NVIDIA Mellanox) 2021-06-24 14:00:53 UTC
Description of problem:

Today, pod network traffic to k8s workers (host network)
is sent through the GR (gateway router) to be SNATed and then to the underlay.
On the receiving end, traffic is sent to CT on the
physical bridge and forwarded to the host,
i.e. the LOCAL port, or the host representor port in the case of Smart-NICs.

By design, the conntrack state in zone 64000 is never
established for host-destined traffic, which may prevent
offload-capable NICs from offloading this type of traffic.

Branch: Master

Comment 2 Adrian Chiris 2021-06-24 14:52:59 UTC
PR https://github.com/ovn-org/ovn-kubernetes/pull/2277 proposes a partial solution to the issue for pod-to-host network traffic.

Comment 3 Tim Rozet 2021-08-17 22:37:09 UTC
Will retarget to 4.10.

Comment 4 Marcelo Ricardo Leitner 2021-11-09 21:51:04 UTC
Do we need to set Target Release to 4.10 now? I see the MR was merged upstream, https://github.com/openshift/ovn-kubernetes/pull/796 .

Comment 6 Ying Wang 2021-12-06 10:12:04 UTC
Verified this bug on 4.10.0-0.nightly-2021-12-03-213835.

1. Create a hostNetwork pod and a NodePort service from the JSON file below:

{
  "apiVersion": "v1",
  "kind": "List",
  "items": [
    {
      "kind": "Pod",
      "apiVersion": "v1",
      "metadata": {
        "name": "http-nodeport-hostnetwork-server",
        "labels": {
          "name": "http-nodeport-hostnetwork-server"
        }
      },
      "spec": {
        "hostNetwork": true,
        "containers": [
          {
            "image": "quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95",
            "name": "http-container",
            "imagePullPolicy": "IfNotPresent",
            "ports": [
              {
                "hostPort": 8080,
                "containerPort": 8080
              }
            ]
          }
        ]
      }
    },
    {
      "kind": "Service",
      "apiVersion": "v1",
      "metadata": {
        "name": "http-nodeport-hostnetwork-server",
        "labels": {
          "name": "http-nodeport-hostnetwork-server"
        }
      },
      "spec": {
        "ports": [
          {
            "name": "http",
            "protocol": "TCP",
            "port": 27017,
            "targetPort": 8080
          }
        ],
        "type": "NodePort",
        "selector": {
          "name": "http-nodeport-hostnetwork-server"
        }
      }
    }
  ]
}

2. Log in to a worker node and curl the hostNetwork service using node IP + node port:

sh-4.4# curl 10.0.74.72:30825
Hello OpenShift!

3. Check conntrack; the output includes an entry in zone 64000:
sh-4.4# conntrack -L | grep 30825
tcp      6 117 TIME_WAIT src=10.0.54.55 dst=10.0.74.72 sport=44922 dport=30825 src=10.0.74.72 dst=10.0.54.55 sport=30825 dport=44922 [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 zone=64000 use=1
tcp      6 117 TIME_WAIT src=10.0.54.55 dst=10.0.74.72 sport=44922 dport=30825 src=10.0.74.72 dst=10.0.54.55 sport=30825 dport=44922 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 1125 flow entries have been shown.

4. Ran the same test on version 4.9.0 for contrast; the conntrack output below has no entry in zone 64000:
sh-4.4# conntrack -L | grep 31105
tcp      6 99 TIME_WAIT src=10.0.183.190 dst=10.0.158.206 sport=38344 dport=31105 src=10.0.158.206 dst=10.0.183.190 sport=31105 dport=38344 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 1101 flow entries have been shown.
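
The comparison in steps 3 and 4 can also be scripted instead of read by eye. The following is only a minimal sketch, assuming the `conntrack -L` line layout shown above (protocol, protocol number, timeout, an optional state, then `key=value` pairs) and assuming that an entry printed without a `zone=` field belongs to the default zone 0. The function names `parse_conntrack_line` and `zone_entries` are hypothetical helpers, not part of conntrack-tools or ovn-kubernetes.

```python
def parse_conntrack_line(line):
    """Parse one line of `conntrack -L` text output into a dict.

    Returns None for lines that are not flow entries, such as the
    trailing "... flow entries have been shown." summary line.
    Only the first occurrence of each key is kept, so src/dst/sport/dport
    describe the original direction, not the reply direction.
    """
    fields = line.split()
    # Flow lines start with "<proto> <proto-number> <timeout>".
    if len(fields) < 4 or not fields[1].isdigit():
        return None
    # Assumption: a missing zone= field means the default zone 0.
    entry = {"proto": fields[0], "state": None, "zone": 0}
    for token in fields[3:]:
        if "=" in token:
            key, value = token.split("=", 1)
            if key == "zone":
                entry["zone"] = int(value)
            else:
                entry.setdefault(key, value)
        elif not token.startswith("[") and entry["state"] is None:
            # Bare token such as TIME_WAIT or ESTABLISHED (TCP only);
            # bracketed flags like [ASSURED] are ignored.
            entry["state"] = token
    return entry


def zone_entries(lines, dport, zone=64000):
    """Return parsed entries whose original dport and zone both match."""
    matches = []
    for line in lines:
        entry = parse_conntrack_line(line)
        if entry and entry.get("dport") == str(dport) and entry["zone"] == zone:
            matches.append(entry)
    return matches
```

Fed the saved output of `conntrack -L` as a list of lines, `zone_entries(lines, 30825)` would be expected to return one entry for the 4.10 output in step 3 and an empty list for the 4.9.0 output in step 4 (with port 31105). The `state` field of each returned entry can then be inspected when the check needs a specific TCP state.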

@adrianc, please confirm whether this is enough to verify this bug.

Comment 7 zhaozhanqi 2021-12-08 03:06:19 UTC
@trozet, could you advise whether the steps in the comment above are enough to verify this bug? Thanks.

Comment 8 Adrian Chiris 2021-12-12 08:00:44 UTC
 > login one worker node and curl to the hostnetwork service with node ip + node port

I assume that's the worker's node IP.
This scenario seems OK; we just need to ensure that, on the receiving end, the connection is established in zone 64000.

Comment 9 Ying Wang 2021-12-16 09:13:52 UTC
Thanks Adrian. I checked on the receiving end, and the connection is established in zone 64000.

sh-4.4# conntrack -L | grep 10.0.128.6 
tcp      6 290 ESTABLISHED src=10.0.128.5 dst=10.0.128.6 sport=10250 dport=56044 src=10.0.128.6 dst=10.0.128.5 sport=56044 dport=10250 [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 zone=64000 use=1

Comment 12 errata-xmlrpc 2022-03-10 16:04:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

