Bug 2093826

Summary: Pods with OVN hardware offloading enabled interface fail to start
Product: OpenShift Container Platform
Reporter: Peng Liu <pliu>
Component: Networking
Assignee: Peng Liu <pliu>
Networking sub component: ovn-kubernetes
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA
Severity: high
Priority: medium
CC: zshi
Version: 4.11
Target Release: 4.12.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2023-01-17 19:49:30 UTC
Type: Bug

Description Peng Liu 2022-06-06 07:40:02 UTC
Description of problem:
Pods with an OVN hardware-offloading-enabled interface fail to start on hosts with a DPU installed. The pod stays stuck in 'ContainerCreating' indefinitely.

Version-Release number of selected component (if applicable):


How reproducible:
Create a pod with an OVN hardware offloading interface


```
apiVersion: v1
kind: Pod
metadata:
  name: pod-bf-1
  annotations:
    v1.multus-cni.io/default-network: default/default
spec:
  nodeSelector:
    kubernetes.io/hostname: tenant-worker-45
  containers:
  - name: appcntr1
    image: quay.io/zshi/centos:iperf
    command: ['/bin/sh', '-c', 'sleep infinity']
    resources:
      requests:
        openshift.io/mlnx_bf: '1'
      limits:
        openshift.io/mlnx_bf: '1'
```


Actual results:
The pod stays stuck in 'ContainerCreating' indefinitely.

The following error message appears in the ovn-controller container of the ovnkube-node pod on the DPU:

```
2022-06-02T08:36:32Z|00031|binding|INFO|Not claiming lport default_pod-bf-1, chassis 76a8f7b3-0fe2-4d5e-8c84-880327b7f483 requested-chassis 42f6257b-327b-41d7-b13e-4a32ec18b45a
```
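
The log line above records the mismatch directly: the chassis that saw the port differs from the port's requested-chassis. As a minimal sketch for spotting this in collected logs, the following assumes the message format matches the single line quoted in this report; the function name `parse_not_claiming` is hypothetical and not part of any OVN tooling.

```python
import re
from typing import Optional

# Pattern modeled on the one log line quoted in this report (assumption:
# other ovn-controller versions may word the message differently).
NOT_CLAIMING = re.compile(
    r"Not claiming lport (?P<lport>\S+), "
    r"chassis (?P<chassis>\S+) "
    r"requested-chassis (?P<requested>\S+)"
)


def parse_not_claiming(line: str) -> Optional[dict]:
    """Extract the port and both chassis values from a 'Not claiming lport' line.

    Returns None when the line does not contain the message.
    """
    match = NOT_CLAIMING.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # The port is skipped because the two chassis values differ.
    info["mismatch"] = info["chassis"] != info["requested"]
    return info
```

Running it over the ovn-controller log and keeping the entries where `mismatch` is true lists the ports affected by this condition.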

Expected results:
The pod starts with the SR-IOV VF attached.


Additional info:

Comment 1 Peng Liu 2022-06-06 07:48:14 UTC
In the OVN sbdb, there are two chassis with the same hostname. One was added when ovn-controller was running on the x86 host, and the other was added by the ovn-controller running on the DPU. The first one should be removed when the ovnkube-node function moves from the host to the DPU. After manually removing that chassis from the OVN sbdb, the pod can be created successfully on tenant-worker-45.
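
The duplicate-hostname condition can be checked mechanically. The sketch below is an illustration only: it assumes the chassis records have already been read from the southbound database into dictionaries with `name` and `hostname` keys (mirroring the columns of the Chassis table), and the function name `duplicate_hostnames` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def duplicate_hostnames(chassis_rows: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each hostname that has more than one chassis to its chassis names.

    Each row is assumed to be a dict with "name" and "hostname" keys.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in chassis_rows:
        groups[row["hostname"]].append(row["name"])
    # Keep only hostnames registered by two or more chassis.
    return {host: names for host, names in groups.items() if len(names) > 1}
```

An empty result means every hostname maps to a single chassis; any entry in the result points at a host that needs manual inspection.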

There are two options to solve this issue: 1) Add a check for the host chassis to the ovn-controller startup script and, if the chassis exists, remove it before starting ovn-controller on the DPU; 2) add reconciliation in ovnkube to prevent more than one chassis from being created for the same host in the OVN logical topology.
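
A rough sketch of the first option follows. It is not the merged fix, whose details are not described in this report. It assumes the chassis records are available as dictionaries with `name` and `hostname` keys, that the local chassis name is known, and that removal would go through the stock `ovn-sbctl chassis-del` command; the function name `plan_chassis_cleanup` is hypothetical. The function only builds the commands and does not run them.

```python
from typing import Iterable, List


def plan_chassis_cleanup(
    chassis_rows: Iterable[dict], local_name: str, hostname: str
) -> List[List[str]]:
    """Build one delete command per stale chassis sharing `hostname`.

    A chassis is treated as stale when it is registered under the same
    hostname as the local chassis but has a different name (assumption).
    """
    commands = []
    for row in chassis_rows:
        if row["hostname"] == hostname and row["name"] != local_name:
            # --if-exists avoids an error if the record is already gone.
            commands.append(["ovn-sbctl", "--if-exists", "chassis-del", row["name"]])
    return commands
```

Returning argument lists instead of executing them keeps the decision logic testable and lets the startup script log the planned deletions before applying them.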

Comment 2 Peng Liu 2022-06-08 06:17:08 UTC
Upstream PR https://github.com/ovn-org/ovn-kubernetes/pull/3026

Comment 3 Peng Liu 2022-09-01 08:26:29 UTC
Merged with downstream PR https://github.com/openshift/ovn-kubernetes/pull/1253

Comment 6 zhaozhanqi 2023-01-10 11:04:57 UTC
The fix was merged a long time ago and has been retested multiple times; it works well. Moving it to VERIFIED.

Comment 8 errata-xmlrpc 2023-01-17 19:49:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399