Bug 2093826
| Summary: | Pods with OVN hardware offloading enabled interface fail to start | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Peng Liu <pliu> |
| Component: | Networking | Assignee: | Peng Liu <pliu> |
| Networking sub component: | ovn-kubernetes | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | medium | CC: | zshi |
| Version: | 4.11 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.12.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-01-17 19:49:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
In the OVN southbound database (SBDB), there are two chassis with the same hostname. One was added when ovn-controller was running on the x86 host, and the other was added by the ovn-controller running on the DPU. The first one should be removed when the ovnkube-node function is moved from the host to the DPU. After the stale chassis is removed manually from the OVN SBDB, the pod can be created successfully on tenant-worker-45.

There are two options to solve this issue:
1) Add logic to the ovn-controller startup script that checks whether the host chassis exists and, if it does, removes it before ovn-controller is started on the DPU.
2) Add reconciliation in ovnkube to prevent more than one chassis from being created in the OVN logical topology.

Merged in downstream PR https://github.com/openshift/ovn-kubernetes/pull/1253

This fix has been merged for a long time and has been retested multiple times; it works well. Moving to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399
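Option 2 comes down to finding chassis rows that share a hostname and keeping only the chassis actually in use. The Go sketch below illustrates that check under stated assumptions. It does not talk to a real SBDB. The `Chassis` type, the `staleChassis` helper and the `currentByHost` map (standing in for however ovnkube would learn which chassis is current for a node) are all hypothetical. The UUIDs come from the ovn-controller log in the description, and treating 42f6... as the leftover x86-host chassis is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Chassis is a hypothetical, trimmed-down view of an OVN SBDB Chassis row,
// holding only the two columns this sketch needs.
type Chassis struct {
	Name     string // chassis name (system-id), e.g. a UUID
	Hostname string // hostname the chassis was registered with
}

// staleChassis returns the names of chassis that share a hostname with the
// chassis currently in use for that host (currentByHost[hostname]), i.e. the
// leftovers a reconciler would delete. Hosts with only one chassis, or with no
// known current chassis, are skipped so nothing is removed by guesswork.
func staleChassis(all []Chassis, currentByHost map[string]string) []string {
	byHost := make(map[string][]string)
	for _, c := range all {
		byHost[c.Hostname] = append(byHost[c.Hostname], c.Name)
	}
	var stale []string
	for host, names := range byHost {
		cur, ok := currentByHost[host]
		if len(names) < 2 || !ok {
			continue
		}
		for _, n := range names {
			if n != cur {
				stale = append(stale, n)
			}
		}
	}
	sort.Strings(stale) // deterministic order for callers
	return stale
}

func main() {
	sb := []Chassis{
		// Assumed to be the chassis left behind by ovn-controller on the x86 host.
		{Name: "42f6257b-327b-41d7-b13e-4a32ec18b45a", Hostname: "tenant-worker-45"},
		// Chassis reported by ovn-controller on the DPU in the log.
		{Name: "76a8f7b3-0fe2-4d5e-8c84-880327b7f483", Hostname: "tenant-worker-45"},
	}
	current := map[string]string{"tenant-worker-45": "76a8f7b3-0fe2-4d5e-8c84-880327b7f483"}
	fmt.Println(staleChassis(sb, current))
}
```

The sketch only decides *which* rows are stale. Actually deleting them would go through whatever SBDB client the reconciler already uses, which is left out here.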
Description of problem:
Pods with an OVN hardware-offloading-enabled interface fail to start on hosts with a DPU installed. They are stuck in 'ContainerCreating' forever.

Version-Release number of selected component (if applicable):

How reproducible:
Create a pod with an OVN hardware offloading interface:
```
apiVersion: v1
kind: Pod
metadata:
  name: pod-bf-1
  annotations:
    v1.multus-cni.io/default-network: default/default
spec:
  nodeSelector:
    kubernetes.io/hostname: tenant-worker-45
  containers:
  - name: appcntr1
    image: quay.io/zshi/centos:iperf
    command: ['/bin/sh', '-c', 'sleep infinity']
    resources:
      requests:
        openshift.io/mlnx_bf: '1'
      limits:
        openshift.io/mlnx_bf: '1'
```

Actual results:
The pod is stuck in 'ContainerCreating' forever. The following error message appears in the ovn-controller container of the ovnkube-node pod on the DPU:
```
2022-06-02T08:36:32Z|00031|binding|INFO|Not claiming lport default_pod-bf-1, chassis 76a8f7b3-0fe2-4d5e-8c84-880327b7f483 requested-chassis 42f6257b-327b-41d7-b13e-4a32ec18b45a
```

Expected results:
The pod starts with an SR-IOV VF attached.

Additional info:
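The ovn-controller message reads as a placement mismatch. The `chassis` field is taken here to be the chassis of the ovn-controller that logged it (the DPU), while `requested-chassis` names a different chassis, presumably the duplicate host entry discussed in the comments. Below is a minimal Go sketch for pulling those fields out of such lines during triage. It assumes the exact message format quoted above, and `bindingMismatch` and `notClaimingRe` are hypothetical names, not part of OVN or ovn-kubernetes.

```go
package main

import (
	"fmt"
	"regexp"
)

// notClaimingRe matches the ovn-controller binding message quoted in this
// report. The pattern is derived from that single line; other OVN versions
// may word the message differently.
var notClaimingRe = regexp.MustCompile(`Not claiming lport ([^,\s]+), chassis (\S+) requested-chassis (\S+)`)

// bindingMismatch extracts the logical port, the chassis of the logging
// ovn-controller, and the requested chassis. ok is false if the line does
// not contain the message.
func bindingMismatch(line string) (lport, local, requested string, ok bool) {
	m := notClaimingRe.FindStringSubmatch(line)
	if m == nil {
		return "", "", "", false
	}
	return m[1], m[2], m[3], true
}

func main() {
	line := "2022-06-02T08:36:32Z|00031|binding|INFO|Not claiming lport default_pod-bf-1, " +
		"chassis 76a8f7b3-0fe2-4d5e-8c84-880327b7f483 requested-chassis 42f6257b-327b-41d7-b13e-4a32ec18b45a"
	if lp, local, req, ok := bindingMismatch(line); ok && local != req {
		fmt.Printf("%s is pinned to chassis %s, but this ovn-controller is chassis %s\n", lp, req, local)
	}
}
```

When the requested chassis shares a hostname with the local one, that points back to the duplicate-chassis condition rather than a scheduling problem.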