Bug 1988520 - weird CNO cni-bin hackery breaks with RHEL 9 and/or crio 1.22
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Colin Walters
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-30 17:49 UTC by Dan Winship
Modified: 2024-08-29 04:25 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-04-30 18:04:53 UTC
Target Upstream Version:
Embargoed:


Links
System  ID                                            Private  Priority  Status  Summary  Last Updated
Github  openshift cluster-network-operator pull 1172  0        None      None    None     2021-08-02 14:02:44 UTC

Internal Links: 1988161

Description Dan Winship 2021-07-30 17:49:00 UTC
In an early test of OCP on "RHEL 9" (but really CentOS Stream), with kubelet 1.21 and cri-o 1.22, sdn fails to start with:

        message: |
          container create failed: openat2 `host/opt/cni`: No such file or directory
        reason: CreateContainerError

This appears to be related to the weird nested mountpoints in the pod:

  host /                -> pod /host
  host /var/lib/cni/bin -> pod /host/opt/cni/bin

This seems kind of weird but I don't remember if there's a good reason for it?
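
To make the nesting concrete, here is a minimal Python sketch that models the two mounts listed above and reports which pod path sits inside another. The function names (nested_mounts, parent_on_host) are made up for illustration and are not part of CNO or cri-o. The second helper reflects only one plausible reading of the error, namely that the parent directory of the inner mountpoint has to be resolved inside the outer mount, i.e. on the host; that is an assumption, not a confirmed diagnosis.

```python
from pathlib import PurePosixPath

# (host path, pod path) pairs copied from the mount list in the description.
MOUNTS = [
    ("/", "/host"),
    ("/var/lib/cni/bin", "/host/opt/cni/bin"),
]


def nested_mounts(mounts):
    """Return (outer, inner) pod-path pairs where inner lies beneath outer."""
    pod_paths = [PurePosixPath(pod) for _, pod in mounts]
    pairs = []
    for outer in pod_paths:
        for inner in pod_paths:
            if inner != outer and outer in inner.parents:
                pairs.append((str(outer), str(inner)))
    return pairs


def parent_on_host(mounts, outer, inner):
    """Host directory that holds the inner mountpoint's parent (assumed model)."""
    host_of = {pod: host for host, pod in mounts}
    # Path of the inner mountpoint's parent, relative to the outer mount.
    rel = PurePosixPath(inner).parent.relative_to(outer)
    return str(PurePosixPath(host_of[outer]) / rel)


if __name__ == "__main__":
    for outer, inner in nested_mounts(MOUNTS):
        print(inner, "is nested under", outer)
        print("parent directory on the host:", parent_on_host(MOUNTS, outer, inner))
```

Under this model the inner mount depends on a directory under /opt on the host, which lines up with the `host/opt/cni` path in the error message, but it does not by itself say whether RHEL 9 or cri-o 1.22 changed the behavior.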

Patching the daemonset to remove the host-cni-bin mount, and having it copy its CNI binary to /host/var/lib/cni/bin/ instead of /host/opt/cni/bin/, seems to fix the problem. (If this is the right fix, then multus, ovn-kubernetes, and kuryr will probably need the same change. Maybe not multus, though, since it seemed to have no problems in this cluster.)
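
The copy step in that workaround can be sketched roughly as follows. This is an illustrative Python version only; nothing in this report shows how the daemonset actually performs the copy, and install_cni_binary with its parameters is a hypothetical helper. It assumes the host root is mounted at /host and writes through that single mount, so no second nested mount is needed.

```python
import os
import shutil


def install_cni_binary(src, host_root="/host", dest_dir="var/lib/cni/bin"):
    """Copy src into <host_root>/<dest_dir>/ and return the destination path.

    host_root and dest_dir default to the paths named in the description;
    the function itself is a hypothetical stand-in for the real copy step.
    """
    target_dir = os.path.join(host_root, dest_dir)
    # Create the directory through the host-root mount if it is missing.
    os.makedirs(target_dir, exist_ok=True)
    dest = os.path.join(target_dir, os.path.basename(src))
    # copy2 also copies the permission bits, so an executable stays executable.
    shutil.copy2(src, dest)
    return dest
```

Called as install_cni_binary("/path/to/plugin") inside the pod, this would place the file at /host/var/lib/cni/bin/plugin, which is /var/lib/cni/bin/plugin on the host.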

It is not clear if the change in behavior is in RHEL 9 or in cri-o 1.22...

Comment 1 Colin Walters 2021-07-30 18:18:22 UTC
I am 83% sure the accepted 4.9 nightly I based my test release image on:

https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.9.0-0.nightly/release/4.9.0-0.nightly-2021-07-30-090713

has this same setup of kubelet 1.21 and cri-o 1.22.  So that does seem to narrow it down potentially to a kernel behavior change.

Comment 2 Colin Walters 2021-08-02 11:47:26 UTC
Slightly related PR https://github.com/openshift/cluster-network-operator/pull/1169

Comment 5 Tim Rozet 2022-11-01 15:23:25 UTC
This was merged a long time ago:
https://github.com/openshift/cluster-network-operator/pull/1172/

Comment 6 zhaozhanqi 2022-11-08 01:16:23 UTC
This PR was merged a long time ago. Moving to verified.

Comment 7 Rory Thrasher 2024-04-30 18:04:53 UTC
OCP is no longer using Bugzilla and this bug appears to have been left in an orphaned state. If the bug is still relevant, please open a new issue in the OCPBUGS Jira project: https://issues.redhat.com/projects/OCPBUGS/summary

Comment 8 Red Hat Bugzilla 2024-08-29 04:25:07 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

