Bug 1988520

Summary: weird CNO cni-bin hackery breaks with RHEL 9 and/or crio 1.22
Product: OpenShift Container Platform Reporter: Dan Winship <danw>
Component: Networking Assignee: Colin Walters <walters>
Networking sub component: openshift-sdn QA Contact: zhaozhanqi <zzhao>
Status: CLOSED WONTFIX Docs Contact:
Severity: low    
Priority: low CC: anbhat, astoycos, miabbott, trozet, walters
Version: 4.8   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2024-04-30 18:04:53 UTC Type: Bug

Description Dan Winship 2021-07-30 17:49:00 UTC
In an early test of OCP on "RHEL 9" (but really CentOS Stream), with kubelet 1.21 and cri-o 1.22, sdn fails to start with:

        message: |
          container create failed: openat2 `host/opt/cni`: No such file or directory
        reason: CreateContainerError

This appears to be related to the weird nested mountpoints in the pod:

  host /                -> pod /host
  host /var/lib/cni/bin -> pod /host/opt/cni/bin
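
To make the nesting concrete, here is a minimal Python sketch that resolves a path inside the pod to the host path backing it, using longest-prefix matching over the two mounts listed above. It only models the path mapping and does not reproduce the cri-o or kernel behavior behind the error. The function name `pod_to_host` and the file name `example-plugin` are hypothetical, chosen for illustration.

```python
import posixpath

# Mount table taken from the listing above: (pod mountpoint, host source).
MOUNTS = [
    ("/host", "/"),
    ("/host/opt/cni/bin", "/var/lib/cni/bin"),
]


def pod_to_host(pod_path, mounts=MOUNTS):
    """Return the host path backing pod_path, or None if no mount covers it."""
    pod_path = posixpath.normpath(pod_path)
    best = None
    for mountpoint, source in mounts:
        covered = pod_path == mountpoint or pod_path.startswith(mountpoint + "/")
        # The deepest (longest) matching mountpoint wins, as with nested mounts.
        if covered and (best is None or len(mountpoint) > len(best[0])):
            best = (mountpoint, source)
    if best is None:
        return None
    mountpoint, source = best
    rel = posixpath.relpath(pod_path, mountpoint)
    return posixpath.normpath(posixpath.join(source, rel))
```

Under this model, `/host/opt/cni/bin/example-plugin` and `/host/var/lib/cni/bin/example-plugin` both resolve to `/var/lib/cni/bin/example-plugin` on the host, so the inner mount mainly gives the pod a second name for a directory it can already reach through `/host`.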

This seems kind of weird but I don't remember if there's a good reason for it?

Patching the daemonset to remove the host-cni-bin mount and have it just copy its CNI binary to /host/var/lib/cni/bin/ instead of /host/opt/cni/bin/ seems to fix the problem. (Assuming this is the right fix then multus, ovn-kubernetes, and kuryr probably will need the same fix. Although maybe not multus I guess since that seemed to have no problems in this cluster.)
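
The copy step in that workaround could look roughly like the Python sketch below. This is an illustration under stated assumptions, not the change that went into the cluster-network-operator: the helper name `install_cni_binary`, its parameters, the 0755 mode, and the choice to create the destination directory if it is missing are all hypothetical.

```python
import os
import shutil


def install_cni_binary(src, host_root="/host", rel_dir="var/lib/cni/bin"):
    """Copy a CNI binary under the host root mount and return the destination path."""
    dest_dir = os.path.join(host_root, rel_dir)
    # Assumption: creating the directory when absent is acceptable.
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.copy2(src, dest)
    # Assumption: the plugin should be executable by everyone.
    os.chmod(dest, 0o755)
    return dest
```

With the defaults, the binary is written to `/host/var/lib/cni/bin/` through the single host root mount, so no second mount nested under `/host` is required.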

It is not clear if the change in behavior is in RHEL 9 or in cri-o 1.22...

Comment 1 Colin Walters 2021-07-30 18:18:22 UTC
I am 83% sure the accepted 4.9 nightly I based my test release image on:

https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.9.0-0.nightly/release/4.9.0-0.nightly-2021-07-30-090713

has this same setup of kubelet 1.21 and cri-o 1.22.  So that does seem to narrow it down potentially to a kernel behavior change.

Comment 2 Colin Walters 2021-08-02 11:47:26 UTC
Slightly related PR https://github.com/openshift/cluster-network-operator/pull/1169

Comment 5 Tim Rozet 2022-11-01 15:23:25 UTC
This was merged a long time ago:
https://github.com/openshift/cluster-network-operator/pull/1172/

Comment 6 zhaozhanqi 2022-11-08 01:16:23 UTC
This PR was merged a long time ago. Moving to verified.

Comment 7 Rory Thrasher 2024-04-30 18:04:53 UTC
OCP is no longer using Bugzilla and this bug appears to have been left in an orphaned state. If the bug is still relevant, please open a new issue in the OCPBUGS Jira project: https://issues.redhat.com/projects/OCPBUGS/summary

Comment 8 Red Hat Bugzilla 2024-08-29 04:25:07 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.