Bug 1988520
| Summary: | weird CNO cni-bin hackery breaks with RHEL 9 and/or crio 1.22 | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Dan Winship <danw> |
| Component: | Networking | Assignee: | Colin Walters <walters> |
| Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> |
| Status: | VERIFIED --- | Docs Contact: | |
| Severity: | low | ||
| Priority: | low | CC: | anbhat, astoycos, miabbott, trozet, walters |
| Version: | 4.8 | Flags: | ffernand: needinfo? (walters) |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
I am 83% sure the accepted 4.9 nightly I based my test release image on (https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.9.0-0.nightly/release/4.9.0-0.nightly-2021-07-30-090713) has this same setup of kubelet 1.21 and cri-o 1.22. So that does seem to narrow it down, potentially to a kernel behavior change.

Slightly related PR: https://github.com/openshift/cluster-network-operator/pull/1169

This was merged a long time ago: https://github.com/openshift/cluster-network-operator/pull/1172/

This PR was already merged a long time ago. Moving to VERIFIED.
In an early test of OCP on "RHEL 9" (but really CentOS Stream), with kubelet 1.21 and cri-o 1.22, sdn fails to start with:

    message: |
      container create failed: openat2 `host/opt/cni`: No such file or directory
    reason: CreateContainerError

This appears to be related to the weird nested mountpoints in the pod:

    host /                -> pod /host
    host /var/lib/cni/bin -> pod /host/opt/cni/bin

This seems kind of weird but I don't remember if there's a good reason for it?

Patching the daemonset to remove the host-cni-bin mount and have it just copy its CNI binary to /host/var/lib/cni/bin/ instead of /host/opt/cni/bin/ seems to fix the problem.

(Assuming this is the right fix, then multus, ovn-kubernetes, and kuryr will probably need the same fix. Although maybe not multus, I guess, since that seemed to have no problems in this cluster.)

It is not clear if the change in behavior is in RHEL 9 or in cri-o 1.22...
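
For anyone checking whether the other daemonsets have the same layout, here is a rough sketch of how nested container mount paths could be detected. It is only an illustration: the helper name `nested_mounts` is hypothetical, the two paths are the ones quoted in this report, and it says nothing about how cri-o or the kernel actually resolves the mounts.

```python
from pathlib import PurePosixPath


def nested_mounts(mount_paths):
    """Return (outer, inner) pairs where inner is mounted strictly below outer.

    Hypothetical helper for illustration; not part of CNO or cri-o.
    """
    paths = [PurePosixPath(p) for p in mount_paths]
    pairs = []
    for outer in paths:
        for inner in paths:
            # PurePosixPath.parents lists every ancestor directory of inner,
            # so "/hostile" is not treated as nested under "/host".
            if inner != outer and outer in inner.parents:
                pairs.append((str(outer), str(inner)))
    return pairs


# Container-side mount paths as described in this report.
sdn_mounts = ["/host", "/host/opt/cni/bin"]
print(nested_mounts(sdn_mounts))

# With the host-cni-bin mount removed, as in the workaround, only /host remains.
print(nested_mounts(["/host"]))
```

The first call reports `/host/opt/cni/bin` as nested under `/host`. The second returns an empty list, which matches the patched daemonset having no nested mountpoints.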