We hit an issue during the upgrade of the build01 cluster related to this, and our current theory is that:
In OpenShift 4.5, the MCO wrote /etc/crio/crio.conf, but subsequent versions stopped doing so. Due to ostree config semantics, this "orphaned" file will persist, and because it contains the now-broken config `conmon = /usr/libexec/crio/conmon`, cri-o will fail to start.
The likely fix here will be for:
- crio to drop that config file as it is no longer necessary
- MCO to remove it from existing systems
In the interim, a known workaround for a broken node is to manually apply the change from https://github.com/openshift/machine-config-operator/commit/723a8a4992f42530af95202e51e5a940d2a3d169 via e.g. ssh.
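To tell whether a node still carries the orphaned file before applying the workaround, something like the following Python sketch could be run on the node. It only detects the condition described above and does not apply the change from the linked commit. The path and the `conmon` value come from this report; the function names, and the assumption that the value may appear with or without quotes, are illustrative.

```python
import re
from pathlib import Path

# Path and stale value as quoted in this bug report.
CRIO_CONF = Path("/etc/crio/crio.conf")
STALE_CONMON = "/usr/libexec/crio/conmon"

# Matches an uncommented `conmon = <path>` line; quoting is assumed optional.
_CONMON_RE = re.compile(r'^[ \t]*conmon[ \t]*=[ \t]*"?([^"\s#]+)"?', re.MULTILINE)


def find_stale_conmon(conf_text: str) -> bool:
    """Return True if the config text sets conmon to the stale path."""
    return any(value == STALE_CONMON for value in _CONMON_RE.findall(conf_text))


def node_is_affected(conf_path: Path = CRIO_CONF) -> bool:
    """Return True if the file exists and contains the stale conmon setting."""
    if not conf_path.is_file():
        return False
    return find_stale_conmon(conf_path.read_text())
```

Calling `node_is_affected()` with no arguments checks the default path; a `True` result suggests the node would hit the cri-o startup failure after the upgrade.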
The fix is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1995785. The problem only shows up on clusters that were upgraded from 4.5 or earlier.
OpenShift engineering has decided to NOT ship 4.8.6 on 8/23 due to the following issue.
All of the fixes will now be included in 4.8.7 on 8/30.
The set of clusters exposed to bug 1995785 is narrower, and we're working it in that bug, so moving this one back to VERIFIED to avoid trying to track it in two places.
Actually MODIFIED, to ensure we get swept back into the errata.
Verified on 4.7.0-0.nightly-2021-08-21-153346 via https://bugzilla.redhat.com/show_bug.cgi?id=1995809
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (OpenShift Container Platform 4.8.9 bug fix), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.