Bug 1993385

Summary: failed to start cri-o service due to /usr/libexec/crio/conmon is missing
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Node
Assignee: Peter Hunt <pehunt>
Node sub component: CRI-O
QA Contact: Mike Fiedler <mifiedle>
Status: CLOSED ERRATA
Docs Contact:
Severity: urgent
Priority: urgent
CC: aos-bugs, bparees, cblecker, dornelas, dseals, gpei, jchaloup, jiwei, jligon, krmoser, kuiwang, miabbott, mifiedle, mko, mnguyen, mrussell, nstielau, stbenjam, walters, wking
Version: 4.9
Keywords: FastFix, TestBlocker
Target Milestone: ---
Target Release: 4.8.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-08-31 16:17:11 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1992557
Bug Blocks: 1993119, 1993386

Comment 3 Colin Walters 2021-08-19 17:42:36 UTC
We hit an issue during the upgrade of the build01 cluster related to this, and our current theory is that:

In OpenShift 4.5, the MCO wrote /etc/crio/crio.conf, but subsequent versions stopped doing so.  Due to ostree config semantics, this "orphaned" file will persist, and because it contains the now-broken config `conmon = /usr/libexec/crio/conmon`, cri-o will fail to start.

The likely fix here will be for:

- crio to drop that config file as it is no longer necessary
- MCO to remove it from existing systems

In the interim, a known workaround for a broken node is to manually apply the change from https://github.com/openshift/machine-config-operator/commit/723a8a4992f42530af95202e51e5a940d2a3d169 via e.g. ssh.
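
A minimal sketch of how one might check a node for the condition described above, assuming the orphaned file is /etc/crio/crio.conf and that it sets `conmon` to a path that no longer exists. The function name `find_stale_conmon` and the line-based parsing are illustrative assumptions, not part of cri-o or the MCO, and this only detects the problem; it does not apply the workaround from the linked commit.

```python
import os
import re

# Matches an uncommented `conmon = <path>` line, with or without quotes.
# Keys such as conmon_cgroup or conmon_env are deliberately not matched.
_CONMON_RE = re.compile(r'^[ \t]*conmon[ \t]*=[ \t]*"?([^"\s#]+)"?', re.MULTILINE)


def find_stale_conmon(conf_text, exists=os.path.exists):
    """Return the configured conmon path if it is set but missing, else None.

    `exists` is injectable so the check can be exercised without touching
    the real filesystem (hypothetical helper, for illustration only).
    """
    match = _CONMON_RE.search(conf_text)
    if match is None:
        # No explicit conmon path configured (or it is empty).
        return None
    path = match.group(1)
    return None if exists(path) else path


if __name__ == "__main__":
    conf_path = "/etc/crio/crio.conf"  # path taken from the comment above
    try:
        with open(conf_path) as f:
            text = f.read()
    except FileNotFoundError:
        print(f"{conf_path} not present; nothing to check")
    else:
        stale = find_stale_conmon(text)
        if stale:
            print(f"{conf_path} points conmon at missing binary: {stale}")
        else:
            print(f"{conf_path} has no stale conmon path")
```

Run on a node (for example over ssh), a report of a missing binary would suggest the node is affected by the orphaned config.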

Comment 4 Mike Fiedler 2021-08-19 18:50:00 UTC
The upstream fix is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1995785. A cluster needs to have been upgraded from 4.5 or earlier to see the issue.

Comment 5 ximhan 2021-08-20 07:26:57 UTC
OpenShift engineering has decided to NOT ship 4.8.6 on 8/23 due to the following issue.
https://bugzilla.redhat.com/show_bug.cgi?id=1995785
All of the fixes will now be included in 4.8.7 on 8/30.

Comment 7 W. Trevor King 2021-08-20 16:57:30 UTC
The set of clusters exposed to bug 1995785 is narrower, and we're working it in that bug, so moving this one back to VERIFIED to avoid trying to track it in two places.

Comment 8 W. Trevor King 2021-08-20 16:58:32 UTC
Actually MODIFIED, to ensure we get swept back into the errata.

Comment 10 Mike Fiedler 2021-08-23 12:24:11 UTC
Verified on 4.7.0-0.nightly-2021-08-21-153346 via https://bugzilla.redhat.com/show_bug.cgi?id=1995809

Comment 13 errata-xmlrpc 2021-08-31 16:17:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.9 bug fix), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3247

Comment 14 Peter Hunt 2021-11-08 17:05:41 UTC
*** Bug 2021256 has been marked as a duplicate of this bug. ***