Bug 1993385 - failed to start cri-o service due to /usr/libexec/crio/conmon is missing
Summary: failed to start cri-o service due to /usr/libexec/crio/conmon is missing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.8.z
Assignee: Peter Hunt
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On: 1992557
Blocks: 1993119 1993386
Reported: 2021-08-12 22:10 UTC by OpenShift BugZilla Robot
Modified: 2021-08-31 16:17 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-31 16:17:11 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 2714 | 0 | None | None | None | 2021-08-16 18:21:43 UTC
Red Hat Product Errata | RHBA-2021:3247 | 0 | None | None | None | 2021-08-31 16:17:41 UTC

Comment 3 Colin Walters 2021-08-19 17:42:36 UTC
We hit an issue during the upgrade of the build01 cluster related to this, and our current theory is that:

In OpenShift 4.5, the MCO wrote /etc/crio/crio.conf, but subsequent versions stopped doing so.  Due to ostree config semantics, this "orphaned" file will persist, and because it contains the now-broken config `conmon = /usr/libexec/crio/conmon`, cri-o will fail to start.

The likely fix here will be for:

- crio to drop that config file as it is no longer necessary
- MCO to remove it from existing systems

In the interim, a known workaround for a broken node is to manually apply the change from https://github.com/openshift/machine-config-operator/commit/723a8a4992f42530af95202e51e5a940d2a3d169 via e.g. ssh.
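
To help spot affected nodes before applying the workaround, here is a minimal sketch (not the content of the linked commit) that scans a crio.conf for a `conmon = ...` entry pointing at a binary that no longer exists. The function name `stale_conmon_paths` and the regex `CONMON_RE` are hypothetical helpers introduced for illustration. The sketch assumes the config uses simple one-line `key = value` assignments.

```python
import os
import re

# Matches an assignment such as: conmon = "/usr/libexec/crio/conmon"
# Quotes are optional so the unquoted form quoted in this comment also matches.
# Keys like conmon_cgroup, empty values, and commented-out lines do not match.
CONMON_RE = re.compile(r'^\s*conmon\s*=\s*"?([^"\s]+)"?\s*(?:#.*)?$')


def stale_conmon_paths(conf_text, exists=os.path.exists):
    """Return every conmon path set in conf_text that does not exist.

    `exists` is injectable so the check can be exercised without
    touching the real filesystem.
    """
    stale = []
    for line in conf_text.splitlines():
        match = CONMON_RE.match(line)
        if match and not exists(match.group(1)):
            stale.append(match.group(1))
    return stale


if __name__ == "__main__":
    # Path taken from this bug; run on the node itself (e.g. over ssh).
    conf_path = "/etc/crio/crio.conf"
    if os.path.exists(conf_path):
        with open(conf_path) as f:
            for path in stale_conmon_paths(f.read()):
                print(f"{conf_path}: conmon points at missing binary {path}")
```

A non-empty result only indicates that the orphaned config theory applies to that node; it does not change anything on disk.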

Comment 4 Mike Fiedler 2021-08-19 18:50:00 UTC
The upstream fix is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1995785. The cluster needs to have been upgraded from 4.5 or earlier to see the problem.

Comment 5 ximhan 2021-08-20 07:26:57 UTC
OpenShift engineering has decided NOT to ship 4.8.6 on 8/23 due to the following issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1995785
All of the fixes will now be included in 4.8.7 on 8/30.

Comment 7 W. Trevor King 2021-08-20 16:57:30 UTC
The set of clusters exposed to bug 1995785 is narrower, and we're working it in that bug, so moving this one back to VERIFIED to avoid trying to track it in two places.

Comment 8 W. Trevor King 2021-08-20 16:58:32 UTC
Actually MODIFIED, to ensure we get swept back into the errata.

Comment 10 Mike Fiedler 2021-08-23 12:24:11 UTC
Verified on 4.7.0-0.nightly-2021-08-21-153346 via https://bugzilla.redhat.com/show_bug.cgi?id=1995809

Comment 13 errata-xmlrpc 2021-08-31 16:17:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.9 bug fix), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3247

