
Bug 1995809

Summary: long living clusters may fail to upgrade because of an invalid conmon path
Product: OpenShift Container Platform
Reporter: W. Trevor King <wking>
Component: Node
Assignee: Peter Hunt <pehunt>
Node sub component: CRI-O
QA Contact: Mike Fiedler <mifiedle>
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
CC: aos-bugs, kgarriso, kuiwang, mifiedle, pehunt, schoudha, smilner, wking
Version: 4.7
Keywords: FastFix, Regression, Upgrades
Target Milestone: ---
Target Release: 4.8.z
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Clone Of: 1995785
Last Closed: 2021-08-31 16:17:44 UTC
Bug Depends On: 1995785
Bug Blocks: 1995810

Description W. Trevor King 2021-08-19 19:34:09 UTC
+++ This bug was initially created as a clone of Bug #1995785 +++

Description of problem:
Another piece of the fallout from https://bugzilla.redhat.com/show_bug.cgi?id=1993385 is an interaction between rpm-ostree and older versions of the MCO. If a cluster was ever at a version where the MCO configured /etc/crio/crio.conf (4.5 or earlier), then updates to the cri-o rpm won't update the crio.conf file (for example, to change the conmon path). The fix for https://bugzilla.redhat.com/show_bug.cgi?id=1993385 only updated the MCO to *not* specify the conmon path in the drop-in template, on the assumption that this would leave it at the CRI-O default of "". As a result, the pre-existing value in /etc/crio/crio.conf (which the rpm fix did not change) prevails, so cri-o expects conmon at /usr/libexec/crio/conmon, which no longer exists. This prevents nodes from coming up.
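
The precedence problem described above can be illustrated with a small sketch. This is a simplified model, not CRI-O's actual loader: it assumes configuration layers are flat key/value maps, that drop-ins are applied on top of the base file, and that a key a drop-in omits keeps its earlier value. The function name and all values other than the two conmon paths are hypothetical.

```python
def effective_config(base, *drop_ins):
    """Merge config layers; later layers override earlier ones key by key.

    A key that a layer omits is left at whatever an earlier layer set,
    which is why dropping a key from a drop-in does not reset it.
    """
    merged = dict(base)
    for layer in drop_ins:
        merged.update(layer)
    return merged


# Hypothetical layers for illustration only.
stale_base = {"conmon": "/usr/libexec/crio/conmon"}     # left behind in crio.conf
drop_in_omitting = {"pids_limit": 1024}                  # drop-in no longer sets conmon
drop_in_explicit = {"pids_limit": 1024, "conmon": ""}    # drop-in sets conmon explicitly

# Omitting the key leaves the stale path in effect.
print(effective_config(stale_base, drop_in_omitting)["conmon"])

# Setting the key explicitly overrides the stale path.
print(repr(effective_config(stale_base, drop_in_explicit)["conmon"]))
```

In this model, only a layer that sets conmon explicitly can displace the stale value in the base file. Whether the shipped fix took that form is not stated in this report.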

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. upgrade a node from 4.5 to an affected version (going through each minor version)
2. notice that cri-o fails to come up in a way similar to https://bugzilla.redhat.com/show_bug.cgi?id=1993385


Actual results:
the node does not come up

Expected results:
the node starts

Additional info:

Comment 2 Mike Fiedler 2021-08-21 17:19:27 UTC
Verified on 4.8.0-0.nightly-2021-08-21-050932

1. Install 4.7.24
2. oc debug to a worker and edit /etc/crio/crio.conf and make some changes (I changed loglevel and turned metrics on) and save the file
3. Create a containerruntime config with the following contents

apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
 name: set-pids-limit
spec:
 machineConfigPoolSelector:
   matchLabels:
     custom-crio: high-pid-limit
 containerRuntimeConfig:
   pidsLimit: 2048


4. oc label machineconfigpool worker custom-crio=high-pid-limit
5. oc get mcp worker -w and watch for all workers to be ready
6. oc adm upgrade --force --allow-explicit-upgrade --to-image registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-08-21-050932

- verify upgrade successful
- oc debug to the node where crio.conf was modified and verify customizations are still in place
- crio config | grep conmon and verify value is "" and not /usr/libexec/crio/conmon
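
The last check can also be scripted. The sketch below is a hypothetical helper, not part of any OpenShift or CRI-O tooling: it assumes the output of `crio config` has been captured as text and contains a line of the form `conmon = "..."`, and it ignores related keys such as `conmon_cgroup` that a plain grep would also match. The sample text is made up for illustration.

```python
import re


def conmon_path(crio_config_text):
    """Return the value of the `conmon` key in TOML-like text, or None if absent."""
    for line in crio_config_text.splitlines():
        # Match `conmon = "..."` only; `conmon_cgroup` and commented lines do not match.
        match = re.match(r'^\s*conmon\s*=\s*"(.*)"\s*$', line)
        if match:
            return match.group(1)
    return None


# Hypothetical captured output for illustration only.
sample = '''
[crio.runtime]
# conmon = "/usr/libexec/crio/conmon"
conmon = ""
conmon_cgroup = "system.slice"
'''

path = conmon_path(sample)
if path == "":
    print("conmon is unset; CRI-O will use its default lookup")
else:
    print("unexpected conmon value:", path)
```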

Comment 6 errata-xmlrpc 2021-08-31 16:17:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.9 bug fix), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3247