Bug 1995809 - long living clusters may fail to upgrade because of an invalid conmon path
Summary: long living clusters may fail to upgrade because of an invalid conmon path
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.8.z
Assignee: Peter Hunt
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On: 1995785
Blocks: 1995810
Reported: 2021-08-19 19:34 UTC by W. Trevor King
Modified: 2021-08-31 16:18 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1995785
: 1995810
Environment:
Last Closed: 2021-08-31 16:17:44 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System                  ID                                           Last Updated
Github                  openshift machine-config-operator pull 2724  2021-08-19 19:41:09 UTC
Red Hat Product Errata  RHBA-2021:3247                               2021-08-31 16:18:02 UTC

Description W. Trevor King 2021-08-19 19:34:09 UTC
+++ This bug was initially created as a clone of Bug #1995785 +++

Description of problem:
Another piece of fallout from https://bugzilla.redhat.com/show_bug.cgi?id=1993385 is an interesting interaction between rpm-ostree and older versions of the MCO. If a cluster was ever at a version where the MCO configured /etc/crio/crio.conf (4.5 or earlier), then updates to the cri-o RPM do not update that crio.conf file (for example, to change the conmon path), because rpm-ostree keeps the locally modified file rather than replacing it. The fix for https://bugzilla.redhat.com/show_bug.cgi?id=1993385 only changed the MCO to *not* specify the conmon path in its drop-in template, on the assumption that this would leave it at the CRI-O default of "". Instead, the pre-existing value in /etc/crio/crio.conf, which the fixed RPM leaves unchanged, still prevails, so cri-o expects conmon at /usr/libexec/crio/conmon, which no longer exists. As a result, nodes do not come up.
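The interaction described above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not MCO, rpm-ostree, or CRI-O code: config files are modelled as nested dicts keyed by TOML table name, rpm-ostree's /etc handling is reduced to "keep a locally modified file, otherwise take the new package default", and a drop-in is assumed to override only the keys it sets. The package-default values and the helper names etc_merge and apply_drop_in are hypothetical.

```python
"""Toy model of why a stale conmon path survives the upgrade.

Assumptions (illustrative only, not MCO or rpm-ostree code):
- config files are nested dicts keyed by TOML table name;
- /etc handling is reduced to "keep the file if it was locally
  modified, otherwise take the new package default";
- a drop-in is applied on top of crio.conf, key by key.
"""

from copy import deepcopy

STALE_CONMON = "/usr/libexec/crio/conmon"


def etc_merge(old_default: dict, new_default: dict, on_disk: dict) -> dict:
    """Simplified /etc merge: untouched files pick up the new default."""
    return deepcopy(new_default) if on_disk == old_default else deepcopy(on_disk)


def apply_drop_in(base: dict, drop_in: dict) -> dict:
    """Overlay a drop-in: keys it sets win, all other keys are inherited."""
    merged = deepcopy(base)
    for table, values in drop_in.items():
        merged.setdefault(table, {}).update(values)
    return merged


# Hypothetical package defaults before and after the cri-o RPM update.
rpm_old = {"crio.runtime": {"conmon": STALE_CONMON}}
rpm_new = {"crio.runtime": {"conmon": ""}}

# A 4.5-era MCO wrote its own crio.conf, so the file no longer matches
# the package default and is treated as locally modified.
mco_written = {"crio.runtime": {"conmon": STALE_CONMON, "pids_limit": 1024}}

base = etc_merge(rpm_old, rpm_new, mco_written)

# A drop-in that omits conmon inherits the stale value from crio.conf ...
omitting = apply_drop_in(base, {"crio.runtime": {"pids_limit": 2048}})
print(omitting["crio.runtime"]["conmon"])  # /usr/libexec/crio/conmon

# ... whereas a drop-in that sets conmon explicitly overrides it.
explicit = apply_drop_in(base, {"crio.runtime": {"conmon": "", "pids_limit": 2048}})
print(repr(explicit["crio.runtime"]["conmon"]))  # ''
```

The point of the model is that omitting a key from a drop-in is not the same as resetting it: the value simply falls through to whatever the base file says.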

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. upgrade a node from 4.5 to an affected version (going through each minor version)
2. observe that cri-o fails to come up, in ways similar to https://bugzilla.redhat.com/show_bug.cgi?id=1993385


Actual results:
the node does not come up

Expected results:
the node starts

Additional info:

Comment 2 Mike Fiedler 2021-08-21 17:19:27 UTC
Verified on 4.8.0-0.nightly-2021-08-21-050932

1. Install 4.7.24
2. oc debug to a worker, edit /etc/crio/crio.conf to make some changes (I changed loglevel and turned metrics on), and save the file
3. Create a ContainerRuntimeConfig with the following contents:

apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: set-pids-limit
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-crio: high-pid-limit
  containerRuntimeConfig:
    pidsLimit: 2048


4. oc label machineconfigpool worker custom-crio=high-pid-limit
5. oc get mcp worker -w and watch for all workers to be ready
6. oc adm upgrade --force --allow-explicit-upgrade --to-image registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-08-21-050932
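Step 4 matters because the ContainerRuntimeConfig only applies to pools its machineConfigPoolSelector selects. A minimal sketch of matchLabels semantics, assuming the usual Kubernetes rule that every listed key/value pair must be present on the object (matchExpressions are not modelled, and the function name selects is hypothetical):

```python
"""Minimal sketch of matchLabels selection (matchExpressions not modelled)."""

from __future__ import annotations


def selects(match_labels: dict[str, str], pool_labels: dict[str, str]) -> bool:
    """True when the pool carries every key/value pair in matchLabels."""
    return all(pool_labels.get(key) == value for key, value in match_labels.items())


selector = {"custom-crio": "high-pid-limit"}  # from the ContainerRuntimeConfig above

worker_before = {}  # other existing pool labels omitted for brevity
worker_after = {**worker_before, "custom-crio": "high-pid-limit"}  # after step 4

print(selects(selector, worker_before))  # False
print(selects(selector, worker_after))   # True
```

Without the label from step 4, the worker pool is never selected and the drop-in carrying pidsLimit is never rendered, so the upgrade would not exercise the drop-in path at all.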

- verify the upgrade succeeded
- oc debug to the node where crio.conf was modified and verify the customizations are still in place
- run crio config | grep conmon and verify the conmon value is "" and not /usr/libexec/crio/conmon
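The last verification step can be made a little stricter than a plain grep, since `grep conmon` also matches keys such as conmon_cgroup and conmon_env as well as commented lines. A minimal sketch, assuming the relevant `crio config` output line has the TOML form `conmon = "..."`; the helper names and the sample excerpts are hypothetical, not verbatim crio output:

```python
"""Check the effective conmon setting in `crio config` output (sketch)."""

import re
from typing import Optional

# Match a line setting exactly the key `conmon`; `grep conmon` would also
# hit keys such as conmon_cgroup and conmon_env, and commented lines.
CONMON_RE = re.compile(r'^\s*conmon\s*=\s*"(?P<value>[^"]*)"\s*$')


def conmon_value(config_text: str) -> Optional[str]:
    """Return the first uncommented conmon value, or None if absent."""
    for line in config_text.splitlines():
        match = CONMON_RE.match(line)
        if match:
            return match.group("value")
    return None


def conmon_ok(config_text: str) -> bool:
    """Pass only when conmon is "" (the stale /usr/libexec path fails)."""
    return conmon_value(config_text) == ""


# Illustrative excerpts, not verbatim `crio config` output.
fixed = '[crio.runtime]\nconmon = ""\nconmon_cgroup = "pod"\n'
stale = '[crio.runtime]\nconmon = "/usr/libexec/crio/conmon"\nconmon_cgroup = "pod"\n'

print(conmon_ok(fixed), conmon_ok(stale))  # True False
```

On a node, the text would come from running `crio config` there; the sketch only covers the parsing and comparison.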

Comment 6 errata-xmlrpc 2021-08-31 16:17:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.9 bug fix), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3247

