Bug 2034296 - Kubelet and Crio fail to start during upgrade to 4.7.37
Summary: Kubelet and Crio fail to start during upgrade to 4.7.37
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Qi Wang
QA Contact: MinLi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-12-20 15:46 UTC by Arsalan Irshad
Modified: 2022-08-10 10:41 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 10:40:43 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | cri-o cri-o pull 5520 | 0 | None | open | [Bug 2034296] Inherits storage configs from storage.conf if crio config does not set | 2022-01-11 18:29:12 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 10:41:00 UTC

Description Arsalan Irshad 2021-12-20 15:46:33 UTC
Description of problem: Kubelet and Crio fail to start during upgrade to 4.7.37


Version-Release number of selected component (if applicable): 4.7.11


How reproducible:

- The issue is specific to the customer's cluster.
- The cluster rollout from version 4.7.11 to version 4.7.31 is failing.
- All operators are updated to version 4.7.37 except the machine-config operator.
- The master mcp rolled out the update successfully, but the 'infra' and 'worker' mcps fail because crio goes into the dead state and kubelet stays in the activating state.
- No manual changes were performed on the nodes; prior to the upgrade the mcps were in the available state and the nodes were in the 'Ready' state.
- Multiple steps were performed to force the upgrade:

 https://access.redhat.com/solutions/6427321
 https://access.redhat.com/solutions/5350721
 
- Patching the render and the force touch are also not working.
- Changing the content of the crio.conf file and restarting brings crio and kubelet up, but when the machine-config render rolls out, they move to the dead state again.

Steps to Reproduce:
1. Roll out the upgrade from 4.7.11 to 4.7.37.
2. All operators are updated to version 4.7.37 except the machine-config operator.
3. The master mcp rolls out the update successfully, but the 'infra' and 'worker' mcps fail because crio goes into the dead state and kubelet stays in the activating state.

Actual results:

- The 'worker' and 'infra' mcps are in the degraded state.

Expected results:

- Nodes should update to latest machine-config render successfully.

Additional info:

Comment 25 MinLi 2022-03-23 10:34:05 UTC
hi, qiwan 

I upgraded from 4.11.0-0.nightly-2022-03-18-211245 to 4.11.0-0.nightly-2022-03-20-160505 successfully. All mcp finished roll out.
# oc get mcp 
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-f5acda2dd482824988bc168633eb5e7c   True      False      False      3              3                   3                     0                      114m
worker   rendered-worker-937ec10365e60b06ef62a6006ec3ab8b   True      False      False      3              3                   3                     0                      114m
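
The pool check above can also be done mechanically. The following is a minimal Python sketch, assuming the `oc get mcp` output has been captured as text and that no column value contains whitespace (a pool with an empty CONFIG column would break this simple split). The helper names `parse_mcp_table` and `unfinished_pools` are hypothetical and not part of any OpenShift tooling.

```python
def parse_mcp_table(text):
    """Parse whitespace-aligned `oc get mcp` output into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    # Assumption: every row has one whitespace-free value per header column.
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def unfinished_pools(rows):
    """Return the names of pools that are degraded or not fully updated."""
    bad = []
    for row in rows:
        done = (
            row["UPDATED"] == "True"
            and row["UPDATING"] == "False"
            and row["DEGRADED"] == "False"
            and row["MACHINECOUNT"] == row["UPDATEDMACHINECOUNT"]
        )
        if not done:
            bad.append(row["NAME"])
    return bad


# Output copied from the comment above.
SAMPLE = """\
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-f5acda2dd482824988bc168633eb5e7c   True      False      False      3              3                   3                     0                      114m
worker   rendered-worker-937ec10365e60b06ef62a6006ec3ab8b   True      False      False      3              3                   3                     0                      114m
"""

if __name__ == "__main__":
    print(unfinished_pools(parse_mcp_table(SAMPLE)))
```

An empty result means every listed pool reports UPDATED=True, UPDATING=False, DEGRADED=False, with all machines updated.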

But when I check the crio config, it shows runroot = "/run/containers/storage", not runroot = "/var/run/containers/storage" as in storage.conf. Is this expected?
sh-4.4# crio config | grep -i root 
INFO[2022-03-23 10:28:57.702140215Z] Starting CRI-O, version: 1.23.0, git: ()     
INFO Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL 
# Path to the "root directory". CRI-O stores all of its data, including
# root = "/var/lib/containers/storage"
# runroot = "/run/containers/storage"
# If true, the runtime will not use pivot_root, but instead use MS_MOVE.
...
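
To compare these values against storage.conf without reading the output by eye, the key/value lines can be extracted from the captured text. This is a minimal Python sketch; the helper name `storage_settings` is hypothetical, and reading a leading `#` as "not explicitly set, default shown" is an assumption about how `crio config` renders its output, not something confirmed in this bug.

```python
import re

# Matches lines such as:  # runroot = "/run/containers/storage"
# The optional leading "#" is captured so callers can tell whether the
# key was printed commented out.
LINE = re.compile(
    r'^(?P<commented>#\s*)?(?P<key>[A-Za-z_]+)\s*=\s*"(?P<value>[^"]*)"\s*$'
)


def storage_settings(text, keys=("root", "runroot")):
    """Return {key: (value, was_commented)} for the requested keys."""
    found = {}
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if match and match.group("key") in keys:
            found[match.group("key")] = (
                match.group("value"),
                match.group("commented") is not None,
            )
    return found


# Lines copied from the `crio config | grep -i root` output above.
SAMPLE = '''\
# Path to the "root directory". CRI-O stores all of its data, including
# root = "/var/lib/containers/storage"
# runroot = "/run/containers/storage"
# If true, the runtime will not use pivot_root, but instead use MS_MOVE.
'''

if __name__ == "__main__":
    for key, (value, commented) in storage_settings(SAMPLE).items():
        print(key, value, "commented" if commented else "set")
```

The extracted `runroot` value can then be compared as a string with the one configured in storage.conf.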

Comment 26 Qi Wang 2022-03-23 21:15:59 UTC
@minmli "/run/containers/storage" is expected. crio inherits this default from the containers/storage package, not from the storage.conf file on the cluster. It is also documented that the default is /run/containers/storage: https://github.com/cri-o/cri-o/blob/main/docs/crio.8.md.
Thanks for pointing this out.

Comment 27 MinLi 2022-03-24 01:57:26 UTC
Verified according to Comment 26.

Comment 29 errata-xmlrpc 2022-08-10 10:40:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

