Description of problem:
Kubelet and CRI-O fail to start during upgrade to 4.7.37.

Version-Release number of selected component (if applicable):
4.7.11

How reproducible:
- The issue is specific to the customer's cluster.
- Cluster rollout from version 4.7.11 to version 4.7.37 is failing.
- All operators are updated to 4.7.37 except the machine-config operator.
- The master MCP rolled out the update successfully, but the 'infra' and 'worker' MCPs fail because CRI-O goes into the dead state and kubelet stays in the activating state.
- No manual changes were performed on the nodes; the MCPs were available and the nodes were in 'Ready' state prior to the upgrade.
- Multiple steps were performed to force the upgrade:
  https://access.redhat.com/solutions/6427321
  https://access.redhat.com/solutions/5350721
- Patching the rendered config and the force touch also did not work.
- Changing the content of the crio.conf file and restarting brings CRI-O and kubelet up, but when the rendered machine-config rolls out, CRI-O moves to the dead state again.

Steps to Reproduce:
1. Roll out the upgrade from 4.7.11 to 4.7.37.
2. All operators are updated to 4.7.37 except the machine-config operator.
3. The master MCP rolls out the update successfully, but the 'infra' and 'worker' MCPs fail because CRI-O goes into the dead state and kubelet stays in the activating state.

Actual results:
- The 'worker' and 'infra' MCPs are in a degraded state.

Expected results:
- Nodes should update to the latest rendered machine-config successfully.

Additional info:
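For triage, a minimal sketch (not part of the original report) that lists the pools whose DEGRADED column is True. It assumes the plain-text table layout printed by `oc get mcp`; the function name `degraded_pools` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the names of MachineConfigPools whose
# DEGRADED column is True. Reads `oc get mcp` table output on stdin.
degraded_pools() {
  awk '
    NR == 1 {
      # Find the DEGRADED column from the header rather than hard-coding it.
      for (i = 1; i <= NF; i++) if ($i == "DEGRADED") col = i
      next
    }
    # Print the pool name (first column) when the pool is degraded.
    col && $col == "True" { print $1 }
  '
}

# Usage on a live cluster (assumes a logged-in oc session):
#   oc get mcp | degraded_pools
```

In the situation described above, this would be expected to print the 'worker' and 'infra' pools.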
hi, qiwan

I upgraded from 4.11.0-0.nightly-2022-03-18-211245 to 4.11.0-0.nightly-2022-03-20-160505 successfully. All mcp finished roll out.

# oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-f5acda2dd482824988bc168633eb5e7c   True      False      False      3              3                   3                     0                      114m
worker   rendered-worker-937ec10365e60b06ef62a6006ec3ab8b   True      False      False      3              3                   3                     0                      114m

But when I check crio config, it shows runroot = "/run/containers/storage", not runroot = "/var/run/containers/storage" as in storage.conf. Is this expected?

sh-4.4# crio config | grep -i root
INFO[2022-03-23 10:28:57.702140215Z] Starting CRI-O, version: 1.23.0, git: ()
INFO Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL
# Path to the "root directory". CRI-O stores all of its data, including
# root = "/var/lib/containers/storage"
# runroot = "/run/containers/storage"
# If true, the runtime will not use pivot_root, but instead use MS_MOVE.
...
@minmli "/run/containers/storage" is expected. CRI-O inherits this default from the containers/storage package, not from the storage.conf file on the cluster. The default of /run/containers/storage is also documented (https://github.com/cri-o/cri-o/blob/main/docs/crio.8.md). Thanks for pointing this out.
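To compare the two values side by side, here is a minimal sketch. It assumes the `runroot = "..."` line format seen in the `crio config` output above (commented or not); the function name `runroot_of` is hypothetical, and /etc/containers/storage.conf is an assumed location for the storage.conf file.

```shell
#!/bin/sh
# Hypothetical helper: print the first runroot value found on stdin.
# Matches both a commented default ('# runroot = "..."') and an explicit
# setting ('runroot = "..."'); plain 'root = ...' lines are ignored.
runroot_of() {
  sed -n 's/^[#[:space:]]*runroot[[:space:]]*=[[:space:]]*"\(.*\)".*$/\1/p' | head -n 1
}

# Usage on a node (paths are assumptions, adjust as needed):
#   crio config 2>/dev/null | runroot_of
#   runroot_of < /etc/containers/storage.conf
```

On systems where /var/run is a symlink to /run, the two reported paths refer to the same directory even though the strings differ.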
verified according to Comment 26
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069