Description of problem:
During a cluster upgrade, RHEL nodes can go NotReady because of an incompatibility between the kube config and the version of the kubelet running on the node. Fixing this requires upgrading the kubelet by running the RHEL upgrade playbooks. The playbooks install new RPMs, which can modify files managed by the MCD and put the node in a Degraded state.

Version-Release number of the following components:
4.2 to 4.3

How reproducible:

Steps to Reproduce:
1. Install OCP 4.2
2. Upgrade the cluster to 4.3
3. The RHEL node goes NotReady
4. Upgrade the RHEL nodes
5. The MCO machine config rollout is blocked because on-disk files do not match the config

Actual results:
The RHEL node is NotReady due to kube version skew:
hyperkube[2508]: F0117 14:48:12.003999 2508 server.go:206] unrecognized feature gate: LegacyNodeRoleBehavior

After the RHEL upgrade, the MCD reports:
content mismatch for file /etc/containers/storage.conf

Expected results:
The upgrade completes successfully.
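A quick way to spot the kubelet version skew described above is to compare each node's VERSION against the version the control plane has moved to. The following is a minimal sketch, not something shipped with openshift-ansible or the MCO: `nodes_with_skew` is a hypothetical helper name, and it assumes the default `oc get nodes` column layout where VERSION is the fifth whitespace-separated column.

```shell
# Hypothetical helper (not part of OpenShift): print NAME and VERSION for
# every node whose kubelet version differs from the expected one.
# Reads `oc get nodes` or `oc get nodes -o wide` output on stdin and assumes
# VERSION is the 5th column (NAME STATUS ROLES AGE VERSION ...); this holds
# because none of the first five columns contain spaces.
nodes_with_skew() {
  # $1: expected kubelet version, e.g. v1.17.1
  # NR > 1 skips the header row.
  awk -v want="$1" 'NR > 1 && $5 != want { print $1, $5 }'
}

# Example: oc get nodes -o wide | nodes_with_skew v1.17.1
```

Run against the mid-upgrade `oc get nodes -o wide` listing later in this report with `v1.17.1`, it would print wj42bz-bgmlk-compute-1, wj42bz-bgmlk-rhel-0 and wj42bz-bgmlk-rhel-1 together with their v1.14.6 kubelet versions.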
@Russell, will this also fix BZ#1792139?
Checked the 4.2 -> 4.4 upgrade path on a restricted-network cluster with a proxy.
Verified version: openshift-ansible-4.4.0-202001201746.git.178.e31d324.el7.noarch.rpm

# before upgrade
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2020-01-20-195638   True        False         115m    Cluster version is 4.2.0-0.nightly-2020-01-20-195638

# after triggering `oc adm upgrade`
$ oc get nodes -o wide && oc get clusterversion && oc get co
NAME                           STATUS                        ROLES    AGE     VERSION             INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
wj42bz-bgmlk-compute-0         Ready                         worker   4h6m    v1.17.1             10.0.98.208   <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1
wj42bz-bgmlk-compute-1         Ready                         worker   4h7m    v1.14.6+97c81d00e   10.0.98.35    <none>        Red Hat Enterprise Linux CoreOS 42.81.20200114.0 (Ootpa)       4.18.0-147.3.1.el8_1.x86_64   cri-o://1.14.11-6.dev.rhaos4.2.git627b85c.el8
wj42bz-bgmlk-control-plane-0   Ready                         master   4h19m   v1.17.1             10.0.98.128   <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1
wj42bz-bgmlk-control-plane-1   Ready                         master   4h19m   v1.17.1             10.0.96.127   <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1
wj42bz-bgmlk-control-plane-2   Ready                         master   4h19m   v1.17.1             10.0.96.156   <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1
wj42bz-bgmlk-rhel-0            NotReady,SchedulingDisabled   worker   114m    v1.14.6+c383847f6   10.0.96.188   <none>        Red Hat Enterprise Linux Server 7.7 (Maipo)                    3.10.0-1062.9.1.el7.x86_64    cri-o://1.14.11-9.dev.rhaos4.2.git983e00f.el7
wj42bz-bgmlk-rhel-1            Ready                         worker   114m    v1.14.6+c383847f6   10.0.96.72    <none>        Red Hat Enterprise Linux Server 7.7 (Maipo)                    3.10.0-1062.9.1.el7.x86_64    cri-o://1.14.11-9.dev.rhaos4.2.git983e00f.el7
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2020-01-20-195638   True        True          86m     Unable to apply 4.4.0-0.nightly-2020-01-21-012409: the cluster operator monitoring is degraded
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      3h57m
cloud-credential                           4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h17m
cluster-autoscaler                         4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h7m
console                                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      44m
dns                                        4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h13m
image-registry                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      3h59m
ingress                                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h4m
insights                                   4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h14m
kube-apiserver                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h11m
kube-controller-manager                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h9m
kube-scheduler                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h11m
kube-storage-version-migrator              4.4.0-0.nightly-2020-01-21-012409   True        False         False      76m
machine-api                                4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h17m
machine-config                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h12m
marketplace                                4.4.0-0.nightly-2020-01-21-012409   True        False         False      43m
monitoring                                 4.4.0-0.nightly-2020-01-21-012409   False       True          True       47m
network                                    4.4.0-0.nightly-2020-01-21-012409   True        True          True       4h12m
node-tuning                                4.4.0-0.nightly-2020-01-21-012409   True        False         False      76m
openshift-apiserver                        4.4.0-0.nightly-2020-01-21-012409   True        False         False      39m
openshift-controller-manager               4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h13m
openshift-samples                          4.4.0-0.nightly-2020-01-21-012409   True        False         False      66m
operator-lifecycle-manager                 4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h12m
operator-lifecycle-manager-catalog         4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h12m
operator-lifecycle-manager-packageserver   4.4.0-0.nightly-2020-01-21-012409   True        False         False      38m
service-ca                                 4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h14m
service-catalog-apiserver                  4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h10m
service-catalog-controller-manager         4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h10m
storage                                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      76m

# then run the RHEL upgrade playbook against the cluster; afterwards the cluster is serving again
$ oc get nodes -o wide && oc get clusterversion && oc get co
NAME                           STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
wj42bz-bgmlk-compute-0         Ready    worker   4h52m   v1.17.1   10.0.98.208   <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1
wj42bz-bgmlk-compute-1         Ready    worker   4h53m   v1.17.1   10.0.98.35    <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1
wj42bz-bgmlk-control-plane-0   Ready    master   5h5m    v1.17.1   10.0.98.128   <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1
wj42bz-bgmlk-control-plane-1   Ready    master   5h5m    v1.17.1   10.0.96.127   <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1
wj42bz-bgmlk-control-plane-2   Ready    master   5h5m    v1.17.1   10.0.96.156   <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1
wj42bz-bgmlk-rhel-0            Ready    worker   160m    v1.17.1   10.0.96.188   <none>        Red Hat Enterprise Linux Server 7.7 (Maipo)                    3.10.0-1062.9.1.el7.x86_64    cri-o://1.17.0-0.3.rc1.rhaos4.4.gitb89a5fc.el7-rc1
wj42bz-bgmlk-rhel-1            Ready    worker   160m    v1.17.1   10.0.96.72    <none>        Red Hat Enterprise Linux Server 7.7 (Maipo)                    3.10.0-1062.9.1.el7.x86_64    cri-o://1.17.0-0.3.rc1.rhaos4.4.gitb89a5fc.el7-rc1
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-01-21-012409   True        False         39m     Cluster version is 4.4.0-0.nightly-2020-01-21-012409
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h43m
cloud-credential                           4.4.0-0.nightly-2020-01-21-012409   True        False         False      5h3m
cluster-autoscaler                         4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h52m
console                                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      89m
dns                                        4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h59m
image-registry                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      39m
ingress                                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      39m
insights                                   4.4.0-0.nightly-2020-01-21-012409   True        False         False      5h
kube-apiserver                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h57m
kube-controller-manager                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h55m
kube-scheduler                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h57m
kube-storage-version-migrator              4.4.0-0.nightly-2020-01-21-012409   True        False         False      39m
machine-api                                4.4.0-0.nightly-2020-01-21-012409   True        False         False      5h3m
machine-config                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h58m
marketplace                                4.4.0-0.nightly-2020-01-21-012409   True        False         False      89m
monitoring                                 4.4.0-0.nightly-2020-01-21-012409   True        False         False      17m
network                                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h58m
node-tuning                                4.4.0-0.nightly-2020-01-21-012409   True        False         False      122m
openshift-apiserver                        4.4.0-0.nightly-2020-01-21-012409   True        False         False      85m
openshift-controller-manager               4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h59m
openshift-samples                          4.4.0-0.nightly-2020-01-21-012409   True        False         False      112m
operator-lifecycle-manager                 4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h58m
operator-lifecycle-manager-catalog         4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h58m
operator-lifecycle-manager-packageserver   4.4.0-0.nightly-2020-01-21-012409   True        False         False      84m
service-ca                                 4.4.0-0.nightly-2020-01-21-012409   True        False         False      5h
service-catalog-apiserver                  4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h55m
service-catalog-controller-manager         4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h55m
storage                                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      122m
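When comparing the cluster operator listings before and after the playbook run, the `oc get co` output can be filtered down to the operators that have not settled. This is a minimal sketch: `unsettled_operators` is a hypothetical helper name, and it assumes the default column order NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE (an operator row with an empty VERSION column would shift the fields and be misread).

```shell
# Hypothetical helper (not part of OpenShift): print every cluster operator
# that is not in the steady state Available=True, Progressing=False,
# Degraded=False. Reads `oc get co` output on stdin and assumes the default
# columns NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE.
unsettled_operators() {
  # NR > 1 skips the header row; print NAME AVAILABLE PROGRESSING DEGRADED.
  awk 'NR > 1 && !($3 == "True" && $4 == "False" && $5 == "False") {
    print $1, $3, $4, $5
  }'
}

# Example: oc get co | unsettled_operators
```

Fed the mid-upgrade listing above, it would print the monitoring and network rows; fed the final listing, it prints nothing.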
Per comment 3, the user still needs to run the openshift-ansible RHEL worker upgrade during the `oc adm upgrade` process in order to complete the whole cluster upgrade. So I am clearing the needinfo flag from comment 2 and keeping BZ#1792139 to track future improvements to the upgrade process for mixed RHCOS + RHEL worker clusters.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581