Bug 1793078 - RHEL worker upgrade playbook leads to MCO being out of sync
Summary: RHEL worker upgrade playbook leads to MCO being out of sync
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.4.0
Assignee: Russell Teague
QA Contact: weiwei jiang
URL:
Whiteboard:
Depends On:
Blocks: 1793093
 
Reported: 2020-01-20 16:03 UTC by Russell Teague
Modified: 2020-05-04 11:25 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Machine config was not properly updated by the MCO because package installs updated files on disk. Consequence: The MCO would not process config updates on RHEL nodes. Fix: Added the machine config apply back to the upgrade steps and added proxy config for image pulls. Result: Machine configs were properly applied after package updates during upgrade.
Clone Of:
Clones: 1793093
Environment:
Last Closed: 2020-05-04 11:25:32 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift openshift-ansible pull 12069 0 None closed Bug 1793078: Fix machine config apply on upgrade 2020-11-12 10:12:08 UTC
Red Hat Product Errata RHBA-2020:0581 0 None None None 2020-05-04 11:25:55 UTC

Description Russell Teague 2020-01-20 16:03:15 UTC
Description of problem:
During a cluster upgrade, a RHEL node could go Not Ready due to an incompatibility between the kube config and the version of the kubelet running on the node. This requires upgrading the kubelet by running the RHEL upgrade playbooks. The playbooks install new RPMs, which can modify files managed by the MCD and put the node in a Degraded state.
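
A rollout stalled this way can be observed from the machine-config side. The commands below are generic inspection steps, not taken from this report; the node name and daemon pod name are placeholders:

$ oc get machineconfigpool worker
$ oc describe node <rhel-node> | grep -i machineconfig
$ oc -n openshift-machine-config-operator logs <machine-config-daemon-pod>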

Version-Release number of the following components:
4.2 to 4.3

How reproducible:

Steps to Reproduce:
1. Install OCP 4.2
2. Upgrade cluster to 4.3
3. RHEL node is Not Ready
4. Upgrade RHEL nodes
5. MCO machine config rollout is blocked because on-disk files do not match the config
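
For reference, step 4 is normally performed with the openshift-ansible upgrade playbook; the inventory path below is illustrative, not taken from this report:

$ ansible-playbook -i /path/to/inventory/hosts playbooks/upgrade.yml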

Actual results:
RHEL node Not Ready due to kube version skew:
hyperkube[2508]: F0117 14:48:12.003999    2508 server.go:206] unrecognized feature gate: LegacyNodeRoleBehavior

After the RHEL upgrade, the MCD reports:
content mismatch for file /etc/containers/storage.conf
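
The mismatch message above comes from the MCD validating on-disk file contents against the rendered machine config; an RPM install that rewrites a managed file fails that check. The following is a minimal hypothetical sketch of that kind of comparison, not the MCD's actual implementation, and the file contents are invented:

```shell
# Hypothetical sketch of the MCD's on-disk validation: the file content the
# rendered machine config expects is compared byte-for-byte with what is on
# disk, and any difference degrades the node.
check_file() {
    # $1 = file holding the content the machine config expects
    # $2 = file holding the content actually on disk
    # $3 = the node path being validated (used in the log message)
    if ! cmp -s "$1" "$2"; then
        echo "content mismatch for file $3"
        return 1
    fi
}

expected=$(mktemp)
actual=$(mktemp)
# Content the machine config renders for /etc/containers/storage.conf (invented)
printf '[storage]\ndriver = "overlay"\n' > "$expected"
# Simulate an RPM install restoring the package's vendor default on disk
printf '[storage]\ndriver = "overlay"\nrunroot = "/run/containers/storage"\n' > "$actual"

check_file "$expected" "$actual" /etc/containers/storage.conf
rm -f "$expected" "$actual"
```

With differing contents this prints the same style of message the MCD logs; the fix in this bug re-applies the machine config after package updates so the on-disk state matches again.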


Expected results:
Upgrade to complete successfully.

Comment 2 Johnny Liu 2020-01-21 03:07:54 UTC
@Russell, will this also fix BZ#1792139 together?

Comment 3 weiwei jiang 2020-01-21 08:38:47 UTC
Checked the 4.2 -> 4.4 upgrade path with a proxy on a restricted-network cluster.

verified version: openshift-ansible-4.4.0-202001201746.git.178.e31d324.el7.noarch.rpm

# before upgrade
$ oc get clusterversion                                                                                                                                                                                                                                                       
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS                                                                                                                                                                                          
version   4.2.0-0.nightly-2020-01-20-195638   True        False         115m    Cluster version is 4.2.0-0.nightly-2020-01-20-195638     

# after triggering `oc adm upgrade`
$ oc get nodes -o wide && oc get clusterversion  && oc get co
NAME                           STATUS                        ROLES    AGE     VERSION             INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
wj42bz-bgmlk-compute-0         Ready                         worker   4h6m    v1.17.1             10.0.98.208   <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1
wj42bz-bgmlk-compute-1         Ready                         worker   4h7m    v1.14.6+97c81d00e   10.0.98.35    <none>        Red Hat Enterprise Linux CoreOS 42.81.20200114.0 (Ootpa)       4.18.0-147.3.1.el8_1.x86_64   cri-o://1.14.11-6.dev.rhaos4.2.git627b85c.el8        
wj42bz-bgmlk-control-plane-0   Ready                         master   4h19m   v1.17.1             10.0.98.128   <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1
wj42bz-bgmlk-control-plane-1   Ready                         master   4h19m   v1.17.1             10.0.96.127   <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1
wj42bz-bgmlk-control-plane-2   Ready                         master   4h19m   v1.17.1             10.0.96.156   <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1   
wj42bz-bgmlk-rhel-0            NotReady,SchedulingDisabled   worker   114m    v1.14.6+c383847f6   10.0.96.188   <none>        Red Hat Enterprise Linux Server 7.7 (Maipo)                    3.10.0-1062.9.1.el7.x86_64    cri-o://1.14.11-9.dev.rhaos4.2.git983e00f.el7        
wj42bz-bgmlk-rhel-1            Ready                         worker   114m    v1.14.6+c383847f6   10.0.96.72    <none>        Red Hat Enterprise Linux Server 7.7 (Maipo)                    3.10.0-1062.9.1.el7.x86_64    cri-o://1.14.11-9.dev.rhaos4.2.git983e00f.el7        
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS                                                  
version   4.2.0-0.nightly-2020-01-20-195638   True        True          86m     Unable to apply 4.4.0-0.nightly-2020-01-21-012409: the cluster operator monitoring is degraded                                                                                                  
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE               
authentication                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      3h57m                                                                                                                                                       
cloud-credential                           4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h17m
cluster-autoscaler                         4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h7m
console                                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      44m
dns                                        4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h13m
image-registry                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      3h59m
ingress                                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h4m
insights                                   4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h14m
kube-apiserver                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h11m
kube-controller-manager                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h9m
kube-scheduler                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h11m
kube-storage-version-migrator              4.4.0-0.nightly-2020-01-21-012409   True        False         False      76m
machine-api                                4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h17m
machine-config                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h12m
marketplace                                4.4.0-0.nightly-2020-01-21-012409   True        False         False      43m
monitoring                                 4.4.0-0.nightly-2020-01-21-012409   False       True          True       47m
network                                    4.4.0-0.nightly-2020-01-21-012409   True        True          True       4h12m
node-tuning                                4.4.0-0.nightly-2020-01-21-012409   True        False         False      76m
openshift-apiserver                        4.4.0-0.nightly-2020-01-21-012409   True        False         False      39m
openshift-controller-manager               4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h13m
openshift-samples                          4.4.0-0.nightly-2020-01-21-012409   True        False         False      66m
operator-lifecycle-manager                 4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h12m
operator-lifecycle-manager-catalog         4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h12m
operator-lifecycle-manager-packageserver   4.4.0-0.nightly-2020-01-21-012409   True        False         False      38m
service-ca                                 4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h14m
service-catalog-apiserver                  4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h10m
service-catalog-controller-manager         4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h10m
storage                                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      76m


# then run the upgrade playbook for the cluster; after that the cluster is back in service
$ oc get nodes -o wide && oc get clusterversion  && oc get co
NAME                           STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
wj42bz-bgmlk-compute-0         Ready    worker   4h52m   v1.17.1   10.0.98.208   <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1
wj42bz-bgmlk-compute-1         Ready    worker   4h53m   v1.17.1   10.0.98.35    <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1
wj42bz-bgmlk-control-plane-0   Ready    master   5h5m    v1.17.1   10.0.98.128   <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1
wj42bz-bgmlk-control-plane-1   Ready    master   5h5m    v1.17.1   10.0.96.127   <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1
wj42bz-bgmlk-control-plane-2   Ready    master   5h5m    v1.17.1   10.0.96.156   <none>        Red Hat Enterprise Linux CoreOS 44.81.202001202331.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.17.0-0.2.rc1.rhaos4.4.gitb89a5fc.el8-rc1
wj42bz-bgmlk-rhel-0            Ready    worker   160m    v1.17.1   10.0.96.188   <none>        Red Hat Enterprise Linux Server 7.7 (Maipo)                    3.10.0-1062.9.1.el7.x86_64    cri-o://1.17.0-0.3.rc1.rhaos4.4.gitb89a5fc.el7-rc1
wj42bz-bgmlk-rhel-1            Ready    worker   160m    v1.17.1   10.0.96.72    <none>        Red Hat Enterprise Linux Server 7.7 (Maipo)                    3.10.0-1062.9.1.el7.x86_64    cri-o://1.17.0-0.3.rc1.rhaos4.4.gitb89a5fc.el7-rc1
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-01-21-012409   True        False         39m     Cluster version is 4.4.0-0.nightly-2020-01-21-012409
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h43m
cloud-credential                           4.4.0-0.nightly-2020-01-21-012409   True        False         False      5h3m
cluster-autoscaler                         4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h52m
console                                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      89m
dns                                        4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h59m
image-registry                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      39m
ingress                                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      39m
insights                                   4.4.0-0.nightly-2020-01-21-012409   True        False         False      5h
kube-apiserver                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h57m
kube-controller-manager                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h55m
kube-scheduler                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h57m
kube-storage-version-migrator              4.4.0-0.nightly-2020-01-21-012409   True        False         False      39m
machine-api                                4.4.0-0.nightly-2020-01-21-012409   True        False         False      5h3m
machine-config                             4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h58m
marketplace                                4.4.0-0.nightly-2020-01-21-012409   True        False         False      89m
monitoring                                 4.4.0-0.nightly-2020-01-21-012409   True        False         False      17m
network                                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h58m
node-tuning                                4.4.0-0.nightly-2020-01-21-012409   True        False         False      122m
openshift-apiserver                        4.4.0-0.nightly-2020-01-21-012409   True        False         False      85m
openshift-controller-manager               4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h59m
openshift-samples                          4.4.0-0.nightly-2020-01-21-012409   True        False         False      112m
operator-lifecycle-manager                 4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h58m
operator-lifecycle-manager-catalog         4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h58m
operator-lifecycle-manager-packageserver   4.4.0-0.nightly-2020-01-21-012409   True        False         False      84m
service-ca                                 4.4.0-0.nightly-2020-01-21-012409   True        False         False      5h
service-catalog-apiserver                  4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h55m
service-catalog-controller-manager         4.4.0-0.nightly-2020-01-21-012409   True        False         False      4h55m
storage                                    4.4.0-0.nightly-2020-01-21-012409   True        False         False      122m

Comment 4 Johnny Liu 2020-01-21 09:24:50 UTC
Per comment 3, the user still needs to intervene by running the openshift-ansible RHEL worker upgrade during the `oc adm upgrade` process in order to complete the whole cluster upgrade. So I am clearing the needinfo flag from comment 2 and keeping BZ#1792139 to track a future improvement to the RHCOS + RHEL worker mixed-cluster upgrade process.

Comment 6 errata-xmlrpc 2020-05-04 11:25:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

