Bug 1793093

Summary: RHEL worker upgrade playbook leads to MCO being out of sync
Product: OpenShift Container Platform
Reporter: Russell Teague <rteague>
Component: Installer
Assignee: Russell Teague <rteague>
Installer sub component: openshift-ansible
QA Contact: weiwei jiang <wjiang>
Status: CLOSED ERRATA
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: jialiu
Version: 4.3.0
Target Milestone: ---
Target Release: 4.3.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1793078
Environment:
Last Closed: 2020-02-25 06:17:59 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1793078
Bug Blocks:

Description Russell Teague 2020-01-20 16:19:15 UTC
+++ This bug was initially created as a clone of Bug #1793078 +++

Description of problem:
During a cluster upgrade, a RHEL node can go Not Ready due to an incompatibility between the kube config and the version of the kubelet running on the node.  This requires upgrading the kubelet by running the RHEL upgrade playbooks.  The playbooks install new RPMs, which can modify files managed by the MCD and put the node in a Degraded state.
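A minimal sketch of how this state can be observed from the CLI (pod and container names are illustrative; the machine-config-daemon pod name varies per node):

```shell
# Check whether any RHEL workers went Not Ready after the cluster upgrade
oc get nodes -o wide

# Check whether the MCO's worker pool is blocked or degraded
oc get machineconfigpools

# Inspect the machine-config-daemon logs on an affected node for the
# "content mismatch" error
oc -n openshift-machine-config-operator get pods -o wide | grep machine-config-daemon
oc -n openshift-machine-config-operator logs <machine-config-daemon-pod>
```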

Version-Release number of the following components:
4.2 to 4.3

How reproducible:

Steps to Reproduce:
1. Install OCP 4.2
2. Upgrade cluster to 4.3
3. RHEL node is Not Ready
4. Upgrade RHEL nodes
5. MCO machine config rollout is blocked because on-disk files do not match the rendered machine config
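The RHEL node upgrade in step 4 is normally performed with the openshift-ansible upgrade playbook; a sketch assuming the RPM-installed layout and a standard inventory (the inventory path is illustrative):

```shell
# On the host where openshift-ansible is installed, run the worker
# upgrade playbook against the RHEL nodes listed in the inventory
cd /usr/share/ansible/openshift-ansible
ansible-playbook -i /path/to/inventory/hosts playbooks/upgrade.yml
```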

Actual results:
RHEL node Not Ready due to kubelet version skew:
hyperkube[2508]: F0117 14:48:12.003999    2508 server.go:206] unrecognized feature gate: LegacyNodeRoleBehavior

After the RHEL upgrade, the MCD reports:
content mismatch for file /etc/containers/storage.conf
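One way to confirm the mismatch the MCD reports is to compare the file on disk with what the rendered machine config expects; a sketch (the node name is illustrative):

```shell
# Find the rendered config the worker pool expects
RENDERED=$(oc get mcp worker -o jsonpath='{.spec.configuration.name}')

# Dump the expected definition of /etc/containers/storage.conf from that config
oc get machineconfig "$RENDERED" -o yaml | grep -A5 storage.conf

# Compare against the file actually present on the node
oc debug node/<rhel-node> -- chroot /host cat /etc/containers/storage.conf
```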


Expected results:
Upgrade to complete successfully.

Comment 3 weiwei jiang 2020-02-17 08:47:07 UTC
Checked with openshift-ansible-4.3.3-202002142331.git.173.bb0b5a1.el7.noarch.rpm and this issue is fixed.

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.3.0-0.nightly-2020-02-16-235204   True        False         False      77m
cloud-credential                           4.3.0-0.nightly-2020-02-16-235204   True        False         False      96m
cluster-autoscaler                         4.3.0-0.nightly-2020-02-16-235204   True        False         False      83m
console                                    4.3.0-0.nightly-2020-02-16-235204   True        False         False      14m
dns                                        4.3.0-0.nightly-2020-02-16-235204   True        False         False      95m
image-registry                             4.3.0-0.nightly-2020-02-16-235204   True        False         False      12m
ingress                                    4.3.0-0.nightly-2020-02-16-235204   True        False         False      83m
insights                                   4.3.0-0.nightly-2020-02-16-235204   True        False         False      96m
kube-apiserver                             4.3.0-0.nightly-2020-02-16-235204   True        False         False      89m
kube-controller-manager                    4.3.0-0.nightly-2020-02-16-235204   True        False         False      89m
kube-scheduler                             4.3.0-0.nightly-2020-02-16-235204   True        False         False      90m
machine-api                                4.3.0-0.nightly-2020-02-16-235204   True        False         False      96m
machine-config                             4.3.0-0.nightly-2020-02-16-235204   True        False         False      89m
marketplace                                4.3.0-0.nightly-2020-02-16-235204   True        False         False      18m
monitoring                                 4.3.0-0.nightly-2020-02-16-235204   True        False         False      11m
network                                    4.3.0-0.nightly-2020-02-16-235204   True        False         False      91m
node-tuning                                4.3.0-0.nightly-2020-02-16-235204   True        False         False      14m
openshift-apiserver                        4.3.0-0.nightly-2020-02-16-235204   True        False         False      17m
openshift-controller-manager               4.3.0-0.nightly-2020-02-16-235204   True        False         False      90m
openshift-samples                          4.3.0-0.nightly-2020-02-16-235204   True        False         False      39m
operator-lifecycle-manager                 4.3.0-0.nightly-2020-02-16-235204   True        False         False      91m
operator-lifecycle-manager-catalog         4.3.0-0.nightly-2020-02-16-235204   True        False         False      91m
operator-lifecycle-manager-packageserver   4.3.0-0.nightly-2020-02-16-235204   True        False         False      14m
service-ca                                 4.3.0-0.nightly-2020-02-16-235204   True        False         False      95m
service-catalog-apiserver                  4.3.0-0.nightly-2020-02-16-235204   True        False         False      88m
service-catalog-controller-manager         4.3.0-0.nightly-2020-02-16-235204   True        False         False      88m
storage                                    4.3.0-0.nightly-2020-02-16-235204   True        False         False      52m

$ oc get nodes -o wide
NAME                               STATUS   ROLES    AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
wjuobos217-7th7s-compute-0         Ready    worker   88m    v1.16.2   10.0.98.105   <none>        Red Hat Enterprise Linux CoreOS 43.81.202002131553.0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.16.3-19.dev.rhaos4.3.git6c1f4bd.el8
wjuobos217-7th7s-compute-1         Ready    worker   88m    v1.16.2   10.0.97.232   <none>        Red Hat Enterprise Linux CoreOS 43.81.202002131553.0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.16.3-19.dev.rhaos4.3.git6c1f4bd.el8
wjuobos217-7th7s-control-plane-0   Ready    master   100m   v1.16.2   10.0.96.145   <none>        Red Hat Enterprise Linux CoreOS 43.81.202002131553.0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.16.3-19.dev.rhaos4.3.git6c1f4bd.el8
wjuobos217-7th7s-control-plane-1   Ready    master   99m    v1.16.2   10.0.97.207   <none>        Red Hat Enterprise Linux CoreOS 43.81.202002131553.0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.16.3-19.dev.rhaos4.3.git6c1f4bd.el8
wjuobos217-7th7s-control-plane-2   Ready    master   100m   v1.16.2   10.0.97.218   <none>        Red Hat Enterprise Linux CoreOS 43.81.202002131553.0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.16.3-19.dev.rhaos4.3.git6c1f4bd.el8
wjuobos217-7th7s-rhel-0            Ready    worker   69m    v1.16.2   10.0.97.70    <none>        Red Hat Enterprise Linux Server 7.7 (Maipo)                    3.10.0-1062.12.1.el7.x86_64   cri-o://1.16.3-20.dev.rhaos4.3.git11c04e3.el7

Comment 5 errata-xmlrpc 2020-02-25 06:17:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0528