Bug 1793093 - RHEL worker upgrade playbook leads to MCO being out of sync
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.3.z
Assignee: Russell Teague
QA Contact: weiwei jiang
URL:
Whiteboard:
Depends On: 1793078
Blocks:
 
Reported: 2020-01-20 16:19 UTC by Russell Teague
Modified: 2020-02-25 06:18 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1793078
Environment:
Last Closed: 2020-02-25 06:17:59 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID                                      Private  Priority  Status  Summary                                                         Last Updated
Github                  openshift openshift-ansible pull 12070  0        None      closed  Bug 1793093: [release-4.3] Fix machine config apply on upgrade  2020-05-15 18:04:26 UTC
Red Hat Product Errata  RHBA-2020:0528                          0        None      None    None                                                            2020-02-25 06:18:15 UTC

Description Russell Teague 2020-01-20 16:19:15 UTC
+++ This bug was initially created as a clone of Bug #1793078 +++

Description of problem:
During a cluster upgrade, a RHEL node can go Not Ready because of an incompatibility between the kube config and the version of the kubelet running on the node.  Recovering the node requires upgrading the kubelet by running the RHEL upgrade playbooks.  The playbooks install new RPMs, which can modify files managed by the MCD and put the node in a Degraded state.

Version-Release number of the following components:
4.2 to 4.3

How reproducible:

Steps to Reproduce:
1. Install OCP 4.2
2. Upgrade cluster to 4.3
3. RHEL node is Not Ready
4. Upgrade RHEL nodes
5. MCO machine config rollout is blocked because the on-disk files do not match the config

Actual results:
RHEL node is Not Ready due to kube version skew:
hyperkube[2508]: F0117 14:48:12.003999    2508 server.go:206] unrecognized feature gate: LegacyNodeRoleBehavior

After RHEL upgrade, MCD reporting:
content mismatch for file /etc/containers/storage.conf
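
The "content mismatch" report above comes down to comparing the bytes on disk with the bytes the rendered machine config expects. The following is only a minimal sketch of that idea and not the MCD's actual implementation; the function names and the `expected` mapping are hypothetical, and the expected contents are assumed to have been extracted from the machine config already.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def find_content_mismatches(expected: Dict[str, bytes]) -> List[str]:
    """Return the paths whose on-disk bytes differ from the expected bytes.

    `expected` maps an absolute file path to the content the config wants
    there (hypothetical input; assumed to be decoded already). A missing
    file is treated as a mismatch.
    """
    mismatched = []
    for path, want in expected.items():
        p = Path(path)
        if not p.is_file() or sha256_of(p.read_bytes()) != sha256_of(want):
            mismatched.append(path)
    return mismatched
```

Calling `find_content_mismatches({"/etc/containers/storage.conf": wanted_bytes})` on a node where an RPM upgrade rewrote that file would return the path, which corresponds to the condition reported in this bug.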


Expected results:
Upgrade to complete successfully.

Comment 3 weiwei jiang 2020-02-17 08:47:07 UTC
Checked with openshift-ansible-4.3.3-202002142331.git.173.bb0b5a1.el7.noarch.rpm; this issue is fixed.
$ oc get co 
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.3.0-0.nightly-2020-02-16-235204   True        False         False      77m
cloud-credential                           4.3.0-0.nightly-2020-02-16-235204   True        False         False      96m
cluster-autoscaler                         4.3.0-0.nightly-2020-02-16-235204   True        False         False      83m
console                                    4.3.0-0.nightly-2020-02-16-235204   True        False         False      14m
dns                                        4.3.0-0.nightly-2020-02-16-235204   True        False         False      95m
image-registry                             4.3.0-0.nightly-2020-02-16-235204   True        False         False      12m
ingress                                    4.3.0-0.nightly-2020-02-16-235204   True        False         False      83m
insights                                   4.3.0-0.nightly-2020-02-16-235204   True        False         False      96m
kube-apiserver                             4.3.0-0.nightly-2020-02-16-235204   True        False         False      89m
kube-controller-manager                    4.3.0-0.nightly-2020-02-16-235204   True        False         False      89m
kube-scheduler                             4.3.0-0.nightly-2020-02-16-235204   True        False         False      90m
machine-api                                4.3.0-0.nightly-2020-02-16-235204   True        False         False      96m
machine-config                             4.3.0-0.nightly-2020-02-16-235204   True        False         False      89m
marketplace                                4.3.0-0.nightly-2020-02-16-235204   True        False         False      18m
monitoring                                 4.3.0-0.nightly-2020-02-16-235204   True        False         False      11m
network                                    4.3.0-0.nightly-2020-02-16-235204   True        False         False      91m
node-tuning                                4.3.0-0.nightly-2020-02-16-235204   True        False         False      14m
openshift-apiserver                        4.3.0-0.nightly-2020-02-16-235204   True        False         False      17m
openshift-controller-manager               4.3.0-0.nightly-2020-02-16-235204   True        False         False      90m
openshift-samples                          4.3.0-0.nightly-2020-02-16-235204   True        False         False      39m
operator-lifecycle-manager                 4.3.0-0.nightly-2020-02-16-235204   True        False         False      91m
operator-lifecycle-manager-catalog         4.3.0-0.nightly-2020-02-16-235204   True        False         False      91m
operator-lifecycle-manager-packageserver   4.3.0-0.nightly-2020-02-16-235204   True        False         False      14m
service-ca                                 4.3.0-0.nightly-2020-02-16-235204   True        False         False      95m
service-catalog-apiserver                  4.3.0-0.nightly-2020-02-16-235204   True        False         False      88m
service-catalog-controller-manager         4.3.0-0.nightly-2020-02-16-235204   True        False         False      88m
storage                                    4.3.0-0.nightly-2020-02-16-235204   True        False         False      52m
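
The check done by eye on the table above (every operator Available, not Progressing, not Degraded) could be scripted. This is a rough sketch that assumes the plain-text `oc get co` layout shown here with every cell populated; the helper name `unhealthy_operators` is hypothetical, and the sample holds three rows copied from the output above.

```python
from typing import List

# Header plus three rows copied from the `oc get co` output above.
SAMPLE = """\
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.3.0-0.nightly-2020-02-16-235204   True        False         False      77m
machine-config                             4.3.0-0.nightly-2020-02-16-235204   True        False         False      89m
storage                                    4.3.0-0.nightly-2020-02-16-235204   True        False         False      52m
"""


def unhealthy_operators(table: str) -> List[str]:
    """Return names of operators that are not Available/settled/non-Degraded."""
    lines = [ln for ln in table.splitlines() if ln.strip()]
    header = lines[0].split()
    col = {name: i for i, name in enumerate(header)}
    bad = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            # Assumption violated (for example an empty VERSION cell):
            # flag the row rather than guess which column is missing.
            bad.append(fields[0])
            continue
        healthy = (
            fields[col["AVAILABLE"]] == "True"
            and fields[col["PROGRESSING"]] == "False"
            and fields[col["DEGRADED"]] == "False"
        )
        if not healthy:
            bad.append(fields[0])
    return bad


if __name__ == "__main__":
    print(unhealthy_operators(SAMPLE))  # prints []
```

Splitting on whitespace keeps the sketch short; a more robust check would read structured output instead of the human-readable table.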

$ oc get nodes -o wide
NAME                               STATUS   ROLES    AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
wjuobos217-7th7s-compute-0         Ready    worker   88m    v1.16.2   10.0.98.105   <none>        Red Hat Enterprise Linux CoreOS 43.81.202002131553.0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.16.3-19.dev.rhaos4.3.git6c1f4bd.el8
wjuobos217-7th7s-compute-1         Ready    worker   88m    v1.16.2   10.0.97.232   <none>        Red Hat Enterprise Linux CoreOS 43.81.202002131553.0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.16.3-19.dev.rhaos4.3.git6c1f4bd.el8
wjuobos217-7th7s-control-plane-0   Ready    master   100m   v1.16.2   10.0.96.145   <none>        Red Hat Enterprise Linux CoreOS 43.81.202002131553.0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.16.3-19.dev.rhaos4.3.git6c1f4bd.el8
wjuobos217-7th7s-control-plane-1   Ready    master   99m    v1.16.2   10.0.97.207   <none>        Red Hat Enterprise Linux CoreOS 43.81.202002131553.0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.16.3-19.dev.rhaos4.3.git6c1f4bd.el8
wjuobos217-7th7s-control-plane-2   Ready    master   100m   v1.16.2   10.0.97.218   <none>        Red Hat Enterprise Linux CoreOS 43.81.202002131553.0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.16.3-19.dev.rhaos4.3.git6c1f4bd.el8
wjuobos217-7th7s-rhel-0            Ready    worker   69m    v1.16.2   10.0.97.70    <none>        Red Hat Enterprise Linux Server 7.7 (Maipo)                    3.10.0-1062.12.1.el7.x86_64   cri-o://1.16.3-20.dev.rhaos4.3.git11c04e3.el7

Comment 5 errata-xmlrpc 2020-02-25 06:17:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0528

