Bug 1792139
| Summary: | RHEL7 worker nodes may go to NotReady,SchedulingDisabled while upgrading from 4.2.12 to 4.3.0 | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Gaoyun Pei <gpei> |
| Component: | Installer | Assignee: | Russell Teague <rteague> |
| Installer sub component: | openshift-ansible | QA Contact: | Gaoyun Pei <gpei> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | amurdaca, aos-bugs, gpei, jialiu, jokerman, juzhao, kalexand, mifiedle, rphillips, scuppett, sdodson, sjenning, vigoyal, weinliu, wsun, xtian |
| Version: | 4.3.0 | Keywords: | Regression, TestBlocker |
| Target Milestone: | --- | ||
| Target Release: | 4.4.0 | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause: Machine config was not properly updated by MCO because package installs updated files on disk
Consequence: MCO would not process config updates on RHEL nodes
Fix: Added machine config apply back to upgrade steps and added proxy config for image pulls
Result: Machine configs were properly applied after package updates during upgrade.
|
Story Points: | --- |
| Clone Of: | 1786274 | Environment: | |
| Last Closed: | 2020-05-04 11:24:47 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1786274 | ||
| Bug Blocks: | |||
|
Comment 2
Scott Dodson
2020-01-30 20:02:43 UTC
During the upgrade of a 4.3.1 cluster (with RHCOS and RHEL workers) to 4.4, no RHEL worker got stuck at NotReady,SchedulingDisabled, unlike what happened during the 4.2 to 4.3 upgrade.

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-02-07-012035   True        False         4m27s   Cluster version is 4.4.0-0.nightly-2020-02-07-012035

# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.0-0.nightly-2020-02-07-012035   True        False         False      3h55m
cloud-credential                           4.4.0-0.nightly-2020-02-07-012035   True        False         False      4h23m
cluster-autoscaler                         4.4.0-0.nightly-2020-02-07-012035   True        False         False      4h12m
console                                    4.4.0-0.nightly-2020-02-07-012035   True        False         False      33m
csi-snapshot-controller                    4.4.0-0.nightly-2020-02-07-012035   True        False         False      3m56s
dns                                        4.4.0-0.nightly-2020-02-07-012035   True        False         False      4h16m
etcd                                       4.4.0-0.nightly-2020-02-07-012035   True        False         False      19m
image-registry                             4.4.0-0.nightly-2020-02-07-012035   True        False         False      7m40s
ingress                                    4.4.0-0.nightly-2020-02-07-012035   True        False         False      3m44s
insights                                   4.4.0-0.nightly-2020-02-07-012035   True        False         False      4h18m
kube-apiserver                             4.4.0-0.nightly-2020-02-07-012035   True        False         False      4h15m
kube-controller-manager                    4.4.0-0.nightly-2020-02-07-012035   True        False         False      4h16m
kube-scheduler                             4.4.0-0.nightly-2020-02-07-012035   True        False         False      4h16m
kube-storage-version-migrator              4.4.0-0.nightly-2020-02-07-012035   True        False         False      7m46s
machine-api                                4.4.0-0.nightly-2020-02-07-012035   True        False         False      4h17m
machine-config                             4.4.0-0.nightly-2020-02-07-012035   True        False         False      21m
marketplace                                4.4.0-0.nightly-2020-02-07-012035   True        False         False      23m
monitoring                                 4.4.0-0.nightly-2020-02-07-012035   True        False         False      8m11s
network                                    4.4.0-0.nightly-2020-02-07-012035   True        False         False      4h18m
node-tuning                                4.4.0-0.nightly-2020-02-07-012035   True        False         False      3h27m
openshift-apiserver                        4.4.0-0.nightly-2020-02-07-012035   True        False         False      23m
openshift-controller-manager               4.4.0-0.nightly-2020-02-07-012035   True        False         False      4h16m
openshift-samples                          4.4.0-0.nightly-2020-02-07-012035   True        False         False      3h27m
operator-lifecycle-manager                 4.4.0-0.nightly-2020-02-07-012035   True        False         False      4h17m
operator-lifecycle-manager-catalog         4.4.0-0.nightly-2020-02-07-012035   True        False         False      4h17m
operator-lifecycle-manager-packageserver   4.4.0-0.nightly-2020-02-07-012035   True        False         False      33m
service-ca                                 4.4.0-0.nightly-2020-02-07-012035   True        False         False      4h18m
service-catalog-apiserver                  4.4.0-0.nightly-2020-02-07-012035   True        False         False      4h15m
service-catalog-controller-manager         4.4.0-0.nightly-2020-02-07-012035   True        False         False      4h15m
storage                                    4.4.0-0.nightly-2020-02-07-012035   True        False         False      3h27m

# oc get clusterversion -o json|jq -r '.items[0].status.history[]|.startedTime + "|" + .completionTime + "|" + .state + "|" + .version'
2020-02-07T08:36:05Z|2020-02-07T12:09:47Z|Completed|4.4.0-0.nightly-2020-02-07-012035
2020-02-07T07:50:36Z|2020-02-07T08:18:35Z|Completed|4.3.1

After the cluster upgrade finished, I ran the upgrade playbook against all the RHEL workers. The whole cluster reached the expected state in the end.

# oc get node -o wide
NAME                                        STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
ip-10-0-49-70.us-east-2.compute.internal    Ready    worker   3h44m   v1.17.1   10.0.49.70    <none>        Red Hat Enterprise Linux Server 7.6 (Maipo)                    3.10.0-1062.12.1.el7.x86_64   cri-o://1.17.0-0.4.rc1.rhaos4.4.git5842752.el7-rc1
ip-10-0-51-166.us-east-2.compute.internal   Ready    worker   4h11m   v1.17.1   10.0.51.166   <none>        Red Hat Enterprise Linux CoreOS 44.81.202002061902-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.16.2-15.dev.rhaos4.3.gita83f883.el8
ip-10-0-57-164.us-east-2.compute.internal   Ready    master   4h23m   v1.17.1   10.0.57.164   <none>        Red Hat Enterprise Linux CoreOS 44.81.202002061902-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.16.2-15.dev.rhaos4.3.gita83f883.el8
ip-10-0-57-166.us-east-2.compute.internal   Ready    master   4h23m   v1.17.1   10.0.57.166   <none>        Red Hat Enterprise Linux CoreOS 44.81.202002061902-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.16.2-15.dev.rhaos4.3.gita83f883.el8
ip-10-0-59-5.us-east-2.compute.internal     Ready    worker   3h44m   v1.17.1   10.0.59.5     <none>        Red Hat Enterprise Linux Server 7.6 (Maipo)                    3.10.0-1062.12.1.el7.x86_64   cri-o://1.17.0-0.4.rc1.rhaos4.4.git5842752.el7-rc1
ip-10-0-67-153.us-east-2.compute.internal   Ready    master   4h23m   v1.17.1   10.0.67.153   <none>        Red Hat Enterprise Linux CoreOS 44.81.202002061902-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.16.2-15.dev.rhaos4.3.gita83f883.el8
ip-10-0-69-163.us-east-2.compute.internal   Ready    worker   4h10m   v1.17.1   10.0.69.163   <none>        Red Hat Enterprise Linux CoreOS 44.81.202002061902-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.16.2-15.dev.rhaos4.3.gita83f883.el8

Marking this bug as verified with openshift-ansible-4.4.0-202002070656.git.178.3e1c275.el7.noarch.rpm.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581
|
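
For anyone re-running this verification, the manual checks above could be scripted roughly as follows. This is only a sketch, not part of the fix or of openshift-ansible: the function names `stuck_nodes` and `upgrade_history` and the sample data are hypothetical, and it assumes the input is the parsed JSON from `oc get nodes -o json` and `oc get clusterversion -o json` respectively. A node is treated as stuck when its `Ready` condition is not `True` and `spec.unschedulable` is set, which is what the NotReady,SchedulingDisabled status in the summary corresponds to.

```python
import json


def stuck_nodes(node_list):
    """Return names of nodes that are not Ready and are also cordoned.

    `node_list` is assumed to be the parsed output of `oc get nodes -o json`.
    """
    stuck = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        # A node counts as Ready only if its Ready condition has status "True".
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        # spec.unschedulable is what shows up as SchedulingDisabled.
        cordoned = bool(node.get("spec", {}).get("unschedulable", False))
        if not ready and cordoned:
            stuck.append(node["metadata"]["name"])
    return stuck


def upgrade_history(cluster_version_list):
    """Mirror the jq expression used above: one 'start|end|state|version' line per entry.

    `cluster_version_list` is assumed to be the parsed output of
    `oc get clusterversion -o json`.
    """
    history = cluster_version_list["items"][0]["status"].get("history", [])
    return [
        "|".join(
            [
                entry.get("startedTime") or "",
                entry.get("completionTime") or "",  # empty while an update is in progress
                entry.get("state") or "",
                entry.get("version") or "",
            ]
        )
        for entry in history
    ]


# Hypothetical sample data for illustration only; node names are made up.
SAMPLE_NODES = json.loads("""
{"items": [
  {"metadata": {"name": "rhel-worker-0"},
   "spec": {"unschedulable": true},
   "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
  {"metadata": {"name": "rhcos-worker-0"},
   "spec": {},
   "status": {"conditions": [{"type": "Ready", "status": "True"}]}}
]}
""")

SAMPLE_CLUSTER_VERSION = {
    "items": [
        {
            "status": {
                "history": [
                    {
                        "startedTime": "2020-02-07T07:50:36Z",
                        "completionTime": "2020-02-07T08:18:35Z",
                        "state": "Completed",
                        "version": "4.3.1",
                    }
                ]
            }
        }
    ]
}

print(stuck_nodes(SAMPLE_NODES))
print(upgrade_history(SAMPLE_CLUSTER_VERSION))
```

An empty list from `stuck_nodes` on real cluster output would match the result reported in this comment, where all nodes ended up Ready.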