Bug 1994728
| Summary: | upgrade from 4.6 to 4.7 to 4.8 with mcp worker "paused=true", crio report "panic: close of closed channel" which lead to a master Node go into Restart loop | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
| Component: | Node | Assignee: | Peter Hunt <pehunt> |
| Node sub component: | CRI-O | QA Contact: | MinLi <minmli> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | aos-bugs, cback, minmli, schoudha |
| Version: | 4.7 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.8.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-08-31 16:17:44 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1994454 | | |
| Bug Blocks: | 1994729 | | |

Description
OpenShift BugZilla Robot
2021-08-17 19:06:52 UTC
PR merged.

Upgraded from 4.8.7 to a 4.9 nightly build with the worker MCP set to "paused=true". Tested 3 times and did not hit the issue. Setting to verified.

Thanks minmli. I also ran the upgrade as part of the 4.7.z bug and then upgraded to 4.8.7.

$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.6.0-0.nightly-2021-08-25-190633 True False 90m Cluster version is 4.6.0-0.nightly-2021-08-25-190633
$ oc get machineconfigpool worker -o yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  creationTimestamp: "2021-08-26T09:58:37Z"
  generation: 3
  labels:
    machineconfiguration.openshift.io/mco-built-in: ""
    pools.operator.machineconfiguration.openshift.io/worker: ""
  name: worker
  resourceVersion: "52970"
  selfLink: /apis/machineconfiguration.openshift.io/v1/machineconfigpools/worker
  uid: 5a28dd37-409d-4626-90bd-ed1852511461
spec:
  configuration:
    name: rendered-worker-8e575f0a99d70dd786512889ddc3e642
    source:
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 00-worker
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-worker-container-runtime
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-worker-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-worker-generated-registries
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-worker-ssh
  machineConfigSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: worker
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
  paused: true
...
$ oc adm upgrade --to-image=quay.io/openshift-release-dev/ocp-release:4.7.26-x86_64 --force --allow-explicit-upgrade
...
Updating to release image quay.io/openshift-release-dev/ocp-release:4.7.26-x86_64
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.7.26 True False 25m Cluster version is 4.7.26
$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-538c233877e7e578d2b4d1f3329579a4 True False False 3 3 3 0 3h27m
worker rendered-worker-8e575f0a99d70dd786512889ddc3e642 False False False 3 0 0 0 3h27m
$ oc get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-10-0-147-150.ap-south-1.compute.internal Ready master 3h29m v1.20.0+4593a24 10.0.147.150 <none> Red Hat Enterprise Linux CoreOS 47.84.202108181031-0 (Ootpa) 4.18.0-305.12.1.el8_4.x86_64 cri-o://1.20.4-11.rhaos4.7.git9d682e1.el8
ip-10-0-152-34.ap-south-1.compute.internal Ready worker 3h16m v1.19.0+4c3480d 10.0.152.34 <none> Red Hat Enterprise Linux CoreOS 46.82.202108251457-0 (Ootpa) 4.18.0-193.60.2.el8_2.x86_64 cri-o://1.19.3-11.rhaos4.6.git66a69b8.el8
ip-10-0-170-36.ap-south-1.compute.internal Ready worker 3h20m v1.19.0+4c3480d 10.0.170.36 <none> Red Hat Enterprise Linux CoreOS 46.82.202108251457-0 (Ootpa) 4.18.0-193.60.2.el8_2.x86_64 cri-o://1.19.3-11.rhaos4.6.git66a69b8.el8
ip-10-0-173-42.ap-south-1.compute.internal Ready master 3h29m v1.20.0+4593a24 10.0.173.42 <none> Red Hat Enterprise Linux CoreOS 47.84.202108181031-0 (Ootpa) 4.18.0-305.12.1.el8_4.x86_64 cri-o://1.20.4-11.rhaos4.7.git9d682e1.el8
ip-10-0-200-20.ap-south-1.compute.internal Ready master 3h29m v1.20.0+4593a24 10.0.200.20 <none> Red Hat Enterprise Linux CoreOS 47.84.202108181031-0 (Ootpa) 4.18.0-305.12.1.el8_4.x86_64 cri-o://1.20.4-11.rhaos4.7.git9d682e1.el8
ip-10-0-204-249.ap-south-1.compute.internal Ready worker 3h20m v1.19.0+4c3480d 10.0.204.249 <none> Red Hat Enterprise Linux CoreOS 46.82.202108251457-0 (Ootpa) 4.18.0-193.60.2.el8_2.x86_64 cri-o://1.19.3-11.rhaos4.6.git66a69b8.el8
$ oc adm upgrade --to-image=quay.io/openshift-release-dev/ocp-release:4.8.7-x86_64 --force --allow-explicit-upgrade
...
Updating to release image quay.io/openshift-release-dev/ocp-release:4.8.7-x86_64
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.8.7 True False 50m Cluster version is 4.8.7
$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-49174dedc300467ed7322e14529bdfe3 True False False 3 3 3 0 5h53m
worker rendered-worker-8e575f0a99d70dd786512889ddc3e642 False False False 3 0 0 0 5h53m
$ oc get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-10-0-147-150.ap-south-1.compute.internal Ready master 5h55m v1.21.1+9807387 10.0.147.150 <none> Red Hat Enterprise Linux CoreOS 48.84.202108210854-0 (Ootpa) 4.18.0-305.12.1.el8_4.x86_64 cri-o://1.21.2-13.rhaos4.8.git52b3f98.el8
ip-10-0-152-34.ap-south-1.compute.internal Ready worker 5h43m v1.19.0+4c3480d 10.0.152.34 <none> Red Hat Enterprise Linux CoreOS 46.82.202108251457-0 (Ootpa) 4.18.0-193.60.2.el8_2.x86_64 cri-o://1.19.3-11.rhaos4.6.git66a69b8.el8
ip-10-0-170-36.ap-south-1.compute.internal Ready worker 5h46m v1.19.0+4c3480d 10.0.170.36 <none> Red Hat Enterprise Linux CoreOS 46.82.202108251457-0 (Ootpa) 4.18.0-193.60.2.el8_2.x86_64 cri-o://1.19.3-11.rhaos4.6.git66a69b8.el8
ip-10-0-173-42.ap-south-1.compute.internal Ready master 5h55m v1.21.1+9807387 10.0.173.42 <none> Red Hat Enterprise Linux CoreOS 48.84.202108210854-0 (Ootpa) 4.18.0-305.12.1.el8_4.x86_64 cri-o://1.21.2-13.rhaos4.8.git52b3f98.el8
ip-10-0-200-20.ap-south-1.compute.internal Ready master 5h55m v1.21.1+9807387 10.0.200.20 <none> Red Hat Enterprise Linux CoreOS 48.84.202108210854-0 (Ootpa) 4.18.0-305.12.1.el8_4.x86_64 cri-o://1.21.2-13.rhaos4.8.git52b3f98.el8
ip-10-0-204-249.ap-south-1.compute.internal Ready worker 5h47m v1.19.0+4c3480d 10.0.204.249 <none> Red Hat Enterprise Linux CoreOS 46.82.202108251457-0 (Ootpa) 4.18.0-193.60.2.el8_2.x86_64 cri-o://1.19.3-11.rhaos4.6.git66a69b8.el8
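
The check done by eye in the transcript above (the paused worker pool keeps its original rendered config while the master pool moves forward) could be scripted roughly as follows. This is only a sketch, not part of the verification that was actually run: the helper names `parse_mcp_table` and `paused_pool_untouched` are hypothetical, and it assumes the plain-text `oc get mcp` output is pasted in as a string and that no column value is empty or contains spaces.

```python
from typing import Dict, List


def parse_mcp_table(text: str) -> List[Dict[str, str]]:
    """Parse whitespace-separated `oc get mcp` output into one dict per pool.

    Assumption: every column has a value without spaces, so a plain split()
    lines each value up with its header.
    """
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def paused_pool_untouched(row: Dict[str, str], expected_config: str) -> bool:
    """Return True if the pool still uses expected_config and no machine was updated."""
    return (
        row["CONFIG"] == expected_config
        and row["UPDATED"] == "False"
        and row["UPDATING"] == "False"
        and row["UPDATEDMACHINECOUNT"] == "0"
    )


# Output of `oc get mcp` after the upgrade to 4.8.7, copied from the transcript.
SAMPLE = """\
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-49174dedc300467ed7322e14529bdfe3 True False False 3 3 3 0 5h53m
worker rendered-worker-8e575f0a99d70dd786512889ddc3e642 False False False 3 0 0 0 5h53m
"""

pools = {row["NAME"]: row for row in parse_mcp_table(SAMPLE)}

# The worker pool was paused on 4.6, so it should still reference the
# rendered config shown in the initial `oc get machineconfigpool worker -o yaml`.
worker_ok = paused_pool_untouched(
    pools["worker"], "rendered-worker-8e575f0a99d70dd786512889ddc3e642"
)
print("worker pool untouched:", worker_ok)
```

Parsing the text table keeps the sketch independent of a live cluster; against a real cluster, reading structured output (for example `oc get mcp -o json`) would likely be more robust than splitting on whitespace.
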
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.9 bug fix), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3247

*** Bug 2005320 has been marked as a duplicate of this bug. ***