Bug 1858026
Summary: Panic in machine-config-operator when attempting to upgrade to 4.5.2

Product: OpenShift Container Platform
Component: Machine Config Operator
Sub component: Machine Config Operator
Reporter: Kevin Chung <kechung>
Assignee: MCO Team <team-mco>
QA Contact: Rio Liu <rioliu>
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
CC: alchan, amurdaca, aos-bugs, jbrooks, kgarriso, mkrejci, mnguyen, nschuetz, rioliu, sdodson, sregidor, vpagar, walters, wking, xtian
Version: 4.5
Target Release: 4.6.0
Keywords: Upgrades
Hardware: x86_64
OS: Linux
Type: Bug
Cloned as: 1858907 (view as bug list)
Bug Blocks: 1858907
Last Closed: 2020-10-27 16:15:30 UTC
Description (Kevin Chung, 2020-07-16 20:43:25 UTC)
Hi Kevin, can you please attach a must-gather from this cluster? I've kicked off a few tests to see if I can replicate that way in the meantime while we wait for the must-gather.

Update: I ran 3 tests from 4.4.12 -> 4.5.2 and they all passed.

Hi Kirsten, I created support case #02705174 to attach a large must-gather from this cluster. Also of note, I attempted and failed to run the must-gather two times before I succeeded. Not entirely sure if it's related.

```
$ oc adm must-gather
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:61198ba5bd46fc26b3d40d83a2fb7f859614f516a7896404b70fa468c8efa5da
[must-gather      ] OUT namespace/openshift-must-gather-kgx49 created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-dtvgp created
[must-gather      ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:61198ba5bd46fc26b3d40d83a2fb7f859614f516a7896404b70fa468c8efa5da created
[must-gather-tzmwz] OUT gather did not start: Get https://api.ocp4.csa.gsslab.rdu2.redhat.com:6443/api/v1/namespaces/openshift-must-gather-kgx49/pods/must-gather-tzmwz: unexpected EOF
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-dtvgp deleted
[must-gather      ] OUT namespace/openshift-must-gather-kgx49 deleted
error: gather did not start for pod must-gather-tzmwz: Get https://api.ocp4.csa.gsslab.rdu2.redhat.com:6443/api/v1/namespaces/openshift-must-gather-kgx49/pods/must-gather-tzmwz: unexpected EOF

$ oc adm must-gather
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:61198ba5bd46fc26b3d40d83a2fb7f859614f516a7896404b70fa468c8efa5da
[must-gather      ] OUT namespace/openshift-must-gather-29sqc created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-hrzwx created
[must-gather      ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:61198ba5bd46fc26b3d40d83a2fb7f859614f516a7896404b70fa468c8efa5da created
[must-gather-7gdhh] OUT gather did not start: Get https://api.ocp4.csa.gsslab.rdu2.redhat.com:6443/api/v1/namespaces/openshift-must-gather-29sqc/pods/must-gather-7gdhh: unexpected EOF
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-hrzwx deleted
[must-gather      ] OUT namespace/openshift-must-gather-29sqc deleted
error: gather did not start for pod must-gather-7gdhh: Get https://api.ocp4.csa.gsslab.rdu2.redhat.com:6443/api/v1/namespaces/openshift-must-gather-29sqc/pods/must-gather-7gdhh: unexpected EOF
```

Kevin

Notes going through the logs. Looking at the worker MCP:

```yaml
- lastTransitionTime: "2020-07-17T00:23:15Z"
  message: All nodes are updating to rendered-worker-237e0016060efcefe9cebacf7a047840
  reason: ""
  status: "True"
  type: Updating
configuration:
  name: rendered-worker-237e0016060efcefe9cebacf7a047840
  source:
  - apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    name: 00-worker
  - apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    name: 01-worker-container-runtime
  - apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    name: 01-worker-kubelet
  - apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    name: 99-worker-chrony-configuration
  - apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    name: 99-worker-e9125565-c5e4-11ea-8005-001a4a0ab023-registries
  - apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    name: 99-worker-ssh
degradedMachineCount: 0
machineCount: 3
observedGeneration: 8
readyMachineCount: 2
unavailableMachineCount: 1
updatedMachineCount: 3
```

3 updated machines but 1 unavailable? But looking at the MCC logs the day before:

```
2020-07-16T19:42:38.865102277Z I0716 19:42:38.865026 1 status.go:82] Pool worker: All nodes are updated with rendered-worker-237e0016060efcefe9cebacf7a047840
```

Many hours later? After the pool was finished?

```
2020-07-17T00:23:10.432537403Z I0717 00:23:10.432234 1 node_controller.go:433] Pool worker: node worker3.ocp4.csa.gsslab.rdu2.redhat.com is now reporting unready: node worker3.ocp4.csa.gsslab.rdu2.redhat.com is reporting Unschedulable
```

NOTE: in the above, rendered-worker-237e0016060efcefe9cebacf7a047840 looks to be an update to 4.4.12.

Another weird thing is that the kubelet_service.log just... cuts off?

```
Jul 16 04:11:43.514356 master1.ocp4.csa.gsslab.rdu2.redhat.com hyperkube[1369]: I0716 04:11:43.514273 1369 prober.go:129] Readiness probe for "console-7cb4fbcc9-6tvmv_openshift-console(8a5e9555-7561-4dd7-862c-5cf687c349f9):console" succeeded
Jul 16 04:11:43.631471 master1.ocp4.csa.gsslab.rdu2.redhat.com hyperkube[1369]: I0716 04:11:43.631388 1369 prober.go:129] Readiness probe for "marketplace-operator-684775c9cd-7cbjj_openshift-marketplace(08616998-805e-4839-aa31-44bac2c7410a):marketplace-operator" succeeded
Jul 16 04:11:43.710163 master1.ocp4.csa.gsslab.rdu2.redhat.com hyperkube[1369]: I0716 04:11:43.710109 1369 prober.go:129] Readiness probe for "oauth-openshift-5755d79585-9mq77_openshift-authentication(a79bde14-22f9-4647-a9fe-814bd1e8433c):oauth-openshift" succeeded
Jul 16 04:11:44.095075 master1.ocp4.csa.gsslab.rdu2.redhat.com hyperkube[1369]: I0716 04:11:44.095028 1369 prober.go:129] Liveness probe for "kube-apiserver-master1.ocp4.csa.gsslab.rdu2.redhat.com_openshift-kube-apiserver(b045dd46f58ef123b734cb62cd0d4b36):kube-apiserver" succeeded
Jul 16 04:11:44.177008 master1.ocp4.csa.gsslab.rdu2.redhat.com hyperkube[1369]: I0716 04:11:44.176957 1369 prober.go:129] Readiness pro
```

Even though the masters seem to have updated to 4.4.12 just fine?

```yaml
- lastTransitionTime: "2020-07-16T20:38:51Z"
  message: All nodes are updated with rendered-master-25beb3d199cfc42b52b3c2034c96497c
  reason: ""
  status: "True"
  type: Updated
```

So the masters were alive but kubelet_service.log stopped? This shouldn't be related to the worker pool, however.
Investigating further.

To add some context, this cluster was built from scratch three days ago starting at 4.1.0, and I stepped through a number of upgrades from stable channels, all successful with no issues (displayed in 'oc get clusterversion'). I left the cluster fully functional on 4.4.12 for a couple of days before I attempted the 4.5.2 upgrade when it became available yesterday.

The worker3 that was reporting Unschedulable was actually a result of my comment #4: I wasn't able to run a must-gather on the node for some reason, cordoned it to attempt to run must-gather from a different node, and found that the pod spun up on worker3 again anyway, but this time with success (which I uploaded). So you can ignore the Unschedulable node.

Created attachment 1701585 [details]
Web console for this OCP cluster
I've also attached a screenshot of the web console showing the current state of my OpenShift cluster. The machine-config-operator pod is down, but everything else is up. Also, here's the state of each of the nodes:

```
$ oc get nodes -o wide
NAME                                      STATUS   ROLES    AGE    VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
master1.ocp4.csa.gsslab.rdu2.redhat.com   Ready    master   3d5h   v1.17.1+a1af596   10.10.179.161   <none>        Red Hat Enterprise Linux CoreOS 44.81.202007070223-0 (Ootpa)   4.18.0-147.20.1.el8_1.x86_64   cri-o://1.17.4-19.rhaos4.4.gitfb8131a.el8
master2.ocp4.csa.gsslab.rdu2.redhat.com   Ready    master   3d5h   v1.17.1+a1af596   10.10.179.180   <none>        Red Hat Enterprise Linux CoreOS 44.81.202007070223-0 (Ootpa)   4.18.0-147.20.1.el8_1.x86_64   cri-o://1.17.4-19.rhaos4.4.gitfb8131a.el8
master3.ocp4.csa.gsslab.rdu2.redhat.com   Ready    master   3d5h   v1.17.1+a1af596   10.10.179.175   <none>        Red Hat Enterprise Linux CoreOS 44.81.202007070223-0 (Ootpa)   4.18.0-147.20.1.el8_1.x86_64   cri-o://1.17.4-19.rhaos4.4.gitfb8131a.el8
worker1.ocp4.csa.gsslab.rdu2.redhat.com   Ready    worker   3d5h   v1.17.1+a1af596   10.10.179.176   <none>        Red Hat Enterprise Linux CoreOS 44.81.202007070223-0 (Ootpa)   4.18.0-147.20.1.el8_1.x86_64   cri-o://1.17.4-19.rhaos4.4.gitfb8131a.el8
worker2.ocp4.csa.gsslab.rdu2.redhat.com   Ready    worker   3d5h   v1.17.1+a1af596   10.10.179.177   <none>        Red Hat Enterprise Linux CoreOS 44.81.202007070223-0 (Ootpa)   4.18.0-147.20.1.el8_1.x86_64   cri-o://1.17.4-19.rhaos4.4.gitfb8131a.el8
worker3.ocp4.csa.gsslab.rdu2.redhat.com   Ready    worker   3d5h   v1.17.1+a1af596   10.10.179.178   <none>        Red Hat Enterprise Linux CoreOS 44.81.202007070223-0 (Ootpa)   4.18.0-147.20.1.el8_1.x86_64   cri-o://1.17.4-19.rhaos4.4.gitfb8131a.el8
```

For ref, Kevin's cluster is RHEV installed as baremetal and Jason's is baremetal.

I think the `syncCloudConfig()` bit here is key. On bare metal that won't exist. The code looks like it's trying to handle it not existing, but the bug likely lies there.
https://github.com/openshift/machine-config-operator/blob/1f52e483b93ffd88ba7d8217b273357e61e0cc6a/pkg/operator/sync.go#L131 last touched this.

Sorry, I meant https://github.com/openshift/machine-config-operator/commit/e7455dcb4e0150e00f78e0ae4954b73047d1bf75

@Colin The weird thing is that the baremetal team already updated that two months ago and removed baremetal from that (along with ovirt): https://github.com/openshift/machine-config-operator/commit/7c6e1ba9dbcec56f02f13b071664e160d9552b16

must-gather includes:

```
$ grep -r 'layer not known'
host_service_logs/masters/crio_service.log:Jul 16 19:37:59.241925 master1.ocp4.csa.gsslab.rdu2.redhat.com crio[1321]: time="2020-07-16 19:37:59.232234035Z" level=warning msg="failed to stop container k8s_packageserver_packageserver-54646bfd7d-58h7p_openshift-operator-lifecycle-manager_633f410c-6d1b-48d4-9277-7157e922ea49_0 in pod sandbox 253281d149544387f315b280380987d361a9f7539351c13ab13b39d731a4ff4b: layer not known" id=af68b800-ff16-47df-881c-fc50099950b0
```

which is suspicious for bug 1857224. Although I'm not clear yet on how the sync corruption discussed there would cause "failed to stop container" errors instead of "failed to create container" errors.

@wking That error seems to coincide with the upgrade to 4.4.12 (which finished around 19:45), not the subsequent upgrade to 4.5.2 (as best as I can follow the logs).

Just double-checking the configs in infrastructure/cluster.yaml:

```yaml
spec:
  cloudConfig:
    name: ""
status:
  ...
  platform: None
```

config.io.openshift/infrastructures.yaml:

```yaml
spec:
  cloudConfig:
    name: ""
status:
  ...
  platform: None
```

I believe `switch infra.Status.PlatformStatus.Type` is the problem. If that DNE I get the same panic!

Copying over my summary from the PR: infra.Status.PlatformStatus is *PlatformStatus, and in <4.5.x baremetal setups this entire thing is empty and only platform is set to None.
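A minimal standalone repro of the failure mode described above (the struct definitions here are simplified stand-ins for the config API types, not the actual MCO code): switching on a field of a nil `*PlatformStatus` dereferences the nil pointer and panics.

```go
package main

import "fmt"

// Simplified stand-ins for the config.openshift.io/v1 types.
type PlatformStatus struct{ Type string }

type InfrastructureStatus struct {
	Platform       string          // deprecated field, set on old clusters
	PlatformStatus *PlatformStatus // nil on clusters installed before 4.5
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			// Typically a runtime error: invalid memory address or
			// nil pointer dereference.
			fmt.Println("panic:", r)
		}
	}()

	// A pre-4.5 infrastructure object: only Platform is populated.
	infra := InfrastructureStatus{Platform: "None"}

	// Dereferencing infra.PlatformStatus (nil) in the switch panics.
	switch infra.PlatformStatus.Type {
	case "None":
		fmt.Println("platform none")
	}
}
```

The recover in `main` is only there to make the demo exit cleanly; in the operator, the unrecovered panic crash-looped the machine-config-operator pod and blocked the upgrade.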
So when we hit the switch statement in the MCO, we panic because of the nil pointer; the fix is to check that it's really there before doing the switch-statement case comparisons. Before my fix, the unit test I added failed with the same panic; now it passes and the func returns false.

This behavior was seen in bm clusters updating from 4.4.12 -> 4.5.2, which is also when Platform was deprecated in favor of PlatformStatus, and the MCO was missing the check. The checks do exist in the MCC transitioning them to the new type.

As for why we saw this with users but not in CI: AFAIK there isn't an e2e metal upgrade job anywhere, and I believe the existing metal job would just install a 4.5.x cluster with the new PlatformStatus.

We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context, and the UpgradeBlocker flag has been added to this bug. It will be removed if the assessment indicates that this should not block upgrade edges. The expectation is that the assignee answers these questions.

Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?
- example: Customers upgrading from 4.y.Z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
- example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time

What is the impact? Is it serious enough to warrant blocking edges?
- example: Up to 2 minute disruption in edge routing
- example: Up to 90 seconds of API downtime
- example: etcd loses quorum and you have to restore from backup

How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
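The guard described above can be sketched as follows. This is illustrative, not the actual MCO patch: the type and function names (`onCloudPlatform`, the simplified structs) are assumptions, and the real fix lives in the machine-config-operator repo.

```go
package main

import "fmt"

// Simplified stand-ins for the config.openshift.io/v1 types.
type PlatformType string

type PlatformStatus struct{ Type PlatformType }

type InfrastructureStatus struct {
	Platform       PlatformType    // deprecated field, always set
	PlatformStatus *PlatformStatus // may be nil on clusters born before 4.5
}

// onCloudPlatform is a hypothetical stand-in for the check that panicked:
// guarding against a nil PlatformStatus before the switch turns the
// nil-pointer panic into a plain "false".
func onCloudPlatform(status InfrastructureStatus) bool {
	if status.PlatformStatus == nil {
		// Pre-4.5 object: only the deprecated Platform field exists.
		return false
	}
	switch status.PlatformStatus.Type {
	case "AWS", "Azure", "GCP", "OpenStack":
		return true
	default: // "None", "BareMetal", ...
		return false
	}
}

func main() {
	// Old object: PlatformStatus left nil, as on an upgraded bare-metal cluster.
	old := InfrastructureStatus{Platform: "None"}
	fmt.Println(onCloudPlatform(old)) // false, no panic

	// New object: PlatformStatus populated by 4.5+ installers.
	newer := InfrastructureStatus{
		Platform:       "AWS",
		PlatformStatus: &PlatformStatus{Type: "AWS"},
	}
	fmt.Println(onCloudPlatform(newer)) // true
}
```

This mirrors the "func returns false" behavior described in the comment: a nil PlatformStatus is treated the same as a non-cloud platform instead of crashing the operator.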
- example: Issue resolves itself after five minutes
- example: Admin uses oc to fix things
- example: Admin must SSH to hosts, restore from backups, or other non-standard admin activities

Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
- example: No, it's always been like this, we just never noticed
- example: Yes, from 4.y.z to 4.y+1.z or 4.y.z to 4.y.z+1

For clusters such as mine with an upgrade in limbo due to this machine-config-operator bug, once this errata is published, how shall we recover? Shall we just force upgrade to the 4.5.latest?

Draft impact statement, to be updated as we get more information:

Who is impacted?
- Customers upgrading to 4.5.2 with platform: none, which is some subset of baremetal deployments

What is the impact? Is it serious enough to warrant blocking edges?
- When the upgrade is rolling out, the MCO panics and the upgrade is blocked. This will happen to every `platform: none` deployment.

How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
- We are currently investigating remediation, but so far have no confirmed fix.

Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
- No, this is new in 4.5.2

To test this fix, you will need a cluster upgraded from 4.1 -> 4.4.12. Verify that it has platform: None and no platformStatus set in the infrastructure object, then upgrade to master, which contains the fix. The expectation is that you should upgrade successfully and not hit the above MCO panic.

We can reproduce it with a baremetal on OSP cluster upgraded from 4.1.41 -> 4.2.36 -> 4.3.29 -> 4.4.12 -> 4.5.2. For details, please refer to [1].

Upgrading across 3 y-versions, such as from 4.4.12 -> 4.6, is not officially supported.
That means, to test it, we would need to upgrade a cluster from 4.1.41 -> 4.2.36 -> 4.3.29 -> 4.4.12 -> 4.5.2 -> 4.6. However, when it comes to 4.5, it will definitely fail. To bypass the issue, we thought we could edit the infrastructure object and remove platformStatus to mimic this case. It did work on a 4.4 cluster: with platformStatus removed on a freshly installed 4.4 cluster, the upgrade failed. For details, please refer to [2]. We tried similar operations on a freshly installed 4.5 cluster, but platformStatus could not be removed. So without the fix in 4.5, we're getting stuck here.

[1] https://mastern-jenkins-csb-openshift-qe.cloud.paas.psi.redhat.com/job/upgrade_CI/3931/console
[2] https://gitlab.cee.redhat.com/openshift-qe/qe-40-blog/-/blob/master/gpei/BZ%231858026_reproduce.md

Thanks @yangyang for the update; it does look like you reproduced the bug correctly. I'm wondering if there is a way to take a 4.5 CI build from the 4.5 PR. Let me try this out; I'm pretty sure I've done it in the past. Will update shortly.

I think the most expedient thing will be to merge the fix into 4.5, since it's nearly impossible to test in 4.6 and QE has a confirmed reproducer. I'm going to override the bugzilla/valid-bug based on this reasoning.

SGTM

*** Bug 1859781 has been marked as a duplicate of this bug. ***

Verified upgrade from 4.4.13 -> 4.5.0-0.nightly-2020-07-24-091850 using the reproducer of removing `platformStatus.type=None`.
```
[root@helper openshift]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.13    True        False         16m     Cluster version is 4.4.13
[root@helper openshift]# oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.13    True        False         False      17m
cloud-credential                           4.4.13    True        False         False      62m
cluster-autoscaler                         4.4.13    True        False         False      37m
console                                    4.4.13    True        False         False      20m
csi-snapshot-controller                    4.4.13    True        False         False      22m
dns                                        4.4.13    True        False         False      45m
etcd                                       4.4.13    True        False         False      44m
image-registry                             4.4.13    True        False         False      38m
ingress                                    4.4.13    True        False         False      22m
insights                                   4.4.13    True        False         False      38m
kube-apiserver                             4.4.13    True        False         False      43m
kube-controller-manager                    4.4.13    True        False         False      44m
kube-scheduler                             4.4.13    True        False         False      43m
kube-storage-version-migrator              4.4.13    True        False         False      22m
machine-api                                4.4.13    True        False         False      38m
machine-config                             4.4.13    True        False         False      45m
marketplace                                4.4.13    True        False         False      37m
monitoring                                 4.4.13    True        False         False      20m
network                                    4.4.13    True        False         False      46m
node-tuning                                4.4.13    True        False         False      46m
openshift-apiserver                        4.4.13    True        False         False      40m
openshift-controller-manager               4.4.13    True        False         False      37m
openshift-samples                          4.4.13    True        False         False      36m
operator-lifecycle-manager                 4.4.13    True        False         False      45m
operator-lifecycle-manager-catalog         4.4.13    True        False         False      45m
operator-lifecycle-manager-packageserver   4.4.13    True        False         False      40m
service-ca                                 4.4.13    True        False         False      46m
service-catalog-apiserver                  4.4.13    True        False         False      46m
service-catalog-controller-manager         4.4.13    True        False         False      46m
storage                                    4.4.13    True        False         False      37m
[root@helper openshift]# oc get infrastructure -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: Infrastructure
  metadata:
    creationTimestamp: "2020-07-24T12:18:06Z"
    generation: 1
    name: cluster
    resourceVersion: "430"
    selfLink: /apis/config.openshift.io/v1/infrastructures/cluster
    uid: 09e21c21-e9ab-4686-880a-7ab31e0ac80f
  spec:
    cloudConfig:
      name: ""
  status:
    apiServerInternalURI: https://api-int.ocp4.example.com:6443
    apiServerURL: https://api.ocp4.example.com:6443
    etcdDiscoveryDomain: ocp4.example.com
    infrastructureName: ocp4-j52w2
    platform: None
    platformStatus:
      type: None
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
[root@helper openshift]# oc edit infrastructure
infrastructure.config.openshift.io/cluster edited
[root@helper openshift]# oc get infrastructure -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: Infrastructure
  metadata:
    creationTimestamp: "2020-07-24T12:18:06Z"
    generation: 2
    name: cluster
    resourceVersion: "32704"
    selfLink: /apis/config.openshift.io/v1/infrastructures/cluster
    uid: 09e21c21-e9ab-4686-880a-7ab31e0ac80f
  spec:
    cloudConfig:
      name: ""
  status:
    apiServerInternalURI: https://api-int.ocp4.example.com:6443
    apiServerURL: https://api.ocp4.example.com:6443
    etcdDiscoveryDomain: ocp4.example.com
    infrastructureName: ocp4-j52w2
    platform: None
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
[root@helper openshift]# oc adm upgrade --force --allow-explicit-upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-07-24-091850
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-07-24-091850
[root@helper openshift]# watch oc get clusterversion
[root@helper openshift]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.13    True        True          10s     Working towards registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-07-24-091850: downloading update
[root@helper openshift]# watch oc get clusterversion
[root@helper openshift]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.13    True        True          24s     Unable to apply 4.5.0-0.nightly-2020-07-24-091850: the workload openshift-cluster-version/cluster-version-operator has not yet successfully rolled out
[root@helper openshift]# watch oc get clusterversion
[root@helper openshift]# oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.13    True        False         False      19m
cloud-credential                           4.4.13    True        False         False      63m
cluster-autoscaler                         4.4.13    True        False         False      38m
config-operator
console                                    4.4.13    True        False         False      21m
csi-snapshot-controller                    4.4.13    True        False         False      23m
dns                                        4.4.13    True        False         False      46m
etcd                                       4.4.13    True        False         False      45m
image-registry                             4.4.13    True        False         False      39m
ingress                                    4.4.13    True        False         False      24m
insights                                   4.4.13    True        False         False      39m
kube-apiserver                             4.4.13    True        False         False      45m
kube-controller-manager                    4.4.13    True        False         False      45m
kube-scheduler                             4.4.13    True        False         False      45m
kube-storage-version-migrator              4.4.13    True        False         False      23m
machine-api                                4.4.13    True        False         False      39m
machine-approver
machine-config                             4.4.13    True        False         False      46m
marketplace                                4.4.13    True        False         False      39m
monitoring                                 4.4.13    True        False         False      21m
network                                    4.4.13    True        False         False      48m
node-tuning                                4.4.13    True        False         False      48m
openshift-apiserver                        4.4.13    True        False         False      41m
openshift-controller-manager               4.4.13    True        False         False      39m
openshift-samples                          4.4.13    True        False         False      38m
operator-lifecycle-manager                 4.4.13    True        False         False      47m
operator-lifecycle-manager-catalog         4.4.13    True        False         False      47m
operator-lifecycle-manager-packageserver   4.4.13    True        False         False      42m
service-ca                                 4.4.13    True        False         False      48m
service-catalog-apiserver                  4.4.13    True        False         False      48m
service-catalog-controller-manager         4.4.13    True        False         False      48m
storage                                    4.4.13    True        False         False      39m
[root@helper openshift]# watch oc get clusterversion
[root@helper openshift]# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.5.0-0.nightly-2020-07-24-091850   True        False         False      38m
cloud-credential                           4.5.0-0.nightly-2020-07-24-091850   True        False         False      82m
cluster-autoscaler                         4.5.0-0.nightly-2020-07-24-091850   True        False         False      57m
config-operator                            4.5.0-0.nightly-2020-07-24-091850   True        False         False      16m
console                                    4.5.0-0.nightly-2020-07-24-091850   True        False         False      8m10s
csi-snapshot-controller                    4.5.0-0.nightly-2020-07-24-091850   True        False         False      42m
dns                                        4.5.0-0.nightly-2020-07-24-091850   True        True          False      65m
etcd                                       4.5.0-0.nightly-2020-07-24-091850   True        False         False      64m
image-registry                             4.5.0-0.nightly-2020-07-24-091850   True        False         False      58m
ingress                                    4.5.0-0.nightly-2020-07-24-091850   True        False         False      43m
insights                                   4.5.0-0.nightly-2020-07-24-091850   True        False         False      58m
kube-apiserver                             4.5.0-0.nightly-2020-07-24-091850   True        False         False      64m
kube-controller-manager                    4.5.0-0.nightly-2020-07-24-091850   True        False         False      64m
kube-scheduler                             4.5.0-0.nightly-2020-07-24-091850   True        False         False      64m
kube-storage-version-migrator              4.5.0-0.nightly-2020-07-24-091850   True        False         False      10m
machine-api                                4.5.0-0.nightly-2020-07-24-091850   True        False         False      58m
machine-approver                           4.5.0-0.nightly-2020-07-24-091850   True        False         False      11m
machine-config                             4.4.13                              True        False         False      6m30s
marketplace                                4.5.0-0.nightly-2020-07-24-091850   True        False         False      9m13s
monitoring                                 4.5.0-0.nightly-2020-07-24-091850   True        False         False      7m39s
network                                    4.5.0-0.nightly-2020-07-24-091850   True        False         False      67m
node-tuning                                4.5.0-0.nightly-2020-07-24-091850   True        False         False      10m
openshift-apiserver                        4.5.0-0.nightly-2020-07-24-091850   True        False         False      10m
openshift-controller-manager               4.5.0-0.nightly-2020-07-24-091850   True        False         False      58m
openshift-samples                          4.5.0-0.nightly-2020-07-24-091850   True        False         False      9m13s
operator-lifecycle-manager                 4.5.0-0.nightly-2020-07-24-091850   True        False         False      66m
operator-lifecycle-manager-catalog         4.5.0-0.nightly-2020-07-24-091850   True        False         False      66m
operator-lifecycle-manager-packageserver   4.5.0-0.nightly-2020-07-24-091850   True        False         False      8m59s
service-ca                                 4.5.0-0.nightly-2020-07-24-091850   True        False         False      66m
service-catalog-apiserver                  4.4.13                              True        False         False      67m
service-catalog-controller-manager         4.4.13                              True        False         False      67m
storage                                    4.5.0-0.nightly-2020-07-24-091850   True        False         False      11m
[root@helper openshift]# oc -n openshift-machine-config-operator get pods
NAME                                         READY   STATUS    RESTARTS   AGE
etcd-quorum-guard-54896968c-kzxpc            1/1     Running   0          65m
etcd-quorum-guard-54896968c-prcl7            1/1     Running   0          65m
etcd-quorum-guard-54896968c-xlnz2            1/1     Running   0          65m
machine-config-controller-5b89ddfc68-zd8mb   1/1     Running   1          66m
machine-config-daemon-68xgq                  2/2     Running   0          67m
machine-config-daemon-7b2dx                  2/2     Running   0          45m
machine-config-daemon-j6nz7                  2/2     Running   0          45m
machine-config-daemon-vlglq                  2/2     Running   0          67m
machine-config-daemon-vzhb6                  2/2     Running   0          67m
machine-config-operator-59bbb54b9c-nb7td     1/1     Running   0          64s
machine-config-server-llnz2                  1/1     Running   0          66m
machine-config-server-wpwrv                  1/1     Running   0          66m
machine-config-server-z76jk                  1/1     Running   0          66m
[root@helper openshift]# oc -n openshift-machine-config-operator logs -f machine-config-operator-59bbb54b9c-nb7td
I0724 13:40:37.239693       1 start.go:46] Version: 4.5.0-0.nightly-2020-07-24-091850 (Raw: v4.5.0-202007240519.p0-dirty, Hash: 99eb744f5094224edb60d88ca85d607ab151ebdf)
I0724 13:40:37.244312       1 leaderelection.go:242] attempting to acquire leader lease openshift-machine-config-operator/machine-config...
^C
[root@helper openshift]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-07-24-091850   True        False         2m39s   Cluster version is 4.5.0-0.nightly-2020-07-24-091850
[root@helper openshift]# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.5.0-0.nightly-2020-07-24-091850   True        False         False      65m
cloud-credential                           4.5.0-0.nightly-2020-07-24-091850   True        False         False      109m
cluster-autoscaler                         4.5.0-0.nightly-2020-07-24-091850   True        False         False      84m
config-operator                            4.5.0-0.nightly-2020-07-24-091850   True        False         False      43m
console                                    4.5.0-0.nightly-2020-07-24-091850   True        False         False      15m
csi-snapshot-controller                    4.5.0-0.nightly-2020-07-24-091850   True        False         False      20m
dns                                        4.5.0-0.nightly-2020-07-24-091850   True        False         False      92m
etcd                                       4.5.0-0.nightly-2020-07-24-091850   True        False         False      91m
image-registry                             4.5.0-0.nightly-2020-07-24-091850   True        False         False      85m
ingress                                    4.5.0-0.nightly-2020-07-24-091850   True        False         False      69m
insights                                   4.5.0-0.nightly-2020-07-24-091850   True        False         False      85m
kube-apiserver                             4.5.0-0.nightly-2020-07-24-091850   True        False         False      91m
kube-controller-manager                    4.5.0-0.nightly-2020-07-24-091850   True        False         False      91m
kube-scheduler                             4.5.0-0.nightly-2020-07-24-091850   True        False         False      91m
kube-storage-version-migrator              4.5.0-0.nightly-2020-07-24-091850   True        False         False      17m
machine-api                                4.5.0-0.nightly-2020-07-24-091850   True        False         False      85m
machine-approver                           4.5.0-0.nightly-2020-07-24-091850   True        False         False      38m
machine-config                             4.5.0-0.nightly-2020-07-24-091850   True        False         False      5m22s
marketplace                                4.5.0-0.nightly-2020-07-24-091850   True        False         False      14m
monitoring                                 4.5.0-0.nightly-2020-07-24-091850   True        False         False      34m
network                                    4.5.0-0.nightly-2020-07-24-091850   True        False         False      93m
node-tuning                                4.5.0-0.nightly-2020-07-24-091850   True        False         False      36m
openshift-apiserver                        4.5.0-0.nightly-2020-07-24-091850   True        False         False      7m23s
openshift-controller-manager               4.5.0-0.nightly-2020-07-24-091850   True        False         False      84m
openshift-samples                          4.5.0-0.nightly-2020-07-24-091850   True        False         False      36m
operator-lifecycle-manager                 4.5.0-0.nightly-2020-07-24-091850   True        False         False      92m
operator-lifecycle-manager-catalog         4.5.0-0.nightly-2020-07-24-091850   True        False         False      92m
operator-lifecycle-manager-packageserver   4.5.0-0.nightly-2020-07-24-091850   True        False         False      6m58s
service-ca                                 4.5.0-0.nightly-2020-07-24-091850   True        False         False      93m
storage                                    4.5.0-0.nightly-2020-07-24-091850   True        False         False      37m
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196