Bug 2102777 - OCP upgrade from 4.10 to 4.11 got stuck at machine-config due to stuck reboot
Summary: OCP upgrade from 4.10 to 4.11 got stuck at machine-config due to stuck reboot
Keywords:
Status: CLOSED DUPLICATE of bug 2100456
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.11
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: ---
Assignee: Ben Nemec
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-06-30 15:49 UTC by yliu1
Modified: 2022-08-15 20:20 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-12 15:07:46 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description yliu1 2022-06-30 15:49:37 UTC
Note:
1. if you are dealing with Machines or MachineSet objects, please select the component as "Cloud Compute" under same product.
2. if you are dealing with kubelet / kubeletconfigs / container runtime configs, please select the component as "Node" under same product.

Description of problem:
During 4.10.20 to 4.11.0-rc.0 OCP upgrade, machine-config got stuck. Note that test pods are added to the SNO cluster before upgrade started. (All other components were successfully upgraded to 4.11.0-rc.0 via "oc get co")

machine-config                             4.10.20       True        True          False      21h     Working towards 4.11.0-rc.0

From the machine config log, it looks like the reboot was triggered but stuck. 

update.go:1977] Rebooting node
2022-06-30T02:45:21.060670363+00:00 stderr F I0630 02:45:21.060638  796222 update.go:2007] Removing SIGTERM protection
2022-06-30T02:45:21.060772523+00:00 stderr F I0630 02:45:21.060734  796222 update.go:1977] initiating reboot: Node will reboot into config rendered-master-212bbb30b9272b2bff610f503c45c2d4
2022-06-30T02:45:22.426016359+00:00 stderr F I0630 02:45:22.425077  796222 daemon.go:934] Shutting down MachineConfigDaemon

Version-Release number of MCO (Machine Config Operator) (if applicable):

Platform : Openshift SNO on baremetal

Are you certain that the root cause of the issue being reported is the MCO (Machine Config Operator)?
Not sure

How reproducible:
Not sure yet. The node is stuck. 

Steps to Reproduce:
1. Deploy SNO with DU profile (rt kernel, 4 cpus reserved for platform, sriov, pao, logging, local-storage, ptp operators are deployed)
2. Launch test application. Test app specs are here:
https://gitlab.cee.redhat.com/cnf/ran-test-app/-/blob/master/README.md
3. Trigger OCP upgrade from 4.10.20 to 4.11.0-rc.0.

Actual results:
upgrade stuck at machine-config

Expected results:
upgrade succeeded.

Additional info:

1. Please consider attaching a must-gather archive (via oc adm must-gather). Please review must-gather contents for sensitive information before attaching any must-gathers to a Bugzilla report. You may also mark the bug private if you wish.

** must-gather stuck. 
$ oc adm must-gather 
[must-gather      ] OUT the server is currently unable to handle the request (get imagestreams.image.openshift.io must-gather)
[must-gather      ] OUT 
[must-gather      ] OUT Using must-gather plug-in image: registry.redhat.io/openshift4/ose-must-gather:latest
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: eb9dfd6e-f481-4c9c-94c7-e171457f30fa
ClusterVersion: Updating to "4.11.0-rc.0" from "4.10.20" for 14 hours: Working towards 4.11.0-rc.0: 678 of 802 done (84% complete)
ClusterOperators:
	clusteroperator/machine-config is not upgradeable because One or more machine config pools are updating, please see `oc get mcp` for further details


[must-gather      ] OUT namespace/openshift-must-gather-6bgrf created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-tqw4v created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-tqw4v deleted
[must-gather      ] OUT namespace/openshift-must-gather-6bgrf deleted


When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: eb9dfd6e-f481-4c9c-94c7-e171457f30fa
ClusterVersion: Updating to "4.11.0-rc.0" from "4.10.20" for 14 hours: Working towards 4.11.0-rc.0: 678 of 802 done (84% complete)
ClusterOperators:
	clusteroperator/machine-config is not upgradeable because One or more machine config pools are updating, please see `oc get mcp` for further details


I0630 10:43:12.855484  294602 request.go:665] Waited for 1.112940325s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/machineconfiguration.openshift.io/v1?timeout=32s
I0630 10:43:22.957101  294602 request.go:665] Waited for 2.363324029s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/user.openshift.io/v1?timeout=32s
I0630 10:43:33.493485  294602 request.go:665] Waited for 1.160487998s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/image.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-config...
I0630 10:43:43.632459  294602 request.go:665] Waited for 1.355360452s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/user.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-config-managed...
I0630 10:43:55.217536  294602 request.go:665] Waited for 1.152238621s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/route.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-authentication...
I0630 10:44:24.782914  294602 request.go:665] Waited for 1.165069933s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/project.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-authentication-operator...
I0630 10:44:55.663023  294602 request.go:665] Waited for 1.164915612s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/build.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-ingress...
I0630 10:45:26.716237  294602 request.go:665] Waited for 1.153017709s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/build.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-oauth-apiserver...
I0630 10:46:28.969526  294602 request.go:665] Waited for 1.144644393s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/security.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-machine-api...
I0630 10:49:35.345802  294602 request.go:665] Waited for 1.157733546s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/security.openshift.io/v1?timeout=32s
I0630 10:49:46.370165  294602 request.go:665] Waited for 1.119115839s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/node.k8s.io/v1beta1?timeout=32s
I0630 10:49:56.381139  294602 request.go:665] Waited for 2.164421119s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/packages.operators.coreos.com/v1?timeout=32s
I0630 10:50:07.375210  294602 request.go:665] Waited for 1.163590657s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/user.openshift.io/v1?timeout=32s
I0630 10:50:17.420440  294602 request.go:665] Waited for 2.116520782s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/console.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-cloud-controller-manager-operator...
I0630 10:51:25.320869  294602 request.go:665] Waited for 1.130220946s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/user.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-cloud-controller-manager...
I0630 10:51:37.085617  294602 request.go:665] Waited for 1.142090368s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/authorization.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-cloud-credential-operator...
I0630 10:52:43.645444  294602 request.go:665] Waited for 1.155097855s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/quota.openshift.io/v1?timeout=32s
I0630 10:52:54.447605  294602 request.go:665] Waited for 1.074508271s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/local.storage.openshift.io/v1alpha1?timeout=32s
I0630 10:53:04.548578  294602 request.go:665] Waited for 2.363965936s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/quota.openshift.io/v1?timeout=32s
I0630 10:53:15.075122  294602 request.go:665] Waited for 1.165457327s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/route.openshift.io/v1?timeout=32s
I0630 10:53:25.204234  294602 request.go:665] Waited for 2.52231623s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/discovery.k8s.io/v1?timeout=32s
I0630 10:53:35.531610  294602 request.go:665] Waited for 1.111852868s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/machine.openshift.io/v1beta1?timeout=32s
I0630 10:53:45.595112  294602 request.go:665] Waited for 2.364439956s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/quota.openshift.io/v1?timeout=32s
I0630 10:53:56.123518  294602 request.go:665] Waited for 1.163525552s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/security.openshift.io/v1?timeout=32s
I0630 10:54:06.248639  294602 request.go:665] Waited for 2.520724339s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/local.storage.openshift.io/v1?timeout=32s
I0630 10:54:16.577607  294602 request.go:665] Waited for 1.115648455s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/storage.k8s.io/v1beta1?timeout=32s
I0630 10:54:26.639432  294602 request.go:665] Waited for 2.363268685s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/oauth.openshift.io/v1?timeout=32s
I0630 10:54:37.169591  294602 request.go:665] Waited for 1.162501452s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/oauth.openshift.io/v1?timeout=32s
I0630 10:54:47.297514  294602 request.go:665] Waited for 2.522570931s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/metal3.io/v1alpha1?timeout=32s
I0630 10:54:57.622659  294602 request.go:665] Waited for 1.128991429s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/events.k8s.io/v1beta1?timeout=32s
I0630 10:55:07.685249  294602 request.go:665] Waited for 2.362913891s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/authorization.openshift.io/v1?timeout=32s
I0630 10:55:18.210089  294602 request.go:665] Waited for 1.136462189s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/template.openshift.io/v1?timeout=32s
I0630 10:55:28.336045  294602 request.go:665] Waited for 2.518674786s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/admissionregistration.k8s.io/v1?timeout=32s
I0630 10:55:38.673335  294602 request.go:665] Waited for 1.12273545s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/machine.openshift.io/v1beta1?timeout=32s
I0630 10:55:48.739676  294602 request.go:665] Waited for 2.362928396s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/user.openshift.io/v1?timeout=32s
I0630 10:55:59.265118  294602 request.go:665] Waited for 1.166205557s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/route.openshift.io/v1?timeout=32s
I0630 10:56:09.392196  294602 request.go:665] Waited for 2.52630517s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/k8s.ovn.org/v1?timeout=32s
I0630 10:56:19.717412  294602 request.go:665] Waited for 1.108719387s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/network.operator.openshift.io/v1?timeout=32s
I0630 10:56:29.782508  294602 request.go:665] Waited for 2.362060191s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/route.openshift.io/v1?timeout=32s
I0630 10:56:40.311048  294602 request.go:665] Waited for 1.164439898s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/route.openshift.io/v1?timeout=32s
I0630 10:56:50.435484  294602 request.go:665] Waited for 2.520510366s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/snapshot.storage.k8s.io/v1?timeout=32s
I0630 10:57:00.898594  294602 request.go:665] Waited for 1.122649166s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/node.k8s.io/v1?timeout=32s
I0630 10:57:11.096044  294602 request.go:665] Waited for 2.365049566s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/security.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-config-operator...
I0630 10:57:45.446683  294602 request.go:665] Waited for 1.165750329s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/security.openshift.io/v1?timeout=32s
I0630 10:57:56.288475  294602 request.go:665] Waited for 1.115593748s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/autoscaling/v2beta2?timeout=32s
I0630 10:58:06.386182  294602 request.go:665] Waited for 2.362392673s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/packages.operators.coreos.com/v1?timeout=32s
I0630 10:58:16.911859  294602 request.go:665] Waited for 1.164982193s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/oauth.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-console-operator...
I0630 10:58:48.769644  294602 request.go:665] Waited for 1.161807794s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/packages.operators.coreos.com/v1?timeout=32s
Gathering data for ns/openshift-console...
I0630 10:59:00.582647  294602 request.go:665] Waited for 1.166190188s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/user.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-cluster-storage-operator...
I0630 11:01:09.012555  294602 request.go:665] Waited for 1.161756731s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/oauth.openshift.io/v1?timeout=32s
I0630 11:01:19.930771  294602 request.go:665] Waited for 1.094433947s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/sriovfec.intel.com/v2?timeout=32s
Gathering data for ns/openshift-dns-operator...
I0630 11:02:26.735569  294602 request.go:665] Waited for 1.145569277s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/route.openshift.io/v1?timeout=32s
I0630 11:02:37.533106  294602 request.go:665] Waited for 1.123411735s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/sriovnetwork.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-dns...
I0630 11:04:15.886968  294602 request.go:665] Waited for 1.122846532s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/oauth.openshift.io/v1?timeout=32s
I0630 11:04:26.712613  294602 request.go:665] Waited for 1.120478217s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/coordination.k8s.io/v1?timeout=32s
Gathering data for ns/openshift-etcd-operator...
I0630 11:05:02.806827  294602 request.go:665] Waited for 1.166340636s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/authorization.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-etcd...
I0630 11:10:42.825816  294602 request.go:665] Waited for 1.16472433s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/build.openshift.io/v1?timeout=32s
I0630 11:10:53.740948  294602 request.go:665] Waited for 1.119343391s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/machineconfiguration.openshift.io/v1?timeout=32s
I0630 11:11:03.857999  294602 request.go:665] Waited for 2.363834647s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/template.openshift.io/v1?timeout=32s
I0630 11:11:14.421783  294602 request.go:665] Waited for 1.165248251s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/build.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-image-registry...
I0630 11:13:18.655609  294602 request.go:665] Waited for 1.166576812s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/quota.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-ingress-operator...
I0630 11:14:20.761520  294602 request.go:665] Waited for 1.117392189s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/packages.operators.coreos.com/v1?timeout=32s
I0630 11:14:31.775616  294602 request.go:665] Waited for 1.120642365s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/coordination.k8s.io/v1?timeout=32s
I0630 11:14:41.787680  294602 request.go:665] Waited for 2.166128064s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/authorization.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-ingress-canary...
I0630 11:15:07.854690  294602 request.go:665] Waited for 1.165763071s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/apps.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-insights...
I0630 11:15:38.851519  294602 request.go:665] Waited for 1.118131835s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/packages.operators.coreos.com/v1?timeout=32s
I0630 11:15:51.427579  294602 request.go:665] Waited for 1.124934139s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/route.openshift.io/v1?timeout=32s
I0630 11:16:01.599524  294602 request.go:665] Waited for 2.364174156s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/route.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-kube-apiserver-operator...
I0630 11:16:40.905409  294602 request.go:665] Waited for 1.165771618s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/project.openshift.io/v1?timeout=32s
Gathering data for ns/openshift-kube-apiserver...

I0630 11:23:53.710063  294602 request.go:665] Waited for 1.047851299s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/oauth.openshift.io/v1?timeout=32s
I0630 11:24:05.183974  294602 request.go:665] Waited for 1.106091181s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/local.storage.openshift.io/v1?timeout=32s
I0630 11:24:15.353772  294602 request.go:665] Waited for 2.36489411s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/packages.operators.coreos.com/v1?timeout=32s
I0630 11:24:26.107986  294602 request.go:665] Waited for 1.131169935s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/security.openshift.io/v1?timeout=32s
I0630 11:24:36.878779  294602 request.go:665] Waited for 1.11633811s due to client-side throttling, not priority and fairness, request: GET:https://api.test-sno-1.lab.eng.rdu2.redhat.com:6443/apis/machine.openshift.io/v1beta1?timeout=32s
Gathering data for ns/openshift-kube-controller-manager...



** pod logs cannot be accessed via oc cmd. Following is the end of the machine config logs. 

$ oc logs -n openshift-machine-config-operator machine-config-daemon-czltn machine-config-daemon
Error from server (InternalError): Internal error occurred: Authorization error (user=system:kube-apiserver, verb=get, resource=nodes, subresource=proxy)

By looking at pod logs on server, it seems the node got stuck at rebooting. 
2022-06-30T02:45:15.700903222+00:00 stdout F Run "systemctl reboot" to start a reboot
2022-06-30T02:45:16.676546917+00:00 stderr F I0630 02:45:16.676219  796222 update.go:1977] Initiating switch from kernel realtime to realtime
2022-06-30T02:45:16.756636820+00:00 stderr F I0630 02:45:16.756584  796222 update.go:1977] Updating rt-kernel packages on host: ["update"]
2022-06-30T02:45:16.799364708+00:00 stderr F I0630 02:45:16.797676  796222 update.go:1962] Running: rpm-ostree update
2022-06-30T02:45:19.106876308+00:00 stdout F Checking out tree 188b97d...done
2022-06-30T02:45:19.186682609+00:00 stdout F Enabled rpm-md repositories: coreos-extensions
2022-06-30T02:45:19.192144631+00:00 stdout F rpm-md repo 'coreos-extensions' (cached); generated: 2022-06-27T23:38:35Z
2022-06-30T02:45:19.544713587+00:00 stdout F Importing rpm-md...done
2022-06-30T02:45:19.567872700+00:00 stdout F Resolving dependencies...done
2022-06-30T02:45:19.567946539+00:00 stdout F No upgrade available.
2022-06-30T02:45:21.052523320+00:00 stderr F I0630 02:45:21.052195  796222 update.go:1977] Rebooting node
2022-06-30T02:45:21.060670363+00:00 stderr F I0630 02:45:21.060638  796222 update.go:2007] Removing SIGTERM protection
2022-06-30T02:45:21.060772523+00:00 stderr F I0630 02:45:21.060734  796222 update.go:1977] initiating reboot: Node will reboot into config rendered-master-212bbb30b9272b2bff610f503c45c2d4
2022-06-30T02:45:22.426016359+00:00 stderr F I0630 02:45:22.425077  796222 daemon.go:934] Shutting down MachineConfigDaemon

# last reboot
reboot   system boot  4.18.0-372.9.1.r Thu Jun 30 02:48   still running
reboot   system boot  4.18.0-305.49.1. Thu Jun 30 02:11 - 02:46  (00:35)

2. If a must-gather is unavailable, please provide the output of:

$ oc get co machine-config -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  annotations:
    exclude.release.openshift.io/internal-openshift-hosted: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
  creationTimestamp: "2022-06-29T17:32:09Z"
  generation: 1
  name: machine-config
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: 8fb7c8de-30b1-4e73-a8c9-c22c0f1dc50a
  resourceVersion: "324846"
  uid: e47d1b8f-0bd1-4df6-919d-af050de83a78
spec: {}
status:
  conditions:
  - lastTransitionTime: "2022-06-30T02:32:40Z"
    message: Working towards 4.11.0-rc.0
    status: "True"
    type: Progressing
  - lastTransitionTime: "2022-06-30T02:38:25Z"
    message: One or more machine config pools are updating, please see `oc get mcp`
      for further details
    reason: PoolUpdating
    status: "False"
    type: Upgradeable
  - lastTransitionTime: "2022-06-29T17:47:53Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2022-06-29T17:47:54Z"
    message: Cluster has deployed [{operator 4.10.20}]
    status: "True"
    type: Available
  extension:
    master: 0 (ready 0) out of 1 nodes are updating to latest configuration rendered-master-212bbb30b9272b2bff610f503c45c2d4
    worker: all 0 nodes are at latest configuration rendered-worker-1edcda45fdce3bf53a58ed6c4240b9a1
  relatedObjects:
  - group: ""
    name: openshift-machine-config-operator
    resource: namespaces
  - group: machineconfiguration.openshift.io
    name: ""
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: ""
    resource: controllerconfigs
  - group: machineconfiguration.openshift.io
    name: ""
    resource: kubeletconfigs
  - group: machineconfiguration.openshift.io
    name: ""
    resource: containerruntimeconfigs
  - group: machineconfiguration.openshift.io
    name: ""
    resource: machineconfigs
  - group: ""
    name: ""
    resource: nodes
  - group: ""
    name: openshift-kni-infra
    resource: namespaces
  - group: ""
    name: openshift-openstack-infra
    resource: namespaces
  - group: ""
    name: openshift-ovirt-infra
    resource: namespaces
  - group: ""
    name: openshift-vsphere-infra
    resource: namespaces
  versions:
  - name: operator
    version: 4.10.20


$ oc get mcp (and oc describe mcp/${degraded_pool} if pools are degraded)
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-89d6c7214ba7d725b4fc5ef6ca90a7dd   False     True       False      1              0                   0                     0                      21h
worker   rendered-worker-1edcda45fdce3bf53a58ed6c4240b9a1   True      False      False      0              0                   0                     0                      21h

$ oc get mc
NAME                                                GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                           4aa21da3efc9b49ff4d8fec0dfe5a8598a19332a   3.2.0             21h
00-worker                                           4aa21da3efc9b49ff4d8fec0dfe5a8598a19332a   3.2.0             21h
01-master-container-runtime                         4aa21da3efc9b49ff4d8fec0dfe5a8598a19332a   3.2.0             21h
01-master-kubelet                                   4aa21da3efc9b49ff4d8fec0dfe5a8598a19332a   3.2.0             21h
01-worker-container-runtime                         4aa21da3efc9b49ff4d8fec0dfe5a8598a19332a   3.2.0             21h
01-worker-kubelet                                   4aa21da3efc9b49ff4d8fec0dfe5a8598a19332a   3.2.0             21h
02-master-workload-partitioning                                                                3.2.0             21h
04-accelerated-container-startup-master                                                        3.2.0             21h
04-accelerated-container-startup-worker                                                        3.2.0             21h
05-chronyd-dynamic-master                                                                      3.2.0             21h
05-chronyd-dynamic-worker                                                                      3.2.0             21h
06-kdump-enable-master                                                                         3.2.0             21h
06-kdump-enable-worker                                                                         3.2.0             21h
50-master-dnsmasq-configuration                                                                2.2.0             21h
50-masters-chrony-configuration                                                                2.2.0             21h
50-nto-master                                                                                  3.2.0             20h
50-performance-openshift-node-performance-profile                                              3.2.0             20h
50-workers-chrony-configuration                                                                2.2.0             21h
98-just-a-test                                                                                 3.2.0             20h
99-master-generated-crio-seccomp-use-default                                                   3.2.0             21h
99-master-generated-kubelet                         4aa21da3efc9b49ff4d8fec0dfe5a8598a19332a   3.2.0             20h
99-master-generated-registries                      4aa21da3efc9b49ff4d8fec0dfe5a8598a19332a   3.2.0             21h
99-master-ssh                                                                                  3.2.0             21h
99-worker-generated-crio-seccomp-use-default                                                   3.2.0             21h
99-worker-generated-registries                      4aa21da3efc9b49ff4d8fec0dfe5a8598a19332a   3.2.0             21h
99-worker-ssh                                                                                  3.2.0             21h
container-mount-namespace-and-kubelet-conf-master                                              3.2.0             21h
container-mount-namespace-and-kubelet-conf-worker                                              3.2.0             21h
load-sctp-module-master                                                                        2.2.0             21h
load-sctp-module-worker                                                                        2.2.0             21h
rendered-master-212bbb30b9272b2bff610f503c45c2d4    4aa21da3efc9b49ff4d8fec0dfe5a8598a19332a   3.2.0             12h
rendered-master-218f5c48beb67310d12f4515b8730143    c3ac7f07f1cae32456a7ab361e29f49af7eb0802   3.2.0             15h
rendered-master-520e8b478f45a630f7416bf7ea81f4ea    c3ac7f07f1cae32456a7ab361e29f49af7eb0802   3.2.0             13h
rendered-master-84b534be57edf1bc926648d86304f490    c3ac7f07f1cae32456a7ab361e29f49af7eb0802   3.2.0             20h
rendered-master-89d6c7214ba7d725b4fc5ef6ca90a7dd    c3ac7f07f1cae32456a7ab361e29f49af7eb0802   3.2.0             13h
rendered-master-8c27739c8a0c413cff40d3e9e9a8c05f    c3ac7f07f1cae32456a7ab361e29f49af7eb0802   3.2.0             20h
rendered-master-901c4d8a629d3af906cc82ff5590fb29    c3ac7f07f1cae32456a7ab361e29f49af7eb0802   3.2.0             21h
rendered-master-f3768ff5decc9e6f07f30cccf0aa4f6b    c3ac7f07f1cae32456a7ab361e29f49af7eb0802   3.2.0             20h
rendered-worker-02b840c1bd66613f9d9a4aaa4d07b906    c3ac7f07f1cae32456a7ab361e29f49af7eb0802   3.2.0             15h
rendered-worker-1edcda45fdce3bf53a58ed6c4240b9a1    4aa21da3efc9b49ff4d8fec0dfe5a8598a19332a   3.2.0             12h
rendered-worker-838ef9a68093ae2eb8d407531ade0887    c3ac7f07f1cae32456a7ab361e29f49af7eb0802   3.2.0             21h


$ oc get pod -n openshift-machine-config-operator
NAME                                        READY   STATUS    RESTARTS   AGE
machine-config-controller-b8d7fbbc7-rzds7   2/2     Running   0          12h
machine-config-daemon-czltn                 2/2     Running   0          12h
machine-config-operator-6d48c84476-gnvpr    1/1     Running   0          12h
machine-config-server-xgnk5                 1/1     Running   0          12h

$ oc get node -o wide
NAME                                 STATUS   ROLES           AGE   VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                         CONTAINER-RUNTIME
test-sno-1.lab.eng.rdu2.redhat.com   Ready    master,worker   21h   v1.23.5+3afdacb   10.1.190.48   <none>        Red Hat Enterprise Linux CoreOS 410.84.202206212304-0 (Ootpa)   4.18.0-305.49.1.rt7.121.el8_4.x86_64   cri-o://1.23.3-6.rhaos4.10.git74543e3.el8

3. If a node is not accessible via API, please provide console/journal/kubelet logs of the problematic node
tarball for /var/log:
https://drive.google.com/drive/folders/1SzDn7OBld7zcMy6wGYmJqwgQZ1g8YFSi?usp=sharing

4. Are there RHEL nodes on the cluster? If yes, please upload the whole Ansible logs or Jenkins job

Comment 1 Ben Nemec 2022-07-12 15:07:46 UTC
It looks like this is another instance of https://bugzilla.redhat.com/show_bug.cgi?id=2100456#c3 We're working on a fix with the NetworkManager team.

*** This bug has been marked as a duplicate of bug 2100456 ***

Comment 2 yliu1 2022-08-15 20:20:52 UTC
Reproducible from 4.10.23 to 4.11.0.
mcp updating got stuck at shutting down system. Upgrade eventually completed after manually rebooting the server.

2022-08-15T18:11:20.571928782+00:00 stdout F Removed:
2022-08-15T18:11:20.572000238+00:00 stdout F   ModemManager-glib-1.10.8-2.el8.x86_64
2022-08-15T18:11:20.572071556+00:00 stdout F   libmbim-1.20.2-1.el8.x86_64
2022-08-15T18:11:20.572389870+00:00 stdout F   libqmi-1.24.0-1.el8.x86_64
2022-08-15T18:11:20.572442417+00:00 stdout F   openvswitch2.16-2.16.0-76.el8fdp.x86_64
2022-08-15T18:11:20.572482569+00:00 stdout F   redhat-release-coreos-410.84-2.el8.x86_64
2022-08-15T18:11:20.572522012+00:00 stdout F Added:
2022-08-15T18:11:20.572560226+00:00 stdout F   WALinuxAgent-udev-2.3.0.2-2.el8_6.2.noarch
2022-08-15T18:11:20.572598527+00:00 stdout F   glibc-gconv-extra-2.28-189.5.el8_6.x86_64
2022-08-15T18:11:20.572636793+00:00 stdout F   libbpf-0.4.0-3.el8.x86_64
2022-08-15T18:11:20.572675518+00:00 stdout F   nvme-cli-1.16-3.el8.x86_64
2022-08-15T18:11:20.572713555+00:00 stdout F   openvswitch2.17-2.17.0-31.el8fdp.x86_64
2022-08-15T18:11:20.572751647+00:00 stdout F   redhat-release-8.6-0.1.el8.x86_64
2022-08-15T18:11:20.572791421+00:00 stdout F   redhat-release-eula-8.6-0.1.el8.x86_64
2022-08-15T18:11:20.572829516+00:00 stdout F   shadow-utils-subid-2:4.6-16.el8.x86_64
2022-08-15T18:11:20.573613294+00:00 stdout F Run "systemctl reboot" to start a reboot
2022-08-15T18:11:22.282168508+00:00 stderr F I0815 18:11:22.281258  824440 update.go:1991] Initiating switch from kernel realtime to realtime
2022-08-15T18:11:22.288012328+00:00 stderr F I0815 18:11:22.287967  824440 update.go:1991] Updating rt-kernel packages on host: ["update"]
2022-08-15T18:11:22.291666674+00:00 stderr F I0815 18:11:22.291639  824440 update.go:1976] Running: rpm-ostree update
2022-08-15T18:11:24.414831307+00:00 stdout F Checking out tree 5e84581...done
2022-08-15T18:11:24.488883838+00:00 stdout F Enabled rpm-md repositories: coreos-extensions
2022-08-15T18:11:24.496463584+00:00 stdout F rpm-md repo 'coreos-extensions' (cached); generated: 2022-08-03T11:48:07Z
2022-08-15T18:11:24.711890306+00:00 stdout F Importing rpm-md...done
2022-08-15T18:11:24.724547635+00:00 stdout F Resolving dependencies...done
2022-08-15T18:11:24.724787364+00:00 stdout F No upgrade available.
2022-08-15T18:11:26.553489762+00:00 stderr F I0815 18:11:26.553443  824440 update.go:1991] Rebooting node
2022-08-15T18:11:26.560119163+00:00 stderr F I0815 18:11:26.560071  824440 update.go:2021] Removing SIGTERM protection
2022-08-15T18:11:26.560209041+00:00 stderr F I0815 18:11:26.560191  824440 update.go:1991] initiating reboot: Node will reboot into config rendered-master-563530429f1165228713dedc9658c3ab
2022-08-15T18:11:27.212174684+00:00 stderr F I0815 18:11:27.210247  824440 daemon.go:934] Shutting down MachineConfigDaemon


Note You need to log in before you can comment on or make changes to this bug.