Created attachment 1542740 [details] cluster operator information
Created attachment 1542741 [details] cluster-version-operator pod logs
From the attached information:

$ yaml2json <cluster-scoped-resources/config.openshift.io/clusteroperators/kube-controller-manager.yaml | jq -r '.status.conditions[0].message'
StaticPodsFailing: pods "kube-controller-manager-ip-10-0-156-25.us-east-2.compute.internal" not found
StaticPodsFailing: pods "kube-controller-manager-ip-10-0-173-165.us-east-2.compute.internal" not found
StaticPodsFailing: pods "kube-controller-manager-ip-10-0-137-205.us-east-2.compute.internal" not found
RevisionControllerFailing: configmaps "service-ca" not found

$ yaml2json <namespaces/openshift-kube-controller-manager-operator/pods/kube-controller-manager-operator-58d697bb6-mchxr/kube-controller-manager-operator-58d697bb6-mchxr.yaml | jq '{node: .spec.nodeName, containerID: .status.containerStatuses[0].containerID}'
{
  "node": "ip-10-0-137-205.us-east-2.compute.internal",
  "containerID": "cri-o://62cae2c3f682567074976d9320e3c646ac2b43c4acbd961cacdf92a59a0625af"
}

Can you supply logs for kube-controller-manager-operator-58d697bb6-mchxr? Possibly via SSHing into ip-10-0-137-205.us-east-2.compute.internal and using crictl?
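For anyone reproducing the triage: the node/containerID lookup above is just a jq object construction over the pod manifest. A minimal, self-contained sketch of the same filter, using a made-up stand-in for the pod JSON rather than the real must-gather file:

```shell
# Hypothetical sample standing in for the pod manifest (already converted to JSON).
cat >/tmp/sample-pod.json <<'EOF'
{
  "spec": {"nodeName": "ip-10-0-137-205.us-east-2.compute.internal"},
  "status": {"containerStatuses": [{"containerID": "cri-o://62cae2c3f682"}]}
}
EOF
# Same projection as above: the node plus the first container's ID.
jq '{node: .spec.nodeName, containerID: .status.containerStatuses[0].containerID}' /tmp/sample-pod.json
```

With the containerID in hand, `crictl logs <id>` on that node should get the operator logs even while the API server is unhealthy.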
The cluster-version-operator logs from comment 4 look like they are from the wrong pod:

# oc get po -n openshift-cluster-version --config=/root/03-10-151536/auth/kubeconfig
NAME                                        READY   STATUS    RESTARTS   AGE
cluster-version-operator-5464f7d694-nxsdn   1/1     Running   0          50m
# oc logs cluster-version-operator-5464f7d694-nxsdn -n openshift-cluster-version --config=/root/03-10-151536/auth/kubeconfig
I0311 04:29:14.341714 1 start.go:23] ClusterVersionOperator v4.0.21-1-dirty
...
I0311 04:29:14.438635 1 leaderelection.go:185] attempting to acquire leader lease openshift-cluster-version/version...
E0311 04:29:14.439211 1 leaderelection.go:234] error retrieving resource lock openshift-cluster-version/version: Get https://127.0.0.1:6443/api/v1/namespaces/openshift-cluster-version/configmaps/version: dial tcp 127.0.0.1:6443: connect: connection refused
...
E0311 04:32:48.210526 1 leaderelection.go:234] error retrieving resource lock openshift-cluster-version/version: Get https://127.0.0.1:6443/api/v1/namespaces/openshift-cluster-version/configmaps/version: dial tcp 127.0.0.1:6443: connect: connection refused
I0311 04:32:48.210592 1 leaderelection.go:190] failed to acquire lease openshift-cluster-version/version
I0311 04:33:21.715170 1 leaderelection.go:253] lock is held by ip-10-0-12-56_21e09c57-44e7-4ac5-829a-1bd01aeea748 and has not yet expired
I0311 04:33:21.715192 1 leaderelection.go:190] failed to acquire lease openshift-cluster-version/version
...
I0311 05:19:35.370656 1 leaderelection.go:253] lock is held by ip-10-0-12-56_5c46864e-d750-4527-8c79-af0ccf697a22 and has not yet expired
I0311 05:19:35.370677 1 leaderelection.go:190] failed to acquire lease openshift-cluster-version/version

Is there another cluster-version-operator container holding the lock? Maybe the CVO running in a static pod on the bootstrap node is still going?
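For what it's worth, client-go-style leader election stores the holder as a JSON record in the lock object's control-plane.alpha.kubernetes.io/leader annotation, so the current holder can be read directly. A self-contained sketch with a made-up record (not pulled from this cluster):

```shell
# Made-up leader-election record, shaped like the JSON that client-go
# leaderelection stores in the lock's
# "control-plane.alpha.kubernetes.io/leader" annotation.
record='{"holderIdentity":"ip-10-0-12-56_5c46864e-d750-4527-8c79-af0ccf697a22","leaseDurationSeconds":90}'

# Who currently holds the lease?
echo "$record" | jq -r '.holderIdentity'
```

Against the live cluster this would be roughly `oc -n openshift-cluster-version get configmap version -o json | jq -r '.metadata.annotations["control-plane.alpha.kubernetes.io/leader"]'`, assuming the lock is the `version` configmap as the errors above suggest. The `ip-10-0-12-56` prefix in the holder identity would then point at the node still holding the lock.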
Created attachment 1542755 [details] kubeconfig file for debug env
So I'm not familiar with the kube-controller-manager-operator logs, but these entries stick out from the comment 3 upload:

$ cat info/namespaces/openshift-kube-controller-manager-operator/pods/kube-controller-manager-operator-58d697bb6-mchxr/operator/operator/logs/current.log
2019-03-11T04:35:21.32849072Z W0311 04:35:21.328358 1 cmd.go:134] Using insecure, self-signed certificates
2019-03-11T04:35:21.328667218Z I0311 04:35:21.328628 1 crypto.go:493] Generating new CA for kube-controller-manager-operator-signer@1552278921 cert, and key in /tmp/serving-cert-638759439/serving-signer.crt, /tmp/serving-cert-638759439/serving-signer.key
...
2019-03-11T04:35:22.839219237Z I0311 04:35:22.839144 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'StatusNotFound' Unable to determine current operator status for kube-controller-manager
...
2019-03-11T04:35:22.849038357Z I0311 04:35:22.848991 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'MissingVersion' no image found for operand pod
...
2019-03-11T04:35:22.858857057Z I0311 04:35:22.858828 1 status_controller.go:100] clusteroperator/kube-controller-manager not found
...
2019-03-11T04:35:22.865776807Z I0311 04:35:22.865735 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RevisionTriggered' new revision 1 triggered by "configmap \"kube-controller-manager-pod\" not found"
2019-03-11T04:35:22.870588837Z I0311 04:35:22.870555 1 status_controller.go:152] clusteroperator/kube-controller-manager diff {"status":{"conditions":[{"lastTransitionTime":"2019-03-11T04:35:22Z","reason":"NoData","status":"Unknown","type":"Failing"},{"lastTransitionTime":"2019-03-11T04:35:22Z","reason":"NoData","status":"Unknown","type":"Progressing"},{"lastTransitionTime":"2019-03-11T04:35:22Z","reason":"NoData","status":"Unknown","type":"Available"},{"lastTransitionTime":"2019-03-11T04:35:22Z","reason":"NoData","status":"Unknown","type":"Upgradeable"}],"relatedObjects":[{"group":"operator.openshift.io","name":"cluster","resource":"kubecontrollermanagers"},{"group":"","name":"openshift-config","resource":"namespaces"},{"group":"","name":"openshift-config-managed","resource":"namespaces"},{"group":"","name":"openshift-kube-controller-manager","resource":"namespaces"},{"group":"","name":"openshift-kube-controller-manager-operator","resource":"namespaces"}],"versions":[{"name":"operator","version":"4.0.0-0.nightly-2019-03-10-151536"}]}}
2019-03-11T04:35:22.875119059Z W0311 04:35:22.875076 1 reflector.go:270] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: watch of *v1.Infrastructure ended with: too old resource version: 2514 (2515)
2019-03-11T04:35:22.87897499Z I0311 04:35:22.878928 1 status_controller.go:152] clusteroperator/kube-controller-manager diff {"status":{"conditions":[{"lastTransitionTime":"2019-03-11T04:35:22Z","reason":"AsExpected","status":"False","type":"Failing"},{"lastTransitionTime":"2019-03-11T04:35:22Z","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2019-03-11T04:35:22Z","message":"Available: 0 nodes are active; ","reason":"Available","status":"False","type":"Available"},{"lastTransitionTime":"2019-03-11T04:35:22Z","reason":"NoData","status":"Unknown","type":"Upgradeable"}]}}
2019-03-11T04:35:22.880100138Z I0311 04:35:22.880052 1 prune_controller.go:141] no revision IDs currently eligible to prune
2019-03-11T04:35:22.880100138Z I0311 04:35:22.880087 1 prune_controller.go:325] No excluded revisions to prune, skipping
2019-03-11T04:35:22.880330252Z I0311 04:35:22.880289 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'MissingVersion' no image found for operand pod
2019-03-11T04:35:22.880364155Z I0311 04:35:22.880319 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'MissingVersion' no image found for operand pod
2019-03-11T04:35:22.884168334Z I0311 04:35:22.884121 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for operator kube-controller-manager changed: Failing changed from Unknown to False (""),Progressing changed from Unknown to False (""),Available changed from Unknown to False ("Available: 0 nodes are active; ")
...
2019-03-11T04:35:22.94148869Z I0311 04:35:22.941449 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'MasterNodeObserved' Observed new master node ip-10-0-137-205.us-east-2.compute.internal
2019-03-11T04:35:22.941586647Z I0311 04:35:22.941543 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'MasterNodeObserved' Observed new master node ip-10-0-156-25.us-east-2.compute.internal
2019-03-11T04:35:22.941649924Z I0311 04:35:22.941622 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'MasterNodeObserved' Observed new master node ip-10-0-173-165.us-east-2.compute.internal
...
2019-03-11T04:35:22.9447304Z I0311 04:35:22.944692 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'MissingVersion' no image found for operand pod
2019-03-11T04:35:22.946973321Z E0311 04:35:22.946936 1 node_controller.go:153] key failed with : Operation cannot be fulfilled on kubecontrollermanagers.operator.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again
2019-03-11T04:35:22.947259456Z I0311 04:35:22.947198 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'MasterNodeObserved' Observed new master node ip-10-0-173-165.us-east-2.compute.internal
2019-03-11T04:35:22.947291624Z I0311 04:35:22.947250 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'MasterNodeObserved' Observed new master node ip-10-0-137-205.us-east-2.compute.internal
2019-03-11T04:35:22.947291624Z I0311 04:35:22.947270 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'MasterNodeObserved' Observed new master node ip-10-0-156-25.us-east-2.compute.internal
2019-03-11T04:35:22.949994533Z I0311 04:35:22.949942 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'MissingVersion' no image found for operand pod
...
2019-03-11T04:35:23.853895309Z I0311 04:35:23.853853 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RoleCreated' Created Role.rbac.authorization.k8s.io/prometheus-k8s -n openshift-kube-controller-manager because it was missing
2019-03-11T04:35:23.860643424Z I0311 04:35:23.860601 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RoleBindingCreated' Created RoleBinding.rbac.authorization.k8s.io/prometheus-k8s -n openshift-kube-controller-manager because it was missing
2019-03-11T04:35:23.863920061Z I0311 04:35:23.863875 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'ServiceMonitorCreateFailed' Failed to create ServiceMonitor.monitoring.coreos.com/v1: the server could not find the requested resource
...
2019-03-11T04:35:23.881352937Z E0311 04:35:23.881311 1 monitoring_resource_controller.go:183] key failed with : the server could not find the requested resource
2019-03-11T04:35:23.881389567Z I0311 04:35:23.881345 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'ServiceMonitorCreateFailed' Failed to create ServiceMonitor.monitoring.coreos.com/v1: the server could not find the requested resource
2019-03-11T04:35:23.890637085Z E0311 04:35:23.890603 1 monitoring_resource_controller.go:183] key failed with : the server could not find the requested resource
2019-03-11T04:35:23.890692385Z I0311 04:35:23.890655 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'ServiceMonitorCreateFailed' Failed to create ServiceMonitor.monitoring.coreos.com/v1: the server could not find the requested resource
2019-03-11T04:35:23.900375875Z E0311 04:35:23.900341 1 monitoring_resource_controller.go:183] key failed with : the server could not find the requested resource
2019-03-11T04:35:23.900434053Z I0311 04:35:23.900394 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'ServiceMonitorCreateFailed' Failed to create ServiceMonitor.monitoring.coreos.com/v1: the server could not find the requested resource
...
2019-03-11T04:35:26.039139207Z I0311 04:35:26.039092 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SATokenSignerControllerStuck' unexpected addresses: 10.0.12.56
...
2019-03-11T04:35:42.439638776Z E0311 04:35:42.439603 1 revision_controller.go:316] key failed with : synthetic requeue request (err: <nil>)
2019-03-11T04:35:42.443557297Z I0311 04:35:42.443509 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RevisionTriggered' new revision 1 triggered by "configmap \"kube-controller-manager-pod-0\" not found"
...
2019-03-11T04:52:26.039606782Z E0311 04:52:26.039583 1 monitoring_resource_controller.go:183] key failed with : the server could not find the requested resource
2019-03-11T04:52:26.03969321Z I0311 04:52:26.039602 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'ServiceMonitorCreateFailed' Failed to create ServiceMonitor.monitoring.coreos.com/v1: the server could not find the requested resource
...
2019-03-11T04:35:42.443557297Z I0311 04:35:42.443509 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-controller-manager-operator", Name:"kube-controller-manager-operator", UID:"0d87095d-43b7-11e9-8035-02b844793e06", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RevisionTriggered' new revision 1 triggered by "configmap \"kube-controller-manager-pod-0\" not found"

Maybe the master folks will be able to focus in on anything suspicious in there about why the operand pod is not coming up (or the full log from comment 3).
This stood out from the clusteroperator status:

RevisionControllerFailing: configmaps "service-ca" not found

I guess this prevented the pods from being created.
The issue is still not fixed in nightly build 4.0.0-0.nightly-2019-03-11-133421. This blocks all testing against the latest build.
The service-CA operator [1] is young, and the installer only started delegating to that operator recently [2]. Maybe there's an issue with these nightly builds not including the operator in their payload?

$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-11-133421 | grep service-ca-operator
...no hits...

For comparison:

$ oc adm release info --commits registry.svc.ci.openshift.org/openshift/origin-release:v4.0 | grep service-ca-operator
  service-ca    https://github.com/openshift/service-ca-operator    ffa65e08e01a5ee1b325842d26376cacc90c7e78

I'm moving this over to the Release team, since they can probably figure out any issues with getting the service-CA operator more quickly.

[1]: https://github.com/openshift/service-ca-operator
[2]: https://github.com/openshift/installer/pull/1208
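The payload check above boils down to grepping the release's commit listing, with grep's exit status flagging a missing operator. A self-contained illustration using made-up two-line listings (not real `oc adm release info` output):

```shell
# Made-up stand-ins for `oc adm release info --commits` output from two payloads.
printf 'cluster-version-operator https://github.com/openshift/cluster-version-operator abc123\n' >/tmp/nightly.txt
printf 'cluster-version-operator https://github.com/openshift/cluster-version-operator abc123\nservice-ca https://github.com/openshift/service-ca-operator ffa65e0\n' >/tmp/ci.txt

# The CI payload carries the operator; the nightly stand-in does not.
grep -c service-ca-operator /tmp/ci.txt
grep -q service-ca-operator /tmp/nightly.txt || echo "service-ca-operator missing from payload"
```

Grepping the repository URL rather than the image name avoids false hits when an operator's image tag differs from its repo name.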
There were a number of steps to this fix, but the biggest was [1]. [1]: https://github.com/openshift/ocp-build-data/commit/35cd2176ba852327a29774fc7407b61672184e61
Removing beta3blocker; 4.0.0-0.nightly-2019-03-13-233958 already fixed this issue.

RHCOS build: 400.7.20190306.0

Please change to ON_QA.
As per comment 22, setting this to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758