Description of problem:

https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-openstack-4.2/69

Sep 01 12:00:39.296 I ns/openshift-kube-scheduler-operator deployment/openshift-kube-scheduler-operator Status for clusteroperator/kube-scheduler changed: Degraded message changed from "NodeControllerDegraded: All master node(s) are ready" to "StaticPodsDegraded: nodes/ci-op-wn1h3kbf-qkj68-master-0 pods/openshift-kube-scheduler-ci-op-wn1h3kbf-qkj68-master-0 container=\"scheduler\" is not ready\nStaticPodsDegraded: nodes/ci-op-wn1h3kbf-qkj68-master-0 pods/openshift-kube-scheduler-ci-op-wn1h3kbf-qkj68-master-0 container=\"scheduler\" is terminated: \"Error\" - \"configmaps\\\" in API group \\\"\\\" in the namespace \\\"openshift-kube-scheduler\\\"\\nE0901 12:00:10.828247 1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.ReplicationController: replicationcontrollers is forbidden: User \\\"system:kube-scheduler\\\" cannot list resource \\\"replicationcontrollers\\\" in API group \\\"\\\" at the cluster scope\\nE0901 12:00:10.942694 1 webhook.go:107] Failed to make webhook authenticator request: tokenreviews.authentication.k8s.io is forbidden: User \\\"system:kube-scheduler\\\" cannot create resource \\\"tokenreviews\\\" in API group \\\"authentication.k8s.io\\\" at the cluster scope\\nE0901 12:00:10.942755 1 authentication.go:65] Unable to authenticate the request due to an error: [invalid bearer token, tokenreviews.authentication.k8s.io is forbidden: User \\\"system:kube-scheduler\\\" cannot create resource \\\"tokenreviews\\\" in API group \\\"authentication.k8s.io\\\" at the cluster scope]\\nE0901 12:00:11.289880 1 event.go:247] Could not construct reference to: '&v1.ConfigMap{TypeMeta:v1.TypeMeta{Kind:\\\"\\\", APIVersion:\\\"\\\"}, ObjectMeta:v1.ObjectMeta{Name:\\\"\\\", GenerateName:\\\"\\\", Namespace:\\\"\\\", SelfLink:\\\"\\\", UID:\\\"\\\", ResourceVersion:\\\"\\\", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:\\\"\\\", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Data:map[string]string(nil), BinaryData:map[string][]uint8(nil)}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Normal' 'LeaderElection' 'ci-op-wn1h3kbf-qkj68-master-0_28ebfa3f-ccab-11e9-ba3b-fa163ecbf40b stopped leading'\\nI0901 12:00:11.290047 1 leaderelection.go:263] failed to renew lease openshift-kube-scheduler/kube-scheduler: timed out waiting for the condition\\nF0901 12:00:11.290075 1 server.go:247] leaderelection lost\\n\"\nNodeControllerDegraded: All master node(s) are ready"

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
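The Degraded message above is carried in the ClusterOperator's `status.conditions`. A minimal sketch of pulling it out of `oc get clusteroperator kube-scheduler -o json` output (the `SAMPLE` payload below is illustrative, not the actual data from this cluster):

```python
import json

# Hedged sketch, not operator code: extract the Degraded condition from a
# ClusterOperator JSON document. SAMPLE is an assumed, illustrative payload.
SAMPLE = json.dumps({
    "status": {
        "conditions": [
            {"type": "Available", "status": "True"},
            {"type": "Degraded", "status": "True",
             "message": 'StaticPodsDegraded: container "scheduler" is not ready'},
        ]
    }
})

def degraded_condition(raw):
    """Return the Degraded condition dict from a ClusterOperator JSON, or None."""
    conditions = json.loads(raw).get("status", {}).get("conditions", [])
    return next((c for c in conditions if c.get("type") == "Degraded"), None)

cond = degraded_condition(SAMPLE)
if cond and cond["status"] == "True":
    print(cond["message"])
```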
I went through the logs and I don't see any problem with the scheduler; the operator is working as expected and the scheduler is running properly. If there is a problem, it looks like an issue with node availability, which might in turn be a problem with the OpenStack infrastructure. I'm closing this. If you think the problem still exists, please direct the bug at the specific component that is failing, not at a component whose logs happen to mention the failure.
The root cause is that the MCO has not finished the upgrade, so the kube-apiserver is not ready (degraded), which in turn causes the kube-scheduler to fail as well. I'll pass this over to the MCO team for investigation.
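That failure chain is visible in the quoted log: first RBAC/authentication errors against the apiserver, then the scheduler losing its leader-election lease and exiting. A hypothetical triage helper (an assumption for illustration, not an official tool) that buckets such log lines:

```python
import re

# Assumed patterns based on the log quoted in this report, not an official format.
PATTERNS = {
    "rbac_forbidden": re.compile(r'User "[^"]*" cannot (?:list|create) resource'),
    "auth_failure": re.compile(r"Unable to authenticate the request"),
    "leader_election": re.compile(r"failed to renew lease|leaderelection lost"),
}

def classify(line):
    """Return the names of all failure categories matching a log line."""
    return [name for name, pat in PATTERNS.items() if pat.search(line)]

lines = [
    'E0901 ... is forbidden: User "system:kube-scheduler" cannot list resource "replicationcontrollers"',
    "I0901 ... failed to renew lease openshift-kube-scheduler/kube-scheduler: timed out waiting for the condition",
]
for line in lines:
    print(classify(line))
```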
The SDN container seems to be crash looping. I'm moving this over to the networking team, although since this bug is somewhat old it would be good to check whether this is still an issue.
I see the issue; it seems to be slow SDN startup time combined with a poorly written liveness check on one of the nodes. Maybe that node is just slow or had other connectivity issues. We fixed that in bug 1761609. I see that CI has been reasonably green (though the release jobs are a trainwreck, not because of this problem), so I think this is fixed. *** This bug has been marked as a duplicate of bug 1761609 ***
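For context on how a liveness check interacts with slow startup: a probe with a short initial delay and low failure threshold restarts a slow-starting pod before it ever becomes healthy, producing exactly this kind of crash loop. The fragment below is illustrative only; the probe command and values are assumptions, not the actual change made in bug 1761609:

```yaml
# Illustrative only: a liveness probe tolerant of slow startup.
livenessProbe:
  exec:
    command: ["test", "-f", "/etc/cni/net.d/80-openshift-network.conf"]
  initialDelaySeconds: 120   # assumed: generous delay for slow nodes
  periodSeconds: 10
  failureThreshold: 6        # several consecutive misses before a restart
```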