Bug 1769931
| Summary: | Kubelet logging FATAL level leader election lost messages on master every 60-90 seconds. New: system:kube-scheduler failure to list configmaps | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Mike Fiedler <mifiedle> |
| Component: | kube-scheduler | Assignee: | Maciej Szulik <maszulik> |
| Status: | CLOSED DUPLICATE | QA Contact: | ge liu <geliu> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.3.0 | CC: | aos-bugs, deads, jokerman, mfojtik, mpatel, sttts, xxia |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 4.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-12-11 13:25:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Let me know if the logs requested in comment 2 are not in the must-gather.

Possible fix: https://github.com/openshift/library-go/pull/586

Still an issue on 4.3.0-0.nightly-2019-12-03-094421. Now includes an additional F-level message for the system:kube-scheduler failure to list configmaps. Let me know if you want a new bz.
Dec 03 21:05:25 ip-10-0-131-128 hyperkube[2536]: F1203 21:05:02.592648 1 server.go:264] leaderelection lost
Dec 03 21:05:35 ip-10-0-131-128 hyperkube[2536]: - s"&PodStatus{Phase:Running,Conditions:[]PodCondition{PodCondition{Type:Initialized,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2019-12-03 21:03:56 +0000 UTC,Reason:,Message:,},PodCondition{Type:Ready,Status:False,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2019-12-03 21:05:15 +0000 UTC,Reason:ContainersNotReady,Message:containers with unready status: [scheduler],},PodCondition{Type:ContainersReady,Status:False,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2019-12-03 21:05:15 +0000 UTC,Reason:ContainersNotReady,Message:containers with unready status: [scheduler],},PodCondition{Type:PodScheduled,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2019-12-03 20:58:24 +0000 UTC,Reason:,Message:,},},Message:,Reason:,HostIP:10.0.131.128,PodIP:10.0.131.128,StartTime:2019-12-03 20:58:24 +0000 UTC,ContainerStatuses:[]ContainerStatus{ContainerStatus{Name:scheduler,State:ContainerState{Waiting:nil,Running:&ContainerStateRunning{StartedAt:2019-12-03 21:05:16 +0000 UTC,},Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:&ContainerStateTerminated{ExitCode:255,Signal:0,Reason:Error,Message:m\"\nE1203 21:04:57.705581 1 reflector.go:123] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:209: Failed to list *v1.ConfigMap: configmaps \"extension-apiserver-authentication\" is forbidden: User \"system:kube-scheduler\" cannot list resource \"configmaps\" in API group \"\" in the namespace \"kube-system\"\nE1203 21:04:58.706954 1 reflector.go:123] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:209: Failed to list *v1.ConfigMap: configmaps \"extension-apiserver-authentication\" is forbidden: User \"system:kube-scheduler\" cannot list resource \"configmaps\" in API group \"\" in the namespace \"kube-system\"\nE1203 21:04:59.708420 1 reflector.go:123] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:209: Failed to list *v1.ConfigMap: configmaps \"extension-apiserver-authentication\" is forbidden: User \"system:kube-scheduler\" cannot list resource \"configmaps\" in API group \"\" in the namespace \"kube-system\"\nE1203 21:04:59.800263 1 leaderelection.go:330] error retrieving resource lock openshift-kube-scheduler/kube-scheduler: etcdserver: request timed out\nE1203 21:05:00.709747 1 reflector.go:123] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:209: Failed to list *v1.ConfigMap: configmaps \"extension-apiserver-authentication\" is forbidden: User \"system:kube-scheduler\" cannot list resource \"configmaps\" in API group \"\" in the namespace \"kube-system\"\nE1203 21:05:01.711156 1 reflector.go:123] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:209: Failed to list *v1.ConfigMap: configmaps \"extension-apiserver-authentication\" is forbidden: User \"system:kube-scheduler\" cannot list resource \"configmaps\" in API group \"\" in the namespace \"kube-system\"\nI1203 21:05:02.592623 1 leaderelection.go:287] failed to renew lease openshift-kube-scheduler/kube-scheduler: failed to tryAcquireOrRenew context deadline exceeded\nF1203 21:05:02.592648 1 server.go:264] leaderelection lost\n,StartedAt:2019-12-03 21:03:56 +0000 UTC,FinishedAt:2019-12-03 21:05:14 +0000 
UTC,ContainerID:cri-o://2c8d7761b920953f619848f0f12d542ff230c7ea4c3c7a0f350c61333089454e,},},Ready:false,RestartCount:1,Image:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cf2e48a865a9754f6340600cb9c6b7a015f2eaf364d105ac70825455d296136e,ImageID:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cf2e48a865a9754f6340600cb9c6b7a015f2eaf364d105ac70825455d296136e,ContainerID:cri-o://ce17338d014d46786cb9e8167c9a9dc96cf1515a6cb992aa158bed5b1bdecf6a,Started:*true,},},QOSClass:Burstable,InitContainerStatuses:[]ContainerStatus{ContainerStatus{Name:wait-for-host-port,State:ContainerState{Waiting:nil,Running:nil,Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2019-12-03 21:02:59 +0000 UTC,FinishedAt:2019-12-03 21:03:00 +0000 UTC,ContainerID:cri-o://cb06f359720c302cf32c4fe08d1e3ea1ba36e35164211adbb0a4d445fbd47cf9,},},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:true,RestartCount:0,Image:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cf2e48a865a9754f6340600cb9c6b7a015f2eaf364d105ac70825455d296136e,ImageID:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cf2e48a865a9754f6340600cb9c6b7a015f2eaf364d105ac70825455d296136e,ContainerID:cri-o://cb06f359720c302cf32c4fe08d1e3ea1ba36e35164211adbb0a4d445fbd47cf9,Started:nil,},},NominatedNodeName:,PodIPs:[]PodIP{PodIP{IP:10.0.131.128,},},EphemeralContainerStatuses:[]ContainerStatus{},}",
Dec 03 21:05:35 ip-10-0-131-128 hyperkube[2536]: + Terminated: s"&ContainerStateTerminated{ExitCode:255,Signal:0,Reason:Error,Message:m\"\nE1203 21:04:57.705581 1 reflector.go:123] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:209: Failed to list *v1.ConfigMap: configmaps \"extension-apiserver-authentication\" is forbidden: User \"system:kube-scheduler\" cannot list resource \"configmaps\" in API group \"\" in the namespace \"kube-system\"\nE1203 21:04:58.706954 1 reflector.go:123] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:209: Failed to list *v1.ConfigMap: configmaps \"extension-apiserver-authentication\" is forbidden: User \"system:kube-scheduler\" cannot list resource \"configmaps\" in API group \"\" in the namespace \"kube-system\"\nE1203 21:04:59.708420 1 reflector.go:123] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:209: Failed to list *v1.ConfigMap: configmaps \"extension-apiserver-authentication\" is forbidden: User \"system:kube-scheduler\" cannot list resource \"configmaps\" in API group \"\" in the namespace \"kube-system\"\nE1203 21:04:59.800263 1 leaderelection.go:330] error retrieving resource lock openshift-kube-scheduler/kube-scheduler: etcdserver: request timed out\nE1203 21:05:00.709747 1 reflector.go:123] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:209: Failed to list *v1.ConfigMap: configmaps \"extension-apiserver-authentication\" is forbidden: User \"system:kube-scheduler\" cannot list resource \"configmaps\" in API group \"\" in the namespace \"kube-system\"\nE1203 21:05:01.711156 1 reflector.go:123] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:209: Failed to list *v1.ConfigMap: configmaps \"extension-apiserver-authentication\" is forbidden: User \"system:kube-scheduler\" cannot list resource \"configmaps\" in API group \"\" in the namespace \"kube-system\"\nI1203 21:05:02.592623 1 leaderelection.go:287] failed to renew lease openshift-kube-scheduler/kube-scheduler: failed to tryAcquireOrRenew context deadline exceeded\nF1203 21:05:02.592648 1 server.go:264] leaderelection lost\n,StartedAt:2019-12-03 21:03:56 +0000 UTC,FinishedAt:2019-12-03 21:05:14 +0000 UTC,ContainerID:cri-o://2c8d7761b920953f619848f0f12d542ff230c7ea4c3c7a0f350c61333089454e,}",
Dec 03 21:05:35 ip-10-0-131-128 hyperkube[2536]: F1203 21:05:02.592648 1 server.go:264] leaderelection lost
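For context on where the F-level line above comes from: leader election in these components is driven by client-go, and the common pattern is that the OnStoppedLeading callback logs at Fatal level so the process exits and gets restarted to re-campaign. Below is a minimal sketch of that generic client-go pattern, assuming a Lease-based lock; the lock type, names, and timings are illustrative and not the actual openshift-kube-scheduler wiring.

```go
// Minimal sketch of the client-go leader-election pattern that produces a
// fatal "leaderelection lost" line. Names (lease namespace/name, identity)
// are illustrative; this is not the actual kube-scheduler code.
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	hostname, _ := os.Hostname()

	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Namespace: "openshift-kube-scheduler", Name: "kube-scheduler"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: hostname},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second, // renewal failures (e.g. etcd timeouts) trip this
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Real work runs here until the context is cancelled.
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				// This is the kind of call that shows up as the F-level line in the
				// journal; the process exits and its pod manager restarts it.
				klog.Fatalf("leaderelection lost")
			},
		},
	})
}
```

Renewal failures such as the etcdserver request timeouts in the log above push the renewal loop past RenewDeadline, which is what triggers OnStoppedLeading and hence the fatal line.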
The code for operators and some non-critical controller managers simply suicides and restarts. The fatal message is cosmetic in such cases. Recommend reducing severity and deferring.

IMO we should not have Fatal cosmetic messages. Info, Warning? Debug? Re-opening just to confirm we want Fatal. (A hypothetical non-fatal variant is sketched after these comments.)

Example for comment 11: I search Elasticsearch for Fxxxx messages in the OCP ops logs as an indication of failures. In previous releases, this has been reliable.

I assume it is clear that the Fatal is not going to be changed; as for the scheduler part, https://bugzilla.redhat.com/show_bug.cgi?id=1778072 exists about the configmaps problem.

*** This bug has been marked as a duplicate of bug 1778072 ***
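To make the "should this really be Fatal?" suggestion concrete, here is a hypothetical alternative callback. It is a sketch only and is not what the scheduler or the linked PR actually does: it logs the loss at Warning and exits cleanly, so the restart-and-re-campaign behaviour is preserved without an F-level line landing in the journal.

```go
package main // hypothetical sketch, not the actual scheduler code

import (
	"os"

	"k8s.io/klog"
)

// onStoppedLeadingNonFatal is a hypothetical replacement for the klog.Fatalf
// callback in the previous sketch: it logs at Warning and exits cleanly, so
// the kubelet still restarts the container, but no F-level "leaderelection
// lost" message is emitted for what is effectively routine self-termination.
func onStoppedLeadingNonFatal() {
	klog.Warning("leader election lost; exiting so the process can restart and re-campaign")
	os.Exit(0)
}
```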
Description of problem:

journalctl -u kubelet | grep "F1107" (on 7-November) on a master node shows fatal-level messages being logged every 60-90 seconds:

Nov 07 16:05:03 mffied-mzgsv-m-0.c.openshift-qe.internal hyperkube[1870]: F1107 14:35:11.393205 1 leaderelection.go:66] leaderelection lost
Nov 07 16:06:14 mffied-mzgsv-m-0.c.openshift-qe.internal hyperkube[1870]: F1107 14:35:11.393205 1 leaderelection.go:66] leaderelection lost
Nov 07 16:07:26 mffied-mzgsv-m-0.c.openshift-qe.internal hyperkube[1870]: F1107 14:35:11.393205 1 leaderelection.go:66] leaderelection lost
Nov 07 16:08:30 mffied-mzgsv-m-0.c.openshift-qe.internal hyperkube[1870]: F1107 14:35:11.393205 1 leaderelection.go:66] leaderelection lost
Nov 07 16:09:59 mffied-mzgsv-m-0.c.openshift-qe.internal hyperkube[1870]: F1107 14:35:11.393205 1 leaderelection.go:66] leaderelection lost
Nov 07 16:11:01 mffied-mzgsv-m-0.c.openshift-qe.internal hyperkube[1870]: F1107 14:35:11.393205 1 leaderelection.go:66] leaderelection lost
Nov 07 16:12:17 mffied-mzgsv-m-0.c.openshift-qe.internal hyperkube[1870]: F1107 14:35:11.393205 1 leaderelection.go:66] leaderelection lost
Nov 07 16:13:38 mffied-mzgsv-m-0.c.openshift-qe.internal hyperkube[1870]: F1107 14:35:11.393205 1 leaderelection.go:66] leaderelection lost
Nov 07 16:14:58 mffied-mzgsv-m-0.c.openshift-qe.internal hyperkube[1870]: F1107 14:35:11.393205 1 leaderelection.go:66] leaderelection lost
Nov 07 16:16:13 mffied-mzgsv-m-0.c.openshift-qe.internal hyperkube[1870]: F1107 14:35:11.393205 1 leaderelection.go:66] leaderelection lost
Nov 07 16:17:30 mffied-mzgsv-m-0.c.openshift-qe.internal hyperkube[1870]: F1107 14:35:11.393205 1 leaderelection.go:66] leaderelection lost
Nov 07 16:18:56 mffied-mzgsv-m-0.c.openshift-qe.internal hyperkube[1870]: F1107 14:35:11.393205 1 leaderelection.go:66] leaderelection lost

The cluster is nominally OK.

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2019-11-07-062654

How reproducible:
Always

Steps to Reproduce:
1. Install a cluster with 4.3.0-0.nightly-2019-11-07-062654
2. Let it run for a while
3. oc adm node-logs <master-node> | grep "F<date>", e.g. grep F1107 on 7-Nov (a rough interval check is sketched after this report)

Actual results:
Fatal errors reported every 60-90 seconds

Expected results:
No fatal errors in a healthy cluster

Additional info:
Will provide location of must-gather and node logs in follow-up comment
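The 60-90 second cadence from step 3 can be eyeballed in the grep output, or measured by piping the node logs through a small filter like the one below. This is a hypothetical helper, not part of oc or any OpenShift tooling, and it assumes the zero-padded journal timestamp format shown in the log lines above.

```go
// fgap: hypothetical helper that reads journal lines on stdin and prints the
// interval between consecutive F-level messages.
//
// Example usage (assumed): journalctl -u kubelet | go run fgap.go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"time"
)

func main() {
	// Matches the journal prefix of lines such as:
	//   "Nov 07 16:05:03 host hyperkube[1870]: F1107 14:35:11.393205 ..."
	// The capture group is the journal timestamp.
	re := regexp.MustCompile(`^([A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}) .*: F\d{4} `)

	var prev time.Time
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 1024*1024), 1024*1024) // some journal lines are very long
	for scanner.Scan() {
		m := re.FindStringSubmatch(scanner.Text())
		if m == nil {
			continue
		}
		// Journal timestamps carry no year, which is fine for measuring gaps.
		ts, err := time.Parse("Jan 02 15:04:05", m[1])
		if err != nil {
			continue
		}
		if !prev.IsZero() {
			fmt.Printf("%s  (+%v since previous F-level line)\n", m[1], ts.Sub(prev))
		} else {
			fmt.Println(m[1])
		}
		prev = ts
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
	}
}
```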