Bug 1810290

Summary: Openshift-state-metrics logs say "the server is currently unable to handle the request"
Product: OpenShift Container Platform
Component: Etcd
Version: 4.3.z
Target Release: 4.5.0
Target Milestone: ---
Hardware: Unspecified
OS: Unspecified
Severity: medium
Priority: medium
Status: CLOSED INSUFFICIENT_DATA
Type: Bug
Regression: ---
Reporter: Lili Cosic <lcosic>
Assignee: Sam Batschelet <sbatsche>
QA Contact: ge liu <geliu>
CC: alegrand, anpicker, aos-bugs, erooth, kakkoyun, lcosic, mfojtik, mloibl, pkrupa, sbatsche, skolicha, surbania
Last Closed: 2020-05-14 13:22:24 UTC

Description Lili Cosic 2020-03-04 21:40:38 UTC
Description of problem:
In the openshift-state-metrics logs we see the following in our long-lived cluster:
E0225 00:17:47.240447       1 reflector.go:125] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to list *v1.Group: the server is currently unable to handle the request (get groups.user.openshift.io)
E0225 00:17:50.313720       1 reflector.go:125] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to list *v1.Group: the server is currently unable to handle the request (get groups.user.openshift.io)
E0225 00:21:40.713378       1 reflector.go:283] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.Route: the server is currently unable to handle the request (get routes.route.openshift.io)
E0225 00:21:43.784759       1 reflector.go:283] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.Route: the server is currently unable to handle the request (get routes.route.openshift.io)
E0225 00:21:46.859237       1 reflector.go:125] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to list *v1.Route: the server is currently unable to handle the request (get routes.route.openshift.io)
E0225 00:26:57.130028       1 reflector.go:283] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.BuildConfig: the server is currently unable to handle the request (get buildconfigs.build.openshift.io)
E0225 00:27:00.200765       1 reflector.go:283] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.BuildConfig: the server is currently unable to handle the request (get buildconfigs.build.openshift.io)
E0225 00:27:03.272599       1 reflector.go:125] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to list *v1.BuildConfig: the server is currently unable to handle the request (get buildconfigs.build.openshift.io)
E0225 00:27:06.347446       1 reflector.go:125] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to list *v1.BuildConfig: the server is currently unable to handle the request (get buildconfigs.build.openshift.io)
E0225 00:27:09.416454       1 reflector.go:283] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.BuildConfig: the server is currently unable to handle the request (get buildconfigs.build.openshift.io)
E0225 00:27:12.488150       1 reflector.go:125] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to list *v1.BuildConfig: the server is currently unable to handle the request (get buildconfigs.build.openshift.io)
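
These errors look like the 503 the kube-apiserver typically returns when an aggregated APIService is unavailable; all of the affected resources here (groups, routes, buildconfigs) are served by the openshift-apiserver. As a rough check while the errors are occurring, something like the following could be run (the APIService and cluster operator names below assume a standard 4.x cluster):

$ oc get apiservice v1.user.openshift.io v1.route.openshift.io v1.build.openshift.io
$ oc get clusteroperator openshift-apiserver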

Version-Release number of selected component (if applicable):
4.3.1

Steps to Reproduce:
1. Log in to our long-lived cluster
2. Check the openshift-state-metrics pod logs (for example, with the command below)
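
For reference, one way to pull these logs (the openshift-monitoring namespace and container name are assumed from a default 4.3 monitoring stack; adjust if the deployment differs):

$ oc -n openshift-monitoring logs deployment/openshift-state-metrics -c openshift-state-metrics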

Comment 7 Michal Fojtik 2020-05-12 10:59:42 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

As such, we're marking this bug as "LifecycleStale" and decreasing the severity. 

If you have further information on the current state of the bug, please update it, otherwise this bug will be automatically closed in 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

Comment 8 Sam Batschelet 2020-05-14 13:00:43 UTC
We need more logs to debug this. The etcd logs above point to etcd-0 being down.

> 2020-02-23 02:16:25.231688 W | etcdserver: failed to reach the peerURL(https://etcd-0.fbranczy-llc.observatorium.io:2380) of member 39d1db31380f7845 (Get https://etcd-0.fbranczy-llc.observato

The other terminated containers exited 0, so this looks like a possible disruption from an upgrade?
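
If this shows up again, output along these lines would give us more to go on (namespace and pod names assume a default 4.x layout; <pod> is a placeholder for the actual openshift-state-metrics pod):

$ oc -n openshift-etcd get pods
$ oc -n openshift-monitoring logs <pod> -c openshift-state-metrics --previous
$ oc adm must-gather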

Lili, do you have any more problems with this?