Bug 1947740

Summary: [single-node] "Failed to watch" errors in openshift-state-metrics container
Product: OpenShift Container Platform
Reporter: Junqi Zhao <juzhao>
Component: Monitoring
Assignee: Prashant Balachandran <pnair>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: low
Docs Contact:
Priority: low
Version: 4.8
CC: anpicker, dgrisonn, erooth, hongyli, lcosic, spasquie
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: No Doc Update
Last Closed: 2021-10-18 17:29:50 UTC
Type: Bug
Attachments:
  monitoring dump file (flags: none)

Description Junqi Zhao 2021-04-09 06:06:01 UTC
Created attachment 1770495: monitoring dump file

Description of problem:
This error happens only in single-node clusters; multi-node clusters are not affected. The error does not affect cluster functionality.
# oc -n openshift-monitoring logs -c openshift-state-metrics openshift-state-metrics-655c67f78b-vxcgm
...
E0409 01:18:01.441471       1 reflector.go:127] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.Group: failed to list *v1.Group: the server is currently unable to handle the request (get groups.user.openshift.io)
E0409 01:18:06.243515       1 reflector.go:127] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.DeploymentConfig: failed to list *v1.DeploymentConfig: the server is currently unable to handle the request (get deploymentconfigs.apps.openshift.io)
E0409 01:18:10.666784       1 reflector.go:127] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.BuildConfig: failed to list *v1.BuildConfig: the server is currently unable to handle the request (get buildconfigs.build.openshift.io)
E0409 01:18:29.303632       1 reflector.go:127] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.Route: failed to list *v1.Route: the server is currently unable to handle the request (get routes.route.openshift.io)
E0409 01:18:33.649381       1 reflector.go:127] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.Build: failed to list *v1.Build: the server is currently unable to handle the request (get builds.build.openshift.io)
E0409 01:18:47.100638       1 reflector.go:127] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.DeploymentConfig: failed to list *v1.DeploymentConfig: the server is currently unable to handle the request (get deploymentconfigs.apps.openshift.io)
...

# oc get routes.route.openshift.io -A 
NAMESPACE                  NAME                HOST/PORT                                                                              PATH   SERVICES            PORT    TERMINATION            WILDCARD
openshift-authentication   oauth-openshift     oauth-openshift.apps.***.qe.devcluster.openshift.com                                 oauth-openshift     6443    passthrough/Redirect   None
openshift-console          console             console-openshift-console.apps.***.qe.devcluster.openshift.com                       console             https   reencrypt/Redirect     None
openshift-console          downloads           downloads-openshift-console.apps.***.qe.devcluster.openshift.com                     downloads           http    edge/Redirect          None
...
# oc get buildconfigs.build.openshift.io -A
No resources found
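
To see which resource types are affected and how often, the reflector errors above can be tallied per kind. The following is a minimal Python sketch, not part of openshift-state-metrics; the function name count_failed_watches and the regular expression are assumptions based only on the log format shown in this report.

```python
import re
from collections import Counter

# Matches the resource kind in a reflector error such as
# "... Failed to watch *v1.Group: failed to list *v1.Group: ..."
# (pattern derived from the log lines in this report; an assumption).
FAILED_WATCH = re.compile(r"Failed to watch \*(?P<kind>[\w.]+):")


def count_failed_watches(lines):
    """Return a Counter mapping resource kind (e.g. 'v1.Route') to error count."""
    counts = Counter()
    for line in lines:
        match = FAILED_WATCH.search(line)
        if match:
            counts[match.group("kind")] += 1
    return counts


# Two lines copied verbatim from the log excerpt in this report.
SAMPLE = [
    "E0409 01:18:01.441471       1 reflector.go:127] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.Group: failed to list *v1.Group: the server is currently unable to handle the request (get groups.user.openshift.io)",
    "E0409 01:18:06.243515       1 reflector.go:127] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.DeploymentConfig: failed to list *v1.DeploymentConfig: the server is currently unable to handle the request (get deploymentconfigs.apps.openshift.io)",
]

# Print the tally, most frequent kind first.
for kind, n in count_failed_watches(SAMPLE).most_common():
    print(f"{n:4d}  {kind}")
```

In practice the input would be the saved output of the oc logs command shown above, read line by line from a file.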

Version-Release number of selected component (if applicable):
# oc version
Client Version: 4.8.0-0.nightly-2021-04-08-200632
Server Version: 4.8.0-0.nightly-2021-04-08-200632
Kubernetes Version: v1.21.0-rc.0+6d27558


How reproducible:
Happens only in single-node clusters.

Steps to Reproduce:
1. See the description.

Actual results:
"Failed to watch" errors in openshift-state-metrics container

Expected results:
no error

Additional info:

Comment 2 Damien Grisonnet 2021-04-28 11:42:26 UTC
This seems to be a side-effect of bug 1948311.

Comment 3 Prashant Balachandran 2021-06-08 12:11:11 UTC
This happens only during cluster startup and settles down later. I am checking whether it can be reduced by using a backoff.
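
For readers unfamiliar with the idea, a backoff spaces out retries so that a failing list/watch call is not repeated every few seconds while the API server is still starting. The following is a generic Python sketch of capped exponential backoff. It is illustrative only and is not the change that was made for this bug; retry_with_backoff and all of its parameters are hypothetical names.

```python
import time


def retry_with_backoff(operation, max_attempts=6, base=1.0, factor=2.0,
                       cap=30.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The wait between attempts starts at `base` seconds, is multiplied by
    `factor` after each failure, and never exceeds `cap` seconds.
    `sleep` is injectable so the delays can be observed without waiting.
    """
    delay = base
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay = min(delay * factor, cap)


# Demo: an operation that fails three times, as an API that is not
# yet available might, and then succeeds. Delays are recorded, not slept.
attempts = {"n": 0}


def flaky_list():
    attempts["n"] += 1
    if attempts["n"] <= 3:
        raise ConnectionError("the server is currently unable to handle the request")
    return "synced"


recorded = []
result = retry_with_backoff(flaky_list, sleep=recorded.append)
print(result, recorded)
```

With a growing delay, the same startup window produces fewer retries and therefore fewer logged errors, at the cost of a slightly later first successful sync.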

Comment 5 Junqi Zhao 2021-08-25 06:25:53 UTC
Checked with 4.9.0-0.nightly-2021-08-24-203710. The errors appear during cluster startup and stop after a while. Some of the resources still do not exist in the cluster after a few hours; the errors for those can be ignored, they are expected. The errors for the other resources appear only during startup and not afterwards.
# date -u
Wed Aug 25 06:18:02 UTC 2021

# oc -n openshift-monitoring logs $(oc -n openshift-monitoring get po | grep openshift-state-metrics | awk '{print $1}') -c openshift-state-metrics  | tail
E0825 04:34:13.931052       1 reflector.go:138] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.BuildConfig: failed to list *v1.BuildConfig: the server is currently unable to handle the request (get buildconfigs.build.openshift.io)
E0825 04:34:18.754009       1 reflector.go:138] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.BuildConfig: failed to list *v1.BuildConfig: the server is currently unable to handle the request (get buildconfigs.build.openshift.io)
E0825 04:34:31.089581       1 reflector.go:138] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.BuildConfig: failed to list *v1.BuildConfig: the server is currently unable to handle the request (get buildconfigs.build.openshift.io)
E0825 04:34:44.931323       1 reflector.go:138] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.Build: failed to list *v1.Build: Get "https://172.30.0.1:443/apis/build.openshift.io/v1/builds?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: connect: connection refused
E0825 04:34:52.740182       1 reflector.go:138] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.BuildConfig: failed to list *v1.BuildConfig: Get "https://172.30.0.1:443/apis/build.openshift.io/v1/buildconfigs?resourceVersion=14328": dial tcp 172.30.0.1:443: connect: connection refused
E0825 04:35:06.285500       1 reflector.go:138] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.DeploymentConfig: failed to list *v1.DeploymentConfig: Get "https://172.30.0.1:443/apis/apps.openshift.io/v1/deploymentconfigs?resourceVersion=12779": dial tcp 172.30.0.1:443: connect: connection refused
E0825 04:35:12.384875       1 reflector.go:138] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.Route: failed to list *v1.Route: the server is currently unable to handle the request (get routes.route.openshift.io)
E0825 04:35:34.878037       1 reflector.go:138] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.BuildConfig: failed to list *v1.BuildConfig: the server is currently unable to handle the request (get buildconfigs.build.openshift.io)
E0825 04:35:36.659014       1 reflector.go:138] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.Build: failed to list *v1.Build: the server is currently unable to handle the request (get builds.build.openshift.io)
E0825 04:35:43.470800       1 reflector.go:138] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to watch *v1.DeploymentConfig: failed to list *v1.DeploymentConfig: the server is currently unable to handle the request (get deploymentconfigs.apps.openshift.io)

# oc get clusterversion  version -oyaml
...
metadata:
  creationTimestamp: "2021-08-25T04:27:11Z"
...
# oc get deploymentconfigs.apps.openshift.io -A
No resources found
# oc get builds.build.openshift.io  -A
No resources found
# oc get routes.route.openshift.io -A
NAMESPACE                  NAME                HOST/PORT                                                                                           PATH   SERVICES            PORT    TERMINATION            WILDCARD
openshift-authentication   oauth-openshift     oauth-openshift.apps.ci-ln-hs7k7q2-d5d6b.origin-ci-int-aws.dev.rhcloud.com                                 oauth-openshift     6443    passthrough/Redirect   None
...
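
The claim that the errors are confined to startup can be checked by comparing a log timestamp with the cluster creationTimestamp above. The following is a small Python sketch under two assumptions: the log header has the form Lmmdd hh:mm:ss.uuuuuu with no year (as in the lines above), and the log timestamps are in UTC. The function names are hypothetical.

```python
from datetime import datetime, timezone


def klog_time(line, year):
    """Parse the 'Lmmdd hh:mm:ss.uuuuuu' header at the start of a log line.

    The header carries no year, so the caller supplies one; UTC is assumed.
    """
    stamp = line[1:21]  # 'mmdd hh:mm:ss.uuuuuu', skipping the severity letter
    parsed = datetime.strptime(f"{year}{stamp}", "%Y%m%d %H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def seconds_since_creation(line, creation_timestamp):
    """Seconds between the cluster creationTimestamp and a log line."""
    created = datetime.strptime(
        creation_timestamp, "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return (klog_time(line, created.year) - created).total_seconds()


# Header of the last log line in the tail above, and the creationTimestamp
# from the clusterversion output.
last_line = "E0825 04:35:43.470800       1 reflector.go:138]"
elapsed = seconds_since_creation(last_line, "2021-08-25T04:27:11Z")
print(f"{elapsed:.1f} s after cluster creation")  # 512.5 s, about 8.5 minutes
```

For comparison, the date -u output above shows the check was run at 06:18:02 UTC, more than an hour and a half after that last logged error.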

Comment 12 errata-xmlrpc 2021-10-18 17:29:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759