Bug 1881963

Summary: [release 4.6] openshift-state-metrics: Fix bug in reflector not recovering from "Too large resource version"
Product: OpenShift Container Platform Reporter: Junqi Zhao <juzhao>
Component: Monitoring Assignee: Simon Pasquier <spasquie>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: low Docs Contact:
Priority: low    
Version: 4.6 CC: alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, spasquie, surbania
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1882448 (view as bug list) Environment:
Last Closed: 2020-10-27 16:44:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1880369    
Bug Blocks: 1882448    
Attachments:
Description Flags
openshift-state-metrics container logs none

Description Junqi Zhao 2020-09-23 13:50:31 UTC
Created attachment 1716055 [details]
openshift-state-metrics container logs

Description of problem:
# oc -n openshift-monitoring logs $(oc -n openshift-monitoring get po | grep openshift-state-metrics | awk '{print $1}') -c openshift-state-metrics | grep "Too large resource version" | tail -n 3
E0922 04:58:46.887235       1 reflector.go:178] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to list *v1.Build: Timeout: Too large resource version: 47360, current: 43789
E0922 05:19:35.734025       1 reflector.go:178] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to list *v1.DeploymentConfig: Timeout: Too large resource version: 128797, current: 113605
E0922 05:20:19.653911       1 reflector.go:178] github.com/openshift/openshift-state-metrics/pkg/collectors/builder.go:228: Failed to list *v1.DeploymentConfig: Timeout: Too large resource version: 128797, current: 113605

For the full logs, see the attached file.
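
To gauge how often each resource type hits this error and how far the requested resource version runs ahead of the server's current one, the log lines can be summarized with a small filter. This is only a sketch: the function name summarize_rv_errors is hypothetical, and it assumes the error lines keep the exact format shown in the excerpt above.

```shell
# Hypothetical helper (not part of openshift-state-metrics or oc).
# Reads container log lines on stdin and prints, per resource kind, the number
# of "Too large resource version" errors and the largest gap between the
# requested and the current resource version.
summarize_rv_errors() {
  # Keep only matching lines, reduced to: <kind> <requested> <current>
  sed -n 's/.*Failed to list \*\([^:]*\): Timeout: Too large resource version: \([0-9][0-9]*\), current: \([0-9][0-9]*\).*/\1 \2 \3/p' |
    awk '{
      count[$1]++
      gap = $2 - $3
      if (gap > max[$1]) max[$1] = gap
    }
    END {
      for (kind in count)
        printf "%s count=%d max_gap=%d\n", kind, count[kind], max[kind]
    }' |
    sort
}
```

It can be fed from the same command used above, for example `oc -n openshift-monitoring logs <pod> -c openshift-state-metrics | summarize_rv_errors`. For the three lines quoted in this description it reports one v1.Build error with a gap of 3571 and two v1.DeploymentConfig errors with a gap of 15192.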
Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-09-21-230455
# /usr/bin/openshift-state-metrics --version
version.Version{GitCommit:"1ba41a4", BuildDate:"2020-09-19T21:43:31Z", Release:"v4.6.0-202009192030.p0", GoVersion:"go1.14.7", Compiler:"gc", Platform:"linux/amd64"}


How reproducible:
Not often. The error was last seen at E0922 05:20:19.653911 and has not recurred since.

Steps to Reproduce:
1. oc -n openshift-monitoring logs $(oc -n openshift-monitoring get po | grep openshift-state-metrics | awk '{print $1}') -c openshift-state-metrics

Actual results:


Expected results:


Additional info:

Comment 1 Lili Cosic 2020-09-23 13:52:46 UTC
Wonder if it's related to the client-go bug? https://bugzilla.redhat.com/show_bug.cgi?id=1881079

Comment 2 Simon Pasquier 2020-09-23 14:39:29 UTC
(In reply to Lili Cosic from comment #1)
> Wonder if it's related to the client-go bug?
> https://bugzilla.redhat.com/show_bug.cgi?id=1881079

I'd say so.

Comment 7 errata-xmlrpc 2020-10-27 16:44:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196