Bug 1951815

Summary: Reduce number of kubelet WATCH requests
Product: OpenShift Container Platform
Component: Node
Sub component: Kubelet
Version: 4.6
Target Milestone: ---
Target Release: 4.7.z
Hardware: Unspecified
OS: Unspecified
Severity: urgent
Priority: urgent
Status: CLOSED ERRATA
Reporter: Evan Cordell <ecordell>
Assignee: Elana Hashman <ehashman>
QA Contact: Sunil Choudhary <schoudha>
CC: akashem, andbartl, aos-bugs, bjarolim, dahernan, dgautam, ecordell, jiazha, krizza, nhale, openshift-bugs-escalate, pducai, skolicha
Doc Type: Bug Fix
Doc Text:
Cause: The kubelet can sometimes open a large number of WATCH requests for secrets and configmaps, particularly on node reboot.
Consequence: The API servers may be overwhelmed under load.
Fix: Reduce the number of kubelet WATCH requests.
Result: Load on the API servers is reduced.
Story Points: ---
Clone Of: 1943704
Last Closed: 2021-05-19 15:16:26 UTC
Bug Depends On: 1939734    
Bug Blocks: 1960002    
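
Note (illustrative only, not from the original report): the API-server-side WATCH load described in the Doc Text can be spot-checked with the standard apiserver_request_total counter, for example:

$ # illustrative only -- cumulative WATCH counts for secrets/configmaps on whichever API server answers
$ oc get --raw /metrics \
    | grep '^apiserver_request_total' \
    | grep 'verb="WATCH"' \
    | grep -E 'resource="(secrets|configmaps)"'

The counters are cumulative since that API server instance started, so a sharp jump right after a node reboot is the pattern this fix is meant to reduce.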

Comment 1 Elana Hashman 2021-04-20 22:11:54 UTC
Already patched in 4.8.0: https://github.com/openshift/kubernetes/commit/57a3b0abd678c66a9a04e553b6d6ae49671a4779

Hence, not a blocker.

Requested backports to 4.6 and 4.7.

Comment 2 Elana Hashman 2021-04-22 23:39:52 UTC
I have a PR up: https://github.com/openshift/kubernetes/pull/692

The patch is pending verification of https://bugzilla.redhat.com/show_bug.cgi?id=1939734, where this issue was initially reported.

Comment 5 Sunil Choudhary 2021-05-12 19:00:25 UTC
Checked on 4.7.0-0.nightly-2021-05-12-004740 and rebooted the node multiple times.
I see that the number of WATCH calls is low.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-05-12-004740   True        False         119m    Cluster version is 4.7.0-0.nightly-2021-05-12-004740

$ oc get nodes
NAME                                         STATUS   ROLES    AGE    VERSION
ip-10-0-134-125.us-east-2.compute.internal   Ready    worker   136m   v1.20.0+75370d3
ip-10-0-139-107.us-east-2.compute.internal   Ready    master   141m   v1.20.0+75370d3
ip-10-0-182-213.us-east-2.compute.internal   Ready    worker   136m   v1.20.0+75370d3
ip-10-0-187-71.us-east-2.compute.internal    Ready    master   145m   v1.20.0+75370d3
ip-10-0-193-213.us-east-2.compute.internal   Ready    master   145m   v1.20.0+75370d3
ip-10-0-194-243.us-east-2.compute.internal   Ready    worker   136m   v1.20.0+75370d3

$ oc debug node/ip-10-0-139-107.us-east-2.compute.internal
Starting pod/ip-10-0-139-107us-east-2computeinternal-debug ...
...

sh-4.4# journalctl | grep -i "Starting reflector" | wc -l
252
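
Note (sketch only, not part of the verification above): the same journal check can be repeated non-interactively for every node to compare counts, assuming the usual oc debug node/<name> -- chroot /host pattern works on the cluster:

$ # sketch only -- count reflector starts in each node's journal
$ for node in $(oc get nodes -o name); do
    echo -n "$node: "
    oc debug "$node" -- chroot /host journalctl 2>/dev/null | grep -ci "starting reflector"
  done

Each output line gives a per-node count of reflector starts since boot, which should stay low with the fix applied.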

Comment 7 errata-xmlrpc 2021-05-19 15:16:26 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.11 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1550