Bug 2101520

Summary: csi-snapshot-controller-operator occasionally establishes an unusual number of watch requests
Product: OpenShift Container Platform Reporter: Devan Goodwin <dgoodwin>
Component: StorageAssignee: Fabio Bertinatto <fbertina>
Storage sub component: Operators QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: medium CC: jsafrane
Version: 4.11   
Target Milestone: ---   
Target Release: 4.12.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-01-17 19:50:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Devan Goodwin 2022-06-27 17:05:18 UTC
In investigating https://issues.redhat.com/browse/TRT-347 we found that this operator appears to be going beyond it's past expected limits for watch requests. To some extent we expect these values to grow over time, but a couple runs were identified at nearly twice the expected limit from last summer. 

Examples pulled from: https://search.ci.openshift.org/?search=Operator.*csi-snapshot-controller-operator.*produces+more+watch+requests+than+expected&maxAge=336h&context=0&type=bug%2Bjunit&name=gcp&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Note that the limit coded for GCP last summer was 134.

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/operator-framework_operator-marketplace/478/pull-ci-operator-framework-operator-marketplace-master-e2e-gcp/1539988916965216256

   "Operator \"csi-snapshot-controller-operator\" produces more watch requests than expected: watchrequestcount=256, upperbound=134, ratio=1.9104477611940298",


https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-e2e-gcp/1537183611613089792

  "Operator \"csi-snapshot-controller-operator\" produces more watch requests than expected: watchrequestcount=220, upperbound=134, ratio=1.6417910447761195",



For this bug we were just hoping for an explanation for why the number of watch requests can vary so heavily, and if it's expected. If so and this is not a concerning bug, this can be closed.

Note that this situation is very very rare, but it does happen.

Pod logs are available for both and there definitely looks to be some funky things going on with watches, but it's hard for us external to the code to know why.

We are drafting a PR to bump the limits a little in general separately, though not quite high enough to cover these outliers.

Comment 1 Fabio Bertinatto 2022-09-28 18:23:02 UTC
Seems like it fails to start a controller here, then it retries multiple times. This is likely the cause of the leaks:

> I0615 21:46:44.754123       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:46:45.757131       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:46:46.757610       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:46:47.757257       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:46:48.757348       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:46:49.757570       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:46:50.763997       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:46:51.758744       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:46:52.757022       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:46:53.757490       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:46:54.757458       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:46:55.757647       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:46:56.757318       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:46:57.757696       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:46:58.757371       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:46:59.757287       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:47:00.757527       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:47:01.757276       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:47:02.757422       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:47:03.757743       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:47:04.757850       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:47:05.758162       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:47:06.756991       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:47:07.757692       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:47:08.757005       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:47:09.757522       1 server.go:50] Error initializing delegating authentication (will retry): <nil>
> I0615 21:47:10.758219       1 server.go:50] Error initializing delegating authentication (will retry): <nil>

This is the call that's failing: https://github.com/bertinatto/library-go/blob/2ba6eb4e665d297e2337bcabe4a09efc426b4f10/pkg/config/serving/server.go#L48

However, judging from the entries below, the controller is in fact being started *at every retry*, so are the informers:

> I0615 21:47:16.164614       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.164921       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.168369       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.168434       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.168863       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.170983       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.171060       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.171955       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.172015       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.172073       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.172568       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.172632       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.172970       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.172919       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.172944       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.173021       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.173069       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.173085       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.173102       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.173242       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.173261       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.173275       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.173325       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.173345       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.173362       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.173378       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.173409       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> I0615 21:47:16.173291       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
> 
> (...)
> 
> I0615 21:47:16.371445       1 shared_informer.go:262] Caches are synced for RequestHeaderAuthRequestController
> I0615 21:47:16.374774       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
> I0615 21:47:16.389071       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.389430       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.389568       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.389802       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.390384       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.390966       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.391479       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.392956       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.394572       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.395056       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.395502       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.396148       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.397962       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.402784       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.406326       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.406293       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.410299       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.410450       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.410816       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.422048       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.422171       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.467400       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.467505       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.469395       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.471798       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.471926       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.472709       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:47:16.478355       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
> I0615 21:50:14.909697       1 base_controller.go:73] Caches are synced for LoggingSyncer 
> I0615 21:50:14.909756       1 base_controller.go:73] Caches are synced for ManagementStateController 
> I0615 21:50:14.909794       1 base_controller.go:73] Caches are synced for StatusSyncer_csi-snapshot-controller 
> I0615 21:50:14.909836       1 base_controller.go:73] Caches are synced for CSISnapshotWebhookController 
> I0615 21:50:14.910261       1 base_controller.go:73] Caches are synced for StaticResourceController

Comment 3 Wei Duan 2022-10-05 09:21:07 UTC
Don't see such error after the fix in recent CI, will check if the right error message will be logged.

Comment 4 Fabio Bertinatto 2022-10-11 11:06:02 UTC
Moving back to POST as we need to merge 2 other PRs.

Comment 5 Fabio Bertinatto 2022-10-11 12:33:26 UTC
Moving back to POST as we need to merge another PR.

Comment 7 Wei Duan 2022-10-14 10:39:15 UTC
Checked csi-snapshot-controller-operator log from many 4.12 CI job but did not find "Error initializing delegating authentication" error yet.

Comment 12 errata-xmlrpc 2023-01-17 19:50:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399