Bug 1961561 - The encryption controllers send lots of request to an API server
Summary: The encryption controllers send lots of request to an API server
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.8
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.8.0
Assignee: Lukasz Szaszkiewicz
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-05-18 09:22 UTC by Lukasz Szaszkiewicz
Modified: 2021-07-27 23:09 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:08:55 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-authentication-operator pull 438 0 None closed Bug 1950379: routersecret: sync only the cert/key pair for the default domain 2021-05-20 07:49:00 UTC
Github openshift cluster-kube-apiserver-operator pull 1079 0 None open Bug 1943804: increases termination timeouts for AWS 2021-05-18 09:25:06 UTC
Github openshift cluster-openshift-apiserver-operator pull 451 0 None open Bug 1961561: pick up the precondition checker for reducing encryption QPS 2021-05-18 09:23:43 UTC
Github openshift library-go pull 1059 0 None closed precondition checker for reducing encryption QPS 2021-05-18 09:23:19 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:09:17 UTC

Description Lukasz Szaszkiewicz 2021-05-18 09:22:38 UTC
Synchronizing encryption controllers is expensive because they pull data directly from the servers to get the most recent data.

By default, the controllers resync every 60 seconds. However, tighter loops can be enforced on dependencies. For example, the authentication operator reconciles its resource every 20 seconds.

We provided a precondition checker [1] that determines if encryption controllers should synchronize. This helps to avoid sending requests to the API servers if there is no work to do.


The precondition checker must be pulled into kas-o, oas-o and the authentication operator.


[1] - https://github.com/openshift/library-go/pull/1059

Comment 2 Lukasz Szaszkiewicz 2021-05-25 15:43:40 UTC
I downloaded the audit-logs and used https://github.com/openshift/cluster-debug-tools to get requests for encryption-config secret made by the cluster-openshift-apiserver-operator.

I haven't found any requests for the secret after applying the fix.

after: ./kubectl-dev_tool audit -f /Users/lszaszki/workspace/Downloads/audit-logs/registry-build01-ci-openshift-org-ci-op-d3lj2xcr-stable-sha256-b26af26229ab63635a788809f1997f11bbe81d2a5a815203f7002c67711f68fb/audit_logs/kube-apiserver --by=resource --user=system:serviceaccount:openshift-apiserver-operator:openshift-apiserver-operator --verb=list,get -otop=50 -owide | grep encryption

before: ./kubectl-dev_tool audit -f /Users/lszaszki/workspace/Downloads/audit-logs-before/registry-build01-ci-openshift-org-ci-op-k9msd634-stable-sha256-b26af26229ab63635a788809f1997f11bbe81d2a5a815203f7002c67711f68fb/audit_logs/kube-apiserver --by=resource --user=system:serviceaccount:openshift-apiserver-operator:openshift-apiserver-operator --verb=list,get -otop=50 -owide | grep encryption


after: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-openshift-apiserver-operator/451/pull-ci-openshift-cluster-openshift-apiserver-operator-master-e2e-aws/1394239157840121856

before: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-openshift-apiserver-operator/448/pull-ci-openshift-cluster-openshift-apiserver-operator-master-e2e-aws/1393074832035287040


It would be nice if we could do similar tests for kas-o and the authentication operator

Comment 4 Ke Wang 2021-06-10 11:38:11 UTC
Refer to Comment #2, checked all audit.log files on master, 

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-09-142759   True        False         9h      Cluster version is 4.8.0-0.nightly-2021-06-09-142759

$ masters=$(oc get no -l node-role.kubernetes.io/master | sed '1d' | awk '{print $1}')
$ oc adm node-logs $masters --path=kube-apiserver/audit.log > kas-audit.log
$ oc adm node-logs $masters --path=openshift-apiserver/audit.log > oas-audit.log;oc adm node-logs $masters --path=oauth-apiserver/audit.log > oauth-audit.log

Downloaded and used https://github.com/openshift/cluster-debug-tools to check if there are requests for encryption-config secret made by the cluster-openshift-apiserver-operator, openshift-kube-apiserver-operator and openshift-authentication-operator, 
$./kubectl-dev_tool audit -f oas-audit.log --by=resource --user=system:serviceaccount:openshift-apiserver-operator:openshift-apiserver-operator --verb=list,get -otop=50 -owide | grep encryption

$./kubectl-dev_tool audit -f kas-audit.log --by=resource --user=system:serviceaccount:openshift-kube-apiserver-operator:kube-apiserver-operator --verb=list,get -otop=50 -owide | grep encryption

$./kubectl-dev_tool audit -f oauth-audit.log --by=resource --user=system:serviceaccount:openshift-authentication-operator:authentication-operator --verb=list,get -otop=50 -owide | grep encryption

After checked, no any requests for the secret can be found, it is as expected, so move the bug VERIFIED.

Comment 7 errata-xmlrpc 2021-07-27 23:08:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438


Note You need to log in before you can comment on or make changes to this bug.