Bug 1881044

Summary: Limit the maximum CSR reports
Product: OpenShift Container Platform
Reporter: Ricardo Lüders <rluders>
Component: Insights Operator
Assignee: Ricardo Lüders <rluders>
Status: CLOSED ERRATA
QA Contact: Pavel Šimovec <psimovec>
Severity: medium
Docs Contact: Marc Muehlfeld <mmuehlfe>
Priority: unspecified
Version: 4.6
CC: aos-bugs, avicenzi, inecas, mkunc, tremes
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A missing report limit for CSRs caused an enormous amount of unnecessary report data to be collected. Consequence: An unnecessarily large number of reports was collected. Fix: Limit the report data to 5000 reports. Result: The number of data reports collected is limited to 5000.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-27 16:43:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ricardo Lüders 2020-09-21 12:36:51 UTC
Description of problem:

To avoid memory overload, the Insights Operator has to limit the number of CSR reports it collects.
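
As a rough illustration of the kind of cap being asked for, the sketch below truncates a list of CSR names to a fixed maximum. It is a shell approximation only, not the operator's implementation; the function name limit_reports is hypothetical, and the default of 5000 is taken from the Doc Text of this bug.

```shell
#!/bin/sh
# Hypothetical helper: pass through at most MAX lines from stdin.
# The default of 5000 mirrors the limit mentioned in this bug.
limit_reports() {
  max="${1:-5000}"
  head -n "$max"
}

# Example: cap the list of CSR names returned by the cluster.
# kubectl get csr -o name | limit_reports 5000
```

Anything past the cap is simply dropped, which keeps memory use bounded regardless of how many CSRs exist on the cluster.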

Version-Release number of selected component (if applicable):


How reproducible:

Find a cluster that has a lot of CSRs and give it a try. Maybe?

Steps to Reproduce:
1.
2.
3.

Actual results:

Memory problems when a large number of CSRs is found.


Expected results:

Not crash?

Additional info:

Comment 3 Pavel Šimovec 2020-09-30 11:23:31 UTC
I had to check the source code to find out what the upper limit is.

I created a certificate, server.crt, and a script:

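# Create 9555 CertificateSigningRequest objects from server.csr.
# Encode the request once instead of on every iteration.
request=$(base64 < server.csr | tr -d '\n')

for (( c=1; c<=9555; c++ ))
do
  # Write the manifest to wow$c, print the counter, and create the CSR
  # in the background so the loop does not wait for kubectl.
  echo "apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
  name: service$c.default
spec:
  groups:
  - system:authenticated
  request: $request
  usages:
  - digital signature
  - key encipherment
  - server auth" > "wow$c" && echo "$c" && kubectl create -f "wow$c" &
  # Short pause so the background kubectl processes do not pile up.
  sleep 0.08
done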


Then I added a check to TestArchiveContains for the pattern ^config/certificatesigningrequests/.*json$
When I ran the test while the script above was still running, I got this output:
main_test.go:369: 5320 csr files match pattern `^config/certificatesigningrequests/.*json$`

5320 is more than the hardcoded upper limit of 5000, so the limit
does NOT work in 4.6.0-0.ci-2020-09-30-031307
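
The count reported by the test can also be approximated from the command line. The sketch below counts archive entries that match the same pattern; the function name count_csr_files and the archive file name in the usage comment are assumptions for illustration, not part of the Insights Operator test suite.

```shell
#!/bin/sh
# Hypothetical helper: read archive paths (one per line) from stdin and
# print how many match the CSR report pattern used in the test.
count_csr_files() {
  # grep -c prints the number of matching lines (0 if none match);
  # "|| true" keeps the exit status at 0 when there are no matches.
  grep -c -E '^config/certificatesigningrequests/.*json$' || true
}

# Example, assuming the Insights archive was saved as insights.tar.gz:
# tar -tzf insights.tar.gz | count_csr_files
```

If the archive stores its paths with a leading ./ the anchored pattern would need adjusting before the count is comparable with the test output.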

Comment 4 Pavel Šimovec 2020-09-30 11:44:30 UTC
Additional info:
When I added a few thousand more CSRs, it started to limit the collection of files as expected:
main_test.go:369: 5000 csr files match pattern `^config/certificatesigningrequests/.*json$`

Comment 5 Ricardo Lüders 2020-09-30 15:18:00 UTC
I tried to reproduce the issue without success; in all my tests the number of CSR reports was capped at the limit. Also, I wasn't able to use your script to generate more than 100 resources on my cluster on quicklab, which is kind of weird.

Comment 6 Pavel Šimovec 2020-10-01 13:39:45 UTC
The script isn't the best one. Today I had issues with it myself and had to increase the timeout to make it work better.

I was able to reproduce it on a cluster from cluster bot, same version as last time:

11:44
main_test.go:369: 5220 csr files match pattern `^config/certificatesigningrequests/.*json$`
11:48
main_test.go:369: 5255 csr files match pattern `^config/certificatesigningrequests/.*json$`
11:53
main_test.go:369: 5000 csr files match pattern `^config/certificatesigningrequests/.*json$`
11:56
main_test.go:369: 5000 csr files match pattern `^config/certificatesigningrequests/.*json$`


The fix should be added in a different PR.
I will verify this BZ, as it doesn't break anything. It works at least partially, so it can only help. The issue I found should be fixed in a different PR/BZ.

Comment 9 errata-xmlrpc 2020-10-27 16:43:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196