Bug 1918287

Summary: [ovirt] ovirt csi driver is flooding RHV with API calls and spamming the event UI with new connections
Product: OpenShift Container Platform
Reporter: Gal Zaidman <gzaidman>
Component: Storage
Assignee: Gal Zaidman <gzaidman>
Storage sub component: oVirt CSI Driver
QA Contact: Michael Burman <mburman>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: aos-bugs, mburman
Version: 4.7
Target Milestone: ---
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-24 15:54:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1924623
Attachments:
Description Flags
200 line on /var/log/httpd/ssl_request log - 3masters 3workers none

Description Gal Zaidman 2021-01-20 11:49:43 UTC
Created attachment 1749053 [details]
200 line on /var/log/httpd/ssl_request log - 3masters 3workers

Description of problem:

After installing the CSI driver, we see that the RHV events log gets spammed with an endless stream of auth calls to the engine: one call from each node every 10 seconds.

This is a problem not only for the logs but also for the engine itself, which can handle only a limited number of requests per second, so this many requests can become a real issue for the engine.

When debugging this, we see that the problem is in the Probe call of the CSI driver, which gets called every 10 seconds. Since the driver runs on each node, this leads to around 1 auth call per second on a small cluster and more on a larger one.
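
As a rough check of the rate described above, here is a small sketch. It assumes exactly one auth call per node per probe period; the function name is illustrative only, and the 6-node figure comes from the attached log (3 masters, 3 workers), while the 60-node cluster is hypothetical.

```python
def auth_calls_per_second(node_count: int, probe_period_seconds: float) -> float:
    """Estimate the engine auth rate, assuming one auth call per node per probe."""
    if probe_period_seconds <= 0:
        raise ValueError("probe period must be positive")
    return node_count / probe_period_seconds


if __name__ == "__main__":
    # 6 nodes matches the attached log; 60 nodes is a made-up larger cluster.
    for nodes in (6, 60):
        rate = auth_calls_per_second(nodes, probe_period_seconds=10)
        print(f"{nodes} nodes, probe every 10s: {rate:.1f} auth calls/s")
```

For the 6-node cluster this gives 0.6 auth calls per second, which is in line with the "around 1 per second" estimate; the hypothetical 60-node cluster would produce 6 per second.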

Comment 1 Benny Zlotnik 2021-01-20 12:21:11 UTC
When the liveness probe was introduced, the period was set to 30 seconds:
https://github.com/oVirt/csi-driver/commit/efb64526378d20fca0039b61aa421b29321e380e#diff-4b761eacd68a8a2fb6f81f7c9f0e4c22b0c707238a37d7325f0bf84d211586adR103

But it looks like this was lost in the migration to a second-level operator, so it needs to be adjusted in the operator:
https://github.com/openshift/ovirt-csi-driver-operator/tree/master/assets

Comment 2 Gal Zaidman 2021-01-20 12:29:14 UTC
This is not a blocker

Comment 3 Gal Zaidman 2021-01-20 12:29:26 UTC
(In reply to Benny Zlotnik from comment #1)
> When liveness probe was introduced the period was set to 30 seconds
> https://github.com/oVirt/csi-driver/commit/efb64526378d20fca0039b61aa421b29321e380e#diff-4b761eacd68a8a2fb6f81f7c9f0e4c22b0c707238a37d7325f0bf84d211586adR103
> 
> But looks like it was lost in the migration to a second level operator, it
> needs to adjusted in the operator:
> https://github.com/openshift/ovirt-csi-driver-operator/tree/master/assets

Can you open a separate bug on that? I want to reserve this bug for a different fix.

Comment 5 Michael Burman 2021-01-26 13:35:06 UTC
Verified on 4.7.0-0.nightly-2021-01-26-044139 with 4.4.4.7-0.1.el8ev.

The API call spam is gone from the event log UI.
There is now only one connection event, coming from one of the master VMs (the leader), each time the connection expires (every 10-15 minutes) and it reconnects.
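
This report does not describe how the fix was implemented. Purely as an illustration of the behavior observed here (one login per expiry window instead of one per probe), the sketch below caches an authenticated session and logs in again only after it expires. `CachedSession`, `login`, and the 600-second lifetime are all hypothetical and are not taken from the driver's code.

```python
import time
from typing import Callable, Optional


class CachedSession:
    """Reuse one authenticated session until it expires (hypothetical helper)."""

    def __init__(
        self,
        login: Callable[[], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._login = login          # performs the real auth call
        self._ttl = ttl_seconds      # assumed session lifetime
        self._clock = clock          # injectable so the logic can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def token(self) -> str:
        """Return the cached token, logging in again only after expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._login()
            self._expires_at = now + self._ttl
        return self._token
```

With a helper like this, a probe that runs every 10 seconds would call `token()` each time but would trigger a new login only once per lifetime window.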

Comment 8 errata-xmlrpc 2021-02-24 15:54:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633