Bug 2009859

Summary: Large number of sessions created by vmware-vsphere-csi-driver-operator during e2e tests
Product: OpenShift Container Platform Reporter: rvanderp
Component: Storage Assignee: Fabio Bertinatto <fbertina>
Storage sub component: Operators QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: unspecified CC: aos-bugs, fbertina, hekumar, jcallen
Version: 4.10   
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-10 16:16:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2018496    

Description rvanderp 2021-10-01 19:34:37 UTC
Description of problem:
During e2e testing, there has been a recent, significant increase in vSphere sessions. vCenter has a maximum limit of 2000 concurrent sessions. When the vmware-vsphere-csi-driver-operator is running, individual clusters have sometimes been observed consuming a few hundred sessions at once. Normally, clusters consume a few dozen sessions at most. When the operator is disabled, no further session growth is noted and established sessions are eventually closed.

The session growth only occurs during e2e tests and corresponds with the operator sync, which can occur every few seconds and results in a new connection to vCenter [https://github.com/openshift/vmware-vsphere-csi-driver-operator/blob/cb321b1980d02f4e8ded29da8371e0f466454e10/pkg/operator/storageclasscontroller/storageclasscontroller.go#L163]. Clusters with over 250 sessions have been noted.

This behavior results in significant instability for all of vSphere CI as all clusters are prevented from accessing the vCenter API once sessions are exhausted.  

Version-Release number of selected component (if applicable):
- 4.10.0-0.nightly-2021-10-01-013103
- VMware IPI

How reproducible: consistently

Steps to Reproduce:
1. Install 4.10.0-0.nightly-2021-10-01-013103
2. Run e2e tests
3. Check session count in vCenter

Actual results:
a new session is established with every sync

Expected results:
sessions should be reused across syncs, or explicitly closed once a sync finishes

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

snippet of sync instances
I1001 19:25:03.197131       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:25:04.003940       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:25:07.898412       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:25:13.909210       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:25:23.634340       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:25:48.719095       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:26:01.911076       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:26:05.427926       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:26:09.292775       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:26:09.649060       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft

sessions in use by the cluster (user id test):
govc session.ls | grep test | wc -l
134

Comment 2 Hemant Kumar 2021-10-01 19:38:21 UTC
I think we will have to implement connection caching for both SOAP and REST clients.

Comment 4 rvanderp 2021-10-01 20:10:52 UTC
If you need any help at all testing fixes for this, just let me know.  I'm happy to help.

Comment 18 errata-xmlrpc 2022-03-10 16:16:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056