2009859 – Large number of sessions created by vmware-vsphere-csi-driver-operator during e2e tests

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	4.10
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.10.0
Assignee:	Fabio Bertinatto
QA Contact:	Wei Duan
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2018496

Reported:	2021-10-01 19:34 UTC by rvanderp
Modified:	2022-03-10 16:17 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-03-10 16:16:28 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift cluster-storage-operator pull 221	None	open	Bug 2009859: Install vSphere CSI Driver by default (again)	2021-10-06 11:17:00 UTC
Github	openshift vmware-vsphere-csi-driver-operator pull 47	None	open	Bug 2009859: Close connection to vCenter API	2021-10-04 20:36:14 UTC
Red Hat Product Errata	RHSA-2022:0056	None	None	None	2022-03-10 16:16:59 UTC

Description rvanderp 2021-10-01 19:34:37 UTC

Description of problem:
During e2e testing, there has been a recent, significant increase in vSphere sessions. vCenter has a maximum limit of 2000 concurrent sessions. When the vmware-vsphere-csi-driver-operator is running, individual clusters have been observed to sometimes consume a few hundred sessions at once. Normally, clusters consume a few dozen sessions at most. When the operator is disabled, no further session growth is noted and established sessions are eventually closed.

The session growth only occurs during e2e tests and corresponds with the operator sync, which can occur every few seconds and results in a new connection to vCenter (https://github.com/openshift/vmware-vsphere-csi-driver-operator/blob/cb321b1980d02f4e8ded29da8371e0f466454e10/pkg/operator/storageclasscontroller/storageclasscontroller.go#L163). Clusters with over 250 sessions have been noted.

This behavior results in significant instability for all of vSphere CI as all clusters are prevented from accessing the vCenter API once sessions are exhausted.  

Version-Release number of selected component (if applicable):
- 4.10.0-0.nightly-2021-10-01-013103
- VMware IPI

How reproducible: consistently

Steps to Reproduce:
1. Install 4.10.0-0.nightly-2021-10-01-013103
2. Run e2e tests
3. Check session count in vCenter

Actual results:
A new session is established with every sync.

Expected results:
Session reuse should be investigated, or sessions should be explicitly closed after each sync.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

snippet of sync instances
I1001 19:25:03.197131       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:25:04.003940       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:25:07.898412       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:25:13.909210       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:25:23.634340       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:25:48.719095       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:26:01.911076       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:26:05.427926       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:26:09.292775       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:26:09.649060       1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft

Sessions in use by the cluster (user ID test):
govc session.ls | grep test | wc -l
134

Comment 2 Hemant Kumar 2021-10-01 19:38:21 UTC

I think we will have to implement connection caching for both SOAP and REST clients.
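
One way such connection caching could look is sketched below in Go. This is an assumption-laden illustration, not the fix that was merged: `client`, `clientCache`, `newClientCache`, and `stubClient` are hypothetical names, and the validity check stands in for whatever session check the real SOAP or REST client would need.

```go
package main

import (
	"errors"
	"sync"
)

// client is a hypothetical stand-in for an authenticated vCenter client
// (SOAP or REST); it is not a govmomi type.
type client interface {
	// Valid reports whether the server-side session is still usable.
	Valid() bool
}

// stubClient is a test double whose validity can be toggled.
type stubClient struct{ valid bool }

func (s *stubClient) Valid() bool { return s.valid }

// clientCache hands out one shared client and logs in again only when the
// cached one is missing or no longer valid.
type clientCache struct {
	mu     sync.Mutex
	login  func() (client, error) // hypothetical login function
	cached client
}

// newClientCache builds a cache around the given login function.
func newClientCache(login func() (client, error)) *clientCache {
	return &clientCache{login: login}
}

// Get returns the cached client, creating a new one only when needed.
func (c *clientCache) Get() (client, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.cached != nil && c.cached.Valid() {
		return c.cached, nil
	}
	if c.login == nil {
		return nil, errors.New("no login function configured")
	}
	cl, err := c.login()
	if err != nil {
		return nil, err
	}
	c.cached = cl
	return cl, nil
}
```

In this sketch, repeated syncs that call `Get` share a single login for as long as the session stays valid, so the number of sessions per cluster no longer grows with the number of syncs.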

Comment 4 rvanderp 2021-10-01 20:10:52 UTC

If you need any help at all testing fixes for this, just let me know.  I'm happy to help.

Comment 18 errata-xmlrpc 2022-03-10 16:16:28 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
