Description of problem:

During e2e testing, there has been a recent, significant increase in vSphere sessions. vCenter has a maximum limit of 2000 concurrent sessions. When the vmware-vsphere-csi-driver-operator is running, individual clusters have been observed to sometimes consume a few hundred sessions at once. Normally, clusters consume at most a few dozen sessions. When the operator is disabled, no further session growth is noted and established sessions are eventually closed.

The session growth only occurs during e2e tests and corresponds with the operator sync, which can occur every few seconds and results in a new connection to vCenter [https://github.com/openshift/vmware-vsphere-csi-driver-operator/blob/cb321b1980d02f4e8ded29da8371e0f466454e10/pkg/operator/storageclasscontroller/storageclasscontroller.go#L163]. Clusters with over 250 sessions have been noted.

This behavior results in significant instability for all of vSphere CI, as all clusters are prevented from accessing the vCenter API once sessions are exhausted.

Version-Release number of selected component (if applicable):
- 4.10.0-0.nightly-2021-10-01-013103
- VMware IPI

How reproducible:
consistently

Steps to Reproduce:
1. Install 4.10.0-0.nightly-2021-10-01-013103
2. Run e2e tests
3. Check session count in vCenter

Actual results:
A new session is established with every sync.

Expected results:
Sessions should be reused or explicitly closed.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Snippet of sync instances:

I1001 19:25:03.197131 1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:25:04.003940 1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:25:07.898412 1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:25:13.909210 1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:25:23.634340 1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:25:48.719095 1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:26:01.911076 1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:26:05.427926 1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:26:09.292775 1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft
I1001 19:26:09.649060 1 vmware.go:307] Found existing profile with same name: openshift-storage-policy-rvanderp-dev-bxhft

Sessions in use by the cluster (user ID "test"):

govc session.ls | grep test | wc -l
134
I think we will have to implement connection caching for both the SOAP and REST clients.
If you need any help at all testing fixes for this, just let me know. I'm happy to help.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056