Description of problem:
The customer updated from OpenShift 4.3.5 to 4.3.28, and since then PVs take 5 to 10 minutes to be created and bound. On one of their clusters they tried the CSI driver from VMware, and it managed to create the volume in seconds.

Version-Release number of selected component (if applicable):
OpenShift 4.3.28
vSphere 6.7u3

How reproducible:
Always

Steps to Reproduce:
1. Define a PVC that uses a vSphere storage class.
2. Wait for 5 to 10 minutes.
3. The PV is created and bound.

Actual results:
Logs from the kube-controller-manager:
~~~
I0915 16:29:26.197884 1 vsphere.go:1173] Starting to create a vSphere volume with volumeOptions: &{CapacityKB:2097152 Tags:map[kubernetes.io/created-for/pv/name:pvc-089a7d35-eae9-480f-a8db-4f3440474847 kubernetes.io/created-for/pvc/name:test-volume2 kubernetes.io/created-for/pvc/namespace:default] Name:ilocpces401-rq4qs-dynamic-pvc-089a7d35-eae9-480f-a8db-4f3440474847 DiskFormat:zeroedthick Datastore:CSSA-IBOX2998-IL-OCP4-04-13-DYN-CES401 VSANStorageProfileData: StoragePolicyName: StoragePolicyID: SCSIControllerType: Zone:[] SelectedNode:nil}

^^ Here the volume creation begins, with name "ilocpces401-rq4qs-dynamic-pvc-089a7d35-eae9-480f-a8db-4f3440474847".

I0915 16:29:27.403350 1 vsphere.go:1225] Volume topology : []
I0915 16:36:38.143990 1 vsphere_volume_util.go:158] Successfully created vsphere volume ilocpces401-rq4qs-dynamic-pvc-089a7d35-eae9-480f-a8db-4f3440474847

^^ And here we see the volume created; this is when vSphere notifies us that the volume has been created -- roughly 7 minutes after creation started.

I0915 16:36:38.149080 1 pv_controller.go:1559] volume "pvc-089a7d35-eae9-480f-a8db-4f3440474847" provisioned for claim "default/test-volume2"
I0915 16:36:38.149266 1 event.go:255] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"test-volume2", UID:"089a7d35-eae9-480f-a8db-4f3440474847", APIVersion:"v1", ResourceVersion:"123010048", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-089a7d35-eae9-480f-a8db-4f3440474847 using kubernetes.io/vsphere-volume
I0915 16:36:38.154599 1 pv_controller.go:798] volume "pvc-089a7d35-eae9-480f-a8db-4f3440474847" entered phase "Bound"
I0915 16:36:38.154626 1 pv_controller.go:901] volume "pvc-089a7d35-eae9-480f-a8db-4f3440474847" bound to claim "default/test-volume2"

^^ All of this is the volume being discovered by the cluster and bound to the PVC.
~~~

Expected results:
Creation back in the "seconds" range, on par with the CSI driver if possible.

Additional info:
We are trying to gather logs or events on the vSphere side. The impact of the issue is high.

We have increased the log level on the kube-controller-manager using:

$ oc patch kubecontrollermanager cluster -p='{"spec": { "logLevel": "debug"}}' --type=merge
$ oc patch kubecontrollermanager cluster -p='{"spec": {"forceRedeploymentReason": "loglevel change-'"$( date --rfc-3339=ns )"'"}}' --type=merge

Is there any other way to increase the log level or to get more information from the OpenShift side?
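For reference, a minimal PVC matching the one in the logs above (claim "test-volume2" in the "default" namespace, 2Gi = the logged CapacityKB:2097152) can be used to reproduce. The storageClassName here is an assumption; it must point at an existing vSphere storage class in the cluster:

~~~
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-volume2
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: thin   # assumption: replace with the cluster's vSphere storage class
~~~

After applying it with `oc apply -f pvc.yaml`, the delay can be observed with `oc get pvc test-volume2 -n default -w` (the claim stays in Pending for 5 to 10 minutes before turning Bound) and with `oc describe pvc test-volume2 -n default`, which shows the provisioning events and their timestamps.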
Sending over to storage team.
We did not get the kubelet logs or the kubelet goroutine dump, so we are unable to diagnose this issue further. Feel free to reopen the bug once you have the requested information.
Marked as "VERIFIED" to go ahead for 4.7.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438