Bug 1879944 - [OCP 4.8] Slow PV creation with vsphere [NEEDINFO]
Summary: [OCP 4.8] Slow PV creation with vsphere
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Hemant Kumar
QA Contact: Wei Duan
URL:
Whiteboard:
Depends On:
Blocks: 1951726
 
Reported: 2020-09-17 12:12 UTC by Mario Abajo
Modified: 2021-07-27 22:33 UTC
CC: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1951726
Environment:
Last Closed: 2021-07-27 22:33:27 UTC
Target Upstream Version:
Embargoed:
igreen: needinfo+
igreen: needinfo?




Links
Red Hat Product Errata RHSA-2021:2438 (last updated 2021-07-27 22:33:45 UTC)

Description Mario Abajo 2020-09-17 12:12:16 UTC
Description of problem:
A customer updated from OpenShift 4.3.5 to 4.3.28 and then started reporting that PVs take 5 to 10 minutes to be created and bound.
In one of their clusters they tried the VMware CSI driver instead, and it managed to create the volume in seconds.

Version-Release number of selected component (if applicable):

OpenShift 4.3.28
vSphere 6.7u3

How reproducible:
Always

Steps to Reproduce:
1. Define a PVC that uses a vSphere storage class (see the sketch below)
2. Wait 5 to 10 minutes
3. The PV is created and bound

Actual results:

Logs from kube-admin-controller-manager:
~~~
I0915 16:29:26.197884       1 vsphere.go:1173] Starting to create a vSphere volume with volumeOptions: &{CapacityKB:2097152 Tags:map[kubernetes.io/created-for/pv/name:pvc-089a7d35-eae9-480f-a8db-4f3440474847 kubernetes.io/created-for/pvc/name:test-volume2 kubernetes.io/created-for/pvc/namespace:default] Name:ilocpces401-rq4qs-dynamic-pvc-089a7d35-eae9-480f-a8db-4f3440474847 DiskFormat:zeroedthick Datastore:CSSA-IBOX2998-IL-OCP4-04-13-DYN-CES401 VSANStorageProfileData: StoragePolicyName: StoragePolicyID: SCSIControllerType: Zone:[] SelectedNode:nil}
^^ Here we begin the volume creation with name: "ilocpces401-rq4qs-dynamic-pvc-089a7d35-eae9-480f-a8db-4f3440474847"

I0915 16:29:27.403350       1 vsphere.go:1225] Volume topology : []

I0915 16:36:38.143990       1 vsphere_volume_util.go:158] Successfully created vsphere volume ilocpces401-rq4qs-dynamic-pvc-089a7d35-eae9-480f-a8db-4f3440474847
^^ And here the volume is reported as created; this is when vSphere notifies us that the volume has been created. Note the gap: the creation call started at 16:29:26 and finished at 16:36:38, just over 7 minutes later.

I0915 16:36:38.149080       1 pv_controller.go:1559] volume "pvc-089a7d35-eae9-480f-a8db-4f3440474847" provisioned for claim "default/test-volume2"
I0915 16:36:38.149266       1 event.go:255] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"test-volume2", UID:"089a7d35-eae9-480f-a8db-4f3440474847", APIVersion:"v1", ResourceVersion:"123010048", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-089a7d35-eae9-480f-a8db-4f3440474847 using kubernetes.io/vsphere-volume
I0915 16:36:38.154599       1 pv_controller.go:798] volume "pvc-089a7d35-eae9-480f-a8db-4f3440474847" entered phase "Bound"
I0915 16:36:38.154626       1 pv_controller.go:901] volume "pvc-089a7d35-eae9-480f-a8db-4f3440474847" bound to claim "default/test-volume2"
^^ All of this is the "discovery" of the volume by the cluster and its binding to the PVC
~~~

Expected results:
Creation times back in the "seconds" range, on par with the CSI driver, if possible.

Additional info:
We are trying to gather logs and events on the vSphere side. The impact of this issue is high.
We have increased the log level on the kube-controller-manager using:

$ oc patch kubecontrollermanager cluster -p='{"spec": { "logLevel": "debug"}}' --type=merge
$ oc patch kubecontrollermanager cluster -p='{"spec": {"forceRedeploymentReason": "loglevel change-'"$( date --rfc-3339=ns )"'"}}' --type=merge

Is there any other way to increase the log level or to get more information from the OpenShift side?

Comment 1 Maciej Szulik 2020-09-17 12:29:23 UTC
Sending over to storage team.

Comment 7 Jan Safranek 2020-10-23 14:25:10 UTC
We did not get the kubelet logs or the kubelet goroutine dump, so we are unable to diagnose this issue further. Feel free to reopen the bug once you have the requested information.

Comment 56 Wei Duan 2021-04-27 01:49:57 UTC
Marked as "VERIFIED" to go ahead for 4.7.

Comment 59 errata-xmlrpc 2021-07-27 22:33:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

