Created attachment 1863971 [details] File to create 100 pvc for storageclass in immediate volume binding mode. Description of problem (please be detailed as possible and provide log snippests): Creating 100 pvc in batch with immediate volume binding mode (also applicable if using waitforfirstconsumer and creating the pod in batch) takes more than 3 minutes to get all bounds and there are some error "context deadline exceeded" in topolvm-controller with fails and retries Version of all relevant components (if applicable): all Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)? Is there any workaround available to the best of your knowledge? Wait. My believe is with more than 100 PVC the process will be stuck for long time Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)? 2 Can this issue reproducible? yes Can this issue reproduce from the UI? No If this is a regression, please provide more details to justify this: No Steps to Reproduce: 1. Create ocp sno cluster 2. Install LVMO operator and install a cluster 3. Define storageclass with immediate volume bind mode. 4. Create 100 pvc from one file. 5. Delete the pvc with "oc delete -f <file> Actual results: The creation of 100 pvc takes 3+ minutes with error in log: E0303 09:45:49.221743 1 controller.go:957] error syncing claim "b5dc9fac-5ffa-455b-b6aa-ceddd95a9283": failed to provision volume with StorageClass "odf-lvm-imm-vg1": rpc error: code = DeadlineExceeded desc = context deadline exceeded I0303 09:45:49.221757 1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"storage-test", Name:"lvmpvc-100", UID:"b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", APIVersion:"v1", ResourceVersion:"4105735", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "odf-lvm-imm-vg1": rpc error: code = DeadlineExceeded desc = context deadline exceeded Expected results: Pvc should be created immediately with no error in log. Additional info: Creation and deletion of one of the PV in the log. As you can see there are failure and retries for creation and deletion. [srozen@localhost 4.10]$ oc logs topolvm-controller-98994987c-pw2xk --all-containers | grep fail | grep b5dc9fac-5ffa-455b-b6aa-ceddd95a9283 W0303 09:45:49.221697 1 controller.go:934] Retrying syncing claim "b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", failure 0 E0303 09:45:49.221743 1 controller.go:957] error syncing claim "b5dc9fac-5ffa-455b-b6aa-ceddd95a9283": failed to provision volume with StorageClass "odf-lvm-imm-vg1": rpc error: code = DeadlineExceeded desc = context deadline exceeded I0303 09:45:49.221757 1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"storage-test", Name:"lvmpvc-100", UID:"b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", APIVersion:"v1", ResourceVersion:"4105735", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "odf-lvm-imm-vg1": rpc error: code = DeadlineExceeded desc = context deadline exceeded W0303 09:46:00.223997 1 controller.go:934] Retrying syncing claim "b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", failure 1 E0303 09:46:00.224042 1 controller.go:957] error syncing claim "b5dc9fac-5ffa-455b-b6aa-ceddd95a9283": failed to provision volume with StorageClass "odf-lvm-imm-vg1": rpc error: code = DeadlineExceeded desc = context deadline exceeded I0303 09:46:00.224040 1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"storage-test", Name:"lvmpvc-100", UID:"b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", APIVersion:"v1", ResourceVersion:"4105735", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "odf-lvm-imm-vg1": rpc error: code = DeadlineExceeded desc = context deadline exceeded W0303 09:46:12.225518 1 controller.go:934] Retrying syncing claim "b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", failure 2 E0303 09:46:12.225557 1 controller.go:957] error syncing claim "b5dc9fac-5ffa-455b-b6aa-ceddd95a9283": failed to provision volume with StorageClass "odf-lvm-imm-vg1": rpc error: code = DeadlineExceeded desc = context deadline exceeded I0303 09:46:12.225606 1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"storage-test", Name:"lvmpvc-100", UID:"b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", APIVersion:"v1", ResourceVersion:"4105735", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "odf-lvm-imm-vg1": rpc error: code = DeadlineExceeded desc = context deadline exceeded W0303 09:46:26.226929 1 controller.go:934] Retrying syncing claim "b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", failure 3 E0303 09:46:26.226976 1 controller.go:957] error syncing claim "b5dc9fac-5ffa-455b-b6aa-ceddd95a9283": failed to provision volume with StorageClass "odf-lvm-imm-vg1": rpc error: code = DeadlineExceeded desc = context deadline exceeded I0303 09:46:26.227127 1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"storage-test", Name:"lvmpvc-100", UID:"b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", APIVersion:"v1", ResourceVersion:"4105735", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "odf-lvm-imm-vg1": rpc error: code = DeadlineExceeded desc = context deadline exceeded W0303 09:46:44.228599 1 controller.go:934] Retrying syncing claim "b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", failure 4 E0303 09:46:44.228650 1 controller.go:957] error syncing claim "b5dc9fac-5ffa-455b-b6aa-ceddd95a9283": failed to provision volume with StorageClass "odf-lvm-imm-vg1": rpc error: code = DeadlineExceeded desc = context deadline exceeded I0303 09:46:44.228680 1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"storage-test", Name:"lvmpvc-100", UID:"b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", APIVersion:"v1", ResourceVersion:"4105735", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "odf-lvm-imm-vg1": rpc error: code = DeadlineExceeded desc = context deadline exceeded W0303 09:47:10.230599 1 controller.go:934] Retrying syncing claim "b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", failure 5 E0303 09:47:10.230745 1 controller.go:957] error syncing claim "b5dc9fac-5ffa-455b-b6aa-ceddd95a9283": failed to provision volume with StorageClass "odf-lvm-imm-vg1": rpc error: code = DeadlineExceeded desc = context deadline exceeded I0303 09:47:10.230745 1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"storage-test", Name:"lvmpvc-100", UID:"b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", APIVersion:"v1", ResourceVersion:"4105735", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "odf-lvm-imm-vg1": rpc error: code = DeadlineExceeded desc = context deadline exceeded W0303 09:47:52.232527 1 controller.go:934] Retrying syncing claim "b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", failure 6 E0303 09:47:52.232574 1 controller.go:957] error syncing claim "b5dc9fac-5ffa-455b-b6aa-ceddd95a9283": failed to provision volume with StorageClass "odf-lvm-imm-vg1": rpc error: code = DeadlineExceeded desc = context deadline exceeded I0303 09:47:52.232604 1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"storage-test", Name:"lvmpvc-100", UID:"b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", APIVersion:"v1", ResourceVersion:"4105735", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "odf-lvm-imm-vg1": rpc error: code = DeadlineExceeded desc = context deadline exceeded E0303 10:32:04.716309 1 controller.go:1481] delete "pvc-b5dc9fac-5ffa-455b-b6aa-ceddd95a9283": volume deletion failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded W0303 10:32:04.716377 1 controller.go:989] Retrying syncing volume "pvc-b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", failure 0 E0303 10:32:15.718173 1 controller.go:1481] delete "pvc-b5dc9fac-5ffa-455b-b6aa-ceddd95a9283": volume deletion failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded W0303 10:32:15.718233 1 controller.go:989] Retrying syncing volume "pvc-b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", failure 1 E0303 10:32:27.719535 1 controller.go:1481] delete "pvc-b5dc9fac-5ffa-455b-b6aa-ceddd95a9283": volume deletion failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded W0303 10:32:27.719609 1 controller.go:989] Retrying syncing volume "pvc-b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", failure 2 E0303 10:32:41.720740 1 controller.go:1481] delete "pvc-b5dc9fac-5ffa-455b-b6aa-ceddd95a9283": volume deletion failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded W0303 10:32:41.720798 1 controller.go:989] Retrying syncing volume "pvc-b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", failure 3 E0303 10:32:59.722164 1 controller.go:1481] delete "pvc-b5dc9fac-5ffa-455b-b6aa-ceddd95a9283": volume deletion failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded W0303 10:32:59.722340 1 controller.go:989] Retrying syncing volume "pvc-b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", failure 4 E0303 10:33:25.724684 1 controller.go:1481] delete "pvc-b5dc9fac-5ffa-455b-b6aa-ceddd95a9283": volume deletion failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded W0303 10:33:25.724761 1 controller.go:989] Retrying syncing volume "pvc-b5dc9fac-5ffa-455b-b6aa-ceddd95a9283", failure 5
Created attachment 1863974 [details] topolvm controller logs.
Hi Shay, - I could see some contradicting results from the logs, at-least at some point the same setup was re-used for creating PVCs either in batch mode or linearly and as a result the logs are mixed up from various tests. - Volumes are succeeded, started and provisioned (in that order) from provisioner logs > # from storage class "odf-lvm-vg1" # grep -P 'lvmpvc-\d+.*odf-lvm-vg1.*(started|provisioned|succeeded)$' /tmp/lvm.log | awk '{$2=""; print $0}' | uniq | tail -n10 I0303 1 controller.go:1455] provision "storage-test/lvmpvc-96" class "odf-lvm-vg1": succeeded I0303 1 controller.go:1337] provision "storage-test/lvmpvc-97" class "odf-lvm-vg1": started I0303 1 controller.go:1442] provision "storage-test/lvmpvc-97" class "odf-lvm-vg1": volume "pvc-5b71d24d-cba6-4d76-83b6-9e5ea5f073ac" provisioned I0303 1 controller.go:1455] provision "storage-test/lvmpvc-97" class "odf-lvm-vg1": succeeded I0303 1 controller.go:1337] provision "storage-test/lvmpvc-98" class "odf-lvm-vg1": started I0303 1 controller.go:1442] provision "storage-test/lvmpvc-98" class "odf-lvm-vg1": volume "pvc-8493a5ba-8769-4483-b997-e311c7b7f926" provisioned I0303 1 controller.go:1455] provision "storage-test/lvmpvc-98" class "odf-lvm-vg1": succeeded I0303 1 controller.go:1337] provision "storage-test/lvmpvc-99" class "odf-lvm-vg1": started I0303 1 controller.go:1442] provision "storage-test/lvmpvc-99" class "odf-lvm-vg1": volume "pvc-80b5d75e-e718-4154-b39d-bbcec1261331" provisioned I0303 1 controller.go:1455] provision "storage-test/lvmpvc-99" class "odf-lvm-vg1": succeeded > # from storage class "odf-lvm-imm-vg1" # grep -P 'lvmpvc-\d+.*odf-lvm-imm-vg1.*(started|provisioned|succeeded)$' /tmp/lvm.log | awk '{$2=""; print $0}' | uniq | tail -n10 I0303 1 controller.go:1455] provision "storage-test/lvmpvc-97" class "odf-lvm-imm-vg1": succeeded I0303 1 controller.go:1337] provision "storage-test/lvmpvc-98" class "odf-lvm-imm-vg1": started I0303 1 controller.go:1442] provision "storage-test/lvmpvc-98" class "odf-lvm-imm-vg1": volume "pvc-6102e518-51e8-46c4-ab34-ad6da9acf54d" provisioned I0303 1 controller.go:1455] provision "storage-test/lvmpvc-98" class "odf-lvm-imm-vg1": succeeded I0303 1 controller.go:1337] provision "storage-test/lvmpvc-99" class "odf-lvm-imm-vg1": started I0303 1 controller.go:1442] provision "storage-test/lvmpvc-99" class "odf-lvm-imm-vg1": volume "pvc-8f7391ea-8054-401b-bd9e-1a24d6f3a5d4" provisioned I0303 1 controller.go:1455] provision "storage-test/lvmpvc-99" class "odf-lvm-imm-vg1": succeeded I0303 1 controller.go:1337] provision "storage-test/lvmpvc-100" class "odf-lvm-imm-vg1": started I0303 1 controller.go:1442] provision "storage-test/lvmpvc-100" class "odf-lvm-imm-vg1": volume "pvc-b5dc9fac-5ffa-455b-b6aa-ceddd95a9283" provisioned I0303 1 controller.go:1455] provision "storage-test/lvmpvc-100" class "odf-lvm-imm-vg1": succeeded Ask: 1. Please restart topolvm-controller and topolvm-node pods when performing different type of tests as this'll not effect the functionality and also provides consistent logs, could you please do the same for this bug again? 2. Let's isolate the issue from multiple operations as I could see some delete operations from the logs, although distinct operations (delete/expand/create) will be queued sequentially, when doing batch processing they may interfere in syncing the ops for underlying lvm infra, so pls isolate this test. General: 1. Your deductions on bulk ops are partly accurate in-terms of lvm syncing multiple ops, however at LVMO/Topolvm layer this shouldn't be an issue as all the commands will be funneled through lvmd sequentially, however I'd like to confirm this after an isolated re-test. Pls let me know if additional info is required. Thanks, Leela.
It's ok from my side.
Not a 4.11 blocker and can't be fixed before 4.11 dev freeze. Moving it to 4.12
Moving this to 4.13 as this is low priority.