Description of problem (please be detailed as possible and provide log snippets):
[RDR][CEPHFS] volsync-dd-io-pvc PVCs are taking a long time to reach the Bound state.

Version of all relevant components (if applicable):
OCP version: 4.12.0-0.nightly-2022-10-05-053337
ODF version: 4.12.0-70
Ceph version: ceph version 16.2.10-41.el8cp (26bc3d938546adfb098168b7b565d4f9fa377775) pacific (stable)
ACM version: 2.6.1
Submariner version: v0.13.0
VolSync version: volsync-product.v0.5.0

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy an RDR cluster.
2. Deploy a CephFS DR application.
3. Observe the VolSync pods and PVCs.

Actual results:
The VolSync source PVCs stay in Pending:

volsync-dd-io-pvc-1-src   Pending   ocs-storagecluster-cephfs   27m
volsync-dd-io-pvc-2-src   Pending   ocs-storagecluster-cephfs   6m59s
volsync-dd-io-pvc-3-src   Pending   ocs-storagecluster-cephfs   37m
volsync-dd-io-pvc-4-src   Pending   ocs-storagecluster-cephfs   6m59s
volsync-dd-io-pvc-7-src   Pending   ocs-storagecluster-cephfs   37m

oc describe pvc volsync-dd-io-pvc-1-src
Name:          volsync-dd-io-pvc-1-src
Namespace:     busybox-workloads-8
StorageClass:  ocs-storagecluster-cephfs
Status:        Pending
Volume:
Labels:        app.kubernetes.io/created-by=volsync
               volsync.backube/cleanup=9e226d9a-226e-434e-8cfd-f8ae0d8cdb86
Annotations:   volume.beta.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
               volume.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
DataSource:
  APIGroup:  snapshot.storage.k8s.io
  Kind:      VolumeSnapshot
  Name:      volsync-dd-io-pvc-1-src
Used By:       volsync-rsync-src-dd-io-pvc-1-qnhql
Events:
  Type     Reason                Age                    From                                                                                                                      Message
  ----     ------                ----                   ----                                                                                                                      -------
  Warning  ProvisioningFailed    18m (x12 over 27m)     openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-6c56f845fc-qg4xx_703843f9-ea2a-49cb-85f1-317d06dcfdc7  failed to provision volume with StorageClass "ocs-storagecluster-cephfs": rpc error: code = Aborted desc = clone from snapshot is pending
  Warning  ProvisioningFailed    8m23s (x2 over 13m)    openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-6c56f845fc-qg4xx_703843f9-ea2a-49cb-85f1-317d06dcfdc7  failed to provision volume with StorageClass "ocs-storagecluster-cephfs": rpc error: code = Aborted desc = clone from snapshot is already in progress
  Normal   Provisioning          6m54s (x15 over 27m)   openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-6c56f845fc-qg4xx_703843f9-ea2a-49cb-85f1-317d06dcfdc7  External provisioner is provisioning volume for claim "busybox-workloads-8/volsync-dd-io-pvc-1-src"
  Normal   ExternalProvisioning  2m30s (x104 over 27m)  persistentvolume-controller                                                                                               waiting for a volume to be created, either by external provisioner "openshift-storage.cephfs.csi.ceph.com" or manually created by system administrator
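The "clone from snapshot is pending" / "clone from snapshot is already in progress" errors come from the CephFS CSI provisioner while the backing subvolume clone is still copying data. A minimal sketch for checking the clone state directly on the Ceph side is below; it assumes the rook-ceph toolbox deployment is enabled in openshift-storage, uses the default ODF filesystem name ocs-storagecluster-cephfilesystem and CSI subvolume group csi, and csi-vol-<uuid> is a hypothetical placeholder for a clone name taken from the listing:

# List the CSI-managed subvolumes; in-flight clones show up here next to their parents.
oc -n openshift-storage rsh deploy/rook-ceph-tools \
  ceph fs subvolume ls ocs-storagecluster-cephfilesystem --group_name csi

# Check the state of one clone ("pending", "in-progress", or "complete").
oc -n openshift-storage rsh deploy/rook-ceph-tools \
  ceph fs clone status ocs-storagecluster-cephfilesystem csi-vol-<uuid> --group_name csi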
oc get volumesnapshots.snapshot.storage.k8s.io
NAME                      READYTOUSE   SOURCEPVC     SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                               SNAPSHOTCONTENT                                    CREATIONTIME   AGE
volsync-dd-io-pvc-1-src   true         dd-io-pvc-1                           117Gi         ocs-storagecluster-cephfsplugin-snapclass   snapcontent-048024ce-42aa-43c1-81b2-e321702ac071   28m            28m
volsync-dd-io-pvc-2-src   true         dd-io-pvc-2                           143Gi         ocs-storagecluster-cephfsplugin-snapclass   snapcontent-126fe6bb-08a5-44c8-a1ee-0d61071d61f5   7m54s          8m2s
volsync-dd-io-pvc-3-src   true         dd-io-pvc-3                           134Gi         ocs-storagecluster-cephfsplugin-snapclass   snapcontent-1f540195-d5eb-4872-9154-c6d21ab88077   37m            38m
volsync-dd-io-pvc-4-src   true         dd-io-pvc-4                           106Gi         ocs-storagecluster-cephfsplugin-snapclass   snapcontent-712066ae-0fcc-4d50-ba14-953ed358cbed   7m59s          8m2s
volsync-dd-io-pvc-7-src   true         dd-io-pvc-7                           149Gi         ocs-storagecluster-cephfsplugin-snapclass   snapcontent-a4318685-6b6c-410d-95ed-70c65ac8a06d   37m            38m

oc describe volumesnapshots.snapshot.storage.k8s.io volsync-dd-io-pvc-1-src
Name:         volsync-dd-io-pvc-1-src
Namespace:    busybox-workloads-8
Labels:       app.kubernetes.io/created-by=volsync
              volsync.backube/cleanup=9e226d9a-226e-434e-8cfd-f8ae0d8cdb86
Annotations:  <none>
API Version:  snapshot.storage.k8s.io/v1
Kind:         VolumeSnapshot
Metadata:
  Creation Timestamp:  2022-10-12T07:40:00Z
  Finalizers:
    snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
    snapshot.storage.kubernetes.io/volumesnapshot-bound-protection
  Generation:  1
  Managed Fields:
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:app.kubernetes.io/created-by:
          f:volsync.backube/cleanup:
        f:ownerReferences:
          .:
          k:{"uid":"9e226d9a-226e-434e-8cfd-f8ae0d8cdb86"}:
      f:spec:
        .:
        f:source:
          .:
          f:persistentVolumeClaimName:
        f:volumeSnapshotClassName:
    Manager:      manager
    Operation:    Update
    Time:         2022-10-12T07:40:00Z
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection":
          v:"snapshot.storage.kubernetes.io/volumesnapshot-bound-protection":
    Manager:      snapshot-controller
    Operation:    Update
    Time:         2022-10-12T07:40:01Z
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:boundVolumeSnapshotContentName:
        f:creationTime:
        f:readyToUse:
        f:restoreSize:
    Manager:      snapshot-controller
    Operation:    Update
    Subresource:  status
    Time:         2022-10-12T07:40:06Z
  Owner References:
    API Version:           volsync.backube/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  ReplicationSource
    Name:                  dd-io-pvc-1
    UID:                   9e226d9a-226e-434e-8cfd-f8ae0d8cdb86
  Resource Version:        6485976
  UID:                     048024ce-42aa-43c1-81b2-e321702ac071
Spec:
  Source:
    Persistent Volume Claim Name:  dd-io-pvc-1
  Volume Snapshot Class Name:      ocs-storagecluster-cephfsplugin-snapclass
Status:
  Bound Volume Snapshot Content Name:  snapcontent-048024ce-42aa-43c1-81b2-e321702ac071
  Creation Time:                       2022-10-12T07:40:01Z
  Ready To Use:                        true
  Restore Size:                        117Gi
Events:
  Type    Reason            Age   From                 Message
  ----    ------            ----  ----                 -------
  Normal  CreatingSnapshot  28m   snapshot-controller  Waiting for a snapshot busybox-workloads-8/volsync-dd-io-pvc-1-src to be created by the CSI driver.
  Normal  SnapshotCreated   28m   snapshot-controller  Snapshot busybox-workloads-8/volsync-dd-io-pvc-1-src was successfully created by the CSI driver.
  Normal  SnapshotReady     28m   snapshot-controller  Snapshot busybox-workloads-8/volsync-dd-io-pvc-1-src is ready to use.
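The snapshots themselves report Ready To Use within a few seconds of creation, so the time is being spent in the clone-from-snapshot step that backs the restored PVC, not in snapshotting. A minimal sketch to time just that step outside VolSync, assuming the snapshot above still exists and that oc/kubectl is recent enough to support --for=jsonpath; the PVC name cephfs-clone-test, the ReadWriteMany access mode, and the 60m timeout are illustrative choices:

cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-clone-test
  namespace: busybox-workloads-8
spec:
  storageClassName: ocs-storagecluster-cephfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 117Gi          # must be >= the snapshot's 117Gi restore size
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: volsync-dd-io-pvc-1-src
EOF

# Measure how long the clone takes to become usable.
time oc -n busybox-workloads-8 wait pvc/cephfs-clone-test \
  --for=jsonpath='{.status.phase}'=Bound --timeout=60m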
oc get volumesnapshotcontents.snapshot.storage.k8s.io
NAME                                               READYTOUSE   RESTORESIZE    DELETIONPOLICY   DRIVER                                  VOLUMESNAPSHOTCLASS                         VOLUMESNAPSHOT            VOLUMESNAPSHOTNAMESPACE   AGE
snapcontent-048024ce-42aa-43c1-81b2-e321702ac071   true         125627793408   Delete           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   volsync-dd-io-pvc-1-src   busybox-workloads-8       28m
snapcontent-126fe6bb-08a5-44c8-a1ee-0d61071d61f5   true         153545080832   Delete           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   volsync-dd-io-pvc-2-src   busybox-workloads-8       8m39s
snapcontent-1f540195-d5eb-4872-9154-c6d21ab88077   true         143881404416   Delete           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   volsync-dd-io-pvc-3-src   busybox-workloads-8       38m
snapcontent-712066ae-0fcc-4d50-ba14-953ed358cbed   true         113816633344   Delete           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   volsync-dd-io-pvc-4-src   busybox-workloads-8       8m39s
snapcontent-a4318685-6b6c-410d-95ed-70c65ac8a06d   true         159987531776   Delete           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   volsync-dd-io-pvc-7-src   busybox-workloads-8       38m

Expected results:
PVCs should not take this long to reach the Bound state.

Additional info:
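For reference, a simple sketch to record how long each VolSync source PVC sits in Pending; the label selector comes from the labels shown on the PVC above, and the log file name is arbitrary:

# Poll the VolSync-created PVCs once a minute and keep a timestamped record.
while true; do
  date
  oc -n busybox-workloads-8 get pvc -l app.kubernetes.io/created-by=volsync \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,CREATED:.metadata.creationTimestamp
  sleep 60
done | tee -a volsync-pvc-bind-times.log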
FYI, we also have a similar BZ for this kind of situation: https://bugzilla.redhat.com/show_bug.cgi?id=2115558 (upstream issue to track: https://github.com/rook/rook/issues/10619).
Closing this one as NOT A BUG since this is the expected result; please reopen if you think it is not a bug.
This is a known issue that requires a large effort in CephFS; lowering severity.
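For context, the "clone from snapshot is pending" state corresponds to clones queued behind the ceph-mgr clone throttle (mgr/volumes/max_concurrent_clones, default 4); raising it only lets more clones copy in parallel and does not make an individual clone any faster. A sketch for inspecting and adjusting it from the toolbox, assuming the rook-ceph-tools deployment is available; the value 8 is illustrative:

# Show the current clone concurrency limit, then raise it (example value only).
oc -n openshift-storage rsh deploy/rook-ceph-tools \
  ceph config get mgr mgr/volumes/max_concurrent_clones
oc -n openshift-storage rsh deploy/rook-ceph-tools \
  ceph config set mgr mgr/volumes/max_concurrent_clones 8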
Moving the bug to NEW as there is no fix yet, and setting needinfo on Pratik to check this out.
Based on comment #33, moving this out to 4.14.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6832