Description of problem (please be as detailed as possible and provide log snippets):

Running the test test_clone_with_different_access_mode, which creates 9 clones of existing PVCs with an access mode different from the parent PVC, puts the mgr pods into a CrashLoopBackOff (CLBO) state.

Deployment: Downstream-OCP4-15-VSPHERE6-UPI-ENCRYPTION-1AZ-RHCOS-VSAN-LSO-VMDK-3M-3W

The clone created with the YAML below is stuck in Pending state:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clone-pvc-test-47913a39ff9548-bec6e61beb
  namespace: namespace-test-b0ccbc497af947f7981fb8937
spec:
  accessModes:
  - ReadWriteMany
  dataSource:
    kind: PersistentVolumeClaim
    name: pvc-test-47913a39ff95488caaf4418f7beae23
  resources:
    requests:
      storage: 3Gi
  storageClassName: ocs-storagecluster-cephfs

2024-01-03 21:30:12 Name:          clone-pvc-test-47913a39ff9548-bec6e61beb
2024-01-03 21:30:12 Namespace:     namespace-test-b0ccbc497af947f7981fb8937
2024-01-03 21:30:12 StorageClass:  ocs-storagecluster-cephfs
2024-01-03 21:30:12 Status:        Pending
2024-01-03 21:30:12 Volume:
2024-01-03 21:30:12 Labels:        <none>
2024-01-03 21:30:12 Annotations:   volume.beta.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
2024-01-03 21:30:12                volume.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
2024-01-03 21:30:12 Finalizers:    [kubernetes.io/pvc-protection]
2024-01-03 21:30:12 Capacity:
2024-01-03 21:30:12 Access Modes:
2024-01-03 21:30:12 VolumeMode:    Filesystem
2024-01-03 21:30:12 DataSource:
2024-01-03 21:30:12   Kind:   PersistentVolumeClaim
2024-01-03 21:30:12   Name:   pvc-test-47913a39ff95488caaf4418f7beae23
2024-01-03 21:30:12 Used By:  <none>
2024-01-03 21:30:12 Events:
2024-01-03 21:30:12   Type     Reason                Age                  From                                                                                                                     Message
2024-01-03 21:30:12   ----     ------                ----                 ----                                                                                                                     -------
2024-01-03 21:30:12   Warning  ProvisioningFailed    3m32s                openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-cc6b5b547-nkxj2_e0a815a7-e343-44db-ab00-151a5a1712d8  failed to provision volume with StorageClass "ocs-storagecluster-cephfs": rpc error: code = Aborted desc = clone from snapshot is pending
2024-01-03 21:30:12   Normal   Provisioning          56s (x9 over 3m32s)  openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-cc6b5b547-nkxj2_e0a815a7-e343-44db-ab00-151a5a1712d8  External provisioner is provisioning volume for claim "namespace-test-b0ccbc497af947f7981fb8937/clone-pvc-test-47913a39ff9548-bec6e61beb"
2024-01-03 21:30:12   Warning  ProvisioningFailed    40s (x8 over 3m31s)  openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-cc6b5b547-nkxj2_e0a815a7-e343-44db-ab00-151a5a1712d8  failed to provision volume with StorageClass "ocs-storagecluster-cephfs": rpc error: code = Aborted desc = clone from snapshot is already in progress
2024-01-03 21:30:12   Normal   ExternalProvisioning  5s (x16 over 3m32s)  persistentvolume-controller  Waiting for a volume to be created either by the external provisioner 'openshift-storage.cephfs.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
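For triage, a minimal sketch of the checks that confirm the state above. The PVC and namespace names are copied from this run; the Ceph commands assume the rook-ceph toolbox is enabled and use the default ODF CephFS volume name ocs-storagecluster-cephfilesystem and CSI subvolume group csi, which were not verified against this cluster:

# Stuck clone PVC and its provisioning events
oc -n namespace-test-b0ccbc497af947f7981fb8937 get pvc
oc -n namespace-test-b0ccbc497af947f7981fb8937 describe pvc clone-pvc-test-47913a39ff9548-bec6e61beb

# CephFS clones behind the "clone from snapshot is already in progress" error,
# checked from inside the rook-ceph toolbox pod
oc -n openshift-storage rsh deploy/rook-ceph-tools
ceph fs subvolume ls ocs-storagecluster-cephfilesystem --group_name csi
ceph fs clone status ocs-storagecluster-cephfilesystem <clone-subvolume-name> --group_name csi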
Alerts captured:

state': 'firing', 'activeAt': '2024-01-03T17:05:30.245594999Z', 'value': '2.102324499988556e+04'},
{'labels': {'alertname': 'KubePodCrashLooping', 'container': 'mgr', 'endpoint': 'https-main', 'job': 'kube-state-metrics', 'namespace': 'openshift-storage', 'pod': 'rook-ceph-mgr-b-7f5c74fb5d-jlc52', 'reason': 'CrashLoopBackOff', 'service': 'kube-state-metrics', 'severity': 'warning', 'uid': '82dcdb73-34ff-423e-aae0-161827947181'}, 'annotations': {'description': 'Pod openshift-storage/rook-ceph-mgr-b-7f5c74fb5d-jlc52 (mgr) is in waiting state (reason: "CrashLoopBackOff").', 'summary': 'Pod is crash looping.'}, 'state': 'firing', 'activeAt': '2024-01-03T19:29:22.487572828Z', 'value': '1e+00'},
{'labels': {'alertname': 'KubePodCrashLooping', 'container': 'mgr', 'endpoint': 'https-main', 'job': 'kube-state-metrics', 'namespace': 'openshift-storage', 'pod': 'rook-ceph-mgr-a-5448cf4785-q68wp', 'reason': 'CrashLoopBackOff', 'service': 'kube-state-metrics', 'severity': 'warning', 'uid': 'b58946ed-10b4-44ca-993d-6ed882173642'}, 'annotations': {'description': 'Pod openshift-storage/rook-ceph-mgr-a-5448cf4785-q68wp (mgr) is in waiting state (reason: "CrashLoopBackOff").', 'summary': 'Pod is crash looping.'}

test log: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-061vue1cslv33-uba/j-061vue1cslv33-uba_20240103T120010/logs/ocs-ci-logs-1704292772/by_outcome/failed/tests/functional/pv/pvc_clone/test_clone_with_different_access_mode.py/TestCloneWithDifferentAccessMode/test_clone_with_different_access_mode/logs

Version of all relevant components (if applicable):
OCS 4.15.0-100
OCP 4.15.0-0.nightly-2024-01-03-015912

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible? 2/2

Can this issue be reproduced from the UI? yes

If this is a regression, please provide more details to justify this:
The test never failed on ODF 4.14 but fails on ODF 4.15.

Steps to Reproduce:
1. Create CephFileSystem and CephBlockPool PVCs of different volume modes and access modes.
2. Attach each PVC to a fio pod and run IO to fill 1 Gi of each 3 Gi PVC.
3. Clone each PVC with an access mode different from the original PVC and verify the status of each cloned PVC.

Actual results:
The cloned PVCs are stuck in Pending and the mgr pods are in CLBO (a diagnostic sketch follows this comment).

Expected results:
Each cloned PVC is Bound.

Additional info:
must-gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-061vue1cslv33-uba/j-061vue1cslv33-uba_20240103T120010/logs/testcases_1704292772/j-061vue1cslv33-u/
2 similar failures occurred on the same deployment type on ODF 4.15 (Downstream-OCP4-15-VSPHERE6-UPI-ENCRYPTION-1AZ-RHCOS-VSAN-LSO-VMDK-3M-3W); the test never failed on ODF 4.14.
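A hedged sketch of how the mgr crash loop reported by the alerts above can be inspected; the pod names are copied from the alerts, while the label selector is assumed from standard Rook labels and was not taken from the test log:

# mgr pods reported by the KubePodCrashLooping alerts
oc -n openshift-storage get pods -l app=rook-ceph-mgr

# Restart reason and logs of the previously crashed mgr container
oc -n openshift-storage describe pod rook-ceph-mgr-b-7f5c74fb5d-jlc52
oc -n openshift-storage logs rook-ceph-mgr-b-7f5c74fb5d-jlc52 -c mgr --previous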
> Do we have other tests just creating multiple parallel cephfs clones in a similar way and are those tests passing?

I don't see such tests, but I can add more to get a fuller picture. From the observed test history, this test passed 19 times on multiple different platforms on ODF 4.15, and all 14 runs on ODF 4.14 passed, including 7 on the same Downstream-OCP4-15-VSPHERE6-UPI-ENCRYPTION-1AZ-RHCOS-VSAN-LSO-VMDK-3M-3W cluster type. We have no history of this test passing on that cluster type on ODF 4.15: both runs failed with mgr pods in CLBO and PVCs stuck in Pending. @jijoy wdyt?
@khiremat,

> I didn't the understand the testcase.

Please take a look at the test log; it contains every command and CR applied during the test, with timestamps added: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-061vue1cslv33-uba/j-061vue1cslv33-uba_20240103T120010/logs/ocs-ci-logs-1704292772/by_outcome/failed/tests/functional/pv/pvc_clone/test_clone_with_different_access_mode.py/TestCloneWithDifferentAccessMode/test_clone_with_different_access_mode/logs

Basically, the test clones the PVC but changes the "accessModes", for example ReadWriteMany -> ReadWriteOnce, etc. (a minimal manifest sketch follows below).
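For illustration only, a minimal sketch of such a clone manifest. All names here are hypothetical; the parent PVC is assumed to be a ReadWriteMany CephFS PVC in the same namespace, and the requested size must be at least the parent's size:

cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clone-rwo-of-rwx-parent        # hypothetical clone name
  namespace: test-namespace            # hypothetical namespace
spec:
  accessModes:
  - ReadWriteOnce                      # changed from the parent's ReadWriteMany
  dataSource:
    kind: PersistentVolumeClaim
    name: parent-rwx-pvc               # hypothetical parent PVC, same namespace
  resources:
    requests:
      storage: 3Gi                     # must be >= the parent PVC size
  storageClassName: ocs-storagecluster-cephfs
EOF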
*** This bug has been marked as a duplicate of bug 2258357 ***