Created attachment 1825761 [details]
test logs

Description of problem (please be as detailed as possible and provide log snippets):

The ocs-ci tier2 tests below fail due to a missing provisioner, "openshift-storage.rbd.csi.ceph.com".

Tests:
tests/manage/storageclass/test_create_multiple_sc_with_same_pool_name.py::TestCreateMultipleScWithSamePoolName::test_create_multiple_sc_with_same_pool_name[CephBlockPool]
tests/manage/storageclass/test_create_sc_reclaim_policy_rep2_comp.py::TestScReclaimPolicyRetainRep2Comp::test_sc_reclaim_policy_retain_rep2_comp

Error:
E   ocs_ci.ocs.exceptions.ResourceWrongStatusException: Resource pvc-test-4a62eff8cc9249b19ad6677b09ae68e describe output: Name: pvc-test-4a62eff8cc9249b19ad6677b09ae68e
E   Namespace: namespace-test-576e92034cc14d85a164dfd5a
E   StorageClass: storageclass-test-rbd-d1e8e64b417048149b
E   Status: Pending
E   Volume:
E   Labels: <none>
E   Annotations: volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
E   Finalizers: [kubernetes.io/pvc-protection]
E   Capacity:
E   Access Modes:
E   VolumeMode: Filesystem
E   Used By: <none>
E   Events:
E     Type    Reason                Age                From                                                                                                               Message
E     ----    ------                ----               ----                                                                                                               -------
E     Normal  Provisioning          62s                openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-c649865cb-h7bcc_8962ebeb-4840-4e3a-879e-3d280449cae0  External provisioner is provisioning volume for claim "namespace-test-576e92034cc14d85a164dfd5a/pvc-test-4a62eff8cc9249b19ad6677b09ae68e"
E     Normal  ExternalProvisioning  15s (x5 over 62s)  persistentvolume-controller                                                                                        waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator

Version of all relevant components (if applicable):
OCS 4.9 (tested on 4.9.0-156.ci)

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Provisioning fails for PVCs that use this provisioner.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
Yes

Can this issue reproduce from the UI?
No

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Install an OCP cluster
2. Deploy ODF along with LSO
3. Run the ocs-ci tests listed above

Actual results:
The PVCs stay in the Pending state.

Expected results:
The PVCs are provisioned successfully and the tests pass.

Additional info:
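A minimal sketch for confirming the actual result above, assuming the PVC list has been exported with `oc get pvc --all-namespaces -o json` and parsed into a dict. The helper name `pending_pvcs` and the sample data are hypothetical; only the annotation key, provisioner name, and resource names come from the describe output in this report.

```python
# Annotation key as shown in the describe output of the failing PVC.
PROVISIONER_ANNOTATION = "volume.beta.kubernetes.io/storage-provisioner"


def pending_pvcs(pvc_list, provisioner):
    """Return (namespace, name) for every Pending PVC bound to `provisioner`.

    `pvc_list` is the parsed JSON of a PersistentVolumeClaim list.
    """
    pending = []
    for item in pvc_list.get("items", []):
        meta = item.get("metadata", {})
        annotations = meta.get("annotations") or {}
        phase = item.get("status", {}).get("phase")
        if phase == "Pending" and annotations.get(PROVISIONER_ANNOTATION) == provisioner:
            pending.append((meta.get("namespace"), meta.get("name")))
    return pending


# Hypothetical sample input: one stuck PVC from this report, one healthy PVC.
sample = {
    "items": [
        {
            "metadata": {
                "name": "pvc-test-4a62eff8cc9249b19ad6677b09ae68e",
                "namespace": "namespace-test-576e92034cc14d85a164dfd5a",
                "annotations": {
                    PROVISIONER_ANNOTATION: "openshift-storage.rbd.csi.ceph.com"
                },
            },
            "status": {"phase": "Pending"},
        },
        {
            "metadata": {
                "name": "pvc-healthy",
                "namespace": "default",
                "annotations": {
                    PROVISIONER_ANNOTATION: "openshift-storage.rbd.csi.ceph.com"
                },
            },
            "status": {"phase": "Bound"},
        },
    ]
}

stuck = pending_pvcs(sample, "openshift-storage.rbd.csi.ceph.com")
```

With real cluster data, the same function could be fed `json.load()` output from the `oc` command instead of the sample dict.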
Hey Abdul,

Can you also attach the must-gather for this run? The logs will help us get a deeper understanding of the issue.

Thanks,
Yug
Please find the must-gather logs on Google Drive: https://drive.google.com/file/d/1TYPa3cXPGIU1VB1ymBQvIoBprzsECCeg/view?usp=sharing
Hey Abdul,

From the logs, it looks like PVCs were created in parallel, which caused the rbd command to hang, so the CreateVolume call never returned a response. Because the first CreateVolume call never returned, all subsequent calls returned "operation already exists".

This is a known librbd issue, tracked here: https://tracker.ceph.com/issues/52537
PR with the fix: https://github.com/ceph/ceph/pull/43113
Tracker issue in ceph-csi: https://github.com/ceph/ceph-csi/issues/2521

As workarounds, you can either:
1. Roll back to the Ceph Octopus release (the issue is harder to hit there)
2. Follow https://github.com/ceph/ceph-csi/issues/2521#issuecomment-924638203

Regards,
Yug Gupta
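The failure mode described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the ceph-csi implementation: it assumes the provisioner keeps a set of volume names with an operation in flight and rejects any new request for a name already in that set. The names `VolumeLocks`, `create_volume`, and `demo` are hypothetical, and the hung rbd command is stood in for by a thread blocked on an event.

```python
import threading


class VolumeLocks:
    """Tracks volume names that currently have an operation in flight (assumed model)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._in_flight = set()

    def try_acquire(self, name):
        # Returns False if another operation already holds this name.
        with self._mutex:
            if name in self._in_flight:
                return False
            self._in_flight.add(name)
            return True

    def release(self, name):
        with self._mutex:
            self._in_flight.discard(name)


def create_volume(locks, name, provision):
    """Hypothetical CreateVolume handler: reject if the name is already in flight."""
    if not locks.try_acquire(name):
        return "ABORTED: operation already exists for " + name
    try:
        provision(name)
        return "OK"
    finally:
        locks.release(name)


def demo():
    locks = VolumeLocks()
    started = threading.Event()
    unblock = threading.Event()

    def hanging_provision(name):
        started.set()
        unblock.wait()  # stands in for the hung rbd command

    results = []
    first = threading.Thread(
        target=lambda: results.append(create_volume(locks, "pvc-a", hanging_provision))
    )
    first.start()
    started.wait()  # the first call now holds the lock and is stuck

    # A retry arriving while the first call is stuck is rejected immediately.
    retry = create_volume(locks, "pvc-a", lambda name: None)

    unblock.set()  # the "hang" clears, so the first call finishes
    first.join()
    after = create_volume(locks, "pvc-a", lambda name: None)
    return retry, results[0], after


retry_result, first_result, after_result = demo()
```

In this model, the retries keep failing for as long as the first call is stuck, which matches the repeated ExternalProvisioning events on the Pending PVC; once the blocked call returns, new requests succeed again.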
We have a similar bz open for the same issue, and it is already on QA. Closing this one as a duplicate. Please feel free to reopen it if you find otherwise.

*** This bug has been marked as a duplicate of bug 1986794 ***