Bug 2007442 - [IBM Z] PVC in pending state due to missing provisioner
Keywords:
Status: CLOSED DUPLICATE of bug 1986794
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: csi-driver
Version: 4.9
Hardware: s390x
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Yug Gupta
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-09-23 21:25 UTC by Abdul Kandathil (IBM)
Modified: 2023-08-09 16:37 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-29 09:37:10 UTC
Embargoed:


Attachments
test logs (5.20 MB, application/zip)
2021-09-23 21:25 UTC, Abdul Kandathil (IBM)

Description Abdul Kandathil (IBM) 2021-09-23 21:25:15 UTC
Created attachment 1825761
test logs

Description of problem (please be as detailed as possible and provide log
snippets):
The ocs-ci tier2 tests below fail due to a missing provisioner, "openshift-storage.rbd.csi.ceph.com".

Tests:

tests/manage/storageclass/test_create_multiple_sc_with_same_pool_name.py::TestCreateMultipleScWithSamePoolName::test_create_multiple_sc_with_same_pool_name[CephBlockPool]

tests/manage/storageclass/test_create_sc_reclaim_policy_rep2_comp.py::TestScReclaimPolicyRetainRep2Comp::test_sc_reclaim_policy_retain_rep2_comp

Error:

E           ocs_ci.ocs.exceptions.ResourceWrongStatusException: Resource pvc-test-4a62eff8cc9249b19ad6677b09ae68e describe output: Name:          pvc-test-4a62eff8cc9249b19ad6677b09ae68e
E           Namespace:     namespace-test-576e92034cc14d85a164dfd5a
E           StorageClass:  storageclass-test-rbd-d1e8e64b417048149b
E           Status:        Pending
E           Volume:
E           Labels:        <none>
E           Annotations:   volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
E           Finalizers:    [kubernetes.io/pvc-protection]
E           Capacity:
E           Access Modes:
E           VolumeMode:    Filesystem
E           Used By:       <none>
E           Events:
E             Type    Reason                Age                From                                                                                                               Message
E             ----    ------                ----               ----                                                                                                               -------
E             Normal  Provisioning          62s                openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-c649865cb-h7bcc_8962ebeb-4840-4e3a-879e-3d280449cae0  External provisioner is provisioning volume for claim "namespace-test-576e92034cc14d85a164dfd5a/pvc-test-4a62eff8cc9249b19ad6677b09ae68e"
E             Normal  ExternalProvisioning  15s (x5 over 62s)  persistentvolume-controller                                                                                        waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator
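
The Events table above can be read mechanically to tell whether a provisioner pod ever reacted to the claim. The following is a rough sketch, not part of ocs-ci: the helper names parse_events and classify are hypothetical, the sample text is abridged from the output above, and the event reasons "ProvisioningSucceeded" and "ProvisioningFailed" are assumed to be what the external provisioner reports on completion or error.

```python
import re

# Events section abridged from the describe output in this report.
SAMPLE = """\
E           Events:
E             Type    Reason                Age                From                         Message
E             ----    ------                ----               ----                         -------
E             Normal  Provisioning          62s                openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-c649865cb-h7bcc_8962ebeb-4840-4e3a-879e-3d280449cae0  External provisioner is provisioning volume for claim
E             Normal  ExternalProvisioning  15s (x5 over 62s)  persistentvolume-controller  waiting for a volume to be created
"""


def parse_events(describe_text):
    """Return (reason, source) pairs from the Events table of describe output."""
    events = []
    in_events = False
    for raw in describe_text.splitlines():
        # Drop the "E" prefix that pytest puts in front of exception text.
        line = re.sub(r"^E\s+", "", raw).strip()
        if line.startswith("Events:"):
            in_events = True
            continue
        if not in_events or not line or line.startswith(("Type", "----")):
            continue
        # Columns are separated by two or more spaces; Age may contain single spaces.
        cols = re.split(r"\s{2,}", line, maxsplit=4)
        if len(cols) == 5:
            events.append((cols[1], cols[3]))
    return events


def classify(events, provisioner):
    """Heuristic only: summarize how far provisioning got for one claim."""
    reasons = {reason for reason, _ in events}
    picked_up = any(
        reason == "Provisioning" and source.startswith(provisioner)
        for reason, source in events
    )
    if "ProvisioningSucceeded" in reasons:
        return "provisioned"
    if "ProvisioningFailed" in reasons:
        return "failed"
    if picked_up:
        return "stalled"  # a provisioner pod saw the claim but never answered
    return "not picked up"  # no provisioner pod reacted to the claim


if __name__ == "__main__":
    print(classify(parse_events(SAMPLE), "openshift-storage.rbd.csi.ceph.com"))
```

For the events in this report the heuristic reports a stalled request rather than an absent provisioner, since a csi-rbdplugin-provisioner pod did emit a "Provisioning" event.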


Version of all relevant components (if applicable): OCS 4.9 (tested on 4.9.0-156.ci)


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

PVC provisioning fails for any PVC that uses this provisioner.

Is there any workaround available to the best of your knowledge?
no

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
yes

Can this issue reproduce from the UI?
no

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install an OCP cluster.
2. Deploy ODF along with LSO.
3. Execute the ocs-ci tests listed above.


Actual results:
PVCs stay in Pending state.


Expected results:
PVCs are provisioned successfully and the tests pass.

Additional info:

Comment 2 Yug Gupta 2021-09-24 02:37:06 UTC
Hey Abdul,

Can you also attach the must-gather for the same? Logs will definitely help to get a deeper understanding of the issue.

Thanks,
Yug

Comment 3 Abdul Kandathil (IBM) 2021-09-24 08:07:50 UTC
Please find the must-gather logs on Google Drive: https://drive.google.com/file/d/1TYPa3cXPGIU1VB1ymBQvIoBprzsECCeg/view?usp=sharing

Comment 5 Yug Gupta 2021-09-27 03:25:48 UTC
Hey Abdul,

The logs suggest that PVCs were created in parallel, which caused the rbd command to hang, so the CreateVolume call never returned a response.
Because the first CreateVolume call never returned, all subsequent calls returned "operation already exists".
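
The failure mode described above can be modeled roughly as follows. This is a simplified sketch in Python, not ceph-csi code: the names VolumeLocks, Aborted, create_volume and demo are hypothetical, the error text is illustrative, and a threading.Event stands in for the rbd command that never returns. It assumes the driver keeps a per-volume "operation in flight" marker and rejects concurrent requests for the same name.

```python
import threading


class Aborted(Exception):
    """Raised when a request arrives while another one is still in flight."""


class VolumeLocks:
    """Tracks volume names that currently have an operation in flight."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._busy = set()

    def try_acquire(self, name):
        with self._mutex:
            if name in self._busy:
                return False
            self._busy.add(name)
            return True

    def release(self, name):
        with self._mutex:
            self._busy.discard(name)


def create_volume(locks, name, backend):
    """Run backend(name) unless an earlier call for the same name is pending."""
    if not locks.try_acquire(name):
        raise Aborted(f"operation already exists: {name}")
    try:
        return backend(name)
    finally:
        locks.release(name)  # never reached while the backend hangs


def demo():
    locks = VolumeLocks()
    started, unblock = threading.Event(), threading.Event()

    def hung_backend(name):
        started.set()
        unblock.wait()  # stands in for the hung rbd command
        return f"created {name}"

    first = threading.Thread(target=create_volume, args=(locks, "pvc-a", hung_backend))
    first.start()
    started.wait()  # make sure the first call holds the marker

    try:
        create_volume(locks, "pvc-a", lambda n: f"created {n}")
        retry = "ok"
    except Aborted as exc:
        retry = str(exc)

    unblock.set()  # let the hung call finish so the marker is released
    first.join()
    after = create_volume(locks, "pvc-a", lambda n: f"created {n}")
    return retry, after


if __name__ == "__main__":
    print(demo())
```

In this model every retry is rejected for as long as the first call is stuck, which matches the PVC staying in Pending; once the stuck call returns, a new request goes through.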

It is a known librbd issue that is tracked here: https://tracker.ceph.com/issues/52537
PR to fix the same: https://github.com/ceph/ceph/pull/43113
Tracker issue in ceph-csi: https://github.com/ceph/ceph-csi/issues/2521

As a workaround, you can either:

1. Roll back to the Ceph Octopus release (the issue is harder to hit there)
2. Follow the workaround described in https://github.com/ceph/ceph-csi/issues/2521#issuecomment-924638203

Regards,
Yug Gupta

Comment 6 Yug Gupta 2021-09-29 09:37:10 UTC
We have a similar BZ open for the same issue, which is already on QA. Closing this one as a duplicate. Please feel free to reopen it if you find otherwise.

*** This bug has been marked as a duplicate of bug 1986794 ***

