Description of problem (please be as detailed as possible and provide log snippets):

ODF 4.13 deployment on vSphere is failing because the rook-ceph-mon-* PVCs are stuck in Pending state.

Version of all relevant components (if applicable):
openshift installer (4.13.0-0.nightly-2023-02-13-235211)
ocs-registry:4.13.0-73

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
3/3

Can this issue be reproduced from the UI?
Not tried

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Install ODF 4.13 on the vSphere platform
2. Check that the storagecluster is in Ready state and all PVCs are in Bound state

Actual results:

$ oc get storagecluster
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   37m   Progressing              2023-02-14T15:51:23Z   4.12.0

$ oc get pvc
NAME              STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
rook-ceph-mon-a   Pending                                      thin-csi       36m
rook-ceph-mon-b   Pending                                      thin-csi       36m
rook-ceph-mon-c   Pending                                      thin-csi       36m

Expected results:

The storagecluster should be in Ready state.

Additional info:

From OCP 4.13, the default StorageClass on vSphere has changed to thin-csi, and we use the same class for storagecluster creation.

$ oc get sc
NAME                          PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
ocs-storagecluster-ceph-rgw   openshift-storage.ceph.rook.io/bucket   Delete          Immediate              false                  39m
thin-csi (default)            csi.vsphere.vmware.com                  Delete          WaitForFirstConsumer   true                   55m

$ oc describe storagecluster ocs-storagecluster
Name:         ocs-storagecluster
Namespace:    openshift-storage
Labels:       <none>
Annotations:  uninstall.ocs.openshift.io/cleanup-policy: delete
              uninstall.ocs.openshift.io/mode: graceful
API Version:  ocs.openshift.io/v1
Kind:         StorageCluster
.
.
.
Status:
  Conditions:
    Last Heartbeat Time:   2023-02-14T16:31:57Z
    Last Transition Time:  2023-02-14T15:51:24Z
    Message:               Error while reconciling: some StorageClasses were skipped while waiting for pre-requisites to be met: [ocs-storagecluster-cephfs,ocs-storagecluster-ceph-rbd]
    Reason:                ReconcileFailed
    Status:                False
    Type:                  ReconcileComplete
    Last Heartbeat Time:   2023-02-14T15:51:24Z
    Last Transition Time:  2023-02-14T15:51:24Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                False
    Type:                  Available
    Last Heartbeat Time:   2023-02-14T15:51:24Z
    Last Transition Time:  2023-02-14T15:51:24Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                True
    Type:                  Progressing
    Last Heartbeat Time:   2023-02-14T15:51:24Z
    Last Transition Time:  2023-02-14T15:51:24Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                False
    Type:                  Degraded
    Last Heartbeat Time:   2023-02-14T15:51:24Z
    Last Transition Time:  2023-02-14T15:51:24Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                Unknown
    Type:                  Upgradeable

$ oc get pvc
NAME              STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
rook-ceph-mon-a   Pending                                      thin-csi       41m
rook-ceph-mon-b   Pending                                      thin-csi       41m
rook-ceph-mon-c   Pending                                      thin-csi       41m

$ oc describe pvc rook-ceph-mon-a
Name:          rook-ceph-mon-a
Namespace:     openshift-storage
StorageClass:  thin-csi
Status:        Pending
Volume:
Labels:        app=rook-ceph-mon
               app.kubernetes.io/component=cephclusters.ceph.rook.io
               app.kubernetes.io/created-by=rook-ceph-operator
               app.kubernetes.io/instance=a
               app.kubernetes.io/managed-by=rook-ceph-operator
               app.kubernetes.io/name=ceph-mon
               app.kubernetes.io/part-of=ocs-storagecluster-cephcluster
               ceph-version=16.2.10-94
               ceph_daemon_id=a
               ceph_daemon_type=mon
               mon=a
               mon_canary=true
               mon_cluster=openshift-storage
               pvc_name=rook-ceph-mon-a
               pvc_size=50Gi
               rook-version=v4.13.0-0.d94a73188db5ba4deac2618d195ddadd60212d5f
               rook.io/operator-namespace=openshift-storage
               rook_cluster=openshift-storage
Annotations:   volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
               volume.kubernetes.io/selected-node: compute-1
               volume.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
Finalizers:    [kubernetes.io/pvc-protection]
.
.
.
Events:
  Type     Reason                Age                  From                                                                          Message
  ----     ------                ----                 ----                                                                          -------
  Normal   WaitForFirstConsumer  41m                  persistentvolume-controller                                                   waiting for first consumer to be created before binding
  Warning  ProvisioningFailed    15m (x15 over 41m)   csi.vsphere.vmware.com_control-plane-0_90f251a8-6a4d-4c55-91bb-870e17f1c925   failed to provision volume with StorageClass "thin-csi": rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   Provisioning          5m1s (x18 over 41m)  csi.vsphere.vmware.com_control-plane-0_90f251a8-6a4d-4c55-91bb-870e17f1c925   External provisioner is provisioning volume for claim "openshift-storage/rook-ceph-mon-a"
  Normal   ExternalProvisioning  94s (x166 over 41m)  persistentvolume-controller                                                   waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator

job link: https://url.corp.redhat.com/6781bd2
must gather: https://url.corp.redhat.com/542d982
The provisioner is not creating and binding the PV:

    waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator

Can you create any other test pod that provisions a volume from the thin-csi storage class? The thin-csi provisioner does not appear to be working.
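To isolate the thin-csi provisioner from ODF, a minimal standalone check could look like the manifest below. This is only a sketch: the names (`thin-csi-test`, `thin-csi-test-pod`), the 1Gi size, and the UBI image are illustrative and not taken from this report.

```yaml
# Hypothetical PVC against the thin-csi StorageClass.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: thin-csi-test
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: thin-csi
  resources:
    requests:
      storage: 1Gi
---
# A consumer pod is needed because thin-csi uses WaitForFirstConsumer
# volume binding; the PVC stays Pending until a pod schedules against it.
apiVersion: v1
kind: Pod
metadata:
  name: thin-csi-test-pod
spec:
  containers:
    - name: app
      image: registry.access.redhat.com/ubi8/ubi-minimal
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: thin-csi-test
```

If this PVC also stays Pending with the same DeadlineExceeded event, the problem would be in the vSphere CSI driver rather than in ODF.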
Can you try testing on the latest 4.13 builds, e.g. `4.13.0-86`? There was a fix included in `4.13.0-85`.
@pbalogh, is the cluster live?
Once Nitin changed the storageclass name, we were able to see the error mentioned in the logs, but those errors come from k8s/OCP because some security labels were missing on the namespace. After I applied some labels to the namespace, the errors were reduced to:

```
2023-02-22 07:53:22.763567 I | op-mon: waiting for canary pod creation rook-ceph-mon-b-canary
W0222 07:53:22.971775       1 warnings.go:70] would violate PodSecurity "baseline:latest": hostPath volumes (volumes "ceph-daemons-sock-dir", "rook-ceph-log", "rook-ceph-crash"), privileged (containers "mon", "log-collector" must not set securityContext.privileged=true)
```
I applied these labels:

```
kubectl label --overwrite ns openshift-storage \
  pod-security.kubernetes.io/enforce=privileged \
  pod-security.kubernetes.io/warn=baseline \
  pod-security.kubernetes.io/audit=baseline
```

and the errors were limited to the single one mentioned above. Please check that.
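For reference, the same pod-security labels can be expressed declaratively; a minimal sketch of the namespace metadata (the namespace already exists, so this would go through `oc apply` or a patch, and only the labels matter here):

```yaml
# Illustrative only: the three pod-security labels applied above,
# shown as namespace metadata instead of an imperative kubectl command.
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-storage
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/warn: baseline
    pod-security.kubernetes.io/audit: baseline
```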
Can we close this BZ?
Moving to ON_QA to verify the resolution.
Everything is in place; removing my needinfo.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:3742