Bug 2242121 - [RDR] A few cephfs pvcs are stuck in pending state over fresh deployment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: csi-driver
Version: 4.14
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.14.0
Assignee: Rakshith
QA Contact: kmanohar
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-10-04 13:17 UTC by Aman Agrawal
Modified: 2023-11-08 18:56 UTC

Fixed In Version: 4.14.0-148
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-08 18:54:58 UTC
Embargoed:


Links

System                  ID                                 Private  Priority  Status  Summary                                                                       Last Updated
Github                  ceph ceph-csi issues 4162          0        None      open    Concurrent Map Writes With CephFS Plugin                                      2023-10-05 11:22:05 UTC
Github                  ceph ceph-csi pull 4163            0        None      Draft   cephfs: safeguard localClusterState struct from race conditions               2023-10-05 11:22:05 UTC
Github                  red-hat-storage ceph-csi pull 191  0        None      open    BUG 2242121: cephfs: safeguard localClusterState struct from race conditions  2023-10-11 06:18:15 UTC
Red Hat Product Errata  RHSA-2023:6832                     0        None      None    None                                                                          2023-11-08 18:56:32 UTC
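
The linked upstream issue and pull requests describe concurrent map writes and a fix that safeguards the localClusterState struct from race conditions. As a rough illustration of that general technique only (this is not the ceph-csi patch; every type, field, and function name below is hypothetical), a Go map shared between goroutines can be guarded with a mutex along these lines:

```go
package main

import (
	"fmt"
	"sync"
)

// clusterState is a hypothetical stand-in for per-cluster cached flags.
// It is NOT the real ceph-csi localClusterState definition.
type clusterState struct {
	groupCreated bool
}

// stateStore wraps the map so that every read and write takes the mutex.
// Unsynchronized writes to a plain Go map from several goroutines can
// abort the process with "fatal error: concurrent map writes".
type stateStore struct {
	mu     sync.Mutex
	states map[string]*clusterState
}

func newStateStore() *stateStore {
	return &stateStore{states: make(map[string]*clusterState)}
}

// markGroupCreated records the flag for clusterID, creating the entry
// on first use. The lock covers both the lookup and the insert.
func (s *stateStore) markGroupCreated(clusterID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	st, ok := s.states[clusterID]
	if !ok {
		st = &clusterState{}
		s.states[clusterID] = st
	}
	st.groupCreated = true
}

// groupCreated reports whether the flag was recorded for clusterID.
func (s *stateStore) groupCreated(clusterID string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	st, ok := s.states[clusterID]
	return ok && st.groupCreated
}

func main() {
	store := newStateStore()
	var wg sync.WaitGroup
	// Simulate many provisioning requests arriving at once.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			store.markGroupCreated(fmt.Sprintf("cluster-%d", n%4))
		}(i)
	}
	wg.Wait()
	fmt.Println(store.groupCreated("cluster-0"))
}
```

Running a sketch like this under `go run -race` is one way to check that the guarded accesses report no data races.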

Comment 8 kmanohar 2023-11-02 17:14:14 UTC
VERIFICATION COMMENTS
=====================

Steps to reproduce:-
-------------------
  
- Create a large number of CephFS workloads at the same time.
- Wait for all CephFS PVCs to be bound.
- Restart the csi-cephfsplugin provisioner: `oc delete pod -n openshift -l app=csi-cephfsplugin-provisioner`

Repeat the three steps above a few times, since the bug may not be hit every time, and observe the pod status.
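
When repeating these steps, the "wait for all PVCs to be bound" check can be automated. The following is a minimal sketch, assuming the default tabular `oc get pvc` layout with NAME and STATUS as the first two columns; the helper name `unboundPVCs` and the sample rows are made up for illustration and are not taken from this bug.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// unboundPVCs scans tabular `oc get pvc` text and returns the names of
// PVCs whose STATUS column is anything other than "Bound".
// Assumption: NAME is the first column and STATUS the second.
func unboundPVCs(output string) []string {
	var names []string
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skip blank lines and the header row.
		if len(fields) < 2 || fields[0] == "NAME" {
			continue
		}
		if fields[1] != "Bound" {
			names = append(names, fields[0])
		}
	}
	return names
}

func main() {
	// Illustrative input only; real input would be captured from `oc get pvc`.
	sample := `NAME            STATUS    VOLUME        CAPACITY   ACCESS MODES   STORAGECLASS                AGE
example-pvc-1   Bound     pvc-example   1Gi        RWO            ocs-storagecluster-cephfs   5m
example-pvc-2   Pending                                           ocs-storagecluster-cephfs   5m`

	fmt.Println(unboundPVCs(sample))
}
```

An empty result means every listed PVC is bound, so the provisioner restart step can proceed.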

Verification O/P
----------------

On C1:

pods

NAME                                      READY   STATUS    RESTARTS   AGE
dd-io-1-5dbcfccf76-jb244                  1/1     Running   0          3h21m
dd-io-2-684fc84b64-brq59                  1/1     Running   0          3h21m
dd-io-3-68bf99586d-72xh8                  1/1     Running   0          3h21m
dd-io-4-757c8d8b7b-745tq                  1/1     Running   0          3h21m
dd-io-5-74768ccf84-trqbm                  1/1     Running   0          3h21m
dd-io-6-68d5769c76-jqt2h                  1/1     Running   0          3h21m
dd-io-7-67d87688b4-cfc4c                  1/1     Running   0          3h21m
volsync-rsync-tls-src-dd-io-pvc-1-nvlv9   1/1     Running   0          77s
volsync-rsync-tls-src-dd-io-pvc-3-wtx2k   1/1     Running   0          79s
volsync-rsync-tls-src-dd-io-pvc-4-hn2cz   1/1     Running   0          78s
volsync-rsync-tls-src-dd-io-pvc-5-mccr9   1/1     Running   0          76s
volsync-rsync-tls-src-dd-io-pvc-7-bh7l9   1/1     Running   0          77s

oc get vrg

NAME                                 DESIREDSTATE   CURRENTSTATE
busybox-1-cephfs-c2-placement-drpc   primary        Primary

$ oc get replicationsources.volsync.backube
NAME          SOURCE        LAST SYNC              DURATION          NEXT SYNC
dd-io-pvc-1   dd-io-pvc-1   2023-11-02T15:32:02Z   2m2.461129247s    2023-11-02T15:40:00Z
dd-io-pvc-2   dd-io-pvc-2   2023-11-02T15:30:27Z   27.716793967s     2023-11-02T15:40:00Z
dd-io-pvc-3   dd-io-pvc-3   2023-11-02T15:31:41Z   1m41.027619578s   2023-11-02T15:40:00Z
dd-io-pvc-4   dd-io-pvc-4   2023-11-02T15:31:35Z   1m35.070631687s   2023-11-02T15:40:00Z
dd-io-pvc-5   dd-io-pvc-5   2023-11-02T15:21:27Z   1m27.106419851s   2023-11-02T15:30:00Z
dd-io-pvc-6   dd-io-pvc-6   2023-11-02T15:31:14Z   1m14.546227293s   2023-11-02T15:40:00Z
dd-io-pvc-7   dd-io-pvc-7   2023-11-02T15:31:41Z   1m41.283488158s   2023-11-02T15:40:00Z


pvc

NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE
dd-io-pvc-1   Bound    pvc-5581b6d9-a81e-48a8-a74f-ff5b22e671f8   117Gi      RWO            ocs-storagecluster-cephfs   3h23m
dd-io-pvc-2   Bound    pvc-6f74e2ff-93e5-4176-89d6-d06652a30eae   143Gi      RWO            ocs-storagecluster-cephfs   3h23m
dd-io-pvc-3   Bound    pvc-7d531c3d-b7ef-45f5-bef5-b8cc34d755f3   134Gi      RWO            ocs-storagecluster-cephfs   3h23m
dd-io-pvc-4   Bound    pvc-295ae8f9-ba4a-4291-895b-68bf6a6de3dd   106Gi      RWO            ocs-storagecluster-cephfs   3h23m
dd-io-pvc-5   Bound    pvc-7a03286c-5329-4f1c-a24f-7bec3ff1f950   115Gi      RWO            ocs-storagecluster-cephfs   3h23m
dd-io-pvc-6   Bound    pvc-49339510-367a-41c8-ba9d-0a427018bc51   129Gi      RWO            ocs-storagecluster-cephfs   3h23m
dd-io-pvc-7   Bound    pvc-bcb8a8cb-51ba-4518-8ace-b7f4f89b1abc   149Gi      RWO            ocs-storagecluster-cephfs   3h23m


$ oc get drpc busybox-1-cephfs-c2-placement-drpc -o yaml
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPlacementControl
metadata:
  annotations:
    drplacementcontrol.ramendr.openshift.io/app-namespace: appset-busybox-1-cephfs-c2
    drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: kmanohar-clu2
  creationTimestamp: "2023-11-02T12:11:28Z"
  finalizers:
  - drpc.ramendr.openshift.io/finalizer
  generation: 1
  labels:
    cluster.open-cluster-management.io/backup: resource
  name: busybox-1-cephfs-c2-placement-drpc
  namespace: openshift-gitops
  ownerReferences:
  - apiVersion: cluster.open-cluster-management.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Placement
    name: busybox-1-cephfs-c2-placement
    uid: 05cc03f3-8461-4e2b-bcc8-63213e11f725
  resourceVersion: "22634928"
  uid: c625e979-471a-402e-85a0-ce5abf626681
spec:
  drPolicyRef:
    apiVersion: ramendr.openshift.io/v1alpha1
    kind: DRPolicy
    name: dr-policy-10m-2
  placementRef:
    apiVersion: cluster.open-cluster-management.io/v1beta1
    kind: Placement
    name: busybox-1-cephfs-c2-placement
    namespace: openshift-gitops
  preferredCluster: kmanohar-clu2
  pvcSelector:
    matchLabels:
      appname: busybox-cephfs
status:
  conditions:
  - lastTransitionTime: "2023-11-02T12:11:28Z"
    message: Initial deployment completed
    observedGeneration: 1
    reason: Deployed
    status: "True"
    type: Available
  - lastTransitionTime: "2023-11-02T12:11:28Z"
    message: Ready
    observedGeneration: 1
    reason: Success
    status: "True"
    type: PeerReady
  lastGroupSyncDuration: 2m6.004301491s
  lastGroupSyncTime: "2023-11-02T15:30:27Z"
  lastUpdateTime: "2023-11-02T15:32:25Z"
  phase: Deployed
  preferredDecision:
    clusterName: kmanohar-clu2
    clusterNamespace: kmanohar-clu2
  progression: Completed
  resourceConditions:
    conditions:
    - lastTransitionTime: "2023-11-02T12:12:32Z"
      message: All VolSync PVCs are ready
      observedGeneration: 1
      reason: Ready
      status: "True"
      type: DataReady
    - lastTransitionTime: "2023-11-02T12:13:16Z"
      message: All VolSync PVCs are protected
      observedGeneration: 1
      reason: DataProtected
      status: "True"
      type: DataProtected
    - lastTransitionTime: "2023-11-02T12:11:28Z"
      message: Restored cluster data
      observedGeneration: 1
      reason: Restored
      status: "True"
      type: ClusterDataReady
    - lastTransitionTime: "2023-11-02T12:13:16Z"
      message: All VolSync PVCs are protected
      observedGeneration: 1
      reason: DataProtected
      status: "True"
      type: ClusterDataProtected
    resourceMeta:
      generation: 1
      kind: VolumeReplicationGroup
      name: busybox-1-cephfs-c2-placement-drpc
      namespace: appset-busybox-1-cephfs-c2
      protectedpvcs:
      - dd-io-pvc-3
      - dd-io-pvc-6
      - dd-io-pvc-7
      - dd-io-pvc-2
      - dd-io-pvc-4
      - dd-io-pvc-1
      - dd-io-pvc-5


Verified On:
------------

ODF Version - 4.14.0-150
OCP - 4.14.0-0.nightly-2023-10-15-164249
Submariner - 0.16.0(594788)
ACM - 2.9.0(2.9.0-DOWNSTREAM-2023-10-03-20-08-35)
Ceph version - ceph version 17.2.6-146.el9cp (1d01c2b30b5fd39787bb8804707c4b2e52e30137) quincy (stable)

Comment 11 errata-xmlrpc 2023-11-08 18:54:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6832

