Bug 2267731

Summary: [RDR] RBD apps fail to Relocate when using stale Ceph pool IDs from replacement cluster
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Annette Clewett <aclewett>
Component: rook Assignee: Santosh Pillai <sapillai>
Status: NEW --- QA Contact: Neha Berry <nberry>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.15 CC: amagrawa, ebenahar, kbg, kmanohar, kseeger, mrajanna, muagarwa, ndevos, odf-bz-bot, prsurve, sapillai, srangana, tnielsen, uchapaga
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
.RBD applications fail to Relocate when using stale Ceph pool IDs from replacement cluster
When a peer cluster is replaced, it is not possible to update the CephBlockPoolID mapping in the CSI configmap, so applications created before the new peer cluster existed cannot mount their RBD PVCs. Workaround: Update the `rook-ceph-csi-mapping-config` configmap with the cephBlockPoolID mapping on the peer cluster that is not replaced. This enables mounting the RBD PVC for the application.
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Annette Clewett 2024-03-04 17:05:03 UTC
Description of problem (please be as detailed as possible and provide log snippets):

Created an RDR environment with a hub cluster (perf1) and 2 managed clusters, perf2 and perf3. Then tested the replacement cluster steps using KCS https://access.redhat.com/articles/7049245 and added a new recovery cluster, perf-2.

The last step, Relocating back to the Primary cluster, failed and shows RBD app pods stuck in a creating state because their PVC/PV are in a bad state.

This is because when "perf-2" was added as the new recovery cluster, its Ceph pool IDs had changed compared to the original cluster "perf2" that it replaced.

perf3, the cluster the RBD apps were Relocated from:

$ ceph df | grep -B 3 -A 1 cephblockpool
 
--- POOLS ---
POOL                                                   ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
ocs-storagecluster-cephblockpool                        1   32  837 MiB      407  2.3 GiB   0.94     83 GiB
.mgr                                                    2    1  705 KiB        2  2.1 MiB      0     83 GiB

new perf-2, the cluster the RBD apps were Relocated to:

$ ceph df | grep -B 2 cephblockpool
POOL                                                   ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr                                                    1    1  577 KiB        2  1.7 MiB      0     82 GiB
ocs-storagecluster-cephblockpool                        2   32  817 MiB      378  2.3 GiB   0.91     82 GiB

Version of all relevant components (if applicable):
OCP 4.14.11
ODF 4.15 (build 146)
ACM 2.9.2

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, RBD apps are in a failed state.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
5

Can this issue be reproduced?
It is intermittent, because the Ceph pool IDs do not always change when a new recovery cluster is created with ODF installed.

Steps to Reproduce: 
0) Create RDR environment with hub cluster and 2 managed clusters with names perf2 and perf3 in ACM cluster view.
1) Fail original perf2 cluster (power down all nodes)
2) Failover perf2 rbd and cephfs apps to perf3
3) Validate apps failed over correctly and are working as expected given perf2 is down (replication between clusters is down)
4) Delete DRCluster perf2 using hub cluster
5) Validate the s3Profile for perf2 is removed from all VRGs on perf3
6) Disable DR for all rbd and cephfs apps from perf2
7) Remove all DR config from perf3 and hub cluster
8) Remove submariner using ACM UI
9) Detach perf2 cluster using ACM UI
10) Create a new cluster and import it using the ACM UI as perf-2
11) Install ODF 4.15 build 146 on perf-2
12) Add submariner add-ons using ACM UI
13) Install MCO (ODF 4.15 build 146) using hub cluster
14) Create first DRPolicy
15) Apply DR policy to rbd and cephfs apps originally on perf2
16) Relocate rbd and cephfs apps back to perf-2 (a quick pool ID check is sketched below)
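
A quick way to check, after step 16, whether the block pool IDs diverged between the surviving cluster and the new recovery cluster (a sketch only; it assumes the rook-ceph-tools toolbox deployment is enabled, and should be run on each managed cluster):

# assumes the rook-ceph-tools toolbox deployment is enabled in openshift-storage
$ oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph osd pool ls detail | grep cephblockpool
$ oc -n openshift-storage get cm rook-ceph-csi-mapping-config -o jsonpath='{.data.csi-mapping-config-json}'

If the block pool ID reported by Ceph is not covered by the RBDPoolIDMapping entries on the peer cluster, the Relocate is likely to hit this issue.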


Actual results:
RBD apps failed because of bad PVC/PV state.

Expected results:
RBD apps are created with healthy PVC/PV state.

Additional info:

Shyam's diagnosis:

The issue is in the Pool ID mapping config map for Ceph-CSI (as follows):

perf-2 (c1)
===========

Pool ID for the RBD pool is 2: pool 2 'ocs-storagecluster-cephblockpool' (from ceph osd pool ls detail)

CSI mapping ConfigMap has this:

$ oc get cm -n openshift-storage rook-ceph-csi-mapping-config -o yaml
apiVersion: v1
data:
  csi-mapping-config-json: '[{"ClusterIDMapping":{"openshift-storage":"openshift-storage"},"RBDPoolIDMapping":[{"1":"2"}]}]'
kind: ConfigMap

perf3 (c2)
==========

Pool ID for the RBD pool is 1: pool 1 'ocs-storagecluster-cephblockpool'

CSI mapping ConfigMap has this:

$ oc get cm -n openshift-storage rook-ceph-csi-mapping-config -o yaml
apiVersion: v1
data:
  csi-mapping-config-json: '[{"ClusterIDMapping":{"openshift-storage":"openshift-storage"},"RBDPoolIDMapping":[{"8":"1"}]}]'
kind: ConfigMap

The PVC was initially created on the cluster that was lost and hence carries this CSI volume handle:

volumeHandle: 0001-0011-openshift-storage-0000000000000008-06e1ec21-887c-4734-baf4-8f12a319ae0a

Note the 0000000000000008: that is the Pool ID, and it does not match the pool ID in either of the current clusters.
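
For reference, the handle can be read straight off an affected PV (a generic check; <pv-name> is a placeholder):

# <pv-name> is a placeholder for an affected RBD PV
$ oc get pv <pv-name> -o jsonpath='{.spec.csi.volumeHandle}'
0001-0011-openshift-storage-0000000000000008-06e1ec21-887c-4734-baf4-8f12a319ae0a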

When this was failed over to perf3, the existing CSI mapping mapped ID 8 to ID 1 on perf3; this is correct.

When we added the new cluster perf-2, neither of the existing CSI mappings works, because neither maps the stale pool ID 8 to the block pool ID on the new cluster. This is also why the error messages point to the pool that currently holds ID 8 on perf-2: pool 8 'ocs-storagecluster-cephobjectstore.rgw.log'.

Anyway, this is an interesting issue: we need to map a non-existing Pool ID to one of the existing pool IDs in the current clusters. Ceph-CSI would need to fix this.
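
Based on the above, the workaround captured in the Doc Text boils down to adding the missing pool ID pair to rook-ceph-csi-mapping-config on the peer cluster that was not replaced (perf3 in this scenario). A minimal sketch using the IDs seen in this bug ({"8":"1"} is the existing stale mapping for the lost perf2, {"2":"1"} maps the new perf-2 block pool ID 2 to perf3's pool ID 1); the exact pairs depend on the environment, and oc edit works just as well:

# run on the non-replaced peer (perf3); pool ID pairs below are illustrative, taken from this bug
$ oc -n openshift-storage patch cm rook-ceph-csi-mapping-config --type merge \
  -p '{"data":{"csi-mapping-config-json":"[{\"ClusterIDMapping\":{\"openshift-storage\":\"openshift-storage\"},\"RBDPoolIDMapping\":[{\"8\":\"1\"},{\"2\":\"1\"}]}]"}}'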

Comment 11 Santosh Pillai 2024-04-17 15:10:25 UTC
Does the workaround provided in comment #6 work?

Moving it to 4.17 to check whether there are any permanent solutions other than the workaround. Please move it back to 4.16 if the workaround does not work!

Comment 13 kmanohar 2024-05-22 17:42:35 UTC
@mrajanna

During cluster replacement, I observed this behaviour again and performed the suggested WA. This time, during relocate, the PVCs on C2 (the surviving cluster) are stuck in the Terminating state. Kindly help me confirm that the WA has completely worked, and also find the reason the PVCs are stuck.
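
(A generic check, not specific to the WA: whatever is holding a Terminating PVC shows up in its finalizers; <namespace> and <pvc-name> are placeholders.)

# <namespace> and <pvc-name> are placeholders
$ oc get pvc -n <namespace>
$ oc get pvc <pvc-name> -n <namespace> -o jsonpath='{.metadata.finalizers}'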

C1 (Recovery cluster)
---


$ ceph osd pool ls detail

pool 1 'ocs-storagecluster-cephblockpool' replicated size 3 min_size 2 crush_rule 9 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 210 lfor 0/0/26 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_ratio 0.49 application rbd read_balance_score 1.13
pool 2 'ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 13 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 123 flags hashpspool stripe_width 0 pg_num_min 8 application rgw read_balance_score 1.88
pool 3 'ocs-storagecluster-cephobjectstore.rgw.meta' replicated size 3 min_size 2 crush_rule 12 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 123 flags hashpspool stripe_width 0 pg_num_min 8 application rgw read_balance_score 1.50


oc get cm -n openshift-storage rook-ceph-csi-mapping-config -o yaml

apiVersion: v1
data:
  csi-mapping-config-json: '[{"ClusterIDMapping":{"openshift-storage":"openshift-storage"},"RBDPoolIDMapping":[{"1":"1"}]}]'
kind: ConfigMap
metadata:
  creationTimestamp: "2024-05-22T06:02:15Z"
  name: rook-ceph-csi-mapping-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: false
    controller: true
    kind: Deployment
    name: rook-ceph-operator
    uid: 2dd3eaf4-caaf-4b05-ab07-542e168f1887
  resourceVersion: "3906057"
  uid: d782c3b2-5715-4842-9127-815273c8d8fe



C2(surviving cluster)
---

$ ceph osd pool ls detail

pool 1 'ocs-storagecluster-cephblockpool' replicated size 3 min_size 2 crush_rule 9 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 2408 lfor 0/0/38 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_ratio 0.49 application rbd read_balance_score 1.13
pool 2 'ocs-storagecluster-cephobjectstore.rgw.log' replicated size 3 min_size 2 crush_rule 11 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 2041 flags hashpspool stripe_width 0 pg_num_min 8 application rgw read_balance_score 1.88
pool 3 'ocs-storagecluster-cephobjectstore.rgw.control' replicated size 3 min_size 2 crush_rule 13 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 2041 flags hashpspool stripe_width 0 pg_num_min 8 application rgw read_balance_score 1.50


Edited the config map by adding a 1:1 mapping. I believe this is the right way.


oc get cm -n openshift-storage rook-ceph-csi-mapping-config -o yaml

apiVersion: v1
data:
  csi-mapping-config-json: '[{"ClusterIDMapping":{"openshift-storage":"openshift-storage"},"RBDPoolIDMapping":[{"3":"1"}]},{"ClusterIDMapping":{"openshift-storage":"openshift-storage"},"RBDPoolIDMapping":[{"1":"1"}]}]'            <---- Here I have added the 1:1 pool mapping
kind: ConfigMap
metadata:
  creationTimestamp: "2024-05-20T10:19:47Z"
  name: rook-ceph-csi-mapping-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: false
    controller: true
    kind: Deployment
    name: rook-ceph-operator
    uid: d79ca804-3a2f-4f7e-96fb-76780814bd38
  resourceVersion: "5653111"
  uid: f235cd1e-24b7-4062-bd76-ebca186801b9
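
One sanity check that may be worth doing after hand-editing the ConfigMap (a generic suggestion, assuming jq is available on the workstation): confirm the value still parses as JSON, since stray typographic quotes from copy/paste are an easy way to break it.

# assumes jq is installed on the workstation
$ oc -n openshift-storage get cm rook-ceph-csi-mapping-config \
    -o jsonpath='{.data.csi-mapping-config-json}' | jq .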


HUB
---

oc get drpc  app-sub-busybox1-placement-1-drpc -o yaml

apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPlacementControl
metadata:
  annotations:
    drplacementcontrol.ramendr.openshift.io/app-namespace: app-sub-busybox1
    drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: kmanohar-c1
  creationTimestamp: "2024-05-22T08:20:37Z"
  finalizers:
  - drpc.ramendr.openshift.io/finalizer
  generation: 2
  labels:
    cluster.open-cluster-management.io/backup: ramen
  name: app-sub-busybox1-placement-1-drpc
  namespace: app-sub-busybox1
  ownerReferences:
  - apiVersion: cluster.open-cluster-management.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Placement
    name: app-sub-busybox1-placement-1
    uid: 353a0fa5-2d42-4b02-b887-a1ecd5ed4e73
  resourceVersion: "5409642"
  uid: 05db7b13-1aac-4b6b-883b-54ea204fe8b5
spec:
  action: Relocate
  drPolicyRef:
    apiVersion: ramendr.openshift.io/v1alpha1
    kind: DRPolicy
    name: dr-policy-10mn
  failoverCluster: kmanohar-c2
  placementRef:
    apiVersion: cluster.open-cluster-management.io/v1beta1
    kind: Placement
    name: app-sub-busybox1-placement-1
    namespace: app-sub-busybox1
  preferredCluster: kmanohar-c1
  pvcSelector:
    matchLabels:
      appname: busybox_app1
status:
  actionStartTime: "2024-05-22T09:38:21Z"
  conditions:
  - lastTransitionTime: "2024-05-22T09:40:22Z"
    message: Completed
    observedGeneration: 2
    reason: Relocated
    status: "True"
    type: Available
  - lastTransitionTime: "2024-05-22T09:39:20Z"
    message: Relocation in progress to cluster "kmanohar-c1"
    observedGeneration: 2
    reason: NotStarted
    status: "False"
    type: PeerReady
  - lastTransitionTime: "2024-05-22T12:12:11Z"
    message: 'VolumeReplicationGroup (app-sub-busybox1/app-sub-busybox1-placement-1-drpc)
      on cluster kmanohar-c1 is reporting errors (Failed to restore PVs/PVCs: failed
      to restore PV/PVC for VolRep (failed to restore PVs and PVCs using profile list
      ([s3profile-kmanohar-c1-ocs-storagecluster s3profile-kmanohar-c2-ocs-storagecluster]):
      failed to restore all []v1.PersistentVolumeClaim. Total/Restored 20/0)) restoring
      workload resources, retrying till ClusterDataReady condition is met'
    observedGeneration: 2
    reason: Error
    status: "False"
    type: Protected
  lastGroupSyncBytes: 87576576
  lastGroupSyncDuration: 1s
  lastGroupSyncTime: "2024-05-22T09:30:01Z"
  lastUpdateTime: "2024-05-22T16:48:09Z"
  observedGeneration: 2
  phase: Relocated
  preferredDecision:
    clusterName: kmanohar-c2
    clusterNamespace: kmanohar-c2
  progression: Cleaning Up
  resourceConditions:
    conditions:
    - lastTransitionTime: "2024-05-22T12:11:40Z"
      message: Initializing VolumeReplicationGroup
      observedGeneration: 1
      reason: Initializing
      status: Unknown
      type: DataReady
    - lastTransitionTime: "2024-05-22T12:11:40Z"
      message: Initializing VolumeReplicationGroup
      observedGeneration: 1
      reason: Initializing
      status: Unknown
      type: DataProtected
    - lastTransitionTime: "2024-05-22T12:11:51Z"
      message: 'Failed to restore PVs/PVCs: failed to restore PV/PVC for VolRep (failed
        to restore PVs and PVCs using profile list ([s3profile-kmanohar-c1-ocs-storagecluster
        s3profile-kmanohar-c2-ocs-storagecluster]): failed to restore all []v1.PersistentVolumeClaim.
        Total/Restored 20/0)'
      observedGeneration: 1
      reason: Error
      status: "False"
      type: ClusterDataReady
    - lastTransitionTime: "2024-05-22T12:11:40Z"
      message: Initializing VolumeReplicationGroup
      observedGeneration: 1
      reason: Initializing
      status: Unknown
      type: ClusterDataProtected
    resourceMeta:
      generation: 1
      kind: VolumeReplicationGroup
      name: app-sub-busybox1-placement-1-drpc
      namespace: app-sub-busybox1


Additional Info:-
---------------

--> I restarted the compute-0 and compute-2 nodes, but that didn't help; I also respun the rbdplugin-provisioner, which didn't help either.
--> CephFS-based application relocate was successful
--> Submariner connectivity is intact


Must gather - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/keerthana/BZ-2267731/


Cluster details

c1 - https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/37350/
c2 - https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/37348/
hub - https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/37349/