Description of problem (please be as detailed as possible and provide log snippets):

OCS storagecluster/storagesystem was deleted and ODF entered terminating mode, but all the PVCs still exist, which prevented the deletion. The cephcluster CR yaml shows that a deletion was initiated but did not complete. The deletion did not go through because there were dependent objects that could not be removed. Please see the condition 'CephCluster "openshift-storage/ocs-storagecluster-cephcluster" will not be deleted until all dependents are removed', which failed with reason: ObjectHasDependents.

oc get cephcluster -o yaml
apiVersion: v1
items:
- apiVersion: ceph.rook.io/v1
  kind: CephCluster
...
    conditions:
    - lastHeartbeatTime: "2024-05-17T02:11:20Z"
      lastTransitionTime: "2023-11-05T09:59:50Z"
      message: Cluster created successfully
      reason: ClusterCreated
      status: "True"
      type: Ready
    - lastHeartbeatTime: "2024-06-03T06:55:16Z"
      lastTransitionTime: "2024-05-17T02:12:01Z"
      message: 'CephCluster "openshift-storage/ocs-storagecluster-cephcluster" will not
        be deleted until all dependents are removed: CephBlockPool: [ocs-storagecluster-cephblockpool],
        CephFilesystem: [ocs-storagecluster-cephfilesystem], CephObjectStore: [ocs-storagecluster-cephobjectstore],
        CephObjectStoreUser: [noobaa-ceph-objectstore-user ocs-storagecluster-cephobjectstoreuser prometheus-user]'
      reason: ObjectHasDependents
      status: "True"
      type: DeletionIsBlocked
    - lastHeartbeatTime: "2024-06-03T06:55:15Z"
      lastTransitionTime: "2024-05-17T02:12:00Z"
      message: Deleting the CephCluster
      reason: ClusterDeleting
      status: "True"
      type: Deleting
    message: Deleting the CephCluster

2024-05-31 02:35:36.204878 E | ceph-cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". CephCluster "openshift-storage/ocs-storagecluster-cephcluster" will not be deleted until all dependents are removed: CephBlockPool: [ocs-storagecluster-cephblockpool], CephFilesystem: [ocs-storagecluster-cephfilesystem], CephObjectStore: [ocs-storagecluster-cephobjectstore], CephObjectStoreUser: [noobaa-ceph-objectstore-user ocs-storagecluster-cephobjectstoreuser prometheus-user]

Version of all relevant components (if applicable):
ODF 4.12
ocs-operator.v4.12.11-rhodf                    OpenShift Container Storage     4.12.11-rhodf   ocs-operator.v4.11.13                          Succeeded
odf-csi-addons-operator.v4.12.11-rhodf         CSI Addons                      4.12.11-rhodf   odf-csi-addons-operator.v4.11.13               Succeeded
odf-multicluster-orchestrator.v4.12.12-rhodf   ODF Multicluster Orchestrator   4.12.12-rhodf   odf-multicluster-orchestrator.v4.12.11-rhodf   Succeeded
odf-operator.v4.12.11-rhodf                    OpenShift Data Foundation       4.12.11-rhodf   odf-operator.v4.11.13                          Succeeded
odr-hub-operator.v4.12.12-rhodf                Openshift DR Hub Operator       4.12.12-rhodf   odr-hub-operator.v4.12.11-rhodf                Succeeded
openshift-gitops-operator.v1.11.2              Red Hat OpenShift GitOps        1.11.2          openshift-gitops-operator.v1.11.1              Succeeded

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
There are two options, which we discussed with the customer:
a] Take a backup of their data and reinstall ODF.
b] Restore the cluster using the upstream procedure: https://www.rook.io/docs/rook/v1.14/Troubleshooting/disaster-recovery/#restoring-crds-after-deletion
The cluster is used extensively for Quay, and several applications are using ODF-based PVCs and OBCs, hence the customer is not okay with the first option.

Ask: Can we recover the cluster using the upstream procedure and attempt to restore the cephcluster?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Can this issue be reproduced?
Yes

Can this issue reproduce from the UI?
N/A

If this is a regression, please provide more details to justify this:
N/A

Steps to Reproduce:
N/A

Actual results:
N/A

Expected results:
N/A

Additional info:
It's not very clear what the requirement is here. From what I understand, the customer deleted the cluster, it got stuck due to the dependent resources, and now the customer wants to restore the same cluster? Is that correct?
Santosh,

Yes, that's correct. Upstream has a disaster recovery doc [1] that we were hoping to get a +1 from engineering to use, to hopefully stop the deletion of the cephcluster resource.

[1] https://www.rook.io/docs/rook/v1.14/Troubleshooting/disaster-recovery/#restoring-crds-after-deletion
I ran through this process on a lab machine. One thing I noticed is that as soon as I remove the finalizer from the cephcluster CR, it gets recreated... might need to scale down the ocs-operator deployment as well? Here are my findings:

[system:admin/openshift-storage root ~]$ oc get cephcluster
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE   MESSAGE                        HEALTH      EXTERNAL   FSID
ocs-storagecluster-cephcluster   /var/lib/rook     3          53d   Ready   Cluster created successfully   HEALTH_OK              246db4a4-d3a0-4a6a-9d55-17a84bdc0274
[system:admin/openshift-storage root ~]$ oc delete cephcluster ocs-storagecluster-cephcluster
cephcluster.ceph.rook.io "ocs-storagecluster-cephcluster" deleted
^C[system:admin/openshift-storage root ~]$ oc get cephcluster -o yaml
apiVersion: v1
items:
- apiVersion: ceph.rook.io/v1
  kind: CephCluster
  metadata:
    creationTimestamp: "2024-04-11T21:10:53Z"
    deletionGracePeriodSeconds: 0
    deletionTimestamp: "2024-06-04T14:08:07Z"
    finalizers:
    - cephcluster.ceph.rook.io
    generation: 12
    labels:
      app: ocs-storagecluster
      replicationid.multicluster.openshift.io: eb5a7b12c32796fcbb2278bcc4a38bf945443a4
    name: ocs-storagecluster-cephcluster
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: ocs.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: StorageCluster
      name: ocs-storagecluster
      uid: 72ca44f0-33a0-4fe2-ba0d-39c5870550db
    resourceVersion: "60596460"
    uid: af973f13-f880-431f-b84b-e90e8a95e08b
  ...
    conditions:
    - lastHeartbeatTime: "2024-06-04T14:07:15Z"
      lastTransitionTime: "2024-04-11T21:19:32Z"
      message: Cluster created successfully
      reason: ClusterCreated
      status: "True"
      type: Ready
    - lastHeartbeatTime: "2024-06-04T14:08:13Z"
      lastTransitionTime: "2024-06-04T14:08:13Z"
      message: Deleting the CephCluster
      reason: ClusterDeleting
      status: "True"
      type: Deleting
    - lastHeartbeatTime: "2024-06-04T14:08:14Z"
      lastTransitionTime: "2024-06-04T14:08:14Z"
      message: 'CephCluster "openshift-storage/ocs-storagecluster-cephcluster" will not
        be deleted until all dependents are removed: CephBlockPool: [ocs-storagecluster-cephblockpool],
        CephFilesystem: [ocs-storagecluster-cephfilesystem ocs-storagecluster-cephfilesystem-new],
        CephFilesystemSubVolumeGroup: [ocs-storagecluster-cephfilesystem-csi],
        CephObjectStore: [ocs-storagecluster-cephobjectstore],
        CephObjectStoreUser: [noobaa-ceph-objectstore-user ocs-storagecluster-cephobjectstoreuser prometheus-user],
        CephRBDMirror: [ocs-storagecluster-cephrbdmirror]'
      reason: ObjectHasDependents
      status: "True"
      type: DeletionIsBlocked
    message: Deleting the CephCluster
    observedGeneration: 11
    phase: Deleting
    state: Deleting
...
[system:admin/openshift-storage root ~]$ oc scale deployment rook-ceph-operator --replicas 0
deployment.apps/rook-ceph-operator scaled
[system:admin/openshift-storage root ~]$ oc get cephcluster -o yaml > cluster.yaml
[system:admin/openshift-storage root ~]$ oc get secrets -o yaml > secrets.yaml
[system:admin/openshift-storage root ~]$ oc get cm -o yaml > configmaps.yaml
[system:admin/openshift-storage root ~]$ oc get cephcluster ocs-storagecluster-cephcluster -o 'jsonpath={.metadata.uid}'
af973f13-f880-431f-b84b-e90e8a95e08b[system:admin/openshift-storage root ~]$
[system:admin/openshift-storage root ~]$ ROOK_UID=$(oc get cephcluster ocs-storagecluster-cephcluster -o 'jsonpath={.metadata.uid}')
[system:admin/openshift-storage root ~]$ RESOURCES=$(oc get secret,configmap,service,deployment,pvc -o jsonpath='{range .items[?(@.metadata.ownerReferences[*].uid=="'"$ROOK_UID"'")]}{.kind}{"/"}{.metadata.name}{"\n"}{end}')
[system:admin/openshift-storage root ~]$ oc get $RESOURCES
NAME                                                       TYPE                 DATA   AGE
secret/cluster-peer-token-ocs-storagecluster-cephcluster   kubernetes.io/rook   2      53d
secret/rook-ceph-admin-keyring                             kubernetes.io/rook   1      53d
secret/rook-ceph-config                                    kubernetes.io/rook   2      53d
secret/rook-ceph-crash-collector-keyring                   kubernetes.io/rook   1      53d
secret/rook-ceph-exporter-keyring                          kubernetes.io/rook   1      53d
secret/rook-ceph-mgr-a-keyring                             kubernetes.io/rook   1      53d
secret/rook-ceph-mgr-b-keyring                             kubernetes.io/rook   1      53d
secret/rook-ceph-mon                                       kubernetes.io/rook   4      53d
secret/rook-ceph-mons-keyring                              kubernetes.io/rook   1      53d
secret/rook-csi-cephfs-node                                kubernetes.io/rook   2      53d
secret/rook-csi-cephfs-provisioner                         kubernetes.io/rook   2      53d
secret/rook-csi-rbd-node                                   kubernetes.io/rook   2      53d
secret/rook-csi-rbd-provisioner                            kubernetes.io/rook   2      53d

NAME                                DATA   AGE
configmap/rook-ceph-mon-endpoints   5      53d

NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/rook-ceph-exporter   ClusterIP   172.30.27.200    <none>        9926/TCP   53d
service/rook-ceph-mgr        ClusterIP   172.30.107.23    <none>        9283/TCP   53d
service/rook-ceph-mon-d      ClusterIP   172.30.112.231   <none>        3300/TCP   49d
service/rook-ceph-mon-e      ClusterIP   172.30.114.57    <none>        3300/TCP   49d
service/rook-ceph-mon-f      ClusterIP   172.30.241.30    <none>        3300/TCP   49d
service/rook-ceph-osd-0      ClusterIP   172.30.22.209    <none>        6800/TCP   49d
service/rook-ceph-osd-1      ClusterIP   172.30.120.167   <none>        6800/TCP   49d
service/rook-ceph-osd-2      ClusterIP   172.30.12.188    <none>        6800/TCP   49d

NAME                                                                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/rook-ceph-crashcollector-odf0.libvirt2.ocpcluster.cc   1/1     1            1           53d
deployment.apps/rook-ceph-crashcollector-odf1.libvirt2.ocpcluster.cc   1/1     1            1           53d
deployment.apps/rook-ceph-crashcollector-odf2.libvirt2.ocpcluster.cc   1/1     1            1           53d
deployment.apps/rook-ceph-exporter-odf0.libvirt2.ocpcluster.cc         1/1     1            1           53d
deployment.apps/rook-ceph-exporter-odf1.libvirt2.ocpcluster.cc         1/1     1            1           53d
deployment.apps/rook-ceph-exporter-odf2.libvirt2.ocpcluster.cc         1/1     1            1           53d
deployment.apps/rook-ceph-mgr-a                                        1/1     1            1           53d
deployment.apps/rook-ceph-mgr-b                                        1/1     1            1           53d
deployment.apps/rook-ceph-mon-d                                        1/1     1            1           49d
deployment.apps/rook-ceph-mon-e                                        1/1     1            1           49d
deployment.apps/rook-ceph-mon-f                                        1/1     1            1           49d
deployment.apps/rook-ceph-osd-0                                        1/1     1            1           49d
deployment.apps/rook-ceph-osd-1                                        1/1     1            1           49d
deployment.apps/rook-ceph-osd-2                                        1/1     1            1           49d

NAME                                                STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/ocs-deviceset-0-data-0r6qc8   Bound    local-pv-e059f7de   500Gi      RWO            localblock     53d
persistentvolumeclaim/ocs-deviceset-1-data-0rtkmk   Bound    local-pv-9ef1ecc5   500Gi      RWO            localblock     53d
persistentvolumeclaim/ocs-deviceset-2-data-0djbtg   Bound    local-pv-e0c8a328   500Gi      RWO            localblock     53d

[system:admin/openshift-storage root ~]$ for resource in $(oc -n openshift-storage get $RESOURCES -o name)
> do
>   oc -n openshift-storage patch $resource -p '{"metadata": {"ownerReferences":null}}'
> done
secret/cluster-peer-token-ocs-storagecluster-cephcluster patched
secret/rook-ceph-admin-keyring patched
secret/rook-ceph-config patched
secret/rook-ceph-crash-collector-keyring patched
secret/rook-ceph-exporter-keyring patched
secret/rook-ceph-mgr-a-keyring patched
secret/rook-ceph-mgr-b-keyring patched
secret/rook-ceph-mon patched
secret/rook-ceph-mons-keyring patched
secret/rook-csi-cephfs-node patched
secret/rook-csi-cephfs-provisioner patched
secret/rook-csi-rbd-node patched
secret/rook-csi-rbd-provisioner patched
configmap/rook-ceph-mon-endpoints patched
service/rook-ceph-exporter patched
service/rook-ceph-mgr patched
service/rook-ceph-mon-d patched
service/rook-ceph-mon-e patched
service/rook-ceph-mon-f patched
service/rook-ceph-osd-0 patched
service/rook-ceph-osd-1 patched
service/rook-ceph-osd-2 patched
deployment.apps/rook-ceph-crashcollector-odf0.libvirt2.ocpcluster.cc patched
deployment.apps/rook-ceph-crashcollector-odf1.libvirt2.ocpcluster.cc patched
deployment.apps/rook-ceph-crashcollector-odf2.libvirt2.ocpcluster.cc patched
deployment.apps/rook-ceph-exporter-odf0.libvirt2.ocpcluster.cc patched
deployment.apps/rook-ceph-exporter-odf1.libvirt2.ocpcluster.cc patched
deployment.apps/rook-ceph-exporter-odf2.libvirt2.ocpcluster.cc patched
deployment.apps/rook-ceph-mgr-a patched
deployment.apps/rook-ceph-mgr-b patched
deployment.apps/rook-ceph-mon-d patched
deployment.apps/rook-ceph-mon-e patched
deployment.apps/rook-ceph-mon-f patched
deployment.apps/rook-ceph-osd-0 patched
deployment.apps/rook-ceph-osd-1 patched
deployment.apps/rook-ceph-osd-2 patched
persistentvolumeclaim/ocs-deviceset-0-data-0r6qc8 patched
persistentvolumeclaim/ocs-deviceset-1-data-0rtkmk patched
persistentvolumeclaim/ocs-deviceset-2-data-0djbtg patched
[system:admin/openshift-storage root ~]$ for resource in $(oc -n openshift-storage get $RESOURCES -o name); do oc get $resource -o yaml >> file.txt; done
[system:admin/openshift-storage root ~]$ less file.txt
[system:admin/openshift-storage root ~]$ less cluster.yaml
[system:admin/openshift-storage root ~]$ oc patch cephcluster/ocs-storagecluster-cephcluster --type json --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'
cephcluster.ceph.rook.io/ocs-storagecluster-cephcluster patched
[system:admin/openshift-storage root ~]$ oc get cephcluster
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE   MESSAGE   HEALTH   EXTERNAL   FSID
ocs-storagecluster-cephcluster   /var/lib/rook     3          7s
[system:admin/openshift-storage root ~]$ oc create -f cluster.yaml
Error from server (AlreadyExists): error when creating "cluster.yaml": cephclusters.ceph.rook.io "ocs-storagecluster-cephcluster" already exists
[system:admin/openshift-storage root ~]$ oc get pods
NAME                                                              READY   STATUS    RESTARTS      AGE
csi-addons-controller-manager-68f7c7d494-xp2kk                    2/2     Running   0             4d22h
csi-cephfsplugin-p6hfb                                            2/2     Running   0             42d
csi-cephfsplugin-provisioner-784cf69787-btcqz                     6/6     Running   0             27d
csi-cephfsplugin-provisioner-784cf69787-jx6hm                     6/6     Running   0             53d
csi-cephfsplugin-v7xff                                            2/2     Running   0             42d
csi-cephfsplugin-v9cpg                                            2/2     Running   0             42d
csi-cephfsplugin-vkkwp                                            2/2     Running   0             42d
csi-cephfsplugin-x5mjx                                            2/2     Running   0             42d
csi-rbdplugin-2cnf9                                               3/3     Running   4             53d
csi-rbdplugin-8drgj                                               3/3     Running   0             53d
csi-rbdplugin-hpg5z                                               3/3     Running   1             53d
csi-rbdplugin-jlflf                                               3/3     Running   4             53d
csi-rbdplugin-m82s4                                               3/3     Running   4             53d
csi-rbdplugin-provisioner-7845b8779f-55thz                        7/7     Running   0             27d
csi-rbdplugin-provisioner-7845b8779f-ljp7j                        7/7     Running   0             49d
maintenance-agent-548986c6d7-pk6xp                                1/1     Running   0             27d
noobaa-core-0                                                     1/1     Running   0             53d
noobaa-db-pg-0                                                    1/1     Running   0             53d
noobaa-endpoint-5d8b4755-6f2vz                                    1/1     Running   0             53d
noobaa-operator-67ffc7bdf5-mnbt5                                  1/1     Running   1 (29s ago)   49d
ocs-metrics-exporter-55788b6cdb-r2hjn                             1/1     Running   0             27d
ocs-operator-86f58456c4-rzp7q                                     1/1     Running   0             47d
odf-console-77b5f8c787-686gk                                      1/1     Running   0             53d
odf-operator-controller-manager-6754f68ccc-vrlll                  2/2     Running   0             53d
rook-ceph-crashcollector-odf0.libvirt2.ocpcluster.cc-66bcbt2r9m   1/1     Running   0             53d
rook-ceph-crashcollector-odf1.libvirt2.ocpcluster.cc-74c7b2hpmr   1/1     Running   0             53d
rook-ceph-crashcollector-odf2.libvirt2.ocpcluster.cc-669cdtgg59   1/1     Running   0             26d
rook-ceph-exporter-odf0.libvirt2.ocpcluster.cc-785556d5c8-wslwg   1/1     Running   0             53d
rook-ceph-exporter-odf1.libvirt2.ocpcluster.cc-5bbd756997-7cs75   1/1     Running   0             53d
rook-ceph-exporter-odf2.libvirt2.ocpcluster.cc-5448954bf6-j544k   1/1     Running   0             26d
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-99dc5fdbttk4t   2/2     Running   0             26d
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-595dbc7fs6z2s   2/2     Running   0             26d
rook-ceph-mds-ocs-storagecluster-cephfilesystem-new-a-85cd4t7td   2/2     Running   0             26d
rook-ceph-mds-ocs-storagecluster-cephfilesystem-new-b-574bc4sdq   2/2     Running   0             26d
rook-ceph-mgr-a-7dcbd99c68-8bvwj                                  3/3     Running   0             41d
rook-ceph-mgr-b-56f6dd854f-49t2c                                  3/3     Running   0             41d
rook-ceph-mon-d-7c857959bd-lg4ck                                  2/2     Running   0             32d
rook-ceph-mon-e-5bb6769b4b-fh9ns                                  2/2     Running   0             32d
rook-ceph-mon-f-5d5b879fd8-lrmlr                                  2/2     Running   0             32d
rook-ceph-osd-0-7f4cfd54b6-wsdsb                                  2/2     Running   0             40d
rook-ceph-osd-1-b84fd6754-vt8v4                                   2/2     Running   0             40d
rook-ceph-osd-2-7bd7d8568b-g4p8f                                  2/2     Running   0             40d
rook-ceph-rbd-mirror-a-6498df6bf6-nhw69                           2/2     Running   0             49d
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-785d6dbrnb7w   2/2     Running   0             53d
rook-ceph-tools-675454f488-qzpc2                                  1/1     Running   0             49d
token-exchange-agent-5d9df8896c-zdg8q                             1/1     Running   0             27d
ux-backend-server-596cb57484-cfhqm                                2/2     Running   0             53d
[system:admin/openshift-storage root ~]$ oc get $RESOURCES
NAME                                                       TYPE                 DATA   AGE
secret/cluster-peer-token-ocs-storagecluster-cephcluster   kubernetes.io/rook   2      53d
secret/rook-ceph-admin-keyring                             kubernetes.io/rook   1      53d
secret/rook-ceph-config                                    kubernetes.io/rook   2      53d
secret/rook-ceph-crash-collector-keyring                   kubernetes.io/rook   1      53d
secret/rook-ceph-exporter-keyring                          kubernetes.io/rook   1      53d
secret/rook-ceph-mgr-a-keyring                             kubernetes.io/rook   1      53d
secret/rook-ceph-mgr-b-keyring                             kubernetes.io/rook   1      53d
secret/rook-ceph-mon                                       kubernetes.io/rook   4      53d
secret/rook-ceph-mons-keyring                              kubernetes.io/rook   1      53d
secret/rook-csi-cephfs-node                                kubernetes.io/rook   2      53d
secret/rook-csi-cephfs-provisioner                         kubernetes.io/rook   2      53d
secret/rook-csi-rbd-node                                   kubernetes.io/rook   2      53d
secret/rook-csi-rbd-provisioner                            kubernetes.io/rook   2      53d

NAME                                DATA   AGE
configmap/rook-ceph-mon-endpoints   5      53d

NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/rook-ceph-exporter   ClusterIP   172.30.27.200    <none>        9926/TCP   53d
service/rook-ceph-mgr        ClusterIP   172.30.107.23    <none>        9283/TCP   53d
service/rook-ceph-mon-d      ClusterIP   172.30.112.231   <none>        3300/TCP   49d
service/rook-ceph-mon-e      ClusterIP   172.30.114.57    <none>        3300/TCP   49d
service/rook-ceph-mon-f      ClusterIP   172.30.241.30    <none>        3300/TCP   49d
service/rook-ceph-osd-0      ClusterIP   172.30.22.209    <none>        6800/TCP   49d
service/rook-ceph-osd-1      ClusterIP   172.30.120.167   <none>        6800/TCP   49d
service/rook-ceph-osd-2      ClusterIP   172.30.12.188    <none>        6800/TCP   49d

NAME                                                                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/rook-ceph-crashcollector-odf0.libvirt2.ocpcluster.cc   1/1     1            1           53d
deployment.apps/rook-ceph-crashcollector-odf1.libvirt2.ocpcluster.cc   1/1     1            1           53d
deployment.apps/rook-ceph-crashcollector-odf2.libvirt2.ocpcluster.cc   1/1     1            1           53d
deployment.apps/rook-ceph-exporter-odf0.libvirt2.ocpcluster.cc         1/1     1            1           53d
deployment.apps/rook-ceph-exporter-odf1.libvirt2.ocpcluster.cc         1/1     1            1           53d
deployment.apps/rook-ceph-exporter-odf2.libvirt2.ocpcluster.cc         1/1     1            1           53d
deployment.apps/rook-ceph-mgr-a                                        1/1     1            1           53d
deployment.apps/rook-ceph-mgr-b                                        1/1     1            1           53d
deployment.apps/rook-ceph-mon-d                                        1/1     1            1           49d
deployment.apps/rook-ceph-mon-e                                        1/1     1            1           49d
deployment.apps/rook-ceph-mon-f                                        1/1     1            1           49d
deployment.apps/rook-ceph-osd-0                                        1/1     1            1           49d
deployment.apps/rook-ceph-osd-1                                        1/1     1            1           49d
deployment.apps/rook-ceph-osd-2                                        1/1     1            1           49d

NAME                                                STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/ocs-deviceset-0-data-0r6qc8   Bound    local-pv-e059f7de   500Gi      RWO            localblock     53d
persistentvolumeclaim/ocs-deviceset-1-data-0rtkmk   Bound    local-pv-9ef1ecc5   500Gi      RWO            localblock     53d
persistentvolumeclaim/ocs-deviceset-2-data-0djbtg   Bound    local-pv-e0c8a328   500Gi      RWO            localblock     53d

[system:admin/openshift-storage root ~]$ oc get deployments | grep rook-ceph-ope
rook-ceph-operator   0/0   0   0   53d
[system:admin/openshift-storage root ~]$ oc scale deployment rook-ceph-operator --replicas 1
deployment.apps/rook-ceph-operator scaled
[system:admin/openshift-storage root ~]$ oc get deploy | grep oper
noobaa-operator                   1/1   1   1   53d
ocs-operator                      1/1   1   1   53d
odf-operator-controller-manager   1/1   1   1   53d
rook-ceph-operator                1/1   1   1   53d

We hang when trying to run ceph -s from the tools pod... according to the rook-ceph-operator logs, we're stuck in a loop trying to reconcile:

2024-06-04 14:28:35.006722 I | op-mon: mons running: [e f d]
2024-06-04 14:28:40.580081 I | ceph-spec: ceph-rbd-mirror-controller: CephCluster "ocs-storagecluster-cephcluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] LastChecked:2024-06-04T14:27:46Z LastChanged: PreviousHealth: Capacity:{TotalBytes:0 UsedBytes:0 AvailableBytes:0 LastUpdated:} Versions:<nil> FSID:}
2024-06-04 14:28:40.879155 I | ceph-spec: ceph-block-pool-controller: CephCluster "ocs-storagecluster-cephcluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] LastChecked:2024-06-04T14:27:46Z LastChanged: PreviousHealth: Capacity:{TotalBytes:0 UsedBytes:0 AvailableBytes:0 LastUpdated:} Versions:<nil> FSID:}
2024-06-04 14:28:41.027479 I | ceph-spec: ceph-fs-subvolumegroup-controller: CephCluster "ocs-storagecluster-cephcluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] LastChecked:2024-06-04T14:27:46Z LastChanged: PreviousHealth: Capacity:{TotalBytes:0 UsedBytes:0 AvailableBytes:0 LastUpdated:} Versions:<nil> FSID:}
2024-06-04 14:28:41.029319 I | ceph-spec: ceph-object-controller: CephCluster "ocs-storagecluster-cephcluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] LastChecked:2024-06-04T14:27:46Z LastChanged: PreviousHealth: Capacity:{TotalBytes:0 UsedBytes:0 AvailableBytes:0 LastUpdated:} Versions:<nil> FSID:}
2024-06-04 14:28:41.380374 I | ceph-spec: ceph-file-controller: CephCluster "ocs-storagecluster-cephcluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] LastChecked:2024-06-04T14:27:46Z LastChanged: PreviousHealth: Capacity:{TotalBytes:0 UsedBytes:0 AvailableBytes:0 LastUpdated:} Versions:<nil> FSID:}
2024-06-04 14:28:41.579268 I | ceph-spec: ceph-file-controller: CephCluster "ocs-storagecluster-cephcluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] LastChecked:2024-06-04T14:27:46Z LastChanged: PreviousHealth: Capacity:{TotalBytes:0 UsedBytes:0 AvailableBytes:0 LastUpdated:} Versions:<nil> FSID:}
2024-06-04 14:28:41.691344 I | ceph-spec: ceph-object-store-user-controller: CephCluster "ocs-storagecluster-cephcluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] LastChecked:2024-06-04T14:27:46Z LastChanged: PreviousHealth: Capacity:{TotalBytes:0 UsedBytes:0 AvailableBytes:0 LastUpdated:} Versions:<nil> FSID:}
2024-06-04 14:28:41.691521 I | ceph-spec: ceph-object-store-user-controller: CephCluster "ocs-storagecluster-cephcluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] LastChecked:2024-06-04T14:27:46Z LastChanged: PreviousHealth: Capacity:{TotalBytes:0 UsedBytes:0 AvailableBytes:0 LastUpdated:} Versions:<nil> FSID:}
2024-06-04 14:28:41.691626 I | ceph-spec: ceph-object-store-user-controller: CephCluster "ocs-storagecluster-cephcluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] LastChecked:2024-06-04T14:27:46Z LastChanged: PreviousHealth: Capacity:{TotalBytes:0 UsedBytes:0 AvailableBytes:0 LastUpdated:} Versions:<nil> FSID:}
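To narrow down where the hang comes from, one option is to query a mon's admin socket directly, since that path does not depend on client keyrings, the toolbox, or the operator. A minimal sketch, assuming the mon container in the Rook mon pods is named "mon" (adjust the mon id to match your deployments, e.g. "d" above):

oc -n openshift-storage exec deploy/rook-ceph-mon-d -c mon -- ceph daemon mon.d quorum_status
oc -n openshift-storage exec deploy/rook-ceph-mon-d -c mon -- ceph daemon mon.d mon_status

If the admin socket answers but ceph -s does not, the mons likely have quorum and the problem is more likely on the client/auth side, e.g. the restored rook-ceph-mon secret or rook-ceph-mon-endpoints configmap not matching the running cluster.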
(In reply to kelwhite from comment #5)
> Santosh,
>
> Yes, that's correct. Upstream has a disaster recovery doc [1] that we were
> hoping to get a +1 from engineering to use, to hopefully stop the deletion
> of the cephcluster resource.
>
> [1]
> https://www.rook.io/docs/rook/v1.14/Troubleshooting/disaster-recovery/
> #restoring-crds-after-deletion

The upstream doc to restore the CRDs should work.
(In reply to kelwhite from comment #6)
> I ran through this process on a lab machine. One thing I noticed is that as
> soon as I remove the finalizer from the cephcluster CR, it gets recreated...
> might need to scale down the ocs-operator deployment as well? Here are my
> findings:

That's right. Need to stop the OCS operator deployment as the first step here.
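For reference, the overall sequence from the upstream doc, with the operator scale-down added as the first step, looks roughly like this (a sketch built from the commands already shown in comment #6; adjust resource names to the cluster at hand):

# 1. Stop the operators so nothing reconciles or recreates objects mid-procedure:
oc -n openshift-storage scale deployment ocs-operator rook-ceph-operator --replicas 0

# 2. Back up the CephCluster CR plus the secrets and configmaps before touching anything:
oc -n openshift-storage get cephcluster -o yaml > cluster.yaml
oc -n openshift-storage get secrets -o yaml > secrets.yaml
oc -n openshift-storage get cm -o yaml > configmaps.yaml

# 3. Strip the ownerReferences that point at the CephCluster UID, so the child
#    resources (mon secrets, services, deployments, OSD PVCs) survive the deletion
#    (see the ROOK_UID/RESOURCES patch loop in comment #6).

# 4. Remove the finalizer so the stuck CR can actually go away:
oc -n openshift-storage patch cephcluster/ocs-storagecluster-cephcluster --type json \
  --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'

# 5. Recreate the CR from the cleaned-up backup, then scale the operators back up:
oc -n openshift-storage create -f cluster.yaml
oc -n openshift-storage scale deployment ocs-operator rook-ceph-operator --replicas 1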
Thanks, I will give that a go and report the outcome.
Hi,

I ran the steps on a 4.15 cluster with both the ocs and rook-ceph operators scaled down, and the same results as above happened. The rook-ceph operator is having a hard time reconciling:

[kube:admin/openshift-storage root ~]$ oc scale deployment rook-ceph-operator ocs-operator --replicas 0
deployment.apps/rook-ceph-operator scaled
deployment.apps/ocs-operator scaled
[kube:admin/openshift-storage root ~]$ oc get pods | grep oper
noobaa-operator-769b96d865-vvr7q                   1/1   Running   0   28d
odf-operator-controller-manager-5d5bbccf5f-tqrcf   2/2   Running   0   28d
[kube:admin/openshift-storage root ~]$ mkdir backups
[kube:admin/openshift-storage root ~]$ cd backups/
[kube:admin/openshift-storage root backups]$ oc get cephcluster -o yaml > cluster.yaml
[kube:admin/openshift-storage root backups]$ oc get secrets -o yaml > secrets.yaml
[kube:admin/openshift-storage root backups]$ oc get cm -o yaml > configmaps.yaml
[kube:admin/openshift-storage root backups]$ oc get cephcluster ocs-storagecluster-cephcluster -o 'jsonpath={.metadata.uid}'
830b951e-46b8-4ba8-9c65-a5e6fa631dd2
[kube:admin/openshift-storage root backups]$
[kube:admin/openshift-storage root backups]$ ROOK_UID=$(oc get cephcluster ocs-storagecluster-cephcluster -o 'jsonpath={.metadata.uid}')
[system:admin/openshift-storage root backups]$ oc get cephcluster ocs-storagecluster-cephcluster -o 'jsonpath={.metadata.uid}'
c937d8af-2550-40d6-9f3d-d8a97b1affec
[system:admin/openshift-storage root backups]$ ROOK_UID=$(oc get cephcluster ocs-storagecluster-cephcluster -o 'jsonpath={.metadata.uid}')
[system:admin/openshift-storage root backups]$ RESOURCES=$(kubectl get secret,configmap,service,deployment,pvc -o jsonpath='{range .items[?(@.metadata.ownerReferences[*].uid=="'"$ROOK_UID"'")]}{.kind}{"/"}{.metadata.name}{"\n"}{end}')
[system:admin/openshift-storage root backups]$ kubectl get $RESOURCES
NAME                                                       TYPE                 DATA   AGE
secret/cluster-peer-token-ocs-storagecluster-cephcluster   kubernetes.io/rook   2      55d
secret/rook-ceph-admin-keyring                             kubernetes.io/rook   1      55d
secret/rook-ceph-config                                    kubernetes.io/rook   2      55d
secret/rook-ceph-crash-collector-keyring                   kubernetes.io/rook   1      55d
secret/rook-ceph-exporter-keyring                          kubernetes.io/rook   1      55d
secret/rook-ceph-mgr-a-keyring                             kubernetes.io/rook   1      55d
secret/rook-ceph-mgr-b-keyring                             kubernetes.io/rook   1      55d
secret/rook-ceph-mon                                       kubernetes.io/rook   4      55d
secret/rook-ceph-mons-keyring                              kubernetes.io/rook   1      55d
secret/rook-csi-cephfs-node                                kubernetes.io/rook   2      55d
secret/rook-csi-cephfs-provisioner                         kubernetes.io/rook   2      55d
secret/rook-csi-rbd-node                                   kubernetes.io/rook   2      55d
secret/rook-csi-rbd-provisioner                            kubernetes.io/rook   2      55d

NAME                                DATA   AGE
configmap/rook-ceph-mon-endpoints   5      55d

NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/rook-ceph-exporter   ClusterIP   172.30.11.29     <none>        9926/TCP   55d
service/rook-ceph-mgr        ClusterIP   172.30.132.136   <none>        9283/TCP   55d
service/rook-ceph-mon-d      ClusterIP   172.30.86.183    <none>        3300/TCP   50d
service/rook-ceph-mon-e      ClusterIP   172.30.113.197   <none>        3300/TCP   50d
service/rook-ceph-mon-f      ClusterIP   172.30.168.125   <none>        3300/TCP   50d
service/rook-ceph-osd-0      ClusterIP   172.30.45.34     <none>        6800/TCP   50d
service/rook-ceph-osd-1      ClusterIP   172.30.245.247   <none>        6800/TCP   50d
service/rook-ceph-osd-2      ClusterIP   172.30.208.229   <none>        6800/TCP   50d

NAME                                                                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/rook-ceph-crashcollector-odf0.libvirt3.ocpcluster.cc   1/1     1            1           55d
deployment.apps/rook-ceph-crashcollector-odf1.libvirt3.ocpcluster.cc   1/1     1            1           55d
deployment.apps/rook-ceph-crashcollector-odf2.libvirt3.ocpcluster.cc   1/1     1            1           55d
deployment.apps/rook-ceph-exporter-odf0.libvirt3.ocpcluster.cc         1/1     1            1           55d
deployment.apps/rook-ceph-exporter-odf1.libvirt3.ocpcluster.cc         1/1     1            1           55d
deployment.apps/rook-ceph-exporter-odf2.libvirt3.ocpcluster.cc         1/1     1            1           55d
deployment.apps/rook-ceph-mgr-a                                        1/1     1            1           55d
deployment.apps/rook-ceph-mgr-b                                        1/1     1            1           55d
deployment.apps/rook-ceph-mon-d                                        1/1     1            1           50d
deployment.apps/rook-ceph-mon-e                                        1/1     1            1           50d
deployment.apps/rook-ceph-mon-f                                        1/1     1            1           50d
deployment.apps/rook-ceph-osd-0                                        1/1     1            1           50d
deployment.apps/rook-ceph-osd-1                                        1/1     1            1           50d
deployment.apps/rook-ceph-osd-2                                        1/1     1            1           50d

NAME                                                STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/ocs-deviceset-0-data-0mwxd4   Bound    local-pv-7678b776   500Gi      RWO            localblock     55d
persistentvolumeclaim/ocs-deviceset-1-data-0dwdbm   Bound    local-pv-2732353f   500Gi      RWO            localblock     55d
persistentvolumeclaim/ocs-deviceset-2-data-0x72z6   Bound    local-pv-b1185a6d   500Gi      RWO            localblock     55d

[system:admin/openshift-storage root backups]$ for resource in $(oc -n openshift-storage get $RESOURCES -o name)
> do
>   oc -n openshift-storage patch $resource -p '{"metadata": {"ownerReferences":null}}'
> done
secret/cluster-peer-token-ocs-storagecluster-cephcluster patched
secret/rook-ceph-admin-keyring patched
secret/rook-ceph-config patched
secret/rook-ceph-crash-collector-keyring patched
secret/rook-ceph-exporter-keyring patched
secret/rook-ceph-mgr-a-keyring patched
secret/rook-ceph-mgr-b-keyring patched
secret/rook-ceph-mon patched
secret/rook-ceph-mons-keyring patched
secret/rook-csi-cephfs-node patched
secret/rook-csi-cephfs-provisioner patched
secret/rook-csi-rbd-node patched
secret/rook-csi-rbd-provisioner patched
configmap/rook-ceph-mon-endpoints patched
service/rook-ceph-exporter patched
service/rook-ceph-mgr patched
service/rook-ceph-mon-d patched
service/rook-ceph-mon-e patched
service/rook-ceph-mon-f patched
service/rook-ceph-osd-0 patched
service/rook-ceph-osd-1 patched
service/rook-ceph-osd-2 patched
deployment.apps/rook-ceph-crashcollector-odf0.libvirt3.ocpcluster.cc patched
deployment.apps/rook-ceph-crashcollector-odf1.libvirt3.ocpcluster.cc patched
deployment.apps/rook-ceph-crashcollector-odf2.libvirt3.ocpcluster.cc patched
deployment.apps/rook-ceph-exporter-odf0.libvirt3.ocpcluster.cc patched
deployment.apps/rook-ceph-exporter-odf1.libvirt3.ocpcluster.cc patched
deployment.apps/rook-ceph-exporter-odf2.libvirt3.ocpcluster.cc patched
deployment.apps/rook-ceph-mgr-a patched
deployment.apps/rook-ceph-mgr-b patched
deployment.apps/rook-ceph-mon-d patched
deployment.apps/rook-ceph-mon-e patched
deployment.apps/rook-ceph-mon-f patched
deployment.apps/rook-ceph-osd-0 patched
deployment.apps/rook-ceph-osd-1 patched
deployment.apps/rook-ceph-osd-2 patched
persistentvolumeclaim/ocs-deviceset-0-data-0mwxd4 patched
persistentvolumeclaim/ocs-deviceset-1-data-0dwdbm patched
persistentvolumeclaim/ocs-deviceset-2-data-0x72z6 patched
[system:admin/openshift-storage root backups]$ oc delete cephcluster ocs-storagecluster-cephcluster
cephcluster.ceph.rook.io "ocs-storagecluster-cephcluster" deleted
[system:admin/openshift-storage root backups]$ oc get cephcluster
No resources found in openshift-storage namespace.
[system:admin/openshift-storage root backups]$ oc create -f cluster.yaml
cephcluster.ceph.rook.io/ocs-storagecluster-cephcluster created
[system:admin/openshift-storage root backups]$

// odf operator logs:
2024-06-05 00:01:08.160537 I | ceph-spec: ceph-file-controller: CephCluster "ocs-storagecluster-cephcluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] LastChecked:2024-06-05T00:00:32Z LastChanged: PreviousHealth: Capacity:{TotalBytes:0 UsedBytes:0 AvailableBytes:0 LastUpdated:} Versions:<nil> FSID:}
2024-06-05 00:01:08.238233 I | ceph-spec: ceph-object-controller: CephCluster "ocs-storagecluster-cephcluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] LastChecked:2024-06-05T00:00:32Z LastChanged: PreviousHealth: Capacity:{TotalBytes:0 UsedBytes:0 AvailableBytes:0 LastUpdated:} Versions:<nil> FSID:}
2024-06-05 00:01:08.712411 I | ceph-spec: ceph-rbd-mirror-controller: CephCluster "ocs-storagecluster-cephcluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] LastChecked:2024-06-05T00:00:32Z LastChanged: PreviousHealth: Capacity:{TotalBytes:0 UsedBytes:0 AvailableBytes:0 LastUpdated:} Versions:<nil> FSID:}
2024-06-05 00:01:08.809975 I | ceph-spec: ceph-object-store-user-controller: CephCluster "ocs-storagecluster-cephcluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] LastChecked:2024-06-05T00:00:32Z LastChanged: PreviousHealth: Capacity:{TotalBytes:0 UsedBytes:0 AvailableBytes:0 LastUpdated:} Versions:<nil> FSID:}
2024-06-05 00:01:08.912430 I | ceph-spec: ceph-block-pool-controller: CephCluster "ocs-storagecluster-cephcluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] LastChecked:2024-06-05T00:00:32Z LastChanged: PreviousHealth: Capacity:{TotalBytes:0 UsedBytes:0 AvailableBytes:0 LastUpdated:} Versions:<nil> FSID:}
2024-06-05 00:01:09.108841 I | ceph-spec: ceph-object-store-user-controller: CephCluster "ocs-storagecluster-cephcluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] LastChecked:2024-06-05T00:00:32Z LastChanged: PreviousHealth: Capacity:{TotalBytes:0 UsedBytes:0 AvailableBytes:0 LastUpdated:} Versions:<nil> FSID:}
2024-06-05 00:01:09.109023 I | ceph-spec: ceph-object-store-user-controller: CephCluster "ocs-storagecluster-cephcluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] LastChecked:2024-06-05T00:00:32Z LastChanged: PreviousHealth: Capacity:{TotalBytes:0 UsedBytes:0 AvailableBytes:0 LastUpdated:} Versions:<nil> FSID:}
2024-06-05 00:01:09.161389 I | ceph-spec: ceph-fs-subvolumegroup-controller: CephCluster "ocs-storagecluster-cephcluster" found but skipping reconcile since ceph health is &{Health:HEALTH_ERR Details:map[error:{Severity:Urgent Message:failed to get status. . timed out: exit status 1}] LastChecked:2024-06-05T00:00:32Z LastChanged: PreviousHealth: Capacity:{TotalBytes:0 UsedBytes:0 AvailableBytes:0 LastUpdated:} Versions:<nil> FSID:}

I'll do one final test on ODF 4.12 to triple-check this still doesn't work, but... based on these results so far, I don't feel very confident giving this to the customer.
Hi,

I still haven't had time to do any of the testing [1,2]... I might be able to get to it soon...

[1] https://github.com/rook/kubectl-rook-ceph/blob/master/docs/crd.md
[2] https://github.com/red-hat-storage/odf-cli?tab=readme-ov-file#odf-cli
Parth,

Following [1] to install krew as I don't have this, then following [2] to install the ...

###################################################################################################

kelson@quorra:~$ ( set -x; cd "$(mktemp -d)" && OS="$(uname | tr '[:upper:]' '[:lower:]')" && ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" && KREW="krew-${OS}_${ARCH}" && curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" && tar zxvf "${KREW}.tar.gz" && ./"${KREW}" install krew )
++ mktemp -d
+ cd /tmp/tmp.hcY1I4FORZ
++ uname
++ tr '[:upper:]' '[:lower:]'
+ OS=linux
++ uname -m
++ sed -e s/x86_64/amd64/ -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/'
+ ARCH=amd64
+ KREW=krew-linux_amd64
+ curl -fsSLO https://github.com/kubernetes-sigs/krew/releases/latest/download/krew-linux_amd64.tar.gz
+ tar zxvf krew-linux_amd64.tar.gz
./LICENSE
./krew-linux_amd64
+ ./krew-linux_amd64 install krew
Adding "default" plugin index from https://github.com/kubernetes-sigs/krew-index.git.
Updated the local copy of plugin index.
Installing plugin: krew
Installed plugin: krew
\
 | Use this plugin:
 |      kubectl krew
 | Documentation:
 |      https://krew.sigs.k8s.io/
 | Caveats:
 | \
 |  | krew is now installed! To start using kubectl plugins, you need to add
 |  | krew's installation directory to your PATH:
 |  |
 |  |   * macOS/Linux:
 |  |     - Add the following to your ~/.bashrc or ~/.zshrc:
 |  |         export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
 |  |     - Restart your shell.
 |  |
 |  |   * Windows: Add %USERPROFILE%\.krew\bin to your PATH environment variable
 |  |
 |  | To list krew commands and to get help, run:
 |  |   $ kubectl krew
 |  | For a full list of available plugins, run:
 |  |   $ kubectl krew search
 |  |
 |  | You can find documentation at
 |  |   https://krew.sigs.k8s.io/docs/user-guide/quickstart/.
 | /
/
kelson@quorra:~$ export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
kelson@quorra:~$ exit
kelson@quorra:~$ kubectl krew
krew is the kubectl plugin manager.
You can invoke krew through kubectl: "kubectl krew [command]..."

Usage:
  kubectl krew [command]

Available Commands:
  help        Help about any command
  index       Manage custom plugin indexes
  info        Show information about an available plugin
  install     Install kubectl plugins
  list        List installed kubectl plugins
  search      Discover kubectl plugins
  uninstall   Uninstall plugins
  update      Update the local copy of the plugin index
  upgrade     Upgrade installed plugins to newer versions
  version     Show krew version and diagnostics

Flags:
  -h, --help      help for krew
  -v, --v Level   number for the log level verbosity

Use "kubectl krew [command] --help" for more information about a command.

###################################################################################################

Moving on to [2] to install the rook-ceph krew plugin...

kelson@quorra:~$ kubectl krew install rook-ceph
Updated the local copy of plugin index.
Installing plugin: rook-ceph
Installed plugin: rook-ceph
\
 | Use this plugin:
 |      kubectl rook-ceph
 | Documentation:
 |      https://github.com/rook/kubectl-rook-ceph
/
WARNING: You installed plugin "rook-ceph" from the krew-index plugin repository.
   These plugins are not audited for security by the Krew maintainers.
   Run them at your own risk.

###################################################################################################

Now using [3]... to restore a deleted CR, in this case the cephcluster CR.
kelson@quorra:~$ oc get cephcluster
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE    PHASE   MESSAGE                        HEALTH      EXTERNAL   FSID
ocs-storagecluster-cephcluster   /var/lib/rook     3          110m   Ready   Cluster created successfully   HEALTH_OK              a45b02f0-5d1c-4d4a-bbc9-80ca55b64e7d
kelson@quorra:~$ oc delete cephcluster ocs-storagecluster-cephcluster
cephcluster.ceph.rook.io "ocs-storagecluster-cephcluster" deleted
^Ckelson@quorra:~$ oc get cephcluster
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE    PHASE      MESSAGE                    HEALTH      EXTERNAL   FSID
ocs-storagecluster-cephcluster   /var/lib/rook     3          111m   Deleting   Deleting the CephCluster   HEALTH_OK              a45b02f0-5d1c-4d4a-bbc9-80ca55b64e7d
kelson@quorra:~$ kubectl rook-ceph restore-deleted cephcluster ocs-storagecluster-cephcluster
Error: operator namespace 'rook-ceph' does not exist. namespaces "rook-ceph" not found
kelson@quorra:~$ kubectl rook-ceph restore-deleted cephcluster ocs-storagecluster-cephcluster -n openshift-storage
Info: Detecting which resources to restore for crd "cephcluster"
Error: Failed to list resources for crd the server could not find the requested resource

Sadly, this process won't work :(. I think we have some confusion here?... the CRD 'cephclusters.ceph.rook.io' isn't in a deleting phase... the resource 'cephcluster' and the object 'ocs-storagecluster-cephcluster' are:

kelson@quorra:~$ oc get crd cephclusters.ceph.rook.io -o yaml | less
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.11.3
    operatorframework.io/installed-alongside-aa5a5720474eade5: openshift-storage/ocs-operator.v4.15.3-rhodf
  creationTimestamp: "2024-06-11T18:43:29Z"
  generation: 1
  labels:
    olm.managed: "true"
    operators.coreos.com/ocs-operator.openshift-storage: ""
  name: cephclusters.ceph.rook.io
  resourceVersion: "37524"
  uid: bdbf2fec-cd45-4e54-a976-516d12d3fb84
...
status:
  acceptedNames:
    kind: CephCluster
    listKind: CephClusterList
    plural: cephclusters
    singular: cephcluster
  conditions:
  - lastTransitionTime: "2024-06-11T18:43:29Z"
    message: no conflicts found
    reason: NoConflicts
    status: "True"
    type: NamesAccepted
  - lastTransitionTime: "2024-06-11T18:43:29Z"
    message: the initial names have been accepted
    reason: InitialNamesAccepted
    status: "True"
    type: Established
  storedVersions:
  - v1
^Ckelson@quorra:~$ oc get cephcluster
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE    PHASE      MESSAGE                    HEALTH      EXTERNAL   FSID
ocs-storagecluster-cephcluster   /var/lib/rook     3          111m   Deleting   Deleting the CephCluster   HEALTH_OK              a45b02f0-5d1c-4d4a-bbc9-80ca55b64e7d

Can you clarify my ignorance? Does the CRD 'cephclusters.ceph.rook.io' get passed to the 'StorageCluster', and then the StorageCluster uses this to create the resource 'cephcluster' and the object 'ocs-storagecluster-cephcluster'?

[1] https://krew.sigs.k8s.io/docs/user-guide/setup/install/
[2] https://github.com/rook/kubectl-rook-ceph/tree/master
[3] https://github.com/rook/kubectl-rook-ceph/blob/master/docs/crd.md
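My own rough understanding, for what it's worth: the CRD 'cephclusters.ceph.rook.io' is just the cluster-scoped type definition, and it is not what is stuck here. The CephCluster CR 'ocs-storagecluster-cephcluster' is a namespaced instance of that type, which the ocs-operator creates from the StorageCluster; that is why StorageCluster appears in the CR's ownerReferences earlier in this bug. A quick way to see the two side by side:

oc get crd cephclusters.ceph.rook.io   # the type definition, cluster-scoped, not deleting
oc -n openshift-storage get cephcluster ocs-storagecluster-cephcluster -o jsonpath='{.metadata.ownerReferences[0].kind}{"\n"}'   # prints the owning kind, StorageCluster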
Parth,

This is the downstream testing [1]...

###################################################################################################

kelson@quorra:~$ cd git/
kelson@quorra:~/git$ git clone https://github.com/red-hat-storage/odf-cli.git
Cloning into 'odf-cli'...
remote: Enumerating objects: 408, done.
remote: Counting objects: 100% (165/165), done.
remote: Compressing objects: 100% (84/84), done.
remote: Total 408 (delta 98), reused 86 (delta 80), pack-reused 243
Receiving objects: 100% (408/408), 168.72 KiB | 1.17 MiB/s, done.
Resolving deltas: 100% (190/190), done.
kelson@quorra:~/git$ cd odf-cli/ && make build
gofmt -w ./pkg/rook/logs.go ./pkg/rook/osd/osd.go ./cmd/odf/subvolume/subvolume.go ./cmd/odf/restore/crds.go ./cmd/odf/restore/mon_quorum.go ./cmd/odf/restore/restore.go ./cmd/odf/maintenance/maintenance.go ./cmd/odf/maintenance/start.go ./cmd/odf/maintenance/stop.go ./cmd/odf/set/set.go ./cmd/odf/set/log_level.go ./cmd/odf/set/set_recovery_profile.go ./cmd/odf/set/backfillfull_ratio.go ./cmd/odf/set/full_ratio.go ./cmd/odf/set/nearfull_ratio.go ./cmd/odf/set/ceph.go ./cmd/odf/get/health.go ./cmd/odf/get/rook_status.go ./cmd/odf/get/mon_endpoints.go ./cmd/odf/get/get.go ./cmd/odf/get/dr_health.go ./cmd/odf/get/get_recovery_profile.go ./cmd/odf/operator/operator.go ./cmd/odf/operator/rook/set.go ./cmd/odf/operator/rook/restart.go ./cmd/odf/operator/rook/rook.go ./cmd/odf/main.go ./cmd/odf/root/root.go ./cmd/odf/purgeosd/purge_osd.go
env GOOS=linux GOARCH=amd64 go build -o bin/odf cmd/odf/main.go
go: downloading github.com/rook/kubectl-rook-ceph v0.9.0
go: downloading github.com/spf13/cobra v1.8.0
go: downloading github.com/pkg/errors v0.9.1
go: downloading github.com/rook/rook v1.14.5
go: downloading k8s.io/apimachinery v0.29.3
go: downloading k8s.io/client-go v0.29.3
go: downloading github.com/spf13/pflag v1.0.5
go: downloading github.com/fatih/color v1.16.0
go: downloading gopkg.in/yaml.v3 v3.0.1
go: downloading github.com/golang/mock v1.6.0
go: downloading k8s.io/api v0.29.3
go: downloading github.com/rook/rook/pkg/apis v0.0.0-20240327171914-dc534051324b
go: downloading github.com/imdario/mergo v0.3.16
go: downloading golang.org/x/term v0.18.0
go: downloading k8s.io/klog/v2 v2.120.1
go: downloading golang.org/x/net v0.23.0
go: downloading k8s.io/utils v0.0.0-20240310230437-4693a0247e57
go: downloading github.com/gogo/protobuf v1.3.2
go: downloading github.com/google/gofuzz v1.2.0
go: downloading sigs.k8s.io/yaml v1.4.0
go: downloading sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd
go: downloading github.com/golang/protobuf v1.5.4
go: downloading github.com/google/gnostic-models v0.6.8
go: downloading sigs.k8s.io/structured-merge-diff/v4 v4.4.1
go: downloading github.com/gorilla/websocket v1.5.1
go: downloading golang.org/x/time v0.5.0
go: downloading golang.org/x/oauth2 v0.18.0
go: downloading gopkg.in/inf.v0 v0.9.1
go: downloading github.com/mattn/go-colorable v0.1.13
go: downloading github.com/mattn/go-isatty v0.0.20
go: downloading golang.org/x/sys v0.18.0
go: downloading k8s.io/kube-openapi v0.0.0-20240322212309-b815d8309940
go: downloading github.com/hashicorp/vault/api v1.12.2
go: downloading github.com/k8snetworkplumbingwg/network-attachment-definition-client v1.6.0
go: downloading github.com/kube-object-storage/lib-bucket-provisioner v0.0.0-20221122204822-d1a8c34382f1
go: downloading github.com/libopenstorage/secrets v0.0.0-20231011182615-5f4b25ceede1
go: downloading github.com/openshift/api v0.0.0-20240328065759-f8aa75d189e1
go: downloading github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc
go: downloading github.com/go-logr/logr v1.4.1
go: downloading google.golang.org/protobuf v1.33.0
go: downloading github.com/moby/spdystream v0.2.0
go: downloading golang.org/x/text v0.14.0
go: downloading github.com/json-iterator/go v1.1.12
go: downloading gopkg.in/yaml.v2 v2.4.0
go: downloading github.com/cenkalti/backoff/v3 v3.2.2
go: downloading github.com/hashicorp/errwrap v1.1.0
go: downloading github.com/go-jose/go-jose/v3 v3.0.3
go: downloading github.com/hashicorp/go-cleanhttp v0.5.2
go: downloading github.com/hashicorp/go-multierror v1.1.1
go: downloading github.com/hashicorp/go-retryablehttp v0.7.5
go: downloading github.com/hashicorp/go-rootcerts v1.0.2
go: downloading github.com/hashicorp/go-secure-stdlib/parseutil v0.1.8
go: downloading github.com/hashicorp/go-secure-stdlib/strutil v0.1.2
go: downloading github.com/hashicorp/hcl v1.0.1-vault-5
go: downloading github.com/mitchellh/mapstructure v1.5.0
go: downloading github.com/sirupsen/logrus v1.9.3
go: downloading github.com/google/uuid v1.6.0
go: downloading github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822
go: downloading github.com/mxk/go-flowrate v0.0.0-20140419014527-cca7078d478f
go: downloading github.com/go-openapi/jsonreference v0.21.0
go: downloading github.com/go-openapi/swag v0.23.0
go: downloading github.com/containernetworking/cni v1.1.2
go: downloading github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd
go: downloading github.com/hashicorp/vault/api/auth/approle v0.6.0
go: downloading github.com/hashicorp/vault/api/auth/kubernetes v0.6.0
go: downloading github.com/ryanuber/go-glob v1.0.0
go: downloading github.com/emicklei/go-restful/v3 v3.12.0
go: downloading github.com/modern-go/reflect2 v1.0.2
go: downloading github.com/hashicorp/go-sockaddr v1.0.6
go: downloading github.com/go-openapi/jsonpointer v0.21.0
go: downloading github.com/mailru/easyjson v0.7.7
go: downloading golang.org/x/crypto v0.21.0
go: downloading github.com/josharian/intern v1.0.0
kelson@quorra:~/git/odf-cli$ ./bin/odf -h
Management and troubleshooting tools for ODF clusters.

Usage:
  odf [command]

Available Commands:
  get         Get ODF configuration
  help        Help about any command
  purge-osd   Permanently remove an OSD from the cluster.
  set         Set ODF configuration
  subvolume   Manages subvolumes

Flags:
      --context string              Openshift context to use
  -h, --help                        help for odf
      --kubeconfig string           Openshift config path
  -n, --namespace string            Openshift namespace where the StorageCluster CR is created (default "openshift-storage")
      --operator-namespace string   Openshift namespace where the ODF operator is running

Use "odf [command] --help" for more information about a command.

Reviewing the available commands and the docs section [2] of the GitHub repo, I was hoping to have an option like 'restore-deleted', but I don't see anything that would assist with this. Can you elaborate on what options would help?

[1] https://github.com/red-hat-storage/odf-cli?tab=readme-ov-file#odf-cli
Whoops, forgot [2].

[2] https://github.com/red-hat-storage/odf-cli/tree/main/docs
Parth,

Ah, 'restore' isn't part of the top-level help output, but 'deleted' is once you pass 'restore'. Can this be an informal request to add it?

###################################################################################################

kelson@quorra:~$ odf
Management and troubleshooting tools for ODF clusters.

Usage:
  odf [command]

Available Commands:
  get         Get ODF configuration
  help        Help about any command
  purge-osd   Permanently remove an OSD from the cluster.
  set         Set ODF configuration
  subvolume   Manages subvolumes

Flags:
      --context string              Openshift context to use
  -h, --help                        help for odf
      --kubeconfig string           Openshift config path
  -n, --namespace string            Openshift namespace where the StorageCluster CR is created (default "openshift-storage")
      --operator-namespace string   Openshift namespace where the ODF operator is running

Use "odf [command] --help" for more information about a command.

kelson@quorra:~$ odf restore
Usage:
  odf restore [command]

Available Commands:
  deleted      Restores a CR that was accidentally deleted and is still in terminating state.
  mon-quorum   When quorum is lost, restore quorum to the remaining healthy mon

Flags:
  -h, --help   help for restore

Global Flags:
      --context string              Openshift context to use
      --kubeconfig string           Openshift config path
  -n, --namespace string            Openshift namespace where the StorageCluster CR is created (default "openshift-storage")
      --operator-namespace string   Openshift namespace where the ODF operator is running

Use "odf restore [command] --help" for more information about a command.

###################################################################################################

Anyways, here are my results using [1]:

kelson@quorra:~$ oc get cephcluster
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE      MESSAGE                    HEALTH      EXTERNAL   FSID
ocs-storagecluster-cephcluster   /var/lib/rook     3          18h   Deleting   Deleting the CephCluster   HEALTH_OK              a45b02f0-5d1c-4d4a-bbc9-80ca55b64e7d
kelson@quorra:~$ odf restore deleted cephcluster
Info: Detecting which resources to restore for crd "cephcluster"
Error: Failed to list resources for crd the server could not find the requested resource
kelson@quorra:~$ odf restore deleted cephcluster ocs-storagecluster-cephcluster
Info: Detecting which resources to restore for crd "cephcluster"
Error: Failed to list resources for crd the server could not find the requested resource
kelson@quorra:~$ odf restore deleted cephcluster ocs-storagecluster-cephcluster -n openshift-storage
Error: accepts between 1 and 2 arg(s), received 4
Usage:
  odf restore deleted [flags]

Examples:
odf restore deleted <CRD> [CRNAME]

Flags:
  -h, --help   help for deleted

Global Flags:
      --context string              Openshift context to use
      --kubeconfig string           Openshift config path
  -n, --namespace string            Openshift namespace where the StorageCluster CR is created (default "openshift-storage")
      --operator-namespace string   Openshift namespace where the ODF operator is running

Error: accepts between 1 and 2 arg(s), received 4
kelson@quorra:~$ odf -n openshift-storage restore deleted cephcluster
Info: Detecting which resources to restore for crd "cephcluster"
Error: Failed to list resources for crd the server could not find the requested resource
kelson@quorra:~$ odf -n openshift-storage restore deleted cephcluster ocs-storagecluster-cephcluster
Info: Detecting which resources to restore for crd "cephcluster"
Error: Failed to list resources for crd the server could not find the requested resource

[1] https://github.com/red-hat-storage/odf-cli/blob/main/docs/restore.md#deleted
Parth,

I don't understand the ask. The cephcluster is there:

kelson@quorra:~$ oc get cephcluster
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE      MESSAGE                    HEALTH      EXTERNAL   FSID
ocs-storagecluster-cephcluster   /var/lib/rook     3          18h   Deleting   Deleting the CephCluster   HEALTH_OK
Parth,

I can upload the rook-ceph-operator log, but it's mainly just spam of these two lines and doesn't seem to be very helpful:

2024-06-14 12:44:29.905882 E | ceph-cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". CephCluster "openshift-storage/ocs-storagecluster-cephcluster" will not be deleted until all dependents are removed: CephBlockPool: [ocs-storagecluster-cephblockpool ocs-storagecluster-cephblockpool-us-east-2a ocs-storagecluster-cephblockpool-us-east-2b ocs-storagecluster-cephblockpool-us-east-2c], CephFilesystem: [ocs-storagecluster-cephfilesystem], CephFilesystemSubVolumeGroup: [ocs-storagecluster-cephfilesystem-csi]
2024-06-14 12:44:39.981115 I | ceph-cluster-controller: CephCluster "openshift-storage/ocs-storagecluster-cephcluster" will not be deleted until all dependents are removed: CephBlockPool: [ocs-storagecluster-cephblockpool ocs-storagecluster-cephblockpool-us-east-2a ocs-storagecluster-cephblockpool-us-east-2b ocs-storagecluster-cephblockpool-us-east-2c], CephFilesystem: [ocs-storagecluster-cephfilesystem], CephFilesystemSubVolumeGroup: [ocs-storagecluster-cephfilesystem-csi]
2024-06-14 12:51:14.094352 I | ceph-cluster-controller: CephCluster "openshift-storage/ocs-storagecluster-cephcluster" will not be deleted until all dependents are removed: CephBlockPool: [ocs-storagecluster-cephblockpool ocs-storagecluster-cephblockpool-us-east-2a ocs-storagecluster-cephblockpool-us-east-2b ocs-storagecluster-cephblockpool-us-east-2c], CephFilesystem: [ocs-storagecluster-cephfilesystem], CephFilesystemSubVolumeGroup: [ocs-storagecluster-cephfilesystem-csi]

Without those two lines, this is what is in there from the past 3 days:

2024-06-11 21:00:22.060166 I | cephclient: crush rule "ocs-storagecluster-cephfilesystem-data0_zone" will no longer be used by pool "ocs-storagecluster-cephfilesystem-data0"
2024-06-11 21:00:22.422982 I | ceph-block-pool-controller: successfully initialized pool "ocs-storagecluster-cephblockpool-us-east-2c" for RBD use
2024-06-11 21:00:22.751302 I | op-config: setting "mgr"="mgr/prometheus/rbd_stats_pools"="ocs-storagecluster-cephblockpool,ocs-storagecluster-cephblockpool-us-east-2a,ocs-storagecluster-cephblockpool-us-east-2b,ocs-storagecluster-cephblockpool-us-east-2c" option to the mon configuration database
2024-06-11 21:00:22.775796 I | cephclient: setting allow_standby_replay to true for filesystem "ocs-storagecluster-cephfilesystem"
2024-06-11 21:00:23.112025 I | op-config: successfully set "mgr"="mgr/prometheus/rbd_stats_pools"="ocs-storagecluster-cephblockpool,ocs-storagecluster-cephblockpool-us-east-2a,ocs-storagecluster-cephblockpool-us-east-2b,ocs-storagecluster-cephblockpool-us-east-2c" option to the mon configuration database
2024-06-11 21:00:23.129929 I | ceph-spec: parsing mon endpoints: c=172.30.194.31:3300,a=172.30.172.228:3300,b=172.30.177.124:3300
2024-06-11 21:00:23.445093 I | ceph-block-pool-controller: creating pool "ocs-storagecluster-cephblockpool" in namespace "openshift-storage"
2024-06-11 21:00:24.120569 I | cephclient: setting pool property "target_size_ratio" to "0.49" on pool "ocs-storagecluster-cephblockpool"
2024-06-11 21:00:24.128816 I | cephclient: creating cephfs "ocs-storagecluster-cephfilesystem" subvolume group "csi"
2024-06-11 21:00:24.497021 I | cephclient: successfully created cephfs "ocs-storagecluster-cephfilesystem" subvolume group "csi"
2024-06-11 21:00:25.462787 I | cephclient: application "rbd" is already set on pool "ocs-storagecluster-cephblockpool"
2024-06-11 21:00:25.462805 I | cephclient: reconciling replicated pool ocs-storagecluster-cephblockpool succeeded
2024-06-11 21:00:26.114878 I | cephclient: creating a new crush rule for changed deviceClass on crush rule "ocs-storagecluster-cephblockpool_zone"
2024-06-11 21:00:26.114900 I | cephclient: updating pool "ocs-storagecluster-cephblockpool" failure domain from "zone" to "zone" with new crush rule "ocs-storagecluster-cephblockpool_zone_replicated"
2024-06-11 21:00:26.114922 I | cephclient: crush rule "ocs-storagecluster-cephblockpool_zone" will no longer be used by pool "ocs-storagecluster-cephblockpool"
2024-06-11 21:00:26.433014 I | ceph-block-pool-controller: initializing pool "ocs-storagecluster-cephblockpool" for RBD use
2024-06-11 21:00:27.179280 I | ceph-block-pool-controller: successfully initialized pool "ocs-storagecluster-cephblockpool" for RBD use
2024-06-11 21:00:27.514821 I | op-config: setting "mgr"="mgr/prometheus/rbd_stats_pools"="ocs-storagecluster-cephblockpool,ocs-storagecluster-cephblockpool-us-east-2a,ocs-storagecluster-cephblockpool-us-east-2b,ocs-storagecluster-cephblockpool-us-east-2c" option to the mon configuration database
2024-06-11 21:00:27.837586 I | op-config: successfully set "mgr"="mgr/prometheus/rbd_stats_pools"="ocs-storagecluster-cephblockpool,ocs-storagecluster-cephblockpool-us-east-2a,ocs-storagecluster-cephblockpool-us-east-2b,ocs-storagecluster-cephblockpool-us-east-2c" option to the mon configuration database
2024-06-12 16:08:26.864969 I | operator: rook-ceph-operator-config-controller done reconciling
2024-06-12 18:47:50.714935 I | op-k8sutil: format and nodeName longer than 63 chars, nodeName ip-10-0-61-40.us-east-2.compute.internal will be 97867f1f29574478396efda2762f4874
2024-06-12 18:47:50.786739 I | op-k8sutil: format and nodeName longer than 63 chars, nodeName ip-10-0-27-19.us-east-2.compute.internal will be 9429fe593320279547d9f63557097d76
2024-06-12 18:47:50.857332 I | op-k8sutil: format and nodeName longer than 63 chars, nodeName ip-10-0-80-144.us-east-2.compute.internal will be 96400daacdccad86f567c6afdfb1d827
2024-06-13 01:42:36.692557 I | operator: rook-ceph-operator-config-controller done reconciling
2024-06-13 05:41:42.115458 I | op-k8sutil: format and nodeName longer than 63 chars, nodeName ip-10-0-80-144.us-east-2.compute.internal will be 96400daacdccad86f567c6afdfb1d827
2024-06-13 05:41:42.191647 I | op-k8sutil: format and nodeName longer than 63 chars, nodeName ip-10-0-61-40.us-east-2.compute.internal will be 97867f1f29574478396efda2762f4874
2024-06-13 05:41:42.245717 I | op-k8sutil: format and nodeName longer than 63 chars, nodeName ip-10-0-27-19.us-east-2.compute.internal will be 9429fe593320279547d9f63557097d76
2024-06-13 11:16:46.521944 I | operator: rook-ceph-operator-config-controller done reconciling
2024-06-13 16:35:33.514972 I | op-k8sutil: format and nodeName longer than 63 chars, nodeName ip-10-0-80-144.us-east-2.compute.internal will be 96400daacdccad86f567c6afdfb1d827
2024-06-13 16:35:33.582297 I | op-k8sutil: format and nodeName longer than 63 chars, nodeName ip-10-0-61-40.us-east-2.compute.internal will be 97867f1f29574478396efda2762f4874
2024-06-13 16:35:33.648855 I | op-k8sutil: format and nodeName longer than 63 chars, nodeName ip-10-0-27-19.us-east-2.compute.internal will be 9429fe593320279547d9f63557097d76
2024-06-13 20:50:56.352150 I | operator: rook-ceph-operator-config-controller done reconciling
2024-06-14 03:29:24.914869 I | op-k8sutil: format and nodeName longer than 63 chars, nodeName ip-10-0-61-40.us-east-2.compute.internal will be 97867f1f29574478396efda2762f4874
2024-06-14 03:29:24.992146 I | op-k8sutil: format and nodeName longer than 63 chars, nodeName ip-10-0-27-19.us-east-2.compute.internal will be 9429fe593320279547d9f63557097d76
2024-06-14 03:29:25.053954 I | op-k8sutil: format and nodeName longer than 63 chars, nodeName ip-10-0-80-144.us-east-2.compute.internal will be 96400daacdccad86f567c6afdfb1d827
2024-06-14 06:25:06.180562 I | operator: rook-ceph-operator-config-controller done reconciling

I've tried the manual method on two clusters and it failed. The steps for this testing are above in c#6 and c#10; c#6 was with the ocs-operator deployment scaled up, and c#10 is when it was scaled down.
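Rather than watching the reconcile spam, it may be quicker to enumerate the dependents blocking the deletion directly; a short sketch using the CR kinds named in the log above:

oc -n openshift-storage get cephblockpool,cephfilesystem,cephfilesystemsubvolumegroup,cephobjectstore,cephobjectstoreuser,cephrbdmirror

Per the controller message, each CR listed there has to be removed (or have its deletion unblocked) before the CephCluster deletion can complete.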
Travis,

Sorry, I've been on/off PTO the past few weeks... I'll test this again following the first bullet: remove the deletionTimestamp and other metadata in the backup CR.

I'm going on PTO again on Thursday, so hopefully I can knock this testing out before then.
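For the record, that metadata cleanup on the backup can be scripted; a minimal sketch, assuming yq v4 is available (otherwise edit cluster.yaml by hand) and that cluster.yaml holds the single CR rather than a List (if it is a List, prefix the paths with .items[0]):

yq eval 'del(.metadata.deletionTimestamp, .metadata.deletionGracePeriodSeconds, .metadata.resourceVersion, .metadata.uid, .metadata.creationTimestamp)' cluster.yaml > cluster-clean.yaml
oc -n openshift-storage create -f cluster-clean.yaml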
Hi Parth,

Thanks for looking into the bug and testing the steps.

I have done another round of testing in a VMware OCP+LSO+ODF 4.12 environment and shared the results with the customer. The Ceph cluster was created successfully following the steps in https://www.rook.io/docs/rook/v1.14/Troubleshooting/disaster-recovery/#restoring-crds-after-deletion

I have shared the commands and steps with the customer and am waiting for them to execute them in their setup. For now, there are no actions for Support or Engineering until the customer executes the steps. I will update you if there is anything from the customer, thanks.

Regards,
Soumi
ODF CLI failed on ODF 4.15. RH Case 03957493