Description of problem (please be as detailed as possible and provide log snippets):

Following the docs to set DRCluster annotations:
https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.14/html/configuring_openshift_data_foundation_disaster_recovery_for_openshift_workloads/metro-dr-solution#add-fencing-annotations-to-drclusters_mdr

The instructions use a generic secret name for all DRClusters:

  drcluster.ramendr.openshift.io/storage-secret-name: rook-csi-rbd-provisioner

However, in my case this caused fencing of the primary cluster to fail immediately:

# oc describe NetworkFence -A --context dc2
Name:         network-fence-dc1
[...]
  Secret:
    Name:       rook-csi-rbd-provisioner
    Namespace:  openshift-storage
Status:
  Message:  rpc error: code = InvalidArgument desc = secrets "rook-csi-rbd-provisioner" not found

I could only get to a Fenced state by editing drcluster dc1 to use the dc2 secret name:

# oc --context dc1 get secret -A | grep rook-csi-rbd-provisioner
openshift-storage   rook-csi-rbd-provisioner-cluster1-rbdpool   Opaque   [...]

# oc --context dc2 get secret -A | grep rook-csi-rbd-provisioner
openshift-storage   rook-csi-rbd-provisioner-cluster2-rbdpool   Opaque   [...]

# oc edit drcluster dc1
## I configured the dc2 secret name:
  drcluster.ramendr.openshift.io/storage-secret-name: rook-csi-rbd-provisioner-cluster2-rbdpool

Note, I think this is because when I ran the ceph-external-cluster-details-exporter.py script to configure the connection to the external RHCS, I used "--cluster-name cluster[1|2] --restricted-auth-permission true" to differentiate between the two clusters, as mentioned in:
https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.14/html/deploying_openshift_data_foundation_in_external_mode/deploy-openshift-data-foundation-using-red-hat-ceph-storage#creating-an-openshift-data-foundation-cluster-service-for-external-storage_ceph-external

I did that because if I used "--run-as-user client.odf.cluster1" as some docs suggested, I got this error in the Ceph admin journal:

ceph-mon[16028]: cephx server client.odf.cluster1: couldn't find entity name: client.odf.cluster1

So if --cluster-name is a valid option when configuring an external cluster, it seems that the ramendr storage-secret-name should either be generic so all clusters can reference the same annotation, or, if that is not possible, the docs should mention that the annotation on the primary DRCluster should reference the secondary DRCluster's secret name (and vice versa?).

Version of all relevant components (if applicable):
OCP 4.14.6
4.14.3-rhodf
RHCS 6.1 on RHEL 9.2 nodes

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
The procedure to debug the fencing failure was not clear at first, but I have recovered.

Is there any workaround available to the best of your knowledge?
Yes, I worked around it by changing the annotation.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
4

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
No

Steps to Reproduce:
1. Create the external Ceph connections using --cluster-name
2. Follow the DR annotation docs
3. Fencing errors will occur

Actual results:
Fencing error due to the generic secret name

Expected results:
Clear docs
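For reference, the workaround expressed as annotate commands. This is only a sketch using the names from my setup: I have verified the dc1 change, but the mirrored dc2 annotation is my assumption based on the same reasoning and is unverified.

# Run against the cluster where the DRCluster resources are defined (the hub in my setup).
# dc1 is fenced from dc2, so dc1's annotation points at the secret that exists on dc2:
# oc annotate drcluster dc1 drcluster.ramendr.openshift.io/storage-secret-name=rook-csi-rbd-provisioner-cluster2-rbdpool --overwrite
## Presumably the reverse holds for dc2 (unverified assumption):
# oc annotate drcluster dc2 drcluster.ramendr.openshift.io/storage-secret-name=rook-csi-rbd-provisioner-cluster1-rbdpool --overwrite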
I ran into issues with my previous workaround when trying to fail over an application:

Warning  FailedMount  106s (x3 over 5m53s)  kubelet  Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition
Warning  FailedMount  61s (x4 over 7m7s)    kubelet  MountVolume.MountDevice failed for volume "pvc-672bac8d-e076-425f-83f6-ad763491ab17" : fetching NodeStageSecretRef openshift-storage/rook-csi-rbd-node-cluster1-rbdpool failed: kubernetes.io/csi: failed to find the secret rook-csi-rbd-node-cluster1-rbdpool in the namespace openshift-storage with error: secrets "rook-csi-rbd-node-cluster1-rbdpool" not found

I would like to confirm whether --cluster-name is incompatible with DR, or whether this secret name also needs adjustment.
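To illustrate the mismatch, one way to compare the node-stage secret the PV expects against what actually exists on the failover cluster (a sketch, not the exact commands I captured; the PV name is the one from the event above):

# On the cluster where the pod is being scheduled after failover:
# oc get pv pvc-672bac8d-e076-425f-83f6-ad763491ab17 -o jsonpath='{.spec.csi.nodeStageSecretRef}'
# oc -n openshift-storage get secret | grep rook-csi-rbd-node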
If I follow the current MetroDR docs exactly and use --run-as-user when running the ceph-external-cluster-details-exporter.py script, I reproduce this bug: https://bugzilla.redhat.com/show_bug.cgi?id=2254159
Just to note: I had installed the latest 4.14 ODF Multicluster & Hub operators when I hit this "drcluster.ramendr.openshift.io/storage-secret-name" secret issue.
Moving this bug out of 4.16 as we have not worked on the doc changes yet. TODO: update the MDR docs to illustrate which RBD provisioner secret names to use when annotating the DRClusters. The current doc only works if no cluster name is provided while configuring ODF with the external cluster.
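For the doc update, an abbreviated DRCluster fragment sketching what the annotation looks like when --cluster-name was used. The dc1 / rook-csi-rbd-provisioner-cluster2-rbdpool names are from the reporter's environment and depend on how the exporter script was invoked; the exact primary/secondary pairing still needs to be confirmed before it goes into the docs.

apiVersion: ramendr.openshift.io/v1alpha1
kind: DRCluster
metadata:
  name: dc1
  annotations:
    # Name of the RBD provisioner secret as it exists on the peer cluster that performs
    # the fencing (cluster2 in the reporter's setup), not the generic
    # "rook-csi-rbd-provisioner" name shown in the current doc.
    drcluster.ramendr.openshift.io/storage-secret-name: rook-csi-rbd-provisioner-cluster2-rbdpool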