Bug 2259033 - [MDR] DRCluster annotations doc needs to be more generic
Summary: [MDR] DRCluster annotations doc needs to be more generic
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.14
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Raghavendra Talur
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-01-18 18:36 UTC by Jenifer Abrams
Modified: 2024-05-06 20:08 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:



Description Jenifer Abrams 2024-01-18 18:36:52 UTC
Description of problem (please be as detailed as possible and provide log snippets):
Following the docs to set drcluster annotations:
https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.14/html/configuring_openshift_data_foundation_disaster_recovery_for_openshift_workloads/metro-dr-solution#add-fencing-annotations-to-drclusters_mdr

The instructions use a generic secret name for all drclusters:
drcluster.ramendr.openshift.io/storage-secret-name: rook-csi-rbd-provisioner
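For reference, with my DRCluster names (dc1/dc2) the documented step puts the same annotation value on both drclusters. If it was applied as documented, a quick check against the hub looks roughly like this (sketch only, same grep style as the secret checks below):

# oc get drcluster dc1 -o yaml | grep storage-secret-name
    drcluster.ramendr.openshift.io/storage-secret-name: rook-csi-rbd-provisioner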

However, this caused Fencing of the primary cluster to fail immediately in my case:
# oc describe NetworkFence -A --context dc2
Name:         network-fence-dc1
[...]
  Secret:
    Name:       rook-csi-rbd-provisioner
    Namespace:  openshift-storage
Status:
  Message:  rpc error: code = InvalidArgument desc = secrets "rook-csi-rbd-provisioner" not found


I could only get to a Fenced state by editing drcluster dc1 to use the dc2 secret name:

# oc --context dc1 get secret -A | grep rook-csi-rbd-provisioner
openshift-storage                                  rook-csi-rbd-provisioner-cluster1-rbdpool                                                        Opaque                                [...]
# oc --context dc2 get secret -A | grep rook-csi-rbd-provisioner
openshift-storage                                  rook-csi-rbd-provisioner-cluster2-rbdpool                                                        Opaque                                [...]

# oc edit drcluster dc1
## I configured dc2 secret name:
    drcluster.ramendr.openshift.io/storage-secret-name: rook-csi-rbd-provisioner-cluster2-rbdpool
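After that change the drcluster reached Fenced; re-running the same check as before (command repeated below) should show the NetworkFence picking up the existing secret, i.e. the Secret Name is now the cluster2 one rather than the generic name:

# oc describe NetworkFence -A --context dc2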

Note: I think this is because, when I ran the ceph-external-cluster-details-exporter.py script to configure the connection to the external RHCS, I used "--cluster-name cluster[1|2] --restricted-auth-permission true" to differentiate between the two clusters, as mentioned in:
https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.14/html/deploying_openshift_data_foundation_in_external_mode/deploy-openshift-data-foundation-using-red-hat-ceph-storage#creating-an-openshift-data-foundation-cluster-service-for-external-storage_ceph-external
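For context, the exporter run was along these lines (the pool name is only a placeholder here; the relevant parts are the last two flags):

# python3 ceph-external-cluster-details-exporter.py --rbd-data-pool-name <rbd-data-pool> --cluster-name cluster1 --restricted-auth-permission true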

I used --cluster-name because, when I tried "--run-as-user client.odf.cluster1" as some docs suggest, I got this error in the ceph admin journal:
ceph-mon[16028]: cephx server client.odf.cluster1: couldn't find entity name: client.odf.cluster1

So, if using --cluster-name is a valid option when configuring an external cluster, it seems the ramendr storage-secret-name should either be generic so that all clusters can reference the same annotation, or, if that is not possible, the docs should mention that the annotation on the primary drcluster has to reference the secondary drcluster's secret name (and vice versa).
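To make the second option concrete: with the cluster-specific secret names from my setup, the annotations would have to be crossed, i.e. something like the following (oc annotate here is just shorthand for the same edit shown above):

# oc annotate drcluster dc1 drcluster.ramendr.openshift.io/storage-secret-name=rook-csi-rbd-provisioner-cluster2-rbdpool --overwrite
# oc annotate drcluster dc2 drcluster.ramendr.openshift.io/storage-secret-name=rook-csi-rbd-provisioner-cluster1-rbdpool --overwrite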

Version of all relevant components (if applicable):
OCP 4.14.6
4.14.3-rhodf
RHCS 6.1 on RHEL 9.2 nodes

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
The procedure to debug the Fencing failure was not clear at first, but I have recovered.

Is there any workaround available to the best of your knowledge?
Yes, I worked around it by changing the annotation.

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
No

Steps to Reproduce:
1. Create the external Ceph connections using --cluster-name
2. Follow the DR annotation docs
3. Fencing errors occur


Actual results:
Fencing fails because the documented generic secret name does not exist on the cluster


Expected results:
Docs that clearly state which secret name(s) to use in the drcluster annotation

Comment 2 Jenifer Abrams 2024-01-18 21:04:12 UTC
I ran into issues with my previous workaround when trying to fail over an application:

  Warning  FailedMount  106s (x3 over 5m53s)  kubelet            Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition
  Warning  FailedMount  61s (x4 over 7m7s)    kubelet            MountVolume.MountDevice failed for volume "pvc-672bac8d-e076-425f-83f6-ad763491ab17" : fetching NodeStageSecretRef openshift-storage/rook-csi-rbd-node-cluster1-rbdpool failed: kubernetes.io/csi: failed to find the secret rook-csi-rbd-node-cluster1-rbdpool in the namespace openshift-storage with error: secrets "rook-csi-rbd-node-cluster1-rbdpool" not found


I would like to confirm whether --cluster-name is incompatible with DR, or whether this secret name needs adjustment.
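To narrow that down, the same kind of check as for the provisioner secrets shows which node-stage secret names actually exist on each cluster, e.g.:

# oc --context dc1 get secret -n openshift-storage | grep rook-csi-rbd-node
# oc --context dc2 get secret -n openshift-storage | grep rook-csi-rbd-node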

Comment 3 Jenifer Abrams 2024-01-19 20:58:52 UTC
If I follow the current MetroDR docs exactly and use --run-as-user when running the ceph-external-cluster-details-exporter.py script, I reproduce this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=2254159

Comment 5 Jenifer Abrams 2024-01-22 17:21:58 UTC
Just to note: I had installed the latest 4.14 ODF Multicluster & Hub operators when I hit this "drcluster.ramendr.openshift.io/storage-secret-name" secret issue.

Comment 6 Raghavendra Talur 2024-05-06 20:08:32 UTC
Moving this bug out of 4.16 as we have not worked on the doc changes yet. 

TODO: update the MDR docs to illustrate which names of the rbd provisioner secret to use to annotate the drcluster. The current doc only works if no cluster name is provided while configuring ODF with the external cluster.
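For illustration only (names taken from the reporter's environment), the doc example would need to cover both cases:

## external cluster configured without --cluster-name (current doc, same annotation on all drclusters):
    drcluster.ramendr.openshift.io/storage-secret-name: rook-csi-rbd-provisioner

## external cluster configured with --cluster-name cluster1/cluster2 (annotate each drcluster with the peer cluster's secret):
    drcluster.ramendr.openshift.io/storage-secret-name: rook-csi-rbd-provisioner-cluster2-rbdpool   ## on drcluster dc1
    drcluster.ramendr.openshift.io/storage-secret-name: rook-csi-rbd-provisioner-cluster1-rbdpool   ## on drcluster dc2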

