Description of problem (please be as detailed as possible and provide log snippets):

When MCO and ODF-DR are installed using the UI method, the managed clusters fail to upload cluster data to the s3 store. On inspection, the DR operator on the managed clusters is unable to upload data to the s3 store due to a login error. This is because, when the MCO adds the s3 profiles to the ramen hub configmap, it sets the namespace of the secret to "openshift-operators", but Ramen creates the secret in the "openshift-dr-system" namespace on the managed clusters. We need to decide in general what our preferred namespace is: are we migrating to "openshift-operators", or do we still want it to be "openshift-dr-system"?

Version of all relevant components (if applicable): 4.11

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, PV data cannot be backed up to the s3 store and failover cannot be performed.

Is there any workaround available to the best of your knowledge?
Yes, copy the secrets found in the "openshift-dr-system" namespace to the "openshift-operators" namespace on the managed clusters (see the sketch after this description).

Is this issue reproducible? 100%

Can this issue be reproduced from the UI? Yes

If this is a regression, please provide more details to justify this: Not a regression
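For illustration, a minimal sketch of the workaround on a managed cluster. Assumptions: the secrets only need to exist under the same names in "openshift-operators" for the dr-cluster operator to find them, <secret-name> is a placeholder for the per-environment secret names, and jq is used purely for convenience to strip server-set metadata before re-creating the object:

$ oc get secrets -n openshift-dr-system                      # list the s3 secrets Ramen created
$ oc get secret <secret-name> -n openshift-dr-system -o json \
    | jq 'del(.metadata.resourceVersion, .metadata.uid, .metadata.creationTimestamp, .metadata.ownerReferences)
          | .metadata.namespace = "openshift-operators"' \
    | oc create -f -                                         # repeat for each secret listed above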
This is intended behavior on the part of MCO since 4.11 (https://github.com/red-hat-storage/odf-multicluster-orchestrator/pull/83). We have added Ramen as a dependency and it is installed in the same namespace as MCO. Only recently did we remove the hardcoded 'openshift-dr-system' namespace (https://github.com/red-hat-storage/odf-multicluster-orchestrator/pull/113). IMO we should ideally not hardcode the namespace and should instead use the namespace where the operators are installed.
S3 store secrets should be in the same namespace as the Ramen operator. This reduces the RBAC required to read secrets outside the operator's own namespace, so in this case the S3 secrets should have been present in the "openshift-dr-system" namespace on the managed clusters. If desired, the ramen config can be edited to change the namespace used on the managed clusters by specifying it here [1]; a sketch of that edit follows.

[1] namespace to install dr-cluster components to: https://github.com/RamenDR/ramen/blob/7dca77a0ce4543f10babebb3a7aa4bf63a92e636/api/v1alpha1/ramenconfig_types.go#L123-L124
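A minimal sketch of where that setting lives, assuming the hub-side config is the ramen-hub-operator-config ConfigMap in openshift-operators as shown in the verification output later in this bug (the linked field maps to drClusterOperator.namespaceName in that dump):

$ oc edit configmap ramen-hub-operator-config -n openshift-operators
# In ramen_manager_config.yaml, the relevant field is:
#   drClusterOperator:
#     ...
#     namespaceName: openshift-dr-system   # namespace to install dr-cluster components to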
The required changes for this from the Ramen side have been merged; this is the PR: https://github.com/RamenDR/ramen/pull/487
Tested with 3 OCP clusters, say hub, c1, and c2.

Version:
OCP: 4.11.0-0.nightly-2022-07-29-173905
ODF: 4.11.0-129
CEPH: 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)
ACM: 2.5.1

Following doc [1], created an MDR stretched cluster. Did not see any issue with failing to upload the cluster data to the s3 store, i.e. the secrets were copied to the openshift-dr-system namespace on the managed clusters. Was able to deploy an application on the c1 cluster and continue with failover without any issue.

Snippet output:

From c1:

$ oc get obc -A
NAMESPACE           NAME                     STORAGE-CLASS                 PHASE   AGE
openshift-storage   odrbucket-3c4fbcc69dbd   openshift-storage.noobaa.io   Bound   27h

$ oc get secrets -n openshift-dr-system | grep -i opaque
0885841697f57a5133d400f8654818d7ec659ee   Opaque   2   27h
7e3d79a2e3deda51d9dd3fc966559d56f29de8e   Opaque   2   27h

$ oc get csv,pod -n openshift-dr-system
NAME                                                                      DISPLAY                         VERSION   REPLACES   PHASE
clusterserviceversion.operators.coreos.com/odr-cluster-operator.v4.11.0   Openshift DR Cluster Operator   4.11.0               Succeeded

NAME                                             READY   STATUS    RESTARTS   AGE
pod/ramen-dr-cluster-operator-5cf4b84f44-6gmxb   2/2     Running   0          6m17s

$ oc get pvc,pods -n busybox-rbd
NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
persistentvolumeclaim/busybox-rbd-pvc   Bound    pvc-5f5a8f41-2d88-4133-95a4-be191611aa00   5Gi        RWO            ocs-external-storagecluster-ceph-rbd   111s

NAME                               READY   STATUS    RESTARTS   AGE
pod/busybox-rbd-67dff9bc87-lzh2x   1/1     Running   0          111s

$ oc get vrg -n busybox-cephfs
NAME                              DESIREDSTATE   CURRENTSTATE
busybox-cephfs-placement-1-drpc   primary        Primary

$ oc get vrg -n busybox-rbd
NAME                           DESIREDSTATE   CURRENTSTATE
busybox-rbd-placement-1-drpc   primary        Primary

$ oc get pvc,pods -n busybox-cephfs
NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                         AGE
persistentvolumeclaim/busybox-cephfs-pvc   Bound    pvc-5d722a7a-d00e-40eb-813e-c801f596be1e   5Gi        RWO            ocs-external-storagecluster-cephfs   24s

NAME                                  READY   STATUS    RESTARTS   AGE
pod/busybox-cephfs-76d7f55cb7-7vjwf   1/1     Running   0          24s

From c2:

$ oc get obc -A
NAMESPACE           NAME                     STORAGE-CLASS                 PHASE   AGE
openshift-storage   odrbucket-3c4fbcc69dbd   openshift-storage.noobaa.io   Bound   27h

$ oc get secrets -n openshift-dr-system | grep -i opaque
0885841697f57a5133d400f8654818d7ec659ee   Opaque   2   27h
7e3d79a2e3deda51d9dd3fc966559d56f29de8e   Opaque   2   27h

$ oc get csv,pod -n openshift-dr-system
NAME                                                                      DISPLAY                         VERSION   REPLACES   PHASE
clusterserviceversion.operators.coreos.com/odr-cluster-operator.v4.11.0   Openshift DR Cluster Operator   4.11.0               Succeeded

NAME                                             READY   STATUS    RESTARTS   AGE
pod/ramen-dr-cluster-operator-5cf4b84f44-bhm2b   2/2     Running   0          6m27s

From hub:

$ oc get secrets -n openshift-operators | grep -i opaque
0885841697f57a5133d400f8654818d7ec659ee   Opaque   2   27h
7e3d79a2e3deda51d9dd3fc966559d56f29de8e   Opaque   2   27h

$ oc get cm -n openshift-operators ramen-hub-operator-config -oyaml
apiVersion: v1
data:
  ramen_manager_config.yaml: |
    apiVersion: ramendr.openshift.io/v1alpha1
    drClusterOperator:
      catalogSourceName: redhat-operators
      catalogSourceNamespaceName: openshift-marketplace
      channelName: stable-4.11
      clusterServiceVersionName: odr-cluster-operator.v4.11.0
      deploymentAutomationEnabled: true
      namespaceName: openshift-dr-system
      packageName: odr-cluster-operator
      s3SecretDistributionEnabled: true
    health:
      healthProbeBindAddress: :8081
    kind: RamenConfig
    leaderElection:
      leaderElect: true
      leaseDuration: 0s
      renewDeadline: 0s
      resourceLock: ""
      resourceName: hub.ramendr.openshift.io
      resourceNamespace: ""
      retryPeriod: 0s
    metrics:
      bindAddress: 127.0.0.1:9289
    ramenControllerType: dr-hub
    s3StoreProfiles:
    - s3Bucket: odrbucket-3c4fbcc69dbd
      s3CompatibleEndpoint: https://s3-openshift-storage.apps.akrai-j31-c2.qe.rh-ocs.com
      s3ProfileName: s3profile-akrai-j31-c2-ocs-external-storagecluster
      s3Region: noobaa
      s3SecretRef:
        name: 0885841697f57a5133d400f8654818d7ec659ee
    - s3Bucket: odrbucket-3c4fbcc69dbd
      s3CompatibleEndpoint: https://s3-openshift-storage.apps.akrai-j31-c1.qe.rh-ocs.com
      s3ProfileName: s3profile-akrai-j31-c1-ocs-external-storagecluster
      s3Region: noobaa
      s3SecretRef:
        name: 7e3d79a2e3deda51d9dd3fc966559d56f29de8e
    volSync: {}
    webhook:
      port: 9443
kind: ConfigMap
metadata:
  creationTimestamp: "2022-07-31T07:11:04Z"
  labels:
    operators.coreos.com/odr-hub-operator.openshift-operators: ""
  name: ramen-hub-operator-config
  namespace: openshift-operators
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: ClusterServiceVersion
    name: odr-hub-operator.v4.11.0
    uid: da878392-8a04-4992-9b49-fcd4199317e3
  resourceVersion: "329115"
  uid: 4d4a644d-0bcf-4c99-b2f8-551440bdefb9

Logs collected here: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz2102506/
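In addition to the resource checks above, one way to confirm the dr-cluster operator on each managed cluster is no longer hitting the s3 login error is to scan its logs. This is only a sketch: the deployment name ramen-dr-cluster-operator is inferred from the pod names above, and the grep pattern is an arbitrary filter, not an exact log message:

$ oc logs -n openshift-dr-system deployment/ramen-dr-cluster-operator --all-containers | grep -iE 's3|error'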
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6156