Description of problem (please be as detailed as possible and provide log snippets):

Ramen-dr-cluster-operator is not deployed on the managed clusters after applying the drpolicy, and the drpolicy status shows the managed clusters as DRClustersUnavailable. Only the VolSync CSV is deployed on the managed clusters.

hub:

# oc get drpolicy ocsm4205001-ocpm4202001 -oyaml
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPolicy
metadata:
  creationTimestamp: "2023-09-13T08:29:51Z"
  generation: 1
  name: ocsm4205001-ocpm4202001
  resourceVersion: "851658"
  uid: 91c92432-0e73-4e5f-a962-aa324487dab0
spec:
  drClusters:
  - ocsm4205001
  - ocpm4202001
  schedulingInterval: 0m
status:
  conditions:
  - lastTransitionTime: "2023-09-13T08:35:20Z"
    message: none of the DRClusters are validated ([ocsm4205001 ocpm4202001])
    observedGeneration: 1
    reason: DRClustersUnavailable
    status: "False"
    type: Validated

[root@a3e25001 ~]# oc get drclusters -A
NAME          AGE
ocpm4202001   101m
ocsm4205001   101m

[root@a3e25001 ~]# oc describe drclusters ocsm4205001
Name:         ocsm4205001
Namespace:
Labels:       cluster.open-cluster-management.io/backup=resource
Annotations:  <none>
API Version:  ramendr.openshift.io/v1alpha1
Kind:         DRCluster
Metadata:
  Creation Timestamp:  2023-09-13T08:35:20Z
  Finalizers:
    drclusters.ramendr.openshift.io/ramen
  Generation:        1
  Resource Version:  851682
  UID:               f5213978-ead9-4f9f-9c36-0fd50b511225
Spec:
  Region:          778d5284-ddf7-11ed-a790-525400c41d12
  s3ProfileName:   s3profile-ocsm4205001-ocs-external-storagecluster
Status:
  Conditions:
    Last Transition Time:  2023-09-13T08:35:20Z
    Message:               Cluster Clean
    Observed Generation:   1
    Reason:                Clean
    Status:                False
    Type:                  Fenced
    Last Transition Time:  2023-09-13T08:35:20Z
    Message:               Cluster Clean
    Observed Generation:   1
    Reason:                Clean
    Status:                True
    Type:                  Clean
    Last Transition Time:  2023-09-13T08:35:21Z
    Message:               DRCluster ManifestWork is not in applied state
    Observed Generation:   1
    Reason:                DrClustersDeployStatusCheckFailed
    Status:                False
    Type:                  Validated
  Phase:                   Available
Events:                    <none>

[root@a3e25001 ~]# oc describe drclusters ocpm4202001
Name:         ocpm4202001
Namespace:
Labels:       cluster.open-cluster-management.io/backup=resource
Annotations:  <none>
API Version:  ramendr.openshift.io/v1alpha1
Kind:         DRCluster
Metadata:
  Creation Timestamp:  2023-09-13T08:35:20Z
  Finalizers:
    drclusters.ramendr.openshift.io/ramen
  Generation:        1
  Resource Version:  851719
  UID:               be820d41-47a7-462b-bb0d-070934778443
Spec:
  Region:          778d5284-ddf7-11ed-a790-525400c41d12
  s3ProfileName:   s3profile-ocpm4202001-ocs-external-storagecluster
Status:
  Conditions:
    Last Transition Time:  2023-09-13T08:35:21Z
    Message:               Cluster Clean
    Observed Generation:   1
    Reason:                Clean
    Status:                False
    Type:                  Fenced
    Last Transition Time:  2023-09-13T08:35:21Z
    Message:               Cluster Clean
    Observed Generation:   1
    Reason:                Clean
    Status:                True
    Type:                  Clean
    Last Transition Time:  2023-09-13T08:35:22Z
    Message:               DRCluster ManifestWork is not in applied state
    Observed Generation:   1
    Reason:                DrClustersDeployStatusCheckFailed
    Status:                False
    Type:                  Validated
  Phase:                   Available
Events:                    <none>

mc1:

[root@m4205001 ~]# oc get csv,pod -n openshift-dr-system
NAME                                                                 DISPLAY   VERSION   REPLACES                 PHASE
clusterserviceversion.operators.coreos.com/volsync-product.v0.7.4   VolSync   0.7.4     volsync-product.v0.7.3   Succeeded

mc2:

[root@m4202001 ~]# oc get csv,pod -n openshift-dr-system
NAME                                                                 DISPLAY   VERSION   REPLACES                 PHASE
clusterserviceversion.operators.coreos.com/volsync-product.v0.7.4   VolSync   0.7.4     volsync-product.v0.7.3   Succeeded
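Given the DrClustersDeployStatusCheckFailed / "DRCluster ManifestWork is not in applied state" condition above, a possible next step (a debugging sketch, not part of the original report) is to inspect the ManifestWorks the hub delivers to each managed cluster's namespace; <manifestwork_name> below is a placeholder for whichever ramen/DR-related entry appears in the listing, since the exact name is not captured above:

# On the hub: list the ManifestWorks in each managed cluster's namespace
oc get manifestwork -n ocsm4205001
oc get manifestwork -n ocpm4202001

# Then check whether the suspect ManifestWork ever reached the Applied/Available state
oc get manifestwork <manifestwork_name> -n ocsm4205001 -o jsonpath='{.status.conditions}{"\n"}'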
Version of all relevant components (if applicable):
odf-multicluster-orchestrator: v4.14.0-132.stable
odr-hub-operator: v4.14.0-132.stable
volsync-product: v0.7.4
odf-operator: v4.14.0-132.stable

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, MDR 4.14 testing is blocked.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
Yes, reproduced twice consistently.

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Configure the Metro DR environment (deploy RHCeph Storage, install ODF in external mode on the managed clusters, install ACM and MCO on the hub cluster).
2. Configure SSL access across the clusters.
3. Import the managed clusters on the hub cluster.
4. Create a DRPolicy on the hub cluster: All Clusters → Data Services → Data policies → Create DRPolicy.
5. Verify that the DRPolicy is created successfully (see also the condition checks after the "Additional info" link below):
   # oc get drpolicy <drpolicy_name> -o jsonpath='{.status.conditions[].reason}{"\n"}'
6. Verify that the OpenShift DR Cluster operator installation was successful on the Primary managed cluster and the Secondary managed cluster:
   # oc get csv,pod -n openshift-dr-system

Actual results:
Ramen-dr-cluster-operator is not deployed on the managed clusters after applying the drpolicy.

Expected results:
Ramen-dr-cluster-operator should be deployed on the managed clusters after applying the drpolicy.

Additional info:
https://drive.google.com/file/d/1xOlrW211_hrAKQO5b2H6BcGQ168H8-Re/view?usp=sharing
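For step 5 above, a couple of follow-up checks that show the full validation state rather than just the condition reason (a sketch using the DRPolicy/DRCluster names from this report; on a healthy setup the Validated condition is expected to report Succeeded):

# Full Validated condition on the DRPolicy (hub cluster)
oc get drpolicy ocsm4205001-ocpm4202001 -o jsonpath='{.status.conditions[?(@.type=="Validated")]}{"\n"}'

# Validated condition message for every DRCluster (hub cluster)
oc get drcluster -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.conditions[?(@.type=="Validated")].message}{"\n"}{end}'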
Created attachment 1988631 [details] ramen-hub-operator.log
Fixed in ACM 2.9 internal releases later than acm-operator-bundle-container-v2.9.0-150.
With the latest acm-operator-bundle-container-v2.9.0-150 I do not see this anymore, and the ramen-dr-cluster-operator is deployed on the managed clusters after applying the drpolicy.

# oc get sa -n open-cluster-management-agent
NAME                 SECRETS   AGE
builder              1         16h
default              1         16h
deployer             1         16h
klusterlet           1         16h
klusterlet-work-sa   1         16h
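Since ManifestWorks are applied on the managed cluster by the klusterlet work agent (which uses the klusterlet-work-sa service account listed above), a quick sanity check on each managed cluster (a sketch, assuming the default Klusterlet resource name "klusterlet") is:

# Confirm the klusterlet agents that apply ManifestWorks are running
oc get pods -n open-cluster-management-agent

# Inspect the klusterlet conditions for any degraded/unavailable state
oc get klusterlet klusterlet -o jsonpath='{.status.conditions}{"\n"}'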
Observation:
---------------
With the latest acm-operator-bundle-container-v2.9.0-165, ramen-dr-cluster-operator is deployed on the managed clusters after deleting and re-applying the drpolicy. Also, ramen-hub-operator restarted on the hub and ramen-dr-cluster-operator restarted on the primary managed cluster. No ramen-dr-cluster-operator restarts were seen on the secondary cluster.

Steps Performed:
---------------------
1. With the existing Metro DR environment, deleted all apps (Subscription + ApplicationSet) and the drpolicy.
2. Created a new drpolicy with the same name.
3. Created a Subscription-based app.
4. Applied the drpolicy to the app.

hub% oc get pods -n openshift-operators
NAME                                        READY   STATUS    RESTARTS      AGE
odf-multicluster-console-854b88488b-q6tpn   1/1     Running   0             4d21h
odfmo-controller-manager-8585fbddb8-jctpj   1/1     Running   2 (45h ago)   4d21h
ramen-hub-operator-7dc77db778-szcw4         2/2     Running   1 (45h ago)   4d21h

clust1% oc get csv,pod -n openshift-dr-system
NAME                                                                                  DISPLAY                         VERSION            REPLACES                                  PHASE
clusterserviceversion.operators.coreos.com/odr-cluster-operator.v4.14.0-146.stable   Openshift DR Cluster Operator   4.14.0-146.stable   odr-cluster-operator.v4.14.0-139.stable   Succeeded
clusterserviceversion.operators.coreos.com/volsync-product.v0.7.4                    VolSync                         0.7.4               volsync-product.v0.7.3                    Succeeded

NAME                                            READY   STATUS    RESTARTS        AGE
pod/ramen-dr-cluster-operator-9c78ffc78-xqz9m   2/2     Running   1 (2d20h ago)   2d20h
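To capture why the operators restarted (a debugging sketch; <pod_name> is a placeholder taken from the listings above, and the container name "manager" is an assumption that may need adjusting for the actual pod spec):

# Restart reason/exit code recorded in the pod status
oc describe pod <pod_name> -n openshift-dr-system

# Logs from the previous (restarted) container instance
oc logs <pod_name> -n openshift-dr-system -c manager --previous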
Raised a new Bugzilla for the issue mentioned in comment 12:
https://bugzilla.redhat.com/show_bug.cgi?id=2245230

Hence marking this BZ as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6832