Description of problem (please be as detailed as possible and provide log snippets):
The active hub was located at a neutral site.

Version of all relevant components (if applicable):
OCP 4.14.0-0.nightly-2023-11-06-203803
advanced-cluster-management.v2.9.0-204
ACM 2.9.0-DOWNSTREAM-2023-11-03-14-27-40
Submariner brew.registry.redhat.io/rh-osbs/iib:615928
ODF 4.14.0-161
ceph version 17.2.6-148.el9cp (badc1d27cb07762bea48f6554ad4f92b9d3fbb6b) quincy (stable)
Latency 50ms RTT

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. On an RDR setup, configure hub recovery with some DR-protected workloads running on it.
2. Do all pre-checks, such as drpolicy status, sync status, volumereplicationclass, ceph health, mirror status, lastGroupSyncTime, managedclusters -o wide status, alerts, odf pods, drpc yaml, drpc -o wide, etc.
3. After the latest backup is taken, bring the active hub down.
4. Restore the backup on the passive hub and ensure both managed clusters are successfully imported.
7. Wait for the DRPolicy to get validated. Check the outputs of
   oc get managedcluster -o wide -A
   oc get drcluster -o yaml
   oc get secrets -n openshift-operators
   and note the time it takes for the DRPolicy to get validated (in most cases it takes 15-20 minutes, which is too long).
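The wait in step 7 can be timed instead of estimated by eye. Below is a minimal Python sketch, assuming a logged-in `oc` on the passive hub and that validation is reported through a `Validated` condition with status "True", as in the DRPolicy and DRCluster output in this report. The function names, poll interval, and timeout are illustrative and not part of Ramen or ODF.

```python
import json
import subprocess
import time


def is_validated(resource: dict) -> bool:
    """Return True if the resource has a Validated condition with status "True"."""
    conditions = resource.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Validated" and c.get("status") == "True"
        for c in conditions
    )


def unvalidated_names(resource_list: dict) -> list:
    """Names of items in an `oc get ... -o json` list that are not yet validated."""
    return [
        item["metadata"]["name"]
        for item in resource_list.get("items", [])
        if not is_validated(item)
    ]


def fetch(kind: str) -> dict:
    """Run `oc get <kind> -o json` and parse the output (needs a logged-in oc)."""
    out = subprocess.run(
        ["oc", "get", kind, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def time_until_validated(kind: str = "drpolicy", poll_seconds: float = 15.0,
                         timeout_seconds: float = 3600.0) -> float:
    """Poll until every item of `kind` is validated; return elapsed seconds.

    An empty list counts as "nothing pending" and returns immediately.
    """
    start = time.monotonic()
    while True:
        pending = unvalidated_names(fetch(kind))
        elapsed = time.monotonic() - start
        if not pending:
            return elapsed
        if elapsed > timeout_seconds:
            raise TimeoutError(f"{kind} still not validated: {pending}")
        print(f"{elapsed:7.0f}s  waiting on {pending}")
        time.sleep(poll_seconds)
```

Calling `time_until_validated("drcluster")` and then `time_until_validated("drpolicy")` right after the restore gives one number per resource kind, measured from the moment the script starts rather than from the restore itself.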
Actual results:
After the secrets are created, the DRCluster goes into exponential backoff and takes too long to get validated.

amagrawa:~$ drpolicy
apiVersion: v1
items:
- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRPolicy
  metadata:
    creationTimestamp: "2023-11-07T19:31:28Z"
    finalizers:
    - drpolicies.ramendr.openshift.io/ramen
    generation: 1
    labels:
      cluster.open-cluster-management.io/backup: resource
      velero.io/backup-name: acm-resources-generic-schedule-20231107190047
      velero.io/restore-name: restore-acm-acm-resources-generic-schedule-20231107190047
    name: my-drpolicy-10
    resourceVersion: "420339"
    uid: 11516fd4-d3c1-4b01-b460-76d13be479a3
  spec:
    drClusters:
    - amagrawa-m1-7nov
    - amagrawa-m2-7nov
    replicationClassSelector: {}
    schedulingInterval: 10m
    volumeSnapshotClassSelector: {}
  status:
    conditions:
    - lastTransitionTime: "2023-11-07T19:31:28Z"
      message: none of the DRClusters are validated ([amagrawa-m1-7nov amagrawa-m2-7nov])
      observedGeneration: 1
      reason: DRClustersUnavailable
      status: "False"
      type: Validated
- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRPolicy
  metadata:
    creationTimestamp: "2023-11-07T19:31:28Z"
    finalizers:
    - drpolicies.ramendr.openshift.io/ramen
    generation: 1
    labels:
      cluster.open-cluster-management.io/backup: resource
      velero.io/backup-name: acm-resources-generic-schedule-20231107190047
      velero.io/restore-name: restore-acm-acm-resources-generic-schedule-20231107190047
    name: my-drpolicy-15
    resourceVersion: "420371"
    uid: d6eb4cea-69ba-408e-a1c3-c4a5df83506b
  spec:
    drClusters:
    - amagrawa-m2-7nov
    - amagrawa-m1-7nov
    replicationClassSelector: {}
    schedulingInterval: 15m
    volumeSnapshotClassSelector: {}
  status:
    conditions:
    - lastTransitionTime: "2023-11-07T19:31:29Z"
      message: none of the DRClusters are validated ([amagrawa-m2-7nov amagrawa-m1-7nov])
      observedGeneration: 1
      reason: DRClustersUnavailable
      status: "False"
      type: Validated
- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRPolicy
  metadata:
    creationTimestamp: "2023-11-07T19:31:28Z"
    finalizers:
    - drpolicies.ramendr.openshift.io/ramen
    generation: 1
    labels:
      cluster.open-cluster-management.io/backup: resource
      velero.io/backup-name: acm-resources-generic-schedule-20231107190047
      velero.io/restore-name: restore-acm-acm-resources-generic-schedule-20231107190047
    name: my-drpolicy-5
    resourceVersion: "420387"
    uid: 583e9ae0-7b30-4fa9-a732-e75f18d30df8
  spec:
    drClusters:
    - amagrawa-m1-7nov
    - amagrawa-m2-7nov
    replicationClassSelector: {}
    schedulingInterval: 5m
    volumeSnapshotClassSelector: {}
  status:
    conditions:
    - lastTransitionTime: "2023-11-07T19:31:29Z"
      message: none of the DRClusters are validated ([amagrawa-m1-7nov amagrawa-m2-7nov])
      observedGeneration: 1
      reason: DRClustersUnavailable
      status: "False"
      type: Validated
kind: List
metadata:
  resourceVersion: ""

amagrawa:~$ oc get managedcluster -o wide -A
NAME               HUB ACCEPTED   MANAGED CLUSTER URLS                               JOINED   AVAILABLE   AGE
amagrawa-m1-7nov   true           https://api.amagrawa-m1-7nov.qe.rh-ocs.com:6443    True     True        15m
amagrawa-m2-7nov   true           https://api.amagrawa-m2-7nov.qe.rh-ocs.com:6443    True     True        15m
local-cluster      true           https://api.amagrawa-hub2-7no.qe.rh-ocs.com:6443   True     True        3h56m

amagrawa:~$ drpolicy
apiVersion: v1
items:
- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRPolicy
  metadata:
    creationTimestamp: "2023-11-07T19:31:28Z"
    finalizers:
    - drpolicies.ramendr.openshift.io/ramen
    generation: 1
    labels:
      cluster.open-cluster-management.io/backup: resource
      velero.io/backup-name: acm-resources-generic-schedule-20231107190047
      velero.io/restore-name: restore-acm-acm-resources-generic-schedule-20231107190047
    name: my-drpolicy-10
    resourceVersion: "420339"
    uid: 11516fd4-d3c1-4b01-b460-76d13be479a3
  spec:
    drClusters:
    - amagrawa-m1-7nov
    - amagrawa-m2-7nov
    replicationClassSelector: {}
    schedulingInterval: 10m
    volumeSnapshotClassSelector: {}
  status:
    conditions:
    - lastTransitionTime: "2023-11-07T19:31:28Z"
      message: none of the DRClusters are validated ([amagrawa-m1-7nov amagrawa-m2-7nov])
      observedGeneration: 1
      reason: DRClustersUnavailable
      status: "False"
      type: Validated
- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRPolicy
  metadata:
    creationTimestamp: "2023-11-07T19:31:28Z"
    finalizers:
    - drpolicies.ramendr.openshift.io/ramen
    generation: 1
    labels:
      cluster.open-cluster-management.io/backup: resource
      velero.io/backup-name: acm-resources-generic-schedule-20231107190047
      velero.io/restore-name: restore-acm-acm-resources-generic-schedule-20231107190047
    name: my-drpolicy-15
    resourceVersion: "420371"
    uid: d6eb4cea-69ba-408e-a1c3-c4a5df83506b
  spec:
    drClusters:
    - amagrawa-m2-7nov
    - amagrawa-m1-7nov
    replicationClassSelector: {}
    schedulingInterval: 15m
    volumeSnapshotClassSelector: {}
  status:
    conditions:
    - lastTransitionTime: "2023-11-07T19:31:29Z"
      message: none of the DRClusters are validated ([amagrawa-m2-7nov amagrawa-m1-7nov])
      observedGeneration: 1
      reason: DRClustersUnavailable
      status: "False"
      type: Validated
- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRPolicy
  metadata:
    creationTimestamp: "2023-11-07T19:31:28Z"
    finalizers:
    - drpolicies.ramendr.openshift.io/ramen
    generation: 1
    labels:
      cluster.open-cluster-management.io/backup: resource
      velero.io/backup-name: acm-resources-generic-schedule-20231107190047
      velero.io/restore-name: restore-acm-acm-resources-generic-schedule-20231107190047
    name: my-drpolicy-5
    resourceVersion: "420387"
    uid: 583e9ae0-7b30-4fa9-a732-e75f18d30df8
  spec:
    drClusters:
    - amagrawa-m1-7nov
    - amagrawa-m2-7nov
    replicationClassSelector: {}
    schedulingInterval: 5m
    volumeSnapshotClassSelector: {}
  status:
    conditions:
    - lastTransitionTime: "2023-11-07T19:31:29Z"
      message: none of the DRClusters are validated ([amagrawa-m1-7nov amagrawa-m2-7nov])
      observedGeneration: 1
      reason: DRClustersUnavailable
      status: "False"
      type: Validated
kind: List
metadata:
  resourceVersion: ""

amagrawa:~$ oc get secrets -n openshift-operators
NAME                                       TYPE                                  DATA   AGE
7202c2e08afe31dc279a9730d94413a6a112650    Opaque                                2      5m22s
79f592bb6c31b5aced91d62883e7a80b6f3661f    Opaque                                2      5m23s
builder-dockercfg-6xz9h                    kubernetes.io/dockercfg               1      14h
builder-token-tvvhz                        kubernetes.io/service-account-token   4      14h
default-dockercfg-grg24                    kubernetes.io/dockercfg               1      14h
default-token-bgnr5                        kubernetes.io/service-account-token   4      14h
deployer-dockercfg-zzmz4                   kubernetes.io/dockercfg               1      14h
deployer-token-qldjm                       kubernetes.io/service-account-token   4      14h
odf-multicluster-console-serving-cert      kubernetes.io/tls                     2      99m
odfmo-controller-manager-dockercfg-qmscg   kubernetes.io/dockercfg               1      99m
odfmo-controller-manager-service-cert      kubernetes.io/tls                     3      99m
odfmo-controller-manager-token-d5xtw       kubernetes.io/service-account-token   4      99m
ramen-hub-operator-dockercfg-995qn         kubernetes.io/dockercfg               1      99m
ramen-hub-operator-service-cert            kubernetes.io/tls                     3      99m
ramen-hub-operator-token-4pm8d             kubernetes.io/service-account-token   4      99m

amagrawa:~$ oc get drcluster -o yaml
apiVersion: v1
items:
- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRCluster
  metadata:
    creationTimestamp: "2023-11-07T19:31:27Z"
    finalizers:
    - drclusters.ramendr.openshift.io/ramen
    generation: 1
    labels:
      cluster.open-cluster-management.io/backup: resource
      velero.io/backup-name: acm-resources-generic-schedule-20231107190047
      velero.io/restore-name: restore-acm-acm-resources-generic-schedule-20231107190047
    name: amagrawa-m1-7nov
    resourceVersion: "420218"
    uid: ea93e3db-febb-4ecf-a2ad-ce16bdd4259b
  spec:
    region: 9d3d039e-0ea7-4a59-a90a-b2b0ca7a4ac1
    s3ProfileName: s3profile-amagrawa-m1-7nov-ocs-storagecluster
  status:
    conditions:
    - lastTransitionTime: "2023-11-07T19:31:28Z"
      message: 's3profile-amagrawa-m1-7nov-ocs-storagecluster: failed to get secret {79f592bb6c31b5aced91d62883e7a80b6f3661f } for caller drpolicy validation, failed to get secret {79f592bb6c31b5aced91d62883e7a80b6f3661f }, secrets "79f592bb6c31b5aced91d62883e7a80b6f3661f" not found'
      observedGeneration: 1
      reason: s3ConnectionFailed
      status: "False"
      type: Validated
    - lastTransitionTime: "2023-11-07T19:31:28Z"
      message: Cluster Clean
      observedGeneration: 1
      reason: Clean
      status: "False"
      type: Fenced
    - lastTransitionTime: "2023-11-07T19:31:28Z"
      message: Cluster Clean
      observedGeneration: 1
      reason: Clean
      status: "True"
      type: Clean
    phase: Available
- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRCluster
  metadata:
    creationTimestamp: "2023-11-07T19:31:27Z"
    finalizers:
    - drclusters.ramendr.openshift.io/ramen
    generation: 1
    labels:
      cluster.open-cluster-management.io/backup: resource
      velero.io/backup-name: acm-resources-generic-schedule-20231107190047
      velero.io/restore-name: restore-acm-acm-resources-generic-schedule-20231107190047
    name: amagrawa-m2-7nov
    resourceVersion: "420116"
    uid: 11aecd34-79d5-4ed1-8fe9-52564fa1623a
  spec:
    region: 9be5464c-cbbd-4dce-b3e6-ab4780d9aa0b
    s3ProfileName: s3profile-amagrawa-m2-7nov-ocs-storagecluster
  status:
    conditions:
    - lastTransitionTime: "2023-11-07T19:31:28Z"
      message: 's3profile-amagrawa-m2-7nov-ocs-storagecluster: failed to get secret {7202c2e08afe31dc279a9730d94413a6a112650 } for caller drpolicy validation, failed to get secret {7202c2e08afe31dc279a9730d94413a6a112650 }, secrets "7202c2e08afe31dc279a9730d94413a6a112650" not found'
      observedGeneration: 1
      reason: s3ConnectionFailed
      status: "False"
      type: Validated
    - lastTransitionTime: "2023-11-07T19:31:27Z"
      message: Cluster Clean
      observedGeneration: 1
      reason: Clean
      status: "False"
      type: Fenced
    - lastTransitionTime: "2023-11-07T19:31:27Z"
      message: Cluster Clean
      observedGeneration: 1
      reason: Clean
      status: "True"
      type: Clean
    phase: Available
kind: List
metadata:
  resourceVersion: ""

Expected results:
DRCluster validation shouldn't take more than a few minutes, which in turn validates the DRPolicy on the passive hub.

Additional info:
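To illustrate how an exponential backoff could stretch a short wait for the secret into a much longer one, here is a rough Python sketch. It assumes the DRCluster reconcile is requeued with a per-item delay that starts at 5 ms and doubles on every failure up to a 1000 s cap (the defaults of the client-go workqueue rate limiter), that every attempt fails instantly until the secret exists, and that nothing else triggers a reconcile in between. Whether Ramen actually uses these values is not confirmed by this report; the function names are illustrative.

```python
def backoff_delays(failures: int, base: float = 0.005, cap: float = 1000.0) -> list:
    """Delay in seconds before each retry: base * 2**n, capped at `cap`."""
    return [min(base * 2 ** n, cap) for n in range(failures)]


def next_retry_after(event_time: float, base: float = 0.005,
                     cap: float = 1000.0) -> float:
    """Time of the first scheduled retry at or after `event_time`.

    Both values are seconds since the first failed reconcile. Assumes each
    failed attempt takes no time, so retries happen at the running sum of
    the backoff delays.
    """
    elapsed = 0.0
    n = 0
    while True:
        elapsed += min(base * 2 ** n, cap)
        n += 1
        if elapsed >= event_time:
            return elapsed
```

Under these assumptions, a secret that shows up 330 seconds after the first failure would only be noticed at the retry scheduled about 655 seconds in, because the gap between consecutive retries has already grown to several minutes by then.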
Aman, can you share the ramen configmap from the managed clusters and hub?
Hi Nir, the must-gather logs can be found here: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-aman/08nov23/ I no longer have the setup to collect the specifics. Hope this helps.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:1383