Created attachment 1867510 [details]
Cluster pool UI

Description of problem:
Cluster pool credentials are not backed up.

Version-Release number of selected component (if applicable):
ACM 2.5.0-DOWNSTREAM-2022-03-17-03-36-41 (Final S4)

How reproducible:

Steps to Reproduce:
1. The primary hub has a cluster pool with running clusters.
2. Create a backup of the primary hub.
3. Shut down the primary hub and create a restore (activate) on the secondary hub.

Actual results:
Running clusters in the cluster pool hang while resuming because the credentials secret is not found.

Expected results:

Additional info:

oc get clusterpool -n default
NAME         SIZE   STANDBY   READY   BASEDOMAIN                      IMAGESET
az-pool-tn   2      2         0       az.dev06.red-chesterfield.com   img4.10.4-x86-64-appsub

oc get cd --all-namespaces
NAMESPACE                             NAME                                  INFRAID                       PLATFORM   REGION      VERSION   CLUSTERTYPE   PROVISIONSTATUS   POWERSTATE              AGE
az-pool-tn-7jrlm                      az-pool-tn-7jrlm                      az-pool-tn-7jrlm-97wmb        azure      centralus   4.10.4                  Provisioned       FailedToStartMachines   86m
az-pool-tn-nrj62                      az-pool-tn-nrj62                      az-pool-tn-nrj62-k8q4x        azure      centralus   4.10.4                  Provisioned                               86m
clc-test-aws-auto-sno-1647575763754   clc-test-aws-auto-sno-1647575763754   clc-test-aws-auto-sno-rxf2t   aws        us-east-2   4.9.9                   Provisioned       Running                 86m

oc get cd -n az-pool-tn-7jrlm az-pool-tn-7jrlm -oyaml
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  annotations:
    hive.openshift.io/cluster-pool-spec-hash: 23ab68cbc48e4895
    open-cluster-management.io/user-group: c3lzdGVtOnNlcnZpY2VhY2NvdW50cyxzeXN0ZW06c2VydmljZWFjY291bnRzOm9wZW4tY2x1c3Rlci1tYW5hZ2VtZW50LWJhY2t1cCxzeXN0ZW06YXV0aGVudGljYXRlZA==
    open-cluster-management.io/user-identity: c3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW4tY2x1c3Rlci1tYW5hZ2VtZW50LWJhY2t1cDp2ZWxlcm8=
  creationTimestamp: "2022-03-22T13:55:07Z"
  finalizers:
  - hive.openshift.io/deprovision
  generation: 1
  labels:
    hive.openshift.io/cluster-platform: azure
    hive.openshift.io/cluster-region: centralus
    hive.openshift.io/version-major: "4"
    hive.openshift.io/version-major-minor: "4.10"
    hive.openshift.io/version-major-minor-patch: 4.10.4
    velero.io/backup-name: acm-resources-schedule-20220322134022
    velero.io/restore-name: restore-acm-passive-sync-acm-resources-schedule-20220322134022
  name: az-pool-tn-7jrlm
  namespace: az-pool-tn-7jrlm
  resourceVersion: "23184933"
  uid: dffb7895-10bc-4fe8-b0d9-426ee364ef8a
spec:
  baseDomain: az.dev06.red-chesterfield.com
  clusterMetadata:
    adminKubeconfigSecretRef:
      name: az-pool-tn-7jrlm-0-47htn-admin-kubeconfig
    adminPasswordSecretRef:
      name: az-pool-tn-7jrlm-0-47htn-admin-password
    clusterID: 472dab2d-53bd-4106-84f8-1c3cc242dfc8
    infraID: az-pool-tn-7jrlm-97wmb
  clusterName: az-pool-tn-7jrlm
  clusterPoolRef:
    namespace: default
    poolName: az-pool-tn
  controlPlaneConfig:
    servingCertificates: {}
  installed: true
  platform:
    azure:
      baseDomainResourceGroupName: domain
      credentialsSecretRef:
        name: az-pool-tn-7jrlm-azure-creds
      region: centralus
  powerState: Running
  provisioning:
    imageSetRef:
      name: img4.10.4-x86-64-appsub
    installConfigSecretRef:
      name: az-pool-tn-7jrlm-install-config
  pullSecretRef:
    name: az-pool-tn-7jrlm-pull-secret
status:
  apiURL: https://api.az-pool-tn-7jrlm.az.dev06.red-chesterfield.com:6443
  conditions:
  - lastProbeTime: "2022-03-22T13:55:11Z"
    lastTransitionTime: "2022-03-22T13:55:11Z"
    message: Cluster is resuming or running, see Ready condition for details
    reason: ResumingOrRunning
    status: "False"
    type: Hibernating
  - lastProbeTime: "2022-03-22T13:55:11Z"
    lastTransitionTime: "2022-03-22T13:55:11Z"
    message: 'Failed to start machines: failed to fetch Azure credentials secret: Secret "az-pool-tn-7jrlm-azure-creds" not found'
    reason: FailedToStartMachines
    status: "False"
    type: Ready
  - lastProbeTime: "2022-03-22T13:55:10Z"
    lastTransitionTime: "2022-03-22T13:55:10Z"
    message: Control plane certificates are present
    reason: ControlPlaneCertificatesFound
    status: "False"
    type: ControlPlaneCertificateNotFound
  - lastProbeTime: "2022-03-22T13:55:14Z"
    lastTransitionTime: "2022-03-22T13:55:14Z"
    message: Cluster is provisioned
    reason: Provisioned
    status: "True"
    type: Provisioned
  - lastProbeTime: "2022-03-22T13:55:11Z"
    lastTransitionTime: "2022-03-22T13:55:11Z"
    message: no ClusterRelocates match
    reason: NoMatchingRelocates
    status: "False"
    type: RelocationFailed
  - lastProbeTime: "2022-03-22T13:55:14Z"
    lastTransitionTime: "2022-03-22T13:55:14Z"
    message: SyncSet apply is successful
    reason: SyncSetApplySuccess
    status: "False"
    type: SyncSetFailed
  - lastProbeTime: "2022-03-22T13:55:10Z"
    lastTransitionTime: "2022-03-22T13:55:10Z"
    message: cluster is reachable
    reason: ClusterReachable
    status: "False"
    type: Unreachable
  - lastProbeTime: "2022-03-22T13:55:08Z"
    lastTransitionTime: "2022-03-22T13:55:08Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: AWSPrivateLinkFailed
  - lastProbeTime: "2022-03-22T13:55:08Z"
    lastTransitionTime: "2022-03-22T13:55:08Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: AWSPrivateLinkReady
  - lastProbeTime: "2022-03-22T13:55:08Z"
    lastTransitionTime: "2022-03-22T13:55:08Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ActiveAPIURLOverride
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: AuthenticationFailure
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ClusterInstallCompleted
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ClusterInstallFailed
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ClusterInstallRequirementsMet
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ClusterInstallStopped
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: DNSNotReady
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: DeprovisionLaunchError
  - lastProbeTime: "2022-03-22T13:55:07Z"
    lastTransitionTime: "2022-03-22T13:55:07Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: IngressCertificateNotFound
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: InstallImagesNotResolved
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: InstallLaunchError
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: InstallerImageResolutionFailed
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ProvisionFailed
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ProvisionStopped
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: RequirementsMet
  installedTimestamp: "2022-03-22T13:55:07Z"
  powerState: FailedToStartMachines
  webConsoleURL: https://console-openshift-console.apps.az-pool-tn-7jrlm.az.dev06.red-chesterfield.com
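
For triage, the missing secret can be read straight out of the Ready condition in the ClusterDeployment status above. The following is only a minimal sketch: the helper name `missing_secret_from_conditions` is hypothetical, and the regular expression assumes the message keeps the `Secret "<name>" not found` wording seen in this report.

```python
import re
from typing import Optional

# Matches the wording seen in the Ready condition message in this report.
# Assumption: other releases may phrase the message differently.
_MISSING_SECRET = re.compile(r'Secret "([^"]+)" not found')


def missing_secret_from_conditions(conditions: list) -> Optional[str]:
    """Return the secret name the Ready condition reports as missing, if any."""
    for cond in conditions:
        if cond.get("type") == "Ready" and cond.get("status") == "False":
            match = _MISSING_SECRET.search(cond.get("message", ""))
            if match:
                return match.group(1)
    return None


# Two of the conditions from the ClusterDeployment status in this report.
conditions = [
    {
        "type": "Hibernating",
        "status": "False",
        "reason": "ResumingOrRunning",
        "message": "Cluster is resuming or running, see Ready condition for details",
    },
    {
        "type": "Ready",
        "status": "False",
        "reason": "FailedToStartMachines",
        "message": "Failed to start machines: failed to fetch Azure credentials "
                   'secret: Secret "az-pool-tn-7jrlm-azure-creds" not found',
    },
]

print(missing_secret_from_conditions(conditions))
```

The input is the `status.conditions` list parsed from `oc get cd ... -o json`; the sketch does not call the cluster itself.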
That's because the cluster pool secrets have no Hive label annotation. I expected to see the label annotation `hive.openshift.io/secret-type`, as with the cluster deployment secrets.

oc get secrets -n default az-pool-tn-azure-creds -o yaml
kind: Secret
metadata:
  creationTimestamp: "2022-03-18T03:50:39Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:osServicePrincipal.json: {}
      f:type: {}
    manager: unknown
    operation: Update
    time: "2022-03-18T03:50:39Z"
  name: az-pool-tn-azure-creds
  namespace: default
  resourceVersion: "135499276"
  uid: 50646ad1-7e30-46fb-8a6a-0910dba2997a
type: Opaque

The proper fix here is for the team that creates the cluster pools to annotate the secrets that the cluster pool uses and that must be backed up.

Can we assign this defect to that team? I am not sure who that is. The resource in question is:

apiVersion: hive.openshift.io/v1
kind: ClusterPool

The hard way to do this is for the backup component to add those annotations before a backup is executed, but I strongly prefer to fix this on the actual resource rather than have me try to patch it.
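
To see which secrets a restored ClusterDeployment would be missing, one could compare the secret references in its spec against the labels on the secrets in the namespace. This is a rough sketch under stated assumptions: the helper names are hypothetical, `hive.openshift.io/secret-type` is treated as a metadata label, and the label values in the sample data are illustrative rather than taken from this cluster.

```python
# Label named in the comment above; assumed here to be a metadata label
# that the backup selects on.
BACKUP_LABEL = "hive.openshift.io/secret-type"


def referenced_secrets(cd_spec: dict) -> list:
    """Collect the name under every key ending in 'SecretRef' in a spec."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key.endswith("SecretRef") and isinstance(value, dict) and "name" in value:
                    found.append(value["name"])
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(cd_spec)
    return sorted(found)


def secrets_at_risk(cd_spec: dict, secret_labels: dict) -> list:
    """Return referenced secrets that are absent or lack BACKUP_LABEL.

    secret_labels maps secret name -> labels dict for secrets that exist.
    """
    at_risk = []
    for name in referenced_secrets(cd_spec):
        labels = secret_labels.get(name)
        if labels is None or BACKUP_LABEL not in labels:
            at_risk.append(name)
    return at_risk


# Secret references copied from the ClusterDeployment spec in this report.
cd_spec = {
    "clusterMetadata": {
        "adminKubeconfigSecretRef": {"name": "az-pool-tn-7jrlm-0-47htn-admin-kubeconfig"},
        "adminPasswordSecretRef": {"name": "az-pool-tn-7jrlm-0-47htn-admin-password"},
    },
    "platform": {"azure": {"credentialsSecretRef": {"name": "az-pool-tn-7jrlm-azure-creds"}}},
    "provisioning": {"installConfigSecretRef": {"name": "az-pool-tn-7jrlm-install-config"}},
    "pullSecretRef": {"name": "az-pool-tn-7jrlm-pull-secret"},
}

# Hypothetical namespace state: every secret labelled except the Azure
# credentials, which are absent. The label values are placeholders.
secret_labels = {
    "az-pool-tn-7jrlm-0-47htn-admin-kubeconfig": {BACKUP_LABEL: "example"},
    "az-pool-tn-7jrlm-0-47htn-admin-password": {BACKUP_LABEL: "example"},
    "az-pool-tn-7jrlm-install-config": {BACKUP_LABEL: "example"},
    "az-pool-tn-7jrlm-pull-secret": {BACKUP_LABEL: "example"},
}

print(secrets_at_risk(cd_spec, secret_labels))
```

The check works on plain dictionaries, so it can be fed from `oc get ... -o json` output on the primary hub before a backup is taken. It only reports candidates; it does not decide which team should add the label.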
G2Bsync 1087625237 comment thuyn-581 Mon, 04 Apr 2022 14:24:11 UTC

Validated on ACM 2.5.0-DOWNSTREAM-2022-03-29-05-04-50.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Advanced Cluster Management 2.5 security updates, images, and bug fixes), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:4956