Bug 2066842 - cluster pool credentials are not backed up
Summary: cluster pool credentials are not backed up
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: DR4Hub
Version: rhacm-2.5
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: rhacm-2.5
Assignee: vbirsan
QA Contact: Thuy Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-03-22 15:24 UTC by Thuy Nguyen
Modified: 2022-06-09 02:10 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-09 02:10:05 UTC
Target Upstream Version:
Embargoed:
bot-tracker-sync: rhacm-2.5+


Attachments
Cluster pool UI (321.71 KB, image/png)
2022-03-22 15:24 UTC, Thuy Nguyen


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | stolostron backlog issues 21017 | 0 | None | None | None | 2022-03-22 16:38:29 UTC
Red Hat Product Errata | RHSA-2022:4956 | 0 | None | None | None | 2022-06-09 02:10:14 UTC

Description Thuy Nguyen 2022-03-22 15:24:57 UTC
Created attachment 1867510 [details]
Cluster pool UI

Description of problem: cluster pool credentials are not backed up


Version-Release number of selected component (if applicable):
ACM 2.5.0-DOWNSTREAM-2022-03-17-03-36-41 (Final S4)

How reproducible:


Steps to Reproduce:
1. On the primary hub, have a cluster pool with running clusters.
2. Create a backup of the primary hub.
3. Shut down the primary hub, then create an activation restore on the secondary hub.

Actual results:
Running clusters in the cluster pool hang while resuming because their credentials secret is not found on the restored hub.

Expected results:


Additional info:

oc get clusterpool -n default
NAME         SIZE   STANDBY   READY   BASEDOMAIN                      IMAGESET
az-pool-tn   2      2         0       az.dev06.red-chesterfield.com   img4.10.4-x86-64-appsub


oc get cd --all-namespaces
NAMESPACE                             NAME                                  INFRAID                       PLATFORM   REGION      VERSION   CLUSTERTYPE   PROVISIONSTATUS   POWERSTATE              AGE
az-pool-tn-7jrlm                      az-pool-tn-7jrlm                      az-pool-tn-7jrlm-97wmb        azure      centralus   4.10.4                  Provisioned       FailedToStartMachines   86m
az-pool-tn-nrj62                      az-pool-tn-nrj62                      az-pool-tn-nrj62-k8q4x        azure      centralus   4.10.4                  Provisioned                               86m
clc-test-aws-auto-sno-1647575763754   clc-test-aws-auto-sno-1647575763754   clc-test-aws-auto-sno-rxf2t   aws        us-east-2   4.9.9                   Provisioned       Running                 86m


oc get cd -n az-pool-tn-7jrlm az-pool-tn-7jrlm -oyaml
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  annotations:
    hive.openshift.io/cluster-pool-spec-hash: 23ab68cbc48e4895
    open-cluster-management.io/user-group: c3lzdGVtOnNlcnZpY2VhY2NvdW50cyxzeXN0ZW06c2VydmljZWFjY291bnRzOm9wZW4tY2x1c3Rlci1tYW5hZ2VtZW50LWJhY2t1cCxzeXN0ZW06YXV0aGVudGljYXRlZA==
    open-cluster-management.io/user-identity: c3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW4tY2x1c3Rlci1tYW5hZ2VtZW50LWJhY2t1cDp2ZWxlcm8=
  creationTimestamp: "2022-03-22T13:55:07Z"
  finalizers:
  - hive.openshift.io/deprovision
  generation: 1
  labels:
    hive.openshift.io/cluster-platform: azure
    hive.openshift.io/cluster-region: centralus
    hive.openshift.io/version-major: "4"
    hive.openshift.io/version-major-minor: "4.10"
    hive.openshift.io/version-major-minor-patch: 4.10.4
    velero.io/backup-name: acm-resources-schedule-20220322134022
    velero.io/restore-name: restore-acm-passive-sync-acm-resources-schedule-20220322134022
  name: az-pool-tn-7jrlm
  namespace: az-pool-tn-7jrlm
  resourceVersion: "23184933"
  uid: dffb7895-10bc-4fe8-b0d9-426ee364ef8a
spec:
  baseDomain: az.dev06.red-chesterfield.com
  clusterMetadata:
    adminKubeconfigSecretRef:
      name: az-pool-tn-7jrlm-0-47htn-admin-kubeconfig
    adminPasswordSecretRef:
      name: az-pool-tn-7jrlm-0-47htn-admin-password
    clusterID: 472dab2d-53bd-4106-84f8-1c3cc242dfc8
    infraID: az-pool-tn-7jrlm-97wmb
  clusterName: az-pool-tn-7jrlm
  clusterPoolRef:
    namespace: default
    poolName: az-pool-tn
  controlPlaneConfig:
    servingCertificates: {}
  installed: true
  platform:
    azure:
      baseDomainResourceGroupName: domain
      credentialsSecretRef:
        name: az-pool-tn-7jrlm-azure-creds
      region: centralus
  powerState: Running
  provisioning:
    imageSetRef:
      name: img4.10.4-x86-64-appsub
    installConfigSecretRef:
      name: az-pool-tn-7jrlm-install-config
  pullSecretRef:
    name: az-pool-tn-7jrlm-pull-secret
status:
  apiURL: https://api.az-pool-tn-7jrlm.az.dev06.red-chesterfield.com:6443
  conditions:
  - lastProbeTime: "2022-03-22T13:55:11Z"
    lastTransitionTime: "2022-03-22T13:55:11Z"
    message: Cluster is resuming or running, see Ready condition for details
    reason: ResumingOrRunning
    status: "False"
    type: Hibernating
  - lastProbeTime: "2022-03-22T13:55:11Z"
    lastTransitionTime: "2022-03-22T13:55:11Z"
    message: 'Failed to start machines: failed to fetch Azure credentials secret:
      Secret "az-pool-tn-7jrlm-azure-creds" not found'
    reason: FailedToStartMachines
    status: "False"
    type: Ready
  - lastProbeTime: "2022-03-22T13:55:10Z"
    lastTransitionTime: "2022-03-22T13:55:10Z"
    message: Control plane certificates are present
    reason: ControlPlaneCertificatesFound
    status: "False"
    type: ControlPlaneCertificateNotFound
  - lastProbeTime: "2022-03-22T13:55:14Z"
    lastTransitionTime: "2022-03-22T13:55:14Z"
    message: Cluster is provisioned
    reason: Provisioned
    status: "True"
    type: Provisioned
  - lastProbeTime: "2022-03-22T13:55:11Z"
    lastTransitionTime: "2022-03-22T13:55:11Z"
    message: no ClusterRelocates match
    reason: NoMatchingRelocates
    status: "False"
    type: RelocationFailed
  - lastProbeTime: "2022-03-22T13:55:14Z"
    lastTransitionTime: "2022-03-22T13:55:14Z"
    message: SyncSet apply is successful
    reason: SyncSetApplySuccess
    status: "False"
    type: SyncSetFailed
  - lastProbeTime: "2022-03-22T13:55:10Z"
    lastTransitionTime: "2022-03-22T13:55:10Z"
    message: cluster is reachable
    reason: ClusterReachable
    status: "False"
    type: Unreachable
  - lastProbeTime: "2022-03-22T13:55:08Z"
    lastTransitionTime: "2022-03-22T13:55:08Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: AWSPrivateLinkFailed
  - lastProbeTime: "2022-03-22T13:55:08Z"
    lastTransitionTime: "2022-03-22T13:55:08Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: AWSPrivateLinkReady
  - lastProbeTime: "2022-03-22T13:55:08Z"
    lastTransitionTime: "2022-03-22T13:55:08Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ActiveAPIURLOverride
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: AuthenticationFailure
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ClusterInstallCompleted
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ClusterInstallFailed
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ClusterInstallRequirementsMet
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ClusterInstallStopped
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: DNSNotReady
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: DeprovisionLaunchError
  - lastProbeTime: "2022-03-22T13:55:07Z"
    lastTransitionTime: "2022-03-22T13:55:07Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: IngressCertificateNotFound
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: InstallImagesNotResolved
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: InstallLaunchError
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: InstallerImageResolutionFailed
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ProvisionFailed
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ProvisionStopped
  - lastProbeTime: "2022-03-22T13:55:12Z"
    lastTransitionTime: "2022-03-22T13:55:12Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: RequirementsMet
  installedTimestamp: "2022-03-22T13:55:07Z"
  powerState: FailedToStartMachines
  webConsoleURL: https://console-openshift-console.apps.az-pool-tn-7jrlm.az.dev06.red-chesterfield.com
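
The ClusterDeployment above references several secrets by name, and the Ready condition fails because one of them is missing after the restore. As a rough aid for checking a restored hub, the sketch below walks a ClusterDeployment `spec` and lists every `*SecretRef` it finds. The helper name `iter_secret_refs` and the field-name heuristic (keys ending in `SecretRef`) are assumptions made for illustration, not part of Hive or ACM; the sample data is copied from the YAML above.

```python
from typing import Any, Iterator, Tuple


def iter_secret_refs(node: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (field path, secret name) for each key ending in 'SecretRef'.

    Hypothetical helper: it relies only on the field naming seen in the
    ClusterDeployment dump, not on any Hive API.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key.endswith("SecretRef") and isinstance(value, dict) and "name" in value:
                yield child, value["name"]
            else:
                yield from iter_secret_refs(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_secret_refs(item, f"{path}[{index}]")


# Subset of the spec shown in the bug report.
spec = {
    "clusterMetadata": {
        "adminKubeconfigSecretRef": {"name": "az-pool-tn-7jrlm-0-47htn-admin-kubeconfig"},
        "adminPasswordSecretRef": {"name": "az-pool-tn-7jrlm-0-47htn-admin-password"},
    },
    "platform": {
        "azure": {"credentialsSecretRef": {"name": "az-pool-tn-7jrlm-azure-creds"}}
    },
    "provisioning": {
        "imageSetRef": {"name": "img4.10.4-x86-64-appsub"},  # not a secret, skipped
        "installConfigSecretRef": {"name": "az-pool-tn-7jrlm-install-config"},
    },
    "pullSecretRef": {"name": "az-pool-tn-7jrlm-pull-secret"},
}

refs = dict(iter_secret_refs(spec))
for field, name in refs.items():
    print(f"{field}: {name}")
```

Each name printed this way can then be compared against the secrets that actually exist in the cluster's namespace on the secondary hub.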

Comment 1 vbirsan 2022-03-22 16:09:00 UTC
That's because the cluster pool secrets have no Hive label. I expected to see a `hive.openshift.io/secret-type` label, as on the cluster deployment secrets.

oc get secrets -n default az-pool-tn-azure-creds -o yaml

kind: Secret
metadata:
  creationTimestamp: "2022-03-18T03:50:39Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:osServicePrincipal.json: {}
      f:type: {}
    manager: unknown
    operation: Update
    time: "2022-03-18T03:50:39Z"
  name: az-pool-tn-azure-creds
  namespace: default
  resourceVersion: "135499276"
  uid: 50646ad1-7e30-46fb-8a6a-0910dba2997a
type: Opaque


The proper fix here is for the team that creates the cluster pools to label the secrets the cluster pool uses, so that they are picked up by the backup.

Can we assign this defect to that team? I am not sure who that is.

- apiVersion: hive.openshift.io/v1
  kind: ClusterPool

The hard way to do this is for the backup component to add those labels before a backup is executed, but I strongly prefer to fix this on the actual resource rather than patching it from the backup side.
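
The check described in this comment can be sketched as follows: given a Secret manifest as a dictionary, report whether it carries the `hive.openshift.io/secret-type` label. This is only an illustration under assumptions. The function name `missing_backup_label` is hypothetical, the label value in the second sample is a placeholder, and the real selection logic of the backup component is not shown in this report.

```python
REQUIRED_LABEL = "hive.openshift.io/secret-type"  # label key named in the comment


def missing_backup_label(secret: dict, label: str = REQUIRED_LABEL) -> bool:
    """Return True when the Secret manifest lacks the given label key.

    Hypothetical helper: assumes the backup selects secrets by this label,
    as the comment suggests; the actual backup logic may differ.
    """
    labels = (secret.get("metadata") or {}).get("labels") or {}
    return label not in labels


# Cluster pool secret from the report: metadata has no labels at all.
pool_secret = {
    "kind": "Secret",
    "metadata": {"name": "az-pool-tn-azure-creds", "namespace": "default"},
    "type": "Opaque",
}

# Same secret with the label added; "example-value" is a placeholder.
labeled_secret = {
    "kind": "Secret",
    "metadata": {
        "name": "az-pool-tn-azure-creds",
        "namespace": "default",
        "labels": {REQUIRED_LABEL: "example-value"},
    },
    "type": "Opaque",
}

print(missing_backup_label(pool_secret))
print(missing_backup_label(labeled_secret))
```

Running a check of this kind over every secret referenced by a cluster pool before taking a backup would flag the ones that are likely to be skipped.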

Comment 2 bot-tracker-sync 2022-04-04 14:46:24 UTC
G2Bsync 1087625237 comment 
 thuyn-581 Mon, 04 Apr 2022 14:24:11 UTC 
 G2BSync - 
Validated on ACM 2.5.0-DOWNSTREAM-2022-03-29-05-04-50.

Comment 5 errata-xmlrpc 2022-06-09 02:10:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Advanced Cluster Management 2.5 security updates, images, and bug fixes), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4956

