Bug 2066834

Summary: Hibernating cluster(s) in cluster pool stuck in 'Stopping' status after restore activation
Product: Red Hat Advanced Cluster Management for Kubernetes
Component: DR4Hub
Version: rhacm-2.5
Hardware: x86_64
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Target Milestone: ---
Target Release: rhacm-2.5
Reporter: Thuy Nguyen <thnguyen>
Assignee: vbirsan
QA Contact: Thuy Nguyen <thnguyen>
Docs Contact:
CC: robertsonldspj11
Flags: bot-tracker-sync: rhacm-2.5+
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-06-09 02:10:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Cluster pool UI (flags: none)

Description Thuy Nguyen 2022-03-22 15:11:29 UTC
Created attachment 1867509
Cluster pool UI

Description of problem: Hibernating cluster(s) in cluster pool stuck in 'Stopping' status after restore activation


Version-Release number of selected component (if applicable):
ACM 2.5.0-DOWNSTREAM-2022-03-17-03-36-41 (Final S4)

How reproducible:


Steps to Reproduce:
1. On the primary hub, have a cluster pool that includes 1 running cluster and 1 hibernating cluster
2. Create a backup of the primary hub
3. Shut down the primary hub, then run the restore with activation on the secondary hub

Actual results:
The hibernating cluster's status shows 'Stopping' after the restore completes on the secondary hub

Expected results:


Additional info:

oc get clusterpool -n default
NAME         SIZE   STANDBY   READY   BASEDOMAIN                      IMAGESET
az-pool-tn   2      2         0       az.dev06.red-chesterfield.com   img4.10.4-x86-64-appsub


oc get cd --all-namespaces
NAMESPACE                             NAME                                  INFRAID                       PLATFORM   REGION      VERSION   CLUSTERTYPE   PROVISIONSTATUS   POWERSTATE              AGE
az-pool-tn-7jrlm                      az-pool-tn-7jrlm                      az-pool-tn-7jrlm-97wmb        azure      centralus   4.10.4                  Provisioned       FailedToStartMachines   69m
az-pool-tn-nrj62                      az-pool-tn-nrj62                      az-pool-tn-nrj62-k8q4x        azure      centralus   4.10.4                  Provisioned                               69m
clc-test-aws-auto-sno-1647575763754   clc-test-aws-auto-sno-1647575763754   clc-test-aws-auto-sno-rxf2t   aws        us-east-2   4.9.9                   Provisioned       Running                 69m


oc get cd -n az-pool-tn-nrj62 az-pool-tn-nrj62 -oyaml
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  annotations:
    hive.openshift.io/cluster-pool-spec-hash: 23ab68cbc48e4895
    open-cluster-management.io/user-group: c3lzdGVtOnNlcnZpY2VhY2NvdW50cyxzeXN0ZW06c2VydmljZWFjY291bnRzOm9wZW4tY2x1c3Rlci1tYW5hZ2VtZW50LWJhY2t1cCxzeXN0ZW06YXV0aGVudGljYXRlZA==
    open-cluster-management.io/user-identity: c3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW4tY2x1c3Rlci1tYW5hZ2VtZW50LWJhY2t1cDp2ZWxlcm8=
  creationTimestamp: "2022-03-22T13:55:07Z"
  finalizers:
  - hive.openshift.io/deprovision
  generation: 1
  labels:
    hive.openshift.io/cluster-platform: azure
    hive.openshift.io/cluster-region: centralus
    hive.openshift.io/version-major: "4"
    hive.openshift.io/version-major-minor: "4.10"
    hive.openshift.io/version-major-minor-patch: 4.10.4
    velero.io/backup-name: acm-resources-schedule-20220322134022
    velero.io/restore-name: restore-acm-passive-sync-acm-resources-schedule-20220322134022
  name: az-pool-tn-nrj62
  namespace: az-pool-tn-nrj62
  resourceVersion: "23187141"
  uid: fcae07ff-bb60-495d-a07c-768637ab5b06
spec:
  baseDomain: az.dev06.red-chesterfield.com
  clusterMetadata:
    adminKubeconfigSecretRef:
      name: az-pool-tn-nrj62-0-2lkrg-admin-kubeconfig
    adminPasswordSecretRef:
      name: az-pool-tn-nrj62-0-2lkrg-admin-password
    clusterID: d610fc8e-17f9-4fb5-bbb3-e0d900ffaf71
    infraID: az-pool-tn-nrj62-k8q4x
  clusterName: az-pool-tn-nrj62
  clusterPoolRef:
    namespace: default
    poolName: az-pool-tn
  controlPlaneConfig:
    servingCertificates: {}
  installed: true
  platform:
    azure:
      baseDomainResourceGroupName: domain
      credentialsSecretRef:
        name: az-pool-tn-nrj62-azure-creds
      region: centralus
  powerState: Hibernating
  provisioning:
    imageSetRef:
      name: img4.10.4-x86-64-appsub
    installConfigSecretRef:
      name: az-pool-tn-nrj62-install-config
  pullSecretRef:
    name: az-pool-tn-nrj62-pull-secret
status:
  conditions:
  - lastProbeTime: "2022-03-22T13:55:13Z"
    lastTransitionTime: "2022-03-22T13:55:13Z"
    message: ClusterSync has not yet been created
    reason: MissingClusterSync
    status: "True"
    type: SyncSetFailed
  - lastProbeTime: "2022-03-22T13:56:10Z"
    lastTransitionTime: "2022-03-22T13:55:08Z"
    message: 'Get "https://api.az-pool-tn-nrj62.az.dev06.red-chesterfield.com:6443/api?timeout=32s":
      dial tcp 20.221.106.172:6443: i/o timeout'
    reason: ErrorConnectingToCluster
    status: "True"
    type: Unreachable
  - lastProbeTime: "2022-03-22T13:55:08Z"
    lastTransitionTime: "2022-03-22T13:55:08Z"
    message: Control plane certificates are present
    reason: ControlPlaneCertificatesFound
    status: "False"
    type: ControlPlaneCertificateNotFound
  - lastProbeTime: "2022-03-22T13:55:13Z"
    lastTransitionTime: "2022-03-22T13:55:13Z"
    message: Cluster is provisioned
    reason: Provisioned
    status: "True"
    type: Provisioned
  - lastProbeTime: "2022-03-22T13:55:10Z"
    lastTransitionTime: "2022-03-22T13:55:10Z"
    message: no ClusterRelocates match
    reason: NoMatchingRelocates
    status: "False"
    type: RelocationFailed
  - lastProbeTime: "2022-03-22T13:55:09Z"
    lastTransitionTime: "2022-03-22T13:55:09Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: AWSPrivateLinkFailed
  - lastProbeTime: "2022-03-22T13:55:09Z"
    lastTransitionTime: "2022-03-22T13:55:09Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: AWSPrivateLinkReady
  - lastProbeTime: "2022-03-22T13:55:07Z"
    lastTransitionTime: "2022-03-22T13:55:07Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ActiveAPIURLOverride
  - lastProbeTime: "2022-03-22T13:55:07Z"
    lastTransitionTime: "2022-03-22T13:55:07Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: AuthenticationFailure
  - lastProbeTime: "2022-03-22T13:55:07Z"
    lastTransitionTime: "2022-03-22T13:55:07Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ClusterInstallCompleted
  - lastProbeTime: "2022-03-22T13:55:07Z"
    lastTransitionTime: "2022-03-22T13:55:07Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ClusterInstallFailed
  - lastProbeTime: "2022-03-22T13:55:07Z"
    lastTransitionTime: "2022-03-22T13:55:07Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ClusterInstallRequirementsMet
  - lastProbeTime: "2022-03-22T13:55:07Z"
    lastTransitionTime: "2022-03-22T13:55:07Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ClusterInstallStopped
  - lastProbeTime: "2022-03-22T13:55:07Z"
    lastTransitionTime: "2022-03-22T13:55:07Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: DNSNotReady
  - lastProbeTime: "2022-03-22T13:55:07Z"
    lastTransitionTime: "2022-03-22T13:55:07Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: DeprovisionLaunchError
  - lastProbeTime: "2022-03-22T13:55:08Z"
    lastTransitionTime: "2022-03-22T13:55:08Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: Hibernating
  - lastProbeTime: "2022-03-22T13:55:07Z"
    lastTransitionTime: "2022-03-22T13:55:07Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: IngressCertificateNotFound
  - lastProbeTime: "2022-03-22T13:55:07Z"
    lastTransitionTime: "2022-03-22T13:55:07Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: InstallImagesNotResolved
  - lastProbeTime: "2022-03-22T13:55:07Z"
    lastTransitionTime: "2022-03-22T13:55:07Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: InstallLaunchError
  - lastProbeTime: "2022-03-22T13:55:07Z"
    lastTransitionTime: "2022-03-22T13:55:07Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: InstallerImageResolutionFailed
  - lastProbeTime: "2022-03-22T13:55:07Z"
    lastTransitionTime: "2022-03-22T13:55:07Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ProvisionFailed
  - lastProbeTime: "2022-03-22T13:55:07Z"
    lastTransitionTime: "2022-03-22T13:55:07Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: ProvisionStopped
  - lastProbeTime: "2022-03-22T13:55:08Z"
    lastTransitionTime: "2022-03-22T13:55:08Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: Ready
  - lastProbeTime: "2022-03-22T13:55:07Z"
    lastTransitionTime: "2022-03-22T13:55:07Z"
    message: Condition Initialized
    reason: Initialized
    status: Unknown
    type: RequirementsMet
  installedTimestamp: "2022-03-22T13:55:07Z"
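
The state shown in the YAML above (spec.powerState set to Hibernating while the Hibernating condition is still Unknown/Initialized and no power state is reported) can be checked mechanically. The following is a minimal sketch for triage only, not the fix that shipped. The helper names are hypothetical, and it assumes the POWERSTATE column of `oc get cd` corresponds to a `status.powerState` field, which is absent from the object pasted above. The input is a ClusterDeployment already parsed into a dict, for example from `oc get cd ... -o json`.

```python
from typing import Any, Dict, Optional


def find_condition(cd: Dict[str, Any], cond_type: str) -> Optional[Dict[str, Any]]:
    """Return the status condition of the given type, or None if it is absent."""
    for cond in cd.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond
    return None


def power_state_unreconciled(cd: Dict[str, Any]) -> bool:
    """True when hibernation is requested but nothing has been reported back.

    Assumption: status.powerState is the field behind the POWERSTATE column.
    """
    wants_hibernation = cd.get("spec", {}).get("powerState") == "Hibernating"
    reported = cd.get("status", {}).get("powerState")
    cond = find_condition(cd, "Hibernating")
    cond_unknown = cond is None or cond.get("status") == "Unknown"
    return wants_hibernation and not reported and cond_unknown


# Trimmed copy of the restored az-pool-tn-nrj62 object from this report.
restored_cd = {
    "spec": {"powerState": "Hibernating"},
    "status": {
        "conditions": [
            {"type": "Hibernating", "status": "Unknown", "reason": "Initialized"},
            {"type": "Unreachable", "status": "True", "reason": "ErrorConnectingToCluster"},
        ]
    },
}

print(power_state_unreconciled(restored_cd))  # prints True
```

A ClusterDeployment whose Hibernating condition has settled to True or False, or that reports a power state, is not flagged by this check.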

Comment 1 bot-tracker-sync 2022-04-21 19:34:15 UTC
G2Bsync 1105442425 comment 
 thuyn-581 Thu, 21 Apr 2022 16:26:36 UTC 
 G2BSync -
Validated on ACM 2.5.0-DOWNSTREAM-2022-04-21-01-26-54.

Comment 4 errata-xmlrpc 2022-06-09 02:10:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Advanced Cluster
Management 2.5 security updates, images, and bug fixes), and where to
find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4956
