Bug 2010502 - OCP 4.8.X Clusters Fail to Resume from Hibernation/Restart Gracefully
Summary: OCP 4.8.X Clusters Fail to Resume from Hibernation/Restart Gracefully
Keywords:
Status: CLOSED DUPLICATE of bug 1997906
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.8
Hardware: x86_64
OS: Other
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Standa Laznicka
QA Contact: liyao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-10-04 20:08 UTC by Gurney Buchanan
Modified: 2021-10-05 07:36 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-05 07:36:07 UTC
Target Upstream Version:
Embargoed:


Attachments
must-gather from a cluster that failed to resume (9.15 MB, application/gzip)
2021-10-04 20:08 UTC, Gurney Buchanan

Description Gurney Buchanan 2021-10-04 20:08:10 UTC
Created attachment 1829193
must-gather from a cluster that failed to resume

Description of problem:

We use Hive ClusterPools to drive our CI/CD process, similar to how they are used in OSCI. While clusters sit in a pool, their nodes are shut down and the cluster is "hibernated" to save cost. When a cluster is needed, Hive brings it back online following the documented process: https://docs.openshift.com/container-platform/4.8/backup_and_restore/graceful-cluster-restart.html. We have observed a large increase in resume failures on OCP 4.8 when the cluster has been offline for more than 24 hours. The Console and Auth fail to come back online nearly 100% of the time.


Version-Release number of selected component (if applicable): 4.8.Z


How reproducible: 80-100% occurrence


Steps to Reproduce:
1. Hibernate/power off a cluster's nodes.
2. Leave the cluster offline for 24-48 hours.
3. Follow the procedure to gracefully restart a cluster.
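
The steps above could be scripted for repeated reproduction. What follows is a minimal sketch in Python, not the reporter's actual tooling. It assumes the cluster is managed through a Hive ClusterDeployment whose spec.powerState field accepts "Hibernating" and "Running". The cluster name and namespace are hypothetical placeholders, and the script only builds the `oc patch` command line; it does not execute it.

```python
import json
from typing import List

# Power states assumed from Hive's ClusterDeployment API (spec.powerState).
VALID_POWER_STATES = ("Hibernating", "Running")


def power_state_patch(state: str) -> str:
    """Return a JSON merge patch that sets spec.powerState."""
    if state not in VALID_POWER_STATES:
        raise ValueError(f"unsupported power state: {state!r}")
    return json.dumps({"spec": {"powerState": state}})


def oc_patch_command(name: str, namespace: str, state: str) -> List[str]:
    """Build the argv for `oc patch` without executing it."""
    return [
        "oc", "patch", "clusterdeployment", name,
        "-n", namespace,
        "--type", "merge",
        "-p", power_state_patch(state),
    ]


if __name__ == "__main__":
    # "my-cluster" and "my-pool-ns" are hypothetical names.
    for state in ("Hibernating", "Running"):
        print(" ".join(oc_patch_command("my-cluster", "my-pool-ns", state)))
```

Returning the argv as a list keeps the sketch side-effect free; a caller could pass it to subprocess.run once the names match a real ClusterDeployment, waiting 24-48 hours between the two patches.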

Actual results:
The cluster Console and Auth are not reachable. The cluster is reachable only via the CLI with a kubeconfig.
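
Because the CLI still works with a kubeconfig, the state of the affected operators can be checked from the output of `oc get clusteroperators -o json`. The sketch below, in Python, assumes the standard ClusterOperator layout (status.conditions entries with a type and a "True"/"False" status). The sample data is illustrative only and is not taken from the attached must-gather.

```python
from typing import Dict, List


def unhealthy_operators(co_list: dict) -> Dict[str, List[str]]:
    """Map each operator name to its problems (not Available, Degraded)."""
    problems: Dict[str, List[str]] = {}
    for item in co_list.get("items", []):
        name = item["metadata"]["name"]
        conditions = {
            c["type"]: c["status"]
            for c in item.get("status", {}).get("conditions", [])
        }
        found = []
        # A missing Available condition is treated as a problem too.
        if conditions.get("Available") != "True":
            found.append("not Available")
        if conditions.get("Degraded") == "True":
            found.append("Degraded")
        if found:
            problems[name] = found
    return problems


if __name__ == "__main__":
    # Illustrative sample, not real output from the failing cluster.
    sample = {
        "items": [
            {
                "metadata": {"name": "authentication"},
                "status": {"conditions": [
                    {"type": "Available", "status": "False"},
                    {"type": "Degraded", "status": "True"},
                ]},
            },
            {
                "metadata": {"name": "etcd"},
                "status": {"conditions": [
                    {"type": "Available", "status": "True"},
                    {"type": "Degraded", "status": "False"},
                ]},
            },
        ]
    }
    print(unhealthy_operators(sample))
```

In practice the dictionary would come from json.loads applied to the command output saved to a file or captured from a subprocess.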

Expected results:
Cluster resumes successfully.

Additional info:
Must-gather attached.

Comment 1 Standa Laznicka 2021-10-05 07:36:07 UTC

*** This bug has been marked as a duplicate of bug 1997906 ***

