Bug 2010502

Summary:

OCP 4.8.X Clusters Fail to Resume from Hibernation/Restart Gracefully

Product:

OpenShift Container Platform

Reporter:

Gurney Buchanan <gbuchana>

Component:

apiserver-auth

Assignee:

Standa Laznicka <slaznick>

Status:

CLOSED DUPLICATE

QA Contact:

liyao

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

4.8

CC:

aos-bugs, gshereme, mfojtik, nmukherj, surbania

Target Milestone:

---

Target Release:

---

Hardware:

x86_64

OS:

Other

Whiteboard:

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2021-10-05 07:36:07 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
must-gather from a cluster that failed to resume	none

Description Gurney Buchanan 2021-10-04 20:08:10 UTC

Created attachment 1829193
must-gather from a cluster that failed to resume

Description of problem:

We use Hive ClusterPools to drive our CI/CD process, similar to how they are used in OSCI. While clusters sit in a pool, their nodes are shut down and the cluster is "hibernated" to save cost. When a cluster is needed, Hive brings it back online following the documented graceful restart process (https://docs.openshift.com/container-platform/4.8/backup_and_restore/graceful-cluster-restart.html). We have observed greatly increased resume failures on OCP 4.8 when the cluster has been offline for more than 24 hours. The Console and Auth fail to come back online nearly 100% of the time.


Version-Release number of selected component (if applicable): 4.8.Z


How reproducible: 80-100% occurrence


Steps to Reproduce:
1. Hibernate/power off a cluster's nodes
2. Leave the cluster offline for 24-48 hours
3. Follow the procedure to gracefully restart the cluster

Actual results:
The cluster Console and Auth are not reachable. The cluster is reachable only via the CLI with a kubeconfig.
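
Since the CLI still works with a kubeconfig in this state, one way to see which operators did not come back is to list the ClusterOperators that are not reporting Available. The following is only a minimal sketch, not part of the original report: the helper names `unavailable_operators` and `fetch_operator_list` are hypothetical, and it assumes `oc` is on the PATH and already pointed at the affected cluster.

```python
import json
import subprocess


def unavailable_operators(operator_list):
    """Return sorted names of ClusterOperators whose Available condition is not "True".

    `operator_list` is the parsed JSON from `oc get clusteroperators -o json`.
    An operator with no Available condition at all is also counted as unavailable.
    """
    names = []
    for item in operator_list.get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        available = any(
            c.get("type") == "Available" and c.get("status") == "True"
            for c in conditions
        )
        if not available:
            names.append(item["metadata"]["name"])
    return sorted(names)


def fetch_operator_list():
    """Run `oc get clusteroperators -o json` with the current kubeconfig (assumed setup)."""
    result = subprocess.run(
        ["oc", "get", "clusteroperators", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `unavailable_operators(fetch_operator_list())` after the restart procedure would give a quick list of operators to look at alongside the must-gather; it does not diagnose why they are unavailable.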

Expected results:
Cluster resumes successfully.

Additional info:
Must-gather attached.

Comment 1 Standa Laznicka 2021-10-05 07:36:07 UTC


*** This bug has been marked as a duplicate of bug 1997906 ***