1996184 – After shutting down and starting a cluster again, storage operator (and machine-config) is not available


Keywords:
Status:	CLOSED DUPLICATE of bug 1997478
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	4.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	---
Assignee:	Fabio Bertinatto
QA Contact:	Wei Duan
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-08-20 18:37 UTC by To Hung Sze
Modified:	2021-08-25 14:45 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-08-25 14:45:34 UTC
Target Upstream Version:
Embargoed:


Description To Hung Sze 2021-08-20 18:37:13 UTC

Description of problem:
After shutting down the cluster and restarting it, the storage cluster operator (and machine-config) is not available.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-20-074005

How reproducible:
Always

Steps to Reproduce:
1. Create an IPI cluster.
2. After the cluster finishes installation, confirm all cluster operators (co) are working.
3. Stop the cluster (workers first, then masters).
4. Wait 15 min. Start the cluster (masters first, then workers).
5. Wait a while (15 min to a couple of hours). Check and approve any CSRs. Confirm all nodes are in Ready state.
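
For anyone scripting step 5, a minimal sketch of picking out the CSRs that still need approval. It assumes the input is the text printed by `oc get csr -o json` and that a pending CSR is one with no status conditions; the function name `pending_csr_names` is hypothetical and not part of any tool.

```python
import json


def pending_csr_names(csr_list_json):
    """Return names of CSRs that have no status conditions yet.

    Assumption: csr_list_json is the output of `oc get csr -o json`,
    and a CSR without any condition is still pending approval.
    """
    doc = json.loads(csr_list_json)
    names = []
    for item in doc.get("items", []):
        status = item.get("status") or {}
        if not status.get("conditions"):
            names.append(item["metadata"]["name"])
    return names
```

Each returned name could then be passed to `oc adm certificate approve <name>`. New CSRs may keep appearing while nodes rejoin, so the check may need repeating until all nodes report Ready.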

Actual results:
storage and machine-config are not available; all other operators are ready.

$ ./oc get co
NAME                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication             4.9.0-0.nightly-2021-08-20-074005   True        False         False      11m
baremetal                  4.9.0-0.nightly-2021-08-20-074005   True        False         False      139m
cloud-controller-manager   4.9.0-0.nightly-2021-08-20-074005   True        False         False      143m
...
machine-config             4.9.0-0.nightly-2021-08-20-074005   False       False         True       75m     Cluster not available for 4.9.0-0.nightly-2021-08-20-074005
marketplace                4.9.0-0.nightly-2021-08-20-074005   True        False         False      140m
...
storage                    4.9.0-0.nightly-2021-08-20-074005   False       True          False      47m     GCPPDCSIDriverOperatorCRAvailable: GCPPDDriverControllerServiceControllerAvailable: Waiting for Deployment
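
To spot the affected operators quickly in a long listing, a rough sketch that filters the text output of `oc get co` for rows whose AVAILABLE column is not True. It assumes the seven-column layout shown above (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE, MESSAGE); the function name `unavailable_operators` is hypothetical.

```python
def unavailable_operators(table_text):
    """Return (name, message) pairs for operators whose AVAILABLE column is not "True".

    Assumption: table_text is the plain-text output of `oc get co`,
    with whitespace-separated columns and an optional trailing MESSAGE.
    """
    results = []
    for line in table_text.splitlines():
        # Split into at most 7 fields so MESSAGE keeps its internal spaces.
        fields = line.split(None, 6)
        # Skip the header, blank lines, and elided "..." rows.
        if len(fields) < 6 or fields[0] == "NAME":
            continue
        name, available = fields[0], fields[2]
        message = fields[6] if len(fields) > 6 else ""
        if available != "True":
            results.append((name, message))
    return results
```

Run against the listing above, this should pick out only machine-config and storage, each with its MESSAGE text.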

Expected results:
All co are ready.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
Must-gather outputs:
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: c173a13e-0c22-420f-909d-92637511d30d
ClusterVersion: Stable at "4.9.0-0.nightly-2021-08-20-074005"
ClusterOperators:
clusteroperator/machine-config is not available (Cluster not available for 4.9.0-0.nightly-2021-08-20-074005) because Failed to resync 4.9.0-0.nightly-2021-08-20-074005 because: timed out waiting for the condition during waitForDaemonsetRollout: Daemonset machine-config-server is not ready. status: (desired: 3, updated: 3, ready: 1, unavailable: 2)
clusteroperator/storage is not available (GCPPDCSIDriverOperatorCRAvailable: GCPPDDriverControllerServiceControllerAvailable: Waiting for Deployment) because GCPPDCSIDriverOperatorCRDegraded: All is well

Comment 2 Fabio Bertinatto 2021-08-25 14:45:34 UTC


*** This bug has been marked as a duplicate of bug 1997478 ***
