Bug 1996184

Summary:	After shutting down and starting a cluster again, storage operator (and machine-config) is not available
Product:	OpenShift Container Platform	Reporter:	To Hung Sze <tsze>
Component:	Storage	Assignee:	Fabio Bertinatto <fbertina>
Storage sub component:	Storage	QA Contact:	Wei Duan <wduan>
Status:	CLOSED DUPLICATE	Docs Contact:
Severity:	unspecified
Priority:	unspecified	CC:	aos-bugs, fbertina
Version:	4.9
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-08-25 14:45:34 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description To Hung Sze 2021-08-20 18:37:13 UTC

Description of problem:
After shutting down the cluster and restarting it, the storage operator (and machine-config) is not available.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-20-074005

How reproducible:
Always

Steps to Reproduce:
1. Create an IPI cluster.
2. After the cluster finishes installing, confirm that all cluster operators (co) are working.
3. Stop the cluster (workers first, then masters).
4. Wait 15 minutes, then start the cluster (masters first, then workers).
5. Wait a while (15 minutes to a couple of hours). Check for and approve any pending CSRs. Confirm that all nodes are in the Ready state.
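
Step 5 can be sketched as the shell helpers below. This is only an illustration, not the exact commands the reporter ran. The function names and the OC variable are hypothetical; the client path defaults to ./oc because that is how it is invoked in this report.

```shell
# Path to the oc client. Assumption: defaults to ./oc as used in this report.
OC="${OC:-./oc}"

# Approve every CSR that has no status yet (that is, still pending).
approve_pending_csrs() {
  "$OC" get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
    | xargs -r "$OC" adm certificate approve
}

# Print the names of nodes whose STATUS column is not exactly "Ready".
# Note: a cordoned node ("Ready,SchedulingDisabled") is also listed.
nodes_not_ready() {
  "$OC" get nodes --no-headers | awk '$2 != "Ready" {print $1}'
}
```

Running approve_pending_csrs a few times may be needed, since new CSRs can appear after the first batch is approved. Empty output from nodes_not_ready means every node reports Ready.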

Actual results:
storage and machine-config are not available; all other operators are ready.

$ ./oc get co
NAME                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication             4.9.0-0.nightly-2021-08-20-074005   True        False         False      11m
baremetal                  4.9.0-0.nightly-2021-08-20-074005   True        False         False      139m
cloud-controller-manager   4.9.0-0.nightly-2021-08-20-074005   True        False         False      143m
...
machine-config             4.9.0-0.nightly-2021-08-20-074005   False       False         True       75m     Cluster not available for 4.9.0-0.nightly-2021-08-20-074005
marketplace                4.9.0-0.nightly-2021-08-20-074005   True        False         False      140m
...
storage                    4.9.0-0.nightly-2021-08-20-074005   False       True          False      47m     GCPPDCSIDriverOperatorCRAvailable: GCPPDDriverControllerServiceControllerAvailable: Waiting for Deployment

Expected results:
All co are ready.
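
One possible way to check this expectation is to filter the `oc get co` output for operators whose AVAILABLE column is not True. The sketch below assumes the default column layout shown above; the function name unavailable_operators is hypothetical.

```shell
# Read `oc get co` output on stdin and print the name of every operator
# whose AVAILABLE column (third field) is not "True".
# The header row and short lines (such as "...") are skipped.
unavailable_operators() {
  awk '$1 != "NAME" && NF >= 5 && $3 != "True" {print $1}'
}

# Usage (assumes the client is invoked as ./oc, as in this report):
#   ./oc get co | unavailable_operators
```

Empty output means every operator reports AVAILABLE=True. It does not check the PROGRESSING or DEGRADED columns.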

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
Must-gather outputs:
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: c173a13e-0c22-420f-909d-92637511d30d
ClusterVersion: Stable at "4.9.0-0.nightly-2021-08-20-074005"
ClusterOperators:
clusteroperator/machine-config is not available (Cluster not available for 4.9.0-0.nightly-2021-08-20-074005) because Failed to resync 4.9.0-0.nightly-2021-08-20-074005 because: timed out waiting for the condition during waitForDaemonsetRollout: Daemonset machine-config-server is not ready. status: (desired: 3, updated: 3, ready: 1, unavailable: 2)
clusteroperator/storage is not available (GCPPDCSIDriverOperatorCRAvailable: GCPPDDriverControllerServiceControllerAvailable: Waiting for Deployment) because GCPPDCSIDriverOperatorCRDegraded: All is well

Comment 2 Fabio Bertinatto 2021-08-25 14:45:34 UTC


*** This bug has been marked as a duplicate of bug 1997478 ***