Bug 1996184

Summary: After shutting down and starting a cluster again, storage operator (and machine-config) is not available
Product: OpenShift Container Platform Reporter: To Hung Sze <tsze>
Component: Storage Assignee: Fabio Bertinatto <fbertina>
Storage sub component: Storage QA Contact: Wei Duan <wduan>
Status: CLOSED DUPLICATE Docs Contact:
Severity: unspecified    
Priority: unspecified CC: aos-bugs, fbertina
Version: 4.9   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-08-25 14:45:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description To Hung Sze 2021-08-20 18:37:13 UTC
Description of problem:
After shutting down a cluster and restarting it, the storage cluster operator (and machine-config) is not available.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-20-074005

How reproducible:
Always

Steps to Reproduce:
1. Create an IPI cluster.
2. After the cluster finishes installation, confirm all cluster operators (co) are working.
3. Stop the cluster (workers first, then masters).
4. Wait 15 min, then start the cluster (masters first, then workers).
5. Wait a while (15 min to a couple of hours). Check and approve any CSRs. Confirm all nodes are in the Ready state.
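
The CSR approval in step 5 could be scripted roughly as follows. This is only a sketch: the function name approve_pending_csrs is hypothetical, the go-template is the one commonly used to list CSRs that have no status yet, and it assumes `oc` is on the PATH and logged in as a cluster admin.

```shell
# Hypothetical helper: approve every CSR that has no status yet (i.e. pending).
approve_pending_csrs() {
  # List the names of CSRs without a .status block, one per line.
  oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' |
  while read -r name; do
    # Skip blank lines so we never call approve with an empty name.
    [ -n "$name" ] || continue
    oc adm certificate approve "$name"
  done
}
```

Calling approve_pending_csrs a few times after the nodes boot may be needed, since kubelet serving CSRs typically appear only after the client CSRs have been approved. `oc get nodes` can then be used to confirm that all nodes report Ready.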

Actual results:
storage and machine-config are not available; all other operators are ready.

$ ./oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.9.0-0.nightly-2021-08-20-074005   True        False         False      11m     
baremetal                                  4.9.0-0.nightly-2021-08-20-074005   True        False         False      139m    
cloud-controller-manager                   4.9.0-0.nightly-2021-08-20-074005   True        False         False      143m    
...
machine-config                             4.9.0-0.nightly-2021-08-20-074005   False       False         True       75m     Cluster not available for 4.9.0-0.nightly-2021-08-20-074005
marketplace                                4.9.0-0.nightly-2021-08-20-074005   True        False         False      140m    
...
storage                                    4.9.0-0.nightly-2021-08-20-074005   False       True          False      47m     GCPPDCSIDriverOperatorCRAvailable: GCPPDDriverControllerServiceControllerAvailable: Waiting for Deployment
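
To pick out the unavailable operators from a listing like the one above, a small filter along these lines may help. It is a sketch, not part of the original report: the function name unavailable_operators is hypothetical, and it assumes the default column layout of `oc get co`, with AVAILABLE as the third column.

```shell
# Hypothetical helper: read `oc get co` output on stdin and print the names
# of operators whose AVAILABLE column (3rd field) is anything but "True".
unavailable_operators() {
  # Skip the header row and any short lines (such as "..." elisions).
  awk '$1 != "NAME" && NF >= 5 && $3 != "True" { print $1 }'
}
```

For example, `oc get co | unavailable_operators` would be expected to list machine-config and storage for the cluster state shown in this report.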


Expected results:
All co are ready.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
Must-gather outputs:
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: c173a13e-0c22-420f-909d-92637511d30d
ClusterVersion: Stable at "4.9.0-0.nightly-2021-08-20-074005"
ClusterOperators:
	clusteroperator/machine-config is not available (Cluster not available for 4.9.0-0.nightly-2021-08-20-074005) because Failed to resync 4.9.0-0.nightly-2021-08-20-074005 because: timed out waiting for the condition during waitForDaemonsetRollout: Daemonset machine-config-server is not ready. status: (desired: 3, updated: 3, ready: 1, unavailable: 2)
	clusteroperator/storage is not available (GCPPDCSIDriverOperatorCRAvailable: GCPPDDriverControllerServiceControllerAvailable: Waiting for Deployment) because GCPPDCSIDriverOperatorCRDegraded: All is well

Comment 2 Fabio Bertinatto 2021-08-25 14:45:34 UTC

*** This bug has been marked as a duplicate of bug 1997478 ***