Bug 1996184 - After shutting down and starting a cluster again, storage operator (and machine-config) is not available
Keywords:
Status: CLOSED DUPLICATE of bug 1997478
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Fabio Bertinatto
QA Contact: Wei Duan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-08-20 18:37 UTC by To Hung Sze
Modified: 2021-08-25 14:45 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-25 14:45:34 UTC
Target Upstream Version:
Embargoed:



Description To Hung Sze 2021-08-20 18:37:13 UTC
Description of problem:
After shutting down the cluster and starting it again, the storage operator (and machine-config) is not available.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-20-074005

How reproducible:
Always

Steps to Reproduce:
1. Create an IPI cluster.
2. After the cluster finishes installing, confirm that all cluster operators (co) are working.
3. Stop the cluster (workers first, then masters).
4. Wait 15 min. Start the cluster (masters first, then workers).
5. Wait a while (15 min to a couple of hours). Check for and approve any pending CSRs. Confirm that all nodes are in Ready state.
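
The CSR check in step 5 can be sketched as a small shell filter. This is only an illustration, not the exact procedure the reporter used: `pending_csrs` is a hypothetical helper name, and it assumes that `oc get csr` prints a table whose last column is CONDITION.

```shell
# Hypothetical helper: print the names of CSRs that are still pending.
# Reads the table printed by `oc get csr` on stdin and assumes the
# CONDITION value is the last whitespace-separated field on each row.
pending_csrs() {
  # The header row ends in "CONDITION", so it never matches "Pending".
  awk '$NF == "Pending" { print $1 }'
}
```

A possible use is `./oc get csr | pending_csrs | xargs -r ./oc adm certificate approve`, followed by `./oc get nodes` to confirm that every node reports Ready. The `-r` flag (skip the command when there is no input) is a GNU xargs option. Approving node CSRs can trigger further CSRs, so the check may need to be repeated.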

Actual results:
storage and machine-config are not available; all other operators are ready.

$ ./oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.9.0-0.nightly-2021-08-20-074005   True        False         False      11m     
baremetal                                  4.9.0-0.nightly-2021-08-20-074005   True        False         False      139m    
cloud-controller-manager                   4.9.0-0.nightly-2021-08-20-074005   True        False         False      143m    
...
machine-config                             4.9.0-0.nightly-2021-08-20-074005   False       False         True       75m     Cluster not available for 4.9.0-0.nightly-2021-08-20-074005
marketplace                                4.9.0-0.nightly-2021-08-20-074005   True        False         False      140m    
...
storage                                    4.9.0-0.nightly-2021-08-20-074005   False       True          False      47m     GCPPDCSIDriverOperatorCRAvailable: GCPPDDriverControllerServiceControllerAvailable: Waiting for Deployment
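
To pick the unhealthy operators out of a long listing like the one above, a filter along the following lines may help. It is a minimal sketch: `unhealthy_operators` is a hypothetical helper name, and it assumes the column order shown in the header (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED).

```shell
# Hypothetical helper: print the names of cluster operators that are not
# Available or that are Degraded. Reads `oc get co` table output on stdin.
unhealthy_operators() {
  # NR > 1 skips the header row; NF >= 5 skips blank or elided ("...") rows.
  # $3 is the AVAILABLE column and $5 is the DEGRADED column.
  awk 'NR > 1 && NF >= 5 && ($3 != "True" || $5 != "False") { print $1 }'
}
```

Run as `./oc get co | unhealthy_operators`. An operator that is only Progressing is not reported, because the filter looks at the AVAILABLE and DEGRADED columns alone.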


Expected results:
All co are ready.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
Must-gather outputs:
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: c173a13e-0c22-420f-909d-92637511d30d
ClusterVersion: Stable at "4.9.0-0.nightly-2021-08-20-074005"
ClusterOperators:
	clusteroperator/machine-config is not available (Cluster not available for 4.9.0-0.nightly-2021-08-20-074005) because Failed to resync 4.9.0-0.nightly-2021-08-20-074005 because: timed out waiting for the condition during waitForDaemonsetRollout: Daemonset machine-config-server is not ready. status: (desired: 3, updated: 3, ready: 1, unavailable: 2)
	clusteroperator/storage is not available (GCPPDCSIDriverOperatorCRAvailable: GCPPDDriverControllerServiceControllerAvailable: Waiting for Deployment) because GCPPDCSIDriverOperatorCRDegraded: All is well

Comment 2 Fabio Bertinatto 2021-08-25 14:45:34 UTC

*** This bug has been marked as a duplicate of bug 1997478 ***

