Bug 1904231

Summary: [GCP, UPI, Proxy] auth, mco, storage problems at end of installation
Product: OpenShift Container Platform
Reporter: To Hung Sze <tsze>
Component: apiserver-auth
Assignee: Antonio Murdaca <amurdaca>
Status: CLOSED DUPLICATE
QA Contact: Michael Nguyen <mnguyen>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.7
CC: aos-bugs, kgarriso, mfojtik
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-12-04 21:03:07 UTC
Type: Bug

Description To Hung Sze 2020-12-03 21:02:22 UTC
Description of problem:
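The machine-config operator (MCO) reports Available=False after installation. The authentication operator is also Degraded, and the storage operator is not Available.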

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-12-03-103850
Platform: GCP
Cluster: UPI, behind an HTTP proxy, OVN networking, etcd encryption enabled

How reproducible:
Use our automation to create a cluster with this template:
private-templates/functionality-testing/aos-4_7/upi-on-gcp/versioned-installer-http_proxy-remove_rhcos_worker-ovn-etcd_encryption-ci




Actual results:
Error:
+ ./openshift-install wait-for install-complete --dir '/home/jenkins/workspace/Launch Environment Flexy/workdir/install-dir'
level=info msg=Waiting up to 40m0s for the cluster at https://api.tsze-re11312.qe.gcp.devcluster.openshift.com:6443 to initialize...
level=error msg=Cluster operator authentication Degraded is True with ProxyConfigController_SyncError: ProxyConfigControllerDegraded: endpoint("https://oauth-openshift.apps.tsze-re11312.qe.gcp.devcluster.openshift.com/healthz") is unreachable with proxy(Get "https://oauth-openshift.apps.tsze-re11312.qe.gcp.devcluster.openshift.com/healthz": x509: certificate signed by unknown authority) and without proxy(Get "https://oauth-openshift.apps.tsze-re11312.qe.gcp.devcluster.openshift.com/healthz": x509: certificate signed by unknown authority)
level=info msg=Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
level=info msg=Cluster operator insights Disabled is False with AsExpected: 
level=info msg=Cluster operator machine-config Progressing is True with : Working towards 4.7.0-0.nightly-2020-12-03-103850
level=error msg=Cluster operator machine-config Degraded is True with RequiredPoolsFailed: Unable to apply 4.7.0-0.nightly-2020-12-03-103850: timed out waiting for the condition during syncRequiredMachineConfigPools: error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)
level=info msg=Cluster operator machine-config Available is False with : Cluster not available for 4.7.0-0.nightly-2020-12-03-103850
level=info msg=Cluster operator network ManagementStateDegraded is False with : 
level=info msg=Cluster operator storage Progressing is True with GCPPDCSIDriverOperatorCR_GCPPDDriverControllerServiceController_Deploying: GCPPDCSIDriverOperatorCRProgressing: GCPPDDriverControllerServiceControllerProgressing: Waiting for Deployment to deploy pods
level=info msg=Cluster operator storage Available is False with GCPPDCSIDriverOperatorCR_GCPPDDriverControllerServiceController_Deploying: GCPPDCSIDriverOperatorCRAvailable: GCPPDDriverControllerServiceControllerAvailable: Waiting for Deployment to deploy the CSI Controller Service
level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
level=error msg=The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
level=error msg=https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
level=error msg=The 'wait-for install-complete' subcommand can then be used to continue the installation
level=fatal msg=failed to initialize the cluster: Cluster operator machine-config is still updating
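The operator conditions in the log above can be summarized mechanically. The following is a minimal sketch, assuming the "Cluster operator <name> <condition> is <status> with <reason>: <message>" line shape seen in this log; the names LINE_RE, parse_conditions, and needs_attention are hypothetical helpers and not part of openshift-install.

```python
import re

# Assumed line shape, taken from the installer output in this report:
#   level=<lvl> msg=Cluster operator <name> <condition> is <True|False> with <reason>: <message>
LINE_RE = re.compile(
    r"level=(?P<level>\w+) msg=Cluster operator (?P<operator>\S+) "
    r"(?P<condition>\w+) is (?P<status>True|False) with (?P<reason>[^:]*):\s*(?P<message>.*)"
)

# Three machine-config lines copied from the log in this report.
SAMPLE_LOG = """
level=info msg=Cluster operator machine-config Progressing is True with : Working towards 4.7.0-0.nightly-2020-12-03-103850
level=error msg=Cluster operator machine-config Degraded is True with RequiredPoolsFailed: Unable to apply 4.7.0-0.nightly-2020-12-03-103850: timed out waiting for the condition during syncRequiredMachineConfigPools: error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)
level=info msg=Cluster operator machine-config Available is False with : Cluster not available for 4.7.0-0.nightly-2020-12-03-103850
"""


def parse_conditions(log_text):
    """Return one dict per 'Cluster operator ...' line; other lines are skipped."""
    records = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records


def needs_attention(records):
    """Pick out Degraded=True and Available=False, usually the first things to check."""
    return sorted({
        (r["operator"], r["condition"], r["status"])
        for r in records
        if (r["condition"] == "Degraded" and r["status"] == "True")
        or (r["condition"] == "Available" and r["status"] == "False")
    })


if __name__ == "__main__":
    for operator, condition, status in needs_attention(parse_conditions(SAMPLE_LOG)):
        print(f"{operator}: {condition}={status}")
```

Run against the full log, this would also list authentication (Degraded=True) and storage (Available=False), which matches the three operators named in the summary.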


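./oc get co shows these operators with problems (columns: NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE; VERSION is blank for machine-config):
machine-config                                                                 False       True          True       126m
storage                                    4.7.0-0.nightly-2020-12-03-103850   False       True          False      35s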
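Rows like the ones above can be filtered in a script. This is a rough sketch, assuming the column order NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE and allowing for a blank VERSION as in the machine-config row; parse_co_row and is_unhealthy are hypothetical helper names, not oc features.

```python
def parse_co_row(row):
    """Split one data row of `oc get co` output into named fields.

    Assumes the last four whitespace-separated fields are
    AVAILABLE, PROGRESSING, DEGRADED, SINCE. VERSION may be missing.
    """
    fields = row.split()
    if len(fields) not in (5, 6):
        raise ValueError(f"unexpected row: {row!r}")
    available, progressing, degraded, since = fields[-4:]
    return {
        "name": fields[0],
        "version": fields[1] if len(fields) == 6 else "",
        "available": available,
        "progressing": progressing,
        "degraded": degraded,
        "since": since,
    }


def is_unhealthy(op):
    """Treat an operator as unhealthy if it is not Available or is Degraded."""
    return op["available"] != "True" or op["degraded"] == "True"


if __name__ == "__main__":
    # The two rows quoted in this report.
    rows = [
        "machine-config                                                                 False       True          True       126m",
        "storage                                    4.7.0-0.nightly-2020-12-03-103850   False       True          False      35s",
    ]
    for op in map(parse_co_row, rows):
        if is_unhealthy(op):
            print(op["name"], "available=" + op["available"], "degraded=" + op["degraded"])
```

Counting fields from the right is a deliberate choice here: it keeps the parse stable when an operator has not yet reported a version.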


Expected results:
Cluster finishes installation

Additional info:

Comment 1 To Hung Sze 2020-12-03 21:04:40 UTC
I have the must-gather, but it is too big to attach here.
Please ping me and I can send it over or share it.

Comment 2 Kirsten Garrison 2020-12-03 23:55:52 UTC
Hi, can you please attach a must-gather from this cluster?

Comment 3 To Hung Sze 2020-12-04 15:33:22 UTC
Just shared the zip file with you.