Bug 1877972

Summary: [GCP] 4.6 Install with Proxy fails
Product: OpenShift Container Platform
Reporter: To Hung Sze <tsze>
Component: Installer
Assignee: aos-install
Installer sub component: openshift-installer
QA Contact: To Hung Sze <tsze>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: high
Priority: unspecified
CC: adahiya, esimard
Version: 4.6
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-09-11 17:19:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description            Flags
gather bootstrap log   none
gather bootstrap log   none
install.log            none

Description To Hung Sze 2020-09-10 22:47:32 UTC
Created attachment 1714479 [details]
gather bootstrap log

Description of problem:
Installing a 4.6 nightly build with a proxy fails.

How reproducible:
Always

Steps to Reproduce:
1. Install 4.6.0-0.nightly-2020-09-10-082657 with proxy (using Flexy)

Actual results:
Install fails with:
time="2020-09-10T21:24:54Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.6.0-0.nightly-2020-09-10-082657: 99% complete"
time="2020-09-10T21:26:39Z" level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: got '404 Not Found' status while trying to GET the OAuth well-known https://10.0.0.5:6443/.well-known/oauth-authorization-server endpoint data\n* Cluster operator console is reporting a failure: RouteHealthDegraded: failed to GET route (https://console-openshift-console.apps.tszegcp91020h.qe.gcp.devcluster.openshift.com/health): Get \"https://console-openshift-console.apps.tszegcp91020h.qe.gcp.devcluster.openshift.com/health\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)\n* Could not update deployment \"openshift-cluster-storage-operator/csi-snapshot-controller-operator\" (246 of 602)"
time="2020-09-10T21:29:04Z" level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: got '404 Not Found' status while trying to GET the OAuth well-known https://10.0.0.5:6443/.well-known/oauth-authorization-server endpoint data\n* Cluster operator console is reporting a failure: RouteHealthDegraded: failed to GET route (https://console-openshift-console.apps.tszegcp91020h.qe.gcp.devcluster.openshift.com/health): Get \"https://console-openshift-console.apps.tszegcp91020h.qe.gcp.devcluster.openshift.com/health\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)\n* Could not update deployment \"openshift-cluster-storage-operator/csi-snapshot-controller-operator\" (246 of 602)"
time="2020-09-10T21:29:54Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.6.0-0.nightly-2020-09-10-082657: 99% complete"
time="2020-09-10T21:32:24Z" level=debug msg="Still waiting for the cluster to initialize: Could not update deployment \"openshift-cluster-storage-operator/csi-snapshot-controller-operator\" (246 of 602)"
time="2020-09-10T21:35:38Z" level=info msg="Cluster operator insights Disabled is False with AsExpected: "
time="2020-09-10T21:35:38Z" level=fatal msg="failed to initialize the cluster: Could not update deployment \"openshift-cluster-storage-operator/csi-snapshot-controller-operator\" (246 of 602)"
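
The blocking errors in the log excerpt above repeat across several lines, which makes it hard to see how many distinct failures there are. The following is a minimal Python sketch, not part of the installer, that groups them. It assumes the line layout seen in this report (an optional time="..." field, then level=... and msg="..."), and that multiple errors are joined with a literal "\n* " sequence after "Multiple errors are preventing progress:". The names blocking_errors, LINE_RE, PREFIXES and MULTI are hypothetical and chosen only for illustration.

```python
import re
from collections import OrderedDict

# Assumed line layout, taken from the excerpt in this report:
#   [time="..."] level=<level> msg="<message>"
LINE_RE = re.compile(r'^(?:time="[^"]*"\s+)?level=(?P<level>\w+)\s+msg="(?P<msg>.*)"\s*$')

# Message prefixes that carry cluster-initialization status in this log.
PREFIXES = (
    "Still waiting for the cluster to initialize: ",
    "failed to initialize the cluster: ",
)
MULTI = "Multiple errors are preventing progress:"


def blocking_errors(log_text):
    """Return an ordered mapping of distinct blocking error -> occurrence count."""
    counts = OrderedDict()
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        msg = m.group("msg")
        for prefix in PREFIXES:
            if msg.startswith(prefix):
                rest = msg[len(prefix):]
                break
        else:
            continue  # not a cluster-initialization status line
        if rest.startswith("Working towards"):
            continue  # progress report, not an error
        if rest.startswith(MULTI):
            # Errors are separated by a literal backslash-n, asterisk, space.
            items = rest.split("\\n* ")[1:]
        else:
            items = [rest]
        for item in items:
            item = item.replace('\\"', '"').strip()  # undo escaped quotes
            counts[item] = counts.get(item, 0) + 1
    return counts
```

Run against the excerpt above, a helper like this would list each distinct error once with its count. The csi-snapshot-controller-operator deployment error is the one that also appears in the final level=fatal line.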

Expected results:
Install finishes

Additional info:
https://mastern-jenkins-csb-openshift-qe.cloud.paas.psi.redhat.com/job/Launch%20Environment%20Flexy/111404

Please assign to the right person / team

Comment 1 To Hung Sze 2020-09-10 22:48:36 UTC
Created attachment 1714480 [details]
gather bootstrap log

Comment 2 To Hung Sze 2020-09-11 13:08:12 UTC
Created attachment 1714557 [details]
install.log

Comment 3 Etienne Simard 2020-09-11 17:12:19 UTC
This is an issue for all platforms (I confirmed it on Azure). This bug was also created: https://bugzilla.redhat.com/show_bug.cgi?id=1878030

Comment 4 Abhinav Dahiya 2020-09-11 17:19:40 UTC
> time="2020-09-10T21:29:54Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.6.0-0.nightly-2020-09-10-082657: 99% complete"
> time="2020-09-10T21:32:24Z" level=debug msg="Still waiting for the cluster to initialize: Could not update deployment \"openshift-cluster-storage-operator/csi-snapshot-controller-operator\" (246 of 602)"

I think Comment 3 from Etienne also looks like the same bug, marking as duplicate.

*** This bug has been marked as a duplicate of bug 1878030 ***

Comment 5 To Hung Sze 2020-09-11 20:25:01 UTC
I rebuilt an IPI install with proxy and can confirm there is no csi-snapshot-controller-operator.

$ ./oc get co csi-snapshot-controller
NAME                      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
csi-snapshot-controller

$ ./oc -n openshift-cluster-storage-operator get all
NAME                                            READY   STATUS    RESTARTS   AGE
pod/cluster-storage-operator-69b8b69969-smrxv   1/1     Running   1          176m

NAME                                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/csi-snapshot-controller-operator-metrics   ClusterIP   172.30.132.103   <none>        443/TCP   176m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-storage-operator   1/1     1            1           176m

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-storage-operator-69b8b69969   1         1         1       176m

Comment 6 To Hung Sze 2020-09-11 22:41:46 UTC
Actually, I take it back. I was looking at the wrong cluster above.

This is the right one:
$ ./oc -n openshift-cluster-storage-operator get all
NAME                                            READY   STATUS    RESTARTS   AGE
pod/cluster-storage-operator-69b8b69969-wf7rv   1/1     Running   1          106m

NAME                                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/csi-snapshot-controller-operator-metrics   ClusterIP   172.30.110.164   <none>        443/TCP   107m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-storage-operator   1/1     1            1           106m

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-storage-operator-69b8b69969   1         1         1       106m
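
To check a pasted listing like the one above for a particular deployment without reading it by eye, a small Python sketch along these lines could be used. It only inspects the text output (the first column of each row, in kind/name form); the function names resource_names and has_deployment are hypothetical. Against a live cluster, querying the deployment directly with oc would be more reliable than parsing text.

```python
def resource_names(get_all_output):
    """Collect the kind/name entries from the first column of 'oc get all' text."""
    names = set()
    for line in get_all_output.splitlines():
        fields = line.split()
        # Resource rows start with "<kind>/<name>"; header rows start with "NAME".
        if fields and "/" in fields[0]:
            names.add(fields[0])
    return names


def has_deployment(get_all_output, name):
    """Return True if the listing contains deployment.apps/<name>."""
    return "deployment.apps/" + name in resource_names(get_all_output)
```

For the listing above, has_deployment(output, "cluster-storage-operator") would be true and has_deployment(output, "csi-snapshot-controller-operator") would be false, since only the metrics service carries that name.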

The oc output above is from Flexy job 111612.
The install log shows the same error:
level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: got '404 Not Found' status while trying to GET the OAuth well-known https://10.0.0.5:6443/.well-known/oauth-authorization-server endpoint data\n* Could not update deployment \"openshift-cluster-storage-operator/csi-snapshot-controller-operator\" (246 of 602)"
level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: got '404 Not Found' status while trying to GET the OAuth well-known https://10.0.0.5:6443/.well-known/oauth-authorization-server endpoint data\n* Could not update deployment \"openshift-cluster-storage-operator/csi-snapshot-controller-operator\" (246 of 602)"
level=debug msg="Still waiting for the cluster to initialize: Working towards 4.6.0-0.nightly-2020-09-10-082657: 99% complete"
level=info msg="Cluster operator insights Disabled is False with AsExpected: "
level=fatal msg="failed to initialize the cluster: Working towards 4.6.0-0.nightly-2020-09-10-082657: 99% complete"
+ ret=1
+ need_recheck=0
+ '[' X1 '!=' X0 ']'
+ '[' Xno == Xyes ']'
+ '[' X0 '!=' X0 ']'
+ exit 1