Bug 1877972 - [GCP] 4.6 Install with Proxy fails
Summary: [GCP] 4.6 Install with Proxy fails
Keywords:
Status: CLOSED DUPLICATE of bug 1878030
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: aos-install
QA Contact: To Hung Sze
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-09-10 22:47 UTC by To Hung Sze
Modified: 2020-09-11 22:41 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-11 17:19:40 UTC
Target Upstream Version:
Embargoed:


Attachments
gather bootstrap log (12.11 MB, application/gzip)
2020-09-10 22:47 UTC, To Hung Sze
gather bootstrap log (12.11 MB, application/gzip)
2020-09-10 22:48 UTC, To Hung Sze
install.log (135.53 KB, text/plain)
2020-09-11 13:08 UTC, To Hung Sze

Description To Hung Sze 2020-09-10 22:47:32 UTC
Created attachment 1714479
gather bootstrap log

Description of problem:
Installing a 4.6 nightly build on GCP with a proxy configured fails.

How reproducible:
Always

Steps to Reproduce:
1. Install 4.6.0-0.nightly-2020-09-10-082657 with proxy (using Flexy)

Actual results:
The install fails with the following log output:
time="2020-09-10T21:24:54Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.6.0-0.nightly-2020-09-10-082657: 99% complete"
time="2020-09-10T21:26:39Z" level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: got '404 Not Found' status while trying to GET the OAuth well-known https://10.0.0.5:6443/.well-known/oauth-authorization-server endpoint data\n* Cluster operator console is reporting a failure: RouteHealthDegraded: failed to GET route (https://console-openshift-console.apps.tszegcp91020h.qe.gcp.devcluster.openshift.com/health): Get \"https://console-openshift-console.apps.tszegcp91020h.qe.gcp.devcluster.openshift.com/health\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)\n* Could not update deployment \"openshift-cluster-storage-operator/csi-snapshot-controller-operator\" (246 of 602)"
time="2020-09-10T21:29:04Z" level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: got '404 Not Found' status while trying to GET the OAuth well-known https://10.0.0.5:6443/.well-known/oauth-authorization-server endpoint data\n* Cluster operator console is reporting a failure: RouteHealthDegraded: failed to GET route (https://console-openshift-console.apps.tszegcp91020h.qe.gcp.devcluster.openshift.com/health): Get \"https://console-openshift-console.apps.tszegcp91020h.qe.gcp.devcluster.openshift.com/health\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)\n* Could not update deployment \"openshift-cluster-storage-operator/csi-snapshot-controller-operator\" (246 of 602)"
time="2020-09-10T21:29:54Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.6.0-0.nightly-2020-09-10-082657: 99% complete"
time="2020-09-10T21:32:24Z" level=debug msg="Still waiting for the cluster to initialize: Could not update deployment \"openshift-cluster-storage-operator/csi-snapshot-controller-operator\" (246 of 602)"
time="2020-09-10T21:35:38Z" level=info msg="Cluster operator insights Disabled is False with AsExpected: "
time="2020-09-10T21:35:38Z" level=fatal msg="failed to initialize the cluster: Could not update deployment \"openshift-cluster-storage-operator/csi-snapshot-controller-operator\" (246 of 602)"
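
The log excerpt above repeats the same few errors many times. As a rough triage aid, the sketch below pulls the distinct blocking errors out of installer log lines. It assumes the `level=... msg="..."` layout seen in this excerpt, with `\n` and `\"` escaped inside `msg`. The names `parse_line`, `blocking_errors`, and `SAMPLE_LINES` are hypothetical helpers for illustration and are not part of the installer.

```python
import re

# Matches the level and the quoted msg field; escaped characters (\" and \n) may appear inside msg.
LINE_RE = re.compile(r'level=(?P<level>\w+) msg="(?P<msg>(?:[^"\\]|\\.)*)"')


def parse_line(line):
    """Return (level, message) for one log line, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    # Undo the escaping used in the log output.
    msg = m.group("msg").replace("\\n", "\n").replace('\\"', '"')
    return m.group("level"), msg


def blocking_errors(lines):
    """Collect distinct '* ...' bullet items and fatal messages, in first-seen order."""
    seen = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        level, msg = parsed
        items = [part[2:] for part in msg.split("\n") if part.startswith("* ")]
        if level == "fatal":
            items.append(msg)
        for item in items:
            if item not in seen:
                seen.append(item)
    return seen


# Two lines copied from the excerpt above, split across string pieces for readability.
SAMPLE_LINES = [
    (
        r'''time="2020-09-10T21:26:39Z" level=debug msg="Still waiting for the cluster to initialize: '''
        r'''Multiple errors are preventing progress:\n* Cluster operator authentication is reporting a failure: '''
        r'''WellKnownReadyControllerDegraded: got '404 Not Found' status while trying to GET the OAuth well-known '''
        r'''https://10.0.0.5:6443/.well-known/oauth-authorization-server endpoint data\n'''
        r'''* Cluster operator console is reporting a failure: RouteHealthDegraded: failed to GET route '''
        r'''(https://console-openshift-console.apps.tszegcp91020h.qe.gcp.devcluster.openshift.com/health): '''
        r'''Get \"https://console-openshift-console.apps.tszegcp91020h.qe.gcp.devcluster.openshift.com/health\": '''
        r'''context deadline exceeded (Client.Timeout exceeded while awaiting headers)\n'''
        r'''* Could not update deployment \"openshift-cluster-storage-operator/csi-snapshot-controller-operator\" '''
        r'''(246 of 602)"'''
    ),
    (
        r'''time="2020-09-10T21:35:38Z" level=fatal msg="failed to initialize the cluster: '''
        r'''Could not update deployment \"openshift-cluster-storage-operator/csi-snapshot-controller-operator\" '''
        r'''(246 of 602)"'''
    ),
]

errors = blocking_errors(SAMPLE_LINES)
for err in errors:
    print("-", err)
```

The same functions could be pointed at the attached install.log by passing an open file object instead of `SAMPLE_LINES`, assuming the file uses the same line layout.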

Expected results:
Install finishes

Additional info:
https://mastern-jenkins-csb-openshift-qe.cloud.paas.psi.redhat.com/job/Launch%20Environment%20Flexy/111404

Please assign this to the appropriate person or team.

Comment 1 To Hung Sze 2020-09-10 22:48:36 UTC
Created attachment 1714480
gather bootstrap log

Comment 2 To Hung Sze 2020-09-11 13:08:12 UTC
Created attachment 1714557
install.log

Comment 3 Etienne Simard 2020-09-11 17:12:19 UTC
This is an issue for all platforms (I confirmed it on Azure). This bug was also created: https://bugzilla.redhat.com/show_bug.cgi?id=1878030

Comment 4 Abhinav Dahiya 2020-09-11 17:19:40 UTC
> time="2020-09-10T21:29:54Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.6.0-0.nightly-2020-09-10-082657: 99% complete"
> time="2020-09-10T21:32:24Z" level=debug msg="Still waiting for the cluster to initialize: Could not update deployment \"openshift-cluster-storage-operator/csi-snapshot-controller-operator\" (246 of 602)"

I think the bug Etienne links in Comment 3 is the same issue, so I am marking this one as a duplicate.

*** This bug has been marked as a duplicate of bug 1878030 ***

Comment 5 To Hung Sze 2020-09-11 20:25:01 UTC
I rebuilt an IPI cluster with a proxy and can confirm that csi-snapshot-controller-operator is not present:

$ ./oc get co csi-snapshot-controller
NAME                      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
csi-snapshot-controller                                                  



$ ./oc -n openshift-cluster-storage-operator get all
NAME                                            READY   STATUS    RESTARTS   AGE
pod/cluster-storage-operator-69b8b69969-smrxv   1/1     Running   1          176m

NAME                                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/csi-snapshot-controller-operator-metrics   ClusterIP   172.30.132.103   <none>        443/TCP   176m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-storage-operator   1/1     1            1           176m

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-storage-operator-69b8b69969   1         1         1       176m
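
A check like this can also be done mechanically on saved output. The sketch below scans an `oc get all` text listing for a given resource. It is only an illustration: `resource_names` is a hypothetical helper, the listing is copied from the output above, and the `deployment.apps/` prefix for the expected deployment name is an assumption based on how the existing deployment is listed.

```python
def resource_names(listing):
    """Return the kind/name entries from the first column of an `oc get all` text listing."""
    names = set()
    for line in listing.splitlines():
        fields = line.split()
        # Resource rows start with "<kind>/<name>"; header rows start with NAME and are skipped.
        if fields and "/" in fields[0]:
            names.add(fields[0])
    return names


# Listing copied from the `oc -n openshift-cluster-storage-operator get all` output above.
LISTING = """\
NAME                                            READY   STATUS    RESTARTS   AGE
pod/cluster-storage-operator-69b8b69969-smrxv   1/1     Running   1          176m

NAME                                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/csi-snapshot-controller-operator-metrics   ClusterIP   172.30.132.103   <none>        443/TCP   176m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-storage-operator   1/1     1            1           176m

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-storage-operator-69b8b69969   1         1         1       176m
"""

names = resource_names(LISTING)
# Assumed name of the deployment that the installer reports it could not update.
wanted = "deployment.apps/csi-snapshot-controller-operator"
print("present" if wanted in names else "missing", wanted)
```

On a live cluster, asking for the deployment by name with `oc get deployment` in that namespace is more direct; parsing text output is mainly useful for listings saved in a bug report or log.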

Comment 6 To Hung Sze 2020-09-11 22:41:46 UTC
Actually, I take that back: I was looking at the wrong cluster above.

This is the right one:
$ ./oc -n openshift-cluster-storage-operator get all
NAME                                            READY   STATUS    RESTARTS   AGE
pod/cluster-storage-operator-69b8b69969-wf7rv   1/1     Running   1          106m

NAME                                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/csi-snapshot-controller-operator-metrics   ClusterIP   172.30.110.164   <none>        443/TCP   107m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-storage-operator   1/1     1            1           106m

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-storage-operator-69b8b69969   1         1         1       106m

The output above is from Flexy job 111612.
The install log shows the same error:
level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: got '404 Not Found' status while trying to GET the OAuth well-known https://10.0.0.5:6443/.well-known/oauth-authorization-server endpoint data\n* Could not update deployment \"openshift-cluster-storage-operator/csi-snapshot-controller-operator\" (246 of 602)"
level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: got '404 Not Found' status while trying to GET the OAuth well-known https://10.0.0.5:6443/.well-known/oauth-authorization-server endpoint data\n* Could not update deployment \"openshift-cluster-storage-operator/csi-snapshot-controller-operator\" (246 of 602)"
level=debug msg="Still waiting for the cluster to initialize: Working towards 4.6.0-0.nightly-2020-09-10-082657: 99% complete"
level=info msg="Cluster operator insights Disabled is False with AsExpected: "
level=fatal msg="failed to initialize the cluster: Working towards 4.6.0-0.nightly-2020-09-10-082657: 99% complete"
+ ret=1
+ need_recheck=0
+ '[' X1 '!=' X0 ']'
+ '[' Xno == Xyes ']'
+ '[' X0 '!=' X0 ']'
+ exit 1

