Bug 1745907 - [upi-gcp] can not finish installation due to missing dns entry and lb for *.apps
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.2.0
Assignee: Jeremiah Stuever
QA Contact: liujia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-27 08:10 UTC by liujia
Modified: 2019-10-16 06:38 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:37:54 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Github openshift installer pull 2289 None None None 2020-03-18 07:19:33 UTC
Red Hat Product Errata RHBA-2019:2922 None None None 2019-10-16 06:38:02 UTC

Description liujia 2019-08-27 08:10:50 UTC
Description of problem:
Followed https://github.com/openshift/installer/blob/master/docs/user/gcp/install_upi.md to run a UPI installation on GCP.

# ./openshift-install wait-for install-complete --dir test
INFO Waiting up to 30m0s for the cluster at https://api.jliu.origin-gce.dev.openshift.com:6443 to initialize... 
FATAL failed to initialize the cluster: Working towards 4.2.0-0.nightly-2019-08-21-235427: 100% complete 

# ./oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                                                 Unknown     Unknown       True       64m
cloud-credential                           4.2.0-0.nightly-2019-08-21-235427   True        False         False      128m
cluster-autoscaler                         4.2.0-0.nightly-2019-08-21-235427   True        False         False      61m
console                                    4.2.0-0.nightly-2019-08-21-235427   False       True          False      62m
...

Some routes under *.apps were not reachable:
# ./oc logs pod/console-798fcb997b-gswlm
2019/08/27 03:47:06 cmd/main: cookies are secure!
2019/08/27 03:47:06 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.jliu.origin-gce.dev.openshift.com/oauth/token failed: Head https://oauth-openshift.apps.jliu.origin-gce.dev.openshift.com: dial tcp: lookup oauth-openshift.apps.jliu.origin-gce.dev.openshift.com on 172.30.0.10:53: no such host

There are no *.apps DNS records:
# gcloud dns record-sets list --zone=jliu-68n2v-private-zone
NAME                                                      TYPE  TTL    DATA
jliu.origin-gce.dev.openshift.com.                        NS    21600  ns-gcp-private.googledomains.com.
jliu.origin-gce.dev.openshift.com.                        SOA   21600  ns-gcp-private.googledomains.com. cloud-dns-hostmaster.google.com. 3 21600 3600 259200 300
_etcd-server-ssl._tcp.jliu.origin-gce.dev.openshift.com.  SRV   60     0 10 2380 etcd-0.jliu.origin-gce.dev.openshift.com.,0 10 2380 etcd-1.jliu.origin-gce.dev.openshift.com.,0 10 2380 etcd-2.jliu.origin-gce.dev.openshift.com.
api.jliu.origin-gce.dev.openshift.com.                    A     60     35.237.235.245
api-int.jliu.origin-gce.dev.openshift.com.                A     60     35.237.235.245
etcd-0.jliu.origin-gce.dev.openshift.com.                 A     60     10.0.0.5
etcd-1.jliu.origin-gce.dev.openshift.com.                 A     60     10.0.0.4
etcd-2.jliu.origin-gce.dev.openshift.com.                 A     60     10.0.0.3

A record like the following needs to be added to both the base domain zone and the private zone:

*.apps.jliu.origin-gce.dev.openshift.com.                 A     300    34.74.51.20

Since the worker nodes have no public IP, a load balancer with a public IP also needs to be created in the step [Create DNS entries and load balancers].
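
A minimal sketch of how the check above could be automated: it scans the text output of `gcloud dns record-sets list` for a wildcard `*.apps` A record. The function name `has_wildcard_apps_record` and its parameters are hypothetical, and the code assumes the whitespace-separated NAME/TYPE/TTL/DATA layout shown in the listing above.

```python
def has_wildcard_apps_record(listing: str, cluster_domain: str) -> bool:
    """Return True if the listing has an A record for *.apps.<cluster_domain>.

    `listing` is assumed to be the plain-text table printed by
    `gcloud dns record-sets list` (columns: NAME TYPE TTL DATA).
    """
    # Record names in the listing are fully qualified (trailing dot).
    wanted = f"*.apps.{cluster_domain.rstrip('.')}."
    for line in listing.splitlines():
        fields = line.split()
        # A usable row has at least NAME, TYPE, TTL and one DATA value.
        if len(fields) >= 4 and fields[0] == wanted and fields[1] == "A":
            return True
    return False
```

Run against the private-zone listing above with the cluster domain jliu.origin-gce.dev.openshift.com, this would report the record as missing; the same check would apply to the base domain zone.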


Version-Release number of the following components:
 
How reproducible:
always

Steps to Reproduce:
1. 
2.
3.

Actual results:
Installation fails.

Expected results:
Installation succeeds.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Abhinav Dahiya 2019-08-27 16:27:15 UTC
This seems similar to what's being asked in https://bugzilla.redhat.com/show_bug.cgi?id=1715635

PR for the above one is https://github.com/openshift/installer/pull/2221

Comment 2 liujia 2019-08-29 07:26:40 UTC
I just tried the latest 4.2.0-0.nightly-2019-08-29-041601, and this time the ingress operator generated the LB and *.apps records successfully without extra configuration. So I'm not sure whether this is a bug in the ingress operator on 4.2.0-0.nightly-2019-08-21-235427 or just luck, as https://bugzilla.redhat.com/show_bug.cgi?id=1715635#c5 said.

Keeping the bug open to track whether extra steps are needed for the *.apps records in the UPI/GCP process. Since it's not blocking QE's testing now, I'm adjusting the severity and priority.

Comment 3 Jeremiah Stuever 2019-08-29 16:19:09 UTC
Perhaps because of https://github.com/openshift/cluster-ingress-operator/pull/286

Comment 4 Jeremiah Stuever 2019-08-29 18:15:17 UTC
I just tried with latest from installer (master branch) and I still see the issue. The ingress-router load balancers are not getting created, probably because there are no compute nodes. The DNS entries are not created until the load balancers exist. Therefore, they rely on compute nodes to exist as well. Perhaps the "luck" is having added compute nodes to the cluster?

I have a mostly ready PR to clarify how to add the compute nodes as well as how to add the apps DNS records manually. Will be posted as soon as I get it cleaned up.
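
As a rough illustration of the manual route, the sketch below builds the `gcloud dns record-sets transaction` commands that would add a wildcard `*.apps` A record to one managed zone. This is an assumption about what the manual step could look like, not the content of the pending PR. The helper name `wildcard_record_commands` is hypothetical, and `router_ip` is assumed to be the external address of the ingress router load balancer, which per the above only exists once compute nodes are present.

```python
import shlex


def wildcard_record_commands(zone, cluster_domain, router_ip, ttl=300):
    """Build gcloud argv lists that add *.apps.<cluster_domain> to `zone`.

    Nothing is executed here; the caller decides whether to print or run
    the commands. All argument values are supplied by the caller.
    """
    name = f"*.apps.{cluster_domain.rstrip('.')}."
    base = ["gcloud", "dns", "record-sets", "transaction"]
    return [
        base + ["start", "--zone", zone],
        base + ["add", router_ip, "--name", name,
                "--ttl", str(ttl), "--type", "A", "--zone", zone],
        base + ["execute", "--zone", zone],
    ]


if __name__ == "__main__":
    # Values taken from the description above, used purely as an example.
    for argv in wildcard_record_commands(
        "jliu-68n2v-private-zone",
        "jliu.origin-gce.dev.openshift.com",
        "34.74.51.20",
    ):
        print(shlex.join(argv))
```

The same three commands would be repeated for the base domain zone, since the description asks for the record in both zones.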

Comment 6 liujia 2019-09-09 06:38:38 UTC
Version:4.2.0-0.nightly-2019-09-08-180038

Followed the latest doc: https://github.com/openshift/installer/blob/master/docs/user/gcp/install_upi.md

Before creating the Ignition files, did https://github.com/openshift/installer/blob/master/docs/user/gcp/install_upi.md#remove-dns-zones-optional
After bootstrap completed, did https://github.com/openshift/installer/blob/master/docs/user/gcp/install_upi.md#add-the-ingress-dns-records-optional

# ./openshift-install wait-for install-complete
INFO Waiting up to 30m0s for the cluster at https://api.jliu-4353.qe.gcp.devcluster.openshift.com:6443 to initialize... 
INFO Waiting up to 10m0s for the openshift-console route to be created... 
INFO Install complete!                            
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/root/work/upi_gcp/20190909_25655/auth/kubeconfig' 
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.jliu-4353.qe.gcp.devcluster.openshift.com 
INFO Login to the console with user: kubeadmin, password: dLsAy-TLmX8-RtQZ5-WXawL

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-09-08-180038   True        False         4m56s   Cluster version is 4.2.0-0.nightly-2019-09-08-180038

Comment 7 errata-xmlrpc 2019-10-16 06:37:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

