Bug 1573826

Summary: Cannot overwrite masterPublicURL by setting openshift_public_hostname
Product: OpenShift Container Platform
Reporter: Wenqi He <wehe>
Component: Installer
Assignee: Russell Teague <rteague>
Status: CLOSED ERRATA
QA Contact: Johnny Liu <jialiu>
Severity: high
Docs Contact:
Priority: urgent
Version: 3.10.0
CC: akostadi, aos-bugs, hongkliu, jokerman, mmccomas, rteague, wehe, weshi, wmeng, xxia
Target Milestone: ---
Keywords: TestBlocker
Target Release: 3.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The openshift.fact file generated during installation was inadvertently being deleted partway through the installation. This caused failures in setting other facts later in the installation on certain cloud providers.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-07-30 19:14:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 1 Wenqi He 2018-05-02 10:53:31 UTC
I found that this affects more than master-config.yaml: other config files and certificates also have the wrong public hostname. I'd like to add the TestBlocker keyword since this is blocking Azure-related testing. Thanks.

Comment 2 Scott Dodson 2018-05-02 12:30:05 UTC
Has this worked in the past? I'd expect openshift_master_cluster_public_hostname to be the variable you'd set and it'd be set in [OSEv3:vars] rather than on the master host only.

Comment 3 Wenkai Shi 2018-05-03 03:32:21 UTC
(In reply to Scott Dodson from comment #2)
> Has this worked in the past? I'd expect
> openshift_master_cluster_public_hostname to be the variable you'd set and
> it'd be set in [OSEv3:vars] rather than on the master host only.

This worked in the past, so I believe it is a regression.
Do you mean we should set the openshift_master_cluster_public_hostname parameter for both HA and non-HA installations?

For an HA installation, openshift_master_cluster_public_hostname is set to the lb host.
For a non-HA installation, openshift_master_cluster_public_hostname is set to the master host.

Comment 4 Johnny Liu 2018-05-03 07:37:20 UTC
Installation on GCE also hits a similar issue; this worked well in the past.

[masters]
qe-jialiu-master-etcd-1.0502-3cg.qe.rhcloud.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave4/workspace/Launch Environment Flexy/private/config/keys/libra.pem" openshift_public_hostname=qe-jialiu-master-etcd-1.0502-3cg.qe.rhcloud.com openshift_hostname=qe-jialiu-master-etcd-1

# cat /etc/origin/master/master-config.yaml | grep "qe-jialiu-master-etcd-1.0502-3cg.qe.rhcloud.com"
<empty>

# ping qe-jialiu-master-etcd-1.0502-3cg.qe.rhcloud.com
PING qe-jialiu-master-etcd-1.0502-3cg.qe.rhcloud.com (35.232.68.70) 56(84) bytes of data.
64 bytes from 70.68.232.35.bc.googleusercontent.com (35.232.68.70): icmp_seq=1 ttl=76 time=0.996 ms
64 bytes from 70.68.232.35.bc.googleusercontent.com (35.232.68.70): icmp_seq=2 ttl=76 time=0.495 ms


# cat /etc/origin/master/master-config.yaml | grep "35.232.68.70"
masterPublicURL: https://35.232.68.70:8443
  assetPublicURL: https://35.232.68.70:8443/console/
  masterPublicURL: https://35.232.68.70:8443

All occurrences of the public hostname are replaced by its IP.
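
The grep checks above can be approximated in a few lines of Python. This is only a sketch: the helper name public_urls_missing_hostname is made up for illustration and is not part of openshift-ansible, and it scans the file as plain text rather than parsing the YAML. The sample text is the grep output quoted above.

```python
import re
from typing import List

# Keys from the grep output quoted above that carry the public URL.
PUBLIC_URL_KEYS = ("masterPublicURL", "assetPublicURL")


def public_urls_missing_hostname(config_text: str, expected_hostname: str) -> List[str]:
    """Return the public-URL lines whose value does not contain expected_hostname.

    Hypothetical helper: a plain-text scan, not a YAML parser.
    """
    pattern = re.compile(r"^\s*(?:%s):\s*(\S+)" % "|".join(PUBLIC_URL_KEYS))
    offending = []
    for line in config_text.splitlines():
        match = pattern.match(line)
        if match and expected_hostname not in match.group(1):
            offending.append(line.strip())
    return offending


# Sample input: the three lines grep returned for the IP address.
sample = """\
masterPublicURL: https://35.232.68.70:8443
  assetPublicURL: https://35.232.68.70:8443/console/
  masterPublicURL: https://35.232.68.70:8443
"""

for bad_line in public_urls_missing_hostname(
    sample, "qe-jialiu-master-etcd-1.0502-3cg.qe.rhcloud.com"
):
    print(bad_line)
```

On a real master, the text would be read from /etc/origin/master/master-config.yaml instead of the inline sample. An empty result would mean every public URL contains the expected hostname.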

Comment 5 Russell Teague 2018-05-03 19:26:01 UTC
Unable to reproduce this.  Please attach an inventory.

Comment 6 Russell Teague 2018-05-03 20:11:33 UTC
Additional investigative notes:

masterPublicURL in master.yaml.v1.j2
   is set by openshift.master.public_api_url
   which is set by openshift_master_public_api_url  | default(None)

If openshift_master_public_api_url is not set,
   the openshift_facts module will set a default value for openshift.master.public_api_url
   if openshift_master_cluster_public_hostname is set it will use that value to create the URL
   otherwise, it will use openshift_public_hostname

Variable order of preference:
   openshift_master_public_api_url
      openshift_master_cluster_public_hostname (default URL created from this)
         openshift_public_hostname             (default URL created from this)
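
The order of preference above can be sketched roughly as follows. This illustrates the notes and is not the actual openshift_facts code: the function name resolve_public_api_url is hypothetical, and the https scheme and port 8443 are assumptions taken from the masterPublicURL values quoted in comment 4.

```python
from typing import Mapping, Optional

# Assumption: port taken from the masterPublicURL values quoted in comment 4.
DEFAULT_API_PORT = 8443

# Hostname variables in descending order of preference, per the notes above.
HOSTNAME_VARS = (
    "openshift_master_cluster_public_hostname",
    "openshift_public_hostname",
)


def resolve_public_api_url(inventory_vars: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper that mirrors the documented order of preference."""
    # 1. An explicitly set URL always wins.
    explicit_url = inventory_vars.get("openshift_master_public_api_url")
    if explicit_url:
        return explicit_url
    # 2. Otherwise build a default URL from the first hostname variable set.
    for name in HOSTNAME_VARS:
        hostname = inventory_vars.get(name)
        if hostname:
            return "https://%s:%d" % (hostname, DEFAULT_API_PORT)
    # 3. Nothing set: the real module would fall back to detected facts.
    return None


print(resolve_public_api_url({"openshift_public_hostname": "master.example.com"}))
```

With only openshift_public_hostname set, the sketch builds the URL from that hostname, which is the behavior the reporters expected and did not get on GCE and Azure.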

Comment 7 Wenqi He 2018-05-04 03:27:23 UTC
As Scott suggested, after adding
openshift_master_cluster_public_hostname=storage-master-xxx.azure.com to [OSEv3:vars], the issue no longer reproduces and masterPublicURL is set correctly.

But from the logic here: https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_facts/library/openshift_facts.py#L381-#L384

I have not tried using openshift_public_hostname instead of openshift_master_cluster_public_hostname in [OSEv3:vars]; it seems it should be openshift_common_public_hostname. But setting openshift_public_hostname only in [masters] definitely does not work. Thanks.

Comment 8 Wenqi He 2018-05-04 03:28:57 UTC
I'd also like to remove the TestBlocker keyword for now.

Comment 9 Russell Teague 2018-05-07 11:43:53 UTC
I am unable to reproduce this issue.  Please attach a full inventory which causes the reported issue.  Thanks!

Comment 10 Wenqi He 2018-05-08 01:45:25 UTC
(In reply to Russell Teague from comment #9)
> I am unable to reproduce this issue.  Please attach a full inventory which
> causes the reported issue.  Thanks!

Can you take a look at my comment #7? After I added openshift_master_cluster_public_hostname to [OSEv3:vars], I am unable to reproduce this either. But if we only add openshift_public_hostname to [masters], it reproduces (as in my description). I only have an inventory file for the Azure installation; tell me if you still need it.

So far I am not sure whether any documentation mentions this change, so I cannot tell whether this is a regression or whether we need to notify users of the change.

In case it is still not clear, reproduce with:

No openshift_master_cluster_public_hostname in [OSEv3:vars]
Only openshift_public_hostname in [masters]

[masters]
storage-master-xxx.xxx.xxx.azure.com openshift_public_hostname=storage-master-xxx.xxx.xxx.azure.com openshift_hostname=storage-master-xxx
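
To check whether a given inventory matches the reproducing setup described above, something like the following could be used. It is a rough sketch under stated assumptions: parse_inventory and matches_reported_setup are hypothetical names, and the parser only understands the simple INI layout shown in this report (section headers, host lines with key=value pairs, and name=value lines in ":vars" sections). It is not a substitute for Ansible's own inventory parser.

```python
import shlex
from typing import Dict

Inventory = Dict[str, Dict[str, Dict[str, str]]]


def parse_inventory(text: str) -> Inventory:
    """Minimal INI-inventory reader returning {section: {entry: {var: value}}}.

    Variables of a ":vars" section are stored under the pseudo-entry "".
    """
    sections: Inventory = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, {})
            continue
        if current is None:
            continue  # ungrouped hosts are ignored in this sketch
        if current.endswith(":vars"):
            key, _, value = line.partition("=")
            sections[current].setdefault("", {})[key.strip()] = value.strip()
            continue
        # Host line: first token is the host, the rest are key=value pairs.
        tokens = shlex.split(line)
        host_vars = sections[current].setdefault(tokens[0], {})
        for token in tokens[1:]:
            key, _, value = token.partition("=")
            host_vars[key] = value
    return sections


def matches_reported_setup(text: str) -> bool:
    """True if the cluster hostname is unset and a master sets openshift_public_hostname."""
    inventory = parse_inventory(text)
    group_vars = inventory.get("OSEv3:vars", {}).get("", {})
    masters = inventory.get("masters", {})
    return "openshift_master_cluster_public_hostname" not in group_vars and any(
        "openshift_public_hostname" in host_vars for host_vars in masters.values()
    )


reported = """\
[masters]
storage-master-xxx.xxx.xxx.azure.com openshift_public_hostname=storage-master-xxx.xxx.xxx.azure.com openshift_hostname=storage-master-xxx
"""
print(matches_reported_setup(reported))
```

shlex.split is used so that quoted values containing spaces, such as the ansible_ssh_private_key_file path in comment 4, stay in one token.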

Comment 12 Wenkai Shi 2018-05-09 03:06:31 UTC
This issue is IaaS-related; it does not appear on AWS.

According to comment 4, on GCE the masterPublicURL became the public IP address.
According to comment 10, on Azure the masterPublicURL became the internal hostname.

According to comment 6, all of them should be overwritten by openshift_public_hostname if neither openshift_master_public_api_url nor openshift_master_cluster_public_hostname is defined.

Comment 13 Scott Dodson 2018-05-09 19:32:38 UTC
Wenkai,

We're concerned that some other variable may be affecting the behavior here; the variable inheritance is quite complex. Can you please attach a complete inventory and, ideally, logs from each permutation where you've been able to reproduce this? We have not been able to reproduce this with only the information provided thus far.

Thanks,
Scott

Comment 20 Wenkai Shi 2018-05-17 09:02:54 UTC
This is blocking Azure testing.

Comment 21 Russell Teague 2018-05-17 11:50:51 UTC
I've been looking into this bug but have not yet found the issue. I'm still working on reproducing it on GCE, as I don't have access to Azure. In looking through the attached Azure logs, I found that at "TASK [openshift_cloud_provider : Set cloud provider facts]" openshift.common.public_hostname was set correctly, as defined in the inventory. However, at "TASK [openshift_master_facts : Set master facts]" the openshift.common.public_hostname fact had only the hostname portion, without the domain. A play between these two tasks is causing the facts to be reset. In the inventory I noticed that the same host was defined in the [nfs] group but did not have the same hostvars defined. The NFS playbook runs between the two tasks above, so it could be related. As stated above, this works fine on AWS. I've been digging further into the code unique to GCE/Azure in an attempt to find differences that would cause the issue.

Comment 22 Scott Dodson 2018-05-17 13:46:11 UTC
https://github.com/openshift/openshift-ansible/pull/8394 may have fixed this. Can we retest with that change in the next build? Moving to MODIFIED so this goes ON_QA with the next build.

Comment 23 Russell Teague 2018-05-21 18:49:19 UTC
Possible fix in openshift-ansible-3.10.0-0.48.0 and later.

Comment 24 Russell Teague 2018-05-21 18:51:33 UTC
*** Bug 1578539 has been marked as a duplicate of this bug. ***

Comment 25 Wenkai Shi 2018-05-22 06:13:21 UTC
Verified with version openshift-ansible-3.10.0-0.50.0.git.0.bd68ade.el7: the masterPublicURL can be overwritten by setting openshift_public_hostname.

Comment 28 errata-xmlrpc 2018-07-30 19:14:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816