Bug 1659117

Summary: openshift_hostname is ignored and doesn't provoke a fatal error
Product: OpenShift Container Platform
Reporter: Juan Luis de Sousa-Valadas <jdesousa>
Component: Installer
Assignee: Scott Dodson <sdodson>
Status: CLOSED WONTFIX
QA Contact: Johnny Liu <jialiu>
Severity: high
Docs Contact:
Priority: high
Version: 3.11.0
CC: aos-bugs, jdesousa, jokerman, mgugino, mmccomas, sauchter, travi
Target Milestone: ---
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-03 18:29:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Juan Luis de Sousa-Valadas 2018-12-13 15:52:17 UTC
Description of problem:
Between z-streams of 3.11, openshift_hostname started being ignored. The playbook doesn't fail immediately.

I'm setting high severity and high priority because the bug is identified and has a huge potential impact.


Version-Release number of the following components:
Customer reproduced this on 3.10.73-1 but 3.10.87-1 should also be affected
openshift_hostname used to work on openshift-ansible-3.10.47-1

How reproducible:
Always

Steps to Reproduce:
1. Specify a hostname using openshift_hostname instead of openshift_kubelet_name_override
2. Verify openshift_facts
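
For reference, a minimal inventory fragment exercising the deprecated variable might look like this (host names and values are hypothetical):

```yaml
# Hypothetical inventory fragment (Ansible YAML inventory format).
# openshift_hostname is the deprecated variable from this report;
# openshift_kubelet_name_override is its replacement per commit 5ce5800906.
all:
  children:
    nodes:
      hosts:
        node1.example.com:
          openshift_hostname: node1.internal.example.com   # silently ignored
```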

Actual results:
openshift_hostname is ignored, the playbook keeps running with a wrong value.

This has a huge impact because it can break certificates and make several components connect to unexpected endpoints.

Expected results:
openshift_hostname is either honored with a deprecation warning, or causes the playbook to fail if it differs from openshift_kubelet_name_override.
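
A guard along the lines of that expected behavior could be sketched as an Ansible task (an illustration only, not the shipped check):

```yaml
# Sketch: fail early when the deprecated variable disagrees with its
# replacement, instead of silently ignoring it.
- name: Fail when openshift_hostname conflicts with openshift_kubelet_name_override
  fail:
    msg: >
      openshift_hostname is deprecated; set openshift_kubelet_name_override
      to the same value or remove openshift_hostname from the inventory.
  when:
  - openshift_hostname is defined
  - openshift_hostname != (openshift_kubelet_name_override | default(openshift_hostname))
```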

Additional info:


$ git diff 95bc2d2  playbooks/init/validate_hostnames.yml
diff --git a/playbooks/init/validate_hostnames.yml b/playbooks/init/validate_hostnames.yml
index b37e6fec4..ca280684b 100644
--- a/playbooks/init/validate_hostnames.yml
+++ b/playbooks/init/validate_hostnames.yml
@@ -10,19 +10,20 @@
     changed_when: false
     failed_when: false
 
-  - name: Validate openshift_hostname when defined
+  - name: Validate openshift_kubelet_name_override when defined
     fail:
       msg: >
         The hostname {{ openshift.common.hostname }} for {{ ansible_nodename }}
         doesn't resolve to an IP address owned by this host. Please set
-        openshift_hostname variable to a hostname that when resolved on the host
+        openshift_kubelet_name_override variable to a hostname that when resolved on the host
         in question resolves to an IP address matching an interface on this host.
         This will ensure proper functionality of OpenShift networking features.
-        Inventory setting: openshift_hostname={{ openshift_hostname | default ('undefined') }}
+        Inventory setting: openshift_kubelet_name_override={{ openshift_kubelet_name_override | default ('undefined') }}
         This check can be overridden by setting openshift_hostname_check=false in
         the inventory.
         See https://docs.openshift.org/latest/install_config/install/advanced_install.html#configuring-host-variables
     when:
+    - openshift_kubelet_name_override is defined
     - lookupip.stdout != '127.0.0.1'
     - lookupip.stdout not in ansible_all_ipv4_addresses
     - openshift_hostname_check | default(true) | bool

$ git blame playbooks/init/validate_hostnames.yml | grep kubelet_name_override
5ce5800906 playbooks/init/validate_hostnames.yml                     (Michael Gugino      2018-10-05 10:22:35 -0400 13)   - name: Validate openshift_kubelet_name_override when defined
5ce5800906 playbooks/init/validate_hostnames.yml                     (Michael Gugino      2018-10-05 10:22:35 -0400 18)         openshift_kubelet_name_override variable to a hostname that when resolved on the host
5ce5800906 playbooks/init/validate_hostnames.yml                     (Michael Gugino      2018-10-05 10:22:35 -0400 21)         Inventory setting: openshift_kubelet_name_override={{ openshift_kubelet_name_override | default ('undefined') }}
5ce5800906 playbooks/init/validate_hostnames.yml                     (Michael Gugino      2018-10-05 10:22:35 -0400 26)     - openshift_kubelet_name_override is defined

$ git show 5ce5800906 --summary
commit 5ce5800906255a2a6bf940a17908be59d9861de7
Author: Michael Gugino <mgugino>
Date:   Fri Oct 5 10:22:35 2018 -0400

    Fail on openshift_hostname defined; add openshift_kubelet_name_override
    
    Adding openshift_kubelet_name_override as a stand-in for
    various places we will need to account for possible hostname
    overrides.
    
    (cherry picked from commit 1faee0942dec05b6f652669ad6cfced986a0cbc9)
    (cherry picked from commit 8d3509838c7ecc2bafafa7f7815b9964bf08cdda)

Comment 2 Scott Dodson 2018-12-17 19:25:32 UTC
Can you please clarify what is meant by step #2? Which playbook is actually executed that produces unexpected results?

> Steps to Reproduce:
> 1. Specify a hostname using openshift_hostname instead of
> openshift_kubelet_name_override
> 2. Verify openshift_facts

Comment 3 Juan Luis de Sousa-Valadas 2018-12-24 08:43:53 UTC
Run the openshift_facts.yml playbook and check ansible_facts.openshift.common.hostname. It's not honored.
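
The check described here can be sketched as a small play run after openshift_facts.yml (the fact path is taken from this report; the play itself is hypothetical):

```yaml
# Sketch: print the hostname the installer computed, so it can be compared
# against the openshift_hostname value set in the inventory.
- hosts: nodes
  gather_facts: false
  tasks:
  - debug:
      msg: "computed hostname: {{ openshift.common.hostname }}"
```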

Comment 4 Scott Dodson 2019-01-03 18:29:24 UTC
The prerequisites.yml and deploy_cluster.yml playbooks should treat that as a fatal condition. The purpose of the playbook you're running is to calculate the default values and it seems to be doing that as expected.

Comment 5 Juan Luis de Sousa-Valadas 2019-01-04 08:48:51 UTC
Scott, the problem isn't installation but upgrading. During an upgrade neither prerequisites.yml nor deploy_cluster.yml has to be executed. Therefore a user who is running 3.10.73-1 and decides to upgrade to the latest z-stream may hit this problem.

This is a breaking change in the middle of a Z-stream, can we please reconsider if that is the behavior we want?

Comment 6 Michael Gugino 2019-01-04 14:43:18 UTC
(In reply to Juan Luis de Sousa-Valadas from comment #5)
> Scott, the problem isn't installation but upgrading. During an upgrade
> neither prerequisites.yml nor deploy_cluster.yml has to be executed.
> Therefore a user who is running 3.10.73-1 and decides to upgrade to the
> latest Z-stream may have this problem.
> 
> This is a breaking change in the middle of a Z-stream, can we please
> reconsider if that is the behavior we want?

This would be handled by the sanity_checks module, which runs during installs and upgrades. Please provide logs, the inventory, and appropriate version information. I don't believe this is currently an issue.