Description of problem:
Upgrade fails at a very early stage. It is inconsistent and tries to find the node's short name instead of the FQDN provided in the inventory:

2018-08-10 02:08:55,534 p=17663 u=root | TASK [openshift_manage_node : Set node schedulability] ************************************************************************************************************************************************************
2018-08-10 02:08:56,810 p=17663 u=root | FAILED - RETRYING: Set node schedulability (10 retries left).
2018-08-10 02:08:56,811 p=17663 u=root | FAILED - RETRYING: Set node schedulability (10 retries left).
2018-08-10 02:08:56,845 p=17663 u=root | FAILED - RETRYING: Set node schedulability (10 retries left).
2018-08-10 02:09:02,822 p=17663 u=root | FAILED - RETRYING: Set node schedulability (9 retries left).
2018-08-10 02:09:02,893 p=17663 u=root | FAILED - RETRYING: Set node schedulability (9 retries left).
2018-08-10 02:09:02,898 p=17663 u=root | FAILED - RETRYING: Set node schedulability (9 retries left).
2018-08-10 02:09:08,810 p=17663 u=root | FAILED - RETRYING: Set node schedulability (8 retries left).
2018-08-10 02:09:08,914 p=17663 u=root | FAILED - RETRYING: Set node schedulability (8 retries left).
2018-08-10 02:09:08,927 p=17663 u=root | FAILED - RETRYING: Set node schedulability (8 retries left).
2018-08-10 02:09:14,830 p=17663 u=root | FAILED - RETRYING: Set node schedulability (7 retries left).
2018-08-10 02:09:14,947 p=17663 u=root | FAILED - RETRYING: Set node schedulability (7 retries left).
2018-08-10 02:09:14,950 p=17663 u=root | FAILED - RETRYING: Set node schedulability (7 retries left).
2018-08-10 02:09:20,849 p=17663 u=root | FAILED - RETRYING: Set node schedulability (6 retries left).
2018-08-10 02:09:20,984 p=17663 u=root | FAILED - RETRYING: Set node schedulability (6 retries left).
2018-08-10 02:09:21,016 p=17663 u=root | FAILED - RETRYING: Set node schedulability (6 retries left).
2018-08-10 02:09:26,843 p=17663 u=root | FAILED - RETRYING: Set node schedulability (5 retries left).
2018-08-10 02:09:26,970 p=17663 u=root | FAILED - RETRYING: Set node schedulability (5 retries left).
2018-08-10 02:09:27,035 p=17663 u=root | FAILED - RETRYING: Set node schedulability (5 retries left).
2018-08-10 02:09:32,838 p=17663 u=root | FAILED - RETRYING: Set node schedulability (4 retries left).
2018-08-10 02:09:32,987 p=17663 u=root | FAILED - RETRYING: Set node schedulability (4 retries left).
2018-08-10 02:09:33,074 p=17663 u=root | FAILED - RETRYING: Set node schedulability (4 retries left).
2018-08-10 02:09:38,835 p=17663 u=root | FAILED - RETRYING: Set node schedulability (3 retries left).
2018-08-10 02:09:39,010 p=17663 u=root | FAILED - RETRYING: Set node schedulability (3 retries left).
2018-08-10 02:09:39,090 p=17663 u=root | FAILED - RETRYING: Set node schedulability (3 retries left).
2018-08-10 02:09:44,860 p=17663 u=root | FAILED - RETRYING: Set node schedulability (2 retries left).
2018-08-10 02:09:45,016 p=17663 u=root | FAILED - RETRYING: Set node schedulability (2 retries left).
2018-08-10 02:09:45,115 p=17663 u=root | FAILED - RETRYING: Set node schedulability (2 retries left).
2018-08-10 02:09:50,881 p=17663 u=root | FAILED - RETRYING: Set node schedulability (1 retries left).
2018-08-10 02:09:51,067 p=17663 u=root | FAILED - RETRYING: Set node schedulability (1 retries left).
2018-08-10 02:09:51,157 p=17663 u=root | FAILED - RETRYING: Set node schedulability (1 retries left).
2018-08-10 02:09:56,901 p=17663 u=root | fatal: [m001.example.com -> m001.example.com]: FAILED! => {"attempts": 10, "changed": false, "failed": true, "msg": {"results": [{"cmd": "/usr/bin/oc get node m001 -o json", "results": [{}], "returncode": 1, "stderr": "Error from server (NotFound): nodes \"m001\" not found\n", "stdout": ""}], "returncode": 1}}

Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Fails at node schedulability in any number of ansible runs.

Please include the entire output from the last TASK line through the end of output if an error is generated.

Expected results:
It should pass this.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag.
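A quick way to confirm this name mismatch on an affected cluster before re-running the playbook (a minimal diagnostic sketch; the node name m001 is taken from the log above):

    # Names the nodes are actually registered under in the cluster
    oc get nodes -o name

    # Short hostname vs. FQDN on the affected host
    hostname
    hostname -f

If `oc get nodes` lists FQDNs but `hostname` returns the short name (or the reverse), the "Set node schedulability" task looks up a node object that does not exist, which produces the NotFound error shown above.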
*** Bug 1614625 has been marked as a duplicate of this bug. ***
There are actually two scenarios here.

If they had previously set openshift_hostname, they were effectively setting a configuration item that no longer fits the model of specifying configuration at the host-group level. We will be re-introducing the ability to specify this value as an override, which will remain available until they upgrade to 4.0. When upgrading to 4.0 they will need to go through a process to ensure that their nodeName matches the output of `hostname`. That migration process is yet to be defined. The ability to override this value for clean 3.10+ installs will not be re-introduced.

If they have not previously set openshift_hostname, there is a similar situation: in 3.9, openshift-ansible used `hostname -f` to set nodeName in the config. Since that config file value is no longer valid, we needed to align openshift-ansible with the kubelet, which uses `hostname` rather than `hostname -f`. For this scenario the easiest solution is to set the host's hostname to the FQDN, e.g. `hostnamectl set-hostname ose3-master.example.com`. Since this may affect other items running on the host, please validate this change in a test environment to minimize risk. This workaround should work with the currently shipped version of openshift-ansible. If for some reason they cannot update their hostname value, the override from the first scenario can also be used once it becomes available; a sketch of both options follows below.

We're working on validating the hostname override work now. We do not have a definitive timeline for when that will be available in a 3.10 errata.
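To make the hostname workaround concrete, a minimal sketch (ose3-master.example.com is the example FQDN from above; validate in a test environment first, since other services on the host may depend on the current hostname):

    # Align the host's hostname with the FQDN used in the inventory
    hostnamectl set-hostname ose3-master.example.com

    # Both should now report the name the node is registered under
    hostname
    hostname -f

For hosts that cannot change their hostname, the override would be set per host in the inventory once it ships; the variable name openshift_kubelet_name_override is taken from the verification comment further down, but treat this placement and value as an assumption until the errata documentation is final:

    # Hypothetical inventory entry; the value would be the name the node is
    # already registered under in `oc get nodes`
    [masters]
    m001.example.com openshift_kubelet_name_override=m001.example.com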
https://github.com/openshift/openshift-ansible/pull/10356 implements the changes described in comment #25 on the release-3.11 branch.
This appears to be a regression introduced by disabling openshift_hostname in v3.10. Background is available in bz1613765, bz1572859, and bz1566455. Going through PR 10356, the change may affect install, upgrade, node scale-up, AWS scale groups, and glusterfs. Some verification scenarios from dev are listed in https://gist.github.com/michaelgugino/c961476d8be7d160a5e53fe9a9734051. This fix should be available in both v3.10 and v3.11; the v3.10 bug is tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1638521.
Recording the related 3.11 fix PR here: https://github.com/openshift/openshift-ansible/pull/10356
PR merged for 3.11: https://github.com/openshift/openshift-ansible/pull/10447
Verified on openshift-ansible-3.11.31-1.git.0.d4b5614.el7.noarch
Verified on openshift-ansible-3.11.31-1.git.0.d4b5614.el7.noarch.rpm

Scenario 3 (PASSED)

1) Set openshift_kubelet_name_override and get:

TASK [Fail when openshift_kubelet_name_override is defined] ********************
fatal: [host-xxxxx.redhat.com]: FAILED! => {"changed": false, "msg": "openshift_kubelet_name_override Cannot be defined for new hosts"}
	to retry, use: --limit @~/playbooks/openshift-node/scaleup.retry

2) Remove openshift_kubelet_name_override, then run the play again:

Scale-up succeeds; checked with:

a) oc new-app centos/ruby-25-centos7~https://github.com/sclorg/ruby-ex.git -l appnew=new_node

b) # oc get pod
NAME              READY     STATUS      RESTARTS   AGE
ruby-ex-1-build   0/1       Completed   0          2m
ruby-ex-1-xlbm2   1/1       Running     0          1m

c) # oc get pod ruby-ex-1-xlbm2 -o yaml | grep -i node
   appnew: new_node
   nodeName: host-172-16-122-68
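For reference, a sketch of the two scale-up inventory states exercised in this scenario (the new_nodes group is the standard openshift-ansible scale-up group; the host name is illustrative rather than the exact host used in the verification):

    # 1) Fails the "Fail when openshift_kubelet_name_override is defined" check
    [new_nodes]
    host-172-16-122-68.example.com openshift_kubelet_name_override=host-172-16-122-68

    # 2) Passes once the override is removed from the new host
    [new_nodes]
    host-172-16-122-68.example.com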
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0024