Description of problem:
When the OpenStack instance internal name is different from openshift_public_hostname, the upgrade fails at the task 'Determine if node is currently scheduleable'. The upgrade playbook uses the OpenStack internal name (you can find it via 'curl 169.254.169.254/2009-04-04/meta-data//hostname'), while OpenShift uses openshift_public_hostname.

TASK [Determine if node is currently scheduleable] *****************************
fatal: [openshift-228.example.com -> openshift-228.example.com]: FAILED! => {
    "changed": false,
    "cmd": ["/usr/local/bin/oc", "get", "node", "qe-11329master-1", "-o", "json"],
    "delta": "0:00:00.410114",
    "end": "2017-02-06 02:24:57.030521",
    "failed": true,
    "rc": 1,
    "start": "2017-02-06 02:24:56.620407",
    "warnings": []
}

STDERR:
Error from server (NotFound): nodes "qe-11329master-1" not found

NO MORE HOSTS LEFT *************************************************************
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_5/upgrade.retry

Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.5.3-1.git.0.80c2436.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Launch instances with the names 'qe-11329master-1', 'qe-11329etcd-1' and 'qe-11329node-registry-router-1'.
2. Reset the instance names to different names: openshift-202.example.com, openshift-202.example.com, openshift-228.example.com (workaround for BZ#1367201).
3. Install OpenShift v3.4 with the cloud provider enabled and openshift_public_hostname specified.
4. Upgrade OpenShift to v3.5.

TASK [Determine if node is currently scheduleable] *****************************
fatal: [openshift-228.example.com -> openshift-228.example.com]: FAILED! => {
    "changed": false,
    "cmd": ["/usr/local/bin/oc", "get", "node", "qe-11329master-1", "-o", "json"],
    "delta": "0:00:00.410114",
    "end": "2017-02-06 02:24:57.030521",
    "failed": true,
    "rc": 1,
    "start": "2017-02-06 02:24:56.620407",
    "warnings": []
}

STDERR:
Error from server (NotFound): nodes "qe-11329master-1" not found

NO MORE HOSTS LEFT *************************************************************
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_5/upgrade.retry

PLAY RECAP *********************************************************************
localhost                  : ok=35   changed=0    unreachable=0    failed=0
openshift-202.example.com  : ok=109  changed=6    unreachable=0    failed=0
openshift-211.example.com  : ok=63   changed=2    unreachable=0    failed=0
openshift-228.example.com  : ok=171  changed=15   unreachable=0    failed=1

Expected results:
1. The upgrade playbook should use the same node name that OpenShift uses when determining whether the node is currently schedulable.
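To make the mismatch visible, a quick diagnostic play along these lines can be run against the cluster (a minimal sketch, not part of openshift-ansible; it assumes the standard `masters` inventory group and the /usr/local/bin/oc path shown in the failure output above):

```yaml
- hosts: masters
  gather_facts: false
  tasks:
    - name: Fetch the instance name reported by the OpenStack metadata service
      uri:
        url: http://169.254.169.254/2009-04-04/meta-data/hostname
        return_content: yes
      register: metadata_hostname

    - name: List the node names actually registered in OpenShift
      command: /usr/local/bin/oc get nodes -o name
      register: oc_nodes
      changed_when: false

    - name: Show both values so the mismatch is obvious
      debug:
        msg: "metadata hostname: {{ metadata_hostname.content | trim }}; registered nodes: {{ oc_nodes.stdout_lines }}"
```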
Instead of querying the metadata directly, I would expect this to work using openshift.common.hostname or openshift.node.nodename.
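For reference, a throwaway debug task dropped into the upgrade play (after openshift_facts has populated the openshift variable) would show what those two facts actually contain on an affected host. This is only an illustration, not something the playbooks ship:

```yaml
- name: Show the hostname-related facts the playbook could use
  debug:
    msg: >-
      openshift.common.hostname={{ openshift.common.hostname | default('undefined') }},
      openshift.node.nodename={{ openshift.node.nodename | default('undefined') }}
```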
Looking at it now
This will block testing on OpenStack when the cloud provider is enabled.
Currently, the affected piece of the play is in the following two files:

- playbooks/common/openshift-cluster/upgrades/upgrade_nodes.yml
- playbooks/common/openshift-cluster/upgrades/upgrade_control_plane.yml

as:

```yaml
- name: Mark node unschedulable
  oadm_manage_node:
    node: "{{ openshift.node.nodename | lower }}"
    schedulable: False
  delegate_to: "{{ groups.oo_first_master.0 }}"
  retries: 10
  delay: 5
  register: node_unschedulable
  until: node_unschedulable|succeeded
```

``openshift.node.nodename`` is used throughout these files in multiple places.
https://github.com/openshift/openshift-ansible/issues/3455
The problem here is that http://169.254.169.254/openstack/latest/meta_data.json no longer provides a valid hostname for a VM once the hostname is changed. I would suggest changing openshift.node.nodename to openshift.common.hostname once the "Set hostname" task in roles/openshift_common/tasks/main.yml has run, in the case where openshift[_public]_hostname is set. I am not sure which of the openshift[_public]_hostname variables is used to set openshift.common.hostname, but that should not be hard to determine from the Ansible code.
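If that suggestion were applied, the task quoted earlier from upgrade_nodes.yml / upgrade_control_plane.yml would look roughly like this (just a sketch of the proposed swap; whether openshift.common.hostname is correct for every cloud-provider setup is exactly what still needs to be verified):

```yaml
- name: Mark node unschedulable
  oadm_manage_node:
    # proposed change: use the hostname fact instead of the cloud-provider nodename
    node: "{{ openshift.common.hostname | lower }}"
    schedulable: False
  delegate_to: "{{ groups.oo_first_master.0 }}"
  retries: 10
  delay: 5
  register: node_unschedulable
  until: node_unschedulable|succeeded
```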
I found that if we modify the hostname, enable the cloud provider, and then run the upgrade playbook, the upgrade passes without error, so I am removing the TestBlocker flag.
In short, what is happening here:

---

When a VM is provisioned in OpenStack, it is given a name. At the same time, the VM's hostname is set according to that name (e.g. hostname-test is translated into hostname-test.localdomain). The VM can access both the name and the hostname via OpenStack's metadata at http://169.254.169.254/openstack/latest/meta_data.json.

When the VM's name is changed, the hostname is not affected. The new name is updated in OpenStack's metadata and is available through the same link.

When the VM's hostname is changed inside the VM (e.g. by executing ``hostnamectl set-hostname``), the hostname is changed but the VM's metadata is not affected. When the VM is restarted, the hostname is reset back to its original value (set in the VM's metadata). The reset is done by cloud-init. In order to disable the hostname reset on each reboot, one has to update cloud-init's configuration, e.g. drop a file under the /etc/cloud/cloud.cfg.d directory:

# cat /etc/cloud/cloud.cfg.d/99_hostname.cfg
preserve_hostname: true

Then the hostname is preserved across reboots. Still, the hostname in the VM's metadata is not affected and keeps the value it had when the VM was created.

---

> I found that if we modify the hostname, enable the cloud provider, and then run the upgrade playbook, the upgrade passes without error, so I am removing the TestBlocker flag.

Given that, I am decreasing the severity to low. Anping, can you elaborate more on what you did? E.g. provide the updated inventory file with steps on how to proceed?
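As an aside, if someone wants to automate the cloud-init override described above, a minimal sketch would be a task like the following (again, only an illustration under the assumptions above, not something the playbooks do today):

```yaml
- name: Keep cloud-init from resetting the hostname on reboot
  copy:
    dest: /etc/cloud/cloud.cfg.d/99_hostname.cfg
    content: |
      preserve_hostname: true
```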