Bug 1608279 - uninstall playbook fails after unfinished provision
Summary: uninstall playbook fails after unfinished provision
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.10.z
Assignee: Tzu-Mainn Chen
QA Contact: weiwei jiang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-07-25 09:07 UTC by Philip Sweany
Modified: 2019-02-20 10:11 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-20 10:11:10 UTC
Target Upstream Version:
Embargoed:


Links:
Red Hat Product Errata RHBA-2019:0328 (Last Updated: 2019-02-20 10:11:17 UTC)

Description Philip Sweany 2018-07-25 09:07:43 UTC
Description of problem:
The OpenShift on OpenStack uninstaller fails because it cannot contact nodes that were never created during a failed install. The uninstall playbook is expected to clean up a stack even when nodes are missing or unreachable.


Version-Release number of the following components:
rpm -q openshift-ansible
openshift-ansible-3.10.23-1.git.0.a9c7e7d.el7.noarch

rpm -q ansible
ansible-2.4.6.0-1.el7ae.noarch

ansible --version
ansible 2.4.6.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Feb 20 2018, 09:19:12) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

How reproducible:
Always

Steps to Reproduce:
1. Deploy OCP 3.10 on OSP 13 and abort the installation early for any reason:

(shiftstack) [cloud-user@bastion ~]$ ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/provision.yml

--- output omitted ---

2. Run the uninstall playbook to clean up, so that provisioning can start again from scratch:

(shiftstack) [cloud-user@bastion ~]$ ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/uninstall.yml

Actual results:

PLAY [Unsubscribe RHEL instances] *************************************************************************************************************************************************************

TASK [Gathering Facts] ************************************************************************************************************************************************************************
fatal: [infra-node-0.openshift.example.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.122.58 port 22: No route to host\r\n", "unreachable": true}
fatal: [app-node-0.openshift.example.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.122.62 port 22: No route to host\r\n", "unreachable": true}
fatal: [infra-node-1.openshift.example.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.122.54 port 22: No route to host\r\n", "unreachable": true}
ok: [master-0.openshift.example.com]
ok: [app-node-1.openshift.example.com]

TASK [rhel_unsubscribe : Remove RedHat subscriptions] *****************************************************************************************************************************************
skipping: [master-0.openshift.example.com]
skipping: [app-node-1.openshift.example.com]

PLAY [Clean DNS entries] **********************************************************************************************************************************************************************

TASK [Gathering Facts] ************************************************************************************************************************************************************************
ok: [localhost]

TASK [openshift_openstack : Generate DNS records] *********************************************************************************************************************************************
included: /usr/share/ansible/openshift-ansible/roles/openshift_openstack/tasks/generate-dns.yml for localhost

TASK [openshift_openstack : Generate list of private A records] *******************************************************************************************************************************
ok: [localhost] => (item=master-0.openshift.example.com)
fatal: [localhost]: FAILED! => {"failed": true, "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'ansible_hostname'\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_openstack/tasks/generate-dns.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: \"Generate list of private A records\"\n  ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'dict object' has no attribute 'ansible_hostname'"}
 [WARNING]: Could not create retry file '/usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/uninstall.retry'.         [Errno 13] Permission denied:
u'/usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/uninstall.retry'


PLAY RECAP ************************************************************************************************************************************************************************************
app-node-0.openshift.example.com : ok=0    changed=0    unreachable=1    failed=0
app-node-1.openshift.example.com : ok=1    changed=0    unreachable=0    failed=0
infra-node-0.openshift.example.com : ok=0    changed=0    unreachable=1    failed=0
infra-node-1.openshift.example.com : ok=0    changed=0    unreachable=1    failed=0
localhost                  : ok=15   changed=0    unreachable=0    failed=1   
master-0.openshift.example.com : ok=1    changed=0    unreachable=0    failed=0
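
The "Generate list of private A records" failure occurs because fact gathering never ran on the unreachable nodes, so hostvars for those hosts contains no ansible_hostname key, and the task's Jinja2 lookup raises AnsibleUndefinedVariable. A minimal sketch of the kind of guard that avoids this (the task, variable, and group names here are illustrative, not the actual openshift_openstack role code):

```yaml
# Illustrative only: skip any host whose facts were never gathered,
# e.g. because it was unreachable or was never created by Heat.
- name: Generate list of private A records
  set_fact:
    private_records: >-
      {{ private_records | default([]) +
         [{'type': 'A',
           'hostname': hostvars[item]['ansible_hostname'],
           'ip': hostvars[item]['private_v4']}] }}
  with_items: "{{ groups['cluster_hosts'] | default([]) }}"
  when: hostvars[item]['ansible_hostname'] is defined
```

With the `when: ... is defined` guard, unreachable hosts are simply skipped instead of aborting the whole play on localhost.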

Comment 1 Tzu-Mainn Chen 2018-07-26 14:50:28 UTC
https://github.com/openshift/openshift-ansible/pull/9343 should fix this issue. A workaround is to simply run 'openstack stack delete openshift-cluster'; this should work when the issue is caused by an incomplete Heat stack run.
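
The workaround above deletes the Heat stack directly with the OpenStack CLI. A sketch, assuming the default stack name openshift-cluster and that credentials for the overcloud are already sourced:

```shell
# Delete the Heat stack left behind by the failed provision.
# --yes skips the confirmation prompt; --wait blocks until deletion finishes.
openstack stack delete --yes --wait openshift-cluster

# Confirm the stack is gone before re-running provision.yml.
openstack stack list
```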

Comment 2 Tzu-Mainn Chen 2018-08-02 18:52:10 UTC
3.10 backport merged: https://github.com/openshift/openshift-ansible/pull/9362

Comment 3 Scott Dodson 2018-08-14 21:40:20 UTC
Should be in openshift-ansible-3.10.28-1

Comment 4 weiwei jiang 2019-01-28 02:15:33 UTC
Verified on 
# rpm -qa|grep -i openshift 
openshift-ansible-docs-3.10.106-1.git.0.217c6b9.el7.noarch
openshift-ansible-roles-3.10.106-1.git.0.217c6b9.el7.noarch
atomic-openshift-clients-3.10.106-1.git.0.84f1efc.el7.x86_64
openshift-ansible-playbooks-3.10.106-1.git.0.217c6b9.el7.noarch
openshift-ansible-3.10.106-1.git.0.217c6b9.el7.noarch

Comment 6 errata-xmlrpc 2019-02-20 10:11:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0328
