Bug 1639038

Summary: When all members of a Ceph group are blacklisted stack update fails due to malformed ceph-ansible inventory
Product: Red Hat OpenStack
Reporter: Gurenko Alex <agurenko>
Component: openstack-tripleo-heat-templates
Assignee: Giulio Fidente <gfidente>
Status: CLOSED ERRATA
QA Contact: Gurenko Alex <agurenko>
Severity: high
Docs Contact:
Priority: high
Version: 14.0 (Rocky)
CC: dbecker, gfidente, johfulto, m.andre, mburns, mcornea, morazi, scohen
Target Milestone: beta
Keywords: Triaged
Target Release: 14.0 (Rocky)
Flags: agurenko: automate_bug+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-9.0.1-0.20181013060867.ffbe879.el7ost
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-11 11:53:55 UTC
Type: Bug

Description Gurenko Alex 2018-10-14 16:43:42 UTC
Description of problem: When all Ceph client nodes are blacklisted, stack update fails due to a malformed ceph-ansible inventory file.


Version-Release number of selected component (if applicable): 2018-10-10.1


How reproducible:
Always, when all Ceph client nodes are blacklisted (confirmed in comment 4).

Steps to Reproduce:
1. Deploy a topology with 3 controllers, 2 computes and 3 Ceph nodes
2. Blacklist all compute nodes (the Ceph clients), e.g. via DeploymentServerBlacklist; see the sketch after these steps
3. Try to perform stack update
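A minimal blacklist environment file, assuming the standard TripleO DeploymentServerBlacklist parameter (node names are illustrative, not taken from this environment):

  parameter_defaults:
    # Blacklisting every compute node means blacklisting every member
    # of the ceph-ansible "clients" group in this topology
    DeploymentServerBlacklist:
      - overcloud-novacompute-0
      - overcloud-novacompute-1

The update is then re-run with the usual deploy command plus this environment file, e.g. "openstack overcloud deploy --templates ... -e blacklist.yaml".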

Actual results:

TASK [run nodes-uuid] **********************************************************
Saturday 13 October 2018  00:09:02 -0400 (0:00:00.048)       0:08:06.138 ****** 
fatal: [undercloud]: FAILED! => non-zero return code (rc=4) from:

  ANSIBLE_LOG_PATH="/var/lib/mistral/overcloud/ceph-ansible/nodes_uuid_command.log" ANSIBLE_CONFIG="/var/lib/mistral/overcloud/ansible.cfg" ANSIBLE_REMOTE_TEMP=/tmp/nodes_uuid_tmp ansible-playbook --private-key /var/lib/mistral/overcloud/ssh_private_key -i /var/lib/mistral/overcloud/ceph-ansible/inventory.yml /var/lib/mistral/overcloud/ceph-ansible/nodes_uuid_playbook.yml

(start: 2018-10-13 00:09:02.824662, end: 2018-10-13 00:11:01.088439, delta: 0:01:58.263777)

stderr:

 [WARNING]:  * Failed to parse /var/lib/mistral/overcloud/ceph-ansible/inventory.yml with yaml plugin: Invalid "hosts" entry for "clients" group, requires a dictionary, found "<type 'NoneType'>" instead.
 [WARNING]:  * Failed to parse /var/lib/mistral/overcloud/ceph-ansible/inventory.yml with ini plugin: /var/lib/mistral/overcloud/ceph-ansible/inventory.yml:5: Expected key=value host variable assignment, got: tripleo-admin
 [WARNING]:  * Failed to parse /var/lib/mistral/overcloud/ceph-ansible/inventory.yml with auto plugin: no root 'plugin' key found, '/var/lib/mistral/overcloud/ceph-ansible/inventory.yml' is not a valid YAML inventory plugin config file
 [WARNING]: Unable to parse /var/lib/mistral/overcloud/ceph-ansible/inventory.yml as an inventory source
 [WARNING]: No inventory was parsed, only implicit localhost is available

stdout:

PLAY [all] *********************************************************************

TASK [set nodes data] **********************************************************
Saturday 13 October 2018  00:09:04 -0400 (0:00:00.098)       0:00:00.098 ******
ok: [mgrs:]
ok: [hosts:]
ok: [controller-2:]

TASK [register machine id] *****************************************************
Saturday 13 October 2018  00:09:04 -0400 (0:00:00.084)       0:00:00.182 ******
fatal: [hosts:]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"hosts:\". Make sure this host can be reached over ssh", "unreachable": true}
fatal: [controller-2:]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"controller-2:\". Make sure this host can be reached over ssh", "unreachable": true}
fatal: [mgrs:]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"mgrs:\". Make sure this host can be reached over ssh", "unreachable": true}

PLAY RECAP *********************************************************************
controller-2:              : ok=1    changed=0    unreachable=1    failed=0
hosts:                     : ok=1    changed=0    unreachable=1    failed=0
mgrs:                      : ok=1    changed=0    unreachable=1    failed=0

Saturday 13 October 2018  00:11:01 -0400 (0:01:56.496)       0:01:56.678 ******


Expected results:

Stack update completes successfully.

Additional info:
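The yaml plugin warning above hints at the shape of the problem. Below is a minimal reconstruction (an assumption, not the actual file from this environment) of how the generated inventory.yml becomes unparsable: when every member of a group is blacklisted, the group is still emitted but its hosts: key is left empty, and YAML loads an empty value as null rather than as a dictionary:

  clients:
    hosts:          # empty -> parsed as NoneType, not a dict
  mgrs:
    hosts:
      controller-0:
        ansible_user: tripleo-admin

A rendering that keeps the inventory valid would either omit the empty group entirely or emit an empty mapping:

  clients:
    hosts: {}       # an empty dict parses cleanly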

Comment 2 Giulio Fidente 2018-10-15 09:43:29 UTC
Do I understand correctly this is only an issue when all clients (computes) are blacklisted?

Comment 3 Gurenko Alex 2018-10-15 10:52:43 UTC
(In reply to Giulio Fidente from comment #2)
> Do I understand correctly this is only an issue when all clients (computes)
> are blacklisted?

Cannot say for sure right now; I will try to run this job with only one node blacklisted and see the result.

Comment 4 Marius Cornea 2018-10-15 12:57:08 UTC
(In reply to Giulio Fidente from comment #2)
> Do I understand correctly this is only an issue when all clients (computes)
> are blacklisted?

Yes, it's an issue only when all clients are blacklisted.
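
That matches the template mechanics. A hypothetical jinja2 fragment of the kind used to render the inventory (illustrative only, not the actual tripleo-heat-templates code) shows why a partial blacklist is harmless:

  clients:
    hosts:
  {% for host in client_hosts %}
      {{ host }}:
        ansible_user: tripleo-admin
  {% endfor %}

With at least one client left unblacklisted, the loop emits a mapping under hosts: and the file stays valid; with client_hosts empty, the loop emits nothing, hosts: is left null, and parsing fails exactly as in the log above. Guarding the group (skip it, or emit hosts: {} when the list is empty) avoids the failure.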

Comment 19 Gurenko Alex 2018-11-12 09:17:18 UTC
We have a passing job in CI that was used to catch this issue.

Verified on puddle 2018-11-07.2
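
As a quick sanity check that does not require a full stack update, the generated inventory can also be validated on the undercloud with the stock ansible-inventory tool, e.g.:

  ansible-inventory -i /var/lib/mistral/overcloud/ceph-ansible/inventory.yml --list

On a malformed file this fails with the same parse warnings seen in the log above.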

Comment 24 errata-xmlrpc 2019-01-11 11:53:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045