Description of problem:
When blacklisting ceph client nodes, stack update fails due to a malformed ceph-ansible inventory file.

Version-Release number of selected component (if applicable):
2018-10-10.1

How reproducible:

Steps to Reproduce:
1. Deploy a topology with 3 controllers, 2 computes and 3 ceph nodes
2. Blacklist all compute nodes
3. Try to perform a stack update

Actual results:
The "run nodes-uuid" task fails because Ansible cannot parse the generated inventory:

TASK [run nodes-uuid] **********************************************************
Saturday 13 October 2018 00:09:02 -0400 (0:00:00.048) 0:08:06.138 ******
fatal: [undercloud]: FAILED! => non-zero return code (rc=4) running:

  ANSIBLE_LOG_PATH="/var/lib/mistral/overcloud/ceph-ansible/nodes_uuid_command.log" \
  ANSIBLE_CONFIG="/var/lib/mistral/overcloud/ansible.cfg" \
  ANSIBLE_REMOTE_TEMP=/tmp/nodes_uuid_tmp \
  ansible-playbook --private-key /var/lib/mistral/overcloud/ssh_private_key \
    -i /var/lib/mistral/overcloud/ceph-ansible/inventory.yml \
    /var/lib/mistral/overcloud/ceph-ansible/nodes_uuid_playbook.yml

  (start: 2018-10-13 00:09:02, end: 2018-10-13 00:11:01, delta: 0:01:58.263777)

stderr:
 [WARNING]: * Failed to parse /var/lib/mistral/overcloud/ceph-ansible/inventory.yml with yaml plugin: Invalid "hosts" entry for "clients" group, requires a dictionary, found "<type 'NoneType'>" instead.
 [WARNING]: * Failed to parse /var/lib/mistral/overcloud/ceph-ansible/inventory.yml with ini plugin: /var/lib/mistral/overcloud/ceph-ansible/inventory.yml:5: Expected key=value host variable assignment, got: tripleo-admin
 [WARNING]: * Failed to parse /var/lib/mistral/overcloud/ceph-ansible/inventory.yml with auto plugin: no root 'plugin' key found, '/var/lib/mistral/overcloud/ceph-ansible/inventory.yml' is not a valid YAML inventory plugin config file
 [WARNING]: Unable to parse /var/lib/mistral/overcloud/ceph-ansible/inventory.yml as an inventory source
 [WARNING]: No inventory was parsed, only implicit localhost is available

stdout:
PLAY [all] *********************************************************************

TASK [set nodes data] **********************************************************
Saturday 13 October 2018 00:09:04 -0400 (0:00:00.098) 0:00:00.098 ******
ok: [mgrs:]
ok: [hosts:]
ok: [controller-2:]

TASK [register machine id] *****************************************************
Saturday 13 October 2018 00:09:04 -0400 (0:00:00.084) 0:00:00.182 ******
fatal: [hosts:]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"hosts:\". Make sure this host can be reached over ssh", "unreachable": true}
fatal: [controller-2:]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"controller-2:\". Make sure this host can be reached over ssh", "unreachable": true}
fatal: [mgrs:]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"mgrs:\". Make sure this host can be reached over ssh", "unreachable": true}

PLAY RECAP *********************************************************************
controller-2: : ok=1 changed=0 unreachable=1 failed=0
hosts: : ok=1 changed=0 unreachable=1 failed=0
mgrs: : ok=1 changed=0 unreachable=1 failed=0

Saturday 13 October 2018 00:11:01 -0400 (0:01:56.496) 0:01:56.678 ******

Note that because the inventory failed to parse, the group labels from the broken file ("hosts:", "mgrs:", "controller-2:") are being treated as host names, which is why every host is unreachable.

Expected results:
Stack update completes successfully.

Additional info:
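For context, the yaml plugin warning above ("Invalid "hosts" entry for "clients" group, requires a dictionary, found "<type 'NoneType'>"") is what Ansible reports when a group is written out with an empty "hosts:" key, which YAML loads as None instead of a mapping. The sketch below illustrates the kind of guard that avoids emitting such a group when every client node is blacklisted; the build_inventory helper and its inputs are hypothetical illustrations, not the actual tripleo-common code:

```python
def build_inventory(groups, blacklist):
    """Build an Ansible-style inventory dict, dropping groups whose
    hosts are all blacklisted (hypothetical helper, not tripleo code)."""
    inventory = {}
    for group, hosts in groups.items():
        remaining = {h: {} for h in hosts if h not in blacklist}
        if not remaining:
            # Serializing "hosts:" with no entries would be loaded back
            # as None, which the yaml inventory plugin rejects; omit the
            # group entirely instead.
            continue
        inventory[group] = {"hosts": remaining}
    return inventory

groups = {"clients": ["compute-0", "compute-1"], "mons": ["controller-0"]}
blacklisted = {"compute-0", "compute-1"}
print(build_inventory(groups, blacklisted))
# -> {'mons': {'hosts': {'controller-0': {}}}}
```

With all compute (client) nodes blacklisted, the "clients" group is skipped rather than written with an empty hosts mapping, matching the failure mode observed in this bug.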
Do I understand correctly this is only an issue when all clients (computes) are blacklisted?
(In reply to Giulio Fidente from comment #2)
> Do I understand correctly this is only an issue when all clients (computes)
> are blacklisted?

Cannot say for sure right now; I will try to run this job with only one node blacklisted and check the result.
(In reply to Giulio Fidente from comment #2)
> Do I understand correctly this is only an issue when all clients (computes)
> are blacklisted?

Yes, it is an issue only when all clients are blacklisted.
We have a passing job in CI that was used to catch this issue.

Verified on puddle 2018-11-07.2.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045