Bug 1635864
Summary: | Scaling out a splitstack environment with blacklisted nodes fails while running External deployment step 1 | |
---|---|---|---
Product: | Red Hat OpenStack | Reporter: | Marius Cornea <mcornea>
Component: | openstack-tripleo-heat-templates | Assignee: | John Fulton <johfulto>
Status: | CLOSED ERRATA | QA Contact: | Yogev Rabl <yrabl>
Severity: | urgent | Docs Contact: |
Priority: | high | |
Version: | 14.0 (Rocky) | CC: | agurenko, aschoen, ceph-eng-bugs, dbecker, gamado, gfidente, gmeno, james.bagwell, johfulto, mariel, mburns, morazi, nthomas, sankarshan
Target Milestone: | beta | Keywords: | Triaged
Target Release: | 14.0 (Rocky) | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | openstack-tripleo-heat-templates-9.0.0-0.20181001174824.90afd18.0rc2.0rc2.0rc2.el7ost | Doc Type: | Bug Fix
Doc Text: | Blacklisting configuration updates against Ceph nodes no longer results in failed deployments. | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-01-11 11:53:36 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Moving it to Ceph DFG since it looks like it's a problem with ceph-ansible when blacklist is used.

(In reply to Marius Cornea from comment #0)
> Scaling out a splitstack environment with blacklisted nodes fails while
> running External deployment step 1:
...
> 2018-10-03 15:07:36,944 p=23844 u=mistral | TASK [generate inventory]
> ******************************************************
> 2018-10-03 15:07:36,944 p=23844 u=mistral | Wednesday 03 October 2018
> 15:07:36 -0400 (0:00:00.576) 0:11:32.136 *****
> 2018-10-03 15:07:38,740 p=23844 u=mistral | fatal: [undercloud]: FAILED! =>
> {"msg": "The task includes an option with an undefined variable. The error
> was: 'dict object' has no attribute 'ansible_hostname'\n\nThe error appears
> to have been in
> '/var/lib/mistral/overcloud/external_deploy_steps_tasks.yaml': line 15,
> column 5, but may\nbe elsewhere in the file depending on the exact syntax
> problem.\n\nThe offending line appears to be:\n\n -
> '{{playbook_dir}}/ceph-ansible/fetch_dir'\n - copy:\n ^ here\n"}

Looks like this embedded ansible in tripleo heat templates:

https://github.com/openstack/tripleo-heat-templates/blob/stable/rocky/docker/services/ceph-ansible/ceph-base.yaml#L379-L399

needs to not access hostvars.raw_get(host)['ansible_hostname'] unless that variable is set.

John

PS: the generated ansible was:

    - copy:
        content: "{%- set ceph_groups = ['mgr', 'mon', 'osd', 'mds', 'rgw', 'nfs', 'rbdmirror',\
          \ 'client'] -%}\n{%- for ceph_group in ceph_groups -%}\n{%- if 'ceph_' ~ ceph_group\
          \ in groups %}\n\n{{ ceph_group ~ 's:' }}\n hosts:\n {% for host in groups['ceph_'\
          \ ~ ceph_group] -%}\n {%- if hostvars.raw_get(host)['ansible_hostname']\
          \ not in blacklisted_hostnames -%}\n {{ hostvars.raw_get(host)['ansible_hostname']\
          \ }}:\n ansible_user: {{ hostvars.raw_get(host)['ansible_ssh_user'] |\
          \ default('root') }}\n ansible_host: {{ hostvars.raw_get(host)['ansible_host']\
          \ | default(host) }}\n ansible_become: true\n {% endif -%}\n {%-\
          \ endfor -%}\n\n{%- endif -%}\n{%- endfor %}\n"
        dest: '{{playbook_dir}}/ceph-ansible/inventory.yml'
      name: generate inventory

Verified at core_puddle: 2018-11-07.2

Covered by:
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DF%20Current%20release/job/DFG-df-splitstack-14-virsh-3cont_2comp_3ceph-blacklist-2computes-scaleup/4/consoleFull

So what was the actual fix? I'm still encountering this issue with rpm version openstack-tripleo-heat-templates-9.0.1-0.20181013060858.ffbe879.el7.noarch.

(In reply to Jim Bagwell from comment #13)
> So what was the actual fix? Im encountering this issue with rpm version
> openstack-tripleo-heat-templates-9.0.1-0.20181013060858.ffbe879.el7.noarch
> still

This bug produced this fix:

https://review.openstack.org/#/c/609682

but there was also a follow-up bug and fix regarding blacklisting, which might explain what you're encountering even though you have the first fix:

https://github.com/openstack/tripleo-heat-templates/commit/6c17d0f1c3c8c28c401dae56d88c6dd4075bf046#diff-f84644f1b5951f535bc7c3f4151a9a90

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045
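For readers who want to see the shape of the guard John describes (do not read 'ansible_hostname' unless it is set), here is a minimal Python sketch of the same inventory loop. It is not the upstream patch and not real TripleO code: the function name build_ceph_inventory and the plain-dict stand-ins for Ansible's groups and hostvars are hypothetical. It also assumes that a blacklisted host simply has no 'ansible_hostname' entry, which is what the error message suggests but which this report does not spell out.

```python
# Minimal sketch (hypothetical names): mirrors the loop in the generated
# "generate inventory" template, but skips hosts without 'ansible_hostname'.

CEPH_GROUPS = ['mgr', 'mon', 'osd', 'mds', 'rgw', 'nfs', 'rbdmirror', 'client']


def build_ceph_inventory(groups, hostvars, blacklisted_hostnames):
    """Return an inventory dict keyed by 'mgrs', 'mons', ... like the template."""
    inventory = {}
    for ceph_group in CEPH_GROUPS:
        group_key = 'ceph_' + ceph_group
        if group_key not in groups:
            continue
        hosts = {}
        for host in groups[group_key]:
            facts = hostvars.get(host, {})
            # Guard: .get() returns None instead of failing when the
            # variable is undefined (assumed to be the blacklisted case).
            hostname = facts.get('ansible_hostname')
            if hostname is None or hostname in blacklisted_hostnames:
                continue
            hosts[hostname] = {
                'ansible_user': facts.get('ansible_ssh_user', 'root'),
                'ansible_host': facts.get('ansible_host', host),
                'ansible_become': True,
            }
        inventory[ceph_group + 's'] = {'hosts': hosts}
    return inventory


if __name__ == '__main__':
    # Made-up sample data shaped like the environment in this report.
    sample_groups = {'ceph_osd': ['ceph-0'], 'ceph_client': ['compute-0', 'compute-2']}
    sample_hostvars = {
        'ceph-0': {'ansible_hostname': 'ceph-0'},
        'compute-0': {},  # blacklisted: no 'ansible_hostname' available
        'compute-2': {'ansible_hostname': 'compute-2'},
    }
    print(build_ceph_inventory(sample_groups, sample_hostvars, ['compute-0', 'compute-1']))
```

In the Jinja2 template itself the equivalent would be a test that the variable is defined before comparing it against blacklisted_hostnames; the review linked above is the authoritative version of the change.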
Created attachment 1490294 [details]
logs.tar.gz

Description of problem:

Scaling out a splitstack environment with blacklisted nodes fails while running External deployment step 1:

[root@undercloud-0 stack]# tail -30 /var/lib/mistral/overcloud/ansible.log
2018-10-03 15:07:29,090 p=23844 u=mistral | skipping: [controller-1] => {"changed": false, "skip_reason": "Conditional result was False"}
2018-10-03 15:07:29,122 p=23844 u=mistral | skipping: [controller-0] => {"changed": false, "skip_reason": "Conditional result was False"}
2018-10-03 15:07:29,217 p=23844 u=mistral | skipping: [ceph-1] => {"changed": false, "skip_reason": "Conditional result was False"}
2018-10-03 15:07:29,263 p=23844 u=mistral | skipping: [ceph-0] => {"changed": false, "skip_reason": "Conditional result was False"}
2018-10-03 15:07:29,282 p=23844 u=mistral | skipping: [ceph-2] => {"changed": false, "skip_reason": "Conditional result was False"}
2018-10-03 15:07:36,258 p=23844 u=mistral | changed: [compute-2] => {"changed": true, "cmd": ["ntpdate", "-u", "clock.redhat.com"], "delta": "0:00:06.882513", "end": "2018-10-03 15:07:36.235121", "rc": 0, "start": "2018-10-03 15:07:29.352608", "stderr": "", "stderr_lines": [], "stdout": " 3 Oct 15:07:36 ntpdate[17583]: adjust time server 10.11.160.238 offset -0.003281 sec", "stdout_lines": [" 3 Oct 15:07:36 ntpdate[17583]: adjust time server 10.11.160.238 offset -0.003281 sec"]}
2018-10-03 15:07:36,269 p=23844 u=mistral | PLAY [External deployment step 1] **********************************************
2018-10-03 15:07:36,294 p=23844 u=mistral | TASK [set blacklisted_hostnames] ***********************************************
2018-10-03 15:07:36,294 p=23844 u=mistral | Wednesday 03 October 2018 15:07:36 -0400 (0:00:07.326) 0:11:31.487 *****
2018-10-03 15:07:36,346 p=23844 u=mistral | ok: [undercloud] => {"ansible_facts": {"blacklisted_hostnames": ["compute-0", "compute-1"]}, "changed": false}
2018-10-03 15:07:36,367 p=23844 u=mistral | TASK [create ceph-ansible temp dirs] *******************************************
2018-10-03 15:07:36,368 p=23844 u=mistral | Wednesday 03 October 2018 15:07:36 -0400 (0:00:00.073) 0:11:31.560 *****
2018-10-03 15:07:36,582 p=23844 u=mistral | ok: [undercloud] => (item=/var/lib/mistral/overcloud/ceph-ansible/group_vars) => {"changed": false, "gid": 42430, "group": "mistral", "item": "/var/lib/mistral/overcloud/ceph-ansible/group_vars", "mode": "0755", "owner": "mistral", "path": "/var/lib/mistral/overcloud/ceph-ansible/group_vars", "size": 88, "state": "directory", "uid": 42430}
2018-10-03 15:07:36,758 p=23844 u=mistral | ok: [undercloud] => (item=/var/lib/mistral/overcloud/ceph-ansible/host_vars) => {"changed": false, "gid": 42430, "group": "mistral", "item": "/var/lib/mistral/overcloud/ceph-ansible/host_vars", "mode": "0755", "owner": "mistral", "path": "/var/lib/mistral/overcloud/ceph-ansible/host_vars", "size": 174, "state": "directory", "uid": 42430}
2018-10-03 15:07:36,924 p=23844 u=mistral | ok: [undercloud] => (item=/var/lib/mistral/overcloud/ceph-ansible/fetch_dir) => {"changed": false, "gid": 42430, "group": "mistral", "item": "/var/lib/mistral/overcloud/ceph-ansible/fetch_dir", "mode": "0755", "owner": "mistral", "path": "/var/lib/mistral/overcloud/ceph-ansible/fetch_dir", "size": 80, "state": "directory", "uid": 42430}
2018-10-03 15:07:36,944 p=23844 u=mistral | TASK [generate inventory] ******************************************************
2018-10-03 15:07:36,944 p=23844 u=mistral | Wednesday 03 October 2018 15:07:36 -0400 (0:00:00.576) 0:11:32.136 *****
2018-10-03 15:07:38,740 p=23844 u=mistral | fatal: [undercloud]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'ansible_hostname'\n\nThe error appears to have been in '/var/lib/mistral/overcloud/external_deploy_steps_tasks.yaml': line 15, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n - '{{playbook_dir}}/ceph-ansible/fetch_dir'\n - copy:\n ^ here\n"}
2018-10-03 15:07:38,740 p=23844 u=mistral | NO MORE HOSTS LEFT *************************************************************
2018-10-03 15:07:38,741 p=23844 u=mistral | PLAY RECAP *********************************************************************
2018-10-03 15:07:38,741 p=23844 u=mistral | ceph-0 : ok=124 changed=29 unreachable=0 failed=0
2018-10-03 15:07:38,742 p=23844 u=mistral | ceph-1 : ok=124 changed=29 unreachable=0 failed=0
2018-10-03 15:07:38,742 p=23844 u=mistral | ceph-2 : ok=124 changed=29 unreachable=0 failed=0
2018-10-03 15:07:38,742 p=23844 u=mistral | compute-2 : ok=136 changed=59 unreachable=0 failed=0
2018-10-03 15:07:38,742 p=23844 u=mistral | controller-0 : ok=190 changed=33 unreachable=0 failed=0
2018-10-03 15:07:38,742 p=23844 u=mistral | controller-1 : ok=190 changed=33 unreachable=0 failed=0
2018-10-03 15:07:38,742 p=23844 u=mistral | controller-2 : ok=190 changed=33 unreachable=0 failed=0
2018-10-03 15:07:38,743 p=23844 u=mistral | undercloud : ok=4 changed=0 unreachable=0 failed=1
2018-10-03 15:07:38,743 p=23844 u=mistral | Wednesday 03 October 2018 15:07:38 -0400 (0:00:01.798) 0:11:33.935 *****
2018-10-03 15:07:38,743 p=23844 u=mistral | ===============================================================================

Version-Release number of selected component (if applicable):

openstack-tripleo-heat-templates-9.0.0-0.20180919080941.0rc1.0rc1.el7ost.noarch
openstack-tripleo-common-9.3.1-0.20180923215325.d22cb3e.el7ost.noarch

How reproducible:

100%

Steps to Reproduce:

1. Deploy splitstack environment with 3 controller + 2 computes + 3 ceph nodes

2. create blacklist.yaml:

    parameter_defaults:
      DeploymentServerBlacklist:
        - compute-0
        - compute-1

3. Set ComputeDeployedServerCount: 3

4. Run overcloud deploy command with blacklist.yaml:

    openstack overcloud deploy \
    --timeout 100 \
    --templates /usr/share/openstack-tripleo-heat-templates \
    --libvirt-type kvm \
    --overcloud-ssh-user stack \
    --disable-validation \
    -r /home/stack/composable_roles/roles/roles_data.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-environment.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-bootstrap-environment-rhel.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-pacemaker-environment.yaml \
    -e /home/stack/composable_roles/network-config.yaml \
    -e /home/stack/composable_roles/ctrlplane-template.yml \
    -e /home/stack/composable_roles/internal.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e /home/stack/composable_roles/roles-port-config.yml \
    -e /home/stack/composable_roles/network/network-environment.yaml \
    -e /home/stack/composable_roles/enable-tls.yaml \
    -e /home/stack/composable_roles/inject-trust-anchor.yaml \
    -e /home/stack/composable_roles/public_vip.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
    -e /home/stack/composable_roles/debug.yaml \
    -e /home/stack/blacklist.yaml \
    -e /home/stack/composable_roles/docker-images.yaml \
    --log-file overcloud_deployment_64.log

Actual results:

Ansible run fails:

2018-10-03 15:07:36,367 p=23844 u=mistral | TASK [create ceph-ansible temp dirs] *******************************************
2018-10-03 15:07:36,368 p=23844 u=mistral | Wednesday 03 October 2018 15:07:36 -0400 (0:00:00.073) 0:11:31.560 *****
2018-10-03 15:07:36,582 p=23844 u=mistral | ok: [undercloud] => (item=/var/lib/mistral/overcloud/ceph-ansible/group_vars) => {"changed": false, "gid": 42430, "group": "mistral", "item": "/var/lib/mistral/overcloud/ceph-ansible/group_vars", "mode": "0755", "owner": "mistral", "path": "/var/lib/mistral/overcloud/ceph-ansible/group_vars", "size": 88, "state": "directory", "uid": 42430}
2018-10-03 15:07:36,758 p=23844 u=mistral | ok: [undercloud] => (item=/var/lib/mistral/overcloud/ceph-ansible/host_vars) => {"changed": false, "gid": 42430, "group": "mistral", "item": "/var/lib/mistral/overcloud/ceph-ansible/host_vars", "mode": "0755", "owner": "mistral", "path": "/var/lib/mistral/overcloud/ceph-ansible/host_vars", "size": 174, "state": "directory", "uid": 42430}
2018-10-03 15:07:36,924 p=23844 u=mistral | ok: [undercloud] => (item=/var/lib/mistral/overcloud/ceph-ansible/fetch_dir) => {"changed": false, "gid": 42430, "group": "mistral", "item": "/var/lib/mistral/overcloud/ceph-ansible/fetch_dir", "mode": "0755", "owner": "mistral", "path": "/var/lib/mistral/overcloud/ceph-ansible/fetch_dir", "size": 80, "state": "directory", "uid": 42430}
2018-10-03 15:07:36,944 p=23844 u=mistral | TASK [generate inventory] ******************************************************
2018-10-03 15:07:36,944 p=23844 u=mistral | Wednesday 03 October 2018 15:07:36 -0400 (0:00:00.576) 0:11:32.136 *****
2018-10-03 15:07:38,740 p=23844 u=mistral | fatal: [undercloud]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'ansible_hostname'\n\nThe error appears to have been in '/var/lib/mistral/overcloud/external_deploy_steps_tasks.yaml': line 15, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n - '{{playbook_dir}}/ceph-ansible/fetch_dir'\n - copy:\n ^ here\n"}

Expected results:

No failure

Additional info:

Attaching /var/lib/mistral and home dir with templates.
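When triaging a similar failure, it can help to pull only the failed tasks out of ansible.log rather than reading the full tail. The following is a rough Python sketch of that, assuming the log line layout seen in this report (timestamp, p=PID, u=USER, a pipe, then the message). The function name find_failures and the regular expressions are my own illustration and not part of any TripleO or Ansible tooling.

```python
import json
import re

# Prefix seen in this report: "2018-10-03 15:07:38,740 p=23844 u=mistral | <body>"
LINE_RE = re.compile(r'^\S+ \S+ p=\d+ u=\S+ \|\s+(?P<body>.*)$')
TASK_RE = re.compile(r'^TASK \[(?P<name>.+?)\]')
FATAL_RE = re.compile(r'^fatal: \[(?P<host>[^\]]+)\]: FAILED! => (?P<payload>\{.*\})\s*$')


def find_failures(lines):
    """Return (task name, host, message) for each 'fatal:' line in the log."""
    failures = []
    current_task = None
    for line in lines:
        match = LINE_RE.match(line.rstrip('\n'))
        if not match:
            continue  # not a line in the expected layout
        body = match.group('body')
        task = TASK_RE.match(body)
        if task:
            current_task = task.group('name')
            continue
        fatal = FATAL_RE.match(body)
        if fatal:
            payload = fatal.group('payload')
            try:
                message = json.loads(payload).get('msg', '')
            except ValueError:
                message = payload  # keep the raw text if it is not valid JSON
            failures.append((current_task, fatal.group('host'), message))
    return failures


if __name__ == '__main__':
    # Path taken from the report; adjust for your own undercloud.
    with open('/var/lib/mistral/overcloud/ansible.log') as log:
        for task_name, host, message in find_failures(log):
            print(task_name, host, message.splitlines()[0] if message else '')
```

Tracking the most recent TASK line is what ties the failure back to "generate inventory" here, since the fatal line itself only names the host.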