Description of problem:
The Ansible playbook fails during the FFU because the undercloud is using Ansible 2.9, and ceph-ansible 3.2 contains statements deprecated for that version.

Version-Release number of selected component (if applicable):
ansible-2.9.13-1.el8ae.noarch
ceph-ansible-3.2.49-1.el7cp.noarch

How reproducible:

Steps to Reproduce:
1. Follow the upgrade framework up to point 17.2.
2. Run: openstack overcloud external-upgrade run --stack STACK NAME --tags ceph_systemd -e ceph_ansible_limit=overcloud-controller-0

Actual results:
fatal: [undercloud]: FAILED! => {
    "ceph_ansible_std_out_err": [
        "Using /usr/share/ceph-ansible/ansible.cfg as config file",
        "ERROR! 'delegate_to' is not a valid attribute for a TaskInclude",
        "",
        "The error appears to be in '/usr/share/ceph-ansible/roles/ceph-mon/tasks/main.yml': line 20, column 3, but may",
        "be elsewhere in the file depending on the exact syntax problem.",
        "The offending line appears to be:",
        "- name: include secure_cluster.yml",
        " ^ here"
    ],

Expected results:

Additional info:
Please specify the severity of this bug. Severity is defined here: https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.
We can fail the deployment during the systemd unit file updates in less than one minute and log the result [1]. If we then grep the results, we see the RIGHT playbook get set, but later it gets set to the WRONG playbook [2]. Why does that happen?

[1]
(undercloud) [stack@undercloud ~]$ time openstack overcloud external-upgrade run --stack overcloud --tags ceph_systemd -e ceph_ansible_limit=ctr0 > fail.log
sys:1: ResourceWarning: unclosed <socket.socket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.200.2', 43666), raddr=('192.168.200.2', 5000)>
sys:1: ResourceWarning: unclosed <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.200.2', 53504), raddr=('192.168.200.2', 8989)>

real 0m57.644s
user 0m2.097s
sys 0m0.728s
(undercloud) [stack@undercloud ~]$

[2]
(undercloud) [stack@undercloud ~]$ grep ceph_ansible_playbooks_default fail.log
TASK [set ceph_ansible_playbooks_default] **************************************
ok: [undercloud] => {"ansible_facts": {"ceph_ansible_playbooks_default": ["/usr/share/ceph-ansible/infrastructure-playbooks/docker-to-podman.yml"]}, "changed": false}
TASK [set ceph_ansible_playbooks_default] **************************************
TASK [set ceph_ansible_playbooks_default] **************************************
(undercloud) [stack@undercloud ~]$ egrep "ceph_ansible_playbooks_default|ceph_ansible_playbooks" fail.log
TASK [set ceph_ansible_playbooks_default] **************************************
ok: [undercloud] => {"ansible_facts": {"ceph_ansible_playbooks_default": ["/usr/share/ceph-ansible/infrastructure-playbooks/docker-to-podman.yml"]}, "changed": false}
TASK [set ceph_ansible_playbooks_default] **************************************
TASK [set ceph_ansible_playbooks_default] **************************************
ok: [undercloud] => {"ansible_facts": {"ceph_ansible_environment_variables": ["ANSIBLE_SSH_RETRIES=6", "DEFAULT_FORKS=100"], "ceph_ansible_playbook_verbosity": 1, "ceph_ansible_playbooks_param": ["/usr/share/ceph-ansible/site-docker.yml.sample"], "ceph_ansible_skip_tags": "package-install,with_pkg"}, "changed": false}
ok: [undercloud] => {"ansible_facts": {"ceph_ansible_playbooks": ["/usr/share/ceph-ansible/site-docker.yml.sample"]}, "changed": false}
(undercloud) [stack@undercloud ~]$
Created attachment 1718483: fail.log from comment #4
In the config-download output file external_deploy_steps_tasks.yaml we see that CephAnsiblePlaybook has its default value /usr/share/ceph-ansible/site-docker.yml.sample, so it got set as per the following:

https://github.com/openstack/tripleo-heat-templates/blob/094631e0437f1775601ceb49d398427214759f63/deployment/ceph-ansible/ceph-base.yaml#L661-L668

and we see that playbook being run:

(undercloud) [stack@undercloud b03ca383-1c38-40b8-9ebd-f68517883164]$ grep "set ceph-ansible facts" external_deploy_steps_tasks.yaml -A 10
- name: set ceph-ansible facts
  set_fact:
    blacklisted_hostnames: []
    ceph_ansible_extra_vars:
      container_binary: podman
      fetch_directory: '{{playbook_dir}}/ceph-ansible/fetch_dir'
      health_osd_check_delay: 40
      health_osd_check_retries: 30
      ireallymeanit: 'yes'
      osd_pool_default_min_size: 2
      osd_pool_default_pg_num: 4
--
- name: set ceph-ansible facts
  set_fact:
    ceph_ansible_environment_variables:
    - ANSIBLE_SSH_RETRIES=6
    - DEFAULT_FORKS=100
    ceph_ansible_playbook_verbosity: 1
    ceph_ansible_playbooks_param:
    - /usr/share/ceph-ansible/site-docker.yml.sample
    ceph_ansible_skip_tags: package-install,with_pkg
- include_role:
    name: tripleo-ceph-work-dir
(undercloud) [stack@undercloud b03ca383-1c38-40b8-9ebd-f68517883164]$ pwd
/var/lib/mistral/b03ca383-1c38-40b8-9ebd-f68517883164
(undercloud) [stack@undercloud b03ca383-1c38-40b8-9ebd-f68517883164]$
$ openstack stack environment show overcloud | grep -i ceph | grep -i playbook -A 1
  CephAnsiblePlaybook:
  - /usr/share/ceph-ansible/site-docker.yml.sample
(In reply to John Fulton from comment #6)
> In config-download output file external_deploy_steps_tasks.yaml we see that
> CephAnsiblePlaybook has its default value
> /usr/share/ceph-ansible/site-docker.yml.sample so it got set as per the
> following:

CephAnsiblePlaybook seems to be the problem. Somehow CephAnsiblePlaybook got set manually to site-docker by an environment file; its default value should actually be 'default' [1].

> https://github.com/openstack/tripleo-heat-templates/blob/094631e0437f1775601ceb49d398427214759f63/deployment/ceph-ansible/ceph-base.yaml#L661-L668

The above looks correct: ceph_ansible_playbooks_param is set to the value provided by the user, and ceph_ansible_playbooks_default is set to a default list that we define based on --tags. Then in [2] we pick either _default or _param, depending on whether the user has actually customized, via THT, the playbook they want to run.

I think this would be solved by setting "CephAnsiblePlaybook: default" in an environment file and then rerunning the prepare step. I suspect CephAnsiblePlaybook was set once in the past and then removed from the env files. In that scenario Heat doesn't reset it to its default value; it keeps the last value that was set for it.

[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ceph-ansible/ceph-base.yaml#L75
[2] https://github.com/openstack/tripleo-ansible/blob/stable/train/tripleo_ansible/roles/tripleo-ceph-run-ansible/tasks/main.yml#L19
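The _default versus _param choice described in this comment can be sketched in Python. This is only an illustration of the idea, not the code of the tripleo-ceph-run-ansible role: the function name select_playbooks is hypothetical, and treating an empty list or the single-item list ['default'] as "not customized" is an assumption made for the example.

```python
def select_playbooks(playbooks_param, playbooks_default):
    """Return the list of ceph-ansible playbooks that would be run.

    Hypothetical helper. Assumption: an uncustomized CephAnsiblePlaybook
    shows up as an empty list or as ['default'].
    """
    if not playbooks_param or list(playbooks_param) == ["default"]:
        # Nothing customized: use the list derived from --tags.
        return list(playbooks_default)
    # A user-provided (or stale) value takes precedence.
    return list(playbooks_param)


# Values taken from the fail.log excerpts in this bug.
tag_default = [
    "/usr/share/ceph-ansible/infrastructure-playbooks/docker-to-podman.yml"
]
stale = ["/usr/share/ceph-ansible/site-docker.yml.sample"]

print(select_playbooks(stale, tag_default))
print(select_playbooks(["default"], tag_default))
```

With the stale site-docker value the sketch returns the WRONG playbook, which would match what fail.log shows; with ['default'] it falls back to the playbook chosen for the ceph_systemd tag.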
WORKAROUND:

1. Create an environment file foo.yaml with the following content:

"""
parameter_defaults:
  CephAnsiblePlaybook: default
"""

2. Re-run 'openstack overcloud upgrade prepare' but include foo.yaml:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/framework_for_upgrades_13_to_16.1/index#running-the-overcloud-upgrade-preparation-upgrading-overcloud-standard

3. When you get to the point where you run "openstack overcloud external-upgrade run --stack STACK NAME --tags ceph_systemd ...", it should set up the correct ceph-ansible playbook. You can confirm this by looking at the generated shell script:

(undercloud) [stack@undercloud b03ca383-1c38-40b8-9ebd-f68517883164]$ cat /var/lib/mistral/c9ab0d1c-4127-4687-9d0e-5398c8710019/ceph-ansible/ceph_ansible_command.sh
#!/usr/bin/env bash
set -e
echo "Running $0" >> /var/lib/mistral/c9ab0d1c-4127-4687-9d0e-5398c8710019/ceph-ansible/ceph_ansible_command.log
ANSIBLE_ACTION_PLUGINS=/usr/share/ceph-ansible/plugins/actions/ ANSIBLE_CALLBACK_PLUGINS=/usr/share/ceph-ansible/plugins/callback/ ANSIBLE_FILTER_PLUGINS=/usr/share/ceph-ansible/plugins/filter/ ANSIBLE_ROLES_PATH=/usr/share/ceph-ansible/roles/ ANSIBLE_LOG_PATH="/var/lib/mistral/c9ab0d1c-4127-4687-9d0e-5398c8710019/ceph-ansible/ceph_ansible_command.log" ANSIBLE_SSH_CONTROL_PATH_DIR="/tmp/ceph_ansible_control_path" ANSIBLE_LIBRARY=/usr/share/ceph-ansible/library/ ANSIBLE_CONFIG=/usr/share/ceph-ansible/ansible.cfg ANSIBLE_REMOTE_TEMP="/tmp/ceph_ansible_tmp" ANSIBLE_FORKS=25 ANSIBLE_GATHER_TIMEOUT=60 ANSIBLE_CALLBACK_WHITELIST=profile_tasks ANSIBLE_STDOUT_CALLBACK=default ANSIBLE_SSH_RETRIES=6 DEFAULT_FORKS=100 ansible-playbook --private-key /var/lib/mistral/c9ab0d1c-4127-4687-9d0e-5398c8710019/ssh_private_key -e ansible_python_interpreter=/usr/libexec/platform-python -v --skip-tags package-install,with_pkg --extra-vars @/var/lib/mistral/c9ab0d1c-4127-4687-9d0e-5398c8710019/ceph-ansible/extra_vars.yml --limit ctr0 -i /var/lib/mistral/c9ab0d1c-4127-4687-9d0e-5398c8710019/ceph-ansible/inventory.yml /usr/share/ceph-ansible/infrastructure-playbooks/docker-to-podman.yml 2>&1
(undercloud) [stack@undercloud b03ca383-1c38-40b8-9ebd-f68517883164]$
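For anyone who would rather script the check than read the generated command by eye, here is a rough Python sketch. The function name playbooks_in_command is hypothetical, and the rule it uses (any token on the ansible-playbook line that starts with /usr/share/ceph-ansible/ and ends in .yml or .yml.sample is a playbook) is a heuristic assumed from the output in this bug, not something ceph-ansible or TripleO guarantees. The sample string is a shortened command with PLAN standing in for the plan directory.

```python
import shlex


def playbooks_in_command(script_text):
    """Return the ceph-ansible playbook paths found in the generated script.

    Heuristic (an assumption): playbooks are the tokens on the
    ansible-playbook line that live under /usr/share/ceph-ansible/ and
    end in .yml or .yml.sample.
    """
    playbooks = []
    for line in script_text.splitlines():
        if "ansible-playbook" not in line:
            continue
        for token in shlex.split(line):
            if token.startswith("/usr/share/ceph-ansible/") and token.endswith(
                (".yml", ".yml.sample")
            ):
                playbooks.append(token)
    return playbooks


# Shortened stand-in for ceph_ansible_command.sh; PLAN is a placeholder.
sample = (
    "ANSIBLE_CONFIG=/usr/share/ceph-ansible/ansible.cfg ansible-playbook -v "
    "--extra-vars @/var/lib/mistral/PLAN/ceph-ansible/extra_vars.yml --limit ctr0 "
    "-i /var/lib/mistral/PLAN/ceph-ansible/inventory.yml "
    "/usr/share/ceph-ansible/infrastructure-playbooks/docker-to-podman.yml 2>&1"
)
print(playbooks_in_command(sample))
```

If the workaround took effect, the only path reported should be docker-to-podman.yml rather than site-docker.yml.sample.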
ROOT CAUSE:
Someone must have run a stack update with CephAnsiblePlaybook overridden in the past. Even if your Heat env files no longer override this parameter, the overridden value may still be stored in Heat. This is because TripleO's Heat only lets you replace values, not delete them (it's a feature, not a bug, if you think about how this could save your deployment if you accidentally forget to -e an env file on update).

Proposed solution:
Make the upgrade procedure always include setting "CephAnsiblePlaybook: default", either through a docs change or by updating a default Heat environment parameter.
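The "replace, never delete" behaviour can be pictured with a toy Python model. This is not Heat's implementation: apply_stack_update is a hypothetical function that only mimics the effect described here, where parameters missing from the new environment keep their previously stored value.

```python
def apply_stack_update(stored_params, env_params):
    """Toy model of a stack update (assumption, not Heat code).

    Parameters passed in env_params replace stored ones; parameters that
    are simply omitted keep whatever value was stored before.
    """
    merged = dict(stored_params)
    merged.update(env_params)
    return merged


stored = {}

# 1. Someone overrides the playbook in an env file.
stored = apply_stack_update(
    stored,
    {"CephAnsiblePlaybook": ["/usr/share/ceph-ansible/site-docker.yml.sample"]},
)

# 2. The override is later dropped from the env files, but nothing resets it.
stored = apply_stack_update(stored, {})
print(stored["CephAnsiblePlaybook"])

# 3. The workaround: explicitly set the parameter back to 'default'.
stored = apply_stack_update(stored, {"CephAnsiblePlaybook": "default"})
print(stored["CephAnsiblePlaybook"])
```

In this model, step 2 leaves the old site-docker value in place, and only the explicit override in step 3 clears it, which is why the workaround has to pass the parameter rather than just omit it.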
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.3 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:5413