Designate job [1] is failing in the Overcloud install stage with an Ansible error:

2022-07-17 10:00:11.411772 | 5254006f-799c-d818-043e-00000000b7ee |       TASK | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_5
2022-07-17 10:00:13.869057 |                                      |    WARNING | ERROR: Can't run container designate_pool_manage
stderr: time="2022-07-17T10:00:12Z" level=warning msg=" binary not found, container dns will not be enabled"
+ sudo -E kolla_set_configs
sudo: unable to send audit message: Operation not permitted
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Creating directory /etc/designate/private
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/designate/private/bind1.conf to /etc/designate/private/bind1.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/designate/private/bind2.conf to /etc/designate/private/bind2.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/designate/private/bind3.conf to /etc/designate/private/bind3.conf
INFO:__main__:Deleting /etc/designate/designate.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/designate/designate.conf to /etc/designate/designate.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/designate/pools.yaml to /etc/designate/pools.yaml
INFO:__main__:Creating directory /etc/my.cnf.d
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/my.cnf.d/tripleo.cnf to /etc/my.cnf.d/tripleo.cnf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/rndc.key to /etc/rndc.key
INFO:__main__:Writing out command to execute
INFO:__main__:Setting permission for /var/log/designate
INFO:__main__:Setting permission for /var/log/designate/designate-api.log
INFO:__main__:Setting permission for /var/log/designate/producer.log
INFO:__main__:Setting permission for /var/log/designate/central.log
INFO:__main__:Setting permission for /var/log/designate/mdns.log
INFO:__main__:Setting permission for /var/log/designate/worker.log
++ cat /run_command
+ CMD='/usr/bin/bootstrap_host_exec designate_central su designate -s /bin/bash -c '\''/bin/designate-manage pool update'\'''
+ ARGS=
+ [[ ! -n '' ]]
+ . kolla_extend_start
+ echo 'Running command: '\''/usr/bin/bootstrap_host_exec designate_central su designate -s /bin/bash -c '\''/bin/designate-manage pool update'\'''\'''
+ exec /usr/bin/bootstrap_host_exec designate_central su designate -s /bin/bash -c ''\''/bin/designate-manage' pool 'update'\'''
2022-07-17 10:00:13.872874 | 5254006f-799c-d818-043e-00000000b7ee |      FATAL | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_5 | controller-0 | error={"changed": false, "msg": "Failed containers: designate_pool_manage"}
2022-07-17 10:00:13.875127 | 5254006f-799c-d818-043e-00000000b7ee |     TIMING | tripleo_container_manage : Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_5 | controller-0 | 0:21:44.970046 | 2.46s

PLAY RECAP *********************************************************************
compute-0    : ok=439  changed=183  unreachable=0  failed=0  skipped=233  rescued=0  ignored=0
compute-1    : ok=432  changed=183  unreachable=0  failed=0  skipped=236  rescued=0  ignored=0
controller-0 : ok=539  changed=257  unreachable=0  failed=1  skipped=260  rescued=0  ignored=0
controller-1 : ok=538  changed=252  unreachable=0  failed=0  skipped=267  rescued=0  ignored=0
controller-2 : ok=537  changed=252  unreachable=0  failed=0  skipped=267  rescued=0  ignored=0
localhost    : ok=1    changed=0    unreachable=0  failed=0  skipped=2    rescued=0  ignored=0
undercloud   : ok=97   changed=39   unreachable=0  failed=0  skipped=10   rescued=0  ignored=0

Logs:
undercloud-0/home/stack/overcloud_install.log.gz - attached to BZ.
undercloud-0/home/stack/overcloud-deploy/overcloud/config-download/ansible-playbook-command.sh.gz - attached to BZ.
undercloud-0/home/stack/overcloud-deploy/overcloud/config-download/stdout.gz - attached to BZ.
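For reference, this is the kind of on-node triage one might run on the failing controller; these are illustrative commands, not taken from the job artifacts:

    sudo podman ps -a --filter name=designate_pool_manage
    sudo podman logs designate_pool_manage
    ls /var/lib/tripleo-config/container-startup-config/step_5/

The container exits immediately, so "podman logs" on the node should show the same kolla_set_configs output and the bootstrap_host_exec command that appear in the deploy log above.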
I'm reassigning to default. The problem is a mismatch between hostnames and node names. See http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-openstack-designate-17.0_director-rhel-virthost-3cont_2comp-ipv4-geneve/95/controller-0/etc/hostname.gz - controller-0 thinks it's controller-1 and vice versa. The bootstrap logic is running properly on controller-1, which has a hostname of controller-0. I can't really explain why the pool manage container is the only one that breaks the deployment, as the db_sync container also isn't run in the right place.
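One quick way to see the mismatch from the undercloud is an ad-hoc command that prints each node's hostname next to its inventory name; the group name and inventory path here are assumptions based on this job's layout:

    (undercloud) $ ansible -i config-download/overcloud/tripleo-ansible-inventory.yaml \
        Controller -m command -a 'hostname'

If the "ok: [controller-N]" header doesn't match the hostname printed for that node, the inventory and the deployed hosts have gone out of sync.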
Findings so far on 2107914

Some notes on my investigations so far:
- Examination of some of the other runs of the job posted in this bug shows at least one instance where the names of the log hosts match the hostnames in the logs. This suggests that the mismatched hosts/Ansible hosts might be unrelated. While still disconcerting, I am going to rate this a tepid lead for now and focus on other possibilities.
- To rule out a blatant coding error, I recreated a 3-controller/1-compute environment using upstream master and all configuration files were as expected. It is still useful, in that examining the differences between the execution environments might yield clues.
- In the three examples I've looked at where the error occurs, the information that should belong to controller-0 is instead getting the results for controller-2. That would normally suggest a fence-post-style error, but this code largely uses iterator-style addressing of collections, not indexing.

Some possibilities I'm examining:
- It may be that using set_fact and hostvars for dereferencing cannot be relied on "as is" when gathering data in one play and referencing it in another. There might be something additional required to make it reliable. Note that up to now we've actually been using this same mechanism, but there have been some differences in what we reference (i.e. the contents of groups.designate_bind instead of designate_bind_node_ips).
- Ansible version differences and execution strategies might be the source of the issue.
- I want to find out whether the actual list in groups.designate_bind is valid where it is referenced - it's just a little odd that some of the data tends to be correct.
I've been able to reproduce this in a downstream development environment and have pinpointed where the error was introduced. I'm trying out some alternate implementations to get the desired result.
*** Bug 2107913 has been marked as a duplicate of this bug. ***
Definitely looks like an Ansible issue:

- hosts: localhost
  vars:
    some_data: ['hey', 'ho', 'lets go']
  tasks:
    - name: try this
      set_fact:
        well_this: "{{ item.1 }}"
      delegate_to: "{{ item.0 }}"
      delegate_facts: true
      loop: "{{ groups.designate_bind|zip(some_data)|list }}"

    - debug:
        msg: "{{ hostvars[item].well_this }}"
      loop: "{{ groups.designate_bind }}"

(undercloud) [stack@undercloud-0 ~]$ ansible-playbook -i config-download/overcloud/tripleo-ansible-inventory.yaml hunh.yaml

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [try this] ****************************************************************
ok: [localhost -> controller-0(192.168.24.50)] => (item=['controller-0', 'hey'])
ok: [localhost -> controller-1(192.168.24.35)] => (item=['controller-1', 'ho'])
ok: [localhost -> controller-2(192.168.24.48)] => (item=['controller-2', 'lets go'])

TASK [debug] *******************************************************************
ok: [localhost] => (item=controller-0) => {
    "msg": "lets go"
}
ok: [localhost] => (item=controller-1) => {
    "msg": "ho"
}
ok: [localhost] => (item=controller-2) => {
    "msg": "lets go"
}

PLAY RECAP *********************************************************************
localhost : ok=3  changed=0  unreachable=0  failed=0  skipped=0  rescued=0  ignored=0
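For comparison, one delegate-free way to land the same facts would be to set them on each host directly and read them back via hostvars from a second play, which sidesteps the loop + delegate_facts path entirely. This is only a sketch reusing the made-up variable names from the reproducer above, not necessarily what tripleo should end up doing:

- hosts: designate_bind
  gather_facts: false
  vars:
    some_data: ['hey', 'ho', 'lets go']
  tasks:
    - name: set the per-host value on the host itself
      set_fact:
        well_this: "{{ some_data[groups.designate_bind.index(inventory_hostname)] }}"

- hosts: localhost
  gather_facts: false
  tasks:
    - debug:
        msg: "{{ hostvars[item].well_this }}"
      loop: "{{ groups.designate_bind }}"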
It seems to be a bug in the precise version of Ansible you're using.

For instance, on my fc-35:

╭─cjeanner@marvin ~/tmp
╰─$ ansible --version
ansible [core 2.11.12]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/cjeanner/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/cjeanner/.local/lib/python3.10/site-packages/ansible
  ansible collection location = /home/cjeanner/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/cjeanner/.local/bin/ansible
  python version = 3.10.5 (main, Jun 9 2022, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-2)]
  jinja version = 3.0.1
  libyaml = True

TASK [try this] ****************************************************************
ok: [localhost -> builder1] => (item=['builder1', 'hey'])
ok: [localhost -> builder2] => (item=['builder2', 'ho'])
ok: [localhost -> builder3] => (item=['builder3', 'lets go'])

TASK [debug] *******************************************************************
ok: [localhost] => (item=builder1) => {
    "msg": "hey"
}
ok: [localhost] => (item=builder2) => {
    "msg": "ho"
}
ok: [localhost] => (item=builder3) => {
    "msg": "lets go"
}

But running the exact same code in a cs9 container, with a tweaked venv using 2.12.2:

ansible [core 2.12.2]
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /venv/lib64/python3.9/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /venv/bin/ansible
  python version = 3.9.13 (main, Jun 9 2022, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-2)]
  jinja version = 3.1.2
  libyaml = True

TASK [try this] ****************************************************************
ok: [localhost -> builder1(localhost)] => (item=['builder1', 'hey'])
ok: [localhost -> builder2(localhost)] => (item=['builder2', 'ho'])
ok: [localhost -> builder3(localhost)] => (item=['builder3', 'lets go'])

TASK [debug] *******************************************************************
ok: [localhost] => (item=builder1) => {
    "msg": "lets go"
}
ok: [localhost] => (item=builder2) => {
    "msg": "ho"
}
ok: [localhost] => (item=builder3) => {
    "msg": "lets go"
}

And, with the stock ansible-core provided by the cs9 repositories:

ansible [core 2.13.1]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.9/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.9.13 (main, Jun 9 2022, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-2)]
  jinja version = 3.1.2
  libyaml = True

TASK [try this] ****************************************************************
ok: [localhost -> builder1(localhost)] => (item=['builder1', 'hey'])
ok: [localhost -> builder2(localhost)] => (item=['builder2', 'ho'])
ok: [localhost -> builder3(localhost)] => (item=['builder3', 'lets go'])

TASK [debug] *******************************************************************
ok: [localhost] => (item=builder1) => {
    "msg": "hey"
}
ok: [localhost] => (item=builder2) => {
    "msg": "ho"
}
ok: [localhost] => (item=builder3) => {
    "msg": "lets go"
}
2.12.3 is fine!

ansible [core 2.12.3]
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /venv/lib64/python3.9/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /venv/bin/ansible
  python version = 3.9.13 (main, Jun 9 2022, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-2)]
  jinja version = 3.1.2
  libyaml = True

TASK [try this] ****************************************************************
ok: [localhost -> builder1(localhost)] => (item=['builder1', 'hey'])
ok: [localhost -> builder2(localhost)] => (item=['builder2', 'ho'])
ok: [localhost -> builder3(localhost)] => (item=['builder3', 'lets go'])

TASK [debug] *******************************************************************
ok: [localhost] => (item=builder1) => {
    "msg": "hey"
}
ok: [localhost] => (item=builder2) => {
    "msg": "ho"
}
ok: [localhost] => (item=builder3) => {
    "msg": "lets go"
}
2.12.3 has this bugfix in its changelog:

+- task_executor reverts the change to push facts into delegated vars on loop finalization as result managing code already handles this and was duplicating effort to wrong result.

It matches the issue we're seeing in 2.12.2 - so I guess that's really the one we want.

Commit hash: ae35fc04c3a2068b1d37efe813d1c6938b4f2634

If we could get it into a 2.12.2-2 downstream build, that would be wonderful.
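For anyone who wants to double-check locally before a downstream build lands, a throwaway venv pinned to 2.12.3 should show the corrected behaviour with the reproducer playbook from earlier in this bug (venv path is just an example):

    python3 -m venv /tmp/ansible-2.12.3
    /tmp/ansible-2.12.3/bin/pip install 'ansible-core==2.12.3'
    /tmp/ansible-2.12.3/bin/ansible-playbook -i config-download/overcloud/tripleo-ansible-inventory.yaml hunh.yaml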
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543